Commit 8ad9b06
Merge c0d5a45 into 750a8b3
dpark01 committed Jul 5, 2018
2 parents 750a8b3 + c0d5a45
Showing 11 changed files with 98 additions and 212 deletions.
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -17,5 +17,6 @@ Contents
description
install
cmdline
pipeuse
pipes-wdl
pipes-snakemake
development
38 changes: 32 additions & 6 deletions docs/install.rst
@@ -8,7 +8,10 @@ Cloud compute implementations
Docker Images
~~~~~~~~~~~~~

To facilitate cloud compute deployments, we publish a complete Docker image with associated dependencies to the Docker registry at `quay.io <https://quay.io/repository/broadinstitute/viral-ngs>`_. Simply ``docker pull quay.io/broadinstitute/viral-ngs`` for the latest stable version.
To facilitate cloud compute deployments, we publish a complete Docker
image with associated dependencies to the Docker registry at `quay.io
<https://quay.io/repository/broadinstitute/viral-ngs>`_. Simply ``docker
pull quay.io/broadinstitute/viral-ngs`` for the latest stable version.
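
Once pulled, the image can be run interactively; a minimal sketch (the
local data directory and the ``/data`` mount point inside the container
are illustrative assumptions, not part of the published image docs):

```shell
# Pull the latest stable viral-ngs image from quay.io
docker pull quay.io/broadinstitute/viral-ngs

# Start an interactive shell in the container, mounting a local data
# directory (the /data mount point is illustrative)
docker run -it --rm -v "$PWD/data:/data" \
    quay.io/broadinstitute/viral-ngs bash
```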


DNAnexus
@@ -22,22 +25,45 @@ for the cloud analysis pipeline are available at
https://github.com/dnanexus/viral-ngs/wiki


Google Cloud Platform: deploy to GCE VM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The docker image referenced above can be directly `deployed to a Google
Compute Engine VM on startup
<https://cloud.google.com/compute/docs/containers/deploying-containers>`_.
The main things you will need to do are:

* Make sure to allocate a larger-than-default root disk for the VM.
Google's Container Optimized OS defaults to a very small disk which
is not large enough to unpack our Docker image. Increase to at least 20GB
(or more if you want to localize data).

* When setting up the VM for launch, open the "Advanced container
  options" hidden section and select "Allocate a buffer for STDIN" and
  "Allocate a pseudo-TTY" before launching. Otherwise you won't be able
  to ssh into the VM.

* After logging in, you may sometimes need to invoke ``bash`` manually
  to get the correct environment.
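
The steps above can also be scripted with the gcloud CLI; a minimal
sketch, where the instance name, zone, and disk size are illustrative
assumptions:

```shell
# Create a Container-Optimized OS VM running the viral-ngs image, with
# a root disk large enough to unpack the image (instance name and zone
# are illustrative)
gcloud compute instances create-with-container viral-ngs-vm \
    --zone us-central1-a \
    --container-image quay.io/broadinstitute/viral-ngs \
    --container-stdin \
    --container-tty \
    --boot-disk-size 50GB
```

``--container-stdin`` and ``--container-tty`` correspond to the
"Allocate a buffer for STDIN" and "Allocate a pseudo-TTY" checkboxes
described above.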


Google Cloud Platform: dsub
~~~~~~~~~~~~~~~~~~~~~~~~~~~

All of the command line functions in viral-ngs are accessible from the docker image_ and can be invoked directly using dsub_.
All of the command line functions in viral-ngs are accessible from the
docker image_ and can be invoked directly using dsub_.

.. _dsub: https://cloud.google.com/genomics/v1alpha2/dsub
.. _image: https://quay.io/repository/broadinstitute/viral-ngs

Here is an example invocation of ``illumina.py illumina_demux`` (replace the project with your GCP project, and the input, output-recursive, and logging parameters with URIs within your GCS buckets)::

dsub --project broad-sabeti-lab --zones "us-east1-*" \
dsub --project gcid-viral-seq --zones "us-central1-*" \
--image quay.io/broadinstitute/viral-ngs \
--name illumina_demux-test \
--logging gs://sabeti-temp-30d/dpark/test-demux/logs \
--input FC_TGZ=gs://sabeti-sequencing/flowcells/broad-walkup/160907_M04004_0066_000000000-AJH8U.tar.gz \
--output-recursive OUTDIR=gs://sabeti-temp-30d/dpark/test-demux \
--logging gs://viral-temp-30d/dpark/test-demux/logs \
--input FC_TGZ=gs://viral-sequencing/flowcells/broad-walkup/160907_M04004_0066_000000000-AJH8U.tar.gz \
--output-recursive OUTDIR=gs://viral-temp-30d/dpark/test-demux \
--command 'illumina.py illumina_demux ${FC_TGZ} 1 ${OUTDIR}' \
--min-ram 30 \
--min-cores 8 \
44 changes: 1 addition & 43 deletions docs/pipeuse.rst → docs/pipes-snakemake.rst
@@ -13,7 +13,7 @@ Here is an overview of the Snakemake rule graph:
.. image:: rulegraph.png

Installation instructions
-------------------------------------------
-------------------------

It is recommended to install the viral-ngs conda package from the ``broad-viral`` channel, as detailed in the installation section of this documentation.

@@ -242,48 +242,6 @@ Running the pipeline directly
After the above setup is complete, run the pipeline directly by calling
``snakemake`` within the analysis directory.
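
A minimal sketch of this invocation, using standard Snakemake flags
(the analysis directory path is an illustrative assumption):

```shell
# From inside the analysis directory you set up above
cd /path/to/analysis-dir   # illustrative path

snakemake -n               # dry run: show what would be executed
snakemake --cores 8        # run with up to 8 parallel jobs
```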

Running the pipeline on GridEngine (UGER)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Within ``config.yaml``, set the "project" to one that exists on the
cluster system.

Inside the analysis directory, run the job submission command. Ex.:

::

use UGER
qsub -cwd -b y -q long -l m_mem_free=4G ./bin/pipes/Broad_UGER/run-pipe.sh

To kill all jobs that exited (qstat status "Eqw") with an error:

::

qdel $(qstat | grep Eqw | awk '{print $1}')

Running the pipeline on LSF
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Inside the analysis directory, run the job submission command. Ex.:

::

bsub -o log/run.out -q forest ./bin/pipes/Broad_LSF/run-pipe.sh

If you notice jobs hanging in the **PEND** state, an upstream job may
have failed. Before killing such jobs, verify that the jobs are pending
due to their dependency:

::

bjobs -al | grep -A 1 "PENDING REASONS" | grep -v "PENDING REASONS" | grep -v '^--$'

To kill all **PEND**\ ing jobs:

::

bkill `bjobs | grep PEND | awk '{print $1}'` > /dev/null

When things go wrong
~~~~~~~~~~~~~~~~~~~~

37 changes: 37 additions & 0 deletions docs/pipes-wdl.rst
@@ -0,0 +1,37 @@
Using the WDL pipelines
=======================

Rather than chaining together viral-ngs pipeline steps as a series of
tool commands called in isolation, it is possible to execute them as a
complete automated pipeline, from processing raw sequencer output to
creating files suitable for GenBank submission. This utilizes the
Workflow Description Language, which is documented at:
https://github.com/openwdl/wdl

**This documentation is not yet complete**


Executing WDL workflows locally with Cromwell
---------------------------------------------

See example here: https://github.com/broadinstitute/viral-ngs/blob/master/travis/tests-cromwell.sh
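
A minimal sketch of a local Cromwell invocation; the jar path, workflow
file, and inputs file names below are illustrative assumptions (see the
linked test script for the actual workflows exercised in CI):

```shell
# Run a single WDL workflow locally with Cromwell
# (jar, workflow, and inputs paths are illustrative)
java -jar cromwell.jar run \
    pipes/WDL/workflows/demux_plus.wdl \
    --inputs inputs.json
```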


Executing WDL workflows on Google Cloud Platform with Cromwell
--------------------------------------------------------------

This should help: https://github.com/broadinstitute/viral-ngs/blob/master/pipes/WDL/cromwell.gcid-viral-seq.conf


Executing WDL workflows on FireCloud
------------------------------------

More information will be added later.


Executing WDL workflows on DNAnexus
-----------------------------------

This is the primary mode of execution for many of our collaborators and
lab members. You can obtain the latest versions here:
https://platform.dnanexus.com/projects/F8PQ6380xf5bK0Qk0YPjB17P/data/
5 changes: 5 additions & 0 deletions pipes/Broad_LSF/README
@@ -0,0 +1,5 @@
This folder contains scripts related to running the viral-ngs pipeline
on the Broad's old LSF scheduler system. It is no longer supported as a
production analysis environment (as it no longer exists at the Broad),
but the original scripts are retained here for reference for anyone
wishing to adapt the Snakemake pipelines to an LSF scheduler.
6 changes: 5 additions & 1 deletion pipes/Broad_UGER/README
@@ -1 +1,5 @@
This folder contains scripts related to running the viral-ngs pipeline on the Broad's GridEngine system.
This folder contains scripts related to running the viral-ngs pipeline
on the Broad's Univa GridEngine system. It is no longer supported as
a production analysis environment for viral-ngs, but the original scripts
are retained here for reference for anyone wishing to adapt the Snakemake
pipelines to a GridEngine-like environment.
@@ -10,8 +10,8 @@ backend {
JES {
actor-factory = "cromwell.backend.impl.jes.JesBackendLifecycleActorFactory"
config {
project = "broad-sabeti-lab"
root = "gs://sabeti-temp-30d/USERNAME/cromwell-test"
project = "gcid-viral-seq"
root = "gs://viral-temp-30d/USERNAME/cromwell-test"
genomics-api-queries-per-100-seconds = 1000

genomics {
@@ -59,7 +59,7 @@ backend {
disks: "local-disk 375 LOCAL, /mnt/tmp 375 LOCAL"
noAddress: false
preemptible: 1
zones: ["us-east1-b", "us-east1-c", "us-east1-d", "us-central1-a", "us-central1-b", "us-central1-c"]
zones: [ "us-central1-a", "us-central1-b", "us-central1-c", "us-east1-b", "us-east1-c", "us-east1-d" ]
}
}
}
26 changes: 13 additions & 13 deletions pipes/config.yaml
@@ -310,16 +310,16 @@ genbank:

# |----------------- Cluster execution parameters----------------------------

# The project name passed to the cluster scheduler (currently unused)
project: "viral_ngs"

# Broad-specific LSF cluster scheduler parameters
LSF_queues:
short: "-W 4:00"
long: "-q forest"
bigmem: "-q flower"

# Broad-specific UGER cluster scheduler parameters
UGER_queues:
short: "-l h_rt=04:00:00"
long: "-l h_rt=36:00:00"
## The project name passed to the cluster scheduler (currently unused)
#project: "viral_ngs"

## Broad-specific LSF cluster scheduler parameters
#LSF_queues:
# short: "-W 4:00"
# long: "-q forest"
# bigmem: "-q flower"

## Broad-specific UGER cluster scheduler parameters
#UGER_queues:
# short: "-l h_rt=04:00:00"
# long: "-l h_rt=36:00:00"
119 changes: 0 additions & 119 deletions pipes/ref_assisted/Snakefile

This file was deleted.

14 changes: 0 additions & 14 deletions pipes/ref_assisted/config.yaml

This file was deleted.

12 changes: 0 additions & 12 deletions pipes/ref_assisted/ref_assisted

This file was deleted.
