Commit 8ad9b06
Merge c0d5a45 into 750a8b3
dpark01 committed Jul 5, 2018
2 parents 750a8b3 + c0d5a45
Showing 11 changed files with 98 additions and 212 deletions.
3 changes: 2 additions & 1 deletion docs/index.rst
@@ -17,5 +17,6 @@ Contents
description
install
cmdline
pipeuse
pipes-wdl
pipes-snakemake
development
38 changes: 32 additions & 6 deletions docs/install.rst
@@ -8,7 +8,10 @@ Cloud compute implementations
Docker Images
~~~~~~~~~~~~~

To facilitate cloud compute deployments, we publish a complete Docker image with associated dependencies to the Docker registry at `quay.io <https://quay.io/repository/broadinstitute/viral-ngs>`_. Simply ``docker pull quay.io/broadinstitute/viral-ngs`` for the latest stable version.
To facilitate cloud compute deployments, we publish a complete Docker
image with associated dependencies to the Docker registry at `quay.io
<https://quay.io/repository/broadinstitute/viral-ngs>`_. Simply ``docker
pull quay.io/broadinstitute/viral-ngs`` for the latest stable version.
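
Once pulled, the image can be run interactively; a minimal sketch (the
local data directory and the ``/data`` mount point inside the container
are illustrative assumptions, not part of the published image docs):

```shell
# Pull the latest stable viral-ngs image from quay.io
docker pull quay.io/broadinstitute/viral-ngs

# Start an interactive shell in the container, mounting a local data
# directory (the /data mount point is illustrative)
docker run -it --rm -v "$PWD/data:/data" \
    quay.io/broadinstitute/viral-ngs bash
```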


DNAnexus
@@ -22,22 +25,45 @@ for the cloud analysis pipeline are available at
https://github.com/dnanexus/viral-ngs/wiki


Google Cloud Platform: deploy to GCE VM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The docker image referenced above can be directly `deployed to a Google
Compute Engine VM on startup
<https://cloud.google.com/compute/docs/containers/deploying-containers>`_.
The main things you will need to do are:

* Make sure to allocate a larger-than-default root disk for the VM.
Google's Container Optimized OS defaults to a very small disk which
is not large enough to unpack our Docker image. Increase to at least 20GB
(or more if you want to localize data).

* When setting up the VM for launch, open the "Advanced container
  options" hidden section and select "Allocate a buffer for STDIN" and
  "Allocate a pseudo-TTY" before launching. Otherwise you won't be able
  to ssh into the VM.

* After logging in, you may sometimes need to invoke ``bash`` manually
  to get the correct environment.
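
The steps above can also be scripted with the gcloud CLI; a minimal
sketch, where the instance name, zone, and disk size are illustrative
assumptions:

```shell
# Create a Container-Optimized OS VM running the viral-ngs image, with
# a root disk large enough to unpack the image (instance name and zone
# are illustrative)
gcloud compute instances create-with-container viral-ngs-vm \
    --zone us-central1-a \
    --container-image quay.io/broadinstitute/viral-ngs \
    --container-stdin \
    --container-tty \
    --boot-disk-size 50GB
```

``--container-stdin`` and ``--container-tty`` correspond to the
"Allocate a buffer for STDIN" and "Allocate a pseudo-TTY" checkboxes
described above.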


Google Cloud Platform: dsub
~~~~~~~~~~~~~~~~~~~~~~~~~~~

All of the command line functions in viral-ngs are accessible from the docker image_ and can be invoked directly using dsub_.
All of the command line functions in viral-ngs are accessible from the
docker image_ and can be invoked directly using dsub_.

.. _dsub: https://cloud.google.com/genomics/v1alpha2/dsub
.. _image: https://quay.io/repository/broadinstitute/viral-ngs

Here is an example invocation of ``illumina.py illumina_demux`` (replace the project with your GCP project, and the input, output-recursive, and logging parameters with URIs within your GCS buckets)::

dsub --project broad-sabeti-lab --zones "us-east1-*" \
dsub --project gcid-viral-seq --zones "us-central1-*" \
--image quay.io/broadinstitute/viral-ngs \
--name illumina_demux-test \
--logging gs://sabeti-temp-30d/dpark/test-demux/logs \
--input FC_TGZ=gs://sabeti-sequencing/flowcells/broad-walkup/160907_M04004_0066_000000000-AJH8U.tar.gz \
--output-recursive OUTDIR=gs://sabeti-temp-30d/dpark/test-demux \
--logging gs://viral-temp-30d/dpark/test-demux/logs \
--input FC_TGZ=gs://viral-sequencing/flowcells/broad-walkup/160907_M04004_0066_000000000-AJH8U.tar.gz \
--output-recursive OUTDIR=gs://viral-temp-30d/dpark/test-demux \
--command 'illumina.py illumina_demux ${FC_TGZ} 1 ${OUTDIR}' \
--min-ram 30 \
--min-cores 8 \
44 changes: 1 addition & 43 deletions docs/pipeuse.rst → docs/pipes-snakemake.rst
@@ -13,7 +13,7 @@ Here is an overview of the Snakemake rule graph:
.. image:: rulegraph.png

Installation instructions
-------------------------------------------
-------------------------

It is recommended to install the viral-ngs conda package from the ``broad-viral`` channel, as detailed in the installation section of this documentation.

@@ -242,48 +242,6 @@ Running the pipeline directly
After the above setup is complete, run the pipeline directly by calling
``snakemake`` within the analysis directory.
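
A minimal sketch of this invocation, using standard Snakemake flags
(the analysis directory path is an illustrative assumption):

```shell
# From inside the analysis directory you set up above
cd /path/to/analysis-dir   # illustrative path

snakemake -n               # dry run: show what would be executed
snakemake --cores 8        # run with up to 8 parallel jobs
```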

Running the pipeline on GridEngine (UGER)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Within ``config.yaml``, set the "project" to one that exists on the
cluster system.

Inside the analysis directory, run the job submission command. Ex.:

::

use UGER
qsub -cwd -b y -q long -l m_mem_free=4G ./bin/pipes/Broad_UGER/run-pipe.sh

To kill all jobs that exited (qstat status "Eqw") with an error:

::

qdel $(qstat | grep Eqw | awk '{print $1}')

Running the pipeline on LSF
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Inside the analysis directory, run the job submission command. Ex.:

::

bsub -o log/run.out -q forest ./bin/pipes/Broad_LSF/run-pipe.sh

If you notice jobs hanging in the **PEND** state, an upstream job may
have failed. Before killing such jobs, verify that the jobs are pending
due to their dependency:

::

bjobs -al | grep -A 1 "PENDING REASONS" | grep -v "PENDING REASONS" | grep -v '^--$'

To kill all **PEND**\ ing jobs:

::

bkill `bjobs | grep PEND | awk '{print $1}'` > /dev/null

When things go wrong
~~~~~~~~~~~~~~~~~~~~

37 changes: 37 additions & 0 deletions docs/pipes-wdl.rst
@@ -0,0 +1,37 @@
Using the WDL pipelines
=======================

Rather than chaining together viral-ngs pipeline steps as a series of
tool commands called in isolation, it is possible to execute them as a
complete automated pipeline, from processing raw sequencer output to
creating files suitable for GenBank submission. This utilizes the
Workflow Description Language, which is documented at:
https://github.com/openwdl/wdl

**This documentation is not yet complete**


Executing WDL workflows locally with Cromwell
---------------------------------------------

See example here: https://github.com/broadinstitute/viral-ngs/blob/master/travis/tests-cromwell.sh
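
A minimal sketch of a local Cromwell invocation; the jar path, workflow
file, and inputs file names below are illustrative assumptions (see the
linked test script for the actual workflows exercised in CI):

```shell
# Run a single WDL workflow locally with Cromwell
# (jar, workflow, and inputs paths are illustrative)
java -jar cromwell.jar run \
    pipes/WDL/workflows/demux_plus.wdl \
    --inputs inputs.json
```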


Executing WDL workflows on Google Cloud Platform with Cromwell
--------------------------------------------------------------

This should help: https://github.com/broadinstitute/viral-ngs/blob/master/pipes/WDL/cromwell.gcid-viral-seq.conf


Executing WDL workflows on FireCloud
------------------------------------

More information will be added later.


Executing WDL workflows on DNAnexus
-----------------------------------

This is the primary mode of execution for many of our collaborators and
lab members. You can obtain the latest versions here:
https://platform.dnanexus.com/projects/F8PQ6380xf5bK0Qk0YPjB17P/data/
5 changes: 5 additions & 0 deletions pipes/Broad_LSF/README
@@ -0,0 +1,5 @@
This folder contains scripts related to running the viral-ngs pipeline
on the Broad's old LSF scheduler system. It is no longer supported as a
production analysis environment (as it no longer exists at the Broad),
but the original scripts are retained here for reference for anyone
wishing to adapt the Snakemake pipelines to an LSF scheduler.
6 changes: 5 additions & 1 deletion pipes/Broad_UGER/README
@@ -1 +1,5 @@
This folder contains scripts related to running the viral-ngs pipeline on the Broad's GridEngine system.
This folder contains scripts related to running the viral-ngs pipeline
on the Broad's Univa GridEngine system. It is no longer supported as
a production analysis environment for viral-ngs, but the original scripts
are retained here for reference for anyone wishing to adapt the Snakemake
pipelines to a GridEngine-like environment.
@@ -10,8 +10,8 @@ backend {
JES {
actor-factory = "cromwell.backend.impl.jes.JesBackendLifecycleActorFactory"
config {
project = "broad-sabeti-lab"
root = "gs://sabeti-temp-30d/USERNAME/cromwell-test"
project = "gcid-viral-seq"
root = "gs://viral-temp-30d/USERNAME/cromwell-test"
genomics-api-queries-per-100-seconds = 1000

genomics {
@@ -59,7 +59,7 @@ backend {
disks: "local-disk 375 LOCAL, /mnt/tmp 375 LOCAL"
noAddress: false
preemptible: 1
zones: ["us-east1-b", "us-east1-c", "us-east1-d", "us-central1-a", "us-central1-b", "us-central1-c"]
zones: [ "us-central1-a", "us-central1-b", "us-central1-c", "us-east1-b", "us-east1-c", "us-east1-d" ]
}
}
}
26 changes: 13 additions & 13 deletions pipes/config.yaml
@@ -310,16 +310,16 @@ genbank:

# |----------------- Cluster execution parameters----------------------------

# The project name passed to the cluster scheduler (currently unused)
project: "viral_ngs"

# Broad-specific LSF cluster scheduler parameters
LSF_queues:
short: "-W 4:00"
long: "-q forest"
bigmem: "-q flower"

# Broad-specific UGER cluster scheduler parameters
UGER_queues:
short: "-l h_rt=04:00:00"
long: "-l h_rt=36:00:00"
## The project name passed to the cluster scheduler (currently unused)
#project: "viral_ngs"

## Broad-specific LSF cluster scheduler parameters
#LSF_queues:
# short: "-W 4:00"
# long: "-q forest"
# bigmem: "-q flower"

## Broad-specific UGER cluster scheduler parameters
#UGER_queues:
# short: "-l h_rt=04:00:00"
# long: "-l h_rt=36:00:00"
119 changes: 0 additions & 119 deletions pipes/ref_assisted/Snakefile

This file was deleted.

14 changes: 0 additions & 14 deletions pipes/ref_assisted/config.yaml

This file was deleted.

12 changes: 0 additions & 12 deletions pipes/ref_assisted/ref_assisted

This file was deleted.
