Chore: Updating docs for WES pipeline options. (#18)

OpenOmics · Feb 2, 2024 · 2f94263
1 parent 6988c86
Showing 3 changed files with 28 additions and 9 deletions.
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
   <h1>genome-seek 🔬</h1>
 
-  **_Whole Genome Clinical Sequencing Pipeline._**
+  **_Whole Genome and Exome Clinical Sequencing Pipeline._**
 
   [![tests](https://github.com/OpenOmics/genome-seek/workflows/tests/badge.svg)](https://github.com/OpenOmics/genome-seek/actions/workflows/main.yaml) [![docs](https://github.com/OpenOmics/genome-seek/workflows/docs/badge.svg)](https://github.com/OpenOmics/genome-seek/actions/workflows/docs.yml) [![GitHub issues](https://img.shields.io/github/issues/OpenOmics/genome-seek?color=brightgreen)](https://github.com/OpenOmics/genome-seek/issues)  [![GitHub license](https://img.shields.io/github/license/OpenOmics/genome-seek)](https://github.com/OpenOmics/genome-seek/blob/main/LICENSE) 
 
@@ -20,7 +20,7 @@ The **`./genome-seek`** pipeline is composed of several interrelated sub-command
  * [<code>genome-seek <b>unlock</b></code>](https://openomics.github.io/genome-seek/usage/unlock/): Unlocks a previous runs output directory.
  * [<code>genome-seek <b>cache</b></code>](https://openomics.github.io/genome-seek/usage/cache/): Cache software containers locally.
 
-**genome-seek** is a comprehensive clinical WGS pipeline that is focused on speed. Each tool in the pipeline was benchmarked and selected due to its low run times without sacrificing accuracy or precision. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.
+**genome-seek** is a comprehensive clinical WGS and WES pipeline that is focused on speed. Each tool in the pipeline was benchmarked and selected due to its low run times without sacrificing accuracy or precision. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.
 
 The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ files and can be run locally on a compute instance or on-premise using a cluster (recommended). A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM (more coming soon!). A hybrid approach ensures the pipeline is accessible to all users.
 
@@ -53,4 +53,4 @@ This site is a living document, created for and by members like you. genome-seek
 
 ## References
 <sup>**1.**  Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.</sup>  
-<sup>**2.**  Koster, J. and S. Rahmann (2018). "Snakemake-a scalable bioinformatics workflow engine." Bioinformatics 34(20): 3600.</sup>  
+<sup>**2.**  Koster, J. and S. Rahmann (2018). "Snakemake-a scalable bioinformatics workflow engine." Bioinformatics 34(20): 3600.</sup>  
diff --git a/docs/index.md b/docs/index.md
@@ -2,7 +2,7 @@
 
   <h1 style="font-size: 250%">genome-seek 🔬</h1>
 
-  <b><i>Whole Genome Clinical Sequencing Pipeline</i></b><br> 
+  <b><i>Whole Genome and Exome Clinical Sequencing Pipeline</i></b><br> 
   <a href="https://github.com/OpenOmics/genome-seek/actions/workflows/main.yaml">
     <img alt="tests" src="https://github.com/OpenOmics/genome-seek/workflows/tests/badge.svg">
   </a>
@@ -24,7 +24,7 @@
 
 
 ## Overview
-Welcome to genome-seek's documentation! This guide is the main source of documentation for users who are getting started with the OpenOmics [whole genome sequencing pipeline](https://github.com/OpenOmics/genome-seek/). 
+Welcome to genome-seek's documentation! This guide is the main source of documentation for users who are getting started with the OpenOmics [genome-seek pipeline](https://github.com/OpenOmics/genome-seek/). 
 
 The **`./genome-seek`** pipeline is composed of several interrelated sub-commands to set up and run the pipeline across different systems. Each of the available sub-commands performs different functions: 
 
@@ -58,7 +58,7 @@ The **`./genome-seek`** pipeline is composed of several interrelated sub-command
 
 </section>
 
-**genome-seek** is a comprehensive clinical WGS pipeline that is focused on speed. Each tool in the pipeline was benchmarked and selected due to its low run times without sacrificing accuracy or precision. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.
+**genome-seek** is a comprehensive clinical WGS and WES pipeline that is focused on speed. Each tool in the pipeline was benchmarked and selected due to its low run times without sacrificing accuracy or precision. It relies on technologies like [Singularity<sup>1</sup>](https://singularity.lbl.gov/) to maintain the highest level of reproducibility. The pipeline consists of a series of data processing and quality-control steps orchestrated by [Snakemake<sup>2</sup>](https://snakemake.readthedocs.io/en/stable/), a flexible and scalable workflow management system, to submit jobs to a cluster.
 
 The pipeline is compatible with data generated from Illumina short-read sequencing technologies. As input, it accepts a set of FastQ files and can be run locally on a compute instance or on-premise using a cluster (recommended). A user can define the method or mode of execution. The pipeline can submit jobs to a cluster using a job scheduler like SLURM (more coming soon!). A hybrid approach ensures the pipeline is accessible to all users.
 

diff --git a/docs/usage/run.md b/docs/usage/run.md
@@ -11,9 +11,10 @@ Setting up the genome-seek pipeline is fast and easy! In its most basic form, <c
 ```text
 $ genome-seek run [--help] \
       [--mode {slurm,local}] [--job-name JOB_NAME] [--batch-id BATCH_ID] \
-      [--call-cnv] [--call-sv] [--call-hla] [--skip-qc] [--open-cravat] \
-      [--oc-annotators OC_ANNOTATORS] [--oc-modules OC_MODULES] \
-      [--tmp-dir TMP_DIR] [--silent] [--sif-cache SIF_CACHE] \ 
+      [--call-cnv] [--call-sv] [--call-hla] [--call-somatic] [--open-cravat] \
+      [--skip-qc] [--oc-annotators OC_ANNOTATORS] [--oc-modules OC_MODULES] \
+      [--pairs PAIRS] [--pon PANEL_OF_NORMALS] [--wes-mode] [--wes-bed WES_BED] \
+      [--tmp-dir TMP_DIR] [--silent] [--sif-cache SIF_CACHE] \
       [--singularity-cache SINGULARITY_CACHE] \
       [--dry-run] [--threads THREADS] \
       --input INPUT [INPUT ...] \
@@ -96,6 +97,23 @@ Each of the following arguments are optional, and do not need to be provided.
 >
 > ***Example:*** `--skip-qc`
 
+---  
+  `--wes-mode`            
+> **Runs the whole exome pipeline.**  
+> *type: boolean flag*
+> 
+> By default, the whole genome sequencing (WGS) pipeline is run. This option allows a user to process and analyze whole exome sequencing (WES) data. Please note that when this mode is enabled, only a subset of the WGS rules will run. See the `--wes-bed` option below for more information about providing a custom exome targets/capture-kit BED file.
+>
+> ***Example:*** `--wes-mode`
+
+---  
+  `--wes-bed WES_BED`            
+> **Path to exome targets/capture-kit BED file.**  
+> *type: BED file*
+>
+> This file can be obtained from the manufacturer of the target capture kit that was used. By default, a set of BED files generated from GENCODE's exon annotation of protein-coding genes is used. Please note: This BED file should contain at least 6 columns.
+>
+> ***Example:*** `--wes-bed Agilent_SS_AllExons_V7_Regions.bed`
 
 ---  
   `--batch-id BATCH_ID`            
@@ -107,6 +125,7 @@ Each of the following arguments are optional, and do not need to be provided.
 >
 > ***Example:*** `--batch-id WGS_2022-04-19`
 
+
 ### 2.3 Anotation options
 
 Each of the following arguments are optional, and do not need to be provided.