From 316292af9257cb043b377553bc2436530136eb76 Mon Sep 17 00:00:00 2001
From: Gavin Ha
Date: Tue, 7 Aug 2018 13:05:44 -0400
Subject: [PATCH] update snakemake instructions

---
 README.md | 45 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 86a1882..31a00ac 100644
--- a/README.md
+++ b/README.md
@@ -51,27 +51,46 @@ pairings:
 
 ## snakefiles
 
-1. `moleculeCoverage.snakefile`
-2. `getPhasedAlleleCounts.snakefile`
-3. `TitanCNA.snakefile`
+1. [moleculeCoverage.snakefile](moleculeCoverage.snakefile)
+2. [getPhasedAlleleCounts.snakefile](getPhasedAlleleCounts.snakefile)
+3. [TitanCNA.snakefile](TitanCNA.snakefile)
 
-Invoking the full snakemake workflow for TITAN
+# Run the analysis
+
+## 1. Invoking the full snakemake workflow for TITAN
+This will also run both [moleculeCoverage.snakefile](moleculeCoverage.snakefile) and [getPhasedAlleleCounts.snakefile](getPhasedAlleleCounts.snakefile), which generate the necessary inputs for [TitanCNA.snakefile](TitanCNA.snakefile).
 ```
 # show commands and workflow
 snakemake -s TitanCNA.snakefile -np
 # run the workflow locally using 5 cores
 snakemake -s TitanCNA.snakefile --cores 5
-# run the workflow on qsub using a maximum of 50 jobs. Broad UGER cluster parameters can be set directly in config/cluster.sh.
-snakemake -s TitanCNA.snakefile --cluster-sync "qsub -l h_vmem={params.mem},h_rt={params.runtime}" -j 50 --jobscript config/cluster.sh
 ```
-This will also run both `moleculeCoverage.snakefile` and `getPhasedAlleleCounts.snakefile` which generate the necessary inputs for `TitanCNA.snakfile`.
-
-`moleculeCoverage.snakefile` and `getPhasedAlleleCounts.snakefile` can also be invoked separately. If only one but not both results are needed, then you can invoke the snakefiles independently.
+Users can launch the jobs on a cluster.
+An implementation that works with Broad UGER (qsub) is provided.
+Parameters for memory, runtime, and parallel environment can be specified directly in the snakefiles; default values for each rule have already been set in `params` within [config.yaml](config/config.yaml), so the command below can be used as-is. Other cluster parameters can be set directly in [cluster.sh](config/cluster.sh).
+Note: users will need to adjust these to match their cluster-specific settings.
+```
+snakemake -s TitanCNA.snakefile --cluster-sync "qsub -l h_vmem={params.mem},h_rt={params.runtime} {params.pe}" -j 50 --jobscript config/cluster.sh
 ```
-snakemake -s moleculeCoverage.snakefile --cores 5
-# OR
-snakemake -s getPhasedAlleleCounts.snakefile --cores 5
-```
+
+
+## 2. Invoking individual steps in the workflow
+Users can run the snakefiles individually. This can be helpful for testing each step or if you only wish to generate results for a particular step. The snakefiles must be run in the order listed, since each step uses input files generated by the previous steps.
+  ### a. [moleculeCoverage.snakefile](moleculeCoverage.snakefile)
+  This part of the workflow
+  ```
+  snakemake -s moleculeCoverage.snakefile -np
+  snakemake -s moleculeCoverage.snakefile --cores 5
+  # OR
+  snakemake -s moleculeCoverage.snakefile --cluster-sync "qsub -l h_vmem={params.mem},h_rt={params.runtime} {params.pe}" -j 50 --jobscript config/cluster.sh
+  ```
+
+  ### b. [getPhasedAlleleCounts.snakefile](getPhasedAlleleCounts.snakefile)
+  ```
+  snakemake -s getPhasedAlleleCounts.snakefile -np
+  snakemake -s getPhasedAlleleCounts.snakefile --cores 5
+  # OR
+  snakemake -s getPhasedAlleleCounts.snakefile --cluster-sync "qsub -l h_vmem={params.mem},h_rt={params.runtime} {params.pe}" -j 50 --jobscript config/cluster.sh
+  ```
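Because the individual snakefiles have to be run in order, a small shell helper can chain them locally and stop at the first failing step. This is a minimal sketch, assuming all three snakefiles sit in the current working directory; the function name `run_steps` and the `SNAKEMAKE` override variable are hypothetical and not part of the repository. Any options passed to the function (for example `-np` or `--cores 5`) are forwarded unchanged to every `snakemake -s ...` call.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run the three snakefiles
# in the order listed, forwarding any snakemake options to every step.
# SNAKEMAKE may point at a different executable; it defaults to "snakemake".
run_steps() {
  local snakemake_cmd="${SNAKEMAKE:-snakemake}"
  local sf
  for sf in moleculeCoverage.snakefile \
            getPhasedAlleleCounts.snakefile \
            TitanCNA.snakefile; do
    echo "==> ${sf}" >&2
    # Stop at the first failing step: later steps read the files it produces.
    if ! "${snakemake_cmd}" -s "${sf}" "$@"; then
      echo "step failed: ${sf}" >&2
      return 1
    fi
  done
}
```

For example, after sourcing the snippet, `run_steps -np` previews every step and `run_steps --cores 5` runs them locally. Since `TitanCNA.snakefile` already triggers the first two steps on its own, this helper is mainly useful for checking each step in isolation; snakemake normally skips outputs that are already up to date, so later steps should not redo finished work.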