v0.21

Hoohm · Mar 16, 2017 · af18656
1 parent 5c1083d

Showing 6 changed files with 74 additions and 30 deletions.
diff --git a/CHANGELOG b/CHANGELOG
@@ -0,0 +1,29 @@
+# Change Log
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/)
+and this project adheres to [Semantic Versioning](http://semver.org/).
+
+
+## [0.21]
+### Added
+- Changelog file to track changes
+- --rerun option to force a rerun
+- Multiple steps can now be run in one call
+
+## [0.2] - 2017-03-14
+### Changed
+- The pipeline is now a python package being called as an executable
+- Went from json to yaml for config files
+
+### Added
+- setup.py and dependencies
+- Species plot available
+
+### Removed
+- Primer handling; the pipeline now uses a default primer: AAGCAGTGGTATCAACGCAGAGTAC
+
+
+## [0.1] - 2017-02-13
+### First release
+- Allows for preprocessing, alignment with STAR, and post-alignment processing up to the knee plot
diff --git a/README.md b/README.md
@@ -1,25 +1,15 @@
-NEWS
+NEWS -- Version 0.21
 --------------
 A.
-As it was getting a bit complicated to run everything in only one "mode", I adapted the pipeline. Now there are 4 different steps.
-All the steps involve some feedback from you before continuing. the knee-plot allows you to decide where you want the inflection point and the species plot helps deciding which STAMPs are mixed and which are pure.
-
-1. pre-process: Go from sample_R1.fastq.gz to the final.bam file containing the aligned sorted data.
-2. knee-plot: Make the knee plot. Needs step 1.
-3. species-plot: Make the species plot. Needs step 1 and 2. Won't run if you have only one species. (specific for mixing experiments)
-4. extract-expression: Extract the expression data. Needs step 1 and 2 (3 for mixing experiment)
+Rerun option. You can now force a rerun of a step. This means you no longer have to delete files to rerun a particular step; just add `--rerun`.
 
 B.
-I also added a setup.py in order to make the install a bit easier. You can now install the pipeline using the classic: `python3 setup.py install`
-
-C.
-I switched to yaml for config files. It will probably be easier to read in the long run.
+Possibility to run multiple steps at once with the --mode option: `-m pre-process knee-plot extract-expression`
 
-D.
-Primers are no longer asked for. For the time being it is hard coded in the post_align.snake file.
+Description
+------------------
+This pipeline is based on [snakemake](https://snakemake.readthedocs.io/en/stable/) and the dropseq tools provided by the [McCarroll Lab](http://mccarrolllab.com/dropseq/). It takes the raw data from your dropseq experiment through to the UMI counts.
 
-E.
-Dependencies added. Snakemake and pyyaml
 
 Installation
 --------
@@ -33,7 +23,12 @@ Before using it you will need to install some softwares/packages:
 4. [Picard tools](https://broadinstitute.github.io/picard/)
 5. [yaml R package](https://cran.r-project.org/web/packages/yaml/index.html)
 
-Once you have everything just run: `python3 setup.py install`
+Once you have everything just run:
+```
+git clone https://github.com/Hoohm/Drop-seq
+cd Drop-seq
+python3 setup.py install
+```
 
 Summary
 -------
@@ -107,6 +102,17 @@ Once everything is in place, you can run the pipeline using the following comman
 
 `python3 dropSeqPip -f /path/to/your/samples/ -c /path/to/local/config/file.yaml -m mode`
 
+You can choose from four different modes to run:
+
+1. pre-process: Go from sample_R1.fastq.gz to the final.bam file containing the aligned, sorted data.
+2. knee-plot: Make the knee plot. Needs step 1.
+3. species-plot: Make the species plot. Needs steps 1 and 2. Won't run if you have only one species (specific to mixing experiments).
+4. extract-expression: Extract the expression data. Needs steps 1 and 2 (and step 3 for mixing experiments).
+
+If you don't need to change values in the config files between steps, you can also run multiple modes in a single call, e.g.:
+
+`python3 dropSeqPip -f /path/to/your/samples/ -c /path/to/local/config/file.yaml -m pre-process knee-plot extract-expression`
+
 This is the folder structure you get in the end:
 ```
 /path/to/your/samples/

diff --git a/dropSeqPip/Snakefiles/extract_expression.snake b/dropSeqPip/Snakefiles/extract_expression.snake
@@ -26,4 +26,4 @@ rule gunzip:
 	input: 'summary/{sample}_{species}_expression_matrix.txt.gz'
 	output: 'summary/{sample}_{species}_expression_matrix.txt'
 	shell:
-		"""gunzip {input}"""
+		"""gunzip -q {input}"""
diff --git a/dropSeqPip/Snakefiles/extract_expression_single.snake b/dropSeqPip/Snakefiles/extract_expression_single.snake
@@ -24,4 +24,4 @@ rule gunzip:
 	input: 'summary/{sample}_expression_matrix.txt.gz'
 	output: 'summary/{sample}_expression_matrix.txt'
 	shell:
-		"""gunzip {input}"""
+		"""gunzip -q {input}"""
diff --git a/dropSeqPip/__main__.py b/dropSeqPip/__main__.py
@@ -26,7 +26,12 @@ def get_args():
                         help='Which mode to run.',
                         choices=['pre-process', 'knee-plot', 'species-plot', 'extract-expression'],
                         action='store',
+                        nargs='+',
                         required=True)
+    parser.add_argument('--rerun',
+                        action='store_true',
+                        help='forces a rerun of the selected modes',
+                        default=False)
     args = parser.parse_args()
     return args
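
A minimal sketch of how the two parser changes above behave, assuming only the `-m`/`--mode` and `--rerun` arguments (the `-f` and `-c` arguments are left out, and `build_parser` is a hypothetical helper name, not part of the package). With `nargs='+'`, argparse collects the modes into a list and still checks each value against `choices`, which is why the checks in `main()` change from `==` to `in`.

```python
import argparse

MODES = ['pre-process', 'knee-plot', 'species-plot', 'extract-expression']


def build_parser():
    """Hypothetical helper: a reduced version of the parser in get_args()."""
    parser = argparse.ArgumentParser(prog='dropSeqPip')
    parser.add_argument('-m', '--mode',
                        help='Which mode(s) to run.',
                        choices=MODES,
                        nargs='+',          # one or more modes, stored as a list
                        required=True)
    parser.add_argument('--rerun',
                        action='store_true',  # False unless the flag is given
                        help='forces a rerun of the selected modes')
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(['-m', 'pre-process', 'knee-plot', '--rerun'])
print(args.mode)                   # ['pre-process', 'knee-plot']
print(args.rerun)                  # True
print('knee-plot' in args.mode)    # True
```

An unknown mode such as `-m align` is rejected by argparse with a usage error before any step runs.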
 
@@ -42,46 +47,50 @@ def main():
         joined = os.path.join(args.folder_path, folder)
         if(not os.path.isdir(joined)):
             os.mkdir(joined)
+    if(args.rerun):
+        rerun = '--forceall'
+    else:
+        rerun = ''
     #Load config files
     with open(args.config_file_path) as config_yaml:
         yaml_data = yaml.load(config_yaml)
     with open(os.path.join(args.folder_path, 'config.yaml')) as samples_config:
         samples_yaml = yaml.load(samples_config)
     #Select step and run
     step_list = []
-    if(args.mode == "pre-process"):
+    if("pre-process" in args.mode):
         print("Mode is {}.".format(args.mode))
-        pre_align = ('snakemake -s {}/Snakefiles/pre_align.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path),'Pre-processing before alignement')
-        star_align = ('snakemake -s {}/Snakefiles/star_align.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path),'Star alignement')
-        post_align = ('snakemake -s {}/Snakefiles/post_align.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path),'Post alignement')
+        pre_align = ('snakemake -s {}/Snakefiles/pre_align.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path,rerun),'Pre-processing before alignement')
+        star_align = ('snakemake -s {}/Snakefiles/star_align.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun),'Star alignement')
+        post_align = ('snakemake -s {}/Snakefiles/post_align.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun),'Post alignement')
 
         print('Running pre-processing')
         subprocess.call(pre_align, shell=True)
         print('Running Alignement')
         subprocess.call(star_align, shell=True)
         print('Running post-alignement')
         subprocess.call(post_align, shell=True)
-    if(args.mode == "knee-plot"):
+    if("knee-plot" in args.mode):
         knee_plot = 'Rscript {}/Rscripts/knee_plot.R {}'.format(package_dir, args.folder_path)
         print('Plotting knee plots')
         subprocess.call(knee_plot, shell=True)
-    if(args.mode == "species-plot"):
+    if("species-plot" in args.mode):
         if(len(samples_yaml['SPECIES']) == 2):
-            extract_species = ('snakemake -s {}/Snakefiles/extract_species.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path), 'Extracting species')
+            extract_species = ('snakemake -s {}/Snakefiles/extract_species.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun), 'Extracting species')
             print('Extracting species')
             subprocess.call(extract_species, shell=True)
             species_plot = 'Rscript {}/Rscripts/species_plot.R {}'.format(package_dir, args.folder_path)
             print('Plotting species plots')
             subprocess.call(species_plot, shell=True)
         else:
             print('You cannot run this with a number of species different than 2.\nPlease change the config file')
-    if(args.mode == "extract-expression"):
+    if("extract-expression" in args.mode):
         if(len(samples_yaml['SPECIES']) == 2):
-            extract_expression = ('snakemake -s {}/Snakefiles/extract_expression.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path), 'Extracting species')  
+            extract_expression = ('snakemake -s {}/Snakefiles/extract_expression.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun), 'Extracting species')  
             print('Extracting expression')
             subprocess.call(extract_expression, shell=True)
         if(len(samples_yaml['SPECIES']) == 1):
-            extract_expression_single = ('snakemake -s {}/Snakefiles/extract_expression_single.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path), 'Extracting species')  
+            extract_expression_single = ('snakemake -s {}/Snakefiles/extract_expression_single.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun), 'Extracting species')  
             print('Extracting expression')
             subprocess.call(extract_expression_single, shell=True)
     print('Pipeline finished')
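
The five snakemake command strings above differ only in the Snakefile name, and when `--rerun` is not given they end with a trailing space from the empty `rerun` string. One possible way to factor this out is sketched below; it is not the project's implementation. `snakemake_command` is a hypothetical helper and the paths in the usage lines are placeholders. It builds an argument list, so the result could be passed to `subprocess.call` without `shell=True`.

```python
import shlex


def snakemake_command(snakefile, cores, workdir, configfile, rerun=False):
    """Build the snakemake call as an argument list (hypothetical helper)."""
    cmd = [
        'snakemake',
        '-s', snakefile,
        '--cores', str(cores),
        '-pT',
        '-d', workdir,
        '--configfile', configfile,
    ]
    if rerun:
        # Same flag the commit appends when --rerun is set.
        cmd.append('--forceall')
    return cmd


# Placeholder paths, for illustration only.
cmd = snakemake_command('Snakefiles/pre_align.snake', 4,
                        '/path/to/your/samples/',
                        '/path/to/local/config/file.yaml',
                        rerun=True)
# Show the equivalent shell command line without running it.
print(' '.join(shlex.quote(part) for part in cmd))
```

Note that the `if` blocks in `main()` run in a fixed order (pre-process, knee-plot, species-plot, extract-expression), regardless of the order in which the modes are listed after `-m`.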

diff --git a/setup.py b/setup.py
@@ -3,7 +3,7 @@
 from setuptools import find_packages
 
 setup(name='dropSeqPip',
-      version='0.2',
+      version='0.21',
       description='A drop-seq pipeline',
       url='http://github.com/hoohm/Drop-seq',
       author='Roelli Patrick',