v0.21
Hoohm committed Mar 16, 2017
1 parent 5c1083d commit af18656
Showing 6 changed files with 74 additions and 30 deletions.
29 changes: 29 additions & 0 deletions CHANGELOG
@@ -0,0 +1,29 @@
# Change Log
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/)
and this project adheres to [Semantic Versioning](http://semver.org/).


## [0.21]
### Added
- Changelog file to track changes
- --rerun option to force a rerun
- Multiple steps are now allowed in a single run

## [0.2] - 2017-03-14
### Changed
- The pipeline is now a python package being called as an executable
- Went from json to yaml for config files

### Added
- setup.py and dependencies
- Species plot available

### Removed
- primer handling, went to default: AAGCAGTGGTATCAACGCAGAGTAC


## [0.1] - 2017-02-13
### First release
- Allows for preprocessing, alignment with STAR, and post-alignment processing up to the knee plot
40 changes: 23 additions & 17 deletions README.md
@@ -1,25 +1,15 @@
NEWS
NEWS -- Version 0.21
--------------
A.
As it was getting a bit complicated to run everything in only one "mode", I adapted the pipeline. Now there are 4 different steps.
All the steps involve some feedback from you before continuing. The knee-plot lets you decide where you want the inflection point, and the species plot helps you decide which STAMPs are mixed and which are pure.

1. pre-process: Go from sample_R1.fastq.gz to the final.bam file containing the aligned sorted data.
2. knee-plot: Make the knee plot. Needs step 1.
3. species-plot: Make the species plot. Needs step 1 and 2. Won't run if you have only one species. (specific for mixing experiments)
4. extract-expression: Extract the expression data. Needs step 1 and 2 (3 for mixing experiment)
Rerun option: forces a rerun of a step. You no longer have to delete files to rerun a particular step; just add `--rerun`.

B.
I also added a setup.py in order to make the install a bit easier. You can now install the pipeline using the classic: `python3 setup.py install`

C.
I switched to yaml for config files. It will probably be easier to read in the long run.
Possibility to run multiple steps at once with the --mode option: `-m pre-process knee-plot extract-expression`

D.
Primers are no longer asked for. For the time being it is hard coded in the post_align.snake file.
Description
------------------
This pipeline is based on [snakemake](https://snakemake.readthedocs.io/en/stable/) and the dropseq tools provided by the [McCarroll Lab](http://mccarrolllab.com/dropseq/). It lets you process the raw data from your dropseq experiment all the way to the UMI counts.

E.
Dependencies added. Snakemake and pyyaml

Installation
--------
@@ -33,7 +23,12 @@ Before using it you will need to install some software/packages:
4. [Picard tools](https://broadinstitute.github.io/picard/)
5. [yaml R package](https://cran.r-project.org/web/packages/yaml/index.html)

Once you have everything just run: `python3 setup.py install`
Once you have everything just run:
```
git clone https://github.com/Hoohm/Drop-seq
cd Drop-seq
python3 setup.py install
```

Summary
-------
@@ -107,6 +102,17 @@ Once everything is in place, you can run the pipeline using the following comman

`python3 dropSeqPip -f /path/to/your/samples/ -c /path/to/local/config/file.yaml -m mode`

You can choose from four different modes to run:

1. pre-process: Go from sample_R1.fastq.gz to the final.bam file containing the aligned sorted data.
2. knee-plot: Make the knee plot. Needs step 1.
3. species-plot: Make the species plot. Needs steps 1 and 2. Won't run if you have only one species (specific to mixing experiments).
4. extract-expression: Extract the expression data. Needs steps 1 and 2 (and step 3 for a mixing experiment).

If you don't need to change values in the config files between steps, you can also run multiple modes at once, e.g.:

`python3 dropSeqPip -f /path/to/your/samples/ -c /path/to/local/config/file.yaml -m pre-process knee-plot extract-expression`
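
As a rough illustration of how several modes can be accepted and dispatched, here is a minimal Python sketch. It assumes the option is declared with `nargs='+'` and the same `choices` as in `__main__.py`. The `--mode` long flag, the `MODES` constant, and the helpers `parse_modes` and `ordered_modes` are hypothetical names used only for this example. Sorting the selected modes into pipeline order is a suggestion, not necessarily what the pipeline does.

```python
import argparse

# Pipeline order of the four modes listed above.
MODES = ['pre-process', 'knee-plot', 'species-plot', 'extract-expression']


def parse_modes(argv):
    """Parse a hypothetical subset of the dropSeqPip command line."""
    parser = argparse.ArgumentParser(prog='dropSeqPip')
    # nargs='+' collects one or more values into a list;
    # each value is still validated against `choices`.
    parser.add_argument('-m', '--mode', choices=MODES, nargs='+', required=True)
    parser.add_argument('--rerun', action='store_true',
                        help='forces a rerun of the selected modes')
    return parser.parse_args(argv)


def ordered_modes(selected):
    """Return the selected modes in pipeline order, whatever order was typed."""
    return [mode for mode in MODES if mode in selected]


if __name__ == '__main__':
    args = parse_modes(['-m', 'knee-plot', 'pre-process', '--rerun'])
    for mode in ordered_modes(args.mode):
        print('Would run: {} (rerun={})'.format(mode, args.rerun))
```

Because `args.mode` is a list once `nargs='+'` is used, membership tests such as `"knee-plot" in args.mode` replace the earlier equality comparisons.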

This is the folder structure you get in the end:
```
/path/to/your/samples/
2 changes: 1 addition & 1 deletion dropSeqPip/Snakefiles/extract_expression.snake
@@ -26,4 +26,4 @@ rule gunzip:
input: 'summary/{sample}_{species}_expression_matrix.txt.gz'
output: 'summary/{sample}_{species}_expression_matrix.txt'
shell:
"""gunzip {input}"""
"""gunzip -q {input}"""
2 changes: 1 addition & 1 deletion dropSeqPip/Snakefiles/extract_expression_single.snake
@@ -24,4 +24,4 @@ rule gunzip:
input: 'summary/{sample}_expression_matrix.txt.gz'
output: 'summary/{sample}_expression_matrix.txt'
shell:
"""gunzip {input}"""
"""gunzip -q {input}"""
29 changes: 19 additions & 10 deletions dropSeqPip/__main__.py
@@ -26,7 +26,12 @@ def get_args():
help='Which mode to run.',
choices=['pre-process', 'knee-plot', 'species-plot', 'extract-expression'],
action='store',
nargs='+',
required=True)
parser.add_argument('--rerun',
action='store_true',
help='forces a rerun of the selected modes',
default=False)
args = parser.parse_args()
return args

@@ -42,46 +47,50 @@ def main():
joined = os.path.join(args.folder_path, folder)
if(not os.path.isdir(joined)):
os.mkdir(joined)
if(args.rerun):
rerun = '--forceall'
else:
rerun = ''
#Load config files
with open(args.config_file_path) as config_yaml:
yaml_data = yaml.load(config_yaml)
with open(os.path.join(args.folder_path, 'config.yaml')) as samples_config:
samples_yaml = yaml.load(samples_config)
#Select step and run
step_list = []
if(args.mode == "pre-process"):
if("pre-process" in args.mode):
print("Mode is {}.".format(args.mode))
pre_align = ('snakemake -s {}/Snakefiles/pre_align.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path),'Pre-processing before alignement')
star_align = ('snakemake -s {}/Snakefiles/star_align.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path),'Star alignement')
post_align = ('snakemake -s {}/Snakefiles/post_align.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path),'Post alignement')
pre_align = ('snakemake -s {}/Snakefiles/pre_align.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path,rerun),'Pre-processing before alignement')
star_align = ('snakemake -s {}/Snakefiles/star_align.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun),'Star alignement')
post_align = ('snakemake -s {}/Snakefiles/post_align.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun),'Post alignement')

print('Running pre-processing')
subprocess.call(pre_align, shell=True)
print('Running Alignement')
subprocess.call(star_align, shell=True)
print('Running post-alignement')
subprocess.call(post_align, shell=True)
if(args.mode == "knee-plot"):
if("knee-plot" in args.mode):
knee_plot = 'Rscript {}/Rscripts/knee_plot.R {}'.format(package_dir, args.folder_path)
print('Plotting knee plots')
subprocess.call(knee_plot, shell=True)
if(args.mode == "species-plot"):
if("species-plot" in args.mode):
if(len(samples_yaml['SPECIES']) == 2):
extract_species = ('snakemake -s {}/Snakefiles/extract_species.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path), 'Extracting species')
extract_species = ('snakemake -s {}/Snakefiles/extract_species.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun), 'Extracting species')
print('Extracting species')
subprocess.call(extract_species, shell=True)
species_plot = 'Rscript {}/Rscripts/species_plot.R {}'.format(package_dir, args.folder_path)
print('Plotting species plots')
subprocess.call(species_plot, shell=True)
else:
print('You cannot run this with a number of species different than 2.\nPlease change the config file')
if(args.mode == "extract-expression"):
if("extract-expression" in args.mode):
if(len(samples_yaml['SPECIES']) == 2):
extract_expression = ('snakemake -s {}/Snakefiles/extract_expression.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path), 'Extracting species')
extract_expression = ('snakemake -s {}/Snakefiles/extract_expression.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun), 'Extracting species')
print('Extracting expression')
subprocess.call(extract_expression, shell=True)
if(len(samples_yaml['SPECIES']) == 1):
extract_expression_single = ('snakemake -s {}/Snakefiles/extract_expression_single.snake --cores {} -pT -d {} --configfile {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path), 'Extracting species')
extract_expression_single = ('snakemake -s {}/Snakefiles/extract_expression_single.snake --cores {} -pT -d {} --configfile {} {}'.format(scripts_dir, yaml_data['CORES'], args.folder_path, args.config_file_path, rerun), 'Extracting species')
print('Extracting expression')
subprocess.call(extract_expression_single, shell=True)
print('Pipeline finished')
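
The way `--rerun` is turned into snakemake's `--forceall` flag above can also be sketched with an argument list instead of a formatted string. This is an illustrative alternative, not the code in this commit: the helper name `snakemake_command` and the example paths are hypothetical, while the flags themselves (`-s`, `--cores`, `-pT`, `-d`, `--configfile`, `--forceall`) are the ones used in the diff.

```python
import shlex


def snakemake_command(snakefile, cores, workdir, configfile, rerun=False):
    """Build the snakemake invocation as a list of arguments."""
    cmd = ['snakemake', '-s', snakefile,
           '--cores', str(cores),
           '-pT',
           '-d', workdir,
           '--configfile', configfile]
    if rerun:
        # Only append the flag when a rerun is requested,
        # so no empty trailing argument is ever passed.
        cmd.append('--forceall')
    return cmd


if __name__ == '__main__':
    cmd = snakemake_command('Snakefiles/pre_align.snake', 4,
                            '/path/to/your/samples/', 'config.yaml', rerun=True)
    # Print a shell-quoted form for inspection; nothing is executed here.
    print(' '.join(shlex.quote(part) for part in cmd))
```

A list built this way can be handed to `subprocess.call(cmd)` without `shell=True`, which sidesteps quoting problems when a sample folder path contains spaces.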
2 changes: 1 addition & 1 deletion setup.py
@@ -3,7 +3,7 @@
from setuptools import find_packages

setup(name='dropSeqPip',
version='0.2',
version='0.21',
description='A drop-seq pipeline',
url='http://github.com/hoohm/Drop-seq',
author='Roelli Patrick',