README
Hoohm committed Feb 5, 2017
1 parent 54f406c commit efb80a4
Showing 3 changed files with 93 additions and 4 deletions.
83 changes: 83 additions & 0 deletions README.md
@@ -0,0 +1,83 @@
This small pipeline allows you to run the basic steps to align and extract expression from a drop-seq experiment.

Before using it you will need some programs:

1. [Snakemake](https://snakemake.readthedocs.io/en/latest/) (based on python)
2. [R](https://cran.r-project.org/)
3. [STAR aligner](https://github.com/alexdobin/STAR)
4. [Drop-seq tools (1.12)](http://mccarrolllab.com/dropseq/)
5. [Picard tools](https://broadinstitute.github.io/picard/)
6. [jsonlite R package](https://cran.r-project.org/web/packages/jsonlite/index.html)

Before running the pipeline you will also need a reference genome, its GTF annotation (needed to build the refFlat) and the refFlat file itself. Creating them is not explained here, but you can find the details [here](http://mccarrolllab.com/dropseq/).

To run the pipeline you need to write two JSON config files.
The first one, `local.json`, goes in the root folder of the pipeline and contains the paths to your executables as well as to the Drop-seq tools.
```
{
"TMPDIR":"/path/to/temp",
"PICARD":"/path/to/picard/dist/picard.jar",
"DROPSEQ":"/path/to/Drop-seq_tools-1.12",
"STAREXEC":"/path/to/STAR/bin/Linux_x86_64/STAR"
}
```

I had some issues because I did not have enough space on `/`, so I added a temp folder to fix that. Note: if you have the same problem, you have to manually edit all the *.sh files in the Drop-seq tools so that they use this TMPDIR variable.
* TMPDIR is the temp folder on the disk with enough space
* PICARD is the path to the picard.jar
* DROPSEQ is the path to the folder of Drop-Seq tools
* STAREXEC is the path to the STAR executable
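
Before a long run it can be worth checking that these four paths actually exist. The following is only a minimal sketch, not part of the pipeline: the function name `check_local_config` is made up for illustration, and it assumes the file contains exactly the four keys shown above.

```python
import json
import os

# The four keys shown in the example config above.
REQUIRED_KEYS = ("TMPDIR", "PICARD", "DROPSEQ", "STAREXEC")


def check_local_config(path):
    """Return a list of problems found in the executables config (empty if none)."""
    with open(path) as handle:
        config = json.load(handle)
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if value is None:
            problems.append("missing key: " + key)
        elif not os.path.exists(value):
            problems.append("{} points to a missing path: {}".format(key, value))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: run from the root folder of the pipeline.
    for problem in check_local_config("local.json"):
        print(problem)
```

An empty list means every key is present and points to something on disk. It does not check that the tools actually run.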

The other JSON file, `config.json`, should be in the folder containing all your fastq files and should look like this:
```
{
"Samples": {
"Sample1":"N701",
"Sample2":"N702",
"Sample3":"N703"
},
"Primers":{
"N701":"TCGCCTTA",
"N702":"CTAGTACG",
"N703":"TTCTGCCT",
"N704":"GCTCAGGA"
},
"Barcodes":200,
"GENOMEREF": "/path/to/reference.fa",
"REFFLAT": "/path/to/reference.refFlat",
"METAREF": "/path/to/STAR_REF"
}
```

Samples lists the names of your samples, each paired with the key of its primer in Primers. In this example the files in the folder should be named:
* Sample1_R1.fastq.gz
* Sample1_R2.fastq.gz
* Sample2_R1.fastq.gz
* Sample2_R2.fastq.gz
* Sample3_R1.fastq.gz
* Sample3_R2.fastq.gz

* Primers are the common primers used in [the Nextera kit](http://seq.liai.org/204-2/).
* Barcodes should be double the number of cells you expect from your experiment.
* GENOMEREF is the reference fasta of your genome.
* REFFLAT is the reference refFlat file needed by the pipeline. You can check how to create it in the [Drop-seq alignment cookbook](http://mccarrolllab.com/dropseq/).
* METAREF is the folder of the STAR index.
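
The naming rules above can be checked before starting a run. This is a rough sketch under a few assumptions: the helper names `check_sample_config` and `barcodes_for` are invented here, the config file is called `config.json`, and every sample is expected to have both an `_R1` and an `_R2` file as in the example.

```python
import json
import os


def check_sample_config(folder):
    """Return a list of problems found in <folder>/config.json (empty if none)."""
    with open(os.path.join(folder, "config.json")) as handle:
        config = json.load(handle)
    problems = []
    primers = config.get("Primers", {})
    for sample, primer in config.get("Samples", {}).items():
        # Each sample should refer to a primer defined in "Primers".
        if primer not in primers:
            problems.append("{} uses unknown primer {}".format(sample, primer))
        # Each sample should come with both read files.
        for read in ("R1", "R2"):
            fastq = "{}_{}.fastq.gz".format(sample, read)
            if not os.path.isfile(os.path.join(folder, fastq)):
                problems.append("missing file: " + fastq)
    return problems


def barcodes_for(expected_cells):
    """Barcodes should be double the number of cells you expect."""
    return 2 * expected_cells
```

For instance, if you expect 100 cells, `barcodes_for(100)` gives the `200` used in the example config.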

Once everything is in place, you can run the pipeline using the following command:

`python3 dropseq.py /path/to/your/samples/`


This will create the necessary folders in the sample folder and run the pipeline up to the knee plot.
Note: I run the script in three parts because of the way STAR handles the loading of the reference genome.
The main idea is to load the reference once, process all the samples and then unload the reference.
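
The staged run can be pictured roughly as follows. This is a sketch, not the actual content of `dropseq.py`: the helpers `build_commands` and `run_stages` are hypothetical, and stopping at the first failing stage is an assumption. Only the two snakefiles visible in this commit are listed, so the third stage is left out.

```python
import subprocess


def build_commands(wrkdir, snakefiles, cores=6, configfile="local.json"):
    """Build one snakemake command line per stage, in order."""
    template = "snakemake -s {} --cores {} -pT -d {} --configfile {}"
    return [template.format(s, cores, wrkdir, configfile) for s in snakefiles]


def run_stages(commands, runner=subprocess.call):
    """Run the commands in order and stop at the first non-zero exit code.

    Returns the number of stages that completed successfully.
    """
    done = 0
    for command in commands:
        if runner(command, shell=True) != 0:
            break
        done += 1
    return done


if __name__ == "__main__":
    stages = [
        "snakefiles/Dropseq_pre_align.snake",
        "snakefiles/Star_align.snake",
        # The third stage is not shown in this commit, so it is omitted here.
    ]
    run_stages(build_commands("/path/to/your/samples/", stages))
```

Keeping the alignment of all samples inside one stage is what lets the reference be loaded once and shared between them.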

Future implementations:
* Automated Species plot or Barnyard plot
* Cluster version (One of the reasons it's based on snakemake)


I hope it can help you out in your drop-seq experiment!

Feel free to comment and point out potential improvements.
11 changes: 8 additions & 3 deletions Rscripts/knee_plot.R
@@ -1,10 +1,15 @@
library(jsonlite)
args = commandArgs(TRUE)

# You can tweak here the fraction of reads you expect
# when a cell is captured. I've found that this value often
# corresponds to a good minimum to validate a STAMP
fraction = 0.001
numCells = 100

path= args[1]
#You can tweak the multiplier of barcodes you need here
# to change the value on the X axis
xlim = 2.5 * fromJSON(paste0(path,'/config.json'))$Barcodes
path = args[1]

samples = names(fromJSON(paste0(path,'/config.json'))$Samples)

@@ -23,7 +28,7 @@ plotCumulativePlot = function(file_path, title, fraction, x_scale){

for(i in 1:length(samples)){
pdf(file = paste0(path,"plots/",samples[i],"_knee_plot.pdf"), width = 5, height = 5)
temp = plotCumulativePlot(file_path = paste0(path,samples[i],"_hist_out_cell.txt"), title = paste0(samples[i],"\nMinCellFraction = ", fraction), fraction = fraction, x_scale = 500)
temp = plotCumulativePlot(file_path = paste0(path,samples[i],"_hist_out_cell.txt"), title = paste0(samples[i],"\nMinCellFraction = ", fraction), fraction = fraction, x_scale = xlim)
dev.off()
write.table(temp, file = paste0(path,"summary/",samples[i],"_barcodes.csv"),col.names = F, quote = F, row.names = F)
}
3 changes: 2 additions & 1 deletion dropseq.py
@@ -5,7 +5,7 @@
try:
sys.argv[1]
except:
print('You need to provid a folder on which to run on. Exiting')
print('You need to provide a folder on which to run on. Exiting')
exit()
wrkdir = sys.argv[1]
configfile = os.path.join(wrkdir, 'config.json')
@@ -18,6 +18,7 @@
joined = os.path.join(wrkdir, folder)
if(not os.path.isdir(joined)):
os.mkdir(joined)

#Optimized Run for dropseq
first = 'snakemake -s snakefiles/Dropseq_pre_align.snake --cores 6 -pT -d {} --configfile local.json'.format(sys.argv[1])
second = 'snakemake -s snakefiles/Star_align.snake --cores 6 -pT -d {} --configfile local.json'.format(sys.argv[1])