This Drop-seq pipeline processes data from FASTQ files to a digital expression matrix (DGE). The pipeline is based on Snakemake and is currently set up to run on Midway2, the UChicago RCC cluster. Below is a description of how to set up the project folders and start the analysis. Experiments differ, so the pipeline may need to be adjusted for a given dataset.
The steps below are common to all experiments, along with outlines of commonly used analyses.
The conda environment needs to be created only once. It will be activated when running the Drop-seq pipeline.
module load Anaconda3
conda env create --file .../dropseq_pipeline/environment.yaml
To update the environment, you can run the following command:
conda env update --file .../dropseq_pipeline/environment.yaml
Edit the config_hg38.yaml file so that Snakemake can find the right files.
mkdir your_project
cd your_project
mkdir data
cd data/
mkdir fastq
cd fastq/
# Only include the FASTQ files from a single run, both read 1 and read 2
cp path/to/fastq/*fastq.gz .
cd ../../
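The folder setup above can also be done in one step, and it may help to confirm that every read 1 file has a read 2 partner before starting. The following is a minimal sketch, not part of the pipeline: the function name check_pairs is hypothetical, and it assumes Illumina-style file names containing _R1_ and _R2_, which may not match your files.

```shell
#!/usr/bin/env bash
# Placeholder project name taken from the steps above; replace with your own.
proj=your_project

# Create the nested data/fastq directory in one step. The -p flag creates
# parent directories and does not fail if they already exist.
mkdir -p "$proj/data/fastq"

# Hypothetical helper: report any read 1 file that lacks a read 2 partner.
# ASSUMPTION: file names contain _R1_ / _R2_ (Illumina-style naming).
check_pairs() {
    local dir=$1 r1 base mate missing=0
    for r1 in "$dir"/*_R1_*fastq.gz; do
        [ -e "$r1" ] || continue          # glob matched nothing
        base=$(basename "$r1")
        mate="$dir/${base/_R1_/_R2_}"     # swap the first _R1_ for _R2_
        if [ ! -e "$mate" ]; then
            echo "missing read 2 file for $base"
            missing=1
        fi
    done
    return "$missing"
}

# Returns 0 when every read 1 file has a mate (or when no files are present).
check_pairs "$proj/data/fastq"
```

Run the check after copying the FASTQ files; a non-zero exit status means at least one read 1 file has no matching read 2 file.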
This command runs Submit_snakemake.sh and passes it the location of your project directory.
.../dropseq_pipeline/snakemake.batch "--config proj_dir=/project2/PI/CNETID/Path/to/your/dir/"
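Because proj_dir is given as an absolute path, a small helper can build the argument from a relative project folder and catch a missing data/fastq directory before anything is submitted. This is a sketch under assumptions: build_config_arg is a hypothetical name, not part of the pipeline, and the trailing slash simply mirrors the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a --config argument for a project folder.
# Resolves the folder to an absolute path and checks that data/fastq exists.
build_config_arg() {
    local dir
    dir=$(cd "$1" 2>/dev/null && pwd) || {
        echo "project directory not found: $1" >&2
        return 1
    }
    if [ ! -d "$dir/data/fastq" ]; then
        echo "no data/fastq directory under $dir" >&2
        return 1
    fi
    # Trailing slash kept to match the example invocation above.
    printf -- '--config proj_dir=%s/' "$dir"
}

# Example use (path to the pipeline left elided, as in the command above):
#   .../dropseq_pipeline/snakemake.batch "$(build_config_arg your_project)"
```

The helper prints nothing and returns a non-zero status when the folder layout is incomplete, so the submission command receives an empty argument rather than a wrong path; checking its exit status before submitting is the safer pattern.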