GitHub - aselewa/dropseq_pipeline: A Snakemake pipeline for processing Drop-seq reads into an expression matrix

dropseq_pipeline

This Drop-seq pipeline is designed to process data from fastq files to a digital expression matrix (dge). The pipeline is based on Snakemake and is currently designed to run on the uchicago rcc cluster midway2. Below is a description of how to set up the project folders and to start the analysis. Each experiment differs and the pipeline might need to be adjusted to accommodate such individual differences.

Below are steps that are common to all experiments and outlines of analysis that are commonly used.

Preparing Environment

1. Create a compute environment using conda (This step can be skipped when re-running an analysis)

The environment needs to be created only once. It will be activated when running the dropseq pipeline.

module load Anaconda3

conda env create --file .../dropseq_pipeline/environment.yaml

To update the environment, you can run the following command:

conda env update --file .../dropseq_pipeline/environment.yaml

2. Edit the config_hg38.yaml file

Edit the config_hg38.yaml file so that Snakemake can find the right files

Prepare data:

1. Create a project directory in your directory on midway2

mkdir your_project

2. Create directory fastq in 'your_project' directory and add fastq files

cd your_project

mkdir data
cd data/

mkdir fastq
cd fastq/
#only include the fastq files included in a single run, both read 1 and read2
cp path/to/fastq/*fastq.gz .

cd ../../

Run dropseq pipeline

This command will run the Submit_snakemake.sh and pass the location of your project directory.

.../dropseq_pipeline/snakemake.batch "--config proj_dir=/project2/PI/CNETID/Path/to/your/dir/"

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Scripts		Scripts
.gitignore		.gitignore
Dropseq_pipeline_setup.Rmd		Dropseq_pipeline_setup.Rmd
LICENSE		LICENSE
Preparation_of_annotation_data.md		Preparation_of_annotation_data.md
Readme.md		Readme.md
Rplots.pdf		Rplots.pdf
Snakefile		Snakefile
Submit_snakemake.sh		Submit_snakemake.sh
cluster.json		cluster.json
config_hg38.yaml		config_hg38.yaml
environment.yaml		environment.yaml
prepare_STAR_index.batch		prepare_STAR_index.batch
snakemake.batch		snakemake.batch
update_env.yaml		update_env.yaml

License

aselewa/dropseq_pipeline

Folders and files

Latest commit

History

Repository files navigation

dropseq_pipeline

Preparing Environment

1. Create a compute environment using conda (This step can be skipped when re-running an analysis)

2. Edit the config_hg38.yaml file

Prepare data:

1. Create a project directory in your directory on midway2

2. Create directory fastq in 'your_project' directory and add fastq files

Run dropseq pipeline

About

Resources

License

Stars

Watchers

Forks

Languages