illumina sequencing data analysis pipelines

NYU-Molecular-Pathology/sns

Seq-N-Slide: illumina sequencing data analysis pipelines

Usage overview

Navigate to a new, empty project directory. This is where all the results will end up.

cd <project dir>

Download the code from GitHub. This creates an sns subdirectory containing all the code.

git clone --depth 1 https://github.com/igordot/sns

Scan a directory that contains FASTQ files to be used as input. This can be run multiple times if there are FASTQs in different directories.

sns/gather-fastqs <fastq dir>

All discovered files are added to the samples.fastq-raw.csv file, which can be edited to change sample names, remove samples, or add samples manually. The first column is the sample name, the second column is the R1 FASTQ, and the third column is the R2 FASTQ (if available). Each line contains a single FASTQ (or FASTQ pair for paired-end experiments), so a sample with multiple FASTQs occupies multiple lines. FASTQs that share a sample name will be merged.
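A minimal shell sketch of how lines in this file group into samples. It assumes plain comma-separated values with no quoting and no header row (lines starting with # are skipped in case a header is present). `fastqs_per_sample` is a hypothetical helper, not part of SNS, and the sample and FASTQ names in the demo are made up; the demo writes to a temporary file so an existing samples.fastq-raw.csv is left untouched.

```shell
# Hypothetical helper: count sample-sheet lines per sample name.
# Expected layout per line: sample name, R1 FASTQ, R2 FASTQ (R2 empty for single-end).
# A count above 1 means that sample's FASTQs will be merged.
# Assumes no quoting in the CSV; skips blank lines and lines starting with '#'.
fastqs_per_sample() {
  awk -F, '!/^#/ && NF > 0 { n[$1]++ } END { for (s in n) print s, n[s] }' "$1" | sort
}

# Demo on a throwaway file (sample and file names are invented).
demo=$(mktemp)
cat > "$demo" <<'EOF'
sampleA,fastq/sampleA_L001_R1.fastq.gz,fastq/sampleA_L001_R2.fastq.gz
sampleA,fastq/sampleA_L002_R1.fastq.gz,fastq/sampleA_L002_R2.fastq.gz
sampleB,fastq/sampleB_L001_R1.fastq.gz,fastq/sampleB_L001_R2.fastq.gz
EOF
fastqs_per_sample "$demo"
# prints:
# sampleA 2
# sampleB 1
rm -f "$demo"
```

To inspect a real sheet, run `fastqs_per_sample samples.fastq-raw.csv` from the project directory after `sns/gather-fastqs`.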

Specify a reference genome (only hg19/mm10/dm3/dm6 are currently guaranteed to work).

sns/generate-settings <genome>

Run the analysis using a specific route (a set of analysis steps).

sns/run <route>

Check for potential problems.

grep "ERROR:" logs-qsub/*

There should be no matches. Any match indicates a problem; check the log files where the errors appear for more details.
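The check above can be wrapped in a small shell function that also names the affected files. This is a sketch, not an SNS script: `check_logs` is a hypothetical name, and it relies only on standard `grep` behavior (`-l` prints just the names of matching files; exit status 0 means at least one match, 1 means no match, 2 means an error such as a missing or empty directory).

```shell
# Hypothetical helper: report log files that contain "ERROR:" lines.
# log_dir defaults to logs-qsub, the directory searched by the command above.
check_logs() {
  log_dir="${1:-logs-qsub}"
  grep -l "ERROR:" "$log_dir"/*   # prints the name of each matching file
  grep_rc=$?
  case $grep_rc in
    0) echo "Errors found in the files listed above." >&2; return 1 ;;
    1) echo "No errors found in $log_dir."; return 0 ;;
    *) echo "Could not read logs in $log_dir." >&2; return 2 ;;
  esac
}
```

Run it from the project directory as `check_logs` (or `check_logs <log dir>` for another logs-* directory); a return status of 1 means at least one log contains an error.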

Routes

Routes are different analysis workflows. Generic routes are sample-centric (the same analysis is performed for each sample). Available routes:

  • rna-star: RNA-seq using STAR. Generates BAMs, normalized bigWigs, counts matrix, and various QC metrics.
  • rna-rsem: RNA-seq using RSEM. Generates FPKM/TPM/counts matrix and various QC metrics.
  • rna-snv: RNA-seq variant detection. Generates BAMs, VCFs, and various QC metrics.
  • wgbs: WGBS methylation analysis.
  • rrbs: RRBS methylation analysis.
  • wes: Whole genome/exome/targeted variant detection. Generates BAMs, VCFs, and various QC metrics.
  • atac: ATAC-seq. Generates BAMs, bigWigs, peaks, nucleosome positions, and various QC metrics.
  • species: Species/metagenomics/contamination analysis.
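When scripting runs across several projects, it can help to validate a route name before starting. The helpers below (`is_known_route`, `run_route`) are hypothetical and not part of SNS; the list is copied from the generic routes above, would need updating if routes change, and deliberately omits the comparison routes.

```shell
# Hypothetical helpers (not part of SNS): validate a generic route name
# before passing it to sns/run. List copied from the generic routes above.
KNOWN_ROUTES="rna-star rna-rsem rna-snv wgbs rrbs wes atac species"

# Succeeds if $1 exactly matches one of the known route names.
is_known_route() {
  for r in $KNOWN_ROUTES; do
    [ "$r" = "$1" ] && return 0
  done
  return 1
}

# Runs sns/run only for a recognized route; otherwise reports and fails.
run_route() {
  if is_known_route "$1"; then
    sns/run "$1"
  else
    echo "Unknown route: $1 (expected one of: $KNOWN_ROUTES)" >&2
    return 1
  fi
}
```

For example, `run_route rna-star` from the project directory would call `sns/run rna-star`, while a typo such as `run_route rna-sta` stops before anything is submitted.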

There are additional routes for comparing groups of samples after the individual samples have been processed with a generic route. They depend on the output of the generic routes and must be run from the same project directory. Before running them, manually add the appropriate group names or sample pairs to the samples.groups.csv or samples.pairs.csv file (depending on the comparison type). Available comparison routes:

Output

  • Directories for different output types (such as BAMs or bigWigs) containing files for each sample.
  • summary-combined.*.csv: Combined segment summaries table that provides a comprehensive overview of the project.
  • logs-* directories: Most stdout/stderr output will be placed here. The information can be used for tracking progress and troubleshooting.

Each route has a description with more specific details.

About

SNS is designed to work on the NYULMC HPC cluster using the Sun Grid Engine job scheduler. It may require significant modifications to work in other environments.

SNS consists of multiple routes (or workflows). Each route contains multiple segments (or steps).

If there is a problem with any of the results, delete the broken files and re-run SNS; it will regenerate any missing output. Similarly, you can add more samples, and only the new ones will be processed when the route is re-run.
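The re-run behavior amounts to skipping any work whose output already exists. The sketch below illustrates that general pattern only; it is not the SNS implementation, and the `.done` marker files, the output directory argument, and `process_missing` are hypothetical placeholders.

```shell
# Hypothetical sketch of "only generate missing output" (not SNS code).
# Usage: process_missing <output dir> <sample> [<sample> ...]
process_missing() {
  out_dir="$1"; shift
  for sample in "$@"; do
    marker="$out_dir/$sample.done"
    # A non-empty marker file stands in for "this sample's output exists".
    if [ -s "$marker" ]; then
      echo "skip $sample (output exists)"
      continue
    fi
    mkdir -p "$out_dir"
    # Placeholder for the real per-sample work (alignment, QC, ...).
    echo "processed $sample" > "$marker"
    echo "run $sample"
  done
}
```

Calling `process_missing out sampleA sampleB` twice runs each sample once; deleting `out/sampleA.done` and calling it again re-processes only sampleA, mirroring the advice to delete broken files and re-run.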

Most output files and sample sheets are in CSV format for compatibility with macOS Quick Look (spacebar file preview).

Each project directory contains its own copy of the code for reproducibility. If you modify the code, the changes will not affect other projects. If you repeat the analysis with more samples in the future, the same code will be used.

FAQs

Coming soon.
