Single Cell RNAseq pipeline readtransforming using umis, alignment, gene-deduplication by umi-tools and counting

MarinusVL/scRNApipe

scRNApipe

The scRNApipe pipeline was originally designed to preprocess and analyse scRNA-Seq data generated with the CEL-Seq2 protocol on Illumina platforms. Nevertheless, the read transformation provided by the umis package allows most single-cell protocols to be run through this pipeline.

Read transformation combines the reads of each fragment into a single read whose name carries the cell, sample and UMI barcode sequences, plus a unique identifier (UID) created by concatenating those three barcodes. This UID allows UMI-tools to remove PCR duplicates from BAM files that contain multiple cells.

In principle, the raw data are read-transformed and filtered (by umis), aligned, deduplicated per gene (by UMI-tools) and counted.

  • Quality metrics (optional)
  • Preprocessing
  • Aligning
  • Main Analysis
  • Expression Matrix

Main options to tune in this pipeline:

  • The Main Analysis can run counting/deduplication per contig or in default mode, instead of per gene
  • Skip deduplication

scRNApipe in Detail

Quality metrics from FastQC

FastQC generates a detailed report for each sample. At the end, a summarised report is available for reviewing all samples at once.

Preprocessing the reads

  1. umis fastqtransform (read transformation)
  2. cb_filter (filtering out reads with non-matching CELLULAR barcodes (CB); 1 mismatch is allowed)
  3. sb_filter (filtering out reads with non-matching SAMPLE barcodes (SB); 1 mismatch is allowed)
  4. mb_filter (removing reads with ambiguous bases (e.g. N) in the UMI barcodes)
  5. add_uid (add the UID and save as fastq.gz)
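The filtering rules in steps 2 to 4 can be illustrated with a minimal sketch. This is not the umis implementation: the function names, the whitelist argument and the choice to reject a barcode that is within one mismatch of several whitelist entries are all assumptions made for illustration.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, whitelist, max_mismatches=1):
    """Return the single whitelist barcode within max_mismatches, else None.

    Assumption: an observed barcode that is equally close to several
    whitelist entries is treated as non-matching.
    """
    hits = [bc for bc in whitelist
            if len(bc) == len(observed)
            and hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def umi_is_clean(umi):
    """True when the UMI contains only unambiguous bases (no N)."""
    return bool(umi) and set(umi) <= set("ACGT")
```

A read would be kept only if both its cell and sample barcodes match and its UMI is clean, for example `match_barcode(cb, cell_whitelist) is not None and umi_is_clean(umi)`.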

The read name after preprocessing will include CELL_BARCODE:UMI_BARCODE:SAMPLE_BARCODE:UID_[[samplebarcode][cellbarcode][umi]]
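As a rough sketch of how the UID is composed, the snippet below appends a UID field to a read name. It assumes the transformed read name ends with `:CELL_<cb>:UMI_<umi>:SAMPLE_<sb>`; that exact layout, the regular expression and the function name `append_uid` are illustrative assumptions, not taken from the pipeline's source.

```python
import re

# Assumed layout of the barcode fields at the end of a transformed read name.
TAGS = re.compile(
    r":CELL_(?P<cell>[^:]+):UMI_(?P<umi>[^:]+):SAMPLE_(?P<sample>[^:]+)$"
)


def append_uid(read_name):
    """Append :UID_<sample><cell><umi> to a transformed read name."""
    match = TAGS.search(read_name)
    if match is None:
        raise ValueError("read name lacks CELL/UMI/SAMPLE fields: " + read_name)
    # The UID concatenates sample barcode, cell barcode and UMI, in that order.
    uid = match.group("sample") + match.group("cell") + match.group("umi")
    return "{}:UID_{}".format(read_name, uid)
```

Because the UID differs whenever the sample, cell or UMI differs, reads from different cells never share a UID even when their UMIs are identical.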

Alignment using STAR aligner

The preprocessed reads are aligned against the reference genome with the STAR aligner.

Main Analysis

  1. Counting reads using featureCounts
  2. Adding XF:Z: tag to the BAM file containing the GeneID
  3. Deduplication using UMI-Tools
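The idea behind per-gene deduplication can be sketched as follows. This is a deliberate simplification, not UMI-tools' algorithm: it collapses only exact (gene, UID) duplicates and performs no UMI error correction. The dictionary keys `gene` and `uid`, and the decision to drop reads without a gene assignment, are assumptions for illustration.

```python
def dedup_per_gene(reads):
    """Keep the first read seen for every (gene, uid) pair.

    Each read is a dict with a 'gene' key (the GeneID from the XF tag,
    or None when unassigned) and a 'uid' key (sample + cell + UMI).
    """
    seen = set()
    kept = []
    for read in reads:
        if read["gene"] is None:
            continue  # assumption: unassigned reads are not counted
        key = (read["gene"], read["uid"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

Two reads with the same UID that fall in the same gene are treated as PCR duplicates, while the same UID in a different gene is kept as a separate molecule.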

Generation of Expression Matrix

Generate the Expression Matrix based on the GeneID tags
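A minimal sketch of this step, assuming the deduplicated reads have already been reduced to (cell barcode, GeneID) pairs. The function name and the genes-by-cells orientation are illustrative choices, not the pipeline's actual output format.

```python
from collections import Counter


def expression_matrix(pairs):
    """Build a genes x cells count table from (cell, gene) pairs.

    Returns (genes, cells, matrix), where matrix[i][j] is the number of
    deduplicated reads of genes[i] observed in cells[j].
    """
    counts = Counter(pairs)
    cells = sorted({cell for cell, _ in counts})
    genes = sorted({gene for _, gene in counts})
    matrix = [[counts[(cell, gene)] for cell in cells] for gene in genes]
    return genes, cells, matrix
```

Missing (cell, gene) combinations come out as zero because a `Counter` returns 0 for keys it has not seen.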


Installation and Info

If you'd like to work directly from the git repository:

$ git clone https://github.com/MarinusVL ...

Enter the repository and run:

$ python setup.py install

Executing

After installation, run the pipeline with a configuration file:

$ scRNApipe <configuration_file.txt>

Help

For further information about each component of the pipeline, run:

$ scRNApipe --help

Dependencies

scRNApipe depends on umis, umi_tools, numpy, pysam, STAR, featureCounts, fastqc, multiqc and Python 2.7.
