Decode-seq

Description

Decode-seq is an easy and effective RNA-seq approach using molecular barcoding to enable profiling of a large number of replicates simultaneously. This approach significantly improves the performance of differential gene expression analysis.

The experimental protocol of Decode-seq can be found in decode-seq_protocol.pdf.

The computational analysis pipeline of Decode-seq consists of three Python scripts: decode_barcode.py, decode_gene.py, and decode_quant.py. Run them in this order on the input FASTQ files. The output is a count-based quantification matrix, ready for downstream analysis with edgeR or DESeq2.

decode_barcode.py: takes Read1 as input and generates a barcode table (read name, USI, and UMI).
decode_gene.py: takes Read2 as input and generates a gene table (read name, transcript).
decode_quant.py: takes the two tables as input and generates the count matrix (transcript × sample).

Dependencies

Python 3
STAR

Usage

decode_barcode.py -i|--input <input> -u|--usi <usifile> [-q|--qscutoff <int>] [-b|--boundary GGG] [-c|--countonly]

decode_gene.py -f|--fastq <input> -d|--outdir <outdir> [-x|--STARIDX <STARIDX>] [-g|--gtf gtf] [-t|--thread threads] [-b|--bam bam]

decode_quant.py -g|--genetable <gene_table> -b|--barcodetable <barcode_table> -u|--usi <usifile>

Arguments

decode_barcode.py

-i,--input Read1 fastq file
-u,--usi A plain text file containing 2 columns: USI name and 6bp sequence.
-q,--qscutoff Minimum sequencing quality score for bases 1-6 of Read1 (the USI position). Default 20
-b,--boundary Boundary sequence of 3-6 bases between the UMI sequence and the cDNA sequence, usually three guanines. Default GGG
-c,--countonly Only print the barcode filter summary to standard output. By default, the barcode sequence, cDNA sequence, and barcode filter summary are all written to standard output.
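
The arguments above imply a Read1 layout of USI, then UMI, then boundary, then cDNA. The following is a minimal sketch of that split, not the code in decode_barcode.py. The function name split_read1, the umi_len parameter (the UMI length is not stated here), the Phred+33 quality encoding, and the rule that every USI base must reach the cutoff are all assumptions made for illustration.

```python
def split_read1(seq, qual, umi_len, usi_len=6, boundary="GGG",
                qs_cutoff=20, phred_offset=33):
    """Split one Read1 record into (usi, umi, cdna).

    Returns None if the read fails a filter. umi_len is a hypothetical
    parameter: the caller must supply the UMI length used in the library.
    """
    umi_end = usi_len + umi_len
    if len(seq) < umi_end + len(boundary) or len(qual) < usi_len:
        return None
    # Assumed rule: every USI base (positions 1-6) must reach the cutoff.
    if any(ord(c) - phred_offset < qs_cutoff for c in qual[:usi_len]):
        return None
    # The boundary (GGG by default) must sit directly after the UMI.
    if seq[umi_end:umi_end + len(boundary)] != boundary:
        return None
    cdna = seq[umi_end + len(boundary):]
    return seq[:usi_len], seq[usi_len:umi_end], cdna
```

For example, with an illustrative 8bp UMI, split_read1("ACGTACTTTTAAAAGGGCCCC", "I" * 21, umi_len=8) separates the read into the USI ACGTAC, the UMI TTTTAAAA, and the cDNA CCCC.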

decode_gene.py

Note: provide either -f and -d, optionally with -x, -g, and -t (run STAR, then process the BAM), or -b alone (process an existing BAM directly).
-f, --fastq Read2 fastq file
-d, --outdir Output directory for the STAR run
-x, --STARIDX Reference genome STAR index
-g, --gtf <file.gtf> Reference genome annotation
-t, --thread Number of threads to use for STAR mapping. Default 1.
-b, --bam <file.bam> STAR output BAM file. The default file name is Aligned.toTranscriptome.out.bam. Producing this BAM requires running STAR with the parameter --quantMode TranscriptomeSAM GeneCounts.
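
When using the -b route, the BAM has to come from a separate STAR run. The sketch below builds one plausible STAR command line that yields Aligned.toTranscriptome.out.bam. It is an assumption about a reasonable invocation, not the exact command decode_gene.py issues. The helper names star_command and transcriptome_bam are hypothetical.

```python
def star_command(fastq, outdir, star_index, gtf, threads=1):
    """Build a STAR argument list for a single-end Read2 FASTQ file."""
    prefix = outdir.rstrip("/") + "/"
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", star_index,
        "--readFilesIn", fastq,
        "--sjdbGTFfile", gtf,
        "--outFileNamePrefix", prefix,
        # Required for the transcriptome-coordinate BAM that -b expects.
        "--quantMode", "TranscriptomeSAM", "GeneCounts",
    ]


def transcriptome_bam(outdir):
    """Path where STAR writes the transcriptome BAM for this prefix."""
    return outdir.rstrip("/") + "/Aligned.toTranscriptome.out.bam"
```

The returned list can be passed to subprocess.run, after which transcriptome_bam(outdir) gives the file to hand to decode_gene.py -b.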

decode_quant.py

-g, --genetable <gene_table> output of decode_gene.py
-b, --barcodetable <barcode_table> output of decode_barcode.py
-u, --usi A plain text file containing 2 columns: USI name and 6bp sequence.
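
Conceptually, the quantification step joins the two tables on read name and tallies counts per transcript and sample. The following is a minimal sketch of one way to do this, not the code in decode_quant.py. It assumes that the barcode table stores the USI sequence, that reads sharing a transcript, USI, and UMI are collapsed into one count, and that reads with an unknown USI or no transcript are skipped. The function name quantify is hypothetical.

```python
from collections import defaultdict


def quantify(barcode_rows, gene_rows, usi_names):
    """Count unique UMIs per (transcript, sample).

    barcode_rows: iterable of (read_name, usi_sequence, umi)
    gene_rows:    iterable of (read_name, transcript)
    usi_names:    dict mapping USI sequence -> USI (sample) name
    Returns {transcript: {sample: count}}.
    """
    read_to_tx = dict(gene_rows)
    umis = defaultdict(set)
    for read, usi, umi in barcode_rows:
        tx = read_to_tx.get(read)
        sample = usi_names.get(usi)
        if tx is None or sample is None:
            continue  # unmapped read or unrecognized USI
        # A set collapses duplicate reads that share the same UMI.
        umis[(tx, sample)].add(umi)
    samples = sorted(set(usi_names.values()))
    transcripts = sorted({tx for tx, _ in umis})
    return {
        tx: {s: len(umis.get((tx, s), ())) for s in samples}
        for tx in transcripts
    }
```

Each inner dictionary corresponds to one row of the transcript × sample matrix, with one entry per USI name listed in the USI file.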

Examples

decode_barcode.py  -i sample_R1.fq -u usi.txt > sample.barcode.tab

decode_gene.py     -f sample_R2.fq -d star_output_dir -t 30 -x star_index -g annotation.gtf > sample.gene.tab

decode_quant.py    -b sample.barcode.tab -g sample.gene.tab -u usi.txt > sample.quant.tab
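
The three commands above can also be driven from Python. The sketch below assembles the same command lines, using only the flags documented in this README, and redirects each script's standard output to a file. It assumes the scripts are executable and on PATH. The helper names pipeline_steps and run_pipeline are hypothetical.

```python
import subprocess


def pipeline_steps(r1, r2, usi, star_dir, star_index, gtf, prefix, threads=1):
    """Return [(command, output_path), ...] for the three pipeline steps."""
    barcode_tab = f"{prefix}.barcode.tab"
    gene_tab = f"{prefix}.gene.tab"
    quant_tab = f"{prefix}.quant.tab"
    return [
        (["decode_barcode.py", "-i", r1, "-u", usi], barcode_tab),
        (["decode_gene.py", "-f", r2, "-d", star_dir, "-t", str(threads),
          "-x", star_index, "-g", gtf], gene_tab),
        (["decode_quant.py", "-b", barcode_tab, "-g", gene_tab, "-u", usi],
         quant_tab),
    ]


def run_pipeline(steps):
    """Run each step in order, writing its standard output to its file."""
    for cmd, out_path in steps:
        with open(out_path, "w") as out:
            # check=True stops the pipeline if a step exits with an error.
            subprocess.run(cmd, stdout=out, check=True)
```

Calling run_pipeline(pipeline_steps("sample_R1.fq", "sample_R2.fq", "usi.txt", "star_output_dir", "star_index", "annotation.gtf", "sample", threads=30)) would reproduce the example commands.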

Citation

Li Y*, Yang H, Zhang H, Liu Y, Shang H, Zhao H, Zhang T, Tu Q#. 2020. Decode-seq: a practical approach to improve differential gene expression analysis. Genome Biol 21: 66. PMID: 32200760 DOI: 10.1186/s13059-020-01966-9
