Differential Expression for RNA Seq

mhg-cipf edited this page Jan 21, 2015 · 8 revisions

##Introduction

When an experiment compares different conditions, the usual goal is to determine which changes occur in the cells at the level of gene or transcript expression. This is done by searching for differentially expressed genetic features, that is, features whose expression levels differ significantly between the two conditions. Features that are over-expressed or under-expressed in one condition relative to the other may therefore depend on the condition under study.

##Statistical methods

The differential expression tool for RNA-Seq in Babelomics allows us to compare the expression of a genetic feature (gene, transcript, etc.) between two different cases.

The pipeline uses the limma package (Ritchie et al., 2015) together with its voom function (Law et al., 2014). limma fits linear models to microarray and RNA-Seq data in order to assess gene expression and, among other analyses, differential expression.

The input to this tool must be a matrix of raw counts; that is, the data must not have been normalized beforehand. If normalization is needed, Babelomics can apply it as an optional step within the differential expression tool.

Available normalization methods in Babelomics are:

  • Trimmed Mean of M-values (TMM) (Robinson et al. 2010): A scaling factor for the library size is computed for each sample in order to correct for RNA composition bias. The implementation uses the edgeR package (Robinson et al., 2009).

  • Quantiles (Bolstad 2001, Bolstad et al. 2003): The method assumes that there is an underlying common distribution of intensities across all the samples, and forces the quantiles of each sample to have the same value. Therefore, all samples end up having a common distribution.
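The quantile method described above can be illustrated with a minimal Python sketch. It is not the code Babelomics runs; the function name `quantile_normalize` is hypothetical, the input is assumed to be a list of rows (genes) with one value per sample, and ties are broken by row order rather than averaged, which is a simplification.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix (list of lists).

    Simplified illustration: ties within a sample are broken by row
    order instead of being averaged.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])

    # Extract one column per sample.
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: mean across samples of the values at each rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[rank] for col in sorted_cols) / n_samples
        for rank in range(n_genes)
    ]

    # Replace every value by the reference value at its within-sample rank.
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_idx in enumerate(order):
            normalized[gene_idx][j] = reference[rank]
    return normalized


# Hypothetical toy data: 4 genes (rows) measured in 3 samples (columns).
example = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 5.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized_example = quantile_normalize(example)
```

After this step every sample contains exactly the same set of values, only assigned to different genes according to each gene's rank within that sample.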

After normalization, if any, the voom function prepares the count data for the limma differential expression functions: it estimates the mean-variance relationship and computes an appropriate weight for each observation. limma then fits its linear models in the usual way, and the genes found to be differentially expressed at a given p-value threshold are stored.
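As a rough illustration of the first thing voom does, the sketch below computes the log2 counts-per-million transform described by Law et al. (2014), with an offset of 0.5 on the counts and 1 on the library size. The function name `voom_log_cpm` and the optional `norm_factors` argument are hypothetical names chosen for this example. The sketch deliberately stops before the mean-variance trend fit and the precision weights, which is the part of voom that actually feeds limma.

```python
import math


def voom_log_cpm(counts, norm_factors=None):
    """Log2 counts-per-million as used by voom (transform step only).

    counts: genes x samples matrix of raw counts (list of lists).
    norm_factors: optional per-sample scaling factors (e.g. from TMM);
                  the effective library size is lib_size * factor.
    """
    n_samples = len(counts[0])
    if norm_factors is None:
        norm_factors = [1.0] * n_samples

    # Library size of a sample = total raw count over all genes.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    effective = [lib_sizes[j] * norm_factors[j] for j in range(n_samples)]

    # The 0.5 offset keeps zero counts finite; the +1 keeps the ratio below 1.
    return [
        [
            math.log2((row[j] + 0.5) / (effective[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for row in counts
    ]


# Hypothetical toy data: 3 genes (rows) in 2 samples (columns).
raw_counts = [
    [10, 20],
    [0, 5],
    [90, 175],
]
log_cpm = voom_log_cpm(raw_counts)
```

Because the transform divides by the library size, it only makes sense on raw counts, which is why the tool requires non-normalized input.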

##References

  • Bolstad B. Probe Level Quantile Normalization of High Density Oligonucleotide Array Data. Unpublished manuscript. 2001. http://bmbolstad.com/stuff/qnorm.pdf

  • Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003 Jan 22;19(2):185-93. PubMed PMID: 12538238.

  • Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014 Feb 3;15(2):R29.

  • Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40. doi: 10.1093/bioinformatics/btp616. Epub 2009 Nov 11. PubMed PMID: 19910308; PubMed Central PMCID: PMC2796818.

  • Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research. 2015; 43 (accepted 6 January 2015).
