Preprocessing for RNA Seq
Biases
The appropriate normalization method depends on the biases present in the data. Babelomics can correct three kinds of bias:
- Library depth bias: The number of counts of a gene is proportional to the library depth. Deeper libraries yield more counts per gene. Samples with the same library depth do not show this bias.
- Gene length bias: The number of counts of a gene is proportional to its length. Typically, longer genes accumulate more reads.
- RNA composition bias: This bias occurs when some genes are very highly expressed in some samples but not in others. For a fixed total number of counts per sample, those genes take up a larger share of the counts, so genes that are equally expressed across samples do not receive similar counts.
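The RNA composition bias can be illustrated with a small numeric sketch. The gene names, expression levels and library depth below are hypothetical values chosen only for illustration; they do not come from Babelomics.

```python
# Hypothetical true expression levels (arbitrary units) for four genes.
# g1, g2 and g3 are equally expressed in both samples; g4 is hugely
# expressed only in sample B.
true_a = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
true_b = {"g1": 100, "g2": 100, "g3": 100, "g4": 1700}

LIBRARY_DEPTH = 4000  # both samples are sequenced to the same depth


def expected_counts(expression, depth):
    """Distribute `depth` counts proportionally to the expression levels."""
    total = sum(expression.values())
    return {gene: depth * level / total for gene, level in expression.items()}


counts_a = expected_counts(true_a, LIBRARY_DEPTH)
counts_b = expected_counts(true_b, LIBRARY_DEPTH)

# Print the expected counts of each gene in sample A and sample B.
for gene in true_a:
    print(gene, counts_a[gene], counts_b[gene])
```

In this toy case g1 receives 1000 counts in sample A but only 200 in sample B, although its true expression is identical, because g4 takes most of the counts available in sample B.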
Normalization methods
Babelomics offers the following normalization methods:
- Reads Per Kilobase per Million mapped reads (RPKM) (Mortazavi et al. 2008): Gene counts are divided by the gene length in kilobases and by the total number of mapped reads in millions. This normalization corrects the library depth bias and the gene length bias. However, it is not recommended for differential expression.
- Trimmed Mean of M values (TMM) (Robinson and Oshlack 2010): A scaling factor for the library depth is computed for each sample in order to correct the RNA composition bias. TMM does not by itself correct the gene length bias, but Babelomics uses the implementation in the NOISeq package with the option that also corrects this bias.
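The two methods can be sketched in a few lines of Python. This is a minimal illustration, not the code Babelomics or NOISeq runs: the function names and the toy data are hypothetical, and `trimmed_mean_factor` is a deliberately simplified variant of TMM (it trims only on the M values and uses an unweighted mean, whereas the published method also trims on absolute expression and weights the genes).

```python
import math


def rpkm(counts, gene_lengths):
    """RPKM for one sample.

    counts: dict gene -> mapped read count
    gene_lengths: dict gene -> gene length in bases
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("sample has no mapped reads")
    millions_of_reads = total_reads / 1e6
    return {
        gene: count / (gene_lengths[gene] / 1e3) / millions_of_reads
        for gene, count in counts.items()
    }


def trimmed_mean_factor(sample, reference, trim=0.3):
    """Simplified, unweighted trimmed mean of M values.

    Returns a scaling factor for the library depth of `sample`
    relative to `reference`.
    """
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    n_sample = sum(sample.values())
    n_ref = sum(reference.values())
    # M value: log2 ratio of the depth-scaled counts. Genes with a zero
    # count in either sample are skipped (the log ratio is undefined).
    m_values = sorted(
        math.log2((sample[g] / n_sample) / (reference[g] / n_ref))
        for g in sample
        if sample[g] > 0 and reference.get(g, 0) > 0
    )
    if not m_values:
        raise ValueError("no gene has counts in both samples")
    cut = int(len(m_values) * trim)  # genes trimmed from each tail
    kept = m_values[cut:len(m_values) - cut]
    return 2 ** (sum(kept) / len(kept))


# Hypothetical toy data: g4 dominates sample B, the other genes do not change.
sample_a = {"g1": 1000, "g2": 1000, "g3": 1000, "g4": 1000}
sample_b = {"g1": 200, "g2": 200, "g3": 200, "g4": 3400}
lengths = {"g1": 2000, "g2": 1000, "g3": 4000, "g4": 1000}

rpkm_a = rpkm(sample_a, lengths)
factor_b = trimmed_mean_factor(sample_b, sample_a)
effective_depth_b = sum(sample_b.values()) * factor_b
```

With these toy values the factor for sample B is about 0.2, so its effective library depth drops from 4000 to about 800. Dividing the count of g1 by the effective depth then gives the same proportion (0.25) in both samples, which is the behaviour the RNA composition correction is meant to achieve.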
Babelomics can also automatically select and run the normalization method that best fits the data.
Automatic optimal method
To select the optimal method automatically, Babelomics applies the following procedure:
- If gene length information is not available, the TMM method is recommended.
- If gene length information is available, the diagnostic test for differences in RNA composition from the NOISeq package (Tarazona et al. 2011) is applied. For data that pass the test, the RPKM method is recommended. For data that fail the test, the TMM method with gene length correction is recommended.
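The procedure above can be written as a small decision function. This is only a sketch: the function name, its parameters and the returned labels are hypothetical, and the NOISeq diagnostic test itself is not reproduced here; its outcome is passed in as a boolean.

```python
def choose_normalization(gene_lengths_available, passes_composition_test=None):
    """Return the recommended normalization method as a label.

    gene_lengths_available: True if a length is known for the genes.
    passes_composition_test: outcome of the RNA composition diagnostic
        test (True = passed). Only needed when gene lengths are available.
    """
    if not gene_lengths_available:
        # Without gene lengths, RPKM cannot be computed.
        return "TMM"
    if passes_composition_test is None:
        raise ValueError("the diagnostic test result is required "
                         "when gene lengths are available")
    if passes_composition_test:
        return "RPKM"
    return "TMM with gene length correction"


print(choose_normalization(False))
print(choose_normalization(True, passes_composition_test=True))
print(choose_normalization(True, passes_composition_test=False))
```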
References
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621-8. doi: 10.1038/nmeth.1226. PubMed PMID: 18516045.
- Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11(3):R25. doi: 10.1186/gb-2010-11-3-r25. PubMed PMID: 20196867; PubMed Central PMCID: PMC2864565.
- Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A. Differential expression in RNA-seq: a matter of depth. Genome Res. 2011;21(12):2213-23. doi: 10.1101/gr.124321.111.
Find the Babelomics suite at http://babelomics.org