Main areas. Expression
## Clustering
Two classes of problems are extensively addressed in the analysis of genomic data: supervised and unsupervised. A problem is unsupervised when the class structure of the data set is not known beforehand (or is not part of our hypothesis).
Typical examples of unsupervised problems are describing the molecular variability of a bacterial population and finding groups of co-expressed genes. In neither case do we know beforehand whether we will find a single homogeneous group or many groups, nor, if groups are found, how many individuals each one will contain. Problems of this type are known as unsupervised or class discovery problems.
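To make class discovery concrete, here is a minimal sketch of one common approach: agglomerative (average-linkage) hierarchical clustering of gene expression profiles, using 1 minus the Pearson correlation as the distance, so that co-expressed genes end up together. It is a generic illustration in plain Python, not the algorithm Babelomics runs (for example, the hierarchical unsupervised growing neural network of Herrero et al. (2001) works differently). The gene names and expression values are invented, and the number of clusters is fixed by hand even though, in a real class discovery problem, it is not known in advance.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles.
    Assumes neither profile is constant (zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Start with one cluster per gene and repeatedly merge the two clusters
    with the smallest average distance (1 - Pearson r over all cross-cluster
    gene pairs) until n_clusters remain."""
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(dist[frozenset(p)] for p in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical expression profiles of four genes across four samples.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.2, 2.1, 3.3, 3.9],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.8, 3.1, 1.9, 1.2],
}
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

Correlation-based distance groups genes by the shape of their profiles rather than their absolute level, which is usually what "co-expression" means.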
References
- Dopazo, J. (2007) Clustering: class discovery in the post-genomic era. In Dubitzky, W., Granzow, M. and Berrar, D.P. (Eds.) Fundamentals of Data Mining in Genomics and Proteomics. Springer-Verlag, New York. (http://bioinfo.cipf.es/node/477)
- Herrero, J., Valencia, A. and Dopazo, J. (2001) A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, 17, 126-136.
- Dopazo, J. and Carazo, J.M. (1997) Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. J Mol Evol, 44, 226-233.
## Class prediction
Supervised learning is the machine learning task of inferring a function from labeled training data. This tool builds prediction rules and uses them to classify further samples. For example, suppose we have two groups of samples, one from control subjects and one from lung cancer patients. We want to build a predictor based on gene expression (or another measured variable) and use it to classify new samples as control or lung cancer.
- [More information](Class prediction)
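As a minimal sketch of building a prediction rule and applying it to new samples, the code below trains a plain nearest-centroid classifier and estimates its accuracy by leave-one-out cross-validation. It is a simplified relative of the shrunken-centroid method of Tibshirani et al. (2002) listed below, without the shrinkage step, and it is not Babelomics' own implementation. The three-gene profiles and the "control"/"lung_cancer" labels are hypothetical.

```python
import math


def train_centroids(samples, labels):
    """Average the expression profiles of the training samples in each class."""
    sums, counts = {}, {}
    for profile, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for k, value in enumerate(profile):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, profile):
    """Assign a sample to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


def loo_accuracy(samples, labels):
    """Leave-one-out cross-validation: retrain without each sample, then predict it."""
    hits = 0
    for i in range(len(samples)):
        centroids = train_centroids(samples[:i] + samples[i + 1:],
                                    labels[:i] + labels[i + 1:])
        hits += predict(centroids, samples[i]) == labels[i]
    return hits / len(samples)


# Hypothetical training data: expression of three genes in four samples.
train = [[1.0, 5.0, 2.0], [1.2, 4.8, 2.1], [3.0, 1.0, 4.0], [3.2, 1.1, 3.9]]
labels = ["control", "control", "lung_cancer", "lung_cancer"]

centroids = train_centroids(train, labels)
print(predict(centroids, [2.9, 1.2, 4.1]))  # classify a new, unlabeled sample
print(loo_accuracy(train, labels))
```

Estimating accuracy on samples left out of training, rather than on the training set itself, gives a less optimistic picture of how the rule will behave on new samples.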
References
- Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci USA, 99, 6567-6572.
- Vapnik, V. (1999) Statistical Learning Theory. John Wiley and Sons, New York.
- Wessels, L.F., Reinders, M.J., Hart, A.A., Veenman, C.J., Dai, H., He, Y.D. and van't Veer, L.J. (2005) A protocol for building and evaluating predictors of disease state based on microarray data. Bioinformatics, 21(19), 3755-3762.
## Differential Expression
When running an experiment under different conditions, researchers aim to determine the changes that occur in the cells at the level of gene or transcript expression. This is done by searching for differentially expressed features, that is, genes or transcripts whose expression levels differ significantly between conditions.
Babelomics can perform differential expression analyses on both microarray and RNA-Seq data.
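What "significantly different" means can be illustrated for a single gene. The sketch below, a generic example rather than the statistics Babelomics applies, computes Welch's t statistic between two groups of hypothetical log2 expression values, and then an exact two-sided permutation p-value by enumerating every way of relabeling the pooled samples into two groups of the original sizes.

```python
import math
from itertools import combinations


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def permutation_pvalue(a, b):
    """Exact two-sided permutation p-value: the fraction of all relabelings
    of the pooled samples whose |t| is at least as large as the observed |t|."""
    pooled = a + b
    observed = abs(welch_t(a, b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += abs(welch_t(group_a, group_b)) >= observed - 1e-12
        total += 1
    return hits / total


# Hypothetical log2 expression of one gene in two conditions.
control = [5.1, 5.3, 4.9, 5.0]
treated = [6.8, 7.1, 6.9, 7.2]

print(round(welch_t(control, treated), 2))
print(round(permutation_pvalue(control, treated), 3))  # 2/70, about 0.029
```

With four samples per group there are 70 relabelings, so the smallest attainable p-value here is 2/70. Real experiments test thousands of genes at once, so the resulting p-values also need to be adjusted for multiple testing.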
### Arrays
Babelomics' tools for differential gene expression analysis of microarrays address three experimental contexts:
- Class comparison: finding genes differentially expressed in one class, between two classes, or among more than two classes.
- Continuous variable: finding genes whose expression is related to a continuous variable.
- Survival: exploring how gene expression relates to survival time.
- [More information](Differential Expression for arrays)
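Because each array experiment tests thousands of genes at once, the per-gene p-values from any of these contexts are usually adjusted for multiple testing. Below is a minimal sketch of the Benjamini and Hochberg (1995) false discovery rate adjustment listed among the references; the gene names and p-values are hypothetical, and this is an illustration rather than Babelomics' code.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease
    # with rank: adjusted(rank) = min over k >= rank of p(k) * m / k.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


genes = ["geneA", "geneB", "geneC", "geneD"]  # hypothetical genes
pvals = [0.01, 0.04, 0.03, 0.20]

adjusted = benjamini_hochberg(pvals)
significant = [g for g, q in zip(genes, adjusted) if q < 0.05]
print(significant)  # ['geneA']
```

Note that geneB and geneC, nominally significant at 0.05, no longer pass once the adjustment accounts for the number of tests.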
References
- Bolstad, B., Irizarry, R., Astrand, M. and Speed, T. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2), 185-193. doi: 10.1093/bioinformatics/19.2.185.
- Benjamini, Y. and Hochberg, Y. (1995) Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289-300. doi: 10.2307/2346101.
- Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics, 29(4), 1165-1188.
- Smyth, G.K. (2004) Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Statistical Applications in Genetics and Molecular Biology, 3(1). doi: 10.2202/1544-6115.1027.
### RNA-Seq
The differential expression tool for RNA-Seq in Babelomics compares the expression of a genetic feature (gene, transcript, etc.) between two conditions.
The pipeline uses the limma package (Ritchie et al., 2015) and includes the application of the voom function (Law et al., 2014). This function prepares the raw count data to be treated with the limma differential expression functions by estimating the mean-variance relationship and computing appropriate weights for each observation.
- [More information](Differential Expression for RNA-Seq)
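Before estimating the mean-variance relationship, voom converts raw counts to log2 counts per million (log-CPM), which puts samples sequenced to different depths on a comparable scale. The sketch below reproduces only that first step, using the formula applied in limma's voom with raw library sizes, log2((count + 0.5) / (library size + 1) × 10^6); the counts are invented, and the trend fitting and weight calculation that follow are not shown.

```python
import math


def log_cpm(counts):
    """log2 counts per million, as in the first step of limma's voom.

    `counts` is a list of samples, each a list of per-gene read counts.
    The 0.5 offset avoids log(0) for genes with zero counts."""
    result = []
    for sample in counts:
        lib_size = sum(sample)
        result.append([math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in sample])
    return result


# Hypothetical counts for three genes in two samples.
counts = [
    [10, 250, 740],   # sample 1: library size 1,000
    [20, 500, 1480],  # sample 2: same proportions, twice the sequencing depth
]
for values in log_cpm(counts):
    print([round(v, 2) for v in values])
```

For well-expressed genes the two samples give almost identical log-CPM values despite the twofold difference in depth; for low counts the 0.5 offset has a visible effect, which is one reason low-count observations are noisier and receive lower weights in the voom approach.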
References
- Law, C.W., Chen, Y., Shi, W. and Smyth, G.K. (2014) voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2), R29.
- Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W. and Smyth, G.K. (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), e47.
Find the Babelomics suite at http://babelomics.org