Main areas: Expression

Francisco García edited this page Jan 20, 2015 · 29 revisions

Clustering

Two classes of problems are extensively addressed in genomic data analysis: supervised and unsupervised. A problem is unsupervised when the class structure of the data set is not known beforehand, or is not part of the hypothesis.

Typical examples of unsupervised problems are characterizing the molecular variability of a population of bacteria and finding groups of co-expressed genes. In neither case do we know beforehand whether we will find a single homogeneous group or many groups, nor, if groups are found, how many individuals each group will contain. Problems of this type are known as unsupervised or class discovery problems.
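As a minimal sketch of class discovery, the example below groups genes by the similarity of their expression profiles without fixing the number of groups in advance. It assumes a distance of 1 minus the Pearson correlation and single-linkage clustering cut at a threshold; the function names, the `max_distance` value, and the toy profiles are illustrative assumptions, not the method Babelomics implements.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def coexpression_groups(profiles, max_distance=0.2):
    """Single-linkage clustering cut at max_distance (distance = 1 - r).

    Genes closer than the threshold are merged with a union-find
    structure, so the number of groups is discovered, not supplied.
    """
    parent = {gene: gene for gene in profiles}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for a, b in combinations(profiles, 2):
        if 1 - pearson(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for gene in profiles:
        groups.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.1, 6.0, 4.2, 1.9],
    "geneE": [1.0, 3.0, 1.0, 3.0],
}
groups = coexpression_groups(profiles)
print(groups)  # [['geneA', 'geneB'], ['geneC', 'geneD'], ['geneE']]
```

Here the two rising genes and the two falling genes form separate groups, and the oscillating gene is left on its own; changing `max_distance` changes how many groups are found.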

Class prediction

Supervised learning is the machine learning task of inferring a function from labeled training data. This tool builds prediction rules from such data and applies them to classify new samples. For example, suppose we have two groups of samples, one from control individuals and one from lung cancer patients. We want to build a predictor based on expression (or another response variable) and use it to classify new samples as control or lung cancer.
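The control versus lung cancer example can be sketched with a nearest-centroid classifier. This is only an illustration chosen for brevity: the classifier, the function names, and the toy expression values are assumptions, and the source does not state which algorithms the Babelomics tool uses.

```python
from math import dist
from statistics import mean


def train_centroids(samples, labels):
    """Return the mean expression profile (centroid) of each class."""
    by_class = {}
    for profile, label in zip(samples, labels):
        by_class.setdefault(label, []).append(profile)
    return {
        label: [mean(values) for values in zip(*profiles)]
        for label, profiles in by_class.items()
    }


def predict(centroids, profile):
    """Assign a new sample to the class with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Hypothetical training data: three genes measured in four samples.
training_samples = [
    [2.0, 5.0, 1.0],
    [2.2, 4.8, 1.1],
    [6.0, 1.0, 1.2],
    [5.8, 1.3, 0.9],
]
training_labels = ["control", "control", "lung cancer", "lung cancer"]

centroids = train_centroids(training_samples, training_labels)
first_call = predict(centroids, [5.5, 1.5, 1.0])
second_call = predict(centroids, [2.1, 5.1, 1.0])
print(first_call, "|", second_call)
```

In practice a predictor should be evaluated on samples that were not used for training, for instance by cross-validation, before it is trusted on new data.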

  • [More information](Class prediction)

Differential Expression

When performing an experiment with different conditions, scientists expect to determine the changes occurring in the cells at the expression level of genes or transcripts. This is achieved by searching for differentially expressed genetic features, that is, features whose expression levels differ significantly between conditions.
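To make "significantly different" concrete, the sketch below computes an exact two-sided permutation p-value for the difference in mean expression of one feature between two conditions. The permutation test is a stand-in chosen because it needs only the standard library; it is not the statistical test Babelomics applies, and the function name and expression values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Every way of relabelling the pooled values into two groups of the
    original sizes is enumerated, so this suits small sample sizes only.
    """
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        count += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return hits / count


# Hypothetical log-expression values of two features in two conditions.
p_changed = permutation_p_value([5.1, 4.9, 5.3, 5.0], [7.2, 6.8, 7.5, 7.0])
p_unchanged = permutation_p_value([5.0, 5.2, 4.9, 5.1], [5.1, 4.9, 5.2, 5.0])
print(p_changed, p_unchanged)
```

With four samples per condition there are only 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70. When many features are tested at once, the resulting p-values also need a multiple-testing correction.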

Babelomics supports differential expression analysis for microarray and RNA-Seq data.

Arrays

Babelomics' tools for differential gene expression analysis of microarrays address three experimental contexts:

  1. Finding genes differentially expressed in one class, between two classes, or among more than two classes.

  2. Finding genes whose expression is related to a continuous variable.

  3. Exploring genes whose expression is related to a survival time.
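The second context in the list above can be illustrated by relating a gene's expression to a continuous variable with a least-squares slope and a Pearson correlation. This is a hedged sketch only: the `dose` covariate, the gene values, and the function name are hypothetical, the source does not say which statistics Babelomics reports, and the survival context is not covered here.

```python
from statistics import mean


def slope_and_correlation(covariate, expression):
    """Least-squares slope and Pearson r of expression on a covariate."""
    mx, my = mean(covariate), mean(expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, expression))
    sxx = sum((x - mx) ** 2 for x in covariate)
    syy = sum((y - my) ** 2 for y in expression)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


# Hypothetical continuous variable and two expression profiles.
dose = [0.0, 1.0, 2.0, 3.0, 4.0]
responsive_gene = [1.0, 3.0, 5.0, 7.0, 9.0]
flat_gene = [4.0, 5.0, 4.0, 5.0, 4.0]

responsive_slope, responsive_r = slope_and_correlation(dose, responsive_gene)
flat_slope, flat_r = slope_and_correlation(dose, flat_gene)
print(responsive_slope, responsive_r, flat_slope, flat_r)
```

A gene whose expression tracks the variable has a slope far from zero and a correlation near 1 or -1, while an unrelated gene has both close to zero.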

  • [More information](Differential Expression for arrays)

RNA-Seq

The differential expression tool for RNA-Seq in Babelomics allows us to compare the expression of a genetic feature (gene, transcript, etc.) between two different cases.

The pipeline uses the limma package (Ritchie et al., 2015) and includes the application of the voom function (Law et al., 2014). This function prepares the raw count data to be treated with the limma differential expression functions by estimating the mean-variance relationship and computing appropriate weights for each observation.
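As a rough illustration of the first step of this preparation, the sketch below applies the log-counts-per-million transformation described by Law et al. (2014), log2((count + 0.5) / (library size + 1) × 10^6). It assumes library sizes are plain column sums with no normalization factors, and it omits the mean-variance trend fit and the precision weights entirely. The real pipeline runs in R through limma; the Python function name and the counts here are hypothetical.

```python
from math import log2


def log_cpm(counts):
    """Log2 counts per million with voom-style offsets.

    counts maps each feature to a list of raw counts, one per sample.
    The 0.5 and 1.0 offsets keep zero counts finite after the log.
    """
    n_samples = len(next(iter(counts.values())))
    # Assumption: library size is the column sum of the raw counts.
    lib_sizes = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    return {
        feature: [
            log2((c[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for feature, c in counts.items()
    }


# Hypothetical raw counts for three features in two samples.
raw_counts = {
    "geneA": [10, 20],
    "geneB": [0, 0],
    "geneC": [989, 1979],
}
transformed = log_cpm(raw_counts)
print(transformed["geneA"], transformed["geneB"])
```

Although the second sample has roughly twice the sequencing depth of the first, the transformed values of each feature are close across samples, which is what makes them comparable in a linear model.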

  • [More information](Differential Expression for RNA-Seq)

References

  • Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014 Feb 3;15(2):R29. doi: 10.1186/gb-2014-15-2-r29. PubMed PMID: 24485249; PubMed Central PMCID: PMC4053721.

  • Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. doi: 10.1093/nar/gkv007.