Main areas: Functional

Francisco García edited this page Jan 19, 2015 · 18 revisions

Tools for the functional interpretation of genomic data.

GO Enrichment

The functional interpretation of genomic data is usually performed by studying the enrichment of any type of biologically relevant annotation in the genes or proteins selected by the experiment with respect to the corresponding distribution of the annotation in the background, typically the rest of genes or proteins in the genome.

Single enrichment analysis is less sensitive than gene set analysis and is recommended only when the experiment selects genes categorically (for example, because they lie in amplified or deleted regions, or because they are targets of regulatory factors). In many cases, however, genes are selected by multiple individual, gene-wise tests. This testing strategy is quite conservative and ultimately reduces the testing power of the whole procedure, because a large number of false negatives are accepted in order to keep the ratio of false positives low.

The GO Enrichment method (Al-Shahrour et al., 2004) was the first proposal for functional enrichment that took into account the multiple testing problem. GO Enrichment works as follows:

  1. GO Enrichment takes two lists of genes: ideally a group of interest and the rest of the genes in the experiment, although any two groups, however formed, can be tested against each other.
  2. These two lists are converted into two lists of functional terms using the corresponding gene or protein - term annotation table.
  3. Then a Fisher's exact test for 2×2 contingency tables is used to check for significant over-representation of functional terms in one of the lists with respect to the other one.
  4. A multiple testing correction is applied to account for the multiple hypotheses tested (one for each functional term). GO Enrichment uses the Benjamini and Hochberg (B&H) FDR method.
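
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Babelomics implementation: the function names (`fisher_exact_two_sided`, `benjamini_hochberg`, `go_enrichment`) and the toy annotation table are hypothetical, the test is taken to be two-sided, and only the standard library is used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x annotated genes in list 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is as likely as, or less likely than, the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(list1, list2, annotation):
    """Return {term: (p-value, adjusted p-value)} for two gene lists.

    `annotation` is a hypothetical gene -> set-of-terms table (step 2).
    """
    list1, list2 = set(list1), set(list2)
    terms = sorted({t for g in list1 | list2 for t in annotation.get(g, ())})
    pvalues = []
    for term in terms:
        a = sum(term in annotation.get(g, ()) for g in list1)  # list 1, annotated
        c = sum(term in annotation.get(g, ()) for g in list2)  # list 2, annotated
        pvalues.append(fisher_exact_two_sided(a, len(list1) - a, c, len(list2) - c))  # step 3
    return dict(zip(terms, zip(pvalues, benjamini_hochberg(pvalues))))  # step 4


# Toy example with made-up gene and term identifiers
toy_annotation = {"g1": {"GO:A"}, "g2": {"GO:A"}, "g3": {"GO:A", "GO:B"},
                  "g4": {"GO:B"}, "g5": {"GO:B"}, "g6": set()}
for term, (p, q) in go_enrichment(["g1", "g2", "g3"], ["g4", "g5", "g6"], toy_annotation).items():
    print(term, round(p, 4), round(q, 4))
```

Terms whose adjusted p-value falls below the chosen FDR threshold would be reported as over-represented in one list with respect to the other.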

References

  • Al-Shahrour, F., Minguez, P., Tárraga, J., Medina, I., Alloza, E., Montaner, D. & Dopazo, J. (2007). FatiGO+: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Research, 35 (Web Server issue): W91-W96
  • Al-Shahrour, F., Minguez, P., Tárraga, J., Montaner, D., Alloza, E., Vaquerizas, J.M., Conde, L., Blaschke, C., Vera, J. & Dopazo, J. (2006). BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Research, 34 (Web Server issue): W472-W476
  • Al-Shahrour, F., Minguez, P., Vaquerizas, J.M., Conde, L. & Dopazo, J. (2005). BABELOMICS: a suite of web-tools for functional annotation and analysis of group of genes in high-throughput experiments. Nucleic Acids Research, 33 (Web Server issue): W460-W464
  • Al-Shahrour, F., Díaz-Uriarte, R. & Dopazo, J. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics 20: 578-580

Gene Set GO Enrichment

Gene set methods are much more sensitive than single enrichment methods in detecting gene sets (defined as sets of genes with a common annotation) that behave collectively in a genomic experiment. These methods efficiently detect gene sets (annotations) that are consistently associated with high or low values in a ranked list of genes.

Here a logistic regression method has been implemented, which detects asymmetrical distributions of annotations within ranked lists of genes.
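
To make the idea concrete, the sketch below fits, for each gene set, a univariate logistic regression of set membership on the ranking statistic. It is an illustration under assumptions rather than the tool's actual code: the function names, the Newton-Raphson fitting routine, the Wald z-score and the toy data are all hypothetical choices. A positive slope suggests the annotation concentrates among genes with high values of the statistic, a negative slope among genes with low values.

```python
from math import erfc, exp, sqrt


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    Returns (b0, b1, Wald z-score of b1). Assumes y contains both 0s and 1s
    and is not perfectly separated by x; otherwise the fit does not converge.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Entries of the 2x2 information matrix
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the information matrix of the last iteration
    se_b1 = sqrt(h00 / det)
    return b0, b1, b1 / se_b1


def gene_set_logistic(stats, gene_sets):
    """stats: gene -> ranking statistic; gene_sets: term -> set of genes.

    Returns {term: (slope, z-score, two-sided p-value)}.
    """
    genes = sorted(stats)
    x = [stats[g] for g in genes]
    results = {}
    for term, members in gene_sets.items():
        y = [1 if g in members else 0 for g in genes]
        _, slope, z = fit_logistic(x, y)
        results[term] = (slope, z, erfc(abs(z) / sqrt(2)))
    return results


# Toy example: ten genes with made-up statistics (e.g. t-statistics)
toy_stats = dict(zip(["g%d" % i for i in range(10)],
                     [3.1, 2.4, 1.8, 1.1, 0.5, -0.2, -0.9, -1.5, -2.2, -3.0]))
toy_sets = {"GO:up": {"g0", "g1", "g3", "g5"}}
print(gene_set_logistic(toy_stats, toy_sets))
```

Because one model is fitted per annotation, the resulting p-values would still need a multiple testing correction before interpretation.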

References

  • Al-Shahrour, F., Arbiza, L., Dopazo, H., Huerta, J., Minguez, P., Montaner, D. & Dopazo, J. (2007). From genes to functional classes in the study of biological systems. BMC Bioinformatics 8: 114
  • Al-Shahrour, F., Minguez, P., Tárraga, J., Montaner, D., Alloza, E., Vaquerizas, J.M., Conde, L., Blaschke, C., Vera, J. & Dopazo, J. (2006). BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Research, 34 (Web Server issue): W472-W476
  • Al-Shahrour, F., Minguez, P., Vaquerizas, J.M., Conde, L. & Dopazo, J. (2005). BABELOMICS: a suite of web-tools for functional annotation and analysis of group of genes in high-throughput experiments. Nucleic Acids Research, 33 (Web Server issue): W460-W464
  • Al-Shahrour, F., Díaz-Uriarte, R. & Dopazo, J. (2005). Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information. Bioinformatics 21: 2988-2993

Network Enrichment

The Network Enrichment tool introduces protein-protein interaction data into the functional profiling of genomic data. Network Enrichment performs two different and complementary types of analysis on the submitted list of proteins/genes:

  • Evaluates the role of the list within the interactome. Network Enrichment identifies hubs in the list of proteins/genes (nodes) and evaluates the global connection degree, centrality and clustering by comparing the distributions of these parameters for the nodes in the list against their distributions over the complete interactome.

  • Evaluates the list’s cooperative behaviour as a functional module. Network Enrichment calculates the MCN, the minimum network that connects the proteins/genes in the list, either with or without an external node (a non-listed protein) to connect nodes in the list. The topology of this network is evaluated by comparing the distributions of node parameters in this MCN against those of a set of random MCNs in the same size range. This approach is similar to other tools for functional enrichment analysis, such as GO Enrichment, with the difference that there are no pre-annotated functional modules to evaluate; instead, Network Enrichment has to build the module itself, namely the MCN.
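
Both analyses can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the interactome is a plain dictionary mapping each node to its set of neighbours, "external node" is read as at most one non-listed protein bridging two listed ones, and the names `build_graph`, `clustering`, `minimal_connected_network` and `empirical_pvalue` are hypothetical. The tool's own algorithm may differ.

```python
import random
from itertools import combinations


def build_graph(edges):
    """Undirected interactome as {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clustering(graph, node):
    """Fraction of pairs of neighbours of `node` that interact with each other."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def minimal_connected_network(graph, genes, allow_external=True):
    """Edges joining listed genes directly or, if allowed, through one non-listed node."""
    genes = set(genes) & set(graph)
    edges = {frozenset((u, v)) for u in genes for v in graph[u] if v in genes}
    if allow_external:
        for u, v in combinations(sorted(genes), 2):
            for ext in (graph[u] & graph[v]) - genes:  # shared non-listed neighbours
                edges.add(frozenset((u, ext)))
                edges.add(frozenset((v, ext)))
    return edges


def empirical_pvalue(graph, genes, statistic, n_random=1000, seed=0):
    """Share of random lists of the same size whose statistic reaches the observed one."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = statistic(graph, genes)
    hits = sum(statistic(graph, rng.sample(nodes, len(genes))) >= observed
               for _ in range(n_random))
    return (hits + 1) / (n_random + 1)


# Toy interactome with made-up protein names
toy = build_graph([("A", "B"), ("A", "X"), ("B", "X"), ("C", "X"),
                   ("C", "D"), ("D", "E"), ("X", "Y")])
gene_list = ["A", "B", "C"]
print("degrees:", {g: len(toy[g]) for g in gene_list})
print("clustering:", {g: clustering(toy, g) for g in gene_list})
mcn_size = lambda graph, genes: len(minimal_connected_network(graph, genes))
print("MCN edges:", mcn_size(toy, gene_list),
      "empirical p:", empirical_pvalue(toy, gene_list, mcn_size))
```

The same `empirical_pvalue` helper can be reused with any node-level summary (for instance the mean degree of the list) to compare the list against random lists drawn from the interactome.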

Gene Set Network Enrichment

You have obtained a ranked list of proteins or genes from some particular experiment (e.g. the result of a differential expression analysis or a GWAS, or genes with an interesting impact on a phenotype such as cell cycle). Gene Set Network Enrichment uses protein-protein interaction data to look for significant subnetworks within this ranked list, in order to find out the possible role of its members as a protein complex, a signalling pathway, etc.
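
The page does not describe the search procedure, so the following is only one plausible way to picture it, under loudly stated assumptions: the top-k genes of the ranked list are taken for several values of k, the size of the largest connected component they form in the interactome is measured, and that size is compared with random gene sets of the same size. The function names, the choice of statistic and the toy network are all hypothetical.

```python
import random


def largest_component(graph, genes):
    """Size of the largest connected component of the subnetwork induced by `genes`."""
    genes = set(genes) & set(graph)
    seen, best = set(), 0
    for start in genes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first traversal restricted to listed genes
            node = stack.pop()
            size += 1
            for nbr in graph[node]:
                if nbr in genes and nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


def scan_ranked_list(graph, ranked, sizes, n_random=500, seed=0):
    """For each cut-off k, return (k, observed component size, empirical p-value)."""
    rng = random.Random(seed)
    universe = sorted(graph)
    results = []
    for k in sizes:
        observed = largest_component(graph, ranked[:k])
        hits = sum(largest_component(graph, rng.sample(universe, k)) >= observed
                   for _ in range(n_random))
        results.append((k, observed, (hits + 1) / (n_random + 1)))
    return results


# Toy interactome {node: set of neighbours} with made-up names
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
       "e": {"f"}, "f": {"e"}, "g": {"h"}, "h": {"g"}, "i": {"j"}, "j": {"i"}}
ranked_genes = ["a", "b", "c", "d", "e", "g", "i", "f", "h", "j"]
for k, size, p in scan_ranked_list(toy, ranked_genes, sizes=[2, 4, 6]):
    print(k, size, round(p, 4))
```

Cut-offs with a small empirical p-value would point to a subnetwork that is more connected than expected by chance; since several cut-offs are tested, those p-values would need a multiple testing correction.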

The program also offers the option of defining seed genes. These are genes known to be associated with the phenotype of interest, so you can analyse the network relationship between your ranked list and them.

The result is presented in a network viewer box. Several layouts are available, each implementing a different algorithm for arranging the network components. Different backgrounds are also available, including different cell views over which the network can be represented.
