Gene Set Network Enrichment (Network Miner)

luzgaral edited this page Apr 7, 2015 · 8 revisions

NetworkMiner. Like SNOW, this tool introduces protein-protein interaction data into the functional profiling of high-throughput experiment results. The method detects gene sets (forming a protein-protein interaction subnetwork) that are consistently associated with high or low values in a ranked list of genes.


Input data

The input for the Gene Set Network Enrichment Analysis (also known as NetworkMiner) is a list of genes, transcripts or proteins ordered according to a phenotypic parameter (as for Gene Set Enrichment Analysis or FatiScan). The file must contain one column with the identifiers of your genes and, optionally, a second column with a numeric value that the tool will use to order the genes.

Example:

Gene1    0.01
Gene2    0.04
Gene3    0.09
Gene4    0.2
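As an illustration of the input format, the following minimal sketch parses a one- or two-column ranked list and orders it by the second column. The function name `read_ranked_list` and the whitespace-separated parsing are assumptions made for this example; they are not part of Network Miner itself.

```python
from typing import List, Optional, Tuple


def read_ranked_list(text: str, descending: bool = False) -> List[str]:
    """Parse a one- or two-column ranked list and return the ids in rank order."""
    rows: List[Tuple[str, Optional[float]]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        value = float(fields[1]) if len(fields) > 1 else None
        rows.append((fields[0], value))
    # Sort only when every row carries a numeric value; otherwise keep file order.
    if rows and all(value is not None for _, value in rows):
        rows.sort(key=lambda row: row[1], reverse=descending)
    return [gene for gene, _ in rows]


example = "Gene1    0.01\nGene2    0.04\nGene3    0.09\nGene4    0.2\n"
ranked = read_ranked_list(example, descending=True)
print(ranked)
```

With `descending=True` the gene with the highest value comes first; a file with a single column is returned in its original order.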

Optionally, a list of seed molecules may be included. The seed genes may be of interest because they have already been associated with the disease you are studying. In that case, the analysis shows how well the genes at the top of your ranked list can be connected to the seed genes that are already implicated in the condition under study.

If you have a non-ranked list of genes or proteins, because you have already preselected those that are of interest (for example, because they are differentially expressed in case/control studies, mutated in a disease, etc.), we recommend using Network Enrichment Analysis (SNOW) instead.

Key network concepts

  • Network: collection of protein-protein interactions modelled as an undirected graph where the nodes represent the proteins and the edges the interactions.
  • Minimal Connected Network (MCN): Given a list of proteins or genes, the MCN is the smallest network that connects the elements in the list.
  • Component: Group of nodes that are connected to each other but separated from the rest of the nodes.
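The concepts above can be sketched on a toy network. The example below is only an illustration: the node names, the edges and the helper names `build_adjacency` and `connected_components` are hypothetical, and isolated nodes are assumed to count as components of size one.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Store an undirected graph as a node -> set-of-neighbours mapping."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def connected_components(nodes: Iterable[str],
                         adjacency: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return the components of the subgraph induced by `nodes`."""
    node_set = set(nodes)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(node_set):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Follow only edges that stay inside the node list.
            stack.extend((adjacency.get(node, set()) & node_set) - component)
        seen |= component
        components.append(component)
    return components


toy_edges = [("A", "B"), ("B", "C"), ("D", "E")]
adjacency = build_adjacency(toy_edges)
components = connected_components(["A", "B", "C", "D", "E", "F"], adjacency)
average_size = sum(len(c) for c in components) / len(components)
print(components, average_size)
```

Here the six nodes fall into three components (A-B-C, D-E and the isolated node F), so the average number of nodes per component is 2.0.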

Steps

  1. Select your input list. Browse for your own ranked list. The ranked list must be a plain text file with one or two columns. The first column must contain your genes, transcripts or proteins. The second column, which is optional, can contain a numeric value. If a second column is provided, you can tell Network Miner how to rank your list; if not, the default order is used. Optional: if you have decided to include a seed list of genes in your analysis, browse for it as well. The seed list must be a plain text file with a single column containing your genes, transcripts or proteins.

  2. Select your method parameters

    • Nature of your list. Tell the program whether you are submitting a list of proteins, transcripts or genes. This is important because a gene can code for more than one protein, so the topology of the interactome can be slightly different.
    • Select your species. Choose the organism that you are studying.
    • Select interactome. Choose the interactome you want to use. There are two possibilities: a non-curated interactome (all PPIs) or a curated one (only PPIs detected by at least two methods). See the SNOW paper for more information about how they were generated.
    • Sort ranked list (ascending or descending). If you have provided a ranked list with a second (numerical) column, tell the tool how you want to order your ids according to the values in that column.
    • Allow one external intermediate in the subnetwork (yes or no). This option indicates how the Minimal Connected Network (MCN) should be generated in test 2. The MCN is generated by calculating the shortest paths between all pairs of proteins/genes in your list. Only some of the shortest paths are added to the MCN: those that join two elements of the list directly (select 'no') or those that join two elements through one external node (select 'yes'). External nodes are genes that are not in your list but are direct intermediates between genes that are.
  3. Job information. Give the job a name and choose the folder where the result files should be saved.

  4. Launch job. Press the launch button and wait until the analysis is finished. You can check the state of your job by clicking the jobs button at the top right of the panel menu. A box will appear on the right side listing all your jobs. When the analysis is finished, the job is labelled "Ready". Click on it to be redirected to the results page. A typical job takes less than two minutes, but the time may vary depending on the size of the list.
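The effect of the "Allow one external intermediate" option in step 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's implementation: the function name `minimal_connected_network`, the toy interactome and the rule "use a shared external neighbour only when two list members are not directly connected" are choices made for this example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def minimal_connected_network(genes: Iterable[str],
                              adjacency: Dict[str, Set[str]],
                              allow_external: bool = False) -> Set[FrozenSet[str]]:
    """Collect the edges joining list members directly or via one external node."""
    members = set(genes)
    edges: Set[FrozenSet[str]] = set()
    for u, v in combinations(sorted(members), 2):
        neighbours_u = adjacency.get(u, set())
        neighbours_v = adjacency.get(v, set())
        if v in neighbours_u:
            # Shortest path of length 1: a direct interaction.
            edges.add(frozenset((u, v)))
        elif allow_external:
            # Shortest path of length 2 through a node outside the list.
            for w in (neighbours_u & neighbours_v) - members:
                edges.add(frozenset((u, w)))
                edges.add(frozenset((w, v)))
    return edges


# Hypothetical interactome: P1-P2, P2-X, X-P3, P1-Y (symmetric adjacency).
toy_interactome = {
    "P1": {"P2", "Y"},
    "P2": {"P1", "X"},
    "P3": {"X"},
    "X": {"P2", "P3"},
    "Y": {"P1"},
}
my_list = ["P1", "P2", "P3"]
without_external = minimal_connected_network(my_list, toy_interactome)
with_external = minimal_connected_network(my_list, toy_interactome, allow_external=True)
print(sorted(sorted(e) for e in without_external))
print(sorted(sorted(e) for e in with_external))
```

With 'no', only the direct interaction P1-P2 is kept and P3 stays isolated. With 'yes', the external node X links P2 and P3, while Y is ignored because it touches only one list member.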

Interpreting the output results

Gene Set Network Analysis looks for significant subnetworks of protein-protein interactions within a ranked list of genes/proteins, to find out their possible role as a protein complex, as a signalling pathway, etc. Specifically, the tool subdivides the ranked list into a sequence of additive partitions (sublists of increasing size). For each partition, Network Miner maps the proteins onto the interactome scaffold and finds the Minimal Connected Network (MCN). The partitions of interest are the sublists in which a newly added protein improves the connectivity of the subnetwork formed by the preceding proteins. They are identified as the relative maxima of a parameter that accounts for the connectivity of the proteins in the MCN (average nodes per component). These relative maxima are used to select the MCNs of interest and, for each of them, a score is calculated as a balance between the connectivity of the network and the position in the ranked list. The score reflects how well the subnetwork represents the top-ranked proteins in your list and how interconnected they are. Based on the score, one MCN is selected and a p-value is obtained that estimates how likely it is to find such an MCN by chance. The p-value is obtained by comparing the selected MCN against 1000 random MCNs with the same number of proteins/genes, which corrects for the effect of size.
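The scan over additive partitions can be sketched as follows. This is a minimal illustration, not the tool's code: it uses only direct interactions between list members, counts isolated nodes as components of size one, and does not reproduce the score, whose formula is not given here. The function names and the toy interactome are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def average_nodes_per_component(nodes: Sequence[str],
                                adjacency: Dict[str, Set[str]]) -> float:
    """Average component size of the subgraph induced by `nodes`."""
    node_set = set(nodes)
    unvisited = set(node_set)
    n_components = 0
    while unvisited:
        stack = [unvisited.pop()]
        n_components += 1
        while stack:
            node = stack.pop()
            for neighbour in adjacency.get(node, set()) & unvisited:
                unvisited.discard(neighbour)
                stack.append(neighbour)
    return len(node_set) / n_components if node_set else 0.0


def scan_partitions(ranked: Sequence[str],
                    adjacency: Dict[str, Set[str]]) -> List[float]:
    """Connectivity parameter for the sublists of size 1, 2, ..., n."""
    return [average_nodes_per_component(ranked[:size], adjacency)
            for size in range(1, len(ranked) + 1)]


def relative_maxima(values: Sequence[float]) -> List[int]:
    """Sublist sizes (1-based) where the parameter peaks."""
    peaks = []
    for i, value in enumerate(values):
        higher_than_previous = i > 0 and value > values[i - 1]
        not_lower_than_next = i == len(values) - 1 or value >= values[i + 1]
        if higher_than_previous and not_lower_than_next:
            peaks.append(i + 1)
    return peaks


# Hypothetical interactome: A-B, B-C and D-E.
toy_interactome = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
ranked_genes = ["A", "B", "C", "D", "E"]
profile = scan_partitions(ranked_genes, toy_interactome)
candidate_sizes = relative_maxima(profile)
print(profile, candidate_sizes)
```

In this toy case the parameter rises while A, B and C join one component, drops when the unconnected D is added, and rises again when E connects to D, so the candidate sublist sizes are 3 and 5.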

The program also offers the option of defining seed genes in addition to the ranked list. Seed genes are genes known to be associated with the phenotype of interest, and the tool analyses the network relationship between your ranked list and them. In this second type of analysis, the selection procedure is the same as described above, but the seed molecules are always kept within the list, i.e. the ranked list of n molecules is subdivided into a sequence of additive partitions that always contain the seed list.
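The seeded partitions can be pictured with a short sketch. The generator name `seeded_partitions` is hypothetical, and the handling of seed genes that also appear in the ranked list (they are not added twice) is an assumption of this example rather than documented behaviour.

```python
from typing import Iterator, List, Sequence


def seeded_partitions(ranked: Sequence[str],
                      seeds: Sequence[str]) -> Iterator[List[str]]:
    """Yield sublists of increasing size that always contain the seed list."""
    seed_list = list(dict.fromkeys(seeds))  # drop duplicates, keep order
    seed_set = set(seed_list)
    # Assumption: seeds already present in the ranked list are not repeated.
    non_seed = [gene for gene in ranked if gene not in seed_set]
    for size in range(1, len(non_seed) + 1):
        yield seed_list + non_seed[:size]


partitions = list(seeded_partitions(["G1", "G2", "S1", "G3"], ["S1", "S2"]))
for partition in partitions:
    print(partition)
```

Each partition starts with the seeds S1 and S2 and then adds one more ranked gene at a time.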

The results page contains the following sections:

Job information and input parameters

A short description of your analysis.

Results

MCN selected size tells you the position of the last protein from your ranked list that has been added to the MCN. MCN selected p-value indicates the probability of finding, by chance, an MCN with equal or higher interconnectivity (average number of nodes per component) than yours. The figures on the results page show the parameters used by the tool, average nodes per component and the score, as a function of the number of proteins incorporated in the analysis (sublist size). The selected cut-off is shown as a red point.
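The idea behind the p-value can be sketched as an empirical comparison against random gene sets of the same size. The sketch below is an assumption-laden illustration: it draws random sets uniformly from a hypothetical list of interactome nodes (`universe`), uses direct interactions only, and reports the plain fraction of random sets that reach the observed value. The exact randomisation used by the tool may differ.

```python
import random
from typing import Dict, Sequence, Set


def average_nodes_per_component(nodes: Sequence[str],
                                adjacency: Dict[str, Set[str]]) -> float:
    """Average component size of the subgraph induced by `nodes`."""
    node_set = set(nodes)
    unvisited = set(node_set)
    n_components = 0
    while unvisited:
        stack = [unvisited.pop()]
        n_components += 1
        while stack:
            node = stack.pop()
            for neighbour in adjacency.get(node, set()) & unvisited:
                unvisited.discard(neighbour)
                stack.append(neighbour)
    return len(node_set) / n_components if node_set else 0.0


def empirical_p_value(observed_genes: Sequence[str],
                      adjacency: Dict[str, Set[str]],
                      universe: Sequence[str],
                      n_random: int = 1000,
                      seed: int = 0) -> float:
    """Fraction of random same-size sets at least as connected as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = average_nodes_per_component(observed_genes, adjacency)
    size = len(set(observed_genes))
    hits = 0
    for _ in range(n_random):
        sample = rng.sample(list(universe), size)
        if average_nodes_per_component(sample, adjacency) >= observed:
            hits += 1
    return hits / n_random


# Hypothetical interactome: a triangle A-B-C plus three isolated nodes.
toy_interactome = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
universe = ["A", "B", "C", "D", "E", "F"]
p_connected = empirical_p_value(["A", "B", "C"], toy_interactome, universe)
p_isolated = empirical_p_value(["D", "E", "F"], toy_interactome, universe)
print(p_connected, p_isolated)
```

The fully connected set obtains a small p-value, because few random triplets are as connected, whereas the three isolated nodes obtain 1.0, because every random set is at least as connected as they are.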

  • Network viewer: The selected MCN is presented in a Network Viewer box. Using the network toolbar, you can select nodes or edges and change their visual properties, such as colour, size and shape. For detailed information on how to use the network viewer, see the network viewer documentation. Note that the nodes from your list are round and coloured in grey, whereas the external nodes added as intermediates are white squares.

The network viewer is a module of CellMaps, a web platform for visualising biological networks developed in our lab. If you want to integrate the subnetwork obtained in this analysis with additional data from a more complex study, you can download the subnetwork (result_mcn.sif file) and the protein attributes (result_mcn_interactors.txt) and move to CellMaps to continue your analysis.

  • Interactors information: Finally, you will see a table with information about the genes/transcripts/proteins in the MCN. The topological values refer to the subnetwork shown in the network viewer (that is, the MCN). The 'list' column tells you whether each protein comes from your list or is an external intermediate.

Continue processing

Once you have identified your MCN within your ranked list, the next step is to describe it in terms of function and structure. By sending your MCN to Gene Enrichment Analysis (FatiGO), you can identify biological functions over-represented in your MCN. By sending your MCN to Network Enrichment Analysis (SNOW), you can identify patterns in the topology of your MCN that can give you an idea of the type of module you have identified (protein complex-like or signalling pathway-like).

  • Gene Enrichment Analysis (FatiGO): The default FatiGO analysis tests for over-representation of functional terms in the subnetwork compared against the same interactome selected in the Network Miner analysis. The input list for FatiGO contains every subnetwork node, excluding isolated interactors (interactors without any connection in the subnetwork). If external nodes have been incorporated, the FatiGO input list also includes them. The functional terms available for the enrichment search are GO, KEGG, Reactome, BioCarta, InterPro, miRNA targets and JASPAR TFBS databases.

  • Network Enrichment Analysis (SNOW): The default SNOW analysis tests for over- or under-representation of topological parameters in the subnetwork compared against the same interactome, allowing the same number of external nodes selected in the Network Miner analysis. The input list for SNOW is the sublist selected in Network Miner, including isolated nodes. If external nodes have been incorporated in Network Miner, the SNOW input list does not include them, in order to preserve the same characteristics as the Network Miner analysis.
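The difference between the two input lists can be sketched as follows. The function name `downstream_inputs` and the toy MCN are hypothetical; the sketch only restates the two rules above and is not how the tool builds the lists internally.

```python
from typing import Iterable, List, Sequence, Tuple


def downstream_inputs(sublist: Sequence[str],
                      mcn_edges: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (FatiGO input, SNOW input) for a selected sublist and its MCN edges."""
    # FatiGO: every node with at least one connection, external nodes included.
    connected = {node for edge in mcn_edges for node in edge}
    fatigo_input = sorted(connected)
    # SNOW: the selected sublist as it is, isolated nodes included, externals excluded.
    snow_input = list(sublist)
    return fatigo_input, snow_input


# Hypothetical MCN: P4 is isolated and X is an external intermediate.
selected_sublist = ["P1", "P2", "P3", "P4"]
toy_mcn_edges = [("P1", "P2"), ("P2", "X"), ("X", "P3")]
fatigo_input, snow_input = downstream_inputs(selected_sublist, toy_mcn_edges)
print(fatigo_input)
print(snow_input)
```

The FatiGO list drops the isolated P4 but keeps the external node X, whereas the SNOW list keeps P4 and leaves X out.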

Worked examples and exercises

We downloaded a microarray experiment from GEO (GDS715). It describes a set of Acute Myeloid Leukemia (AML) samples treated with a panel of compounds that induce, with varying success, their differentiation into mature cells. The gene expression data of each AML sample treated with a compound were compared with the expression data of the negative controls (AML cells and AML cells treated with compounds that do not alter gene expression). For the comparison between both conditions, we applied a Student t-test to every pair of classes: AML+compound and control.

The output of the differential expression analysis is a set of gene lists, each sorted by the t statistic or, in other words, by the importance of each gene in the difference between the compound action and the AML status. We then wanted to give a functional annotation to these lists using Gene Set Network Enrichment.

Examples of Gene Set Network Enrichment input list files:

  1. AML + sulmazole
  2. AML + fluorouridine
  3. AML + phenanthroline

For each input list file, create a new project (e.g. AML_sulmazole) and start a new Gene Set Network Enrichment Analysis job in the Functional analysis >> Gene Set Network Enrichment section of the tools. These are the steps:

  • Define your input file
  • Nature of the lists: Genes
  • Choose the organism: Homo sapiens
  • Interactome confidence: Curated
  • Sort ranked list: Descending (check the order of your ranking)
  • Allow one external intermediate in the subnetwork: Yes
  • Submit the job: press the launch button


How to cite

  • García-Alonso L, Alonso R, Vidal E, Amadoz A, de María A, Minguez P, Medina I, Dopazo J. Discovering the hidden sub-network component in a ranked list of genes or proteins derived from genomic experiments. Nucleic Acids Res. 2012 Nov 1;40(20):e158.
