SigMod2: identify disease-associated gene modules that closely interact with known disease genes
Genome-wide association studies (GWASs) have achieved considerable success in the genetic analysis of complex traits. However, GWASs have an innate limitation: they are underpowered to detect genetic variants with small marginal effects. Network-based analyses, which integrate GWAS results with a gene connectome (including protein-protein interactions, gene co-expression, gene regulation, etc.) to study the joint effect of a set of functionally related genes, have the power to identify more of the genetic factors underlying a trait. The general pipeline of a network-based analysis can be summarized as follows:
- 1. Annotate SNPs to genes
- 2. Compute gene-level P-values from SNP-level P-values
- 3. Transform gene-level P-values into scores
- 4. Overlay gene scores onto the gene connectome to build a scored gene connectome
- 5. Search for gene modules enriched in high-scoring genes
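Steps 3 and 4 of the pipeline can be sketched as follows. This is only an illustration, not the SigMod2 implementation: the inverse-normal transform is one common way to turn P-values into scores and is assumed here, and the function names `p_to_score` and `build_scored_network` are hypothetical.

```python
from statistics import NormalDist


def p_to_score(p, eps=1e-12):
    """Map a gene-level P-value to a score (assumed transform: inverse normal CDF).

    Small P-values give large positive scores; P = 0.5 gives a score of 0.
    """
    # Clamp to the open interval (0, 1) so the inverse CDF is defined.
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(1 - p)


def build_scored_network(gene_pvalues, edges):
    """Overlay gene scores onto a connectome given as a list of gene pairs.

    Returns (scores, network), where network maps each scored gene to the
    set of its scored neighbours. Edges touching unscored genes and
    self-loops are dropped.
    """
    scores = {gene: p_to_score(p) for gene, p in gene_pvalues.items()}
    network = {gene: set() for gene in scores}
    for a, b in edges:
        if a in scores and b in scores and a != b:
            network[a].add(b)
            network[b].add(a)
    return scores, network


if __name__ == "__main__":
    # Toy input: gene names and P-values are made up for illustration.
    pvalues = {"GENE_A": 0.001, "GENE_B": 0.04, "GENE_C": 0.6}
    connectome = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
    scores, network = build_scored_network(pvalues, connectome)
    for gene in sorted(scores):
        print(gene, round(scores[gene], 2), sorted(network[gene]))
```

The resulting pair of dictionaries is one simple representation of a scored gene connectome on which a module search (step 5) can operate.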
There are many algorithms that aim to identify high-scoring gene modules from a scored gene connectome. Most of them, for example jActiveModule and dmGWAS, use greedy and heuristic search strategies. However, none of them considers the robustness of the outcome, even though noise can exist in both the GWAS results and the gene connectome and makes the results less reliable. To overcome these limitations, we first proposed SigMod, which identifies a strongly interconnected gene module enriched with high association signals and is exact, efficient, and robust (https://github.com/YuanlongLiu/sigMod). We compared our method with state-of-the-art methods on both simulated and real data, and showed that it has increased power and fewer false positives [paper coming soon].
We also proposed SigMod2, which allows additional information, i.e., genes previously known or reported to be associated with the disease, to be integrated to guide module selection. The selected module has the following properties: (1) it is strongly interconnected; (2) it is enriched with high association signals; (3) it is closely connected to the previously reported genes. The "reported genes" can come from any type of study, not only GWAS. Such information can be retrieved from the literature in PubMed or from curated databases, such as the MalaCards database (http://www.malacards.org/) and the DisGeNET database (http://www.disgenet.org/).
We implemented three algorithms for searching modules. The first is based on max-flow min-cut, as implemented in SigMod; it is exact and efficient. The second is a step-forward algorithm, which adds one gene to the selection pool at each step: the gene that most increases the value of the objective function (please refer to https://github.com/YuanlongLiu/sigMod) is chosen. The third is a step-backward algorithm. It works in the reverse direction of the step-forward algorithm, removing at each step the gene whose removal leads to the smallest loss in the objective function. The advantages of the stepwise algorithms over the max-flow min-cut algorithm are that (1) they are computationally more efficient and (2) they allow a given number of genes to be selected (for example, 100 genes).
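The two stepwise searches can be sketched as below. This is a minimal illustration under stated assumptions, not the SigMod2 code: the `objective` function (sum of gene scores, plus a weight `lam` per within-module edge, minus a penalty `eta` per gene) is a stand-in chosen for the example, and the real objective is the one defined in the SigMod repository. The names `objective`, `step_forward`, `step_backward`, `lam`, and `eta` are all hypothetical.

```python
def objective(module, scores, network, lam=1.0, eta=1.0):
    """Illustrative objective (an assumption, not necessarily SigMod's):
    total score + lam * (edges inside the module) - eta * (module size)."""
    node_term = sum(scores[g] for g in module)
    # Each within-module edge is seen from both endpoints, hence the / 2.
    edge_term = sum(1 for g in module for h in network[g] if h in module) / 2
    return node_term + lam * edge_term - eta * len(module)


def step_forward(scores, network, n_genes, lam=1.0, eta=1.0):
    """Start from an empty module; at each step add the gene that
    increases the objective the most, until n_genes are selected."""
    module = set()
    remaining = set(scores)
    while remaining and len(module) < n_genes:
        # sorted() makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda g: objective(module | {g}, scores, network, lam, eta),
        )
        module.add(best)
        remaining.discard(best)
    return module


def step_backward(scores, network, n_genes, lam=1.0, eta=1.0):
    """Start from all genes; at each step remove the gene whose removal
    costs the least objective value, until n_genes remain."""
    module = set(scores)
    while len(module) > n_genes:
        cheapest = max(
            sorted(module),
            key=lambda g: objective(module - {g}, scores, network, lam, eta),
        )
        module.remove(cheapest)
    return module


if __name__ == "__main__":
    # Toy scored connectome: A, B, C form a triangle; D is isolated.
    scores = {"A": 3.0, "B": 2.0, "C": 0.5, "D": 2.2}
    network = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": set()}
    print(sorted(step_forward(scores, network, n_genes=2)))
    print(sorted(step_backward(scores, network, n_genes=2)))
```

In this toy input, gene D has a higher score than gene B but no connections, so under the assumed objective the connectivity term favours B once A is in the module. Both searches stop at the requested module size, which is the property that lets a fixed number of genes be selected.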
Usage and examples will be added soon
- Author: Yuanlong LIU
- Affiliation: French National Institute of Health and Medical Research, Unit 946, Paris, France
- Mail: firstname.lastname@example.org