Merge pull request #5 from aaronkw/usersnap-181

Add description for TERM_Q_VALUE
aaronkw · Feb 7, 2020 · d50879f · d50879f
2 parents 9813242 + ada0ff5
commit d50879f
Showing 1 changed file with 12 additions and 4 deletions.
diff --git a/docs/modules.rst b/docs/modules.rst
@@ -6,12 +6,20 @@ HumanBase applies community detection to find cohesive gene clusters from a prov
 
 Method
 ------
-The approach\ :sup:`1` is based on shared k-nearest-neighbors (SKNN) and the Louvain community-finding algorithm to cluster the user-selected tissue network into distinct modules of tightly connected genes. The SKNN-based strategy has the advantages of alleviating the effect of high-degree genes and accentuating local network structure by connecting genes that are likely to be functionally clustered together. First, we create a subset of the (user-selected) network containing only the (user-provided) genes and all the edges between them. Given the resulting graph G with V nodes (user-provided genes) and E edges, with each edge between genes i and j associated with a weight p\ :sub:`ij`, this technique proceeds as follows:
+The approach\ :sup:`1` is based on shared k-nearest-neighbors (SKNN) and the Louvain community-finding algorithm to cluster the user-selected tissue network into distinct modules of tightly connected genes. The SKNN-based strategy has the advantages of alleviating the effect of high-degree genes and accentuating local network structure by connecting genes that are likely to be functionally clustered together. 
+
+This technique proceeds as follows:
+  (i) First, we create a subset of the (user-selected) network containing only the (user-provided) genes and all the edges between them. Given the resulting graph G with V nodes (user-provided genes) and E edges, with each edge between genes i and j associated with a weight p\ :sub:`ij`, 
+  (ii) Calculate a new weight for the edge between each pair of nodes i and j that is equal to the number of k nearest neighbors (based on the original weights p\ :sub:`ij`) shared by i and j;
+  (iii) Choose the top 5% of the edges based on the new edge weights, and apply a graph clustering algorithm.
 
-  (i) calculate a new weight for the edge between each pair of nodes i and j that is equal to the number of k nearest neighbors (based on the original weights p\ :sub:`ij`) shared by i and j;
-  (ii) choose the top 5% of the edges based on the new edge weights, and apply a graph clustering algorithm.
+This approach has two key desirable characteristics: 
+  (i) Choosing the highest k values instead of all edges deemphasizes high-degree 'hub' nodes and brings equal attention to highly specific edges between low-degree nodes; 
+  (ii) Emphasizing local network-structure by connecting nodes that share a number of local neighbors automatically links genes that are highly likely to be part of the same cluster. 
+
+We use a dynamic :code:`k = min(50, 0.2 * |V|)` to obtain the shared-nearest-neighbor tissue-specific network and apply the Louvain algorithm to cluster this network into distinct modules. To stabilize clustering across different runs of the Louvain algorithm, we run the algorithm 100 times and calculate cluster comembership scores for each pair of genes that was equal to the fraction of times (out of 100) the pair was assigned to the same cluster. Genes are assigned to clusters where their comembership score ≥ 0.9.
 
-This approach has two key desirable characteristics: (i) choosing the highest k values instead of all edges deemphasizes high-degree 'hub' nodes and brings equal attention to highly specific edges between low-degree nodes; and (ii) emphasizing local network-structure by connecting nodes that share a number of local neighbors automatically links genes that are highly likely to be part of the same cluster. We use a dynamic :code:`k = min(50, 0.2 * |V|)` to obtain the shared-nearest-neighbor tissue-specific network and apply the Louvain algorithm to cluster this network into distinct modules. To stabilize clustering across different runs of the Louvain algorithm, we run the algorithm 100 times and calculate cluster comembership scores for each pair of genes that was equal to the fraction of times (out of 100) the pair was assigned to the same cluster. Genes are assigned to clusters where their comembership score ≥ 0.9.
+Resulting modules are then tested for functional enrichment using genes annotated to Gene Ontology biological process terms. Representative processes and pathways enriched within each cluster are presented alongside of the cluster with their resulting Q value. The Q value of each term associated to the modules is calculated using one-sided Fisher's exact tests and Benjamini–Hochberg corrections to correct for multiple tests.
 
 
 1. Krishnan A*, Zhang R*, Yao V, Theesfeld CL, Wong AK, Tadych A, Volfovsky N, Packer A, Lash A, Troyanskaya OG.(2016) Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nature Neuroscience.