# Mathematical Principles of WGCNA (Weighted Correlation Network Analysis)

## Objective

Starting from a gene expression matrix, identify groups of genes (modules) with similar expression patterns and relate them to external sample traits (e.g., disease status, treatment, or time).

## 1. Input Gene Expression Matrix

Let the gene expression matrix be:

$$
X \in \mathbb{R}^{n \times p}
$$

- $n$: number of samples
- $p$: number of genes
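- The $i$-th row $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ is the expression vector of sample $i$
- The $j$-th column $x_{\cdot j} = (x_{1j}, x_{2j}, \ldots, x_{nj})^\top$ is the expression profile of gene $j$ across all samples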

---

## 2. Similarity Matrix

Define the Pearson correlation coefficient between any two genes $i$ and $j$:

$$
s_{ij} = \text{cor}(x_{\cdot i}, x_{\cdot j}) = \frac{\sum_{k=1}^n (x_{ki} - \bar{x}_{\cdot i})(x_{kj} - \bar{x}_{\cdot j})}{\sqrt{\sum_{k=1}^n (x_{ki} - \bar{x}_{\cdot i})^2} \sqrt{\sum_{k=1}^n (x_{kj} - \bar{x}_{\cdot j})^2}}
$$

where $\bar{x}_{\cdot i}$ is the sample mean of gene $i$.

- The similarity matrix $S = (s_{ij})$ is symmetric, and $s_{ii} = 1$.
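As a minimal sketch of this step, the similarity matrix can be computed with NumPy. The toy matrix `X` and its dimensions are assumptions made up for illustration; the only requirement is that rows are samples and columns are genes.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed so the toy data is reproducible
n_samples, n_genes = 6, 4                    # hypothetical toy dimensions
X = rng.normal(size=(n_samples, n_genes))    # rows = samples, columns = genes

# np.corrcoef treats rows as variables by default, so rowvar=False is needed
# to correlate the columns (genes) of X.
S = np.corrcoef(X, rowvar=False)

# Cross-check one entry against the explicit formula for s_ij.
i, j = 0, 1
xi = X[:, i] - X[:, i].mean()
xj = X[:, j] - X[:, j].mean()
s_manual = (xi @ xj) / np.sqrt((xi @ xi) * (xj @ xj))

print(S.shape, np.isclose(S[i, j], s_manual))
```

The result is a $p \times p$ symmetric matrix with ones on the diagonal, matching the properties listed above.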

---

## 3. Adjacency Matrix

Define the adjacency matrix $A = (a_{ij})$:

$$
a_{ij} = |s_{ij}|^\beta
$$

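- $\beta \geq 1$ is the soft-thresholding power; because of the absolute value, this defines an unsigned network in which strong negative correlations also count as strong connections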
- Typically, $\beta$ is chosen to make the network approximate a scale-free topology
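The effect of soft-thresholding can be sketched as follows. The helper name `soft_threshold_adjacency`, the toy correlation matrix, and the choice $\beta = 6$ are illustrative assumptions, not values prescribed by the text.

```python
import numpy as np

def soft_threshold_adjacency(S, beta):
    """Unsigned adjacency a_ij = |s_ij| ** beta from a correlation matrix S."""
    return np.abs(np.asarray(S, dtype=float)) ** beta

# Hypothetical correlation matrix for three genes.
S = np.array([[ 1.0, 0.9, -0.5],
              [ 0.9, 1.0,  0.2],
              [-0.5, 0.2,  1.0]])

A = soft_threshold_adjacency(S, beta=6)
print(np.round(A, 6))
```

Raising to a power keeps strong correlations comparatively large while pushing weak ones toward zero, without the hard cutoff of a binary threshold.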

### Scale-free Topology

In an undirected graph $G = (V, E)$:

- $V$ is the set of nodes, with $|V| = N$
- $E$ is the set of edges
- Each node $v \in V$ has a degree $k_v$, representing the number of connections

Define the degree distribution $P(k)$ as:

$$
P(k) = \frac{\text{number of nodes with degree } k}{N}
$$

A network is said to have **scale-free topology** if its degree distribution satisfies the following power-law:

$$
P(k) \propto k^{-\gamma}, \quad k \geq k_{\text{min}}
$$

where:
- $\gamma > 1$ is the power-law exponent, usually in the range $(2,3)$
- $k_{\text{min}}$ is the minimum degree for which the power-law holds

In other words, there exists a constant $C > 0$ such that for all $k \geq k_{\text{min}}$:

$$
P(k) = C k^{-\gamma}
$$

#### Log-log Transformation

Taking the logarithm of both sides:

$$
\log P(k) = -\gamma \log k + \log C
$$

Thus, in log-log coordinates, $P(k)$ versus $k$ appears as a straight line with slope $-\gamma$.
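A minimal sketch of how this straight-line relationship can be checked numerically: fit a least-squares line to $(\log k, \log P(k))$ and inspect the slope and $R^2$. The function name `loglog_fit` and the synthetic, exactly power-law distribution are assumptions for illustration; real degree distributions are noisy and usually binned before fitting.

```python
import numpy as np

def loglog_fit(k, p_k):
    """Least-squares line through (log10 k, log10 P(k)).

    Returns (slope, intercept, r_squared).
    """
    x, y = np.log10(k), np.log10(p_k)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    r_squared = 1.0 - residuals.var() / y.var()
    return slope, intercept, r_squared

gamma = 2.5                          # hypothetical exponent
k = np.arange(1, 51, dtype=float)    # degrees 1..50
p_k = k ** (-gamma)
p_k /= p_k.sum()                     # normalize so that the probabilities sum to 1

slope, intercept, r_squared = loglog_fit(k, p_k)
print(slope, r_squared)
```

For an exact power law the fitted slope recovers $-\gamma$ and $R^2$ is essentially 1; an $R^2$ of this kind is a common way to judge how closely a network built with a given $\beta$ approximates scale-free topology.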

#### Properties of Power-law Distributions

Power-law distributions have the following mathematical properties:

1. **Scale-free**: No characteristic scale; the degree can vary across several orders of magnitude. For power-law distributions, the expectation and variance may diverge depending on $\gamma$:
   - When $2 < \gamma \leq 3$, the expectation is finite but variance is infinite
   - When $1 < \gamma \leq 2$, both expectation and variance diverge

2. **Heavy-tailed**: High-degree nodes (hub nodes) are rare but occur more frequently than in exponential or normal distributions
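The two divergence cases in item 1 can be sketched in the continuous approximation, treating $k$ as a continuous variable with density $C k^{-\gamma}$ on $k \geq k_{\text{min}}$. The $m$-th moment is:

$$
\mathbb{E}[k^m] = \int_{k_{\text{min}}}^{\infty} C \, k^{m - \gamma} \, dk
$$

This integral is finite if and only if $\gamma > m + 1$. Setting $m = 1$ shows that the mean is finite only for $\gamma > 2$, and setting $m = 2$ shows that the second moment (and hence the variance) is finite only for $\gamma > 3$.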

#### Normalization Constant of Degree Distribution

To make $P(k)$ a valid probability distribution, normalization is required:

$$
\sum_{k = k_{\text{min}}}^{\infty} P(k) = 1
$$

Substitute the power-law form:

$$
C \sum_{k = k_{\text{min}}}^{\infty} k^{-\gamma} = 1
$$

So the normalization constant $C$ is:

$$
C = \left( \sum_{k = k_{\text{min}}}^{\infty} k^{-\gamma} \right)^{-1}
$$

In the continuous approximation (with $k$ as a continuous variable), normalization can be approximated by integration:

$$
\int_{k_{\text{min}}}^{\infty} C k^{-\gamma} \, dk = 1
$$

Solving gives:

$$
C = (\gamma - 1) k_{\text{min}}^{\gamma - 1}
$$
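For completeness, the intermediate integration step (valid for $\gamma > 1$, so that the upper limit vanishes) can be written as:

$$
\int_{k_{\text{min}}}^{\infty} C k^{-\gamma} \, dk
= C \left[ \frac{k^{1-\gamma}}{1-\gamma} \right]_{k_{\text{min}}}^{\infty}
= \frac{C \, k_{\text{min}}^{1-\gamma}}{\gamma - 1} = 1
$$

Solving the last equality for $C$ gives the expression above.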

#### Summary Definition of a Scale-free Network

**Mathematically defined**:

A graph $G = (V, E)$ is called a **scale-free network** if:

- There exist an exponent $\gamma > 1$ and a minimum degree $k_{\text{min}}$ such that the degree distribution follows $P(k) \propto k^{-\gamma}$ for all $k \geq k_{\text{min}}$
- In empirical networks, $\gamma$ typically lies between 2 and 3

---

## 4. Topological Overlap Matrix (TOM)

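Define the shared-neighbor strength:

$$
l_{ij} = \sum_{u=1}^{p} a_{iu} a_{ju}
$$

- $l_{ij}$ is the total strength of shared neighbors between nodes $i$ and $j$; for a binary (0/1) adjacency matrix it reduces to the number of shared neighbors
- The sum is understood to exclude $u = i$ and $u = j$. Equivalently, set the diagonal of $A$ to zero ($a_{ii} = 0$) before computing TOM, which is the convention used in the rest of this section

Degree (connectivity) of node $i$:

$$
k_i = \sum_{u=1}^{p} a_{iu}
$$

- $k_i$ is the total weight of edges connecting node $i$ to the other nodes (with $a_{ii} = 0$, so the self-connection is not counted)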

### Elements of the Topological Overlap Matrix:

$$
\text{TOM}_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}
$$

- Numerator: $l_{ij} + a_{ij}$, i.e., shared neighbor strength plus direct connection
- Denominator: $\min(k_i, k_j) + 1 - a_{ij}$, a normalization factor to avoid bias from high-degree hubs

### Properties of the Topological Overlap Matrix:

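- $\text{TOM}_{ii} = 1$ by convention: each node has maximum similarity with itself
- $0 \leq \text{TOM}_{ij} \leq 1$ whenever $0 \leq a_{ij} \leq 1$: since $l_{ij} \leq \min(k_i, k_j) - a_{ij}$, the numerator is at most $\min(k_i, k_j)$, which never exceeds the denominator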

### Interpretation:

- **Higher TOM** (close to 1): nodes $i$ and $j$ share more neighbors and may be directly connected
- **Lower TOM** (close to 0): nodes $i$ and $j$ share few or no common neighbors, more distant relationship

### Concrete Example of TOM Calculation

#### Suppose we have a small adjacency matrix $A$ with 4 nodes ($p = 4$):

$$
A = \begin{pmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
\end{pmatrix}
$$

Explanation:
- Node 1 is connected to 2 and 3 (weight 1)
- Node 2 is connected to 1, 3, and 4 (weight 1)
- Node 3 is connected to 1 and 2
- Node 4 is connected to 2

#### Step 1: Compute Node Degrees $k_i$

- $k_1 = a_{12} + a_{13} = 1 + 1 = 2$
- $k_2 = a_{21} + a_{23} + a_{24} = 1 + 1 + 1 = 3$
- $k_3 = a_{31} + a_{32} = 1 + 1 = 2$
- $k_4 = a_{42} = 1$

#### Step 2: Compute Shared Neighbors $l_{ij}$

$$
l_{ij} = \sum_{u=1}^{p} a_{iu} a_{ju}
$$

- $l_{12} = (0)(1) + (1)(0) + (1)(1) + (0)(1) = 1$
- $l_{13} = (0)(1) + (1)(1) + (1)(0) + (0)(0) = 1$
- $l_{14} = (0)(0) + (1)(1) + (1)(0) + (0)(0) = 1$
- $l_{23} = (1)(1) + (0)(1) + (1)(0) + (1)(0) = 1$
- $l_{24} = (1)(0) + (0)(1) + (1)(0) + (1)(0) = 0$
- $l_{34} = (1)(0) + (1)(1) + (0)(0) + (0)(0) = 1$

#### Step 3: Compute $\text{TOM}_{ij}$

$$
\text{TOM}_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}
$$

- $\text{TOM}_{12} = \frac{1 + 1}{2 + 1 - 1} = \frac{2}{2} = 1$
- $\text{TOM}_{13} = \frac{1 + 1}{2 + 1 - 1} = \frac{2}{2} = 1$
- $\text{TOM}_{14} = \frac{1 + 0}{1 + 1 - 0} = \frac{1}{2} = 0.5$
- $\text{TOM}_{23} = \frac{1 + 1}{2 + 1 - 1} = \frac{2}{2} = 1$
- $\text{TOM}_{24} = \frac{0 + 1}{1 + 1 - 1} = \frac{1}{1} = 1$
- $\text{TOM}_{34} = \frac{1 + 0}{1 + 1 - 0} = \frac{1}{2} = 0.5$

#### Final TOM Matrix:

$$
\text{TOM} = \begin{pmatrix}
1 & 1 & 1 & 0.5 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0.5 \\
0.5 & 1 & 0.5 & 1 \\
\end{pmatrix}
$$
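The hand calculation can be reproduced with a short NumPy sketch. The function name `topological_overlap` is an assumption for illustration; it implements the formula exactly as written in this section, with the diagonal of $A$ set to zero and $\text{TOM}_{ii}$ set to 1.

```python
import numpy as np

def topological_overlap(A):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    A = np.array(A, dtype=float)        # copy, so the caller's matrix is untouched
    np.fill_diagonal(A, 0.0)            # exclude self-connections from l_ij and k_i
    L = A @ A                           # l_ij = sum_u a_iu * a_ju (A is symmetric)
    k = A.sum(axis=1)                   # connectivity k_i
    k_min = np.minimum.outer(k, k)      # min(k_i, k_j) for every pair (i, j)
    tom = (L + A) / (k_min + 1.0 - A)
    np.fill_diagonal(tom, 1.0)          # TOM_ii = 1 by convention
    return tom

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

TOM = topological_overlap(A)
print(TOM)
```

The same function applies unchanged to a weighted adjacency matrix with entries in $[0, 1]$.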

---


## 5. Gene Module Detection

Define the distance matrix:

$$
d_{ij} = 1 - \text{TOM}_{ij}
$$

- Use hierarchical clustering
- Apply Dynamic Tree Cut to automatically cut the clustering tree
- Each module $M_k$ is a set of genes

### 5.1 Define the Topological Overlap Matrix (TOM)

Given the adjacency matrix $A = (a_{ij})$, the node degree is:

$$
k_i = \sum_{u=1}^p a_{iu}
$$

The number of shared neighbors is:

$$
l_{ij} = \sum_{u=1}^p a_{iu} a_{ju}
$$

The element of the topological overlap matrix is defined as:

$$
\text{TOM}_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}
$$

Properties:
- $\text{TOM}_{ii} = 1$
- $0 \leq \text{TOM}_{ij} \leq 1$

### 5.2 Define the Distance Matrix $D$

Based on the topological overlap matrix, define the distance matrix $D = (d_{ij})$:

$$
d_{ij} = 1 - \text{TOM}_{ij}
$$

Properties:
- $d_{ii} = 0$
- $0 \leq d_{ij} \leq 1$

### 5.3 Hierarchical Clustering

Perform hierarchical clustering on the distance matrix $D$ to build a dendrogram $\mathcal{T}$:

- Initial state: each gene is an independent cluster
- Merging strategy: minimize the inter-cluster distance according to the linkage function $\mathcal{L}(C_1, C_2)$
- Typically use average linkage:

$$
\mathcal{L}(C_1, C_2) = \frac{1}{|C_1||C_2|} \sum_{i \in C_1} \sum_{j \in C_2} d_{ij}
$$
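A minimal sketch of average-linkage clustering with SciPy. The 4-gene distance matrix `D` is a hypothetical example (two tight pairs that are far from each other), chosen so the merge heights are easy to verify by hand.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical TOM-based distances d_ij = 1 - TOM_ij for four genes.
D = np.array([[0.0, 0.2, 0.8, 0.9],
              [0.2, 0.0, 0.7, 0.8],
              [0.8, 0.7, 0.0, 0.3],
              [0.9, 0.8, 0.3, 0.0]])

# linkage expects the condensed (upper-triangle) form of the distance matrix.
Z = linkage(squareform(D), method="average")

# Each row of Z records one merge: the two cluster ids, the merge height,
# and the size of the newly formed cluster.
merge_heights = Z[:, 2]
print(merge_heights)
```

Genes 0 and 1 merge at height 0.2, genes 2 and 3 at 0.3, and the two pairs merge at the average of the four cross-pair distances, $(0.8 + 0.9 + 0.7 + 0.8)/4 = 0.8$.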

### 5.4 Dynamic Tree Cut

Apply the Dynamic Tree Cut algorithm to the dendrogram $\mathcal{T}$ to identify modules:

- Define a subtree $M_k \subseteq V$ such that:
  - Internal similarity is high (TOM high, $d$ low)
  - Heterogeneity between subtrees is high (TOM low, $d$ high)
- Dynamic Tree Cut automatically determines the number and size of modules

Finally, obtain a set of modules:

$$
\{ M_1, M_2, \ldots, M_K \}
$$

where $M_k \subseteq V$, and:

$$
\bigcup_{k=1}^{K} M_k = V
$$

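(every gene is assigned to exactly one module, so the modules are pairwise disjoint: $M_k \cap M_l = \emptyset$ for $k \neq l$. In WGCNA, genes that the tree cut leaves unassigned are conventionally collected in a separate "grey" module.)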

### Mathematical Workflow Summary

1. Input: adjacency matrix $A$
2. Calculate the topological overlap matrix $\text{TOM}$
3. Calculate the distance matrix $d_{ij} = 1 - \text{TOM}_{ij}$
4. Build the hierarchical clustering tree $\mathcal{T}$ based on $D$
5. Apply dynamic tree cutting to obtain the module partition $\{ M_k \}$

### ✨ Additional Mathematical Properties

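- The dendrogram $\mathcal{T}$ induces an ultrametric. Let $h_{ij}$ denote the cophenetic distance, i.e., the height at which genes $i$ and $j$ are first merged into the same cluster. For any three nodes $i, j, k$:

  $$
  h_{ij} \leq \max(h_{ik}, h_{jk})
  $$

  The original distances $d_{ij} = 1 - \text{TOM}_{ij}$ need not satisfy this inequality; it is a property of the tree heights.

- Dynamic Tree Cut allows modules of unequal sizes, avoiding the limitation of fixed-height cuts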

### First Part: Why Does the Clustering Tree $\mathcal{T}$ Satisfy the Ultrametric Property?

Recall the **mathematical definition** of ultrametric:

For any three points $i, j, k$:

$$
d_{ij} \leq \max(d_{ik}, d_{jk})
$$

which is stricter than the standard triangle inequality ($d_{ij} \leq d_{ik} + d_{kj}$).

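**Hierarchical clustering** yields an ultrametric, but on the dendrogram (cophenetic) distances rather than on the original $d_{ij}$:
- Hierarchical clustering merges the two closest clusters step by step
- Each merge is recorded at a height equal to the linkage distance between the two clusters being merged
- The cophenetic distance $h_{ij}$ between two genes is the height at which they first fall into the same cluster

In **average linkage clustering**, for example, the merge height is the average distance between all pairs of elements of the two clusters, and merge heights never decrease as the tree is built. The original TOM-based distances $d_{ij}$ generally do not satisfy the ultrametric inequality themselves.

Thus, in a dendrogram:
- The cophenetic distance between any two nodes is the height of their lowest common ancestor (LCA)

Thus, for any three points $i, j, k$:
- $h_{ij}$ is the height of the LCA of $i$ and $j$
- $h_{ik}$ is the height of the LCA of $i$ and $k$
- $h_{jk}$ is the height of the LCA of $j$ and $k$

Two of these three LCAs always coincide, and the shared one is an ancestor of the third, so it is at least as high. If $i$ and $j$ merge first, then $h_{ij} \leq h_{ik} = h_{jk}$; if instead $i$ and $k$ (or $j$ and $k$) merge first, then $h_{ij}$ equals the larger of the other two heights. In every case:

$$
h_{ij} \leq \max(h_{ik}, h_{jk})
$$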

**Ultrametricity holds naturally.**
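The distinction between the input distances and the tree heights can be checked numerically. The sketch below uses a hypothetical 4-gene distance matrix and SciPy's `cophenet` to obtain the dendrogram (cophenetic) distances; `is_ultrametric` is an illustrative helper, not a library function.

```python
import numpy as np
from itertools import permutations
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def is_ultrametric(M, tol=1e-12):
    """True if M[i, j] <= max(M[i, k], M[j, k]) for all distinct i, j, k."""
    n = M.shape[0]
    return all(M[i, j] <= max(M[i, k], M[j, k]) + tol
               for i, j, k in permutations(range(n), 3))

# Hypothetical input distances (not ultrametric: 0.8 > max(0.2, 0.7)).
D = np.array([[0.0, 0.2, 0.8, 0.9],
              [0.2, 0.0, 0.7, 0.8],
              [0.8, 0.7, 0.0, 0.3],
              [0.9, 0.8, 0.3, 0.0]])

Z = linkage(squareform(D), method="average")
H = squareform(cophenet(Z))      # cophenetic distances as a square matrix

print(is_ultrametric(D), is_ultrametric(H))
```

Here the input matrix fails the inequality while the cophenetic matrix satisfies it, which is the sense in which the dendrogram "guarantees" ultrametricity.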

✅ **Core summary in one sentence**:

> **In a tree, the distance between two nodes is determined by the height of their lowest common ancestor (LCA), naturally satisfying the ultrametric property!**


### Second Part: Why Does Dynamic Tree Cut Allow Modules of Unequal Sizes?

**What is the traditional tree-cutting method?**

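- A fixed cutting height is set (e.g., $h = 0.25$), and every branch that hangs below that height becomes one module.
- All modules are therefore defined by the same global threshold, so the cut cannot adapt to branches of different tightness: a low height splits loose modules into fragments, while a high height merges distinct tight modules.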

⚡ **But real biological gene networks are not uniform!**
- Some modules are very tight (low internal distance) and can be cut early.
- Some modules are looser (higher internal distance) and need to be cut at a higher level.
- Different modules naturally have different densities!
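The limitation of a single global height can be illustrated with SciPy's static cut, `fcluster(..., criterion="distance")`. This sketch shows only the fixed-height baseline on a hypothetical 4-gene distance matrix; it does not implement Dynamic Tree Cut itself.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical distances: genes {0, 1} form a tight pair (0.2),
# genes {2, 3} form a looser pair (0.3).
D = np.array([[0.0, 0.2, 0.8, 0.9],
              [0.2, 0.0, 0.7, 0.8],
              [0.8, 0.7, 0.0, 0.3],
              [0.9, 0.8, 0.3, 0.0]])
Z = linkage(squareform(D), method="average")

# Static cut: one global height applied to the whole tree.
labels_low = fcluster(Z, t=0.25, criterion="distance")
labels_high = fcluster(Z, t=0.50, criterion="distance")

print(labels_low, labels_high)
```

At height 0.25 the tight pair survives but the looser pair is split into singletons; only at a greater height are both pairs recovered. In larger trees no single height may suit every branch, which motivates an adaptive cut.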

**The mathematical core of Dynamic Tree Cut:**

It determines the cutting points adaptively based on **local tree structure**, rather than using a single global height.

- Tight subtrees: if a branch is tightly connected, Dynamic Tree Cut can separate it at a lower height.
- Loose subtrees: if a branch is loose, Dynamic Tree Cut lets it keep merging up to a greater height before it is declared a module.

In other words:
- Dynamic Tree Cut allows the **cutting thresholds to vary dynamically** between modules.
- It does **not enforce uniform module sizes**.

---

## 6. Module Eigengenes (ME)

### 6.1 Basic Setup

Given a module $M_k$:

- It contains $|M_k|$ genes.
- The corresponding gene expression matrix is:

$$
X_k \in \mathbb{R}^{n \times |M_k|}
$$

where:
- $n$ is the number of samples (i.e., each row is a sample)
- $|M_k|$ is the number of genes in the module (i.e., each column is gene expression)

### 6.2 Definition of Module Eigengene

The module eigengene $\text{ME}_k \in \mathbb{R}^n$ is defined as the first principal component of $X_k$:

$$
\text{ME}_k = X_k v_1
$$

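where:

- $v_1 \in \mathbb{R}^{|M_k|}$ is the eigenvector corresponding to the largest eigenvalue of the covariance matrix of $X_k$; its sign is arbitrary, so $\text{ME}_k$ is defined only up to sign

More formally:

- Covariance matrix (assuming each column of $X_k$ has been mean-centered):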

$$
\Sigma_k = \frac{1}{n-1} X_k^\top X_k \in \mathbb{R}^{|M_k| \times |M_k|}
$$

- Solve the eigenvalue problem:

$$
\Sigma_k v_1 = \lambda_1 v_1
$$

where:
- $\lambda_1$ is the largest eigenvalue ($\lambda_1 \geq \lambda_2 \geq \dots$)
- $v_1$ is the corresponding unit eigenvector ($\|v_1\| = 1$)

Then:

$$
\text{ME}_k = X_k v_1
$$

### 6.3 Why Is It Defined This Way?

- $v_1$ gives the **main direction of variation** in gene expression within the module.
- Projecting onto $v_1$ gives $\text{ME}_k$, which is the coordinate of each sample along that direction.
- $\text{ME}_k$ represents the **dominant expression trend** of the whole module across samples.

In plain terms:

> **Too many genes in a module? No worries—compress them into a single vector that best captures their overall trend: the ME.**

### Optimization Interpretation

The module eigengene $\text{ME}_k$ can also be understood via an optimization problem:

$$
v_1 = \arg\max_{\|v\|=1} \operatorname{Var}(X_k v)
$$

i.e., find the projection direction that gives the largest variance.

### 📚 Pure Math Derivation: Module Eigengene

#### 1. Problem Setup

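Assume the module expression matrix has been column-centered (each gene's mean subtracted):

$$
X_k \in \mathbb{R}^{n \times p}
$$

where, in this derivation only, $p$ denotes the number of genes in the module ($p = |M_k|$).

Example of a raw (not yet centered) module matrix with $n = 5$ samples and $p = 3$ genes; its column means must be subtracted before the formulas below apply:

$$
X_k = \begin{pmatrix}
2 & 3 & 5 \\
4 & 6 & 8 \\
1 & 1 & 2 \\
5 & 7 & 10 \\
3 & 4 & 6 \\
\end{pmatrix} \in \mathbb{R}^{5 \times 3}
$$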

#### 2. Projection and Variance

Project onto a direction $v \in \mathbb{R}^p$, yielding:

$$
z = X_k v
$$

Because the columns of $X_k$ are centered, $z$ has mean zero, so its sample variance is:

$$
\operatorname{Var}(z) = \frac{1}{n - 1} z^\top z
$$

Substituting $z = X_k v$ gives:

$$
\operatorname{Var}(z) = \frac{1}{n - 1} v^\top X_k^\top X_k v
$$

#### 3. Covariance Matrix

Define:

$$
\Sigma_k = \frac{1}{n - 1} X_k^\top X_k
$$

Then:

$$
\operatorname{Var}(z) = v^\top \Sigma_k v
$$

#### 4. Optimization Problem

Find the direction of maximum projection variance:

$$
v_1 = \arg\max_{\|v\| = 1} v^\top \Sigma_k v
$$

#### 5. Analytical Solution

By linear algebra, the optimal solution $v_1$ is the unit eigenvector corresponding to the largest eigenvalue of $\Sigma_k$.
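The linear-algebra step can be sketched with a Lagrange multiplier for the constraint $v^\top v = 1$ (the symbol $J$ is introduced here only for this derivation):

$$
J(v, \lambda) = v^\top \Sigma_k v - \lambda \left( v^\top v - 1 \right)
$$

Setting the gradient with respect to $v$ to zero gives the eigenvalue equation:

$$
\nabla_v J = 2 \Sigma_k v - 2 \lambda v = 0 \quad \Longrightarrow \quad \Sigma_k v = \lambda v
$$

At any such stationary point the objective equals the eigenvalue itself:

$$
v^\top \Sigma_k v = \lambda \, v^\top v = \lambda
$$

so the maximum is attained at the eigenvector belonging to the largest eigenvalue $\lambda_1$.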

#### 6. Final Definition of Module Eigengene

$$
\text{ME}_k = X_k v_1
$$



### Mathematical Example of Module Eigengene (ME)

#### Assume a Small Module Expression Matrix $X_k$

Suppose module $M_k$ has 3 genes and 5 samples, with expression matrix:

$$
X_k = \begin{pmatrix}
2 & 3 & 5 \\
4 & 6 & 8 \\
1 & 1 & 2 \\
5 & 7 & 10 \\
3 & 4 & 6 \\
\end{pmatrix}
\in \mathbb{R}^{5 \times 3}
$$

- Each row represents a sample
- Each column represents a gene

#### Step 1: Mean Centering

PCA requires subtracting the mean of each column to center the data.

Calculate the mean of each column:

- Gene 1 mean:

$$
\bar{x}_1 = \frac{2 + 4 + 1 + 5 + 3}{5} = 3
$$

- Gene 2 mean:

$$
\bar{x}_2 = \frac{3 + 6 + 1 + 7 + 4}{5} = 4.2
$$

- Gene 3 mean:

$$
\bar{x}_3 = \frac{5 + 8 + 2 + 10 + 6}{5} = 6.2
$$

Center the matrix:

$$
\tilde{X}_k = X_k - \text{mean}(X_k)
$$

Resulting in the centered matrix:

$$
\tilde{X}_k = \begin{pmatrix}
-1 & -1.2 & -1.2 \\
1 & 1.8 & 1.8 \\
-2 & -3.2 & -4.2 \\
2 & 2.8 & 3.8 \\
0 & -0.2 & -0.2 \\
\end{pmatrix}
$$

#### Step 2: Calculate Covariance Matrix

The covariance matrix is defined as:

$$
\Sigma_k = \frac{1}{n-1} \tilde{X}_k^\top \tilde{X}_k
$$

where $n = 5$, so divide by 4.

First, calculate $\tilde{X}_k^\top \tilde{X}_k$:

- (1,1) element:

$$
(-1)^2 + (1)^2 + (-2)^2 + (2)^2 + (0)^2 = 10
$$

- (1,2) element:

$$
(-1)(-1.2) + (1)(1.8) + (-2)(-3.2) + (2)(2.8) + (0)(-0.2) = 15
$$

- (1,3) element:

$$
(-1)(-1.2) + (1)(1.8) + (-2)(-4.2) + (2)(3.8) + (0)(-0.2) = 19
$$

- (2,2) element:

$$
(-1.2)^2 + (1.8)^2 + (-3.2)^2 + (2.8)^2 + (-0.2)^2 = 22.8
$$

- (2,3) element:

$$
(-1.2)(-1.2) + (1.8)(1.8) + (-3.2)(-4.2) + (2.8)(3.8) + (-0.2)(-0.2) = 28.8
$$

- (3,3) element:

$$
(-1.2)^2 + (1.8)^2 + (-4.2)^2 + (3.8)^2 + (-0.2)^2 = 36.8
$$

Since $\tilde{X}_k^\top \tilde{X}_k$ is symmetric, the entries below the diagonal mirror those above it. Thus:

$$
\tilde{X}_k^\top \tilde{X}_k = \begin{pmatrix}
10 & 15 & 19 \\
15 & 22.8 & 28.8 \\
19 & 28.8 & 36.8 \\
\end{pmatrix}
$$

Divide by 4 to get the covariance matrix:

$$
\Sigma_k = \begin{pmatrix}
2.5 & 3.75 & 4.75 \\
3.75 & 5.7 & 7.2 \\
4.75 & 7.2 & 9.2 \\
\end{pmatrix}
$$

#### Step 3: Find the First Principal Component

Now find the largest eigenvalue $\lambda_1$ and corresponding eigenvector $v_1$ of $\Sigma_k$.

(**Details omitted**; usually solved with numerical software like Python.)

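Approximate normalized first eigenvector, with $\lambda_1 \approx 17.33$ (about 99.6% of the total variance $\operatorname{tr}(\Sigma_k) = 17.4$):

$$
v_1 \approx \begin{pmatrix}
0.38 \\
0.57 \\
0.73 \\
\end{pmatrix}
$$

#### Step 4: Calculate Module Eigengene $\text{ME}_k$

Multiply the centered matrix by $v_1$:

$$
\text{ME}_k = \tilde{X}_k v_1
$$

Compute the dot product row by row to get the $\text{ME}_k$ values for 5 samples. For example:

- Sample 1:

$$
(-1)(0.38) + (-1.2)(0.57) + (-1.2)(0.73) \approx -1.94
$$

Repeating this for the remaining samples gives:

$$
\text{ME}_k \approx (-1.94, \; 2.72, \; -5.64, \; 5.12, \; -0.26)^\top
$$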
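The eigen-decomposition omitted in Step 3 can be carried out with NumPy. This is a minimal sketch on the example matrix above; the sign convention (making the entries of $v_1$ sum to a positive number) is an arbitrary choice made here, since an eigenvector is only defined up to sign.

```python
import numpy as np

X_k = np.array([[2, 3, 5],
                [4, 6, 8],
                [1, 1, 2],
                [5, 7, 10],
                [3, 4, 6]], dtype=float)

# Step 1: subtract each gene's (column) mean.
X_centered = X_k - X_k.mean(axis=0)

# Step 2: sample covariance matrix with divisor n - 1.
n = X_k.shape[0]
Sigma = X_centered.T @ X_centered / (n - 1)

# Step 3: eigh returns eigenvalues in ascending order for symmetric matrices,
# so the last column of eigvecs belongs to the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(Sigma)
lambda_1 = eigvals[-1]
v1 = eigvecs[:, -1]
if v1.sum() < 0:          # fix the arbitrary sign of the eigenvector
    v1 = -v1

# Step 4: project the centered data onto v1.
ME = X_centered @ v1

print(np.round(Sigma, 3))
print(np.round(v1, 3), round(lambda_1, 3))
print(np.round(ME, 3))
```

A useful sanity check is that the sample variance of `ME` equals `lambda_1`, and that `ME` sums to zero because the data were centered.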

### 🎯 Why Must PCA Be Centered? (Detailed Explanation)

#### Objective of PCA

Principal Component Analysis (PCA) seeks to find a direction $v$ such that the variance of the data projected onto $v$ is maximized:

$$
v_1 = \arg\max_{\|v\| = 1} \operatorname{Var}(Xv)
$$

#### How Is Variance Defined?

Standard sample variance:

$$
\operatorname{Var}(x) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2
$$

Note: the mean $\bar{x}$ must be subtracted—**centering** is necessary.

### What If We Don't Center?

Using raw data to calculate:

$$
X^\top X
$$

would include information about the mean shift. For example, if all samples are shifted by a constant offset, the entries of $X^\top X$ grow with the squared offset, even though the relationships among the variables are unchanged. ($X^\top X$ is then a matrix of raw second moments, not a covariance matrix.)

Thus, **uncentered PCA** could mistakenly find a direction that points toward the data mean ("overall data drift") rather than the direction of "maximum variation."

#### Correct Procedure ✅

First center each column by subtracting its mean (here $\bar{X}$ denotes the matrix whose every row is the vector of column means):

$$
\tilde{X} = X - \bar{X}
$$

Then compute the covariance matrix:

$$
\Sigma = \frac{1}{n-1} \tilde{X}^\top \tilde{X}
$$

Eigen-decompose $\Sigma$ to find the true direction $v_1$ of maximum variance.
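The effect of skipping the centering step can be seen on synthetic data. In this sketch (all values are made up for illustration) the points vary mostly along the direction $(1, -1)$ but sit far from the origin along $(1, 1)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Large spread along (1, -1)/sqrt(2), small spread along (1, 1)/sqrt(2),
# then a constant offset of (10, 10) added to every sample.
t = rng.normal(scale=2.0, size=n)
noise = rng.normal(scale=0.2, size=n)
X = (np.outer(t, [1.0, -1.0]) + np.outer(noise, [1.0, 1.0])) / np.sqrt(2)
X = X + np.array([10.0, 10.0])

def leading_direction(M):
    """Unit eigenvector of M.T @ M belonging to the largest eigenvalue."""
    _, vecs = np.linalg.eigh(M.T @ M)
    return vecs[:, -1]

v_raw = leading_direction(X)                          # no centering
v_centered = leading_direction(X - X.mean(axis=0))    # columns centered first

print(np.round(v_raw, 3), np.round(v_centered, 3))
```

Without centering, the leading direction is close to $\pm(1, 1)/\sqrt{2}$, i.e., it points at the mean; after centering it is close to $\pm(1, -1)/\sqrt{2}$, the actual direction of maximum variance.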

### Why Specifically the First Principal Component $v_1$?

#### 1. **Initial Motivation**

Remember our goal:

- Module $M_k$ has dozens or hundreds of genes
- Each sample has a high-dimensional gene expression profile
- Directly analyzing these is **high-dimensional, redundant, and noisy**, making downstream analysis difficult.

We want to:

✨ **Summarize module expression into a single, simple value!**

#### 2. **Why the First Principal Component?**

Because the first principal component has these powerful mathematical properties:

| Property | Description |
|:--|:--|
| Maximum Variance | It captures the direction along which the module's samples have the greatest spread. |
| Most Information Preserved | With information measured as variance, it retains more than any other one-dimensional linear projection. |
| Noise Reduction | It focuses on the main trend and filters out minor noise. |

In short:

- First principal component = **maximum retention of the module's expression variance in one dimension**!
- No other unit-norm direction, random or hand-picked, retains more variance.

#### 3. **What If We Don't Use the First Principal Component?**

- Random direction: the projection variance is never larger, and usually much smaller, so samples are harder to distinguish.
- Single gene: very vulnerable to noise or outliers.
- Simple average: weights every gene equally, which might obscure the dominant trend when genes differ in scale or sign.

Among all linear projections onto a single unit-norm direction, the first principal component is the one that provably:

> **Maximally preserves variance while reducing a high-dimensional module to a single vector.**

#### Two Rigorous Mathematical Properties of the First Principal Component

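- Maximum explained variance: $v_1$ maximizes $v^\top \Sigma v$ over all unit vectors $v$
- Minimum reconstruction error: the rank-one approximation $\tilde{X} v_1 v_1^\top$ minimizes the squared (Frobenius) error $\|\tilde{X} - \tilde{X} v v^\top\|_F^2$ over all unit vectors $v$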
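To make the comparison between summaries concrete, the sketch below evaluates the projected variance $v^\top \Sigma v$ for three unit-norm directions on the 5-sample, 3-gene example matrix used earlier: the first principal component, the "simple average" direction $(1, 1, 1)/\sqrt{3}$, and a single gene. The helper name `projected_variance` is an assumption for illustration.

```python
import numpy as np

X_k = np.array([[2, 3, 5],
                [4, 6, 8],
                [1, 1, 2],
                [5, 7, 10],
                [3, 4, 6]], dtype=float)

Sigma = np.cov(X_k, rowvar=False)       # sample covariance (centers internally, divisor n - 1)

def projected_variance(v):
    """Variance of the data projected onto v, after scaling v to unit length."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return float(v @ Sigma @ v)

eigvals, eigvecs = np.linalg.eigh(Sigma)          # ascending eigenvalues
var_pc1 = projected_variance(eigvecs[:, -1])      # first principal component
var_mean = projected_variance([1, 1, 1])          # equal-weight "average" direction
var_gene3 = projected_variance([0, 0, 1])         # a single gene (gene 3)
explained = var_pc1 / np.trace(Sigma)             # share of total variance kept by PC1

# Random unit directions never exceed the PC1 variance.
rng = np.random.default_rng(3)
var_random = [projected_variance(rng.normal(size=3)) for _ in range(1000)]

print(var_pc1, var_mean, var_gene3, explained, max(var_random))
```

In this example the genes are strongly co-expressed, so the equal-weight direction comes close to PC1, but PC1 still retains the most variance of the three.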

---

## 7. Module-Trait Association

Define the phenotype matrix:

$$
Y \in \mathbb{R}^{n \times q}
$$

where:
- $n$: number of samples
- $q$: number of traits
- The $j$-th trait is denoted $Y_{\cdot j} \in \mathbb{R}^n$

For each module $k$, with module eigengene:

$$
\text{ME}_k \in \mathbb{R}^n
$$

Define the correlation between module $k$ and trait $j$ as:

$$
r_{kj} = \text{cor}(\text{ME}_k, Y_{\cdot j})
$$

using Pearson correlation.

### Significance Testing

To determine whether correlations are significant:

- Null hypothesis: no linear relationship, $H_0: \rho_{kj} = 0$, where $\rho_{kj}$ is the population correlation estimated by $r_{kj}$
- Alternative hypothesis: a nonzero linear relationship, $H_1: \rho_{kj} \neq 0$

Compute the $t$-statistic:

$$
t = \frac{r_{kj} \cdot \sqrt{n - 2}}{\sqrt{1 - r_{kj}^2}}
$$

with degrees of freedom:

$$
\text{df} = n - 2
$$

Comparing $t$ against the Student $t$ distribution with these degrees of freedom gives a two-sided p-value.
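A minimal sketch of this test in Python, cross-checked against SciPy's built-in `pearsonr`. The eigengene and trait vectors are simulated placeholders, and `correlation_t_test` is an illustrative helper name.

```python
import numpy as np
from scipy import stats

def correlation_t_test(x, y):
    """Pearson r, t statistic, and two-sided p-value with n - 2 degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1.0 - r**2)
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)     # two-sided tail probability
    return r, t, p

rng = np.random.default_rng(2)
me = rng.normal(size=20)                       # hypothetical module eigengene
trait = 0.6 * me + rng.normal(size=20)         # hypothetical trait related to it

r, t, p = correlation_t_test(me, trait)
r_ref, p_ref = stats.pearsonr(me, trait)       # reference values from SciPy

print(r, t, p)
```

The manual p-value agrees with `pearsonr` up to floating-point error, since both tests rest on the same null distribution of $r$.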

### Multiple Testing Correction

Since multiple modules and multiple traits are tested, a total of $K \times q$ tests are conducted. False positive rates must be controlled.

Common methods:
- Benjamini–Hochberg FDR control
- Bonferroni correction (more conservative)

After adjusting p-values, significant module-trait relationships can be selected.
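Both corrections are short enough to sketch directly. The p-values below are made-up numbers standing in for the $K \times q$ module-trait tests (a matrix of p-values would be flattened first), and `benjamini_hochberg` is an illustrative helper name.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)      # p_(i) * m / i for rank i
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

p_raw = np.array([0.001, 0.008, 0.039, 0.041, 0.60])   # hypothetical raw p-values
p_bh = benjamini_hochberg(p_raw)
p_bonf = np.minimum(p_raw * p_raw.size, 1.0)           # Bonferroni: multiply by m, cap at 1

print(p_bh)
print(p_bonf)
```

The BH-adjusted values are never larger than the Bonferroni-adjusted ones, which reflects Bonferroni being the more conservative of the two.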

---

# Overall Workflow Summary

1. Input expression matrix $X$
2. Compute gene similarity matrix $S$
3. Construct adjacency matrix $A$ via soft-thresholding
4. Compute topological overlap matrix $\text{TOM}$
5. Cluster based on $1-\text{TOM}$ to identify modules $M_k$
6. Extract module eigengenes $\text{ME}_k$
7. Correlate modules with traits $Y$

---

