# Intuition



![figure](./cartoons/5_3.svg)

# Notations

## Mediator definition

A **mediator** is a variable that lies on the causal pathway between the independent variable (genotype) and the dependent variable (trait). It explains how or why the independent variable affects the dependent variable.

- **Example in Statistical Genetics:**  
  Suppose we are studying the effect of a genetic variant (genotype) on a trait such as height. A potential **mediator** is **Growth Hormone Levels**. If the variant changes growth hormone levels, and growth hormone in turn affects height, then growth hormone levels help explain how the variant influences height.

- **Graphical Representation:**
  $$ \text{SNP} \to \textbf{Growth Hormone Levels} \to \text{Height} $$  


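Modeling the mediator explicitly splits the **total effect** of the genotype on the trait into two parts. The **direct effect** does not pass through the mediator. The **indirect effect** is transmitted through the mediator. Whether to include a mediator in the model depends on the research question. Regressing the trait on the genotype alone estimates the total effect, while adjusting for the mediator estimates only the direct effect.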
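For linear models, this decomposition can be written out explicitly. Let $X$ be the genotype, $M$ the mediator and $Y$ the trait. Assume, as in the simulation below, linear effects with independent noise terms, and omit intercepts (centered variables). The symbols $a$, $b$ and $c'$ are notation introduced here for illustration:

$$
\begin{aligned}
M &= a X + \varepsilon_M, \\
Y &= c' X + b M + \varepsilon_Y \\
  &= c' X + b (a X + \varepsilon_M) + \varepsilon_Y \\
  &= (c' + a b) X + (b \varepsilon_M + \varepsilon_Y).
\end{aligned}
$$

The total effect is therefore $c = c' + ab$. It is the direct effect $c'$ plus the indirect effect $ab$ carried through the mediator.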


# Example

```r
rm(list = ls())
set.seed(21)  # For reproducibility

# Genotype matrix for 100 individuals and 3 variants
N <- 100  # Number of individuals
M <- 3    # Number of SNPs (variants)

# Create a random genotype matrix (0, 1, 2 = count of the coded allele at each SNP)
X_raw <- matrix(sample(0:2, N * M, replace = TRUE), nrow = N, ncol = M)

# Add row and column names
rownames(X_raw) <- paste("Individual", 1:N)
colnames(X_raw) <- paste("Variant", 1:M)

# Standardize genotype matrix (mean = 0, sd = 1 for each SNP)
X <- scale(X_raw, center = TRUE, scale = TRUE)
```
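The standardization step can be checked in isolation. This sketch repeats the genotype simulation so it runs on its own. `scale()` subtracts each column's mean, divides by the column's sample standard deviation, and stores both constants as attributes of the result. The hand-written `sweep()` version is only there for comparison.

```r
set.seed(21)
N <- 100
M <- 3
X_raw <- matrix(sample(0:2, N * M, replace = TRUE), nrow = N, ncol = M)

# scale() centers each column, then divides by its sample standard deviation
X <- scale(X_raw, center = TRUE, scale = TRUE)

# The same transformation written out by hand, column by column
col_means <- colMeans(X_raw)
col_sds <- apply(X_raw, 2, sd)
X_manual <- sweep(sweep(X_raw, 2, col_means, "-"), 2, col_sds, "/")

# Each standardized column now has mean ~0 and standard deviation 1
print(round(colMeans(X), 10))
print(apply(X, 2, sd))

# The centering and scaling constants are kept as attributes
print(attr(X, "scaled:center"))
print(attr(X, "scaled:scale"))
```

Because the genotypes are standardized, the effect sizes used in the simulation below are per standard deviation of genotype, not per allele.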


```r
# Simulate growth hormone levels (the mediator)
# For simplicity, only Variant 2 affects growth hormone levels
true_effect_snp_to_hormone <- 1.5   # Effect of Variant 2 on growth hormone levels
growth_hormone <- true_effect_snp_to_hormone * X[, 2] + rnorm(N, mean = 0, sd = 0.5)
```


```r
# Simulate height (the trait)
# Height depends on Variant 2 directly and on growth hormone levels
true_effect_snp_to_height <- 2      # Direct effect of Variant 2 on height
true_effect_hormone_to_height <- 3  # Effect of the mediator (growth hormone) on height
height <- true_effect_snp_to_height * X[, 2] +
  true_effect_hormone_to_height * growth_hormone +
  rnorm(N, mean = 0, sd = 2)
```
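Both simulated relationships are linear, so the total effect of Variant 2 on height implied by these settings is the direct effect plus the product of the two path coefficients. The sketch below checks this by rerunning the same data-generating process for a single standardized SNP. It uses a much larger sample, `N_big`, a hypothetical size chosen only to make sampling noise small. It then fits the same two regressions used in this section.

```r
set.seed(1)
N_big <- 1e5   # hypothetical large sample, used only to shrink sampling noise

# Same generating process as above, for one standardized SNP
snp <- as.numeric(scale(sample(0:2, N_big, replace = TRUE)))
a <- 1.5       # SNP -> growth hormone
b <- 3         # growth hormone -> height
c_prime <- 2   # direct SNP -> height
hormone <- a * snp + rnorm(N_big, mean = 0, sd = 0.5)
height_big <- c_prime * snp + b * hormone + rnorm(N_big, mean = 0, sd = 2)

# Effects implied by the simulation settings
true_indirect <- a * b
true_total <- c_prime + true_indirect
cat("true total effect:", true_total, "\n")

# Regression without the mediator estimates the total effect;
# regression with the mediator estimates the direct effect
total_hat <- coef(lm(height_big ~ snp))[["snp"]]
direct_hat <- coef(lm(height_big ~ snp + hormone))[["snp"]]
cat("estimated total effect:", round(total_hat, 3), "\n")
cat("estimated direct effect:", round(direct_hat, 3), "\n")
```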


```r
# Now estimate the total effect and the direct effect.
# Total effect (Variant 2 → height), without adjusting for the mediator
model_total_effect <- lm(height ~ X[, 2])

summary(model_total_effect)
```


```text
Call:
lm(formula = height ~ X[, 2])

Residuals:
   Min     1Q Median     3Q    Max 
-5.875 -1.526 -0.285  1.371  8.098 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.08926    0.25283   0.353    0.725    
X[, 2]       6.68051    0.25411  26.290   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.528 on 98 degrees of freedom
Multiple R-squared:  0.8758,	Adjusted R-squared:  0.8746 
F-statistic: 691.2 on 1 and 98 DF,  p-value: < 2.2e-16
```


```r
# Direct effect (Variant 2 → height), adjusting for growth hormone
model_direct_effect <- lm(height ~ X[, 2] + growth_hormone)

summary(model_direct_effect)
```


```text
Call:
lm(formula = height ~ X[, 2] + growth_hormone)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9033 -1.3465  0.1348  1.2326  3.8371 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.01725    0.18305   0.094    0.925    
X[, 2]          1.01182    0.62424   1.621    0.108    
growth_hormone  3.64659    0.38376   9.502 1.59e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.829 on 97 degrees of freedom
Multiple R-squared:  0.9357,	Adjusted R-squared:  0.9344 
F-statistic: 705.6 on 2 and 97 DF,  p-value: < 2.2e-16
```


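In the first model, Variant 2 (**X[, 2]**) is a strong predictor of height (estimate 6.68, p < 2e-16). This is the total effect, and it is close to the value implied by the simulation settings, 2 + 1.5 × 3 = 6.5.

After adjusting for growth hormone in the second model, the Variant 2 coefficient drops to 1.01 and is no longer significant at the 0.05 level (p = 0.108). This is consistent with growth hormone mediating a large part of the relationship between Variant 2 and height. The non-significant result should not be read as the absence of a direct effect, because the simulated direct effect is 2. Growth hormone is largely determined by Variant 2, so the two predictors are strongly correlated. That correlation inflates the standard error of the Variant 2 coefficient (0.624, versus 0.254 in the first model).

Growth hormone itself is highly significant in the second model (estimate 3.65, p = 1.59e-15). Including it improves the fit: the residual standard error falls from 2.528 to 1.829, and $R^2$ rises from 0.8758 to 0.9357.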
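The two fits give a total and a direct effect, but no explicit estimate of the indirect effect or its uncertainty. One common option is the product-of-coefficients estimate â × b̂. Here â comes from regressing the mediator on the SNP, and b̂ is the growth-hormone coefficient from the model that also includes the SNP. A percentile bootstrap over individuals then gives a confidence interval. This is a swapped-in technique, not necessarily the one used in the course slides. `mediation_estimates` and `n_boot` are hypothetical names introduced for this sketch.

The block regenerates the data with `set.seed(21)`, so it should reproduce the models above if R's default random-number settings are unchanged. For ordinary least squares on the same data, the total-effect estimate equals the direct-effect estimate plus â × b̂ up to floating-point error.

```r
# Regenerate the notebook's data (same seed and call order as above)
set.seed(21)
N <- 100
M <- 3
X_raw <- matrix(sample(0:2, N * M, replace = TRUE), nrow = N, ncol = M)
X <- scale(X_raw)
snp <- X[, 2]
growth_hormone <- 1.5 * snp + rnorm(N, mean = 0, sd = 0.5)
height <- 2 * snp + 3 * growth_hormone + rnorm(N, mean = 0, sd = 2)

# Hypothetical helper returning the path coefficients:
#   a = SNP -> mediator, b = mediator -> trait (adjusted for SNP),
#   direct = SNP -> trait (adjusted for mediator), total = SNP -> trait
mediation_estimates <- function(snp, mediator, y) {
  a_hat <- coef(lm(mediator ~ snp))[["snp"]]
  fit_direct <- coef(lm(y ~ snp + mediator))
  c_hat <- coef(lm(y ~ snp))[["snp"]]
  c(a = a_hat, b = fit_direct[["mediator"]], direct = fit_direct[["snp"]],
    total = c_hat, indirect = a_hat * fit_direct[["mediator"]])
}

est <- mediation_estimates(snp, growth_hormone, height)
print(round(est, 3))
cat("proportion mediated:", round(est[["indirect"]] / est[["total"]], 3), "\n")

# Strong SNP-mediator correlation explains the inflated SE of the direct effect
cat("cor(SNP, growth hormone):", round(cor(snp, growth_hormone), 3), "\n")

# Percentile bootstrap for the indirect effect (resample individuals)
n_boot <- 1000  # illustrative number of bootstrap replicates
boot_indirect <- replicate(n_boot, {
  idx <- sample.int(N, N, replace = TRUE)
  mediation_estimates(snp[idx], growth_hormone[idx], height[idx])[["indirect"]]
})
ci <- quantile(boot_indirect, c(0.025, 0.975))
print(round(ci, 3))
```

A bootstrap interval is used here instead of a normal approximation because the sampling distribution of a product of two coefficients is generally not symmetric.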


> slide 62 from Xin He's slides


> slide 65 from Xin He's slides


> Slides 30-34 from Gao Wang's slides

> Slides 348-350 from GW