# Mediator

A mediator is a variable that **sits in the causal pathway between an exposure and an outcome**, explaining the mechanism through which the exposure exerts its effect on the outcome.

# Graphical Summary

![Fig](./graphical_summary/slides/Slide16.png)

# Key Formula

The mediator relationship is represented in a causal diagram as:

$$
X \rightarrow W \rightarrow Y
$$

Where:
- $X$ is the independent variable (e.g., genetic variant)
- $W$ is the mediator variable
- $Y$ is the dependent variable (e.g., trait)
- The arrows ($\rightarrow$) indicate the direction of causal influence

This diagram shows that the mediator ($W$) lies on the causal pathway between the independent variable ($X$) and the dependent variable ($Y$): $X$ changes $W$, and the change in $W$ in turn changes $Y$. The mediator therefore transmits the effect of $X$ on $Y$.
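A minimal R sketch of how an effect travels along $X \rightarrow W \rightarrow Y$, assuming linear effects with hypothetical path coefficients (2 from $X$ to $W$, 3 from $W$ to $Y$) and no direct $X \rightarrow Y$ arrow. Because $X$ reaches $Y$ only through $W$, the regression of $Y$ on $X$ should recover roughly the product of the two path coefficients.

```r
set.seed(1)
n <- 10000
a <- 2   # hypothetical effect of X on W
b <- 3   # hypothetical effect of W on Y
X <- rbinom(n, size = 2, prob = 0.3)   # additive genotype coded 0/1/2
W <- a * X + rnorm(n)                  # W depends only on X
Y <- b * W + rnorm(n)                  # Y depends only on W (no direct X -> Y arrow)

a_hat <- unname(coef(lm(W ~ X))["X"])      # estimated X -> W effect
b_hat <- unname(coef(lm(Y ~ W))["W"])      # estimated W -> Y effect
total_hat <- unname(coef(lm(Y ~ X))["X"])  # estimated total X -> Y effect
round(c(a_hat = a_hat, b_hat = b_hat, product = a_hat * b_hat, total = total_hat), 2)
```

The total effect should land near $2 \times 3 = 6$, even though no term in the model for `Y` mentions `X` directly.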

# Technical Details



## The Mediation Framework

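A mediator **explains the mechanism** by which a genetic variant affects an outcome, representing the **actual biological pathway** that carries the effect. The total effect of the variant splits into the part carried by the mediator and the part that is not: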

$$
\text{Total Effect} = \text{Effect through Mediator} + \text{Other Effects}
$$

Where:
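- **Total Effect**: SNP -> Outcome ($\beta$ for the SNP in `lm(Outcome ~ SNP)`, i.e. without controlling for the mediator)
- **Effect through Mediator**: SNP -> Mediator -> Outcome, the mediated pathway $= a \times b$, where $a$ is the SNP effect on the mediator (`lm(Mediator ~ SNP)`) and $b$ is the mediator effect on the outcome holding the SNP fixed (`lm(Outcome ~ SNP + Mediator)`)
- **Other Effects**: the effect NOT through the mediator ($\beta$ for the SNP when controlling for the mediator); this includes unmeasured pleiotropic pathways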
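For linear models the decomposition can be checked numerically. Writing $c$ for the SNP coefficient in `lm(Outcome ~ SNP)`, $a$ for the SNP coefficient in `lm(Mediator ~ SNP)`, and $b$ and $c'$ for the mediator and SNP coefficients in `lm(Outcome ~ SNP + Mediator)`, ordinary least squares on the same observations gives $c = c' + a \times b$ exactly. The sketch below uses simulated data with made-up effect sizes (including a hypothetical direct SNP effect of 0.5) to illustrate this.

```r
set.seed(2)
n <- 500
snp <- rbinom(n, size = 2, prob = 0.4)           # additive genotype 0/1/2
mediator <- 1.5 * snp + rnorm(n)                 # SNP -> Mediator (path a)
outcome <- 2 * mediator + 0.5 * snp + rnorm(n)   # Mediator -> Outcome (path b) plus a direct SNP effect

c_total  <- unname(coef(lm(outcome ~ snp))["snp"])   # total effect
a        <- unname(coef(lm(mediator ~ snp))["snp"])  # SNP -> Mediator
fit_adj  <- lm(outcome ~ snp + mediator)
b        <- unname(coef(fit_adj)["mediator"])        # Mediator -> Outcome, holding SNP fixed
c_direct <- unname(coef(fit_adj)["snp"])             # effect not through the mediator

# For OLS fits on the same data the decomposition holds up to floating-point error
all.equal(c_total, c_direct + a * b)
```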


## Evidence for Mediation

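**Strong evidence** for mediation comes from the following pattern:
1. **Reduced effect size**: the SNP effect shrinks in magnitude after controlling for the mediator (|Total effect| > |Effect after controlling for mediator|)
2. **Loss of significance**: the SNP p-value increases substantially, ideally because the effect estimate shrinks rather than because its standard error grows
3. **Biological plausibility**: the mediator lies in a known pathway linking the SNP to the outcome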

## Analysis Steps

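1. **Estimate the total effect**: `lm(Outcome ~ SNP)` (should be significant)
2. **Check the SNP-mediator link**: `lm(Mediator ~ SNP)` (should be significant; this is the SNP -> Mediator part of the mediated pathway)
3. **Test for mediation**: `lm(Outcome ~ SNP + Mediator)` (the SNP effect should shrink or disappear)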
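The regression steps above can be sketched in R on simulated data. The data-generating values (allele frequency 0.3, mediator effect 0.8, no direct SNP effect) and the helper `snp_row` are assumptions made for illustration. Besides the total-effect and adjusted models, the sketch also fits `Mediator ~ SNP` to confirm that the SNP actually moves the mediator.

```r
set.seed(3)
n <- 1000
snp <- rbinom(n, size = 2, prob = 0.3)   # additive genotype 0/1/2
mediator <- snp + rnorm(n)               # SNP -> Mediator
outcome <- 0.8 * mediator + rnorm(n)     # Mediator -> Outcome; no direct SNP effect

# Hypothetical helper: SNP estimate and p-value from a fitted lm
snp_row <- function(fit) summary(fit)$coefficients["snp", c("Estimate", "Pr(>|t|)")]

fits <- list(
  total    = lm(outcome ~ snp),             # total effect
  a_path   = lm(mediator ~ snp),            # does the SNP move the mediator?
  adjusted = lm(outcome ~ snp + mediator)   # SNP effect given the mediator
)
res <- t(sapply(fits, snp_row))             # one row per model
signif(res, 3)
```

In this setup the total effect should be clearly significant, while the adjusted SNP estimate should sit near zero.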

**Interpretation**:
- If SNP effect disappears -> **Complete mediation**
- If SNP effect reduces -> **Partial mediation**  
- If SNP effect unchanged -> **No mediation**
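The three interpretations can be reproduced with a sketch that simulates each scenario using the same SNP-to-mediator link but different hypothetical path strengths (direct SNP effect of 0 or 0.5, mediator effect of 1 or 0). A sample size of 5000 is assumed so that the estimates are precise enough to read.

```r
# Simulate one scenario and return the SNP coefficient with and without the mediator
simulate_scenario <- function(direct, via_mediator, n = 5000) {
  snp <- rbinom(n, size = 2, prob = 0.3)                        # additive genotype 0/1/2
  mediator <- snp + rnorm(n)                                    # SNP -> Mediator
  outcome <- via_mediator * mediator + direct * snp + rnorm(n)  # mediated and direct paths
  c(total    = unname(coef(lm(outcome ~ snp))["snp"]),
    adjusted = unname(coef(lm(outcome ~ snp + mediator))["snp"]))
}

set.seed(4)
scenarios <- rbind(
  complete = simulate_scenario(direct = 0,   via_mediator = 1),
  partial  = simulate_scenario(direct = 0.5, via_mediator = 1),
  none     = simulate_scenario(direct = 0.5, via_mediator = 0)
)
round(scenarios, 2)
```

Roughly, the complete row should go from about 1 to about 0, the partial row from about 1.5 to about 0.5, and the no-mediation row should stay near 0.5 in both columns.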

## Mediation vs Other Variable Types

| Type | Question | Action | Structure | Examples |
|------|----------|--------|-----------|----------|
| **Confounder** | Does this affect both SNP and outcome? | Must control to remove bias | SNP ← Confounder → Outcome | Population ancestry, age, sex, environmental exposures |
| **Collider** | Is this caused by both SNP and outcome? | Never control—creates bias | SNP → Collider ← Outcome | Study participation, hospital admission, survival to study age |
| **Mediator** | Does this explain HOW SNP affects outcome? | Control only to isolate the direct effect; adjusting removes the mediated part of the total effect | SNP → Mediator → Outcome | Gene expression, protein levels, hormone levels, enzyme activity |

**Important:** Whether a variable is a confounder, mediator, or collider depends on your research question and the causal structure of your specific analysis. The same variable can play different roles in different analyses.
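The three roles in the table can be contrasted with a small simulation. This is a sketch under simplifying assumptions made up for illustration: linear effects, n = 20000, a continuous ancestry score as the confounder, and a continuous participation score standing in for a collider such as study participation. The helper `snp_coef` is hypothetical. In the confounder and collider cases the SNP has no effect on the outcome; in the mediator case its whole effect runs through the mediator.

```r
set.seed(5)
n <- 20000
snp_coef <- function(fit) unname(coef(fit)["snp"])   # hypothetical helper: SNP coefficient

# Confounder: ancestry affects both genotype and outcome; the SNP has no effect
ancestry <- rnorm(n)
snp <- rbinom(n, size = 2, prob = plogis(ancestry))
outcome <- ancestry + rnorm(n)
conf <- c(unadjusted = snp_coef(lm(outcome ~ snp)),
          adjusted   = snp_coef(lm(outcome ~ snp + ancestry)))

# Collider: participation is caused by both SNP and outcome; the SNP has no effect
snp <- rbinom(n, size = 2, prob = 0.3)
outcome <- rnorm(n)
participation <- snp + outcome + rnorm(n)
coll <- c(unadjusted = snp_coef(lm(outcome ~ snp)),
          adjusted   = snp_coef(lm(outcome ~ snp + participation)))

# Mediator: the SNP acts on the outcome only through the mediator
snp <- rbinom(n, size = 2, prob = 0.3)
mediator <- snp + rnorm(n)
outcome <- mediator + rnorm(n)
med <- c(unadjusted = snp_coef(lm(outcome ~ snp)),
         adjusted   = snp_coef(lm(outcome ~ snp + mediator)))

round(rbind(confounder = conf, collider = coll, mediator = med), 2)
```

Adjusting should move the confounder row from a spurious positive estimate toward zero, push the collider row from about zero to a clearly negative value, and remove the mediated effect in the mediator row, which is only desirable when the question is about the direct effect.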


# Related Topics

- [ordinary least squares](https://statfungen.github.io/statgen-primer/ordinary_least_squares.html)
- [confounder](https://statfungen.github.io/statgen-primer/confounder.html)
- [collider](https://statfungen.github.io/statgen-primer/collider.html)

# Example

A genetic variant is associated with height—but *how* does it influence height? We have 5 individuals and suspect growth hormone is the mediator. If growth hormone is the **only** pathway, then controlling for it should eliminate the variant's association with height. If the association disappears, we have complete mediation; if it merely reduces, we have partial mediation.


## Setup

```r
# Clear the environment
rm(list = ls())
set.seed(16)
# Define genotypes for 5 individuals at 3 variants
# These represent actual alleles at each position
# For example, Individual 1 has genotypes: CC, CT, AT
genotypes <- c(
  "CC", "CT", "AT",  # Individual 1
  "TT", "TT", "AA",  # Individual 2
  "CT", "CT", "AA",  # Individual 3
  "CC", "TT", "AA",  # Individual 4
  "CC", "CC", "TT"   # Individual 5
)
# Reshape into a matrix
N <- 5
M <- 3
geno_matrix <- matrix(genotypes, nrow = N, ncol = M, byrow = TRUE)
rownames(geno_matrix) <- paste("Individual", 1:N)
colnames(geno_matrix) <- paste("Variant", 1:M)

alt_alleles <- c("T", "C", "T")

# Convert to raw genotype matrix using the additive model
Xraw_additive <- matrix(0, nrow = N, ncol = M) # count number of non-reference alleles

rownames(Xraw_additive) <- rownames(geno_matrix)
colnames(Xraw_additive) <- colnames(geno_matrix)

for (i in 1:N) {
  for (j in 1:M) {
    alleles <- strsplit(geno_matrix[i, j], "")[[1]]
    Xraw_additive[i, j] <- sum(alleles == alt_alleles[j])
  }
}

X <- scale(Xraw_additive, center = TRUE, scale = TRUE)
```
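The nested loop above can also be written in vectorized R. In this sketch `count_alt` is our own helper, not part of the original notebook: it counts how often a single-character allele occurs in each genotype string by comparing string lengths before and after deleting that allele.

```r
# Same genotypes and alternative alleles as in the setup
genotypes <- c("CC", "CT", "AT",
               "TT", "TT", "AA",
               "CT", "CT", "AA",
               "CC", "TT", "AA",
               "CC", "CC", "TT")
geno_matrix <- matrix(genotypes, nrow = 5, ncol = 3, byrow = TRUE)
alt_alleles <- c("T", "C", "T")

# Count occurrences of a single-character allele in each genotype string
count_alt <- function(g, alt) nchar(g) - nchar(gsub(alt, "", g, fixed = TRUE))

# Apply column by column, pairing each variant with its own alternative allele
counts <- sapply(seq_along(alt_alleles),
                 function(j) count_alt(geno_matrix[, j], alt_alleles[j]))
counts
```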


We generate growth hormone levels for each individual from Variant 3:

```r
# Generate growth hormone levels FROM variant 3 (mediator pathway)
GH_raw <- 6 + 2 * Xraw_additive[, 3] + rnorm(N, 0, 0.1)  # Variant 3 affects GH
GH <- scale(GH_raw)
```
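Growth hormone here is almost a deterministic function of Variant 3, which matters once both enter the same regression. This sketch rebuilds the raw GH values from the Variant 3 allele counts (1, 0, 0, 0, 2, read off the genotypes AT, AA, AA, AA, TT) with the same seed; since the setup draws no random numbers before this step, it should reproduce the same values. The variance inflation factor $1 / (1 - r^2)$ is the factor by which the variance of the SNP coefficient grows when GH is added as a second predictor.

```r
set.seed(16)                              # same seed as the setup cell
x3 <- c(1, 0, 0, 0, 2)                    # Variant 3 allele counts
GH_raw <- 6 + 2 * x3 + rnorm(5, 0, 0.1)   # same generating formula as above

r <- cor(x3, GH_raw)                      # correlation between the two predictors
vif <- 1 / (1 - r^2)                      # variance inflation for the SNP coefficient once GH is added
round(c(correlation = r, VIF = vif), 3)
```

Standardizing with `scale()` does not change the correlation, so the same inflation applies to the standardized `SNP` and `GH` used in the regressions.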

Next we generate height, which depends on Variant 3 only through growth hormone (plus small direct effects of Variants 1 and 2):

```r
# Create mediator structure: Variant 3 -> Growth Hormone -> Height
# Height is caused by:
# 1. Direct effect from growth hormone (the mediator)
# 2. Small effects from variants 1 and 2 (not mediated)
# 3. NO direct effect from variant 3 (fully mediated through GH)

height_raw <- 160 +                      # Base height
             3 * GH +                    # Growth hormone effect (mediator pathway)
             1 * X[, 1] +                # Small direct effect from variant 1
             0.5 * X[, 2] +              # Small direct effect from variant 2
             0 * X[, 3] +                # NO direct effect from variant 3 (fully mediated)
             rnorm(N, 0, 0.5)            # Small noise

Y <- scale(height_raw)
```

## OLS Regression

We then fit two OLS regressions for Variant 3, one without and one with adjustment for growth hormone:

```r
# Perform OLS regression for the third variant
SNP <- X[, 3]  # Extract genotype for SNP 3
model <- lm(Y ~ SNP)  # OLS regression: Trait ~ SNP
adjusted_model <- lm(Y ~ SNP + GH)  # Adjust for GH, OLS regression: Trait ~ SNP + GH
summary_model <- summary(model)
summary_adjusted_model <- summary(adjusted_model)

p_value <- summary_model$coefficients[2, 4]  # p-value for SNP effect
beta <- summary_model$coefficients[2, 1]     # Estimated beta coefficient
p_value_adjusted <- summary_adjusted_model$coefficients[2, 4]  # p-value for SNP effect adjusted for growth hormone
beta_adjusted <- summary_adjusted_model$coefficients[2, 1]     # Estimated beta coefficient adjusted for growth hormone
```

```r
# Create results table
results <- data.frame(Variant = "Variant 3", Beta = beta, P_Value = p_value,
                      Beta_Adjusted = beta_adjusted, P_Value_Adjusted = p_value_adjusted)
results
```

| Variant | Beta | P_Value | Beta_Adjusted | P_Value_Adjusted |
|---------|------|---------|---------------|------------------|
| Variant 3 | 0.9277802 | 0.02304393 | 2.220295 | 0.6211939 |


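Variant 3 shows the p-value pattern expected under complete mediation: its association with height is significant without adjustment (p = 0.023) and is no longer significant after controlling for growth hormone (p = 0.62). The effect size tells a less tidy story, because the adjusted estimate (2.22) is larger than the unadjusted one (0.93), not smaller. Growth hormone was generated almost deterministically from Variant 3, so the two predictors are nearly collinear, and with only 5 individuals the adjusted SNP coefficient is estimated very imprecisely. The loss of significance is therefore consistent with complete mediation but does not by itself confirm it; here we know there is no direct pathway only because the data were simulated that way (`0 * X[, 3]` in the height model).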
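With only five individuals the adjusted estimate is too noisy to read. A sketch of the same generative model at a larger, hypothetical sample size shows the pattern that complete mediation predicts. Assumptions: n = 5000, the three variants are simulated independently with made-up allele frequencies, and raw allele counts are used instead of standardized genotypes. Variant 3 should show a large total effect and an adjusted coefficient near zero, while the adjusted standard error stays comparatively wide because growth hormone remains nearly collinear with the variant.

```r
set.seed(17)
n <- 5000
# Hypothetical allele frequencies; variants simulated independently
x1 <- rbinom(n, size = 2, prob = 0.3)
x2 <- rbinom(n, size = 2, prob = 0.4)
x3 <- rbinom(n, size = 2, prob = 0.2)

GH <- 6 + 2 * x3 + rnorm(n, 0, 0.1)            # Variant 3 -> growth hormone
height <- 160 + 3 * as.numeric(scale(GH)) +    # growth hormone effect (mediated path)
  1 * x1 + 0.5 * x2 +                          # small direct effects of variants 1 and 2
  rnorm(n, 0, 0.5)                             # no direct effect of variant 3

total    <- summary(lm(height ~ x3))$coefficients["x3", ]
adjusted <- summary(lm(height ~ x3 + GH))$coefficients["x3", ]
round(rbind(total = total, adjusted = adjusted)[, c("Estimate", "Std. Error")], 3)
```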