# Marginal and Joint Effects

Marginal effects measure a genetic variant's influence on a trait when it is considered **alone, ignoring other variants**. Joint effects measure each variant's independent contribution when all variants are **included simultaneously** in the model, giving each variant's effect after **accounting for correlations (LD) between them**.

# Graphical Summary

![Fig](./graphical_summary/slides/Slide9.png)

# Key Formula

In multiple-marker linear regression, we extend the single-marker model to include several genetic variants at once:

$$
\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}
$$

Where:
- $\mathbf{Y}$ is the $N \times 1$ vector of trait values for $N$ individuals
- $\mathbf{X}$ is the $N \times M$ matrix of genotypes for $M$ variants across $N$ individuals
- $\boldsymbol{\beta}$ is the $M \times 1$ vector of effect sizes for each variant (to be estimated)
- $\boldsymbol{\epsilon}$ is the $N \times 1$ vector of error terms for $N$ individuals, with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I}_N)$

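We can still use **ordinary least squares (OLS)** to derive the estimator of $\boldsymbol{\beta}$ in matrix form. This requires $\mathbf{X}^T\mathbf{X}$ to be invertible: no variant may be an exact linear combination of the others, which in particular requires $N \geq M$.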

$$
\hat{\boldsymbol{\beta}}_{\text{OLS}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}
$$
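A minimal R sketch of the matrix formula, on simulated (hypothetical) data with no intercept, matching the model as written above. It solves the normal equations $\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{Y}$ with `solve()` and compares the result with `lm(Y ~ X - 1)`. The sizes and `beta_true` values are arbitrary choices for illustration.

```r
# Hypothetical data: N individuals, M predictors, no intercept
set.seed(1)
N <- 100
M <- 3
X <- matrix(rnorm(N * M), nrow = N, ncol = M)   # N x M design matrix
beta_true <- c(0.5, 0, -0.3)
Y <- X %*% beta_true + rnorm(N, sd = 1)          # Y = X beta + epsilon

# (X^T X)^{-1} X^T Y, solved directly instead of forming the inverse
beta_ols <- solve(crossprod(X), crossprod(X, Y))

# lm() without an intercept fits the same model
beta_lm <- coef(lm(Y ~ X - 1))

print(cbind(normal_equations = drop(beta_ols), lm = beta_lm))
```

Using `solve(A, b)` avoids computing $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly. `lm()` itself uses a QR decomposition, so the two columns agree up to floating-point error.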


# Technical Details

## Marginal Effect
In [OLS](https://gaow.github.io/statgen-prerequisites/ordinary_least_squares.html), we discussed single-marker linear regression. That model in fact estimates the marginal effect of each genetic variant, because each variant is fitted on its own.

The marginal effect of a genetic variant is its association with the trait when analyzed in isolation, without accounting for other variants:

$$
\hat{\beta}_{\text{marginal},j} = (\mathbf{X}_j^T\mathbf{X}_j)^{-1}\mathbf{X}_j^T\mathbf{Y}
$$

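Where $\mathbf{X}_j$ is the $N \times 1$ column of $\mathbf{X}$ holding the genotypes of the $j$-th variant. The formula has no intercept, so it assumes $\mathbf{X}_j$ and $\mathbf{Y}$ are centered, as they are after standardization in the example below.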
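Because $\mathbf{X}_j$ is a single column, the marginal formula reduces to the scalar ratio $\mathbf{X}_j^T\mathbf{Y} / \mathbf{X}_j^T\mathbf{X}_j$. The sketch below uses hypothetical simulated data (`x_j`, `y`). It shows that, after centering, this ratio equals the slope from `lm(y ~ x_j)`, and that for standardized variables it equals the sample correlation between genotype and trait.

```r
# Hypothetical single-variant data
set.seed(2)
N <- 50
x_j <- rnorm(N)
y <- 0.8 * x_j + rnorm(N)

# Center both so that a no-intercept formula is appropriate
x_c <- x_j - mean(x_j)
y_c <- y - mean(y)

# Scalar form of (X_j^T X_j)^{-1} X_j^T Y
beta_marginal <- sum(x_c * y_c) / sum(x_c^2)

# Slope from an ordinary fit with an intercept on the uncentered data
beta_lm <- unname(coef(lm(y ~ x_j))[2])

# With standardized x and y, the same formula gives their sample correlation
beta_std <- sum(scale(x_j) * scale(y)) / sum(scale(x_j)^2)

print(c(formula = beta_marginal, lm = beta_lm,
        standardized = beta_std, cor = cor(x_j, y)))
```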

## Joint Effect
The joint effect of a genetic variant is its association with the trait when **analyzed simultaneously with other variants**, i.e., in the multiple-marker model:

$$
\hat{\boldsymbol{\beta}}_{\text{joint}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}
$$

Where $\hat{\beta}_{\text{joint},j}$ (the $j$-th element of $\hat{\boldsymbol{\beta}}_{\text{joint}}$) represents the effect of the $j$-th variant after accounting for all other variants in the model.
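A short derivation links the two estimators, assuming $\mathbf{X}^T\mathbf{X}$ is invertible and using only the two formulas above. The joint estimate satisfies the normal equations

$$
\mathbf{X}^T\mathbf{X}\,\hat{\boldsymbol{\beta}}_{\text{joint}} = \mathbf{X}^T\mathbf{Y}
$$

Reading off the $j$-th row gives $\mathbf{X}_j^T\mathbf{Y} = \sum_{k=1}^{M} \mathbf{X}_j^T\mathbf{X}_k\,\hat{\beta}_{\text{joint},k}$. Substituting this into the marginal estimator:

$$
\hat{\beta}_{\text{marginal},j} = \frac{\mathbf{X}_j^T\mathbf{Y}}{\mathbf{X}_j^T\mathbf{X}_j} = \hat{\beta}_{\text{joint},j} + \sum_{k \neq j} \frac{\mathbf{X}_j^T\mathbf{X}_k}{\mathbf{X}_j^T\mathbf{X}_j}\,\hat{\beta}_{\text{joint},k}
$$

If every column of $\mathbf{X}$ is standardized (mean 0, sample variance 1), then $\mathbf{X}_j^T\mathbf{X}_k / \mathbf{X}_j^T\mathbf{X}_j = r_{jk}$, the sample correlation (LD) between variants $j$ and $k$. In vector form this is

$$
\hat{\boldsymbol{\beta}}_{\text{marginal}} = \mathbf{R}\,\hat{\boldsymbol{\beta}}_{\text{joint}}
$$

where $\mathbf{R}$ is the $M \times M$ LD (correlation) matrix. The sum over $k \neq j$ is the signal that variant $j$ picks up from its correlated neighbors. It vanishes when $r_{jk} = 0$ for all $k \neq j$.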

## Key Differences Between Marginal and Joint Effects

- **Correlation Structure**: 
   - Marginal effects ignore correlations (linkage disequilibrium) between variants
   - Joint effects account for correlations between variants

- **Interpretation**:
   - Marginal effect: The expected change in the trait associated with a unit change in the variant, not accounting for other variants
   - Joint effect: The expected change in the trait associated with a unit change in the variant, holding all other variants constant

- **Agreement**:
   - When variants are uncorrelated (the centered columns of $\mathbf{X}$ are orthogonal in the sample), marginal and joint effects are identical
   - When variants are correlated, marginal and joint effects generally differ
   - Joint effects can be smaller or larger than marginal effects, or even have the opposite sign
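The last two points can be seen in a small simulation with hypothetical predictors `x1` and `x2`, correlated at about 0.9. The true joint effects are $+1.0$ and $-0.5$. In population terms, the marginal slope of `x2` is $0.9 \times 1.0 - 0.5 = 0.4$, which is positive and so opposite in sign to its joint effect. The marginal slope of `x1` is $1.0 - 0.5 \times 0.9 = 0.55$, smaller than its joint effect. The second half builds exactly orthogonal, centered predictors from a QR decomposition to illustrate the uncorrelated case, where marginal and joint estimates coincide.

```r
# Hypothetical correlated predictors
set.seed(3)
N <- 1000
x1 <- rnorm(N)
x2 <- 0.9 * x1 + sqrt(1 - 0.9^2) * rnorm(N)   # corr(x1, x2) about 0.9
y <- 1.0 * x1 - 0.5 * x2 + rnorm(N)           # true joint effects: +1.0, -0.5

# Marginal: one predictor per model; joint: both in one model
marginal <- c(x1 = unname(coef(lm(y ~ x1))[2]),
              x2 = unname(coef(lm(y ~ x2))[2]))
joint <- coef(lm(y ~ x1 + x2))[c("x1", "x2")]
print(rbind(marginal = marginal, joint = joint))

# Orthogonal case: columns 2 and 3 of Q are orthogonal to each other and to
# the intercept column, so they are centered and uncorrelated in the sample
Q <- qr.Q(qr(cbind(1, rnorm(N), rnorm(N))))
z1 <- sqrt(N) * Q[, 2]
z2 <- sqrt(N) * Q[, 3]
y2 <- 2 * z1 - 1 * z2 + rnorm(N)
marginal_orth <- c(unname(coef(lm(y2 ~ z1))[2]), unname(coef(lm(y2 ~ z2))[2]))
joint_orth <- unname(coef(lm(y2 ~ z1 + z2))[2:3])
print(rbind(marginal = marginal_orth, joint = joint_orth))
```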

# Example

This example demonstrates how marginal effects (conventional GWAS-style single-SNP analysis) can differ from joint effects (multiple regression with all SNPs). We'll create a scenario where all variants appear significant in marginal analysis due to linkage disequilibrium, but only the true causal variant remains significant in joint analysis.


Related topics:
- [OLS](https://gaow.github.io/statgen-prerequisites/ordinary_least_squares.html)
- [LD](https://gaow.github.io/statgen-prerequisites/linkage_disequilibrium.html)

```r
# Clear the environment
rm(list = ls())
set.seed(9)  # For reproducibility

# Define genotypes for 20 individuals at 3 variants
# Create correlated genotypes to simulate linkage disequilibrium
N <- 20
M <- 3

# Generate correlated genotype data
# Variant 1 is the true causal variant
# Variants 2 and 3 are in LD with variant 1
variant1 <- sample(0:2, N, replace = TRUE, prob = c(0.4, 0.4, 0.2))

# Create LD: variants 2 and 3 copy variant 1's genotype with probability 0.9 and 0.8
variant2 <- ifelse(runif(N) < 0.9, variant1, sample(0:2, N, replace = TRUE))
variant3 <- ifelse(runif(N) < 0.8, variant1, sample(0:2, N, replace = TRUE))

# Combine into matrix
Xraw_additive <- cbind(variant1, variant2, variant3)
rownames(Xraw_additive) <- paste("Individual", 1:N)
colnames(Xraw_additive) <- paste("Variant", 1:M)

# Standardize genotypes
X <- scale(Xraw_additive, center = TRUE, scale = TRUE)

# Generate phenotype where only Variant 1 has a true causal effect
true_beta1 <- 1.5  # Strong effect for variant 1
epsilon <- rnorm(N, mean = 0, sd = 0.5)
Y_raw <- X[, 1] * true_beta1 + epsilon  # Only variant 1 affects the trait

# Standardize phenotype
Y <- scale(Y_raw)
```
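With only 20 individuals, the sample LD between the simulated variants is noisy. The standalone sketch below reuses the same copying mechanism with a larger, hypothetical sample size (`N_big = 5000`) to show the pairwise genotype correlations the construction tends to produce. It uses new variable names, so it does not overwrite `X`, `Y`, `N` or `M` from the cell above.

```r
# Same construction as above, with a hypothetical larger sample
set.seed(10)
N_big <- 5000
v1 <- sample(0:2, N_big, replace = TRUE, prob = c(0.4, 0.4, 0.2))
v2 <- ifelse(runif(N_big) < 0.9, v1, sample(0:2, N_big, replace = TRUE))
v3 <- ifelse(runif(N_big) < 0.8, v1, sample(0:2, N_big, replace = TRUE))

# Pairwise genotype correlations (r), a common measure of LD between variants
ld <- cor(cbind(v1, v2, v3))
print(round(ld, 2))
```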

First, recall how we calculated the marginal effect of each variant in [OLS](https://gaow.github.io/statgen-prerequisites/ordinary_least_squares.html), fitting one SNP at a time:

```r
# Calculate marginal effects (one SNP at a time)
p_values <- numeric(M)  # Store p-values
betas <- numeric(M)     # Store estimated effect sizes

for (j in 1:M) {
  SNP <- X[, j]  # Extract genotype for SNP j
  model <- lm(Y ~ SNP)  # OLS regression: Trait ~ SNP
  summary_model <- summary(model)
  
  # Store p-value and effect size (coefficient)
  p_values[j] <- summary_model$coefficients[2, 4]  # p-value for SNP effect
  betas[j] <- summary_model$coefficients[2, 1]     # Estimated beta coefficient
}

marginal_OLS_results <- data.frame(
  Variant = colnames(X), 
  Beta = round(betas, 4), 
  P_Value = round(p_values, 4),
  Significant = p_values < 0.05
)
```
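Because the genotypes and phenotype here are standardized, and therefore centered, the loop's estimates also follow from applying $(\mathbf{X}_j^T\mathbf{X}_j)^{-1}\mathbf{X}_j^T\mathbf{Y}$ to every column at once. The sketch below is standalone and uses hypothetical simulated genotypes (`G`, `y_demo`), not the example's data. It compares the vectorized form with one `lm()` per column.

```r
# Hypothetical standardized genotypes (3 variants) and phenotype
set.seed(4)
n_demo <- 200
G <- scale(matrix(rbinom(n_demo * 3, size = 2, prob = 0.3), ncol = 3))
y_demo <- drop(scale(G[, 1] + rnorm(n_demo)))

# All marginal effects at once: X_j^T Y / X_j^T X_j for each column j
marg_vec <- drop(crossprod(G, y_demo)) / colSums(G^2)

# One lm() per column, mirroring the loop in the notebook cell
marg_loop <- sapply(seq_len(ncol(G)),
                    function(j) unname(coef(lm(y_demo ~ G[, j]))[2]))

print(rbind(vectorized = marg_vec, loop = marg_loop))
```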


Now let's calculate the joint effects by including all variants in one model:

```r
# Multiple regression model including all variants simultaneously
joint_model <- lm(Y ~ X)
joint_summary <- summary(joint_model)

# Extract the joint effect coefficients and p-values
# Skip the intercept (first row); lm() names the remaining rows by pasting
# the matrix name onto its column names (e.g. "XVariant 1")
joint_betas <- joint_summary$coefficients[2:(M+1), 1]
joint_p_values <- joint_summary$coefficients[2:(M+1), 4]

# Create results table for joint effects
joint_OLS_results <- data.frame(
  Variant = colnames(X),
  Beta = round(joint_betas, 4),
  P_Value = round(joint_p_values, 4),
  Significant = joint_p_values < 0.05
)
```


```r
print("Marginal Effect:")
marginal_OLS_results
print("Joint Effect:")
joint_OLS_results
```

```
[1] "Marginal Effect:"
```

| Variant | Beta | P_Value | Significant |
|---|---|---|---|
| Variant 1 | 0.9416 | 0.0000 | TRUE |
| Variant 2 | 0.4997 | 0.0249 | TRUE |
| Variant 3 | 0.8226 | 0.0000 | TRUE |

```
[1] "Joint Effect:"
```

| Variant | Beta | P_Value | Significant |
|---|---|---|---|
| Variant 1 | 1.0253 | 0.0001 | TRUE |
| Variant 2 | 0.0757 | 0.4226 | FALSE |
| Variant 3 | -0.1323 | 0.4886 | FALSE |
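The two tables are connected through LD. For standardized genotypes and phenotype, the joint model's normal equations give $\hat{\boldsymbol{\beta}}_{\text{marginal}} = \mathbf{R}\,\hat{\boldsymbol{\beta}}_{\text{joint}}$, where $\mathbf{R}$ is the sample correlation matrix of the variants. When $\mathbf{R}$ is invertible, the joint estimates can therefore be recovered from the marginal estimates and LD alone, as $\mathbf{R}^{-1}\hat{\boldsymbol{\beta}}_{\text{marginal}}$. The standalone sketch below checks this on hypothetical two-variant data (`g1`, `g2`, `y_demo`), not the example's data.

```r
# Hypothetical two-variant data with LD
set.seed(5)
n_demo <- 100
g1 <- rbinom(n_demo, size = 2, prob = 0.4)
g2 <- ifelse(runif(n_demo) < 0.8, g1, rbinom(n_demo, size = 2, prob = 0.4))
G <- scale(cbind(g1, g2))                            # standardized genotypes
y_demo <- drop(scale(1.2 * G[, 1] + rnorm(n_demo)))  # standardized phenotype

# Marginal estimates: one regression per variant
beta_marg <- apply(G, 2, function(g) unname(coef(lm(y_demo ~ g))[2]))

# LD matrix: sample correlations between the variants
R_ld <- cor(G)

# Joint estimates recovered from marginal estimates and LD only
beta_from_summary <- drop(solve(R_ld, beta_marg))

# Joint estimates from the multiple regression, for comparison
beta_joint <- unname(coef(lm(y_demo ~ G))[-1])

print(rbind(from_marginal_and_LD = beta_from_summary,
            multiple_regression = beta_joint))
```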
