# Intuition

fixed effect model


# Notations

> slide 146 in Hailiang Huang's slides
>
> slide 244-246 from GW

When we move from a single study to several (e.g., in a meta-analysis), we may want to ask whether a particular SNP has the same effect in every study. This is where the idea of a shared, common effect size becomes relevant.

In a fixed effect meta-analysis, we assume that the true effect size of each SNP is **the same across all studies**. The observed effect sizes still differ from study to study, but under this model the differences arise only from sampling error, which is larger in smaller studies. The meta-analysis combines the per-study results into a single estimate of that common effect.

**[FIXME]** Should we introduce the concept of meta-analysis before discussing FEM and REM?

---



In a fixed effect meta-analysis, we assume that the observed effect size $\hat{\beta}_k$ from study $k$ follows a normal distribution:

$$
\hat{\beta}_k \sim N(\beta_j, s_k^2)
$$

Where:
- $\beta_j$ is the **true effect size** for SNP $j$, assumed to be the same across all studies.
- $s_k^2$ is the **variance** of the observed effect size $\hat{\beta}_k$ for study $k$, reflecting the precision of the estimate in study $k$.
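
To make the model concrete, the sketch below draws observed effect sizes for a few hypothetical studies from $N(\beta_j, s_k^2)$. The values of `beta_true` and `s_k` are made up for illustration and do not come from the example later in this notebook.

```r
set.seed(1)
beta_true <- 0.5                 # hypothetical common true effect
s_k <- c(0.20, 0.10, 0.05)       # hypothetical per-study standard errors
K <- length(s_k)

# One draw of the observed effect per study: beta_hat_k ~ N(beta_true, s_k^2)
beta_hat <- rnorm(K, mean = beta_true, sd = s_k)
data.frame(study = 1:K, beta_hat = beta_hat, se = s_k)

# Repeating the draw many times shows the spread of each study's estimate
sims <- replicate(10000, rnorm(K, mean = beta_true, sd = s_k))  # K x 10000
sim_mean <- rowMeans(sims)       # each close to beta_true
sim_sd <- apply(sims, 1, sd)     # each close to the matching s_k
```

All studies scatter around the same `beta_true`; only the width of the scatter differs between them.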



## Combined Estimate of Effect Size

   The goal of the meta-analysis is to estimate the true effect size $\beta_j$ from the observed effect sizes $\hat{\beta}_k$ across all studies. We do this by calculating a **weighted average** of the observed effect sizes, where the weights are proportional to the precision of each estimate (i.e., inversely proportional to the variance $s_k^2$):

   $$
   \hat{\beta}_j = \frac{\sum_{k=1}^{K} w_k \hat{\beta}_{k}}{\sum_{k=1}^{K} w_k}
   $$

   Where:
   - $w_k = \frac{1}{s_k^2}$ is the weight for study $k$.
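
The weighted average above can be sketched in a few lines of R. The helper name `ivw_combine` and the input numbers are hypothetical, chosen so that the arithmetic is easy to follow by hand.

```r
# Hypothetical helper: inverse-variance-weighted average of K estimates
ivw_combine <- function(beta_hat, se) {
  stopifnot(length(beta_hat) == length(se), all(se > 0))
  w <- 1 / se^2                      # w_k = 1 / s_k^2
  sum(w * beta_hat) / sum(w)
}

# Made-up numbers: two studies, the second with half the standard error
# Weights are 100 and 400, so the result is (40 + 200) / 500
combined <- ivw_combine(beta_hat = c(0.4, 0.5), se = c(0.10, 0.05))
combined  # 0.48
```

Because the second study has four times the weight, the combined estimate sits much closer to 0.5 than to 0.4.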




## Maximum Likelihood Estimation (MLE) of the Summary Effect

   The MLE for the summary effect size $\beta_j$ under the fixed effect model is the **inverse-variance-weighted** average of the observed effect sizes. Because the observed effect sizes $\hat{\beta}_k$ are assumed to be independent and normally distributed with mean $\beta_j$ and known variance $s_k^2$, the likelihood is maximized by weighting each observed effect size by the inverse of its variance:

   $$
   \hat{\beta}_j = \frac{\sum_{k=1}^{K} \frac{\hat{\beta}_k}{s_k^2}}{\sum_{k=1}^{K} \frac{1}{s_k^2}}
   $$

   This is the **inverse variance weighting** formula used to combine the effect sizes from different studies in a meta-analysis.
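
For completeness, the derivation can be sketched as follows, assuming the $K$ studies are independent and the variances $s_k^2$ are treated as known. The log-likelihood of $\beta_j$ is

$$
\ell(\beta_j) = -\frac{1}{2} \sum_{k=1}^{K} \left[ \log(2 \pi s_k^2) + \frac{(\hat{\beta}_k - \beta_j)^2}{s_k^2} \right]
$$

Setting its derivative with respect to $\beta_j$ to zero gives

$$
\frac{\partial \ell}{\partial \beta_j} = \sum_{k=1}^{K} \frac{\hat{\beta}_k - \beta_j}{s_k^2} = 0
\quad \Longrightarrow \quad
\hat{\beta}_j = \frac{\sum_{k=1}^{K} \frac{\hat{\beta}_k}{s_k^2}}{\sum_{k=1}^{K} \frac{1}{s_k^2}}
$$

Since $\hat{\beta}_j$ is a linear combination of independent estimates, its variance under the same assumptions is

$$
\mathrm{Var}(\hat{\beta}_j) = \frac{\sum_{k=1}^{K} w_k^2 s_k^2}{\left( \sum_{k=1}^{K} w_k \right)^2} = \frac{1}{\sum_{k=1}^{K} w_k},
\qquad w_k = \frac{1}{s_k^2}
$$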


# Example

In [1]:
rm(list=ls())
set.seed(22)  # For reproducibility

# ---- Define True Effect Size ----
true_effect_size <- 0.5  # True effect size for the causal SNP
true_causal_variant <- 2  # Assume SNP 2 is the causal variant

## Generate and Perform GWAS for Study 1

In this section, we generate the genotype matrix for Study 1, which consists of **100 individuals and 3 variants (SNPs)**. Each individual's genotype is represented by values 0, 1, or 2. The matrix is then standardized, meaning each column (representing a variant) will have a mean of 0 and variance of 1. After that, we simulate the response variable and perform the GWAS analysis by **running OLS regression for each SNP individually**.

Steps:
- Create a random genotype matrix for 100 individuals and 3 variants.
- Standardize the genotype matrix.
- Generate the response variable for 100 individuals.
- Perform OLS regression for each SNP to calculate the estimated effect sizes and standard errors.

In [2]:
# ---- Study 1 (100 individuals) ----
N1 <- 100  # Number of individuals in study 1
M <- 3     # Number of SNPs (variants)

# Create a random genotype matrix (0, 1, 2 values for each SNP)
X_raw1 <- matrix(sample(0:2, N1 * M, replace = TRUE), nrow = N1, ncol = M)

# Adding row and column names for Study 1
rownames(X_raw1) <- paste("Individual", 1:N1)
colnames(X_raw1) <- paste("Variant", 1:M)

# Standardize genotype matrix for Study 1
X1 <- scale(X_raw1, scale = TRUE)

# Simulate phenotype with the true effect size on the causal SNP (SNP 2)
y1 <- true_effect_size * X1[, true_causal_variant] + rnorm(N1, mean = 0, sd = 1)

# Perform GWAS-style analysis for Study 1: Test each SNP independently using OLS
p_values1 <- numeric(M)  # Store p-values for Study 1
betas1 <- numeric(M)     # Store estimated effect sizes for Study 1
se1 <- numeric(M)        # Store standard errors for Study 1

for (j in 1:M) {
  SNP <- X1[, j]  # Extract genotype for SNP j
  model <- lm(y1 ~ SNP)  # OLS regression: Trait ~ SNP
  summary_model <- summary(model)
  
  # Store p-value, effect size (coefficient), and standard error
  p_values1[j] <- summary_model$coefficients[2, 4]  # p-value for SNP effect
  betas1[j] <- summary_model$coefficients[2, 1]     # Estimated beta coefficient
  se1[j] <- summary_model$coefficients[2, 2]        # Standard error of beta
}

# Create results table for Study 1
gwas_results1 <- data.frame(Variant = colnames(X1), Beta = betas1, SE = se1, P_Value = p_values1)
gwas_results1

| Variant | Beta | SE | P_Value |
|---|---|---|---|
| Variant 1 | 0.1366463 | 0.1137774 | 0.232647361 |
| Variant 2 | 0.4059894 | 0.107023 | 0.000257103 |
| Variant 3 | -0.1104953 | 0.1140669 | 0.335085246 |


## Generate and Perform GWAS for Study 2
In this section, we generate the genotype matrix for Study 2, which consists of **500 individuals and 3 variants (SNPs)**. Similar to Study 1, the genotype matrix is standardized. The response variable is also simulated, and we perform the GWAS analysis by running OLS regression for each SNP independently to obtain the effect sizes and standard errors.

The steps are the same as Study 1.
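
Since the per-SNP loop is identical in both studies, it could be wrapped in a small helper. The sketch below is one possible way to do that; `run_gwas` is a hypothetical name that does not appear in the original notebook, and the demo data use an arbitrary sample size and seed.

```r
# Hypothetical helper (not in the original notebook): per-SNP OLS scan
run_gwas <- function(X, y) {
  stats <- t(apply(X, 2, function(snp) {
    # Row 2 of the coefficient table is the SNP term;
    # columns 1, 2, 4 are the estimate, standard error, and p-value
    summary(lm(y ~ snp))$coefficients[2, c(1, 2, 4)]
  }))
  data.frame(Variant = colnames(X), Beta = stats[, 1], SE = stats[, 2],
             P_Value = stats[, 3], row.names = NULL)
}

# Usage on a small simulated data set (sizes and seed chosen arbitrarily)
set.seed(1)
X_demo <- scale(matrix(sample(0:2, 200 * 3, replace = TRUE), nrow = 200,
                       dimnames = list(NULL, paste("Variant", 1:3))))
y_demo <- 0.5 * X_demo[, 2] + rnorm(200)
demo_results <- run_gwas(X_demo, y_demo)
demo_results
```

The helper returns the same columns as the results tables in this notebook, so its output could be passed directly to the meta-analysis step.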

In [3]:
# ---- Study 2 (500 individuals) ----
N2 <- 500  # Number of individuals in study 2

# Create a random genotype matrix (0, 1, 2 values for each SNP)
X_raw2 <- matrix(sample(0:2, N2 * M, replace = TRUE), nrow = N2, ncol = M)

# Adding row and column names for Study 2
rownames(X_raw2) <- paste("Individual", 1:N2)
colnames(X_raw2) <- paste("Variant", 1:M)

# Standardize genotype matrix for Study 2
X2 <- scale(X_raw2, scale = TRUE)

# Simulate phenotype with the true effect size on the causal SNP (SNP 2)
y2 <- true_effect_size * X2[, true_causal_variant] + rnorm(N2, mean = 0, sd = 1)

# Perform GWAS-style analysis for Study 2: Test each SNP independently using OLS
p_values2 <- numeric(M)  # Store p-values for Study 2
betas2 <- numeric(M)     # Store estimated effect sizes for Study 2
se2 <- numeric(M)        # Store standard errors for Study 2

for (j in 1:M) {
  SNP <- X2[, j]  # Extract genotype for SNP j
  model <- lm(y2 ~ SNP)  # OLS regression: Trait ~ SNP
  summary_model <- summary(model)
  
  # Store p-value, effect size (coefficient), and standard error
  p_values2[j] <- summary_model$coefficients[2, 4]  # p-value for SNP effect
  betas2[j] <- summary_model$coefficients[2, 1]     # Estimated beta coefficient
  se2[j] <- summary_model$coefficients[2, 2]        # Standard error of beta
}

# Create results table for Study 2
gwas_results2 <- data.frame(Variant = colnames(X2), Beta = betas2, SE = se2, P_Value = p_values2)
gwas_results2

| Variant | Beta | SE | P_Value |
|---|---|---|---|
| Variant 1 | 0.007480303 | 0.04750693 | 0.8749484 |
| Variant 2 | 0.42885532 | 0.04344779 | 4.162621e-21 |
| Variant 3 | -0.020721752 | 0.04749904 | 0.6628399 |


## Meta-Analysis

After performing the GWAS in both Study 1 and Study 2, we conduct a **meta-analysis** to combine the results from both studies and obtain a more robust estimate of the true effect size. 

The meta-analysis uses **inverse variance weighting**: each study's effect size estimate is weighted by the inverse of its squared standard error, $w_k = 1 / s_k^2$, so studies with more precise estimates (smaller standard errors) contribute more to the combined estimate. The result is a weighted average of the two studies' effect sizes with variance $1 / (w_1 + w_2)$, which is smaller than the variance of either study alone.

In [4]:
# ---- Meta-Analysis (Inverse Variance Weighting) ----

# Calculate weights (inverse of variance)
w1 <- 1 / (se1^2)  # Weights for Study 1
w2 <- 1 / (se2^2)  # Weights for Study 2

# Calculate combined effect size using inverse variance weighting
combined_betas <- (w1 * betas1 + w2 * betas2) / (w1 + w2)

# Calculate variance of the combined effect size
combined_variance <- 1 / (w1 + w2)
combined_se <- sqrt(combined_variance)

# Calculate p-values for each study's effect size
p_value_study1 <- 2 * (1 - pnorm(abs(betas1 / se1)))  # P-value for Study 1
p_value_study2 <- 2 * (1 - pnorm(abs(betas2 / se2)))  # P-value for Study 2

# Calculate p-value for the combined effect size
p_value_combined <- 2 * (1 - pnorm(abs(combined_betas / combined_se)))

# Combine the results
meta_analysis_results <- data.frame(
  Variant = colnames(X1),
  Beta_Study1 = betas1, SE_Study1 = se1, P_Value_Study1 = p_value_study1,
  Beta_Study2 = betas2, SE_Study2 = se2, P_Value_Study2 = p_value_study2,
  Combined_Beta = combined_betas, Combined_SE = combined_se,
  Combined_P_Value = p_value_combined
)

print("Meta-Analysis Results (Inverse Variance Weighting):")
meta_analysis_results


[1] "Meta-Analysis Results (Inverse Variance Weighting):"


| Variant | Beta_Study1 | SE_Study1 | P_Value_Study1 | Beta_Study2 | SE_Study2 | P_Value_Study2 | Combined_Beta | Combined_SE | Combined_P_Value |
|---|---|---|---|---|---|---|---|---|---|
| Variant 1 | 0.1366463 | 0.1137774 | 0.2297526563 | 0.007480303 | 0.04750693 | 0.8748846 | 0.02665618 | 0.04383891 | 0.5431553 |
| Variant 2 | 0.4059894 | 0.107023 | 0.0001485517 | 0.42885532 | 0.04344779 | 0.0 | 0.42562002 | 0.0402569 | 0.0 |
| Variant 3 | -0.1104953 | 0.1140669 | 0.3327006047 | -0.020721752 | 0.04749904 | 0.6626508 | -0.03398814 | 0.0438492 | 0.4382722 |
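
The p-values of exactly 0 for Variant 2 are a floating-point artifact rather than a real result: for a large z-score, `pnorm(abs(z))` rounds to 1 in double precision, so `1 - pnorm(abs(z))` becomes 0. A minimal sketch of a numerically safer form, using the combined estimate and standard error printed above, is:

```r
# Combined estimate and SE for Variant 2, copied from the results above
z <- 0.42562002 / 0.0402569

p_naive  <- 2 * (1 - pnorm(abs(z)))   # pnorm() rounds to 1, so this is 0
p_stable <- 2 * pnorm(-abs(z))        # evaluates the lower tail directly

c(naive = p_naive, stable = p_stable)
```

`pnorm(abs(z), lower.tail = FALSE)` is an equivalent alternative. Note also that these p-values use a normal approximation, which is why the per-study values here (e.g., 0.0001485517 for Variant 2 in Study 1) differ slightly from the t-based p-values reported by `lm()` (0.000257103).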


The fixed effect model assumes that the true effect size of each variant is the same across all studies. For Variant 2, for example, the estimates from the two studies (0.4060 from Study 1 and 0.4289 from Study 2) are very similar, which is consistent with a common true effect (both studies were simulated with the same true effect size of 0.5). Under the fixed effect model, we combine these estimates into a single, more precise effect size (0.4256 for Variant 2), with each study weighted by its precision.

Meta-analysis is essential here because it allows us to combine results from multiple studies, adjusting for differences in sample sizes and precision. Study 2 has more individuals and therefore a more precise estimate of the effect size, so it contributes more to the combined estimate. 
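
How much more Study 2 contributes can be read off the normalized weights. The short sketch below computes them for Variant 2 from the standard errors reported in the results; the variable names are introduced here for illustration only.

```r
se1_v2 <- 0.107023     # SE for Variant 2 in Study 1 (from the results above)
se2_v2 <- 0.04344779   # SE for Variant 2 in Study 2 (from the results above)

w <- 1 / c(se1_v2, se2_v2)^2   # inverse-variance weights
weight_share <- w / sum(w)     # fraction of the total weight per study
round(weight_share, 3)
```

With these inputs, Study 2 carries roughly 86% of the total weight, which is why the combined estimate lies much closer to the Study 2 estimate.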

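The combined effect size is therefore a precision-weighted average of the studies' estimates, with a smaller standard error (0.0403 for Variant 2) than either study alone (0.1070 and 0.0434). Note that the fixed effect model accounts only for the sampling variability within each study; it does not model variability in the true effect between studies, which is what a random effects model (REM) adds.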
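
Whether a common effect size is plausible can also be checked formally. One widely used diagnostic, which is not part of the original analysis and is shown here only as a supplementary sketch, is Cochran's Q: the weighted sum of squared deviations of the study estimates from the combined estimate, compared against a chi-squared distribution with $K - 1$ degrees of freedom. The inputs below are the Variant 2 results from the two studies.

```r
beta_hat <- c(0.4059894, 0.42885532)   # Variant 2 estimates, Study 1 and 2
se       <- c(0.107023, 0.04344779)    # matching standard errors

w        <- 1 / se^2
beta_ivw <- sum(w * beta_hat) / sum(w)  # fixed effect combined estimate

# Cochran's Q: weighted squared deviations from the combined estimate
Q     <- sum(w * (beta_hat - beta_ivw)^2)
df    <- length(beta_hat) - 1
p_het <- pchisq(Q, df = df, lower.tail = FALSE)

c(Q = Q, df = df, p_het = p_het)
```

A large `p_het`, as obtained here, gives no evidence against a shared effect size. With only two studies the test has little power, so it should be read as a sanity check rather than as proof of homogeneity.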