# Intuition


![figure](./cartoons/2_3.svg)

# Notations

## Random Effect Model for Meta-Analysis

In a **random effect meta-analysis**, we assume that the observed effect size $\hat{\beta}_k$ from study $k$ follows a normal distribution:

$$
\hat{\beta}_k \sim N(\beta_k, s_k^2)
$$

Where:
- $\beta_k$ is the **true effect size** of a given SNP in study $k$, which is assumed to vary across studies.
- $s_k^2$ is the **variance** of the observed effect size $\hat{\beta}_k$ for study $k$, reflecting the precision of the estimate in study $k$.
- $\beta_k$ itself follows a normal distribution across studies, with mean $\beta$ and variance $\sigma^2$:

$$
\beta_k \sim N(\beta, \sigma^2)
$$

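Thus, the model has two levels: the true effect size $\beta_k$ is random and varies across studies around the mean effect $\beta$, and the observed effect size $\hat{\beta}_k$ is drawn around that study's own $\beta_k$. Marginally, this gives $\hat{\beta}_k \sim N(\beta, s_k^2 + \sigma^2)$.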
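The two-level sampling scheme can be sketched in a few lines of R. This is only an illustration: the number of studies, the mean effect, and both sets of variances below are made-up values, not quantities taken from the example later in this section.

```r
set.seed(1)  # arbitrary seed, used only for this sketch

K      <- 5                                   # hypothetical number of studies
beta   <- 0.5                                 # hypothetical mean effect across studies
sigma2 <- 0.04                                # hypothetical between-study variance
s2     <- c(0.01, 0.02, 0.005, 0.03, 0.015)   # hypothetical within-study variances s_k^2

# Level 1: each study's true effect varies around the mean effect
beta_k <- rnorm(K, mean = beta, sd = sqrt(sigma2))

# Level 2: each observed effect varies around that study's own true effect
beta_hat_k <- rnorm(K, mean = beta_k, sd = sqrt(s2))

# Marginally, beta_hat_k ~ N(beta, s_k^2 + sigma2)
marginal_var <- s2 + sigma2

data.frame(study = 1:K, beta_k, beta_hat_k, marginal_var)
```

Note that `rnorm()` takes a standard deviation, so variances are passed through `sqrt()`.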

## Combined Estimate of Effect Size (Random Effects Model)

In the **random effect model**, the goal of the meta-analysis is to estimate the **mean effect size** $\beta$ across studies, taking into account both the within-study variance ($s_k^2$) and the between-study variance ($\sigma^2$). The combined estimate is a **weighted average** of the observed effect sizes $\hat{\beta}_k$, with weights that incorporate both sources of variance:

$$
\hat{\beta} = \frac{\sum_{k=1}^{K} w_k \hat{\beta}_k}{\sum_{k=1}^{K} w_k}
$$

Where:
- $w_k = \frac{1}{s_k^2 + \sigma^2}$ is the weight for study $k$, which accounts for both the precision of the estimate and the between-study variability.

The inclusion of $\sigma^2$ in the weight adjusts for the fact that the true effect size $\beta_k$ may differ across studies, unlike the fixed effect model where the effect size is assumed to be constant across all studies.
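A minimal sketch of this weighting, assuming $\sigma^2$ is known, is shown below. The helper `meta_combine()` and its inputs are hypothetical and exist only for illustration. Setting `sigma2 = 0` recovers the fixed effect weights $1/s_k^2$. The standard error uses $\mathrm{Var}(\hat{\beta}) = 1 / \sum_k w_k$, which holds when the weights are the true inverse variances.

```r
# Hypothetical helper: inverse-variance meta-analysis of one variant across K studies.
# beta_hat: observed effect sizes; se: their standard errors (s_k);
# sigma2: between-study variance, assumed known (0 gives the fixed effect model).
meta_combine <- function(beta_hat, se, sigma2 = 0) {
  w       <- 1 / (se^2 + sigma2)             # w_k = 1 / (s_k^2 + sigma^2)
  est     <- sum(w * beta_hat) / sum(w)      # weighted average of the observed effects
  se_comb <- sqrt(1 / sum(w))                # standard error of the combined estimate
  z       <- est / se_comb
  p       <- 2 * pnorm(-abs(z))              # two-sided p-value, normal approximation
  c(beta = est, se = se_comb, z = z, p = p)
}

# Made-up inputs: two studies, the second four times more precise (in variance)
beta_hat <- c(0.30, 0.50)
se       <- c(0.10, 0.05)

fixed  <- meta_combine(beta_hat, se, sigma2 = 0)
random <- meta_combine(beta_hat, se, sigma2 = 0.04)
rbind(fixed = fixed, random = random)
```

As $\sigma^2$ grows relative to the $s_k^2$, the weights become more nearly equal across studies and the combined standard error increases.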



# Example

In [1]:
rm(list=ls())
set.seed(22)  # For reproducibility

# ---- Define Parameters ----
N1 <- 100  # Number of individuals in study 1
N2 <- 500  # Number of individuals in study 2
M <- 3     # Number of SNPs (variants)
sigma2 <- 0.2  # Passed to rnorm() as sd below, so the simulated effect sizes have SD 0.2 (variance 0.04)

# ---- Generate True Effect Sizes for Each Variant ----
true_betas1 <- rnorm(M, mean = 0, sd = sigma2)  # True effect sizes for Study 1
true_betas2 <- rnorm(M, mean = 0, sd = sigma2)  # True effect sizes for Study 2

# Draw the true effect size of the second variant around a mean of 0.5 in each study (causal variant)
true_betas1[2] <- rnorm(1, mean = 0.5, sd = sigma2)
true_betas2[2] <- rnorm(1, mean = 0.5, sd = sigma2)

# ---- Study 1 (100 individuals) ----
X_raw1 <- matrix(sample(0:2, N1 * M, replace = TRUE), nrow = N1, ncol = M)
rownames(X_raw1) <- paste("Individual", 1:N1)
colnames(X_raw1) <- paste("Variant", 1:M)

# Standardize genotype matrix for Study 1
X1 = scale(X_raw1, scale=TRUE)

# Simulate the response vector for Study 1 with true effect size for each variant + random noise
y1 <- X1 %*% true_betas1 + rnorm(N1, mean = 0, sd = 1)

# Perform GWAS for Study 1
p_values1 <- numeric(M)
betas1 <- numeric(M)
se1 <- numeric(M)

for (j in 1:M) {
  SNP <- X1[, j]
  model <- lm(y1 ~ SNP)
  summary_model <- summary(model)
  
  p_values1[j] <- summary_model$coefficients[2, 4]  # p-value
  betas1[j] <- summary_model$coefficients[2, 1]     # Estimated effect size
  se1[j] <- summary_model$coefficients[2, 2]        # Standard error
}

gwas_results1 <- data.frame(Variant = colnames(X1), Beta = betas1, SE = se1, P_Value = p_values1)
gwas_results1

| Variant | Beta | SE | P_Value |
|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Variant 1 | 0.09196543 | 0.1143991 | 0.4234006 |
| Variant 2 | 0.45309364 | 0.1052551 | 3.966895e-05 |
| Variant 3 | 0.10775096 | 0.1142584 | 0.3479765 |


In [3]:
# ---- Study 2 (500 individuals) ----
X_raw2 <- matrix(sample(0:2, N2 * M, replace = TRUE), nrow = N2, ncol = M)
rownames(X_raw2) <- paste("Individual", 1:N2)
colnames(X_raw2) <- paste("Variant", 1:M)

# Standardize genotype matrix for Study 2
X2 = scale(X_raw2, scale=TRUE)

# Simulate the response vector for Study 2 with true effect size for each variant + random noise
y2 <- X2 %*% true_betas2 + rnorm(N2, mean = 0, sd = 1)

# Perform GWAS for Study 2
p_values2 <- numeric(M)
betas2 <- numeric(M)
se2 <- numeric(M)

for (j in 1:M) {
  SNP <- X2[, j]
  model <- lm(y2 ~ SNP)
  summary_model <- summary(model)
  
  p_values2[j] <- summary_model$coefficients[2, 4]  # p-value
  betas2[j] <- summary_model$coefficients[2, 1]     # Estimated effect size
  se2[j] <- summary_model$coefficients[2, 2]        # Standard error
}

gwas_results2 <- data.frame(Variant = colnames(X2), Beta = betas2, SE = se2, P_Value = p_values2)
gwas_results2


| Variant | Beta | SE | P_Value |
|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Variant 1 | 0.1120101 | 0.05256348 | 0.03358244 |
| Variant 2 | 0.5117323 | 0.04756332 | 2.04615e-24 |
| Variant 3 | 0.3663971 | 0.05018506 | 1.138896e-12 |


In [4]:
# ---- Meta-Analysis (Random Effect Model) ----
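# Calculate weights: inverse of (within-study variance + between-study variance)
# 0.2 is an assumed between-study variance, fixed here as an example rather than estimated from the data
# (the simulation above drew the true effects with SD 0.2, i.e. variance 0.04)
w1 <- 1 / (se1^2 + 0.2)
w2 <- 1 / (se2^2 + 0.2)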

# Calculate combined effect size using random effect model (inverse variance weighting)
combined_betas_random <- (w1 * betas1 + w2 * betas2) / (w1 + w2)

# Calculate variance of the combined effect size
combined_variance_random <- 1 / (w1 + w2)
combined_se_random <- sqrt(combined_variance_random)

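# Calculate p-values for each study's effect size using a normal approximation
# (these differ slightly from the t-based p-values reported by lm();
#  1 - pnorm() rounds to 0 when |z| is very large)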
p_value_study1 <- 2 * (1 - pnorm(abs(betas1 / se1)))  # P-value for Study 1
p_value_study2 <- 2 * (1 - pnorm(abs(betas2 / se2)))  # P-value for Study 2

# Calculate p-value for the combined effect size
p_value_random <- 2 * (1 - pnorm(abs(combined_betas_random / combined_se_random)))

# Combine the results
meta_analysis_results_random <- data.frame(
  Variant = colnames(X1),
  Beta_Study1 = betas1, SE_Study1 = se1, P_Value_Study1 = p_value_study1,
  Beta_Study2 = betas2, SE_Study2 = se2, P_Value_Study2 = p_value_study2,
  Combined_Beta = combined_betas_random, Combined_SE = combined_se_random,
  Combined_P_Value = p_value_random
)

print("Meta-Analysis Results (Random Effect Model):")
meta_analysis_results_random


[1] "Meta-Analysis Results (Random Effect Model):"


| Variant | Beta_Study1 | SE_Study1 | P_Value_Study1 | Beta_Study2 | SE_Study2 | P_Value_Study2 | Combined_Beta | Combined_SE | Combined_P_Value |
|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| Variant 1 | 0.09196543 | 0.1143991 | 0.4214549 | 0.1120101 | 0.05256348 | 0.03309326 | 0.1022366 | 0.3223328 | 0.7511092 |
| Variant 2 | 0.45309364 | 0.1052551 | 1.671966e-05 | 0.5117323 | 0.04756332 | 0.0 | 0.4830383 | 0.3213848 | 0.1328415 |
| Variant 3 | 0.10775096 | 0.1142584 | 0.3456574 | 0.3663971 | 0.05018506 | 2.857714e-13 | 0.2403529 | 0.3222213 | 0.4557129 |
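
Variant 2 is strongly associated in both studies, yet its combined p-value is about 0.13. The sketch below recomputes the combined estimate for Variant 2 from the printed summary statistics under three assumed values of the between-study variance, to show how much the result depends on that choice. The values 0 (fixed effect weights) and 0.04 (the square of the SD passed to `rnorm()` in the simulation) are added here for comparison and do not appear in the original analysis. The helper `combine()` is hypothetical.

```r
# Variant 2 summary statistics, copied from the result tables above
beta_hat <- c(0.45309364, 0.5117323)
se       <- c(0.1052551, 0.04756332)

# Hypothetical helper: combined estimate for a given between-study variance
combine <- function(sigma2) {
  w       <- 1 / (se^2 + sigma2)
  est     <- sum(w * beta_hat) / sum(w)
  se_comb <- sqrt(1 / sum(w))
  c(sigma2 = sigma2, beta = est, se = se_comb,
    p = 2 * pnorm(-abs(est / se_comb)))
}

# One row per assumed sigma^2; the last row reproduces the table above
comparison <- t(sapply(c(0, 0.04, 0.2), combine))
comparison

# Because every weight is at most 1 / sigma^2, the combined SE cannot fall
# below sqrt(sigma^2 / K), regardless of how precise each study is
se_floor <- sqrt(0.2 / length(se))
se_floor
```

With only two studies and an assumed $\sigma^2 = 0.2$, the combined standard error cannot drop below $\sqrt{0.2/2} \approx 0.316$, so the between-study term dominates the within-study precision of both GWAS.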
