# Intuition



Meta-analysis 

**[FIXME!!] should we move this to section 2, to introduce this before FEM and REM? or move 2.2 and 2.3 after this lecture?**

# Notations

> slide 154-156 from Xin He's slides
>
> slide 141-145 from Hailiang Huang's slides
>

**Meta-analysis** is a statistical technique used to combine results from multiple independent studies addressing the same research question. By pooling data across studies, it provides a more precise and generalized estimate of the effect, enhancing statistical power, particularly when individual studies have small sample sizes.

## Key Points:

- **Combining Results**: Aggregates findings from different studies to produce a single summary measure of effect, such as the **odds ratio** ($\hat{\theta}$) or **mean difference** ($\hat{\mu}$).
  
- **Effect Size**: Combines **effect sizes** ($\hat{\beta}_i$) from each study $i$, usually weighting each study by the inverse of its **variance** ($1/\hat{\sigma}_i^2$) or, alternatively, by its sample size ($n_i$). With inverse-variance weights, the pooled estimate $\hat{\beta}_{\text{meta}}$ is given by:
  
  $$
  \hat{\beta}_{\text{meta}} = \frac{\sum_i w_i \hat{\beta}_i}{\sum_i w_i}, \quad \text{where} \quad w_i = \frac{1}{\hat{\sigma}_i^2}
  $$

  where $w_i$ represents the weight for each study.

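- **Heterogeneity**: Assesses the variation between study results, usually measured with the **Q-test** or **$I^2$**. Cochran’s $Q = \sum_i w_i (\hat{\beta}_i - \hat{\beta}_{\text{meta}})^2$, computed with the inverse-variance weights $w_i = 1/\hat{\sigma}_i^2$, approximately follows a $\chi^2_{k-1}$ distribution when all studies share one true effect. The **$I^2$ statistic** quantifies the percentage of variation across studies that is due to heterogeneity rather than chance, and is set to zero when $Q < k - 1$:

  $$
  I^2 = \max\left(0, \frac{Q - (k - 1)}{Q}\right) \times 100\%
  $$

  where $Q$ is Cochran’s Q statistic and $k$ is the number of studies.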

- **Models**:
  - **Fixed Effects**: Assumes a single true effect size across all studies. The overall effect size is calculated as the weighted average of the individual study effect sizes.
  
  - **Random Effects**: Assumes that each study has its own true effect size, with variability in the effect size between studies. The random-effects model incorporates both within-study and between-study variability:

    $$
    \hat{\beta}_{\text{meta}} = \frac{\sum_i w_i^* \hat{\beta}_i}{\sum_i w_i^*}, \quad w_i^* = \frac{1}{\hat{\sigma}_i^2 + \tau^2}
    $$

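    where $\tau^2$ represents the between-study variance, which has to be estimated from the data. When $\tau^2 = 0$, the weights reduce to the fixed-effects weights $w_i = 1/\hat{\sigma}_i^2$.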

- **Publication Bias**: Considers potential bias due to the non-publication of null findings. Methods such as **funnel plots** and **Egger’s test** can assess this bias. Funnel plots show each study's effect size against its precision (standard error or sample size); asymmetry suggests possible publication bias, although heterogeneity between studies can also produce it.
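
To make the publication-bias check above concrete, here is a minimal sketch of Egger's regression and a funnel plot in base R. The data are simulated and purely illustrative: the number of studies, the range of standard errors, the true effect of 0.3, and all variable names are assumptions, not values from these notes. Egger's test regresses the standardized effect ($\hat{\beta}_i / \hat{\sigma}_i$) on precision ($1 / \hat{\sigma}_i$); an intercept far from zero suggests funnel-plot asymmetry.

```r
set.seed(1)

# Hypothetical study-level summary statistics (simulated, no publication bias built in)
k    <- 20                                  # assumed number of studies
se   <- runif(k, min = 0.1, max = 0.5)      # assumed standard errors
beta <- rnorm(k, mean = 0.3, sd = se)       # estimates scattered around an assumed true effect

# Egger's regression: standardized effect on precision
z         <- beta / se
precision <- 1 / se
egger_fit <- lm(z ~ precision)

# Intercept row: estimate, standard error, t value, p-value
egger_intercept <- summary(egger_fit)$coefficients["(Intercept)", ]
egger_intercept

# Funnel plot: effect size vs. standard error, most precise studies at the top
plot(beta, se, ylim = rev(range(se)),
     xlab = "Effect size estimate", ylab = "Standard error")
abline(v = sum(beta / se^2) / sum(1 / se^2), lty = 2)  # inverse-variance pooled estimate
```

With only a handful of studies the test has little power, so a non-significant intercept should not be read as evidence that bias is absent.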

## Applications:

In **GWAS**, meta-analysis combines **summary statistics** ($\hat{\beta}_i$, $\hat{\text{SE}}_i$) from multiple studies to provide a more robust estimate of genetic associations with traits (e.g., height). This process increases power and reliability when individual studies are underpowered.
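
When only Z-scores and sample sizes are shared between GWAS cohorts, a sample-size-weighted scheme is a common alternative to inverse-variance weighting. The sketch below assumes all Z-scores are signed with respect to the same effect allele; the function name `meta_z_by_n` and the input values are hypothetical.

```r
# Sample-size-weighted Z-score meta-analysis (illustrative sketch)
meta_z_by_n <- function(z, n) {
  stopifnot(length(z) == length(n), all(n > 0))
  w      <- sqrt(n)                           # weight each study by sqrt(sample size)
  z_meta <- sum(w * z) / sqrt(sum(w^2))       # standard normal under the null
  p_meta <- 2 * pnorm(-abs(z_meta))           # two-sided p-value
  list(z = z_meta, p = p_meta)
}

# Hypothetical per-study Z-scores and sample sizes
z_scores <- c(2.1, 1.4, 2.8)
n_sizes  <- c(5000, 12000, 8000)
meta_z_by_n(z_scores, n_sizes)
```

This approach yields a test statistic but no pooled effect size, so inverse-variance weighting is usually preferred when $\hat{\beta}_i$ and $\hat{\text{SE}}_i$ are available on a common scale.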

## Advantages:

- **Increased Power**: By combining data, meta-analysis improves the ability to detect true effects, particularly when individual studies have limited sample sizes.
  
- **Generalizability**: Results from meta-analysis can be generalized across diverse populations and study designs, making them more widely applicable.

- **Precision**: Pooling data reduces random error and provides more precise estimates of effect sizes.
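
As a rough illustration of the precision gain, assume independent studies, inverse-variance weights $w_i = 1/\hat{\sigma}_i^2$, and variances treated as known. The variance of the pooled estimate is then

$$
\text{Var}(\hat{\beta}_{\text{meta}}) = \frac{\sum_i w_i^2 \hat{\sigma}_i^2}{\left(\sum_i w_i\right)^2} = \frac{1}{\sum_i w_i} \le \min_i \hat{\sigma}_i^2
$$

so the pooled estimate is at least as precise as the most precise single study. In the special case of $k$ studies with a common variance $\hat{\sigma}^2$, the pooled standard error is $\hat{\sigma} / \sqrt{k}$.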

## Challenges:

- **Heterogeneity**: Variations in study designs, populations, or methodologies across studies can lead to heterogeneity, which may introduce bias into the meta-analysis results.
  
- **Publication Bias**: Exclusion of studies with null results, especially those unpublished, can distort meta-analysis findings.


# Example

In [25]:
rm(list=ls())
set.seed(3)

# Simulate true mean and effect size
baseline <- 170  # Population mean of the trait (e.g., height in cm) when the genetic variant has no effect (Model 1)
theta_true <- 2  # True effect size of the genetic variant. This represents the change in height (in cm) associated with each additional minor allele (Model 2)
sd_y <- 1  # Standard deviation of the trait (e.g., variability in height measurement within the population)

# Function to simulate data and generate summary statistics
simulate_data <- function(n, theta_true) {
  genotype <- sample(c(0, 1, 2), size = n, replace = TRUE)  # Randomly assign genotypes (0, 1, 2)
  height_values <- rnorm(n, mean = baseline + theta_true * genotype, sd = sd_y)
  lm_model <- lm(height_values ~ genotype)  # Fit linear model
  # Generate summary statistics
  summary_stats <- data.frame(
    SNP = "rs12345",        # Example SNP identifier
    CHR = 1,                # Example chromosome
    BP = 12345678,          # Example base pair position
    A1 = "A",               # Effect allele (minor allele)
    A2 = "G",               # Other allele (major allele)
    MAF = pmin(mean(genotype) / 2, 1 - mean(genotype) / 2), # Minor allele frequency
    BETA = coef(lm_model)[2],  # Effect size estimate (slope of genotype in the regression model)
    SE = summary(lm_model)$coefficients[2, 2],  # Standard error of BETA
    Z = coef(lm_model)[2] / summary(lm_model)$coefficients[2, 2],  # Z-score
    P_value = summary(lm_model)$coefficients[2, 4],  # P-value for BETA
    N = n                   # Sample size
  )
  return(summary_stats)
}

# Simulate data for three different studies with varying sample sizes and slightly different true effects
study1_stats <- simulate_data(n = 3, theta_true = 2)
study2_stats <- simulate_data(n = 5, theta_true = 1.8)
study3_stats <- simulate_data(n = 4, theta_true = 2.2)
# Combine summary statistics from the three studies into one dataframe
combined_stats <- rbind(study1_stats, study2_stats, study3_stats)
combined_stats

|  | SNP | CHR | BP | A1 | A2 | MAF | BETA | SE | Z | P_value | N |
|---|---|---|---|---|---|---|---|---|---|---|---|
| genotype | rs12345 | 1 | 12345678 | A | G | 0.5 | 1.968497 | 0.796407 | 2.471723 | 0.24474534 | 3 |
| genotype1 | rs12345 | 1 | 12345678 | A | G | 0.4 | 1.441272 | 0.3646564 | 3.952411 | 0.02889836 | 5 |
| genotype2 | rs12345 | 1 | 12345678 | A | G | 0.5 | 2.387378 | 0.4039198 | 5.910524 | 0.02745191 | 4 |


In [26]:
# Perform meta-analysis: fixed-effects (inverse-variance weighted) model
# Weight each study by the inverse of its squared standard error (no between-study variance term)
combined_stats$weight <- 1 / (combined_stats$SE^2)

# Meta-analysis estimate of effect size (BETA) as the weighted average of the study estimates
meta_BETA <- sum(combined_stats$weight * combined_stats$BETA) / sum(combined_stats$weight)

# Standard error for the meta-analysis effect size
meta_SE <- sqrt(1 / sum(combined_stats$weight))

# Z-score for the meta-analysis
meta_Z <- meta_BETA / meta_SE

# P-value for the meta-analysis using Z-score
meta_P_value <- 2 * (1 - pnorm(abs(meta_Z)))

# Meta-analysis summary
meta_summary <- data.frame(
  Meta_BETA = meta_BETA,
  Meta_SE = meta_SE,
  Meta_Z = meta_Z,
  Meta_P_value = meta_P_value
)

meta_summary


| Meta_BETA | Meta_SE | Meta_Z | Meta_P_value |
|---|---|---|---|
| 1.876719 | 0.2562741 | 7.323093 | 2.422507e-13 |
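
The calculation above weights each study by $1/\text{SE}^2$ with no $\tau^2$ term, which matches the fixed-effects formula. As a hedged sketch of the random-effects counterpart, the code below reuses the BETA and SE values from the three-study table and estimates $\tau^2$ with the DerSimonian-Laird moment estimator. That estimator is an assumption made here for illustration, since the notes do not specify one, and the variable names are hypothetical.

```r
# Effect sizes and standard errors copied from the three-study summary table
beta <- c(1.968497, 1.441272, 2.387378)
se   <- c(0.796407, 0.3646564, 0.4039198)
k    <- length(beta)

# Fixed-effects (inverse-variance) estimate
w       <- 1 / se^2
beta_fe <- sum(w * beta) / sum(w)

# Cochran's Q and I^2 (truncated at zero)
Q  <- sum(w * (beta - beta_fe)^2)
I2 <- max(0, (Q - (k - 1)) / Q) * 100

# DerSimonian-Laird moment estimate of tau^2 (assumed estimator, truncated at zero)
c_dl <- sum(w) - sum(w^2) / sum(w)
tau2 <- max(0, (Q - (k - 1)) / c_dl)

# Random-effects weights, pooled estimate, standard error, and two-sided p-value
w_re    <- 1 / (se^2 + tau2)
beta_re <- sum(w_re * beta) / sum(w_re)
se_re   <- sqrt(1 / sum(w_re))
p_re    <- 2 * pnorm(-abs(beta_re / se_re))

data.frame(Q = Q, I2 = I2, tau2 = tau2,
           beta_FE = beta_fe, beta_RE = beta_re, SE_RE = se_re, P_RE = p_re)
```

With only three very small studies, $Q$, $I^2$, and $\tau^2$ are estimated imprecisely, so this is best read as a demonstration of the mechanics rather than a reliable heterogeneity assessment.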
