# Intuition

![figure](./cartoons/2_4.svg)

# Notations

## Percentage of Variance Explained 

In a random effect model, the percentage of variance explained (PVE) measures how much of the phenotypic variance is accounted for by genetic variation: by one variant in the single-marker model, or by all $M$ variants jointly in the multi-marker model recalled below.

Recall:

$$
\mathbf{Y} = \mathbf{X} \boldsymbol{\beta}  + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim N(0, \sigma^2\mathbf{I})
$$


Where:
- $\mathbf{Y}$ is the $N \times 1$ vector of trait values for $N$ individuals
- $\mathbf{X}$ is the $N \times M$ matrix of genotypes for $M$ variants across $N$ individuals
- $\boldsymbol{\beta}$ is the $M \times 1$ vector of effect sizes for $M$ genetic variants
- $\boldsymbol{\epsilon}$ is the $N \times 1$ vector of error terms for $N$ individuals

This model is often used to estimate the proportion of phenotypic variance explained by all measured SNPs in a GWAS, $\frac{\operatorname{Var}(\mathbf{X} \boldsymbol{\beta})}{\operatorname{Var}(\mathbf{Y})}$, where $\operatorname{Var}$ denotes the sample variance (equivalently, $\frac{\sigma_{\mathbf{X} \boldsymbol{\beta}}^2}{\sigma_\mathbf{Y}^2}$). This quantity is commonly referred to as the **proportion of variance in phenotypes explained (PVE)** by the available genotypes, or as **SNP heritability** under the additive model, denoted $h^2_g$.
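As a minimal sketch of this ratio, the following simulates data from the model above and computes the PVE directly from the known genetic values. The sample size, number of variants, allele frequency, and effect-size distribution are all illustrative assumptions, not values taken from this document.

```r
# Minimal PVE sketch; every parameter value here is an illustrative assumption
set.seed(1)
N <- 500                      # individuals (assumed)
M <- 5                        # variants (assumed)

# Genotypes as minor-allele counts (0/1/2), assuming allele frequency 0.3,
# then standardized column-wise to mean 0 and variance 1
X <- scale(matrix(rbinom(N * M, size = 2, prob = 0.3), nrow = N, ncol = M))

beta <- rnorm(M, mean = 0, sd = 0.3)   # effect sizes (assumed distribution)
g <- as.vector(X %*% beta)             # genetic values X %*% beta
eps <- rnorm(N, mean = 0, sd = 1)      # error term
Y <- g + eps                           # phenotype

# PVE = Var(X beta) / Var(Y), using sample variances
pve <- var(g) / var(Y)
pve
```

In real data $\boldsymbol{\beta}$ is unknown, so the PVE has to be estimated rather than computed from the true genetic values as done here.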


## Heritability

- Reference resource: P54 in Tian Ge's slides.

We can decompose a trait into contributions from **genetic** and **environmental** factors:

$$
\mathbf{y} = \mathbf{g} + \mathbf{e}
$$

where:
- $\mathbf{g} = \mathbf{X} \boldsymbol{\beta}$ represents genetic effects.
- $\mathbf{e}$ encompasses all non-genetic (environmental) factors.

### Genetic Component Decomposition
The genetic component $\mathbf{g}$ can be further partitioned based on the nature of the genetic variants:

$$
\mathbf{g} = \mathbf{g}_a + \mathbf{g}_d + \mathbf{g}_i
$$

where:
- $\mathbf{g}_a$ represents **additive genetic effects**.
- $\mathbf{g}_d$ represents **dominance effects**.
- $\mathbf{g}_i$ represents **gene-gene interactions** (**epistasis**).

### Environmental Component Decomposition
Similarly, the environmental component $\mathbf{e}$ can be decomposed into:

$$
\mathbf{e} = \mathbf{e}_c + \mathbf{e}_u
$$

where:
- $\mathbf{e}_c$ represents **common (shared) environmental factors**.
- $\mathbf{e}_u$ represents **unique (independent) environmental factors**.

### Phenotypic Variance Decomposition
Thus, the total phenotypic variance is given by:

\begin{align*}
\sigma_{\mathbf{y}}^2 &= \sigma_{\mathbf{g}}^2 + \sigma_{\mathbf{e}}^2 \\
&= \left( \sigma_{\mathbf{g}_a}^2 + \sigma_{\mathbf{g}_d}^2 + \sigma_{\mathbf{g}_i}^2 \right) + \left( \sigma_{\mathbf{e}_c}^2 + \sigma_{\mathbf{e}_u}^2 \right)
\end{align*}
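This additive split of the variance relies on an assumption that is implicit in the decomposition: the components are uncorrelated with one another. As a sketch for the top-level split, the general identity is

$$
\operatorname{Var}(\mathbf{y}) = \operatorname{Var}(\mathbf{g}) + \operatorname{Var}(\mathbf{e}) + 2\operatorname{Cov}(\mathbf{g}, \mathbf{e}),
$$

so $\sigma_{\mathbf{y}}^2 = \sigma_{\mathbf{g}}^2 + \sigma_{\mathbf{e}}^2$ holds exactly when $\operatorname{Cov}(\mathbf{g}, \mathbf{e}) = 0$. The same argument applies to the terms within $\mathbf{g}$ and within $\mathbf{e}$.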

### Heritability Measures

#### Broad-Sense Heritability ($H^2$)

$$
H^2 = \frac{\sigma_{\mathbf{g}}^2}{\sigma_{\mathbf{y}}^2}
$$

- Represents the proportion of **total phenotypic variance** explained by **all genetic effects** (both additive and non-additive).

#### Narrow-Sense Heritability ($h^2$)

$$
 h^2 = \frac{\sigma_{\mathbf{g}_a}^2}{\sigma_{\mathbf{y}}^2}
$$

- Represents the proportion of **phenotypic variance** explained **only by additive genetic effects**.
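- The additive genetic value $\mathbf{g}_a$ is the sum of the additive effects of all phenotype-contributing alleles, and $h^2$ is the share of phenotypic variance attributable to that sum.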

This decomposition helps distinguish different genetic contributions to a trait and informs genetic studies, especially in quantitative genetics and heritability estimation.
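To make the difference between the two measures concrete, here is a small arithmetic sketch. The variance components are hypothetical numbers chosen only for illustration and are assumed to be uncorrelated.

```r
# Hypothetical variance components (illustrative values, not estimates from data)
var_ga <- 0.30   # additive genetic variance
var_gd <- 0.10   # dominance variance
var_gi <- 0.05   # epistatic (gene-gene interaction) variance
var_ec <- 0.15   # common (shared) environmental variance
var_eu <- 0.40   # unique environmental variance

# Total genetic and total phenotypic variance, assuming uncorrelated components
var_g <- var_ga + var_gd + var_gi
var_y <- var_g + var_ec + var_eu

H2 <- var_g / var_y    # broad-sense: all genetic effects  (0.45 here)
h2 <- var_ga / var_y   # narrow-sense: additive effects only (0.30 here)

cat("H^2 =", H2, " h^2 =", h2, "\n")
```

Since $\sigma_{\mathbf{g}_a}^2 \le \sigma_{\mathbf{g}}^2$, narrow-sense heritability can never exceed broad-sense heritability.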



# Example

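The simulation uses a heritability parameter `h2_total` of 0.55, which fixes the environmental variance at $1 - 0.55 = 0.45$.
- The first two variants ($X_1, X_2$) contribute additive genetic variance: 10% and 20% of `h2_total`, respectively.
- The third variant ($X_3$) contributes dominance genetic variance: 25% of `h2_total`.
- Because these fractions sum to 0.55 rather than 1, the simulated genetic variance is $0.55 \times 0.55 = 0.3025$, so the heritability realized in the simulation is lower than 0.55.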

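First, we simulate genotype data for 100 individuals and 3 SNPs. Each genotype is drawn at random, with equal probability, as 0 (homozygous major), 1 (heterozygous), or 2 (homozygous minor). All three columns of the genotype matrix are then standardized to mean 0 and variance 1, and the third variant is additionally recoded under a dominant model and standardized.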
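To make the two codings concrete before the full simulation, here is a toy sketch on six hypothetical genotypes. The vector `geno` and the other names are illustrative only.

```r
# Toy genotype vector (hypothetical values): minor-allele counts for six individuals
geno <- c(0, 1, 2, 1, 0, 2)

# Additive coding: the allele count itself
additive <- geno

# Dominant coding: 1 if at least one minor allele is present, else 0
dominant <- as.integer(geno > 0)

# Standardize each coding to mean 0 and (sample) variance 1
additive_std <- as.vector(scale(additive))
dominant_std <- as.vector(scale(dominant))

print(rbind(additive, dominant))
```

Under the dominant coding, heterozygotes and minor-allele homozygotes are indistinguishable, which is what separates it from the additive allele count.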


In [1]:
rm(list=ls())
set.seed(24)  # For reproducibility

# ---- Define True Effect Sizes ----
h2_total <- 0.55  # Total heritability
h2_additive <- c(0.10, 0.20) * h2_total  # Contributions from first two variants (additive)
h2_dominant <- 0.25 * h2_total  # Contribution from third variant (dominant)

# Number of individuals and SNPs
N <- 100   # Number of individuals
M <- 3     # Number of SNPs (variants)

# Create a random genotype matrix (0, 1, 2 values for each SNP)
X_raw <- matrix(sample(0:2, N * M, replace = TRUE), nrow = N, ncol = M)

# Add row and column names 
rownames(X_raw) <- paste("Individual", 1:N)
colnames(X_raw) <- paste("Variant", 1:M)

# Standardize genotype matrix for additive effects (mean 0, variance 1)
X <- scale(X_raw, scale = TRUE)

# Convert third variant (X3) to dominant encoding (0 for homozygous major, 1 for heterozygous or homozygous minor)
X_dominant <- ifelse(X_raw[,3] > 0, 1, 0)  
X_dominant <- scale(X_dominant, scale = TRUE)  # Standardize

Next, we compute the effect size of each genetic factor from its predefined variance contribution and construct the genetic component ($\mathbf{g}$):
- The first two variants ($X_1, X_2$) contribute additively.
- The third variant ($X_3$) contributes through its dominant coding.

We then add environmental noise to account for non-genetic factors and standardize the phenotype ($\mathbf{y}$) so that it has mean 0 and variance 1.
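The square root in the effect-size calculation follows from the standardization. As a sketch, assume each standardized predictor $X_j$ has unit variance and write $h_j^2$ for the target variance contribution of factor $j$ (notation introduced here only for illustration):

$$
\operatorname{Var}(X_j \beta_j) = \beta_j^2 \operatorname{Var}(X_j) = \beta_j^2
\quad \Longrightarrow \quad
\beta_j = \sqrt{h_j^2}.
$$

The contributions add up to the variance of $\mathbf{g}$ only to the extent that the predictors are uncorrelated in the sample.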



In [2]:
# Compute effect sizes based on heritability contributions
beta_additive <- sqrt(h2_additive)
beta_dominant <- sqrt(h2_dominant)

# Generate phenotype
g <- X[, 1:2] %*% beta_additive + X_dominant * beta_dominant  # Genetic component
e <- rnorm(N, mean = 0, sd = sqrt(1 - h2_total))  # Environmental noise

y <- g + e  # Total phenotype

# Standardize phenotype (mean 0, variance 1)
y <- scale(y, scale = TRUE)


Now let's fit two linear regression models to estimate:

- Additive genetic variance using only $X_1$ and $X_2$.
- Total genetic variance (additive + dominant) by including $X_3$.

In [3]:
# ---- Estimate Heritability ----

# Fit linear models
model_additive <- lm(y ~ X[,1] + X[,2])  # Only additive effects
model_full <- lm(y ~ X[,1] + X[,2] + X_dominant)  # Additive + dominant effects

# Compute variance components as sum_j beta_hat_j^2 * Var(X_j)
# (this ignores covariances between predictors)
var_additive <- sum(coef(model_additive)[-1]^2 * apply(X[,1:2], 2, var))
var_total <- var(as.vector(y))

var_genetic <- sum(coef(model_full)[-1]^2 * c(apply(X[,1:2], 2, var), var(X_dominant)))


- Narrow-sense heritability ($h^2$): Calculates variance explained by additive effects ($X_1$ and $X_2$).
- Broad-sense heritability ($H^2$): Includes both additive and dominant effects ($X_1, X_2$ and $X_3$).

In [4]:
# Narrow-sense heritability (h^2) - Only additive variance
h2_narrow <- var_additive / var_total

# Broad-sense heritability (H^2) - Includes both additive and dominant variance
h2_broad <- var_genetic / var_total

# Print results
cat("Estimated Narrow-Sense Heritability (h^2):", h2_narrow, "\n")
cat("Estimated Broad-Sense Heritability (H^2):", h2_broad, "\n")

Estimated Narrow-Sense Heritability (h^2): 0.2318079 
Estimated Broad-Sense Heritability (H^2): 0.4598361 
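For comparison, the values these estimates should be close to can be worked out from the simulation settings alone. This is a back-of-the-envelope sketch that assumes the genetic and environmental components are uncorrelated and, as in the example, counts the dominant-coded third variant as non-additive; the `var_*` and `expected_*` names are introduced here for illustration.

```r
# Settings copied from the simulation above
h2_total <- 0.55

var_add <- sum(c(0.10, 0.20) * h2_total)   # additive variance: 0.055 + 0.11 = 0.165
var_dom <- 0.25 * h2_total                 # dominance variance: 0.1375
var_env <- 1 - h2_total                    # environmental variance: 0.45

# Phenotypic variance before standardization, assuming uncorrelated components
var_pheno <- var_add + var_dom + var_env   # 0.7525

expected_h2 <- var_add / var_pheno               # about 0.219
expected_H2 <- (var_add + var_dom) / var_pheno   # about 0.402

cat("Expected h^2:", expected_h2, "\n")
cat("Expected H^2:", expected_H2, "\n")
```

The estimates of roughly 0.23 and 0.46 reported above differ from these expected values by an amount that is plausible for sampling variation with only 100 individuals.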


# Supplementary

- Zhu H, Zhou X. Statistical methods for SNP heritability estimation and partition: A review. Comput Struct Biotechnol J. 2020 Jun 18;18:1557-1568. doi: 10.1016/j.csbj.2020.06.011. PMID: 32637052; PMCID: PMC7330487.