# Intuition

![figure_1](./cartoons/1_1.svg)

![figure](https://www.frontiersin.org/files/Articles/127738/fbioe-03-00013-HTML-r1/image_m/fbioe-03-00013-g001.jpg)
> Figure 1. Common genetic variations. Variations at the (A) nucleotide level and (B) structural level. (C) Single nucleotide polymorphism A/T across a population.
> 
> Cardoso JGR, Andersen MR, Herrgård MJ and Sonnenschein N (2015) Analysis of genetic variation and potential applications in genome-scale metabolic modeling. Front. Bioeng. Biotechnol. 3:13. doi: 10.3389/fbioe.2015.00013

# Notations

## Genotype

### Raw genotype matrix $\mathbf{X}_\text{raw}$

In the lectures, $\mathbf{X}_\text{raw}$ denotes an $N \times M$ genotype matrix. For diploid individuals, each entry $x_{ij} \in \{0,1,2\}$ is the genotype of individual $i=1,\dots,N$ at variant $j=1,\dots,M$, assuming each variant is biallelic:

\begin{equation*}
\mathbf{X}_\text{raw} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1M} \\
x_{21} & x_{22} & \cdots & x_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{NM}
\end{bmatrix}
\end{equation*}

- Rows ($i = 1, \dots, N$) correspond to individuals.
- Columns ($j = 1, \dots, M$) correspond to variants.
- Each entry $x_{ij}$ represents the genotype of individual $i$ for variant $j$, where:
  - $0$: Homozygous for the reference allele.
  - $1$: Heterozygous.
  - $2$: Homozygous for the alternative allele.
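The 0/1/2 coding above is simply the number of copies of the alternative allele each individual carries. As a minimal sketch, assuming genotype calls are stored as strings such as `"A/T"` (the call format, the choice of `T` as the alternative allele, and the helper `count_alt` are all hypothetical), the coding can be reproduced as:

```r
# Hypothetical genotype calls for four individuals at one biallelic variant.
# Assumption: reference allele is "A", alternative allele is "T".
calls <- c("A/A", "A/T", "T/A", "T/T")
alt_allele <- "T"

# Count how many of the two alleles in a call match the alternative allele
count_alt <- function(gt, alt) {
  alleles <- strsplit(gt, "/", fixed = TRUE)[[1]]
  sum(alleles == alt)
}

dosage <- vapply(calls, count_alt, integer(1), alt = alt_allele, USE.NAMES = FALSE)
dosage
```

Both heterozygous calls (`"A/T"` and `"T/A"`) map to 1, since the coding ignores which chromosome carries the alternative allele.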

### Standardized genotype matrix $\mathbf{X}$

We often standardize the genotype matrix $\mathbf{X}_\text{raw}$ so that each variant (column) has mean 0 and variance 1. Working with mean-centered genotypes and phenotypes makes life a lot easier because we no longer have to estimate intercept terms.
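Written out, and assuming the sample standard deviation with an $N-1$ denominator (the convention of R's `scale()`), the standardized entry is as follows. The symbols $x^{\text{std}}_{ij}$, $\bar{x}_j$ and $s_j$ are introduced here only for illustration:

$$
x^{\text{std}}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad
s_j^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_{ij} - \bar{x}_j\right)^2
$$

This is undefined for a variant with no variation ($s_j = 0$). To see why the intercept drops out, take a simple linear regression of a phenotype $y$ on a single variant $x$ (notation assumed here): the least-squares intercept is $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, which is exactly $0$ once both $\bar{y} = 0$ and $\bar{x} = 0$.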

## MAF

The **minor allele frequency (MAF)** is a fundamental concept in statistical genetics that quantifies the frequency of the less common allele at a given genetic locus in a population.  

Given a genotype matrix $\mathbf{X}_\text{raw}$ as above, the **allele frequency** of the alternative allele at variant $j$ is the mean of column $j$, $\mathbf{X}_{\text{raw},\cdot,j}$, divided by $2$, because each diploid individual carries two copies:  

$$
f_j = \frac{\mathbb{E}[X_{\text{raw},\cdot,j}]}{2} = \frac{1}{2N} \sum_{i=1}^{N} X_{\text{raw},ij}
$$

where $N$ is the total number of individuals.  

The **minor allele frequency (MAF)** is defined as:  

$$
\min(f_j, 1 - f_j)
$$

ensuring that it always represents the frequency of the **less** common allele in the population.


## Hardy-Weinberg Equilibrium (HWE)

**Hardy-Weinberg equilibrium (HWE)** is a principle in population genetics that describes the relationship between allele frequencies and genotype frequencies in a non-evolving population. Under HWE, allele and genotype frequencies in a population remain constant from generation to generation, assuming there is no selection, mutation, migration, genetic drift, or non-random mating.

Consider a biallelic locus with alleles $A$ (major allele) and $a$ (minor allele), whose frequencies are $f_A$ and $f_a = 1 - f_A$.

Under HWE, the genotype frequencies are

$$
P(AA) = f_A^2
$$
$$
P(Aa) = 2f_Af_a
$$
$$
P(aa) = f_a^2
$$

These frequencies must satisfy the equation:

$$
f_A^2 + 2f_A f_a + f_a^2 = 1
$$

In the absence of selection, mutation, genetic drift, or other forces, allele frequencies $f_A$ and $f_a$ are constant between generations, so equilibrium is reached.
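A small numerical sketch can make this concrete. Assuming an arbitrary starting population that is not in HWE (the starting frequencies and the helper names `allele_freq_A` and `random_mating` are hypothetical), one round of random mating produces HWE proportions, and further rounds change nothing:

```r
# Hypothetical starting genotype frequencies (AA, Aa, aa); not in HWE
geno <- c(AA = 0.6, Aa = 0.2, aa = 0.2)

# Frequency of allele A: all alleles of AA plus half the alleles of Aa
allele_freq_A <- function(g) unname(g["AA"] + 0.5 * g["Aa"])

# One generation of random mating with no selection, mutation, migration or drift
random_mating <- function(g) {
  f_A <- allele_freq_A(g)
  f_a <- 1 - f_A
  c(AA = f_A^2, Aa = 2 * f_A * f_a, aa = f_a^2)
}

gen1 <- random_mating(geno)
gen2 <- random_mating(gen1)

rbind(start = geno, gen1 = gen1, gen2 = gen2)
c(start = allele_freq_A(geno), gen1 = allele_freq_A(gen1), gen2 = allele_freq_A(gen2))
```

The allele frequency stays the same across all three rows, while the genotype frequencies settle after a single generation.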

**An example of when HWE may not hold**

Selection: If natural or artificial selection is acting on the population, certain genotypes may have a higher or lower fitness, which alters the allele frequencies over time. For example, individuals with the homozygous dominant genotype may have higher survival rates, leading to an increase in the major allele frequency.
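To illustrate how selection shifts allele frequencies, here is a minimal sketch under strong assumptions: newborns are in exact HWE with $f_a = 0.4$, and the relative fitness values are hypothetical, with $aa$ individuals never surviving.

```r
# Illustrative allele frequencies at birth (assumed values)
f_a <- 0.4
f_A <- 1 - f_a

# Genotype frequencies at birth under HWE
at_birth <- c(AA = f_A^2, Aa = 2 * f_A * f_a, aa = f_a^2)

# Hypothetical relative fitness: aa individuals do not survive
fitness <- c(AA = 1, Aa = 1, aa = 0)

# Genotype frequencies among survivors, renormalised to sum to 1
survivors <- at_birth * fitness / sum(at_birth * fitness)

# Frequency of allele a among survivors
f_a_after <- unname(survivors["aa"] + 0.5 * survivors["Aa"])

survivors
c(before = f_a, after = f_a_after)
```

The minor allele frequency drops within a single generation, and the survivors no longer show HWE proportions because the $aa$ class is empty.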

# Example

## Genotype and scaled genotype

In [1]:
rm(list=ls())
# Genotype matrix for 5 individuals and 3 variants
# Rows correspond to individuals, columns to variants
N=5
M=3
X_raw <- matrix(c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 2, 1, 0, 0, 2), 
                    nrow = N, ncol = M, byrow = TRUE)
# Adding row and column names
rownames(X_raw) <- paste("Individual", 1:N)
colnames(X_raw) <- paste("Variant", 1:M)
X_raw

| | Variant 1 | Variant 2 | Variant 3 |
|---|---|---|---|
| Individual 1 | 0 | 1 | 1 |
| Individual 2 | 2 | 2 | 0 |
| Individual 3 | 1 | 1 | 1 |
| Individual 4 | 0 | 2 | 1 |
| Individual 5 | 0 | 0 | 2 |


In [2]:
# standardize genotype matrix
X = scale(X_raw, scale=TRUE)
X

| | Variant 1 | Variant 2 | Variant 3 |
|---|---|---|---|
| Individual 1 | -0.6708204 | -0.2390457 | 0.0 |
| Individual 2 | 1.5652476 | 0.9561829 | -1.414214 |
| Individual 3 | 0.4472136 | -0.2390457 | 0.0 |
| Individual 4 | -0.6708204 | 0.9561829 | 0.0 |
| Individual 5 | -0.6708204 | -1.4342743 | 1.414214 |


## MAF

In [3]:
# MAF Calculation for each variant
MAF <- apply(X_raw, 2, function(x) min(mean(x) / 2, 1 - mean(x) / 2))
# Print MAF
cat("Minor Allele Frequencies (MAF):\n")
for (j in 1:ncol(X_raw)) {
  cat(paste("MAF for Variant", j, ":", MAF[j]), "\n")
}


Minor Allele Frequencies (MAF):


MAF for Variant 1 : 0.3 
MAF for Variant 2 : 0.4 
MAF for Variant 3 : 0.5 
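With only five individuals these estimates are very noisy. As a rough sketch of how noisy, one can attach a binomial standard error to each allele frequency estimate. This assumes the $2N$ sampled alleles are independent draws (unrelated individuals and HWE), which is an assumption made here rather than something established above:

```r
# Same toy genotype matrix as in the example above
X_raw <- matrix(c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 2, 1, 0, 0, 2),
                nrow = 5, ncol = 3, byrow = TRUE)
N <- nrow(X_raw)

# Estimated alternative-allele frequency for each variant
f_hat <- colMeans(X_raw) / 2

# Binomial standard error, treating the 2N alleles as independent draws
se_f <- sqrt(f_hat * (1 - f_hat) / (2 * N))

round(cbind(f_hat = f_hat, se = se_f), 3)
```

The standard errors are roughly 0.15, so the three estimates cannot really be distinguished from one another at this sample size.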


## HWE

> This data is from E. B. Ford (1971) on the scarlet tiger moth, for which the phenotypes of a sample of the population were recorded. ([wiki reference](https://en.wikipedia.org/wiki/Hardy–Weinberg_principle))
>
> **Table 3: Example Hardy–Weinberg Principle Calculation**
>
> | Phenotype          | White-spotted (AA) | Intermediate (Aa) | Little spotting (aa) | Total |
> |--------------------|-------------------|-------------------|----------------------|-------|
> | Number        | 1469              | 138               | 5                    | 1612  |
>
> From this, allele frequencies can be calculated:



In [6]:
N_AA=1469
N_Aa=138
N_aa=5
N_all = N_AA + N_Aa + N_aa
freq_A = (2*N_AA + 1*N_Aa )/ (2*N_all)
freq_A
freq_a = 1-freq_A
freq_a

So the Hardy-Weinberg expectation is

In [7]:
expected_AA = freq_A*freq_A* N_all
expected_Aa = 2*freq_A*freq_a* N_all
expected_aa = freq_a*freq_a* N_all
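Since the cells above do not display their results, the arithmetic can be followed by hand (values rounded):

$$
\begin{aligned}
f_A &= \frac{2 \times 1469 + 138}{2 \times 1612} = \frac{3076}{3224} \approx 0.9541, \qquad f_a = 1 - f_A \approx 0.0459 \\
E_{AA} &= f_A^2 \cdot N \approx 1467.4 \\
E_{Aa} &= 2 f_A f_a \cdot N \approx 141.2 \\
E_{aa} &= f_a^2 \cdot N \approx 3.4
\end{aligned}
$$

The three expected counts add up to $N = 1612$, as they should.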

One can use Pearson's chi-squared test to test if HWE holds:

$$
\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}
$$

where:  
- $O_i$ = Observed genotype count (AA, Aa, aa).  
- $E_i$ = Expected genotype count under Hardy-Weinberg Equilibrium.  
- The summation runs over all genotype categories.  

The expected counts under HWE are:

$$
E_{AA} = f_A^2 \cdot N
$$

$$
E_{Aa} = 2f_A f_a \cdot N
$$

$$
E_{aa} = f_a^2 \cdot N
$$

where:  
- $f_A$: frequency of allele A
- $f_a = 1 - f_A$: frequency of allele a
- $N$ = Total number of individuals.

To determine statistical significance, compare $\chi^2$ with a **chi-square distribution** with **1 degree of freedom** (df = 1). The p-value is computed as:

$$
p = P(\chi^2 > \text{observed } \chi^2)
$$

If $p < 0.05$, we reject the **Hardy-Weinberg Equilibrium** assumption.


In [8]:
chi_sq = (N_AA - expected_AA)^2/expected_AA + (N_Aa - expected_Aa)^2/expected_Aa + (N_aa - expected_aa)^2/expected_aa
chi_sq


There is 1 degree of freedom (degrees of freedom for test for Hardy–Weinberg proportions are number of genotypes minus number of alleles). The 5% significance level for 1 degree of freedom is 3.84, and since the $\chi^2$ value is less than this, the null hypothesis that the population is in Hardy–Weinberg frequencies is not rejected.
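The 3.84 threshold and the corresponding p-value can be obtained from R's chi-squared distribution functions. The sketch below recomputes everything from the moth counts so that it runs on its own; the variable names `observed`, `expected`, `critical_value` and `p_value` are introduced here for illustration.

```r
# Observed genotype counts from the scarlet tiger moth example
observed <- c(AA = 1469, Aa = 138, aa = 5)
N_all <- sum(observed)

# Allele frequencies estimated from the counts
freq_A <- unname((2 * observed["AA"] + observed["Aa"]) / (2 * N_all))
freq_a <- 1 - freq_A

# Expected counts under HWE
expected <- N_all * c(AA = freq_A^2, Aa = 2 * freq_A * freq_a, aa = freq_a^2)

# Pearson chi-squared statistic
chi_sq <- sum((observed - expected)^2 / expected)

# 5% critical value and p-value with 1 degree of freedom
critical_value <- qchisq(0.95, df = 1)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

c(chi_sq = chi_sq, critical_value = critical_value, p_value = p_value)
```

The statistic is about 0.83, well below the critical value, so the p-value is far above 0.05.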

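One can also use `chisq.test` to compute the $\chi^2$ statistic directly. Two caveats apply. First, `chisq.test` does not know that the allele frequency was estimated from the same data, so its p-value is based on 2 degrees of freedom rather than 1, which is why only the statistic is reported here. Second, it warns that the chi-squared approximation may be incorrect because the expected count for aa (about 3.4) is below 5.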

In [9]:
# Given genotype counts
obs_counts <- c(N_AA, N_Aa, N_aa)  # Observed counts for AA, Aa, aa

# Compute expected genotype counts under HWE
exp_counts <- c(freq_A^2, 2 * freq_A * freq_a, freq_a^2) * N_all

# Perform chi-square test
chi_sq_test <- chisq.test(x = obs_counts, p = exp_counts / N_all, rescale.p = TRUE)

# Print results
# print(chi_sq_test)
chi_sq_test$statistic

"Chi-squared approximation may be incorrect"


# **TODO**
- [ ] How to remove the warning message: is the sample too small, or is something wrong in the code?
- [ ] How to make HWE fail for variant 2 after selection.
- [ ] MAF estimate and the standard error of the estimation?

# Supplementary

> MAF estimate: slide 40-43 from GW
>
> HWE: slide 45-47 from GW

In [19]:
# Function to calculate the expected genotype frequencies under HWE
HWE_expected <- function(f_A, f_a) {
  c(f_A^2, 2*f_A*f_a, f_a^2)  # P(AA), P(Aa), P(aa)
}

# Calculate allele frequencies
f_A1 <- mean(X_raw[,1] == 0) + 0.5 * mean(X_raw[,1] == 1) # Variant 1
f_a1 <- 1 - f_A1
f_A2 <- mean(X_raw[,2] == 0) + 0.5 * mean(X_raw[,2] == 1) # Variant 2
f_a2 <- 1 - f_A2
f_A3 <- mean(X_raw[,3] == 0) + 0.5 * mean(X_raw[,3] == 1) # Variant 3
f_a3 <- 1 - f_A3

# Expected genotypic frequencies for each variant under HWE
hwe_expected_variant1 <- HWE_expected(f_A1, f_a1)
hwe_expected_variant2 <- HWE_expected(f_A2, f_a2)
hwe_expected_variant3 <- HWE_expected(f_A3, f_a3)
cat("Expected Genotype Frequencies for Variant 1 (HWE):", hwe_expected_variant1, "\n")
cat("Observed Genotype Frequencies for Variant 1:", table(X_raw[,1])/nrow(X_raw), "\n")
cat("Expected Genotype Frequencies for Variant 2 (HWE):", hwe_expected_variant2, "\n")
cat("Observed Genotype Frequencies for Variant 2:", table(X_raw[,2])/nrow(X_raw), "\n")
cat("Expected Genotype Frequencies for Variant 3 (HWE):", hwe_expected_variant3, "\n")
cat("Observed Genotype Frequencies for Variant 3:", table(X_raw[,3])/nrow(X_raw), "\n")


Expected Genotype Frequencies for Variant 1 (HWE): 0.49 0.42 0.09 
Observed Genotype Frequencies for Variant 1: 0.6 0.2 0.2 
Expected Genotype Frequencies for Variant 2 (HWE): 0.16 0.48 0.36 
Observed Genotype Frequencies for Variant 2: 0.2 0.4 0.4 
Expected Genotype Frequencies for Variant 3 (HWE): 0.25 0.5 0.25 
Observed Genotype Frequencies for Variant 3: 0.2 0.6 0.2 


The expected and observed frequencies look reasonably close given that there are only five individuals, so informally we can say **HWE roughly holds for all three variants**.

More formally one can test with a **chi-squared test** to compare the observed genotype frequencies with the expected frequencies under HWE. 

In [20]:
# chisq.test() needs genotype counts, not proportions: passing proportions
# makes it behave as if the sample size were 1.
# Note: chisq.test() uses df = 2 here, whereas the HWE test has df = 1, so
# the p-values below are conservative.

# Chi-squared test for deviation from Hardy-Weinberg equilibrium for Variant 1
observed_counts_variant1 <- table(factor(X_raw[,1], levels = 0:2))
chisq_test_variant1 <- chisq.test(observed_counts_variant1, p = hwe_expected_variant1)

# Chi-squared test for deviation from Hardy-Weinberg equilibrium for Variant 2
observed_counts_variant2 <- table(factor(X_raw[,2], levels = 0:2))
chisq_test_variant2 <- chisq.test(observed_counts_variant2, p = hwe_expected_variant2)

# Chi-squared test for deviation from Hardy-Weinberg equilibrium for Variant 3
observed_counts_variant3 <- table(factor(X_raw[,3], levels = 0:2))
chisq_test_variant3 <- chisq.test(observed_counts_variant3, p = hwe_expected_variant3)

# Interpretation of results
if (chisq_test_variant1$p.value < 0.05) {
  cat("\nVariant 1: HWE does not hold.\n")
} else {
  cat("\nVariant 1: HWE holds.\n")
}

if (chisq_test_variant2$p.value < 0.05) {
  cat("\nVariant 2: HWE does not hold.\n")
} else {
  cat("\nVariant 2: HWE holds.\n")
}

if (chisq_test_variant3$p.value < 0.05) {
  cat("\nVariant 3: HWE does not hold.\n")
} else {
  cat("\nVariant 3: HWE holds.\n")
}

“Chi-squared approximation may be incorrect”
“Chi-squared approximation may be incorrect”
“Chi-squared approximation may be incorrect”



Variant 1: HWE holds.

Variant 2: HWE holds.

Variant 3: HWE holds.
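The three near-identical blocks above can be folded into one function. Below is a minimal sketch of such a helper (the name `hwe_chisq` is hypothetical). It works on genotype counts and uses 1 degree of freedom, and it assumes both alleles are observed so that no expected count is zero:

```r
# Hypothetical helper: chi-squared HWE test for one vector of 0/1/2 genotypes
hwe_chisq <- function(g) {
  counts <- as.vector(table(factor(g, levels = 0:2)))  # n_AA, n_Aa, n_aa
  n <- sum(counts)
  f_A <- (2 * counts[1] + counts[2]) / (2 * n)         # frequency of allele A
  f_a <- 1 - f_A
  expected <- n * c(f_A^2, 2 * f_A * f_a, f_a^2)       # expected counts under HWE
  stat <- sum((counts - expected)^2 / expected)
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

# Same toy genotype matrix as in the example above
X_raw <- matrix(c(0, 1, 1, 2, 2, 0, 1, 1, 1, 0, 2, 1, 0, 0, 2),
                nrow = 5, ncol = 3, byrow = TRUE)

# One row per variant
res <- t(apply(X_raw, 2, hwe_chisq))
res
```

With five individuals the chi-squared approximation is still unreliable, so these p-values are only illustrative.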


Now assume that **individuals carrying two copies of the risk allele at variant 2 (genotype 2) do not survive after birth**. The population then becomes:

In [135]:
# Keep only individuals whose genotype at variant 2 is not 2
genotypes_selected <- X_raw[X_raw[, 2] != 2, ]

In [136]:
# Calculate allele frequencies
f_A1_selected <- mean(genotypes_selected[,1] == 0) + 0.5 * mean(genotypes_selected[,1] == 1) # Variant 1
f_a1_selected <- 1 - f_A1_selected
f_A2_selected <- mean(genotypes_selected[,2] == 0) + 0.5 * mean(genotypes_selected[,2] == 1) # Variant 2
f_a2_selected <- 1 - f_A2_selected
f_A3_selected <- mean(genotypes_selected[,3] == 0) + 0.5 * mean(genotypes_selected[,3] == 1) # Variant 3
f_a3_selected <- 1 - f_A3_selected


# Expected genotypic frequencies for each variant under HWE
hwe_expected_variant1_selected <- HWE_expected(f_A1_selected, f_a1_selected)
hwe_expected_variant2_selected <- HWE_expected(f_A2_selected, f_a2_selected)
hwe_expected_variant3_selected <- HWE_expected(f_A3_selected, f_a3_selected)
cat("===========After selection===========\n")
cat("Expected Genotype Frequencies for Variant 1 (HWE):", hwe_expected_variant1_selected, "\n")
# factor(..., levels = 0:2) keeps genotype classes with zero individuals,
# so observed and expected frequencies line up as (0, 1, 2)
cat("Observed Genotype Frequencies for Variant 1:", table(factor(genotypes_selected[,1], levels = 0:2))/nrow(genotypes_selected), "\n")
cat("Expected Genotype Frequencies for Variant 2 (HWE):", hwe_expected_variant2_selected, "\n")
cat("Observed Genotype Frequencies for Variant 2:", table(factor(genotypes_selected[,2], levels = 0:2))/nrow(genotypes_selected), "\n")
cat("Expected Genotype Frequencies for Variant 3 (HWE):", hwe_expected_variant3_selected, "\n")
cat("Observed Genotype Frequencies for Variant 3:", table(factor(genotypes_selected[,3], levels = 0:2))/nrow(genotypes_selected), "\n")

Expected Genotype Frequencies for Variant 1 (HWE): 0.6944444 0.2777778 0.02777778 
Observed Genotype Frequencies for Variant 1: 0.6666667 0.3333333 0 
Expected Genotype Frequencies for Variant 2 (HWE): 0.4444444 0.4444444 0.1111111 
Observed Genotype Frequencies for Variant 2: 0.3333333 0.6666667 0 
Expected Genotype Frequencies for Variant 3 (HWE): 0.1111111 0.4444444 0.4444444 
Observed Genotype Frequencies for Variant 3: 0 0.6666667 0.3333333 


Then we can use the **chi-squared test** again

In [137]:
# chisq.test() needs genotype counts, not proportions: passing proportions
# makes it behave as if the sample size were 1.
# Note: chisq.test() uses df = 2 here, whereas the HWE test has df = 1.
observed_counts_variant1_selected <- table(factor(genotypes_selected[,1], levels = 0:2))
observed_counts_variant2_selected <- table(factor(genotypes_selected[,2], levels = 0:2))
observed_counts_variant3_selected <- table(factor(genotypes_selected[,3], levels = 0:2))

# Chi-squared test for Variant 1
chisq_test_v1_selected <- chisq.test(observed_counts_variant1_selected, p = hwe_expected_variant1_selected)
cat("\nChi-squared p-value for Variant 1 (HWE):", chisq_test_v1_selected$p.value, "\n")

# Chi-squared test for Variant 2
chisq_test_v2_selected <- chisq.test(observed_counts_variant2_selected, p = hwe_expected_variant2_selected)
cat("Chi-squared p-value for Variant 2 (HWE):", chisq_test_v2_selected$p.value, "\n")

# Chi-squared test for Variant 3
chisq_test_v3_selected <- chisq.test(observed_counts_variant3_selected, p = hwe_expected_variant3_selected)
cat("Chi-squared p-value for Variant 3 (HWE):", chisq_test_v3_selected$p.value, "\n")

# Interpretation of results
if (chisq_test_v1_selected$p.value < 0.05) {
  cat("\nVariant 1: HWE does not hold after selection.\n")
} else {
  cat("\nVariant 1: HWE holds after selection.\n")
}

if (chisq_test_v2_selected$p.value < 0.05) {
  cat("\nVariant 2: HWE does not hold after selection.\n")
} else {
  cat("\nVariant 2: HWE holds after selection.\n")
}

if (chisq_test_v3_selected$p.value < 0.05) {
  cat("\nVariant 3: HWE does not hold after selection.\n")
} else {
  cat("\nVariant 3: HWE holds after selection.\n")
}

“Chi-squared approximation may be incorrect”



Chi-squared p-value for Variant 1 (HWE): 0.9417645 


“Chi-squared approximation may be incorrect”


Chi-squared p-value for Variant 2 (HWE): 0.6872893 


“Chi-squared approximation may be incorrect”


Chi-squared p-value for Variant 3 (HWE): 0.6872893 

Variant 1: HWE holds after selection.

Variant 2: HWE holds after selection.

Variant 3: HWE holds after selection.
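With only three surviving individuals the test has essentially no power, which is why HWE is not rejected even for variant 2. As a sketch of what the same selection looks like in a larger sample, assume a hypothetical cohort of 1000 newborns in exact HWE proportions with $f_a = 0.4$ (these counts are illustrative, not data), from which all $aa$ individuals are removed:

```r
# Hypothetical newborn counts in exact HWE proportions (f_a = 0.4, N = 1000)
n_birth <- c(AA = 360, Aa = 480, aa = 160)

# Selection: individuals with two copies of the risk allele do not survive
n_surv <- n_birth
n_surv["aa"] <- 0
n <- sum(n_surv)

# Allele frequency among survivors and the HWE expectation it implies
f_A <- unname((2 * n_surv["AA"] + n_surv["Aa"]) / (2 * n))
f_a <- 1 - f_A
expected <- n * c(AA = f_A^2, Aa = 2 * f_A * f_a, aa = f_a^2)

# Pearson chi-squared test with 1 degree of freedom
stat <- sum((n_surv - expected)^2 / expected)
p_value <- pchisq(stat, df = 1, lower.tail = FALSE)

rbind(observed = n_surv, expected = round(expected, 1))
c(statistic = stat, p.value = p_value)
```

Here the survivors show a clear excess of heterozygotes and no $aa$ individuals, and the test rejects HWE decisively.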


```svg
<svg viewBox="0 0 800 600" xmlns="http://www.w3.org/2000/svg">
  <!-- Background -->
  <rect x="0" y="0" width="800" height="600" fill="#f8f9fa" />
  
  <!-- Title -->
  <text x="400" y="30" font-family="Arial" font-size="22" font-weight="bold" text-anchor="middle">Key Concepts in Statistical Genetics</text>
  
  <!-- Genotype Matrix Section -->
  <rect x="40" y="50" width="340" height="325" fill="#e3f2fd" stroke="#2196f3" stroke-width="2" rx="5" />
  <text x="210" y="65" font-family="Arial" font-size="16" font-weight="bold" text-anchor="middle">Raw Genotype Matrix (Xraw)</text>
  
  <!-- Genotype Matrix -->
  <g transform="translate(70, 90)">
    <!-- Matrix -->
    <rect x="0" y="0" width="280" height="160" fill="white" stroke="#999" stroke-width="1" />
    
    <!-- Column Headers -->
    <text x="30" y="-5" font-family="Arial" font-size="12" text-anchor="middle">SNP₁</text>
    <text x="90" y="-5" font-family="Arial" font-size="12" text-anchor="middle">SNP₂</text>
    <text x="150" y="-5" font-family="Arial" font-size="12" text-anchor="middle">SNP₃</text>
    <text x="200" y="-5" font-family="Arial" font-size="12" text-anchor="middle">...</text>
    <text x="250" y="-5" font-family="Arial" font-size="12" text-anchor="middle">SNPₘ</text>
    
    <!-- Row Headers -->
    <text x="-5" y="25" font-family="Arial" font-size="12" text-anchor="end">Ind₁</text>
    <text x="-5" y="65" font-family="Arial" font-size="12" text-anchor="end">Ind₂</text>
    <text x="-5" y="105" font-family="Arial" font-size="12" text-anchor="end">...</text>
    <text x="-5" y="145" font-family="Arial" font-size="12" text-anchor="end">Indₙ</text>
    
    <!-- Grid Lines -->
    <line x1="0" y1="40" x2="280" y2="40" stroke="#ddd" stroke-width="1" />
    <line x1="0" y1="80" x2="280" y2="80" stroke="#ddd" stroke-width="1" />
    <line x1="0" y1="120" x2="280" y2="120" stroke="#ddd" stroke-width="1" />
    <line x1="60" y1="0" x2="60" y2="160" stroke="#ddd" stroke-width="1" />
    <line x1="120" y1="0" x2="120" y2="160" stroke="#ddd" stroke-width="1" />
    <line x1="180" y1="0" x2="180" y2="160" stroke="#ddd" stroke-width="1" />
    <line x1="220" y1="0" x2="220" y2="160" stroke="#ddd" stroke-width="1" />
    <!-- <line x1="260" y1="0" x2="260" y2="160" stroke="#ddd" stroke-width="1" /> -->
    
    <!-- Matrix Values -->
    <text x="30" y="25" font-family="Arial" font-size="14" text-anchor="middle">0</text>
    <text x="90" y="25" font-family="Arial" font-size="14" text-anchor="middle">1</text>
    <text x="150" y="25" font-family="Arial" font-size="14" text-anchor="middle">2</text>
    <text x="200" y="25" font-family="Arial" font-size="14" text-anchor="middle">...</text>
    <text x="250" y="25" font-family="Arial" font-size="14" text-anchor="middle">0</text>
    
    <text x="30" y="65" font-family="Arial" font-size="14" text-anchor="middle">1</text>
    <text x="90" y="65" font-family="Arial" font-size="14" text-anchor="middle">2</text>
    <text x="150" y="65" font-family="Arial" font-size="14" text-anchor="middle">0</text>
    <text x="200" y="65" font-family="Arial" font-size="14" text-anchor="middle">...</text>
    <text x="250" y="65" font-family="Arial" font-size="14" text-anchor="middle">1</text>
    
    <text x="30" y="105" font-family="Arial" font-size="14" text-anchor="middle">2</text>
    <text x="90" y="105" font-family="Arial" font-size="14" text-anchor="middle">0</text>
    <text x="150" y="105" font-family="Arial" font-size="14" text-anchor="middle">1</text>
    <text x="200" y="105" font-family="Arial" font-size="14" text-anchor="middle">...</text>
    <text x="250" y="105" font-family="Arial" font-size="14" text-anchor="middle">2</text>
    
    <text x="30" y="145" font-family="Arial" font-size="14" text-anchor="middle">0</text>
    <text x="90" y="145" font-family="Arial" font-size="14" text-anchor="middle">2</text>
    <text x="150" y="145" font-family="Arial" font-size="14" text-anchor="middle">1</text>
    <text x="200" y="145" font-family="Arial" font-size="14" text-anchor="middle">...</text>
    <text x="250" y="145" font-family="Arial" font-size="14" text-anchor="middle">0</text>
  </g>
  
  <!-- Genotype Legend -->
  <text x="210" y="275" font-family="Arial" font-size="14" font-weight="bold" text-anchor="middle">Genotype Coding</text>
  <text x="70" y="300" font-family="Arial" font-size="14">0: Homozygous reference (AA)</text>
  <text x="70" y="325" font-family="Arial" font-size="14">1: Heterozygous (Aa)</text>
  <text x="70" y="350" font-family="Arial" font-size="14">2: Homozygous alternative (aa)</text>
  
  <!-- Standardization Section -->
  <rect x="40" y="410" width="340" height="70" fill="#fff3e0" stroke="#ff9800" stroke-width="2" rx="5" />
  <text x="210" y="430" font-family="Arial" font-size="14" font-weight="bold" text-anchor="middle">Standardized Genotype Matrix (X)</text>
  <text x="210" y="450" font-family="Arial" font-size="13" text-anchor="middle">mean(X[,j]) = 0</text>
  <text x="210" y="470" font-family="Arial" font-size="13" text-anchor="middle">Var(X[,j]) = 1</text>
  
  <!-- MAF Section -->
  <rect x="420" y="50" width="340" height="180" fill="#e8eaf6" stroke="#3f51b5" stroke-width="2" rx="5" />
  <text x="590" y="75" font-family="Arial" font-size="16" font-weight="bold" text-anchor="middle">Minor Allele Frequency (MAF)</text>
  
  <!-- MAF Explanation -->
  <text x="440" y="100" font-family="Arial" font-size="13">• Frequency of less common allele at a locus</text>
  <text x="440" y="125" font-family="Arial" font-size="13">• For variant j, allele frequency:</text>
  <text x="460" y="150" font-family="Arial" font-size="13">  f_j = E[X_raw(,j) / 2] = Σ (X_raw) / (2N)</text>
  <text x="440" y="175" font-family="Arial" font-size="13">• MAF = min(f_j, 1-f_j)</text>
  
  <!-- MAF Visual -->
  <rect x="450" y="190" width="280" height="30" fill="#d1d9ff" stroke="#3f51b5" stroke-width="1" />
  <rect x="450" y="190" width="112" height="30" fill="#8c9eff" />
  <text x="506" y="210" font-family="Arial" font-size="12" text-anchor="middle" fill="white">f_a = 0.4</text>
  <text x="640" y="210" font-family="Arial" font-size="12" text-anchor="middle">f_A = 0.6</text>
  
  <!-- HWE Section -->
  <rect x="420" y="260" width="340" height="220" fill="#fce4ec" stroke="#e91e63" stroke-width="2" rx="5" />
  <text x="590" y="285" font-family="Arial" font-size="16" font-weight="bold" text-anchor="middle">Hardy-Weinberg Equilibrium (HWE)</text>
  
  <!-- HWE Explanation -->
  <text x="440" y="310" font-family="Arial" font-size="13">• Expected genotype proportions based on allele</text>
  <text x="445" y="330" font-family="Arial" font-size="13">  frequencies when certain conditions are met</text>
  
  <text x="440" y="355" font-family="Arial" font-size="13">• With allele frequencies f_A and f_a = 1-f_A:</text>
  <text x="460" y="380" font-family="Arial" font-size="13">P(AA) = f_A² (genotype 0)</text>
  <text x="460" y="400" font-family="Arial" font-size="13">P(Aa) = 2f_A·f_a (genotype 1)</text>
  <text x="460" y="420" font-family="Arial" font-size="13">P(aa) = f_a² (genotype 2)</text>
  
  <!-- HWE Visual -->
  <rect x="440" y="435" width="300" height="30" fill="white" stroke="#e91e63" stroke-width="1" />
  <rect x="440" y="435" width="144" height="30" fill="#f8bbd0" />
  <rect x="584" y="435" width="96" height="30" fill="#f48fb1" />
  <rect x="680" y="435" width="60" height="30" fill="#f06292" />
  <text x="512" y="455" font-family="Arial" font-size="12" text-anchor="middle">f_A² = 0.36</text>
  <text x="632" y="455" font-family="Arial" font-size="12" text-anchor="middle">2f_A·f_a = 0.48</text>
  <text x="710" y="455" font-family="Arial" font-size="12" text-anchor="middle">f_a² = 0.16</text>

  <!-- Connecting Arrow from Raw to Standardized -->  
  <path d="M 210 380 L 210 405" stroke="#666" stroke-width="2" fill="none" marker-end="url(#arrowhead)" />

  <!-- Connecting Arrow from Raw to MAF -->
  <path d="M 385 160 L 415 160" stroke="#666" stroke-width="2" fill="none" marker-end="url(#arrowhead)" />
  
  <!-- Connecting Arrow from MAF to HWE -->
  <path d="M 590 235 L 590 255" stroke="#666" stroke-width="2" fill="none" marker-end="url(#arrowhead)" />
  
  <!-- Definitions -->
  <defs>
    <marker id="arrowhead" markerWidth="10" markerHeight="7" refX="9" refY="3.5" orient="auto">
      <polygon points="0 0, 10 3.5, 0 7" fill="#666" />
    </marker>
  </defs>
</svg>
```