# Lab 3.1: Hardy-Weinberg Equilibrium Calculator
## Unit 3: Population Genetics

### 🎯 Learning Objectives
- Derive the Hardy-Weinberg equation from first principles
- Calculate allele and genotype frequencies in populations
- Test populations for Hardy-Weinberg equilibrium using the χ² test
- Apply H-W principles to human genetic data
- Understand the conditions required for equilibrium

### 📖 Connection to Course
This lab covers **Hardy-Weinberg Law** from Unit 3:
- Statement and derivation of H-W equation
- Application to human populations
- Understanding evolutionary forces that upset equilibrium
- Foundation for all population genetics analysis

### 🧬 The Question
**How can we tell if a population is evolving?**  
The Hardy-Weinberg principle provides a null hypothesis for evolution!

In [None]:
# === GOOGLE COLAB SETUP ===
try:
    from google.colab import output
    output.enable_custom_widget_manager()
    print("✓ Widgets enabled")
except ImportError:
    print("✓ Running outside Colab")

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from ipywidgets import Button, Dropdown, IntSlider, interact
from IPython.display import display, HTML, clear_output
from datetime import datetime
from scipy import stats

print("✓ Libraries loaded successfully!")

## Part 1: Hardy-Weinberg Theory

### Historical Context

**The Problem (1908):**
Critics of Mendelian genetics argued that dominant alleles would inevitably increase in frequency, eventually eliminating recessive alleles from populations.

**The Solution:**
Independently, **G.H. Hardy** (British mathematician) and **Wilhelm Weinberg** (German physician) showed this was mathematically incorrect.

### The Hardy-Weinberg Principle

**Statement:**
In a large, randomly mating population with no evolutionary forces acting, allele and genotype frequencies remain constant from generation to generation.

**Key Insight:**
Sexual reproduction alone does NOT change allele frequencies!

### Required Conditions (Assumptions)

For H-W equilibrium to hold, ALL of these must be true:

1. **Large population size** - No genetic drift
2. **Random mating** - No inbreeding or mate choice
3. **No mutations** - No new alleles introduced
4. **No gene flow** - No migration in/out
5. **No selection** - All genotypes equally fit

**Reality Check:** These conditions are NEVER perfectly met in nature!

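**Why it's useful:** A significant deviation from H-W expectations tells us that at least one assumption is violated, which points to an evolutionary force (or non-random mating) at work.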

### Derivation of H-W Equation

Consider a gene with two alleles: **A** and **a**

**Step 1: Define allele frequencies**
- Frequency of A = **p**
- Frequency of a = **q**
- p + q = 1 (these are the only two alleles)

**Step 2: Random mating**

Create a Punnett square for the population:

```
                      Sperm
                A (p)       a (q)
            ┌───────────┬───────────┐
      A (p) │  AA (p²)  │  Aa (pq)  │
Eggs        ├───────────┼───────────┤
      a (q) │  Aa (pq)  │  aa (q²)  │
            └───────────┴───────────┘
```

**Step 3: Calculate genotype frequencies**

- Frequency of **AA** = p × p = **p²**
- Frequency of **Aa** = pq + pq = **2pq**
- Frequency of **aa** = q × q = **q²**

**Hardy-Weinberg Equation:**

## p² + 2pq + q² = 1

Where:
- **p²** = frequency of homozygous dominant (AA)
- **2pq** = frequency of heterozygotes (Aa)
- **q²** = frequency of homozygous recessive (aa)
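
The equation can be checked numerically with a few lines of Python. This is a minimal sketch: the helper name `genotype_frequencies` and the example value p = 0.3 are illustrative choices, not part of the lab's calculator.

```python
def genotype_frequencies(p):
    """Return expected (AA, Aa, aa) frequencies for allele frequency p of A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p  # only two alleles, so q is fixed once p is known
    return p**2, 2 * p * q, q**2


exp_AA, exp_Aa, exp_aa = genotype_frequencies(0.3)
print(f"AA = {exp_AA:.2f}, Aa = {exp_Aa:.2f}, aa = {exp_aa:.2f}")
print(f"Sum = {exp_AA + exp_Aa + exp_aa:.2f}")
```

Whatever value of p is supplied, the three frequencies should add up to 1.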

**Step 4: Verify frequencies remain constant**

Calculate allele frequencies in next generation:

Frequency of A = p² + ½(2pq) = p² + pq = p(p + q) = p(1) = **p**

Frequency of a = q² + ½(2pq) = q² + pq = q(q + p) = q(1) = **q**

**Result:** Allele frequencies unchanged! ✓
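
The same check can be repeated over several generations. The sketch below assumes all five H-W conditions hold, so the only process acting is random mating; the function name `next_generation_p` and the starting value 0.3 are illustrative.

```python
def next_generation_p(p):
    """Frequency of A among gametes produced by a population at H-W proportions."""
    q = 1.0 - p
    freq_AA, freq_Aa = p**2, 2 * p * q
    # AA parents pass on only A; Aa parents pass on A half of the time.
    return freq_AA + 0.5 * freq_Aa


trajectory = [0.3]
for _ in range(5):
    trajectory.append(next_generation_p(trajectory[-1]))

print([round(x, 6) for x in trajectory])
```

Every entry in the printed list should equal the starting frequency, which is the numerical counterpart of the algebra in Step 4.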

### Key Applications

**1. Calculate carrier frequencies**
For recessive genetic disorders, if we know disease frequency (q²), we can find carriers (2pq).

**2. Detect evolution**
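If observed genotype frequencies differ significantly from H-W predictions, at least one assumption is violated and an evolutionary force (or non-random mating) may be acting.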

**3. Identify evolutionary forces**
Specific deviations from H-W indicate which forces are acting:
- Excess homozygotes → inbreeding
- Excess heterozygotes → heterozygote advantage
- Different frequencies between generations → selection, drift, etc.
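
One common way to put a number on heterozygote excess or deficit is the fixation index F = 1 − H_obs / H_exp, where H is heterozygosity. This statistic is not defined in the text above, so treat the sketch below as an optional extra; the function name `fixation_index` is illustrative.

```python
def fixation_index(n_AA, n_Aa, n_aa):
    """F = 1 - H_obs / H_exp, computed from genotype counts.

    F > 0 means fewer heterozygotes than H-W predicts (e.g. inbreeding),
    F < 0 means more heterozygotes than H-W predicts.
    """
    total = n_AA + n_Aa + n_aa
    if total == 0:
        raise ValueError("need at least one individual")
    p = (2 * n_AA + n_Aa) / (2 * total)  # count A alleles directly
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("only one allele present, F is undefined")
    h_obs = n_Aa / total
    return 1 - h_obs / h_exp


print(f"F = {fixation_index(100, 400, 500):.3f}")
```

A value close to zero, as here, indicates only a slight heterozygote deficit; whether it is statistically meaningful is a separate question for the χ² test.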

## Part 2: Human Genetic Data

In [None]:
# Human genetic traits database
human_genetics = {
    'Cystic Fibrosis': {
        'allele_system': 'Normal (N) vs CF (n)',
        'disease_freq': 0.0004, 'dominant': False,
        'population': 'European descent',
        'description': 'Recessive disorder affecting lungs and pancreas'
    },
    'Sickle Cell Anemia': {
        'allele_system': 'Normal (Hb^A) vs Sickle (Hb^S)',
        'disease_freq': 0.01, 'dominant': False,
        'population': 'African descent',
        'description': 'Recessive disorder, heterozygotes resistant to malaria'
    },
    'Phenylketonuria (PKU)': {
        'allele_system': 'Normal (P) vs PKU (p)',
        'disease_freq': 0.0001, 'dominant': False,
        'population': 'European descent',
        'description': 'Recessive metabolic disorder'
    },
    'Huntington Disease': {
        'allele_system': 'Normal (h) vs Huntington (H)',
        'disease_freq': 0.00005, 'dominant': True,
        'population': 'European descent',
        'description': 'Dominant neurodegenerative disorder, late onset'
    },
    'Tay-Sachs Disease': {
        'allele_system': 'Normal (T) vs Tay-Sachs (t)',
        'disease_freq': 0.0003, 'dominant': False,
        'population': 'Ashkenazi Jewish',
        'description': 'Recessive lysosomal storage disorder'
    },
    'ABO Blood Type': {
        'allele_system': 'I^A, I^B, i (three alleles)',
        'disease_freq': None, 'dominant': None,
        'population': 'All populations',
        'description': 'Codominant system, multiple alleles'
    },
    'PTC Tasting': {
        'allele_system': 'Taster (T) vs Non-taster (t)',
        'disease_freq': None, 'dominant': True,
        'population': 'All populations',
        'description': 'Ability to taste phenylthiocarbamide, ~70% tasters'
    }
}

print("HUMAN GENETIC TRAITS DATABASE")
print("="*80)
print(f"{'Trait':<25}{'Inheritance':<15}{'Population':<20}")
print("="*80)

for trait, data in human_genetics.items():
    inheritance = 'Dominant' if data['dominant'] else 'Recessive' if data['dominant'] is False else 'Multiple'
    print(f"{trait:<25}{inheritance:<15}{data['population']:<20}")

print("="*80)
print(f"\nTotal traits: {len(human_genetics)}")
print("Covers: Mendelian disorders, blood types, sensory traits")
print("\n✓ Database ready for analysis!")

## Part 3: Hardy-Weinberg Calculator

In [None]:
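def hardy_weinberg_calculator(freq_AA, freq_Aa, freq_aa):
    """
    Test observed genotype counts against Hardy-Weinberg expectations.

    Despite the `freq_` prefix, the three arguments are raw counts of
    individuals with genotypes AA, Aa, and aa (not frequencies).
    """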
    
    # Total individuals
    total = freq_AA + freq_Aa + freq_aa
    
    if total == 0:
        print("Error: Total count cannot be zero!")
        return
    
    # Observed genotype frequencies
    obs_freq_AA = freq_AA / total
    obs_freq_Aa = freq_Aa / total
    obs_freq_aa = freq_aa / total
    
    # Calculate allele frequencies
    # p = freq(AA) + 0.5*freq(Aa)
    # q = freq(aa) + 0.5*freq(Aa)
    p = obs_freq_AA + 0.5 * obs_freq_Aa
    q = obs_freq_aa + 0.5 * obs_freq_Aa
    
    # Expected genotype frequencies under H-W
    exp_freq_AA = p**2
    exp_freq_Aa = 2 * p * q
    exp_freq_aa = q**2
    
    # Expected counts
    exp_count_AA = exp_freq_AA * total
    exp_count_Aa = exp_freq_Aa * total
    exp_count_aa = exp_freq_aa * total
    
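    # Chi-square test
    # χ² = Σ (Observed - Expected)² / Expected
    # The test is undefined when an expected count is zero (only one allele present)
    if min(exp_count_AA, exp_count_Aa, exp_count_aa) == 0:
        print("Error: only one allele is present, so the χ² test is undefined!")
        return

    chi_square = (
        (freq_AA - exp_count_AA)**2 / exp_count_AA +
        (freq_Aa - exp_count_Aa)**2 / exp_count_Aa +
        (freq_aa - exp_count_aa)**2 / exp_count_aa
    )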
    
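    # Degrees of freedom = genotype classes - 1 - estimated parameters (p) = 3 - 1 - 1 = 1
    df = 1
    
    # P-value: probability of a χ² at least this large if the population is in H-W
    # (the survival function is more accurate than 1 - cdf for small p-values)
    p_value = stats.chi2.sf(chi_square, df)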
    
    # Test result (α = 0.05)
    at_equilibrium = p_value > 0.05
    
    # Heterozygosity
    obs_heterozygosity = obs_freq_Aa
    exp_heterozygosity = 2 * p * q
    
    # Visualization
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Observed vs Expected Frequencies',
                       'Allele Frequencies',
                       'Chi-Square Test',
                       'Genotype Counts'),
        specs=[[{'type': 'bar'}, {'type': 'bar'}],
               [{'type': 'indicator'}, {'type': 'bar'}]]
    )
    
    # 1. Observed vs Expected frequencies
    genotypes = ['AA', 'Aa', 'aa']
    observed = [obs_freq_AA, obs_freq_Aa, obs_freq_aa]
    expected = [exp_freq_AA, exp_freq_Aa, exp_freq_aa]
    
    fig.add_trace(go.Bar(
        x=genotypes, y=observed, name='Observed',
        marker_color='#3498DB'
    ), row=1, col=1)
    fig.add_trace(go.Bar(
        x=genotypes, y=expected, name='Expected (H-W)',
        marker_color='#E74C3C'
    ), row=1, col=1)
    
    # 2. Allele frequencies
    fig.add_trace(go.Bar(
        x=['A (p)', 'a (q)'],
        y=[p, q],
        marker_color=['#2ECC71', '#F39C12'],
        showlegend=False
    ), row=1, col=2)
    
    # 3. Chi-square result
    fig.add_trace(go.Indicator(
        mode="number",
        value=chi_square,
        title={'text': f"χ² = {chi_square:.3f}<br>p = {p_value:.4f}"},
        number={'valueformat': ".3f"},
        domain={'x': [0, 1], 'y': [0, 1]}
    ), row=2, col=1)
    
    # 4. Genotype counts
    fig.add_trace(go.Bar(
        x=genotypes,
        y=[freq_AA, freq_Aa, freq_aa],
        marker_color=['#9B59B6', '#1ABC9C', '#E67E22'],
        showlegend=False
    ), row=2, col=2)
    
    fig.update_yaxes(title_text="Frequency", row=1, col=1)
    fig.update_yaxes(title_text="Frequency", row=1, col=2)
    fig.update_yaxes(title_text="Count", row=2, col=2)
    
    fig.update_layout(
        height=700,
        title_text='<b>Hardy-Weinberg Analysis</b>',
        showlegend=True
    )
    
    # Print results
    print("\n" + "="*70)
    print("HARDY-WEINBERG EQUILIBRIUM ANALYSIS")
    print("="*70)
    print(f"\nSAMPLE SIZE: {total} individuals")
    print(f"\nOBSERVED GENOTYPE COUNTS:")
    print(f"  AA: {freq_AA} ({obs_freq_AA:.4f})")
    print(f"  Aa: {freq_Aa} ({obs_freq_Aa:.4f})")
    print(f"  aa: {freq_aa} ({obs_freq_aa:.4f})")
    print(f"\nALLELE FREQUENCIES:")
    print(f"  p (frequency of A) = {p:.4f}")
    print(f"  q (frequency of a) = {q:.4f}")
    print(f"  p + q = {p + q:.4f} ✓")
    print(f"\nEXPECTED GENOTYPE FREQUENCIES (H-W):")
    print(f"  AA (p²) = {exp_freq_AA:.4f}")
    print(f"  Aa (2pq) = {exp_freq_Aa:.4f}")
    print(f"  aa (q²) = {exp_freq_aa:.4f}")
    print(f"  Sum = {exp_freq_AA + exp_freq_Aa + exp_freq_aa:.4f} ✓")
    print(f"\nEXPECTED GENOTYPE COUNTS:")
    print(f"  AA: {exp_count_AA:.1f}")
    print(f"  Aa: {exp_count_Aa:.1f}")
    print(f"  aa: {exp_count_aa:.1f}")
    print(f"\nCHI-SQUARE TEST:")
    print(f"  χ² = {chi_square:.3f}")
    print(f"  df = {df}")
    print(f"  p-value = {p_value:.4f}")
    print(f"  Critical value (α=0.05): 3.841")
    print(f"\nRESULT: {'✓ EQUILIBRIUM' if at_equilibrium else '✗ NOT IN EQUILIBRIUM'}")
    if at_equilibrium:
        print(f"  No significant deviation from H-W equilibrium (p > 0.05)")
    else:
        print(f"  Population DEVIATES from H-W equilibrium (p ≤ 0.05)")
        print(f"  → At least one H-W assumption is violated (evolution may be occurring)")
    print(f"\nHETEROZYGOSITY:")
    print(f"  Observed: {obs_heterozygosity:.4f}")
    print(f"  Expected: {exp_heterozygosity:.4f}")
    if obs_heterozygosity > exp_heterozygosity * 1.1:
        print(f"  → Excess heterozygotes (heterozygote advantage?)")
    elif obs_heterozygosity < exp_heterozygosity * 0.9:
        print(f"  → Deficit of heterozygotes (inbreeding? Wahlund effect?)")
    else:
        print(f"  → Heterozygosity as expected")
    print("="*70)
    
    fig.show()

# Create interactive calculator
AA_slider = IntSlider(value=100, min=0, max=500, step=10,
                     description='AA count:')
Aa_slider = IntSlider(value=400, min=0, max=1000, step=10,
                     description='Aa count:')
aa_slider = IntSlider(value=500, min=0, max=500, step=10,
                     description='aa count:')

display(HTML("<h3>🧬 Hardy-Weinberg Calculator</h3>"))
display(HTML("<p>Enter observed genotype counts to test for equilibrium:</p>"))
interact(hardy_weinberg_calculator, freq_AA=AA_slider,
        freq_Aa=Aa_slider, freq_aa=aa_slider);

## Part 4: Human Genetics Analyzer

In [None]:
def disease_frequency_calculator(disease_freq, inheritance='recessive'):
    """
    Calculate carrier frequencies from disease incidence
    """
    
    if inheritance == 'recessive':
        # For recessive: disease frequency = q²
        q_squared = disease_freq
        q = np.sqrt(q_squared)
        p = 1 - q
        
        # Genotype frequencies
        freq_AA = p**2
        freq_Aa = 2 * p * q  # Carriers!
        freq_aa = q**2  # Affected
        
        carrier_ratio = f"1 in {int(1/freq_Aa) if freq_Aa > 0 else 'infinite'}"
        
    else:  # dominant
        # For dominant: disease frequency = p² + 2pq = 1 - q²
        # So q = √(1 - disease frequency) and p = 1 - q
        # (for a rare disease this reduces to p ≈ disease_freq / 2)
        q = np.sqrt(1 - disease_freq)
        p = 1 - q
        
        freq_AA = p**2  # Homozygous affected (rare)
        freq_Aa = 2 * p * q  # Heterozygous affected
        freq_aa = q**2  # Unaffected
        
        carrier_ratio = "N/A (carriers are affected)"
    
    # Visualization
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('Genotype Frequencies', 'Population Breakdown'),
        specs=[[{'type': 'bar'}, {'type': 'pie'}]]
    )
    
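    # 1. Bar chart
    genotypes = ['Normal<br>Homozygous', 'Carrier/<br>Heterozygous', 'Affected<br>Homozygous']
    # Order values as normal, heterozygous, affected so they match the labels and colors.
    # For a dominant disease the normal homozygote is aa and the affected homozygote is AA.
    if inheritance == 'recessive':
        frequencies = [freq_AA, freq_Aa, freq_aa]
    else:
        frequencies = [freq_aa, freq_Aa, freq_AA]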
    colors = ['#2ECC71', '#F39C12', '#E74C3C']
    
    fig.add_trace(go.Bar(
        x=genotypes, y=frequencies,
        marker_color=colors,
        text=[f"{f:.4f}" for f in frequencies],
        textposition='outside'
    ), row=1, col=1)
    
    # 2. Pie chart
    if inheritance == 'recessive':
        labels = ['Normal (AA)', 'Carriers (Aa)', 'Affected (aa)']
    else:
        labels = ['Unaffected (aa)', 'Affected (Aa)', 'Affected (AA)']
    
    fig.add_trace(go.Pie(
        labels=labels,
        values=frequencies,
        marker_colors=colors
    ), row=1, col=2)
    
    fig.update_yaxes(title_text="Frequency", row=1, col=1)
    fig.update_layout(height=400, showlegend=False,
                     title_text=f'<b>{inheritance.title()} Disease Analysis</b>')
    
    # Print results
    print("\n" + "="*70)
    print(f"DISEASE FREQUENCY ANALYSIS ({inheritance.upper()})")
    print("="*70)
    print(f"\nDISEASE INCIDENCE: {disease_freq:.6f} ({disease_freq*100:.4f}%)")
    print(f"  = 1 in {round(1/disease_freq) if disease_freq > 0 else 'infinite'}")
    print(f"\nALLELE FREQUENCIES:")
    if inheritance == 'recessive':
        print(f"  p (normal allele) = {p:.6f}")
        print(f"  q (disease allele) = {q:.6f}")
    else:
        print(f"  p (disease allele) = {p:.6f}")
        print(f"  q (normal allele) = {q:.6f}")
    print(f"\nGENOTYPE FREQUENCIES:")
    if inheritance == 'recessive':
        print(f"  Normal homozygous (AA): {freq_AA:.6f} ({freq_AA*100:.4f}%)")
        print(f"  Carriers (Aa): {freq_Aa:.6f} ({freq_Aa*100:.4f}%)")
        print(f"    → {carrier_ratio}")
        print(f"  Affected (aa): {freq_aa:.6f} ({freq_aa*100:.4f}%)")
        print(f"\nKEY INSIGHT:")
        carrier_to_affected = freq_Aa / freq_aa if freq_aa > 0 else float('inf')
        print(f"  {carrier_to_affected:.1f}× more carriers than affected individuals!")
        print(f"  Most disease alleles hide in carriers (not subject to selection)")
    else:
        print(f"  Unaffected (aa): {freq_aa:.6f} ({freq_aa*100:.4f}%)")
        print(f"  Affected heterozygous (Aa): {freq_Aa:.6f} ({freq_Aa*100:.4f}%)")
        print(f"  Affected homozygous (AA): {freq_AA:.6f} ({freq_AA*100:.4f}%)")
        print(f"\nKEY INSIGHT:")
        print(f"  Dominant alleles are always expressed → fully exposed to selection")
        print(f"  A harmful dominant allele persists only if its fitness cost is low,")
        print(f"  it is replenished by new mutations, or onset is late (e.g., Huntington's)")
    print("="*70)
    
    fig.show()

# Create interactive calculator
disease_dropdown = Dropdown(
    options=[(name, data['disease_freq']) for name, data in human_genetics.items() 
             if data['disease_freq'] is not None],
    value=0.0004,
    description='Disease:'
)

inheritance_dropdown = Dropdown(
    options=['recessive', 'dominant'],
    value='recessive',
    description='Type:'
)

display(HTML("<h3>💊 Disease Carrier Calculator</h3>"))
display(HTML("<p>Calculate carrier frequencies from disease incidence:</p>"))
interact(disease_frequency_calculator, 
        disease_freq=disease_dropdown,
        inheritance=inheritance_dropdown);

## Part 5: Challenge Problems

### Challenge 1: Test for Equilibrium 🧬

**Scenario:** You survey 1000 individuals for a gene with two alleles

**Observed genotype counts:**
- AA: 100
- Aa: 400
- aa: 500

**Questions:**
1. Calculate allele frequencies (p and q)
2. Calculate expected genotype frequencies under H-W
3. Perform chi-square test
4. Is the population in equilibrium?

<details>
<summary>Solution</summary>

**Step 1: Calculate allele frequencies**

Total individuals = 1000

Frequency of A:
- From AA: 100 individuals contribute 200 A alleles
- From Aa: 400 individuals contribute 400 A alleles
- Total A alleles: 200 + 400 = 600
- Total alleles: 2000 (1000 diploid individuals)
- **p = 600/2000 = 0.30**

Alternative calculation:
- p = freq(AA) + 0.5×freq(Aa)
- p = 100/1000 + 0.5×400/1000 = 0.10 + 0.20 = **0.30** ✓

Frequency of a:
- **q = 1 - p = 1 - 0.30 = 0.70**

**Step 2: Expected genotype frequencies (H-W)**

- AA (p²) = 0.30² = 0.09
- Aa (2pq) = 2 × 0.30 × 0.70 = 0.42
- aa (q²) = 0.70² = 0.49

Check: 0.09 + 0.42 + 0.49 = 1.00 ✓

**Expected counts:**
- AA: 0.09 × 1000 = **90**
- Aa: 0.42 × 1000 = **420**
- aa: 0.49 × 1000 = **490**

**Step 3: Chi-square test**

χ² = Σ (Observed - Expected)² / Expected

χ² = (100-90)²/90 + (400-420)²/420 + (500-490)²/490

χ² = 100/90 + 400/420 + 100/490

χ² = 1.111 + 0.952 + 0.204 = **2.267**

Degrees of freedom = 3 genotype classes - 1 - 1 allele frequency estimated from the data = **1**

Critical value (α = 0.05, df = 1) = **3.841**

p-value ≈ 0.132

**Step 4: Conclusion**

χ² = 2.267 < 3.841 (critical value)

p = 0.132 > 0.05

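**Result: FAIL TO REJECT H-W equilibrium**

The population is consistent with Hardy-Weinberg equilibrium: the small deviations from expected values can be explained by random sampling error alone. A non-significant test does not prove equilibrium; it only means no deviation was detected.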
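
The hand calculation can be cross-checked with the standard library alone. The sketch below relies on the fact that, for 1 degree of freedom, the χ² p-value equals `erfc(√(χ²/2))`; the function name `hw_chi_square` is illustrative. Because it sums unrounded terms, the printed χ² may differ from the hand-rounded value in the last digit.

```python
import math


def hw_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic and p-value (df = 1) for a two-allele H-W test."""
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # allele frequency of A
    q = 1.0 - p
    expected = (p**2 * total, 2 * p * q * total, q**2 * total)
    if min(expected) == 0:
        raise ValueError("only one allele present, test is undefined")
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = hw_chi_square(100, 400, 500)
print(f"chi-square = {chi2:.3f}, p-value = {p_value:.3f}")
print("Reject H-W" if p_value <= 0.05 else "Fail to reject H-W")
```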
</details>

### Challenge 2: Sickle Cell Anemia 🩸

**Scenario:** In a West African population, 1% of newborns have sickle cell anemia (genotype Hb^S Hb^S)

**Questions:**
1. What is the frequency of the sickle cell allele (Hb^S)?
2. What percentage of the population are carriers (Hb^A Hb^S)?
3. Why is this allele so common despite causing disease?

<details>
<summary>Solution</summary>

**Given:**
- Disease frequency (Hb^S Hb^S) = 0.01 = 1%

**Step 1: Frequency of Hb^S allele**

Let q = frequency of Hb^S

Disease frequency = q²

0.01 = q²

q = √0.01 = **0.10** (10%)

p = 1 - q = **0.90** (90%)

**Step 2: Carrier frequency**

Carriers (Hb^A Hb^S) = 2pq

= 2 × 0.90 × 0.10

= **0.18 = 18%**

About **1 in 5.5 people** are carriers!

**Ratio check:**
- Carriers: 18%
- Affected: 1%
- Ratio: 18:1 → 18× more carriers than affected!

**Step 3: Why so common?**

**Heterozygote advantage** (overdominance):

In malaria-endemic regions:
- **Hb^A Hb^A**: Normal, but susceptible to malaria
- **Hb^A Hb^S**: Carriers, RESISTANT to malaria! (Higher fitness)
- **Hb^S Hb^S**: Sickle cell disease (lethal without treatment)

**Relative fitness:**
- w(AA) ≈ 0.88 (malaria risk)
- w(AS) ≈ 1.00 (malaria resistant, no disease)
- w(SS) ≈ 0.20 (severe disease)

**Result:** Selection MAINTAINS both alleles!
- Hb^S can't go to fixation (SS lethal)
- Hb^S can't be eliminated (AS has advantage)
- Equilibrium frequency balances benefits vs costs
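
The balance point can be estimated from the relative fitness values above. This is a sketch under the standard one-locus overdominance model, where the equilibrium frequency of Hb^S is s / (s + t), with s the selection coefficient against AA and t the coefficient against SS. The function names are illustrative, and the fitness values are the approximate ones quoted in this solution.

```python
def equilibrium_q(w_AA, w_AS, w_SS):
    """Predicted equilibrium frequency of the S allele under heterozygote advantage."""
    s = 1 - w_AA / w_AS  # selection coefficient against AA
    t = 1 - w_SS / w_AS  # selection coefficient against SS
    if s <= 0 or t <= 0:
        raise ValueError("the heterozygote must have the highest fitness")
    return s / (s + t)


def iterate_q(q, w_AA, w_AS, w_SS, generations):
    """Follow the S allele frequency through repeated rounds of selection."""
    for _ in range(generations):
        p = 1 - q
        mean_w = p**2 * w_AA + 2 * p * q * w_AS + q**2 * w_SS
        # S alleles come from surviving AS (half of their alleles) and SS individuals
        q = (p * q * w_AS + q**2 * w_SS) / mean_w
    return q


q_hat = equilibrium_q(0.88, 1.00, 0.20)
q_sim = iterate_q(0.01, 0.88, 1.00, 0.20, generations=500)
print(f"Predicted equilibrium q = {q_hat:.4f}")
print(f"Simulated q after 500 generations = {q_sim:.4f}")
```

Starting from a rare allele (q = 0.01), the simulated frequency should climb and settle at the predicted value, which falls inside the 10-20% range reported for West and Central Africa.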

**Geographic pattern:**
Hb^S frequency correlates with historical malaria prevalence:
- West/Central Africa: 10-20%
- Mediterranean: 5-10%
- Northern Europe: <1% (no malaria)

This is a **classic example of balancing selection**!
</details>

### Challenge 3: Deviations from Equilibrium 🔍

**Question:** For each scenario, predict how genotype frequencies will deviate from H-W expectations:

**Scenario A:** A population practices first-cousin marriage

**Scenario B:** A population shows heterozygote advantage for a particular gene

**Scenario C:** Two isolated populations with different allele frequencies suddenly merge

<details>
<summary>Solution</summary>

**Scenario A: Inbreeding (Consanguinity)**

**Effect:** DEFICIT of heterozygotes, EXCESS of homozygotes

**Why:**
- Inbreeding increases probability of inheriting identical alleles
- Relatives share alleles by descent
- More AA and aa, fewer Aa than H-W predicts

**Example:**
If p = 0.5, q = 0.5

H-W predicts:
- AA: 0.25
- Aa: 0.50
- aa: 0.25

With inbreeding (illustrative F = 0.2; for comparison, children of first cousins have F = 1/16 ≈ 0.06):
- AA: 0.25 + 0.20(0.25) = **0.30** ↑
- Aa: 0.50 - 0.20(0.50) = **0.40** ↓
- aa: 0.25 + 0.20(0.25) = **0.30** ↑

**Note:** Allele frequencies (p, q) DON'T change!
Only genotype frequencies deviate from H-W.
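
The arithmetic above follows the general pattern AA = p² + Fpq, Aa = 2pq(1 − F), aa = q² + Fpq. A minimal sketch, with the illustrative function name `inbred_genotype_frequencies`:

```python
def inbred_genotype_frequencies(p, F):
    """Genotype frequencies (AA, Aa, aa) for allele frequency p and inbreeding coefficient F."""
    q = 1.0 - p
    return p**2 + F * p * q, 2 * p * q * (1 - F), q**2 + F * p * q


for F in (0.0, 1 / 16, 0.2):  # no inbreeding, first-cousin offspring, worked example
    f_AA, f_Aa, f_aa = inbred_genotype_frequencies(0.5, F)
    p_after = f_AA + 0.5 * f_Aa  # allele frequency recovered from the genotypes
    print(f"F = {F:.4f}: AA = {f_AA:.4f}, Aa = {f_Aa:.4f}, aa = {f_aa:.4f}, p = {p_after:.2f}")
```

The last column stays at 0.50 in every row, which illustrates the note above: inbreeding rearranges alleles into genotypes without changing allele frequencies.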

**Medical significance:**
↑ Homozygosity → ↑ recessive genetic disorders

---

**Scenario B: Heterozygote Advantage**

**Effect:** EXCESS of heterozygotes

**Why:**
- Heterozygotes have highest fitness
- Selection favors Aa over AA and aa
- Maintains BOTH alleles (balancing selection)

**Example: Sickle cell (as above)**

Expected (H-W with p=0.9, q=0.1):
- AA: 0.81
- Aa: 0.18
- aa: 0.01

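Observed among adults after one generation of selection (using the Challenge 2 fitness values 0.88, 1.00, 0.20):
- AA: ~0.797 ↓
- Aa: ~0.201 ↑ (selected FOR)
- aa: ~0.002 ↓ (re-created each generation by carrier × carrier matings, not by mutation)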

**Result:**
- Stable polymorphism (both alleles persist)
- Neither goes to fixation
- Equilibrium depends on fitness values
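
How selection shifts genotype proportions within a generation can be sketched as follows. The example assumes newborns start at H-W proportions with p = 0.9 and borrows the approximate relative fitness values quoted in Challenge 2 (0.88, 1.00, 0.20); the function name `select_one_generation` is illustrative, and the exact output depends entirely on the fitness values assumed.

```python
def select_one_generation(p, w_AA, w_Aa, w_aa):
    """Genotype frequencies among survivors after one round of viability selection."""
    q = 1.0 - p
    newborns = (p**2, 2 * p * q, q**2)  # H-W proportions at birth
    weighted = (newborns[0] * w_AA, newborns[1] * w_Aa, newborns[2] * w_aa)
    mean_fitness = sum(weighted)
    # Dividing by mean fitness rescales the survivors so they sum to 1
    return tuple(x / mean_fitness for x in weighted)


adults = select_one_generation(0.9, 0.88, 1.00, 0.20)
for label, freq in zip(("AA", "Aa", "aa"), adults):
    print(f"{label}: {freq:.4f}")
```

Comparing the survivors with the newborn proportions (0.81, 0.18, 0.01) shows the heterozygote class growing at the expense of both homozygotes.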

---

**Scenario C: Wahlund Effect (Population Subdivision)**

**Effect:** DEFICIT of heterozygotes

**Why:**
- Treating subdivided populations as one
- Different allele frequencies in subpopulations
- Less interbreeding between subpopulations than expected

**Example:**

Population 1: p₁ = 0.3, q₁ = 0.7 (500 individuals)
Population 2: p₂ = 0.7, q₂ = 0.3 (500 individuals)

If analyzed separately (both in H-W):
- Pop 1: AA=0.09, Aa=0.42, aa=0.49
- Pop 2: AA=0.49, Aa=0.42, aa=0.09

Combined average allele frequency:
- p̄ = (0.3 + 0.7)/2 = 0.5
- q̄ = 0.5

Expected if single population:
- AA: 0.25
- Aa: 0.50
- aa: 0.25

Observed (average of two pops):
- AA: (0.09 + 0.49)/2 = **0.29** ↑
- Aa: (0.42 + 0.42)/2 = **0.42** ↓
- aa: (0.49 + 0.09)/2 = **0.29** ↑

**Deficit of heterozygotes!**

**Lesson:** Always check for population structure before testing H-W!
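
The pooled calculation generalizes to any number of subpopulations. In the sketch below each subpopulation is assumed to be in H-W equilibrium internally, and the function name `wahlund_summary` is illustrative. It also checks a standard identity: the heterozygote deficit equals twice the variance in allele frequency among subpopulations.

```python
def wahlund_summary(subpops):
    """Pooled heterozygosity for subpopulations given as (p, size) pairs.

    Returns (observed_het, expected_het, variance_p).
    """
    total = sum(size for _, size in subpops)
    weights = [size / total for _, size in subpops]
    p_values = [p for p, _ in subpops]
    # Observed: size-weighted average of each subpopulation's own 2pq
    observed_het = sum(w * 2 * p * (1 - p) for w, p in zip(weights, p_values))
    # Expected: 2pq computed from the pooled allele frequency
    p_bar = sum(w * p for w, p in zip(weights, p_values))
    expected_het = 2 * p_bar * (1 - p_bar)
    variance_p = sum(w * (p - p_bar) ** 2 for w, p in zip(weights, p_values))
    return observed_het, expected_het, variance_p


obs_het, exp_het, var_p = wahlund_summary([(0.3, 500), (0.7, 500)])
print(f"Observed Aa = {obs_het:.2f}, expected Aa = {exp_het:.2f}")
print(f"Deficit = {exp_het - obs_het:.2f}, 2 x Var(p) = {2 * var_p:.2f}")
```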
</details>

## Part 6: Export & Summary

In [None]:
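def export_results():
    """
    Export H-W analysis results
    """
    import os  # local import keeps this cell self-contained

    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Use Colab's /content folder when it exists, otherwise the current directory
    output_dir = "/content" if os.path.isdir("/content") else "."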
    
    # Create sample analysis data
    data = []
    
    # Example 1: Equilibrium population
    data.append({
        'Population': 'Example 1 (Equilibrium)',
        'AA': 100, 'Aa': 400, 'aa': 500,
        'p': 0.30, 'q': 0.70,
        'Expected_AA': 90, 'Expected_Aa': 420, 'Expected_aa': 490,
        'Chi_square': 2.267, 'P_value': 0.132, 'H-W_Status': 'Equilibrium'
    })
    
    # Add human genetic disorders
    for trait, info in human_genetics.items():
        if info['disease_freq'] and not info['dominant']:
            q = np.sqrt(info['disease_freq'])
            p = 1 - q
            carrier_freq = 2 * p * q
            data.append({
                'Population': trait,
                'Disease_Freq': info['disease_freq'],
                'q': q, 'p': p,
                'Carrier_Freq': carrier_freq,
                'Carrier_Ratio': f"1 in {int(1/carrier_freq)}",
                'Description': info['description']
            })
    
    df = pd.DataFrame(data)
    csv_file = f"{output_dir}/lab_3_1_hardy_weinberg_{timestamp}.csv"
    df.to_csv(csv_file, index=False)
    
    print(f"✓ Saved: {csv_file}")
    print(f"\nExported {len(data)} analyses")
    print("\nDownload: 📁 Files → /content → right-click → Download")
    return df

btn = Button(description='📥 Export Results', button_style='success', icon='download')
btn.on_click(lambda b: export_results())

display(HTML("<h3>📤 Export Analysis</h3>"))
display(btn)

## Summary

### Key Insights

✅ **H-W Principle** - Null hypothesis for evolution  
✅ **Equation Derivation** - p² + 2pq + q² = 1  
✅ **5 Conditions** - Large N, random mating, no mutation/migration/selection  
✅ **Chi-square Test** - Statistical test for equilibrium (df=1)  
✅ **Carrier Frequencies** - Calculate from disease incidence  

### Hardy-Weinberg Equation

**Allele frequencies:**
- p + q = 1
- p = frequency of dominant allele (A)
- q = frequency of recessive allele (a)

**Genotype frequencies:**
- AA (homozygous dominant) = p²
- Aa (heterozygous) = 2pq
- aa (homozygous recessive) = q²

**Verify:** p² + 2pq + q² = 1 ✓

### Calculating from Data

**Given genotype counts → Find allele frequencies:**

p = freq(AA) + ½ × freq(Aa)

q = freq(aa) + ½ × freq(Aa)

Or simply: q = 1 - p

**Given disease frequency → Find carriers:**

For recessive disorders:
- q = √(disease frequency)
- Carrier frequency = 2pq
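
A short sketch of the two steps above, using the cystic fibrosis incidence of 1 in 2,500 quoted in this lab. The function name `carrier_frequency` is illustrative, and the calculation assumes the population is in H-W equilibrium at this locus.

```python
import math


def carrier_frequency(incidence):
    """Carrier frequency (2pq) for a recessive disorder with the given incidence (q²)."""
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must lie between 0 and 1")
    q = math.sqrt(incidence)
    p = 1.0 - q
    return 2 * p * q


incidence = 1 / 2500
carriers = carrier_frequency(incidence)
print(f"Carrier frequency = {carriers:.4f} (about 1 in {1 / carriers:.1f})")
print(f"Carriers per affected individual = {carriers / incidence:.0f}")
```

The ratio of carriers to affected individuals simplifies to 2p/q, so it grows as the disease allele becomes rarer.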

### Chi-Square Test

**Formula:**

χ² = Σ (Observed - Expected)² / Expected

**Degrees of freedom:** # genotype classes - 1 - # allele frequencies estimated from the data = 3 - 1 - 1 = 1

**Critical value** (α=0.05, df=1): 3.841

**Decision:**
- If χ² < 3.841 (p > 0.05): Fail to reject H-W equilibrium (the data are consistent with it)
- If χ² > 3.841 (p < 0.05): Reject H-W; at least one assumption is violated

### Common Deviations

**Deficit of heterozygotes:**
- Inbreeding (consanguinity)
- Wahlund effect (population subdivision)
- Null alleles (technical artifact)

**Excess of heterozygotes:**
- Heterozygote advantage (balancing selection)
- Negative assortative mating
- Selection against homozygotes

### Real-World Applications

**Medical genetics:**
- Genetic counseling (carrier risk)
- Disease screening programs
- Population health planning

**Conservation biology:**
- Assess genetic diversity
- Detect inbreeding
- Monitor threatened populations

**Forensics:**
- DNA profile frequencies
- Paternity testing
- Population databases

**Evolution research:**
- Detect natural selection
- Study population structure
- Measure evolutionary forces

### Important Examples

**Sickle Cell (Heterozygote Advantage):**
- Disease: q² = 0.01 (1%)
- Carriers: 2pq = 0.18 (18%!)
- Maintained by malaria resistance

**Cystic Fibrosis:**
- Disease: 1 in 2,500 (European)
- Carriers: 1 in 25
- About 100× more carriers than affected!

**Key Insight:** Most recessive alleles hide in heterozygotes

### The Big Picture

**Hardy-Weinberg = Evolutionary Baseline**

Like Newton's First Law for genetics:
- Object at rest stays at rest (no forces)
- Population in equilibrium stays in equilibrium (no evolution)

**Real populations deviate because:**
1. **Finite size** → genetic drift
2. **Non-random mating** → inbreeding, assortative mating
3. **Mutations** → new alleles
4. **Migration** → gene flow
5. **Selection** → differential fitness

**These deviations = EVOLUTION!**

H-W gives us a null hypothesis to test whether evolution is occurring.

### Next Steps

Now that you understand H-W equilibrium, you're ready for:
- **Lab 3.2:** Natural selection (forces that upset H-W)
- **Lab 3.3:** Genetic drift (random sampling effects)
- **Lab 3.4:** Migration & mutation (more evolutionary forces)

**Congratulations!** You've mastered the foundation of population genetics! 🎉