# Lab 4.1: Speciation Simulator
## Unit 4: Speciation & Human Evolution

### 🎯 Learning Objectives
- Model allopatric speciation via geographic isolation
- Calculate genetic divergence (F_ST)
- Measure reproductive isolation
- Estimate time to speciation
- Analyze real speciation examples

### 📖 Connection to Course
Covers **Speciation Modes** from Unit 4: Allopatric, sympatric, and parapatric mechanisms

### 🌍 The Big Question
**How do new species form?** Let's simulate and quantify the process!

In [None]:
# === GOOGLE COLAB SETUP ===
try:
    from google.colab import output
    output.enable_custom_widget_manager()
    print("✓ Widgets enabled")
except ImportError:
    # google.colab is only importable inside Colab
    print("✓ Running outside Colab")

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from ipywidgets import Button, FloatSlider, IntSlider, interact
from IPython.display import display, HTML
from datetime import datetime

np.random.seed(42)
print("✓ Libraries loaded!")

## Part 1: Speciation Theory

### Biological Species Concept
Species = groups of interbreeding populations **reproductively isolated** from others

### Three Modes

**1. Allopatric** (Geographic isolation) - Most common
- Barrier forms → populations isolated → diverge → reproductive isolation
- Examples: Darwin's finches, Grand Canyon squirrels

**2. Sympatric** (Same location) - Rare but dramatic
- Polyploidy OR disruptive selection + assortative mating
- Examples: Lake Victoria cichlids (500+ species!)

**3. Parapatric** (Environmental gradient)
- Selection > gene flow along cline
- Examples: Mine-tolerant grasses

### Genetic Divergence (F_ST)
## F_ST = (p₁ - p₂)² / [4p̄(1-p̄)]
- Two equal-sized populations; equivalent to (H_T - H_S) / H_T with H_T = 2p̄(1-p̄)
- 0: No differentiation | 0.15: Moderate | 1.0: Complete
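
A minimal sketch of the F_ST calculation for two equal-sized populations, written in the heterozygosity form (H_T - H_S) / H_T so the 0 to 1 range is easy to check. The helper names `heterozygosity` and `fst_two_pops` are illustrative and are not part of the lab code.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus."""
    return 2 * p * (1 - p)


def fst_two_pops(p1, p2):
    """F_ST for two equal-sized populations; None when it is undefined."""
    p_bar = (p1 + p2) / 2
    h_t = heterozygosity(p_bar)  # heterozygosity of the pooled population
    if h_t == 0:
        return None  # same allele fixed in both populations
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t


# No divergence, mild divergence, strong divergence, alternate fixation
for p1, p2 in [(0.5, 0.5), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0)]:
    print(f"p1={p1:.1f}, p2={p2:.1f} -> F_ST = {fst_two_pops(p1, p2):.3f}")
```

Identical frequencies give 0 and fixation of different alleles gives 1, which matches the scale in the bullet above.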

In [None]:
# Speciation examples database
speciation_examples = {
    "Darwin's Finches": {
        'mode': 'Allopatric',
        'location': 'Galápagos Islands',
        'species_count': 13,
        'time_mya': 2.3,
        'key_trait': 'Beak morphology',
        'mechanism': 'Island isolation + adaptive radiation'
    },
    'Lake Victoria Cichlids': {
        'mode': 'Sympatric',
        'location': 'Lake Victoria, Africa',
        'species_count': 500,
        'time_mya': 0.015,  # 15,000 years!
        'key_trait': 'Coloration, feeding',
        'mechanism': 'Disruptive selection + assortative mating'
    },
    'Hawaiian Drosophila': {
        'mode': 'Allopatric',
        'location': 'Hawaiian Islands',
        'species_count': 800,
        'time_mya': 5,
        'key_trait': 'Morphology, behavior',
        'mechanism': 'Island hopping + founder effects'
    },
    'Rhagoletis Flies': {
        'mode': 'Sympatric',
        'location': 'North America',
        'species_count': 2,
        'time_mya': 0.00016,  # 160 years!
        'key_trait': 'Host plant preference',
        'mechanism': 'Host shift (apple vs hawthorn)'
    },
    'Wheat (Triticum)': {
        'mode': 'Sympatric (Polyploidy)',
        'location': 'Middle East',
        'species_count': 3,
        'time_mya': 0.01,
        'key_trait': 'Chromosome number',
        'mechanism': 'Allopolyploidy (2n→4n→6n)'
    }
}

print("SPECIATION EXAMPLES")
print("="*70)
for name, data in speciation_examples.items():
    print(f"{name}: {data['species_count']} species ({data['mode']})")
print("\n✓ Database ready!")

## Part 2: Allopatric Speciation Simulator

In [None]:
def allopatric_speciation(pop_size, generations, mutation_rate, initial_p):
    """
    Simulate allopatric speciation with genetic divergence
    """
    # Two isolated populations
    p1, p2 = initial_p, initial_p
    
    traj_p1, traj_p2, traj_fst = [p1], [p2], []
    
    for gen in range(generations):
        # Independent evolution (drift + mutation)
        # Population 1
        num_A1 = np.random.binomial(2*pop_size, p1)
        p1 = num_A1 / (2*pop_size)
        p1 += np.random.normal(0, mutation_rate)  # Mutation
        p1 = np.clip(p1, 0, 1)
        
        # Population 2
        num_A2 = np.random.binomial(2*pop_size, p2)
        p2 = num_A2 / (2*pop_size)
        p2 += np.random.normal(0, mutation_rate)
        p2 = np.clip(p2, 0, 1)
        
        traj_p1.append(p1)
        traj_p2.append(p2)
        
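        # Calculate F_ST = (p1 - p2)^2 / [4 * p_avg * (1 - p_avg)], bounded in [0, 1]
        p_avg = (p1 + p2) / 2
        if 0 < p_avg < 1:
            fst = (p1 - p2)**2 / (4 * p_avg * (1 - p_avg))
            traj_fst.append(fst)
        else:
            # Same allele fixed in both populations: no divergence at this locus
            traj_fst.append(0)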
    
    # Reproductive isolation estimate (based on F_ST)
    final_fst = traj_fst[-1] if traj_fst else 0
    repro_isolation = min(final_fst * 100, 100)  # Percentage
    
    # Visualization
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=('Allele Frequency Divergence', 'Genetic Divergence (F_ST)',
                       'Final Allele Frequencies', 'Reproductive Isolation'),
        specs=[[{'type': 'scatter'}, {'type': 'scatter'}],
               [{'type': 'bar'}, {'type': 'indicator'}]]
    )
    
    gens = list(range(len(traj_p1)))
    
    # 1. Divergence
    fig.add_trace(go.Scatter(x=gens, y=traj_p1, name='Pop 1',
                            line=dict(color='#3498DB', width=3)), row=1, col=1)
    fig.add_trace(go.Scatter(x=gens, y=traj_p2, name='Pop 2',
                            line=dict(color='#E74C3C', width=3)), row=1, col=1)
    
    # 2. F_ST
    fig.add_trace(go.Scatter(x=list(range(1, len(traj_fst)+1)), y=traj_fst,
                            line=dict(color='#2ECC71', width=3),
                            showlegend=False), row=1, col=2)
    
    # 3. Final frequencies
    fig.add_trace(go.Bar(x=['Pop 1', 'Pop 2'], y=[traj_p1[-1], traj_p2[-1]],
                        marker_color=['#3498DB', '#E74C3C'],
                        showlegend=False), row=2, col=1)
    
    # 4. Isolation indicator
    fig.add_trace(go.Indicator(
        mode="gauge+number",
        value=repro_isolation,
        title={'text': "Isolation %"},
        gauge={'axis': {'range': [0, 100]},
               'bar': {'color': '#2ECC71'},
               'threshold': {'line': {'color': 'red', 'width': 4},
                           'thickness': 0.75, 'value': 90}}), row=2, col=2)
    
    fig.update_xaxes(title_text="Generation", row=1, col=1)
    fig.update_xaxes(title_text="Generation", row=1, col=2)
    fig.update_yaxes(title_text="Allele freq (p)", range=[0,1], row=1, col=1)
    fig.update_yaxes(title_text="F_ST", row=1, col=2)
    fig.update_yaxes(title_text="Final p", row=2, col=1)
    
    fig.update_layout(height=700, title_text='<b>Allopatric Speciation Simulation</b>')
    
    # Results
    print("\n" + "="*70)
    print("ALLOPATRIC SPECIATION RESULTS")
    print("="*70)
    print(f"\nINITIAL: Both populations p = {initial_p:.3f}")
    print(f"FINAL: Pop 1 p = {traj_p1[-1]:.3f}, Pop 2 p = {traj_p2[-1]:.3f}")
    print(f"DIVERGENCE: Δp = {abs(traj_p1[-1] - traj_p2[-1]):.3f}")
    print(f"\nF_ST: {final_fst:.3f}")
    if final_fst < 0.05:
        print("  → Slight differentiation")
    elif final_fst < 0.15:
        print("  → Moderate differentiation")
    elif final_fst < 0.25:
        print("  → High differentiation")
    else:
        print("  → Very high differentiation (approaching species level!)")
    print(f"\nREPRODUCTIVE ISOLATION: {repro_isolation:.1f}%")
    if repro_isolation > 80:
        print("  → LIKELY NEW SPECIES! Strong reproductive barriers expected.")
    elif repro_isolation > 50:
        print("  → Subspecies level, partial reproductive isolation")
    else:
        print("  → Still one species, populations can interbreed")
    print("="*70)
    
    fig.show()

# Interactive
N_slider = IntSlider(value=100, min=50, max=500, step=50, description='Pop size:')
gen_slider = IntSlider(value=200, min=100, max=1000, step=100, description='Generations:')
mu_slider = FloatSlider(value=0.001, min=0, max=0.01, step=0.001, description='Mutation:')
p_slider = FloatSlider(value=0.5, min=0.1, max=0.9, step=0.1, description='Initial p:')

display(HTML("<h3>🏔️ Allopatric Speciation Simulator</h3>"))
interact(allopatric_speciation, pop_size=N_slider, generations=gen_slider,
        mutation_rate=mu_slider, initial_p=p_slider);
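
To put the simulator's output in context, here is a rough sketch of the classic drift-only expectation for two completely isolated populations of size N: F_ST ≈ 1 - (1 - 1/(2N))^t after t generations. It ignores mutation and selection, and any single simulated locus will scatter widely around this curve, so treat it as an approximate guide. The function names are hypothetical and not part of the lab code.

```python
import math


def expected_fst(pop_size, generations):
    """Expected F_ST after `generations` of pure drift in isolated populations."""
    return 1 - (1 - 1 / (2 * pop_size)) ** generations


def generations_to_fst(pop_size, target_fst):
    """Generations of isolation needed for the expected F_ST to reach a target."""
    return math.log(1 - target_fst) / math.log(1 - 1 / (2 * pop_size))


# Compare small and large populations against the 0.25 "very high" threshold
for n in (100, 500):
    t = generations_to_fst(n, 0.25)
    print(f"N={n}: about {t:.0f} generations for expected F_ST to reach 0.25")
```

The time needed scales roughly with N, which is one reason small isolated populations diverge faster in the simulator.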

## Part 3: Genetic Divergence Calculator

In [None]:
def calculate_fst(p1, p2):
    """
    Calculate F_ST between two equal-sized populations at one biallelic locus
    """
    p_avg = (p1 + p2) / 2
    
    if p_avg == 0 or p_avg == 1:
        print("F_ST undefined (same allele fixed in both populations)")
        return
    
    # F_ST formula: between-population variance in p over p_avg(1 - p_avg)
    numerator = (p1 - p2)**2
    denominator = 4 * p_avg * (1 - p_avg)
    fst = numerator / denominator
    
    # Heterozygosity within populations (H_S) and in the pooled total (H_T)
    H_S = (2*p1*(1-p1) + 2*p2*(1-p2)) / 2
    H_T = 2 * p_avg * (1 - p_avg)
    
    # Cross-check: the heterozygosity form gives the same value
    fst_alt = (H_T - H_S) / H_T
    
    # Visualization
    fig = go.Figure()
    fig.add_trace(go.Indicator(
        mode="gauge+number+delta",
        value=fst,
        title={'text': "F_ST"},
        delta={'reference': 0.15},
        gauge={
            'axis': {'range': [0, 1]},
            'bar': {'color': '#2ECC71'},
            'steps': [
                {'range': [0, 0.05], 'color': 'lightgray'},
                {'range': [0.05, 0.15], 'color': '#F39C12'},
                {'range': [0.15, 0.25], 'color': '#E74C3C'},
                {'range': [0.25, 1], 'color': '#C0392B'}],
            'threshold': {
                'line': {'color': 'red', 'width': 4},
                'thickness': 0.75,
                'value': 0.25}}))
    
    fig.update_layout(height=400, title_text='<b>Genetic Divergence</b>')
    
    print("\n" + "="*70)
    print("F_ST CALCULATION")
    print("="*70)
    print(f"\nPopulation 1: p₁ = {p1:.3f}")
    print(f"Population 2: p₂ = {p2:.3f}")
    print(f"Average: p̄ = {p_avg:.3f}")
    print(f"\nF_ST = (p₁ - p₂)² / [4p̄(1-p̄)]")
    print(f"     = ({p1:.3f} - {p2:.3f})² / [4×{p_avg:.3f}×{1-p_avg:.3f}]")
    print(f"     = {numerator:.6f} / {denominator:.6f}")
    print(f"     = {fst:.4f}")
    print(f"\nCross-check: F_ST = (H_T - H_S) / H_T = {fst_alt:.4f}")
    print(f"\nINTERPRETATION:")
    if fst < 0.05:
        print("  → Little differentiation (likely one species)")
    elif fst < 0.15:
        print("  → Moderate differentiation (subspecies level)")
    elif fst < 0.25:
        print("  → High differentiation (incipient species)")
    else:
        print("  → Very high differentiation (likely distinct species)")
    print("="*70)
    
    fig.show()

# Interactive
p1_slider = FloatSlider(value=0.3, min=0, max=1, step=0.05, description='Pop 1 (p₁):')
p2_slider = FloatSlider(value=0.7, min=0, max=1, step=0.05, description='Pop 2 (p₂):')

display(HTML("<h3>📊 F_ST Calculator</h3>"))
interact(calculate_fst, p1=p1_slider, p2=p2_slider);

## Part 4: Challenge Problems

### Challenge 1: Darwin's Finches 🐦

**Given:**
- 13 finch species evolved from one ancestor
- Time: ~2.3 million years
- Different islands provided isolation

**Questions:**
1. What type of speciation is this?
2. Calculate average time per speciation event
3. Why did speciation happen so rapidly?

<details>
<summary>Solution</summary>

**1. Type:** ALLOPATRIC + ADAPTIVE RADIATION

**Mechanism:**
- Geographic isolation (different islands)
- Different selective pressures (seeds, insects, etc.)
- Rapid divergence in beak morphology

**2. Time per event:**
13 species from 1 ancestor = 12 speciation events
Time = 2.3 million years / 12 = **~192,000 years per speciation**

**3. Why rapid?**
- **Small founding populations** (strong drift)
- **Empty niches** (no competition)
- **Strong selection** (food sources)
- **Geographic isolation** (prevents gene flow)
- **Adaptive radiation** (burst of diversification)

This is MUCH faster than typical vertebrate speciation!
</details>
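
The arithmetic in part 2 of the solution can be sketched as a small helper. It assumes a single common ancestor and evenly spaced splits, which is a simplification; the function name is illustrative only.

```python
def years_per_speciation(n_species, total_years):
    """Average interval between splits, assuming one common ancestor."""
    events = n_species - 1  # each split adds exactly one species
    return total_years / events


# Darwin's finches: 13 species in roughly 2.3 million years
finch_interval = years_per_speciation(13, 2.3e6)
print(f"Darwin's finches: ~{finch_interval:,.0f} years per speciation event")
```

The same helper can be applied to any entry with a species count and an age, keeping in mind that real radiations are rarely evenly paced.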

### Challenge 2: Calculate F_ST 📐

**Scenario:** Two isolated mouse populations
- Population A: p = 0.2
- Population B: p = 0.8

**Questions:**
1. Calculate F_ST
2. Are these likely separate species?
3. How much gene flow would prevent this divergence?

<details>
<summary>Solution</summary>

**1. Calculate F_ST:**

p̄ = (0.2 + 0.8) / 2 = 0.5

F_ST = (p₁ - p₂)² / [4p̄(1-p̄)]
     = (0.2 - 0.8)² / [4(0.5)(0.5)]
     = 0.36 / 1.00
     = **0.36**

**2. Separate species?**

F_ST = 0.36 → **VERY HIGH differentiation** (above the 0.25 threshold)

YES, likely separate species!
- Populations are highly differentiated
- Probably reproductively isolated
- Would need additional data (behavior, morphology)

**3. Gene flow needed:**

To prevent divergence: **Nm > 1** (classic rule)

Where:
- N = effective population size
- m = migration rate

Even **1 migrant per generation** is enough to prevent fixation of different alleles!

For these populations with F_ST = 0.36:
- Current: Nm ≈ 0 (no gene flow)
- Needed: m > 1/N

If N = 100, need m > 0.01 (1% migration) to prevent divergence
</details>
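
The Nm > 1 rule of thumb comes from Wright's island model, where the equilibrium value is approximately F_ST ≈ 1 / (4Nm + 1). The sketch below assumes that model (many populations, neutral alleles, migration-drift equilibrium), so it is a rough guide for two real populations. Function names are hypothetical.

```python
def fst_from_nm(nm):
    """Equilibrium F_ST under the island model for Nm migrants per generation."""
    return 1 / (4 * nm + 1)


def nm_from_fst(fst):
    """Invert the island-model relation to estimate Nm from an observed F_ST."""
    return (1 / fst - 1) / 4


# How quickly gene flow erodes differentiation
for nm in (0.1, 0.5, 1, 5, 10):
    print(f"Nm = {nm:>4}: equilibrium F_ST = {fst_from_nm(nm):.3f}")
```

At Nm = 1 the equilibrium F_ST is 0.2, and it falls quickly as migration increases, which is why a single migrant per generation is often quoted as the threshold.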

### Challenge 3: Polyploid Speciation 🌱

**Scenario:** Wheat evolution via polyploidy
- Diploid ancestor (2n = 14)
- Tetraploid wheat (4n = 28)
- Hexaploid bread wheat (6n = 42)

**Questions:**
1. Can 4n wheat breed with 2n ancestor?
2. Why is polyploidy instant speciation?
3. Why is this common in plants but rare in animals?

<details>
<summary>Solution</summary>

**1. Can they breed?**

**NO!** Reproductive isolation is IMMEDIATE.

Cross: 2n (♀) × 4n (♂)
- Gametes: n (from 2n) × 2n (from 4n)
- Hybrid: 3n (triploid)

**Problem:** Triploids are almost always STERILE
- Meiosis fails (unpaired chromosomes)
- Can't reliably produce viable gametes
- Reproductive dead end

**Result:** 4n is NEW SPECIES instantly!

**2. Why instant?**

**Chromosome incompatibility** creates immediate postzygotic isolation:
- Different chromosome numbers
- Meiosis impossible in hybrids
- No gene flow possible

This is the **fastest speciation mechanism**!
- Allopatric: 100,000s years
- Sympatric (disruptive): 10,000s years
- **Polyploidy: ONE GENERATION!**

**3. Plants vs Animals?**

**Common in plants because:**
- Self-fertilization possible (4n can reproduce with itself)
- Indeterminate growth
- Less complex organ systems
- Can tolerate gene dosage imbalance

**Rare in animals because:**
- Require mate (but all others 2n)
- Sex determination disrupted (XXX, XXY)
- Complex organ systems (development fails)
- Gene dosage critical

**Estimate:** 50-70% of flowering plants are polyploid!
Including: wheat, cotton, tobacco, strawberries, coffee

**Animals:** Very rare, mostly in some fish, frogs, lizards
</details>
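
The cross in part 1 can be sketched as simple bookkeeping of chromosome sets. This toy model assumes even-ploidy parents, gametes that carry exactly half of a parent's sets, and fertility that depends only on whether the offspring's ploidy is even. Real meiosis is messier, and the function names are illustrative.

```python
def hybrid_ploidy(parent_a, parent_b):
    """Offspring ploidy when each parent contributes half of its chromosome sets."""
    return parent_a // 2 + parent_b // 2


def likely_fertile(ploidy):
    """Even ploidy lets every chromosome find a partner at meiosis (simplification)."""
    return ploidy % 2 == 0


# Diploid x diploid, diploid x tetraploid, tetraploid x tetraploid
for a, b in [(2, 2), (2, 4), (4, 4)]:
    child = hybrid_ploidy(a, b)
    status = "fertile" if likely_fertile(child) else "sterile"
    print(f"{a}n x {b}n -> {child}n ({status})")
```

Only the 4n × 4n cross yields fertile tetraploid offspring, which is why the new tetraploid lineage is isolated from its diploid ancestor from the first generation.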

In [None]:
def export_results():
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    data = []
    for name, info in speciation_examples.items():
        data.append({
            'Example': name,
            'Mode': info['mode'],
            'Species_Count': info['species_count'],
            'Time_Mya': info['time_mya'],
            'Key_Trait': info['key_trait'],
            'Mechanism': info['mechanism']
        })
    df = pd.DataFrame(data)
    # Relative path: written to the working directory (/content on Colab)
    csv_file = f"lab_4_1_speciation_{timestamp}.csv"
    df.to_csv(csv_file, index=False)
    print(f"✓ Saved: {csv_file}")
    print(f"Exported {len(data)} speciation examples")

btn = Button(description='📥 Export', button_style='success', icon='download')
btn.on_click(lambda b: export_results())
display(HTML("<h3>📤 Export</h3>"))
display(btn)

## Summary

### Key Concepts

✅ **Speciation** = Evolution of reproductive isolation  
✅ **Three modes**: Allopatric (geographic), Sympatric (same location), Parapatric (gradient)  
✅ **F_ST** = Measure of genetic divergence (0-1)  
✅ **Time to speciation**: Varies from 1 generation (polyploidy) to millions of years  

### Equations

**Genetic Divergence:**
## F_ST = (p₁ - p₂)² / [4p̄(1-p̄)]

**Gene flow prevents divergence:**
## Nm > 1

### Speciation Modes

**Allopatric (most common):**
- Geographic barrier → isolation → divergence
- Examples: Islands, mountains, rivers
- Time: 100,000s-1,000,000s years

**Sympatric (rare but dramatic):**
- Polyploidy (instant!) OR
- Disruptive selection + assortative mating
- Examples: Cichlids, polyploid plants

**Parapatric:**
- Environmental gradient
- Selection > gene flow
- Partial isolation

### Real-World Examples

**Darwin's Finches**: 13 species in ~2.3 million years (allopatric + adaptive radiation)  
**Lake Victoria Cichlids**: 500+ species in 15,000 years! (sympatric)  
**Wheat**: 2n→4n→6n via polyploidy (instant speciation)  
**Rhagoletis**: Apple vs hawthorn in 160 years (sympatric, host shift)  

### The Big Picture

**Speciation is the origin of biodiversity!**

- Without speciation: 1 species
- With speciation: ~8.7 million species!

**Different mechanisms, different timescales:**
- Polyploidy: 1 generation
- Sympatric: 1,000s generations
- Allopatric: 100,000s+ generations

**All require:** REPRODUCTIVE ISOLATION

### Next Lab

**Lab 4.2: Phylogenetic Tree Builder** - Reconstruct evolutionary relationships!

**Congratulations!** 🎉