# Lab 1.3: Modern Taxonomy Toolkit
## Unit 1: Taxonomy & Biosystematics

### 🎯 Learning Objectives
- Apply DNA barcoding for species identification
- Interpret karyotypes and chromosomal data
- Use numerical taxonomy methods (UPGMA)
- Integrate molecular and morphological data
- Understand integrative taxonomy approaches

### 📖 Connection to Course
Covers **Modern Taxonomic Methods** from Unit 1: 21st-century tools for identifying and classifying species

### 🧬 The Big Question
**How do we identify species when morphology isn't enough?** Use DNA, chromosomes, and other independent lines of evidence!

In [None]:
# === GOOGLE COLAB SETUP ===
try:
    from google.colab import output
    output.enable_custom_widget_manager()
    print("✓ Widgets enabled")
except ImportError:
    print("✓ Running outside Colab")

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from ipywidgets import interact, FloatSlider, IntSlider, Dropdown, Button, Layout
from IPython.display import display, HTML
from datetime import datetime
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist, squareform
import matplotlib.pyplot as plt

np.random.seed(42)
print("✓ Libraries loaded!")

## Part 1: Modern Taxonomy Theory

### Why Modern Methods?

**Traditional morphology limitations:**
- Cryptic species (look identical)
- Sexual dimorphism (males ≠ females)
- Age variation (juveniles ≠ adults)
- Convergent evolution (similar but unrelated)
- Damaged specimens

**Modern solutions:**
- DNA barcoding
- Karyotypes (chromosomes)
- Numerical taxonomy
- Integrative approaches

### DNA Barcoding

**Concept:** Use a short, standardized DNA region as a species "barcode"

**Standard barcode genes:**
- **Animals:** COI (Cytochrome c oxidase subunit I)
  - Mitochondrial gene
  - ~650 bp barcode region near the 5' end of the gene
  - High variation between species
  - Low variation within species
- **Plants:** rbcL, matK (chloroplast genes)
- **Fungi:** ITS (Internal Transcribed Spacer)

**Barcoding process:**
1. Extract DNA from specimen
2. PCR amplify barcode gene
3. Sequence the amplicon
4. Compare to reference database (BOLD, GenBank)
5. Match = species identified!
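Steps 4-5 can be sketched in a few lines. This is a minimal nearest-match sketch, assuming the query and references are already aligned to the same length; `p_distance`, `identify_specimen`, the toy sequences, and the 2% cut-off are illustrative assumptions, not the search algorithms BOLD or GenBank actually use.

```python
def p_distance(seq1, seq2):
    """Uncorrected p-distance: % of aligned sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return 100 * diffs / len(seq1)


def identify_specimen(query, references, threshold=2.0):
    """Nearest-match identification against a reference library.

    references: dict mapping species name -> aligned reference sequence.
    threshold: % distance below which the best match is accepted (2% rule of thumb).
    Returns (best_species, distance_percent, accepted).
    """
    scores = {sp: p_distance(query, seq) for sp, seq in references.items()}
    best = min(scores, key=scores.get)
    return best, scores[best], scores[best] < threshold


# Toy 100 bp references (hypothetical species)
refs = {
    "Species A": "ACGT" * 25,
    "Species B": "TTTT" + "ACGT" * 24,   # 3 differences from Species A
}
query = "ACGT" * 24 + "ACGA"             # 1 difference from Species A

print(identify_specimen(query, refs))    # ('Species A', 1.0, True)
```

A real pipeline would also need to handle alignment gaps, ambiguous bases (N), and queries with no reference close enough (here, `accepted` would be `False`).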

**Barcoding gap (rule of thumb for animal COI; varies by taxon):**
- Intraspecific variation: typically 0-2%
- Interspecific variation: typically >2%
- Clear separation between the two distributions = good barcode
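The gap itself is the smallest between-species distance minus the largest within-species distance. A minimal sketch with made-up distances; the helper name `barcoding_gap` is hypothetical:

```python
def barcoding_gap(intra, inter):
    """Smallest between-species distance minus largest within-species distance (%).

    Positive: the two distributions do not overlap. Zero or negative: no gap.
    """
    return min(inter) - max(intra)


# Made-up pairwise distances (%), for illustration only
intra = [0.2, 0.5, 0.9, 1.4]
inter = [3.1, 4.8, 6.0, 7.5]
print(f"gap = {barcoding_gap(intra, inter):.1f}%")   # gap = 1.7%

# Overlapping distributions: one within-species value exceeds a between-species value
print(barcoding_gap([0.5, 2.5], [2.1, 5.0]) < 0)     # True
```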

### Karyotypes

**Chromosome analysis:**
- Number (2n = diploid number)
- Size and shape
- Banding patterns

**Taxonomic value:**
- Species-specific karyotypes
- Chromosome number often diagnostic
- Detects cryptic species
- Tracks polyploidy events

**Examples:**
- Human: 2n = 46
- Chimpanzee: 2n = 48 (two ancestral chromosomes fused to form human chromosome 2)
- Dog: 2n = 78
- Fruit fly: 2n = 8
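The arithmetic behind a fusion is simple bookkeeping: each fusion joins two chromosomes into one, so once it is fixed in a population (present on both homologs) 2n drops by 2, while a carrier with one fused and one unfused set drops by only 1. A sketch with a hypothetical helper `fused_karyotype`:

```python
def fused_karyotype(ancestral_2n, fusions, heterozygous=False):
    """Diploid number after chromosome fusions.

    Each fusion joins two chromosomes into one. If the fusion is on both
    homologs (fixed in the population), 2n falls by 2 per fusion; in a
    heterozygote (one fused set, one unfused set), 2n falls by 1 per fusion.
    """
    per_fusion = 1 if heterozygous else 2
    return ancestral_2n - per_fusion * fusions


print(fused_karyotype(48, 1))                     # 46: ancestral ape 48 -> human 46
print(fused_karyotype(40, 1))                     # 38: a mouse population fixed for one fusion
print(fused_karyotype(40, 1, heterozygous=True))  # 39: a hybrid carrying one fused set
```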

### Numerical Taxonomy

**Phenetics:** Group by overall similarity

**Method (UPGMA):**
1. Score many characters (50-100+)
2. Calculate pairwise similarity
3. Cluster by similarity
4. Generate phenogram (not phylogeny!)

**Distance measures (simple matching distance):**

$$d = \frac{\text{number of characters that differ}}{\text{total number of characters scored}}$$
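For binary (0/1) characters this distance is the proportion of mismatched characters, which is what SciPy's `hamming` metric returns. A small sketch using two rows of the Part 5 character matrix; `simple_matching_distance` is a hypothetical helper written for illustration:

```python
import numpy as np
from scipy.spatial.distance import hamming


def simple_matching_distance(a, b):
    """d = (number of characters that differ) / (total characters scored)."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("both taxa must be scored for the same characters")
    return np.count_nonzero(a != b) / a.size


# Rows from the Part 5 character matrix (1 = present, 0 = absent)
house_mouse = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
wood_mouse = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

d = simple_matching_distance(house_mouse, wood_mouse)
print(d)                                                # 0.1 (1 of 10 characters differs)
print(np.isclose(d, hamming(house_mouse, wood_mouse)))  # True
```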

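**UPGMA algorithm:**
- Unweighted Pair Group Method with Arithmetic mean
- Repeatedly merges the two closest clusters; the distance from a merged cluster to any other is the average of all pairwise taxon distances
- Same algorithm used for distance trees in phylogenetics (Lab 4.2)
- In phenetics, however, the result is read only as a similarity grouping, with no assumption of common ancestry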
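A from-scratch sketch of the clustering loop, assuming a small hypothetical distance matrix for taxa A-D (Part 5 uses SciPy's `linkage(..., method='average')` instead). The key step is the arithmetic-mean update: the distance from a merged cluster to any other cluster is the size-weighted average of its members' distances. Ties are broken by whichever pair `min` meets first, which is one arbitrary choice among several.

```python
from itertools import combinations


def upgma(dist, labels):
    """Minimal UPGMA. Returns merges as (cluster_1, cluster_2, distance) tuples.

    dist: dict mapping frozenset({a, b}) -> distance between taxa a and b.
    """
    size = {lab: 1 for lab in labels}   # current clusters -> number of taxa in each
    d = dict(dist)                      # working copy; grows as clusters merge
    merges = []
    while len(size) > 1:
        # 1. Find the closest pair of current clusters (first pair wins a tie)
        a, b = min(combinations(size, 2), key=lambda pair: d[frozenset(pair)])
        merges.append((a, b, d[frozenset((a, b))]))
        new = f"({a},{b})"
        # 2. Arithmetic-mean update: average over all taxa in the merged cluster
        for c in size:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        # 3. Replace the two clusters with the merged one
        size[new] = size.pop(a) + size.pop(b)
    return merges


# Hypothetical distances between four taxa
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}

for left, right, height in upgma(dist, ["A", "B", "C", "D"]):
    print(f"merge {left} + {right} at distance {height}")
# merge A + B at distance 2
# merge C + (A,B) at distance 6.0
# merge D + (C,(A,B)) at distance 10.0
```

The merge heights are the full between-cluster distances, as in SciPy's linkage output; drawing a tree with branch lengths usually places each node at half this value.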

### Integrative Taxonomy

**Combine multiple lines of evidence:**
- Morphology (traditional)
- DNA sequences (molecular)
- Chromosomes (karyotypes)
- Ecology (niche differences)
- Behavior (mating calls, etc.)
- Geography (allopatry)

**Consensus approach:**
If multiple independent datasets agree → strong species hypothesis

## Part 2: DNA Barcode Database

In [None]:
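# Simulated COI barcode fragments (toy data: ~100 bp each; real barcodes are ~650 bp)
# The toy sequences differ at only 1-3 sites, so the distances they produce are far
# smaller than real COI divergences between these genera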
barcodes = {
    'Mus musculus': {
        'common_name': 'House Mouse',
        'coi_sequence': 'ATGACCAACATTCGCAACCTCTGACTAATCCCCCCATCTTTCGGATCCGCAGCCATTCTTCTTATCGACCAACCCAACCCTGCCTTCTCCCTCACCATCAT',
        'specimens': ['MM001', 'MM002', 'MM003'],
        'variation': 0.5  # assumed % within-species variation (not computed from specimens)
    },
    'Mus spretus': {
        'common_name': 'Algerian Mouse',
        'coi_sequence': 'ATGACCAACATTCGCAACCTCTGACTAATCCCCCCATCTTTCGGATCCGCAGCCATTCTTCTTATCGACCAACCCAACCCTGCCTTCTCCCTCACCGTCAT',
        'specimens': ['MS001', 'MS002'],
        'variation': 0.3
    },
    'Rattus norvegicus': {
        'common_name': 'Norway Rat',
        'coi_sequence': 'ATGACCAACATTCGCAACCTCTGACTAATCCCCCCATCTTTCGGATCCGCAGCCATTCTTCTTATCGACCAACCCAACCCTGCCTTGTCCCTCGCCATCAT',
        'specimens': ['RN001', 'RN002', 'RN003'],
        'variation': 0.4
    },
    'Apodemus sylvaticus': {
        'common_name': 'Wood Mouse',
        'coi_sequence': 'ATGACCAACATTCGCAACCTCTGACTAATCCCCCCATCTTTCGGATCCGCAGCCATTCTTCTTATCGACCAACCCAACCCTGCCTTCTGCCTCACCATCAT',
        'specimens': ['AS001', 'AS002'],
        'variation': 0.6
    },
    'Peromyscus maniculatus': {
        'common_name': 'Deer Mouse',
        'coi_sequence': 'ATGACCAACATTCGCAACCTCTGACTAATCCCCCCATCTTTCGGATCCGCAGCCATTCTTCTTATCGACCAACCCAACCCTGCCATCTCCCTCACCATCAT',
        'specimens': ['PM001', 'PM002', 'PM003'],
        'variation': 0.7
    }
}

# Calculate pairwise differences
def sequence_difference(seq1, seq2):
    """Calculate % difference between sequences"""
    if len(seq1) != len(seq2):
        return None
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return (differences / len(seq1)) * 100

print("\nDNA BARCODE DATABASE (COI gene)")
print("="*80)
print(f"{'Species':<30}{'Common Name':<25}{'Within-species variation'}")
print("="*80)
for species, data in barcodes.items():
    print(f"{species:<30}{data['common_name']:<25}{data['variation']:.1f}%")

print(f"\n✓ {len(barcodes)} species with COI barcodes ready!")
fragment_len = len(next(iter(barcodes.values()))['coi_sequence'])
print(f"\nBARCODE LENGTH: {fragment_len} bp (toy fragments; full barcodes ~650 bp)")
print(f"BARCODE GENE: COI (Cytochrome c oxidase subunit I)")
print(f"TYPICAL VARIATION (rule of thumb): Within species <2%, Between species >2%")

## Part 3: DNA Barcode Analyzer

In [None]:
def analyze_barcodes():
    """
    Calculate barcoding gap and create distance matrix
    """
    species_list = list(barcodes.keys())
    n = len(species_list)
    
    # Calculate distance matrix
    distances = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                seq1 = barcodes[species_list[i]]['coi_sequence']
                seq2 = barcodes[species_list[j]]['coi_sequence']
                distances[i, j] = sequence_difference(seq1, seq2)
    
    # Get within-species and between-species distances
    within_species = [data['variation'] for data in barcodes.values()]
    between_species = distances[np.triu_indices_from(distances, k=1)]
    
    # Find barcoding gap
    max_within = max(within_species)
    min_between = min(between_species)
    gap = min_between - max_within
    
    # Visualization
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('Distance Matrix (Heatmap)', 'Barcoding Gap'),
        specs=[[{'type': 'heatmap'}, {'type': 'box'}]]
    )
    
    # 1. Heatmap
    fig.add_trace(go.Heatmap(
        z=distances,
        x=species_list,
        y=species_list,
        colorscale='Reds',
        text=np.round(distances, 1),
        texttemplate='%{text}%',
        textfont={"size": 10},
        colorbar=dict(title="% Difference")
    ), row=1, col=1)
    
    # 2. Barcoding gap
    fig.add_trace(go.Box(
        y=within_species,
        name='Within species',
        marker_color='#2ECC71',
        boxmean='sd'
    ), row=1, col=2)
    
    fig.add_trace(go.Box(
        y=between_species,
        name='Between species',
        marker_color='#E74C3C',
        boxmean='sd'
    ), row=1, col=2)
    
    fig.update_yaxes(title_text="Sequence Difference (%)", row=1, col=2)
    fig.update_layout(height=500, title_text='<b>DNA Barcoding Analysis</b>')
    
    # Results
    print("\n" + "="*70)
    print("DNA BARCODING GAP ANALYSIS")
    print("="*70)
    print(f"\nWITHIN-SPECIES VARIATION:")
    print(f"  Range: {min(within_species):.1f}% - {max_within:.1f}%")
    print(f"  Average: {np.mean(within_species):.1f}%")
    print(f"  → Individuals of the same species are very similar")
    
    print(f"\nBETWEEN-SPECIES VARIATION:")
    print(f"  Range: {min_between:.1f}% - {max(between_species):.1f}%")
    print(f"  Average: {np.mean(between_species):.1f}%")
    print(f"  → Different species are more divergent than conspecifics")
    
    print(f"\nBARCODING GAP:")
    print(f"  Max within-species: {max_within:.1f}%")
    print(f"  Min between-species: {min_between:.1f}%")
    print(f"  Gap size: {gap:.1f}%")
    
    if gap > 1.0:
        print(f"\n✓ CLEAR BARCODING GAP - Excellent for species ID!")
        print(f"  Distances above {max_within:.1f}% exceed all observed within-species variation")
    elif gap > 0:
        print(f"\n✓ BARCODING GAP PRESENT - Good for species ID")
    else:
        print(f"\n⚠ NO BARCODING GAP - May need additional markers")
    
    print(f"\nMOST SIMILAR SPECIES PAIR:")
    min_idx = np.unravel_index(np.argmin(distances + np.eye(n)*1000), distances.shape)
    sp1, sp2 = species_list[min_idx[0]], species_list[min_idx[1]]
    print(f"  {sp1} vs {sp2}")
    print(f"  Distance: {distances[min_idx]:.1f}%")
    
    print("="*70)
    
    fig.show()

display(HTML("<h3>🧬 DNA Barcoding Gap Analysis</h3>"))
analyze_barcodes()

## Part 4: Karyotype Database

In [None]:
# Karyotype data for various species
# 'autosomes' stores the number of autosome PAIRS (e.g. human: 22 pairs = 44 autosomes)
karyotypes = {
    'Homo sapiens': {
        'diploid': 46,
        'autosomes': 22,
        'sex_chromosomes': 'XX/XY',
        'note': 'Chromosome 2 is a fusion of two ancestral chromosomes'
    },
    'Pan troglodytes': {
        'diploid': 48,
        'autosomes': 23,
        'sex_chromosomes': 'XX/XY',
        'note': 'Two chromosomes correspond to human chromosome 2'
    },
    'Gorilla gorilla': {
        'diploid': 48,
        'autosomes': 23,
        'sex_chromosomes': 'XX/XY',
        'note': 'Similar to chimpanzee karyotype'
    },
    'Canis familiaris': {
        'diploid': 78,
        'autosomes': 38,
        'sex_chromosomes': 'XX/XY',
        'note': 'High chromosome number typical of canids'
    },
    'Felis catus': {
        'diploid': 38,
        'autosomes': 18,
        'sex_chromosomes': 'XX/XY',
        'note': 'Moderate chromosome number'
    },
    'Mus musculus': {
        'diploid': 40,
        'autosomes': 19,
        'sex_chromosomes': 'XX/XY',
        'note': 'Standard lab mouse'
    },
    'Drosophila melanogaster': {
        'diploid': 8,
        'autosomes': 3,
        'sex_chromosomes': 'XX/XY',
        'note': 'Very low chromosome number, giant polytene chromosomes'
    },
    'Triticum aestivum': {
        'diploid': 42,
        'autosomes': 21,
        'sex_chromosomes': 'None',
        'note': 'Hexaploid (2n = 6x = 42, x = 7); arose through two allopolyploid hybridization events'
    }
}

# Visualization
species = list(karyotypes.keys())
diploid_numbers = [data['diploid'] for data in karyotypes.values()]
# Red: 2n = 48 (chimpanzee, gorilla); blue: 2n > 40; green: 2n <= 40
colors = ['#E74C3C' if d == 48 else '#3498DB' if d > 40 else '#2ECC71' for d in diploid_numbers]

fig = go.Figure(go.Bar(
    x=species,
    y=diploid_numbers,
    marker_color=colors,
    text=diploid_numbers,
    textposition='outside'
))

fig.update_layout(
    title='<b>Chromosome Numbers Across Species</b>',
    xaxis_title='Species',
    yaxis_title='Diploid Number (2n)',
    height=500
)

print("\nKARYOTYPE DATABASE")
print("="*80)
print(f"{'Species':<30}{'2n':<8}{'Autosome pairs':<16}{'Sex Chr'}")
print("="*80)
for sp, data in karyotypes.items():
    print(f"{sp:<30}{data['diploid']:<8}{data['autosomes']:<16}{data['sex_chromosomes']}")

print(f"\n✓ {len(karyotypes)} species karyotypes loaded!")
print(f"\nKEY OBSERVATION:")
print(f"Human (2n=46) vs Chimp (2n=48) - chromosome fusion in humans!")

fig.show()

## Part 5: Numerical Taxonomy (UPGMA)

In [None]:
def numerical_taxonomy_upgma():
    """
    Perform UPGMA clustering on morphological data
    """
    # Morphological character matrix (binary: 0/1)
    # 10 characters scored for 5 rodent species
    characters = {
        'Character': [
            'Long tail', 'Large ears', 'Nocturnal', 'Arboreal',
            'White belly', 'Striped back', 'Bushy tail', 'Hibernate',
            'Cheek pouches', 'Gnaw bark'
        ]
    }
    
    data = {
        'House Mouse': [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
        'Wood Mouse': [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
        'Norway Rat': [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
        'Gray Squirrel': [1, 0, 0, 1, 0, 0, 1, 0, 1, 1],
        'Chipmunk': [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]
    }
    
    species = list(data.keys())
    matrix = np.array(list(data.values()))
    
    # Calculate pairwise distances (Hamming distance)
    distances = pdist(matrix, metric='hamming') * 100  # Convert to percentage
    
    # Perform UPGMA clustering
    linkage_matrix = linkage(distances, method='average')
    
    # Create dendrogram
    plt.figure(figsize=(10, 6))
    dendrogram(linkage_matrix, labels=species, leaf_font_size=12)
    plt.title('UPGMA Phenogram (Morphological Characters)', fontsize=14, fontweight='bold')
    plt.xlabel('Species', fontsize=12)
    plt.ylabel('Distance (%)', fontsize=12)
    plt.tight_layout()
    plt.show()
    
    # Distance matrix
    dist_matrix = squareform(distances)
    
    print("\n" + "="*70)
    print("NUMERICAL TAXONOMY (UPGMA)")
    print("="*70)
    print(f"\nCHARACTERS SCORED: {len(characters['Character'])}")
    print(f"SPECIES ANALYZED: {len(species)}")
    print(f"METHOD: Unweighted Pair Group Method with Arithmetic mean")
    
    print(f"\nMORPHOLOGICAL CHARACTER MATRIX:")
    df = pd.DataFrame(data, index=characters['Character']).T
    print(df.to_string())
    
    print(f"\nDISTANCE MATRIX (% different):")
    dist_df = pd.DataFrame(dist_matrix, index=species, columns=species)
    print(np.round(dist_df, 1).to_string())
    
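    print(f"\nCLUSTERING RESULTS:")
    print(f"  Closest pairs (tied): House Mouse - Wood Mouse ({dist_matrix[0,1]:.1f}%) "
          f"and House Mouse - Norway Rat ({dist_matrix[0,2]:.1f}%)")
    print(f"  All three are murids (family Muridae); Wood Mouse is Apodemus, not Mus")
    print(f"\n  The three murids form one cluster (joined at an average distance of 15%)")
    print(f"\n  Squirrel and Chipmunk are both sciurids (family Sciuridae), but Chipmunk is")
    print(f"  {dist_matrix[3,4]:.1f}% from Squirrel and, on average, "
          f"{dist_matrix[:3,4].mean():.1f}% from the murid cluster: UPGMA must break a tie")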
    
    print(f"\nPHENOGRAM vs PHYLOGENY:")
    print(f"  Phenogram: Based on OVERALL similarity")
    print(f"  Phylogeny: Based on SHARED DERIVED characters")
    print(f"  → Phenogram may not reflect evolutionary relationships!")
    print(f"  → Convergent evolution can mislead numerical taxonomy")
    print("="*70)

display(HTML("<h3>📊 Numerical Taxonomy with UPGMA</h3>"))
numerical_taxonomy_upgma()
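Ties deserve a check whenever a phenogram is built from only a few characters. This sketch recomputes the Part 5 distances, lists every pair at the minimum distance, and then compares the Chipmunk's distance to the Gray Squirrel with its average distance to the three murids (the value UPGMA uses once those three have merged). The variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Same 10-character matrix as numerical_taxonomy_upgma()
taxa = ['House Mouse', 'Wood Mouse', 'Norway Rat', 'Gray Squirrel', 'Chipmunk']
matrix = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 1, 0, 1, 1, 0],
])
D = squareform(pdist(matrix, metric='hamming')) * 100   # % of characters differing

# Every pair at the smallest distance (upper triangle only, so each pair appears once)
iu = np.triu_indices_from(D, k=1)
d_min = D[iu].min()
tied = [(taxa[i], taxa[j]) for i, j in zip(*iu) if np.isclose(D[i, j], d_min)]
print(f"Smallest distance {d_min:.1f}%:", tied)

# After the three murids (rows 0-2) merge, UPGMA compares these two values
chip_to_murids = D[:3, 4].mean()
chip_to_squirrel = D[3, 4]
print(f"Chipmunk-murids {chip_to_murids:.1f}% vs Chipmunk-Gray Squirrel {chip_to_squirrel:.1f}%")
```

Both comparisons come out tied, so the placement of the Chipmunk depends on how the software breaks ties; scoring more characters is the usual remedy.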

## Part 6: Integrative Taxonomy Example

In [None]:
def integrative_taxonomy_case():
    """
    Show how multiple lines of evidence resolve cryptic species
    """
    # Case: Two morphologically similar populations
    evidence = {
        'Morphology': {
            'Population A': 'Small rodent, 25g, brown fur',
            'Population B': 'Small rodent, 24g, brown fur',
            'Conclusion': 'SAME SPECIES (indistinguishable)'
        },
        'DNA Barcoding': {
            'Population A': 'COI sequence: ATGACC...',
            'Population B': 'COI sequence: ATGACT...',
            'Difference': '3.5% (>2% threshold)',
            'Conclusion': 'DIFFERENT SPECIES'
        },
        'Karyotype': {
            'Population A': '2n = 40 chromosomes',
            'Population B': '2n = 38 chromosomes',
            'Conclusion': 'DIFFERENT SPECIES (chromosome difference)'
        },
        'Ecology': {
            'Population A': 'Forest habitat, nocturnal',
            'Population B': 'Grassland habitat, crepuscular',
            'Conclusion': 'DIFFERENT NICHES (supports separation)'
        },
        'Breeding Tests': {
            'A × A': 'Fertile offspring',
            'B × B': 'Fertile offspring',
            'A × B': 'No offspring (reproductive isolation)',
            'Conclusion': 'REPRODUCTIVELY ISOLATED'
        }
    }
    
    # Visualization
    categories = list(evidence.keys())
    support_same = [1, 0, 0, 0, 0]  # Only morphology suggests same
    support_different = [0, 1, 1, 1, 1]  # All others suggest different
    
    fig = go.Figure()
    fig.add_trace(go.Bar(
        y=categories,
        x=support_same,
        orientation='h',
        name='Same Species',
        marker_color='#95A5A6',
        text=['Same', '', '', '', ''],
        textposition='inside'
    ))
    fig.add_trace(go.Bar(
        y=categories,
        x=support_different,
        orientation='h',
        name='Different Species',
        marker_color='#E74C3C',
        text=['', 'Different', 'Different', 'Different', 'Different'],
        textposition='inside'
    ))
    
    fig.update_layout(
        title='<b>Integrative Taxonomy: Weight of Evidence</b>',
        barmode='stack',
        height=400
    )
    
    print("\n" + "="*70)
    print("INTEGRATIVE TAXONOMY CASE STUDY")
    print("="*70)
    print(f"\nSCENARIO: Two morphologically identical populations")
    print(f"QUESTION: Are they the same species or cryptic species?")
    
    for i, (method, results) in enumerate(evidence.items(), 1):
        print(f"\n{i}. {method.upper()}:")
        for key, value in results.items():
            if key == 'Conclusion':
                print(f"   → {value}")
            else:
                print(f"   {key}: {value}")
    
    print(f"\n" + "="*70)
    print("FINAL DETERMINATION")
    print("="*70)
    print(f"\nEVIDENCE SUMMARY:")
    print(f"  Same species: 1/5 lines of evidence (20%)")
    print(f"  Different species: 4/5 lines of evidence (80%)")
    print(f"\n✓ CONSENSUS: TWO CRYPTIC SPECIES")
    print(f"\nCryptic species = morphologically similar but genetically distinct")
    print(f"\nKEY LESSON:")
    print(f"  Morphology alone can be misleading!")
    print(f"  Multiple independent datasets provide strong support")
    print(f"  Integrative taxonomy resolves difficult cases")
    print("="*70)
    
    fig.show()

display(HTML("<h3>🔬 Integrative Taxonomy: Cryptic Species</h3>"))
integrative_taxonomy_case()

## Part 7: Challenge Problems

### Challenge 1: Interpret DNA Barcode 🧬

**You sequence an unknown mouse specimen:**
- COI sequence differs from *Mus musculus* by 0.8%
- COI sequence differs from *Mus spretus* by 3.2%
- COI sequence differs from *Rattus norvegicus* by 7.5%

**Questions:**
1. What species is it?
2. How confident are you?
3. What if it differed from *M. musculus* by 2.5%?

<details>
<summary>Solution</summary>

**1. Species Identification:**

**Distances:**
- *Mus musculus*: 0.8% different
- *Mus spretus*: 3.2% different  
- *Rattus norvegicus*: 7.5% different

**✓ Identification: *Mus musculus* (House Mouse)**

**Reasoning:**
- 0.8% difference = WITHIN typical intraspecific variation (<2%)
- Matches *M. musculus* best
- Other species >2% = clearly different species

**2. Confidence Level:**

**HIGH CONFIDENCE** (a qualitative judgment, not a computed probability)

**Why:**
- Clear barcoding gap exists
- 0.8% well below 2% threshold
- Nearest other species (3.2%) well above threshold
- Second-best match is 4× more distant (3.2% / 0.8% = 4)

**Supporting evidence:**
- Margin between best and second-best match: 3.2% - 0.8% = 2.4 percentage points
- No ambiguity in assignment

**3. If 2.5% Different:**

**Problem: In the "grey zone"!**

**2.5% is:**
- Just above the ~2% rule-of-thumb ceiling for intraspecific variation
- Still closer to *M. musculus* than to *M. spretus* (3.2%)
- Right at the boundary!

**Possibilities:**

**Option A: Divergent *M. musculus* population**
- Geographic isolation
- Incipient speciation
- Still same species but diverging

**Option B: Undescribed cryptic species**
- Looks like *M. musculus*
- Genetically distinct
- Needs more investigation

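**Option C: Hybrid or introgression**
- *M. musculus* × *M. spretus* ancestry
- COI is mitochondrial and maternally inherited, so a hybrid carries its mother's sequence, not an intermediate one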

**What to do:**
1. **Sequence more individuals** from same population
2. **Use additional genes** (nuclear DNA, not just COI)
3. **Check morphology** carefully
4. **Test breeding** compatibility
5. **Examine karyotype** (chromosome number?)

**Conclusion for 2.5%:**
**LOW CONFIDENCE - needs more data!**

Single barcode gene insufficient for borderline cases.
</details>

### Challenge 2: Karyotype Evidence 🔬

**Two populations of mice:**
- Population 1: 2n = 40, all pairs homologous
- Population 2: 2n = 38, two chromosomes fused
- Hybrids (1×2): 2n = 39, partially sterile

**Questions:**
1. Are these the same species?
2. What happened evolutionarily?
3. Compare to human-chimp karyotype difference

<details>
<summary>Solution</summary>

**1. Same Species?**

**NO - Different species (or at least incipient species)**

**Evidence:**
- Different chromosome numbers (40 vs 38)
- Hybrids partially sterile (reproductive isolation!)
- Chromosomal incompatibility

**Reproductive isolation:**
- Hybrid fertility reduced
- Gene flow restricted
- Populations diverging

**Biological Species Concept (Mayr):**
Groups of actually or potentially interbreeding natural populations that are reproductively isolated from other such groups
→ Reduced hybrid fertility = reproductive barrier = evidence for separate species

**But complex:**
- Partial fertility (not complete isolation)
- May be incipient species
- Speciation in progress!

**2. Evolutionary Event:**

**ROBERTSONIAN FUSION (chromosome fusion)**

**What happened:**
- Population 2 had fusion event
- Two chromosomes → One chromosome
- 2n: 40 → 38

**Mechanism:**
- Centromeric fusion
- Two acrocentric chromosomes fuse at centromeres
- Forms one metacentric chromosome
- Little functional loss: only the tiny short arms (mostly repetitive DNA) are lost

**Consequences:**

**Pure populations:**
- Pop 1 (40): Normal meiosis
- Pop 2 (38): Normal meiosis

**Hybrids (39):**
- Meiotic problems!
- Fused chromosome must pair with two separate homologs
- Some gametes unbalanced
- Reduced fertility

**Why partially sterile?**
- Trivalent forms (2 chromosomes + 1 fused)
- Irregular segregation
- Some viable gametes, some inviable
- Degree of fertility loss varies with the chromosomes and number of fusions involved

**3. Human-Chimp Comparison:**

**SIMILAR OUTCOME, DIFFERENT TYPE OF FUSION**

**Human vs Chimp:**
- Human: 2n = 46
- Chimp: 2n = 48
- Difference: 2 chromosomes

**What happened:**
- Ancestral karyotype: 2n = 48 (like chimp)
- Fusion in the human lineage
- Two ancestral acrocentric chromosomes → Human chromosome 2
- Human chr 2 = Chimp chr 2A + 2B
- Unlike a Robertsonian (centric) fusion, the human event joined the two chromosomes end to end, at their telomeres

**Evidence:**
- Human chr 2 has TWO centromeres!
- One active, one vestigial
- Telomeric sequences in the middle (the fusion site)
- Banding patterns and gene content closely match chimp 2A and 2B

**Implications:**

**For mice (40/38):**
- Karyotype difference = reproductive barrier
- Reduced hybrid fertility
- Populations diverging

**For humans (46) and chimps (48):**
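- Karyotype difference = potential reproductive barrier
- Hybrids would be expected to have meiotic problems
- But the lineages had already separated (~6 Mya)
- Fusion happened AFTER the lineages split (chimps, gorillas, and orangutans all retain 2n = 48)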

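**Key lesson:**
Chromosome fusions can:
1. CONTRIBUTE to reproductive isolation as it forms (mouse case)
2. ACCUMULATE after lineages have already separated, adding a further barrier (human-chimp)

Either way, karyotype differences are strong (though not infallible) evidence for species boundaries: house mouse populations carrying different Robertsonian fusions, for example, are still classified as one species.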
</details>

### Challenge 3: Choose the Right Method 🔧

**For each scenario, choose best taxonomic method(s):**

**A.** Identify seized wildlife products (dried meat, no morphology)

**B.** Classify newly discovered beetle (complete specimen, tropical)

**C.** Resolve cryptic species complex (5 morphologically identical populations)

**D.** Reconstruct phylogeny of bird genus (15 species, museum skins)

<details>
<summary>Solution</summary>

**A. Wildlife Product Identification**

**BEST METHOD: DNA Barcoding**

**Why:**
- No morphology available (dried/processed)
- DNA survives cooking, drying, aging
- COI barcode works from tiny tissue sample
- Compare to BOLD database
- Fast, reliable, and widely used in wildlife forensics

**Not useful:**
- Morphology: Destroyed/unrecognizable
- Karyotype: Requires fresh cells
- Numerical taxonomy: Needs scorable morphological characters

**Procedure:**
1. Extract DNA from sample
2. PCR amplify COI gene
3. Sequence
4. Search BOLD's identification engine (or BLAST against GenBank)
5. Match → species identified!

**Real application:**
- Bushmeat markets
- Shark fin soup
- Illegal ivory
- Mislabeled seafood

**B. New Beetle Classification**

**BEST METHOD: Morphology FIRST, then DNA**

**Priority 1: Traditional Morphology**
- Complete specimen available
- Beetle morphology well-studied
- Compare to keys/collections
- Examine genitalia (often diagnostic)
- Check family, genus

**Priority 2: DNA Barcoding**
- Confirm morphological ID
- Check for cryptic species
- Add to reference database
- Future identifications easier

**Also useful:**
- Numerical taxonomy (if genus complex)
- Integrative approach best

**Why morphology first:**
- Beetle DNA databases incomplete (tropical!)
- Morphology provides family/genus
- May be new to science!
- Formal description needs a name-bearing type specimen, and diagnoses are traditionally morphological

**C. Cryptic Species Complex**

**BEST METHOD: Integrative Taxonomy**

**Multiple lines of evidence needed:**

**1. DNA Barcoding (COI)**
- Sequence all 5 populations
- Look for barcoding gaps
- >2% difference suggests separate species

**2. Nuclear DNA**
- Multiple nuclear genes
- Confirm mitochondrial patterns
- Gene trees

**3. Karyotype Analysis**
- Check chromosome numbers
- Differences = strong evidence

**4. Morphometrics**
- Detailed measurements
- Statistical analysis
- Subtle differences?

**5. Ecology**
- Different niches?
- Habitat preferences?
- Temporal separation?

**6. Breeding Tests**
- If feasible
- Test hybrid fertility
- Reproductive isolation?

**Decision tree:**
- If 3+ methods agree → Strong case for species status
- If conflict ‚Üí Need more data
- Publish integrative analysis

**D. Bird Phylogeny Reconstruction**

**BEST METHOD: Molecular Phylogenetics**

**Primary: DNA Sequences**
- Extract DNA from museum skins
- Multiple genes (nuclear + mitochondrial)
- Cytochrome b, COI, NADH genes
- Nuclear markers (introns, RAG1, etc.)
- Build phylogenetic tree (Maximum Likelihood)

**Secondary: Morphology**
- Score morphological characters
- Plumage patterns
- Bill shape
- Skeletal features
- Use for total evidence tree

**Not ideal:**
- Numerical taxonomy: Doesn't give phylogeny
- Karyotype: Hard from museum skins

**Why molecular best:**
- Many characters (base pairs)
- Less subjective character coding (though not bias-free)
- Quantitative
- Can calibrate with fossils
- Museum DNA extractable

**Methods (from Lab 4.2):**
- UPGMA only if rates are roughly clock-like
- Maximum Likelihood: widely used model-based method
- Bayesian inference: also model-based, more computationally demanding

**SUMMARY:**

| Scenario | Best Method | Why |
|----------|------------|-----|
| Wildlife product | DNA barcoding | No morphology |
| New beetle | Morphology + DNA | Complete specimen |
| Cryptic species | Integrative | Multiple lines |
| Phylogeny | Molecular + morph | Tree reconstruction |

**General rule:**
Use ALL available methods - integrative approach strongest!
</details>

In [None]:
import os
from ipywidgets import Output

# Save to /content on Colab, otherwise to the current working directory
OUT_DIR = '/content' if os.path.isdir('/content') else '.'
# Widget callbacks print here; in JupyterLab, bare prints from callbacks are not shown in the cell
export_output = Output()

def export_results():
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Export barcode database
    barcode_data = []
    for species, info in barcodes.items():
        barcode_data.append({
            'Species': species,
            'Common_Name': info['common_name'],
            'COI_Sequence': info['coi_sequence'],
            'Within_Species_Variation': info['variation']
        })
    df1 = pd.DataFrame(barcode_data)
    barcode_file = os.path.join(OUT_DIR, f"lab_1_3_barcodes_{timestamp}.csv")
    df1.to_csv(barcode_file, index=False)
    print(f"✓ Saved: {barcode_file}")
    
    # Export karyotype database
    karyo_data = []
    for species, info in karyotypes.items():
        karyo_data.append({
            'Species': species,
            'Diploid_Number': info['diploid'],
            'Autosome_Pairs': info['autosomes'],
            'Sex_Chromosomes': info['sex_chromosomes'],
            'Note': info['note']
        })
    df2 = pd.DataFrame(karyo_data)
    karyo_file = os.path.join(OUT_DIR, f"lab_1_3_karyotypes_{timestamp}.csv")
    df2.to_csv(karyo_file, index=False)
    print(f"✓ Saved: {karyo_file}")
    print(f"\nExported DNA barcodes and karyotype data")

def on_export_click(button):
    with export_output:
        export_results()

btn = Button(description='📥 Export', button_style='success', icon='download')
btn.on_click(on_export_click)
display(HTML("<h3>📤 Export</h3>"))
display(btn, export_output)

## Summary

### Key Concepts

✅ **DNA Barcoding:** COI gene for animal species identification  
✅ **Barcoding Gap:** typically <2% within species, >2% between species (rule of thumb)  
✅ **Karyotypes:** Chromosome number often diagnostic  
✅ **Numerical Taxonomy:** UPGMA clustering by similarity  
✅ **Integrative Taxonomy:** Multiple lines of evidence  

### DNA Barcoding

**Standard Genes:**
- Animals: COI (~650 bp)
- Plants: rbcL, matK
- Fungi: ITS

**Barcoding Gap (typical values):**
- Within species: <2% variation
- Between species: >2% variation
- Clear gap = reliable identification

**Process:**
1. Extract DNA
2. PCR amplify barcode
3. Sequence
4. Compare to database (BOLD)
5. Match = identified!

**Applications:**
- Wildlife forensics
- Cryptic species
- Damaged specimens
- Larvae/juveniles
- Processed products

### Karyotypes

**Chromosome Data:**
- Number (2n)
- Size and shape
- Banding patterns

**Taxonomic Value:**
- Species-specific
- Detects cryptic species
- Tracks polyploidy
- Reveals fusions/fissions

**Examples:**
- Human: 2n=46 (fusion event)
- Chimp: 2n=48 (ancestral)
- Dog: 2n=78 (many small)
- Fly: 2n=8 (very few)

**Reproductive Barriers:**
- Different chromosome numbers
- Hybrid sterility/infertility
- Meiotic problems
- → Speciation

### Numerical Taxonomy

**Phenetics:**
- Group by overall similarity
- No evolutionary assumptions
- Many characters (50-100+)

**UPGMA Method:**
1. Score characters
2. Calculate distances
3. Cluster by similarity
4. Generate phenogram

**Phenogram vs Phylogeny:**
- Phenogram: Similarity-based
- Phylogeny: Ancestry-based
- Convergence misleads phenetics
- Cladistics better for evolution

**But useful for:**
- Initial grouping
- Morphological analysis
- Practical classification

### Integrative Taxonomy

**Multiple Evidence Types:**

**1. Morphology**
- Traditional characters
- Morphometrics
- Detailed measurements

**2. Molecular**
- DNA barcoding
- Multiple genes
- Phylogenetics

**3. Chromosomal**
- Karyotypes
- Chromosome number
- Banding patterns

**4. Ecological**
- Habitat preferences
- Niche differences
- Temporal separation

**5. Behavioral**
- Mating calls
- Courtship displays
- Activity patterns

**6. Reproductive**
- Breeding tests
- Hybrid fertility
- Gamete compatibility

**Decision Rule:**
- 1 line of evidence: Weak
- 2 lines: Moderate
- 3+ lines: Strong
- Consensus: Very strong!
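The first three tiers of this rule amount to counting supporting lines. A minimal sketch; the helper name and the True/False encoding are assumptions, the "consensus" tier is left out because it is a judgment about agreement rather than a count, and the count only means something if the lines are genuinely independent:

```python
def evidence_strength(conclusions):
    """Grade support for a species split by counting supporting lines of evidence.

    conclusions: dict mapping evidence type -> True if it supports separate species.
    Tiers follow the decision rule: 1 weak, 2 moderate, 3+ strong.
    """
    n_support = sum(conclusions.values())
    if n_support >= 3:
        return "strong"
    if n_support == 2:
        return "moderate"
    if n_support == 1:
        return "weak"
    return "no support"


# Evidence from the Part 6 case study
case = {"Morphology": False, "DNA barcoding": True, "Karyotype": True,
        "Ecology": True, "Breeding tests": True}
print(evidence_strength(case))   # strong (4 of 5 lines support a split)
```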

### Cryptic Species

**Definition:** Morphologically identical but genetically distinct

**Detection:**
- DNA barcoding gap
- Karyotype differences
- Ecological separation
- Reproductive isolation

**Importance:**
- Underestimated biodiversity
- Conservation implications
- Disease vector identification
- Understanding speciation

### Modern vs Traditional

**Traditional Strengths:**
- No equipment needed
- Works on fossils
- Rich literature
- Field-based

**Modern Strengths:**
- Detects cryptic species
- Less subjective
- Quantitative
- Works on fragments

**Best Approach:**
**Use both! Integrative taxonomy strongest.**

### Real-World Applications

**Wildlife Forensics:**
- Illegal trade detection
- Bushmeat identification
- Timber species

**Conservation:**
- Cryptic species discovery
- Population genetics
- Management units

**Medicine:**
- Disease vector ID
- Parasite taxonomy
- Drug source verification

**Agriculture:**
- Pest identification
- Crop verification
- GMO detection

### The Future

**Emerging Methods:**
- Whole genome sequencing
- Environmental DNA (eDNA)
- Metabarcoding (bulk samples)
- Machine learning on images
- Portable sequencers (MinION)

**Challenges:**
- Database completeness
- Computational power
- Data integration
- Cost and access

**Opportunities:**
- Citizen science
- Rapid assessment
- Global biodiversity
- Real-time identification

### 🎊 UNIT 1 COMPLETE!

**You've mastered:**
- Taxonomic keys (Lab 1.1)
- ICZN nomenclature (Lab 1.2)
- Modern taxonomy methods (Lab 1.3)

### 🎉 ENTIRE PROJECT COMPLETE!

**All 12 Labs Finished:**
- ✅ Unit 1: Taxonomy & Biosystematics (3 labs)
- ✅ Unit 2: Evolution & Extinction (3 labs)
- ✅ Unit 3: Population Genetics (4 labs)
- ✅ Unit 4: Speciation & Human Evolution (3 labs)

**Total:** 12 interactive labs, ~12-17 hours of student engagement

**Congratulations - you've completed the entire Biosystematics & Evolution interactive lab series!** 🎊🎉🏆