# THE PATTERN HUNTER'S LAB
# Statistical Pattern Detector
# Interactive Lab 2.2: Signal vs. Noise in Anatomical Data

---

## Companion to: Chapter 2, Section 2.3 - Statistical Pattern Recognition: Signal vs. Noise

### Learning Goals:
- Distinguish genuine biological patterns from random variation
- Identify and avoid common statistical errors (confirmation bias, sample size neglect)
- Calculate and interpret correlation vs. causation
- Determine appropriate sample sizes for reliable conclusions
- Apply t-tests and effect size calculations to anatomical data

### Time Required: 40 minutes

## SETUP: Install and Import Libraries

In [None]:
!pip install -q plotly kaleido ipywidgets matplotlib seaborn numpy pandas scipy

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import seaborn as sns
from scipy import stats
from IPython.display import display, HTML, Markdown
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed
import warnings

warnings.filterwarnings('ignore')
sns.set_style("whitegrid")
np.random.seed(42)  # For reproducibility

print("Libraries loaded successfully!")
print("Ready to detect statistical patterns!")

## PART 1: THE PATTERN HUNTER'S STATISTICAL BRIEFING

From Chapter 2.3:
> "One of the most crucial skills in comparative anatomy is separating genuine 
> biological patterns from random coincidences. Our brains naturally seek patterns, 
> but sometimes detect them where none exist."

### Common Pattern Recognition Errors:

**1. CONFIRMATION BIAS**
- Noticing only evidence that supports existing beliefs
- Ignoring contradictory data
- Example: Seeing only large-beaked birds in dry habitats

**2. SAMPLE SIZE NEGLECT**
- Drawing conclusions from insufficient observations
- Example: "All urban birds are fearless" after observing 3 birds

**3. POST HOC REASONING**
- Assuming that because one event follows another, the first caused the second
- Example: Concluding that bird migration causes the weather changes that follow it

**4. CLUSTERING ILLUSION**
- Perceiving patterns in random distributions
- Example: Random spacing interpreted as territoriality
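
One way to guard against the clustering illusion is to compare an observed layout with a null model of pure chance. The sketch below is only an illustration: the transect length, the number of nests, the `observed` positions, and the helper `gap_cv` are all hypothetical and do not come from Chapter 2.3. It asks how often randomly placed nests look at least as evenly spaced as the observed ones.

```python
import numpy as np

rng = np.random.default_rng(1)

def gap_cv(positions):
    """Coefficient of variation of the gaps between neighbouring positions."""
    gaps = np.diff(np.sort(positions))
    return gaps.std(ddof=1) / gaps.mean()

# Null model: 20 nests placed uniformly at random along a 1000 m transect
null_cvs = np.array([gap_cv(rng.uniform(0, 1000, size=20)) for _ in range(5000)])

# Hypothetical field observation (positions in metres)
observed = np.array([40, 95, 160, 190, 260, 330, 350, 420, 505, 540,
                     600, 655, 720, 760, 815, 880, 905, 950, 975, 990])
obs_cv = gap_cv(observed)

# Share of random layouts whose spacing is at least as even as the observed one
p_even = np.mean(null_cvs <= obs_cv)

print(f"Observed gap CV: {obs_cv:.2f}")
print(f"Null gap CV, middle 95%: {np.percentile(null_cvs, 2.5):.2f} "
      f"to {np.percentile(null_cvs, 97.5):.2f}")
print(f"Share of random layouts at least as even: {p_even:.3f}")
```

A low gap CV means even spacing. Territoriality is only worth proposing if the observed value falls well outside what the random layouts produce.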

### Essential Statistical Concepts:

**Population vs. Sample**
- Sample size affects reliability
- Random sampling reduces bias

**Variability & Central Tendency**
- Mean, median, mode
- Standard deviation quantifies spread

**Correlation vs. Causation**
- Variables may change together without causation
- Third variables often explain correlations

**Statistical Significance**
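- p < 0.05 is the conventional threshold: if no real difference existed, a result at least this extreme would occur less than 5% of the time
- Effect size matters too: a significant result can still be too small to matter biologically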
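
To make the 0.05 threshold concrete, the sketch below repeatedly draws two samples from the same population and counts how often a t-test still reports p < 0.05. The population values (mean 11.0 mm, SD 0.8 mm), the sample size, and the helper name `false_positive_rate` are illustrative assumptions, not figures from Chapter 2.3.

```python
import numpy as np
from scipy import stats

def false_positive_rate(n_per_group=30, n_trials=2000, alpha=0.05, seed=0):
    """Fraction of t-tests with p < alpha when both groups share one population."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        # Both groups come from the same distribution, so the null hypothesis is true
        a = rng.normal(loc=11.0, scale=0.8, size=n_per_group)
        b = rng.normal(loc=11.0, scale=0.8, size=n_per_group)
        _, p = stats.ttest_ind(a, b)
        if p < alpha:
            hits += 1
    return hits / n_trials

rate = false_positive_rate()
print(f"False-positive rate with no real difference: {rate:.3f}")
```

The rate should land near 0.05. The threshold fixes how often pure noise is mistaken for signal; it says nothing about how large or important a detected difference is.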

## PART 2: URBAN VS. RURAL SPARROW ANALYSIS (from Chapter 2.3)

In [None]:
# Recreate Exercise 2.1 from Chapter 2.3

display(Markdown("### Exercise 2.1: Urban vs. Rural House Sparrow Beak Analysis"))
display(Markdown("**Research Question**: Do urban house sparrows have different beak dimensions than rural populations?"))

# Generate realistic data based on textbook example
# Urban: mean 11.85 mm, Rural: mean 10.93 mm

# Urban sparrows (n=30)
np.random.seed(42)
urban_beaks = np.random.normal(loc=11.85, scale=0.8, size=30)

# Rural sparrows (n=30)
rural_beaks = np.random.normal(loc=10.93, scale=0.75, size=30)

# Create DataFrame
sparrow_data = pd.DataFrame({
    'Population': ['Urban']*30 + ['Rural']*30,
    'Beak_Length_mm': np.concatenate([urban_beaks, rural_beaks])
})

print("="*70)
print("SPARROW BEAK DATA SUMMARY")
print("="*70)
print("\nSample Sizes:")
print(f"  Urban population: {len(urban_beaks)} individuals")
print(f"  Rural population: {len(rural_beaks)} individuals")

# Descriptive statistics
urban_mean = urban_beaks.mean()
rural_mean = rural_beaks.mean()
urban_std = urban_beaks.std(ddof=1)  # Sample standard deviation (divides by n - 1)
rural_std = rural_beaks.std(ddof=1)

print("\nDescriptive Statistics:")
print("-" * 70)
print(f"{'Statistic':<20} {'Urban':<15} {'Rural':<15}")
print("-" * 70)
print(f"{'Mean (mm)':<20} {urban_mean:<15.2f} {rural_mean:<15.2f}")
print(f"{'Std Dev (mm)':<20} {urban_std:<15.2f} {rural_std:<15.2f}")
print(f"{'Min (mm)':<20} {urban_beaks.min():<15.2f} {rural_beaks.min():<15.2f}")
print(f"{'Max (mm)':<20} {urban_beaks.max():<15.2f} {rural_beaks.max():<15.2f}")

# Calculate difference (as in textbook)
difference = urban_mean - rural_mean
percent_diff = (difference / rural_mean) * 100

print("\n" + "="*70)
print("PRELIMINARY ANALYSIS (from Chapter 2.3)")
print("="*70)
print(f"Difference in means: {urban_mean:.2f} - {rural_mean:.2f} = {difference:.2f} mm")
print(f"Percentage difference: ({difference:.2f}/{rural_mean:.2f}) × 100 = {percent_diff:.1f}%")
print(f"\nEffect size: {percent_diff:.1f}% difference")

if percent_diff > 5:
    print("✓ Large enough to be potentially biologically meaningful")
else:
    print("⚠ Small difference - biological significance unclear")

print("\nNext step: t-test to determine statistical significance")
print("="*70)

# Display sample data
display(Markdown("\n### Sample Data Preview"))
display(sparrow_data.head(10))
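
A standard deviation computed with NumPy depends on the divisor: `std()` divides by n by default (`ddof=0`), while the sample standard deviation usually reported for field measurements divides by n - 1 (`ddof=1`). The short sketch below shows the gap on a small set of hypothetical beak lengths; the five values are made up for illustration.

```python
import numpy as np

beaks = np.array([11.2, 12.4, 11.9, 10.8, 12.1])  # hypothetical beak lengths (mm)

pop_sd = beaks.std()            # ddof=0: divides by n
sample_sd = beaks.std(ddof=1)   # ddof=1: divides by n - 1

print(f"Population formula (ddof=0): {pop_sd:.3f} mm")
print(f"Sample formula (ddof=1):     {sample_sd:.3f} mm")
print(f"Ratio: {sample_sd / pop_sd:.3f}")
```

The ratio equals sqrt(n / (n - 1)), so the choice matters most for small samples and becomes negligible as n grows.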

## PART 3: STATISTICAL SIGNIFICANCE TESTING

In [None]:
def perform_ttest(group1, group2, group1_name="Group 1", group2_name="Group 2",
                  alpha=0.05):
    """
    Perform independent samples t-test with detailed interpretation
    """
    
    # Calculate t-test (Student's version, which assumes equal variances)
    t_stat, p_value = stats.ttest_ind(group1, group2)
    
    # Calculate effect size (Cohen's d) from the pooled sample variance (ddof=1)
    n1, n2 = len(group1), len(group2)
    pooled_std = np.sqrt(((n1 - 1)*np.var(group1, ddof=1) + (n2 - 1)*np.var(group2, ddof=1)) /
                          (n1 + n2 - 2))
    cohens_d = (np.mean(group1) - np.mean(group2)) / pooled_std
    
    print("="*70)
    print("STATISTICAL SIGNIFICANCE TEST: Independent Samples t-test")
    print("="*70)
    
    print(f"\nHYPOTHESES:")
    print("-" * 70)
    print(f"  H₀ (Null): No difference between {group1_name} and {group2_name}")
    print(f"  H₁ (Alternative): Significant difference exists")
    print(f"  Significance level: α = {alpha}")
    
    print(f"\nTEST RESULTS:")
    print("-" * 70)
    print(f"  t-statistic: {t_stat:.4f}")
    print(f"  p-value: {p_value:.4f}")
    print(f"  Degrees of freedom: {len(group1) + len(group2) - 2}")
    
    print(f"\nEFFECT SIZE:")
    print("-" * 70)
    print(f"  Cohen's d: {cohens_d:.4f}")
    
    # Conventional Cohen's d benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large
    if abs(cohens_d) < 0.2:
        effect_interpretation = "Negligible effect"
    elif abs(cohens_d) < 0.5:
        effect_interpretation = "Small effect"
    elif abs(cohens_d) < 0.8:
        effect_interpretation = "Medium effect"
    else:
        effect_interpretation = "Large effect"
    
    print(f"  Interpretation: {effect_interpretation}")
    
    print(f"\nDECISION:")
    print("=" * 70)
    
    if p_value < alpha:
        print(f"  ✓ REJECT null hypothesis (p = {p_value:.4f} < {alpha})")
        print(f"  ✓ Statistically significant difference detected")
        print(f"  ✓ If there were no true difference, a gap at least this large would appear in only {p_value*100:.2f}% of samples")
    else:
        print(f"  ✗ FAIL TO REJECT null hypothesis (p = {p_value:.4f} ≥ {alpha})")
        print(f"  ✗ No statistically significant difference")
        print(f"  ✗ If there were no true difference, a gap at least this large would appear in {p_value*100:.1f}% of samples")
    
    print(f"\nBIOLOGICAL INTERPRETATION:")
    print("-" * 70)
    
    if p_value < alpha and abs(cohens_d) > 0.5:
        print(f"  Statistical significance: YES")
        print(f"  Effect size: {effect_interpretation}")
        print(f"  Biological importance: LIKELY")
        print(f"\n  The difference between {group1_name} and {group2_name} is both")
        print(f"  statistically significant AND large enough to be biologically meaningful.")
        print(f"\n  Next steps: Investigate mechanisms (adaptation, plasticity, genetics)")
        
    elif p_value < alpha and abs(cohens_d) <= 0.5:
        print(f"  Statistical significance: YES")
        print(f"  Effect size: {effect_interpretation}")
        print(f"  Biological importance: UNCERTAIN")
        print(f"\n  While statistically significant, the small effect size suggests")
        print(f"  the biological importance may be limited. Consider practical significance.")
        
    else:
        print(f"  Statistical significance: NO")
        print(f"  Biological importance: UNLIKELY (but not ruled out)")
        print(f"\n  No evidence of difference. Could be due to:")
        print(f"    • True absence of difference")
        print(f"    • Insufficient sample size (Type II error)")
        print(f"    • High variability masking real differences")
    
    print("\n" + "="*70)
    
    return {
        't_statistic': t_stat,
        'p_value': p_value,
        'cohens_d': cohens_d,
        'significant': p_value < alpha
    }

# Test urban vs rural sparrows
results = perform_ttest(urban_beaks, rural_beaks, "Urban Sparrows", "Rural Sparrows")
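
`stats.ttest_ind` runs Student's t-test by default, which assumes both groups have equal variance. As a robustness check, the sketch below also runs Welch's version (`equal_var=False`) on freshly simulated data. The generator seed and the variable names `urban` and `rural` are assumptions for this example, so the numbers will not match the cell above exactly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
urban = rng.normal(loc=11.85, scale=0.80, size=30)  # simulated with the Part 2 parameters
rural = rng.normal(loc=10.93, scale=0.75, size=30)

# Student's t-test pools the two variances; Welch's test estimates them separately
t_student, p_student = stats.ttest_ind(urban, rural)
t_welch, p_welch = stats.ttest_ind(urban, rural, equal_var=False)

print(f"Student: t = {t_student:.3f}, p = {p_student:.2e}")
print(f"Welch:   t = {t_welch:.3f}, p = {p_welch:.2e}")
```

With equal group sizes the two t-statistics coincide and only the degrees of freedom differ. Welch's test becomes the safer choice when group sizes or variances are clearly unequal.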

## PART 4: VISUALIZING THE DIFFERENCE

In [None]:
# Create comprehensive visualization

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Distribution Comparison',
        'Box Plot with Individual Points',
        'Means with 95% Confidence Intervals',
        'Histogram Overlap'
    ),
    specs=[[{'type': 'violin'}, {'type': 'box'}],
           [{'type': 'scatter'}, {'type': 'histogram'}]]
)

# Plot 1: Violin plot
fig.add_trace(
    go.Violin(y=urban_beaks, name='Urban', fillcolor='red', opacity=0.6),
    row=1, col=1
)
fig.add_trace(
    go.Violin(y=rural_beaks, name='Rural', fillcolor='blue', opacity=0.6),
    row=1, col=1
)

# Plot 2: Box plot with points
fig.add_trace(
    go.Box(y=urban_beaks, name='Urban', marker_color='red', boxpoints='all', jitter=0.3),
    row=1, col=2
)
fig.add_trace(
    go.Box(y=rural_beaks, name='Rural', marker_color='blue', boxpoints='all', jitter=0.3),
    row=1, col=2
)

# Plot 3: Means with confidence intervals
# Calculate 95% CI
urban_ci = stats.t.interval(0.95, len(urban_beaks)-1, 
                             loc=urban_mean, 
                             scale=stats.sem(urban_beaks))
rural_ci = stats.t.interval(0.95, len(rural_beaks)-1,
                             loc=rural_mean,
                             scale=stats.sem(rural_beaks))

fig.add_trace(
    go.Scatter(
        x=['Urban', 'Rural'],
        y=[urban_mean, rural_mean],
        error_y=dict(
            type='data',
            array=[urban_mean - urban_ci[0], rural_mean - rural_ci[0]],
            visible=True
        ),
        mode='markers',
        marker=dict(size=15, color=['red', 'blue']),
        showlegend=False
    ),
    row=2, col=1
)

# Plot 4: Overlapping histograms
fig.add_trace(
    go.Histogram(x=urban_beaks, name='Urban', opacity=0.6, marker_color='red', nbinsx=15),
    row=2, col=2
)
fig.add_trace(
    go.Histogram(x=rural_beaks, name='Rural', opacity=0.6, marker_color='blue', nbinsx=15),
    row=2, col=2
)

# Update axes
fig.update_yaxes(title_text="Beak Length (mm)", row=1, col=1)
fig.update_yaxes(title_text="Beak Length (mm)", row=1, col=2)
fig.update_yaxes(title_text="Mean Beak Length (mm)", row=2, col=1)
fig.update_yaxes(title_text="Frequency", row=2, col=2)
fig.update_xaxes(title_text="Beak Length (mm)", row=2, col=2)

fig.update_layout(
    height=800,
    title_text="Urban vs. Rural House Sparrow Beak Analysis",
    showlegend=True,
    barmode='overlay'
)

fig.show()

print("\nVISUAL INTERPRETATION:")
print("• Violin plots show full distribution shape")
print("• Box plots reveal median, quartiles, and outliers")
print("• Error bars show 95% confidence intervals (non-overlapping intervals point to a significant difference; overlapping ones are inconclusive)")
print("• Histogram overlap shows degree of separation between populations")
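
Comparing two separate confidence intervals by eye is only a rough check, because intervals can overlap slightly even when the difference is significant. A more direct approach is a confidence interval for the difference in means itself. The sketch below uses the Welch formula; the helper `diff_ci`, the seed, and the simulated samples are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def diff_ci(a, b, confidence=0.95):
    """Welch confidence interval for mean(a) - mean(b)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, df)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se

rng = np.random.default_rng(7)
urban = rng.normal(11.85, 0.80, 30)  # simulated with the Part 2 parameters
rural = rng.normal(10.93, 0.75, 30)

low, high = diff_ci(urban, rural)
print(f"95% CI for urban - rural: [{low:.2f}, {high:.2f}] mm")
```

If the interval excludes zero, the difference is significant at the matching alpha level, and the interval's width shows how precisely the difference has been estimated.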

## PART 5: SAMPLE SIZE EFFECTS SIMULATOR

In [None]:
def sample_size_simulator(true_diff=0.92, std_dev=0.8, sample_size=5,
                          num_simulations=1000, alpha=0.05):
    """
    Demonstrate how sample size affects ability to detect real differences
    """
    
    significant_count = 0
    p_values = []
    
    # Run simulations
    for _ in range(num_simulations):
        # Generate samples
        group1 = np.random.normal(loc=11.85, scale=std_dev, size=sample_size)
        group2 = np.random.normal(loc=11.85-true_diff, scale=std_dev, size=sample_size)
        
        # Test
        _, p = stats.ttest_ind(group1, group2)
        p_values.append(p)
        
        if p < alpha:
            significant_count += 1
    
    power = significant_count / num_simulations
    
    print("="*70)
    print("SAMPLE SIZE EFFECT SIMULATION")
    print("="*70)
    print(f"\nSIMULATION PARAMETERS:")
    print("-" * 70)
    print(f"  True difference: {true_diff} mm")
    print(f"  Standard deviation: {std_dev} mm")
    print(f"  Sample size per group: {sample_size}")
    print(f"  Number of simulations: {num_simulations}")
    print(f"  Significance level: {alpha}")
    
    print(f"\nRESULTS:")
    print("=" * 70)
    print(f"  Significant results: {significant_count}/{num_simulations}")
    print(f"  Statistical Power: {power*100:.1f}%")
    print(f"  (Probability of detecting true difference with n={sample_size})")
    
    print(f"\nINTERPRETATION:")
    print("-" * 70)
    
    if power >= 0.80:
        print(f"  ✓ ADEQUATE power ({power*100:.0f}%)")
        print(f"  Sample size of {sample_size} is adequate to detect this difference")
    elif power >= 0.50:
        print(f"  ⚠ MODERATE power ({power*100:.0f}%)")
        print(f"  Sample size of {sample_size} will miss real difference {(1-power)*100:.0f}% of time")
        print(f"  Recommendation: Increase sample size")
    else:
        print(f"  ✗ POOR power ({power*100:.0f}%)")
        print(f"  Sample size of {sample_size} is INSUFFICIENT")
        print(f"  Will MISS real difference {(1-power)*100:.0f}% of time (Type II error)")
        print(f"  ⚠ Drawing conclusions from a sample this small is SAMPLE SIZE NEGLECT!")
    
    print("\n" + "="*70)
    
    return power

# Demonstrate with different sample sizes
print("DEMONSTRATION: Sample Size Neglect")
print("\nScenario: True difference exists (0.92 mm), but sample size varies\n")

sample_sizes = [5, 10, 20, 30, 50]
powers = []

for n in sample_sizes:
    power = sample_size_simulator(sample_size=n, num_simulations=1000)
    powers.append(power)
    print()

In [None]:
# Visualize power analysis

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=sample_sizes,
    y=powers,
    mode='lines+markers',
    marker=dict(size=12, color=powers, colorscale='RdYlGn', 
                showscale=True, cmin=0, cmax=1,
                colorbar=dict(title="Power")),
    line=dict(width=3),
    name='Statistical Power'
))

# Add reference line at 80% power
fig.add_hline(y=0.80, line_dash="dash", line_color="green",
              annotation_text="Adequate Power (80%)",
              annotation_position="right")

fig.update_layout(
    title="Statistical Power vs. Sample Size",
    xaxis_title="Sample Size per Group",
    yaxis_title="Statistical Power (Probability of Detecting True Difference)",
    height=500,
    yaxis_range=[0, 1]
)

fig.show()

print("\nKEY LESSON: SAMPLE SIZE MATTERS!")
print("="*70)
print("\nWith n=5: Only {}% power - will MISS real difference {}% of time!".format(
    int(powers[0]*100), int((1-powers[0])*100)))
print("With n=30: {}% power - adequate to detect real differences".format(
    int(powers[3]*100)))
print("\nThis is why 'concluding all urban birds are fearless after observing")
print("three bold individuals' is SAMPLE SIZE NEGLECT (from Chapter 2.3)!")
print("="*70)
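
Simulation is one way to estimate power; a quick analytic estimate is another. The sketch below uses the standard normal approximation for a two-sided, two-sample comparison, n ≈ 2((z₁₋α/₂ + z_power) / d)², where d is the difference divided by the standard deviation. The helper name `n_per_group` is hypothetical, and the approximation runs slightly low for very small samples, so treat the result as a starting point.

```python
import math
from scipy import stats

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate sample size per group (normal approximation, two-sided test)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_power = stats.norm.ppf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

# Effect size implied by the simulator defaults: 0.92 mm difference, SD 0.8 mm
d_sparrow = 0.92 / 0.8
print(f"d = {d_sparrow:.2f}: about {n_per_group(d_sparrow)} birds per group")

# A medium effect needs far more birds for the same power
print(f"d = 0.50: about {n_per_group(0.5)} birds per group")
```

Because the required n scales with 1/d², halving the effect size roughly quadruples the sample needed.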

## PART 6: CORRELATION VS. CAUSATION

In [None]:
def correlation_causation_demo():
    """
    Demonstrate spurious correlations and third variables
    """
    
    # Generate correlated but non-causal data
    np.random.seed(42)
    
    # Example 1: Ice cream sales and shark attacks (both driven by temperature/season)
    months = np.arange(12)
    temperature = 15 + 15*np.sin(2*np.pi*months/12)  # Seasonal variation
    
    ice_cream_sales = 100 + 50*np.sin(2*np.pi*months/12) + np.random.normal(0, 5, 12)
    shark_attacks = 5 + 8*np.sin(2*np.pi*months/12) + np.random.normal(0, 1, 12)
    
    # Calculate correlation
    corr, p_value = stats.pearsonr(ice_cream_sales, shark_attacks)
    
    print("="*70)
    print("CORRELATION VS. CAUSATION: Classic Examples")
    print("="*70)
    
    print("\nEXAMPLE 1: Ice Cream Sales vs. Shark Attacks")
    print("-" * 70)
    print(f"  Correlation coefficient: r = {corr:.3f}")
    print(f"  p-value: {p_value:.4f}")
    print(f"  Statistical significance: {'YES' if p_value < 0.05 else 'NO'}")
    
    print(f"\n  ✗ DOES NOT MEAN: Ice cream causes shark attacks!")
    print(f"  ✓ ACTUALLY MEANS: Both vary with temperature/season")
    print(f"  ✓ Third variable: Warmer weather increases both")
    
    # Anatomical example
    print("\n" + "="*70)
    print("ANATOMICAL EXAMPLE: Body Size and Migration Distance")
    print("="*70)
    
    # Simulate data
    metabolic_rate = np.random.uniform(50, 500, 30)  # Hidden third variable
    body_size = 10 + 0.5*metabolic_rate + np.random.normal(0, 10, 30)
    migration_distance = 100 + 3*metabolic_rate + np.random.normal(0, 50, 30)
    
    corr_bs_md, p_bs_md = stats.pearsonr(body_size, migration_distance)
    
    print(f"\n  Observed correlation: Body size ↔ Migration distance")
    print(f"  r = {corr_bs_md:.3f}, p = {p_bs_md:.4f}")
    
    print(f"\n  ⚠ TEMPTING CONCLUSION: 'Larger body size causes longer migrations'")
    print(f"  ✓ BETTER INTERPRETATION: Both driven by metabolic capacity")
    print(f"  ✓ Third variable: Higher metabolic rate enables:")
    print(f"      • Larger body size (more energy available)")
    print(f"      • Longer migrations (flight endurance)")
    
    print("\n" + "="*70)
    print("KEY PRINCIPLE: Correlation ≠ Causation")
    print("="*70)
    print("\n  Always consider:")
    print("    1. Could a third variable explain both?")
    print("    2. Is the relationship experimentally testable?")
    print("    3. What's the proposed mechanism?")
    print("    4. Are there counterexamples?")
    print("\n" + "="*70)
    
    return {
        'ice_cream_sales': ice_cream_sales,
        'shark_attacks': shark_attacks,
        'temperature': temperature,
        'body_size': body_size,
        'migration_distance': migration_distance,
        'metabolic_rate': metabolic_rate
    }

demo_data = correlation_causation_demo()
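
When a suspected third variable has actually been measured, its influence can be removed and the remaining association checked. The sketch below computes a partial correlation by correlating regression residuals. It regenerates data with the same structure as the demo but uses 200 simulated birds rather than 30 so the estimate is stable; the seed, sample size, and helper `residuals` are assumptions for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
metabolic_rate = rng.uniform(50, 500, 200)  # the third variable
body_size = 10 + 0.5 * metabolic_rate + rng.normal(0, 10, 200)
migration = 100 + 3 * metabolic_rate + rng.normal(0, 50, 200)

def residuals(y, x):
    """Residuals of y after removing its linear dependence on x."""
    fit = stats.linregress(x, y)
    return y - (fit.intercept + fit.slope * x)

# Raw correlation between body size and migration distance
r_raw, _ = stats.pearsonr(body_size, migration)

# Partial correlation: correlate what is left after accounting for metabolic rate
r_partial, _ = stats.pearsonr(residuals(body_size, metabolic_rate),
                              residuals(migration, metabolic_rate))

print(f"Raw correlation:     r = {r_raw:.3f}")
print(f"Partial correlation: r = {r_partial:.3f} (controlling for metabolic rate)")
```

The raw correlation is strong, but it largely disappears once metabolic rate is held constant, which is the signature of a shared driver rather than a direct causal link.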

## PART 7: COGNITIVE BIAS DETECTOR

In [None]:
def bias_detector_quiz():
    """
    Interactive quiz to identify cognitive biases from Chapter 2.3
    """
    
    scenarios = [
        {
            'scenario': "You observe 4 pigeons in an urban park, all of which approach humans for food. You conclude 'Urban pigeons have lost their fear of humans.'",
            'bias': 'Sample Size Neglect',
            'explanation': 'Drawing broad conclusions from only 4 individuals. Sample size is insufficient. Could have observed only the boldest individuals by chance.'
        },
        {
            'scenario': "You notice that birds with bright plumage are often seen near flowers. You conclude 'Bright plumage causes birds to seek flowers.'",
            'bias': 'Post Hoc Reasoning / Correlation-Causation Error',
            'explanation': 'Confusing correlation with causation. Third variable (nectar feeding ecology) likely drives both bright coloration AND flower-seeking behavior.'
        },
        {
            'scenario': "You're studying desert lizards and hypothesize they have large heads. While collecting data, you unconsciously notice and measure large-headed individuals more often.",
            'bias': 'Confirmation Bias',
            'explanation': 'Selectively noticing evidence that confirms pre-existing hypothesis while overlooking contradictory data (small-headed lizards).'
        },
        {
            'scenario': "You observe that in a small woodland area, three tree cavities each have one owl. You conclude 'Owls maintain strict territorial spacing.'",
            'bias': 'Clustering Illusion',
            'explanation': 'Perceiving pattern in what could be random distribution. With only 3 observations and limited space, even spacing could occur by chance.'
        }
    ]
    
    print("="*70)
    print("COGNITIVE BIAS DETECTOR: Can You Spot the Errors?")
    print("="*70)
    print("\nFrom Chapter 2.3 - Common Pattern Recognition Errors:")
    print("  1. Confirmation Bias")
    print("  2. Sample Size Neglect")
    print("  3. Post Hoc Reasoning")
    print("  4. Clustering Illusion")
    print("\n" + "="*70)
    
    for i, item in enumerate(scenarios, 1):
        print(f"\nSCENARIO {i}:")
        print("-" * 70)
        print(item['scenario'])
        print(f"\nANSWER: {item['bias']}")
        print(f"\nEXPLANATION:")
        print(f"  {item['explanation']}")
        print("\n" + "="*70)
    
    print("\nREMEMBER:")
    print("  • Our brains naturally seek patterns (evolutionary advantage)")
    print("  • But this can lead to seeing patterns that don't exist")
    print("  • Statistical thinking helps distinguish signal from noise")
    print("  • Always check conclusions against sample size, alternative explanations, and third variables")
    print("="*70)

bias_detector_quiz()

## PART 8: EXPORT AND SUMMARY

In [None]:
# Export sparrow data and analysis
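try:
    from google.colab import files  # Available only inside Google Colab
except ImportError:
    class files:
        """Fallback outside Colab: exported files stay in the working directory."""
        @staticmethod
        def download(path):
            print(f"Not running in Colab; {path} saved in the current directory")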
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

print("="*70)
print("EXPORTING STATISTICAL ANALYSIS")
print("="*70)

# Export sparrow data
csv_filename = f'sparrow_beak_data_{timestamp}.csv'
sparrow_data.to_csv(csv_filename, index=False)
print(f"\n✓ Exported: {csv_filename}")

# Export summary
summary_filename = f'statistical_analysis_summary_{timestamp}.txt'
with open(summary_filename, 'w', encoding='utf-8') as f:  # UTF-8 so symbols in the report write on any platform
    f.write("="*70 + "\n")
    f.write("STATISTICAL PATTERN RECOGNITION - ANALYSIS SUMMARY\n")
    f.write("="*70 + "\n\n")
    
    f.write("URBAN VS. RURAL SPARROW BEAK ANALYSIS\n")
    f.write("-" * 70 + "\n")
    f.write(f"Urban mean: {urban_mean:.2f} mm (n=30)\n")
    f.write(f"Rural mean: {rural_mean:.2f} mm (n=30)\n")
    f.write(f"Difference: {difference:.2f} mm ({percent_diff:.1f}%)\n")
    f.write(f"\nt-statistic: {results['t_statistic']:.4f}\n")
    f.write(f"p-value: {results['p_value']:.4f}\n")
    f.write(f"Cohen's d: {results['cohens_d']:.4f}\n")
    f.write(f"Significant: {'YES' if results['significant'] else 'NO'}\n")
    
    f.write("\n" + "="*70 + "\n")
    f.write("FOUR COGNITIVE BIASES TO AVOID\n")
    f.write("="*70 + "\n")
    f.write("\n1. CONFIRMATION BIAS\n")
    f.write("   Notice only evidence supporting beliefs\n")
    f.write("   Solution: Actively seek contradictory data\n")
    
    f.write("\n2. SAMPLE SIZE NEGLECT\n")
    f.write("   Conclusions from insufficient observations\n")
    f.write("   Solution: Calculate statistical power, increase n\n")
    
    f.write("\n3. POST HOC REASONING\n")
    f.write("   Temporal sequence ≠ causation\n")
    f.write("   Solution: Look for mechanisms, third variables\n")
    
    f.write("\n4. CLUSTERING ILLUSION\n")
    f.write("   Patterns in random distributions\n")
    f.write("   Solution: Statistical tests, null models\n")
    
    f.write("\n" + "="*70 + "\n")
    f.write("KEY PRINCIPLES\n")
    f.write("="*70 + "\n")
    f.write("• Correlation ≠ Causation\n")
    f.write("• Sample size matters (power analysis)\n")
    f.write("• Report effect size alongside statistical significance\n")
    f.write("• Always consider alternative explanations\n")
    f.write("• p < 0.05 is conventional, not absolute\n")
    
    f.write("\n" + "="*70 + "\n")
    f.write("END OF REPORT\n")
    f.write("="*70 + "\n")

print(f"✓ Exported: {summary_filename}")

files.download(csv_filename)
files.download(summary_filename)

print("\n✓ Export complete!")
print("="*70)

---

## CONGRATULATIONS, PATTERN HUNTER!

You have mastered:
- ✅ Statistical significance testing (t-tests, p-values)
- ✅ Effect size calculation and interpretation (Cohen's d)
- ✅ Sample size effects and statistical power
- ✅ Correlation vs. causation distinction
- ✅ Four cognitive biases (confirmation, sample size, post hoc, clustering)

### Pattern Hunter Skills Earned:
- **Statistical Literacy**: Interpret p-values and effect sizes correctly
- **Critical Thinking**: Identify cognitive biases in reasoning
- **Experimental Design**: Calculate appropriate sample sizes
- **Causal Inference**: Distinguish correlation from causation

---

### Key Takeaways:

**From Urban Sparrow Analysis:**
- 8.4% beak difference between populations
- Statistically significant (p < 0.05)
- Large effect size (biologically meaningful)
- Requires mechanistic follow-up

**Sample Size Matters:**
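- n=5: Power falls well short of the 80% target, so most real differences of this size go undetected
- n=30: Power is comfortably above the 80% target for this effect (0.92 mm difference, SD 0.8 mm)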
- This is why "3 bold pigeons" doesn't prove anything!

**Always Remember:**
- p < 0.05 doesn't mean "true"; it means the data would be unusual if no real difference existed
- Report effect size alongside statistical significance
- Correlation ≠ Causation (look for third variables!)
- Our brains see patterns even in random data

---

### Connect to Chapter 2:
- Return to **Section 2.3** for theoretical context
- Apply to your own observations from **Lab 2.1**
- Proceed to **Section 2.4** (Functional Analysis)

---

**The Statistical Code**: Signal rises above noise only with proper tools and thinking.

*Happy Pattern Hunting!* 🔍📊