# 🧪 Hypothesis Testing
## Making Confident Conclusions from Data

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/The-Pattern-Hunter/interactive-ecology-biometry/blob/main/unit-4-biometry/notebooks/05_hypothesis_testing.ipynb)

---

> *"Is this pattern REAL, or just random chance?"*

### 🎯 Learning Objectives

By the end of this notebook, you will:
1. Understand the **logic of hypothesis testing**
2. Distinguish **null vs. alternative hypotheses**
3. Perform and interpret a **chi-square (χ²) test**
4. Perform and interpret a **t-test**
5. Understand **p-values** in a biological context
6. Apply these tests to **ecological examples**

---

## 🩺 The Stethoscope Analogy: Final Step!

### Step 7: Making the Diagnosis

**Doctor's Process:**
1. Listen with stethoscope (collect data)
2. Observe patterns (analyze)
3. **Make diagnosis** (hypothesis test)
4. How strong is the evidence? (p-value)

**Ecologist's Process:**
1. Collect data (sampling)
2. Analyze patterns (distributions, statistics)
3. **Test hypothesis** (statistical test)
4. How strong is the evidence? (p-value)

### The Question:
**Is what I'm seeing a REAL pattern, or just random noise?**

In [None]:
# Setup
!pip install numpy scipy plotly pandas -q

import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats
import pandas as pd

np.random.seed(42)

print("✅ Ready to test hypotheses!")
print("🧪 Let's make confident conclusions!")

---

## 📊 Part 1: The Logic of Hypothesis Testing

### The Core Concept:

**We assume there's NO effect (Null Hypothesis), then look for evidence to reject that assumption.**

It's like a court trial:
- **Null Hypothesis (H₀)**: Defendant is innocent (no effect, no difference)
- **Alternative Hypothesis (H₁)**: Defendant is guilty (there IS an effect)
- **Evidence**: Your data
- **Verdict**: Reject H₀ or fail to reject H₀
- **Strength of evidence**: p-value (the smaller p, the stronger the evidence against H₀)

### Key Terms:

| Term | Meaning |
|------|----------|
| **H₀ (Null)** | No effect, no difference, status quo |
| **H₁ (Alternative)** | There IS an effect or difference |
| **p-value** | Probability of seeing data at least this extreme if H₀ is true |
| **α (alpha)** | Significance level, chosen before the test (usually 0.05 = 5%) |
| **Reject H₀** | p < α: evidence is strong enough to conclude the effect is likely real |
| **Fail to reject H₀** | p ≥ α: not enough evidence (doesn't prove H₀ true!) |
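
The α row can be made concrete with a small simulation. This sketch assumes two groups drawn from the *same* normal population (mean 50, SD 10, n = 30 each, values chosen to echo Part 4), so H₀ is true by construction; it then counts how often a t-test at α = 0.05 rejects anyway. The seed, the number of repetitions and the group parameters are illustrative choices, not part of the original notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)   # illustrative seed
n_experiments = 4000             # number of simulated studies (illustrative)
alpha = 0.05

# Both groups come from the SAME population, so H0 (equal means) is true
group_a = rng.normal(50, 10, size=(n_experiments, 30))
group_b = rng.normal(50, 10, size=(n_experiments, 30))

# One independent t-test per row: axis=1 tests each simulated study separately
_, p_values = stats.ttest_ind(group_a, group_b, axis=1)

false_alarm_rate = np.mean(p_values < alpha)
print(f"Rejected a TRUE H0 in {false_alarm_rate:.1%} of {n_experiments} studies")
```

The printed rate should land near 5%: α is the long-run share of studies in which a true H₀ is rejected by chance alone, which is why "reject H₀" never means "proven".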

---

## 🔬 Part 2: Chi-Square Test (χ²)

### When to Use:
**Comparing OBSERVED frequencies to EXPECTED frequencies (categorical data)**

### Ecological Examples:
- Do seed germination rates follow Mendelian ratios?
- Are animals distributed randomly across habitats?
- Do flower colors match expected genetic ratios?

### Formula:
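$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected counts in category $i$ of $k$ categories; the test has $k - 1$ degrees of freedom.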
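
As a cross-check on the formula, here is a minimal sketch that codes the sum directly and compares it with SciPy's built-in goodness-of-fit function `scipy.stats.chisquare`, using the seed counts from Example 1 below (68 yellow, 32 green, expected 3:1 out of 100). The helper name `chi_square_gof` is our own, not a library function.

```python
import numpy as np
from scipy import stats

def chi_square_gof(observed, expected):
    """Hypothetical helper: Pearson chi-square, sum of (O - E)^2 / E, with df = k - 1."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    statistic = np.sum((observed - expected) ** 2 / expected)
    df = len(observed) - 1
    p = stats.chi2.sf(statistic, df)  # upper-tail area of the chi-square distribution
    return statistic, p

observed = [68, 32]   # yellow, green
expected = [75, 25]   # 3:1 split of 100 seeds

manual_stat, manual_p = chi_square_gof(observed, expected)
scipy_result = stats.chisquare(f_obs=observed, f_exp=expected)

print(f"By hand: chi2 = {manual_stat:.3f}, p = {manual_p:.4f}")
print(f"SciPy:   chi2 = {scipy_result.statistic:.3f}, p = {scipy_result.pvalue:.4f}")
```

Both lines should agree. A common rule of thumb is that every expected count should be at least about 5 for the χ² approximation to be trustworthy.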

---

### Example 1: Mendelian Genetics (Seed Color)

**Scenario**: Cross two heterozygous pea plants (Yy × Yy)

**Expected ratio**: 3 Yellow : 1 Green (3:1)

**You plant 100 seeds and get**: 68 Yellow, 32 Green

**Question**: Does this match the expected 3:1 ratio?

In [None]:
# Seed color chi-square test
observed = np.array([68, 32])  # Yellow, Green
expected_ratio = np.array([3, 1])
total = np.sum(observed)
expected = (expected_ratio / np.sum(expected_ratio)) * total

print("üå± Seed Color Inheritance Test\n")
print("H‚ÇÄ: Observed follows 3:1 ratio (Mendelian inheritance)")
print("H‚ÇÅ: Observed does NOT follow 3:1 ratio\n")

print("Color  | Observed | Expected | (O-E)¬≤ / E")
print("-------|----------|----------|------------")

chi_square_terms = []
for i, color in enumerate(['Yellow', 'Green']):
    O = observed[i]
    E = expected[i]
    term = (O - E)**2 / E
    chi_square_terms.append(term)
    print(f"{color:6} |   {O:3}    |  {E:5.1f}   |   {term:.3f}")

chi_square = np.sum(chi_square_terms)
df = len(observed) - 1  # degrees of freedom
p_value = 1 - stats.chi2.cdf(chi_square, df)

print(f"\nüìä Chi-square statistic: œá¬≤ = {chi_square:.3f}")
print(f"   Degrees of freedom: df = {df}")
print(f"   p-value: {p_value:.4f}")

alpha = 0.05
print(f"\nüéØ Decision (Œ± = {alpha}):")
if p_value < alpha:
    print(f"   p < {alpha} ‚Üí REJECT H‚ÇÄ")
    print("   ‚úÖ Observed does NOT follow 3:1 ratio")
    print("   ‚Üí Something other than simple Mendelian inheritance")
else:
    print(f"   p ‚â• {alpha} ‚Üí FAIL TO REJECT H‚ÇÄ")
    print("   ‚úÖ Observed is consistent with 3:1 ratio")
    print("   ‚Üí Data supports Mendelian inheritance")

print(f"\nüí° Interpretation:")
print(f"   p = {p_value:.4f} means: If the true ratio is 3:1,")
print(f"   there's a {p_value*100:.2f}% chance of seeing data this extreme (or more) by random chance.")

In [None]:
# Visualize observed vs expected
fig = go.Figure()

categories = ['Yellow', 'Green']

fig.add_trace(go.Bar(
    x=categories,
    y=expected,
    name='Expected (3:1)',
    marker_color='lightblue',
    opacity=0.6
))

fig.add_trace(go.Bar(
    x=categories,
    y=observed,
    name='Observed',
    marker_color='orange',
    opacity=0.8
))

fig.update_layout(
    title=f"üå± Observed vs Expected Seed Colors<br><sub>œá¬≤ = {chi_square:.2f}, p = {p_value:.4f}</sub>",
    xaxis_title="Seed Color",
    yaxis_title="Count",
    barmode='group',
    height=400,
    template='plotly_white'
)

fig.show()

---

### Example 2: Habitat Preference Test

**Scenario**: Are butterflies distributed randomly across 4 habitat types?

**H₀**: Butterflies choose habitats randomly (equal preference)

**H₁**: Butterflies prefer some habitats over others

In [None]:
# Habitat preference test
habitats = ['Forest', 'Meadow', 'Wetland', 'Urban']
observed_counts = np.array([45, 62, 28, 15])  # 150 butterflies total
total_butterflies = np.sum(observed_counts)

# If choice is random AND the four habitats are equally available, expect equal counts
expected_counts = np.array([total_butterflies / 4] * 4)

print("🦋 Butterfly Habitat Preference Test\n")
print("H₀: Butterflies choose habitats randomly (equal preference)")
print("H₁: Butterflies have habitat preferences\n")

print("Habitat  | Observed | Expected | (O-E)² / E")
print("---------|----------|----------|------------")

chi_terms = []
for i, habitat in enumerate(habitats):
    O = observed_counts[i]
    E = expected_counts[i]
    term = (O - E)**2 / E
    chi_terms.append(term)
    print(f"{habitat:8} |   {O:3}    |  {E:5.1f}   |   {term:.3f}")

chi_sq = np.sum(chi_terms)
df = len(habitats) - 1
p_val = stats.chi2.sf(chi_sq, df)  # upper-tail p-value

print(f"\n📊 Results:")
print(f"   χ² = {chi_sq:.3f}")
print(f"   df = {df}")
print(f"   p-value = {p_val:.4f}")

print(f"\n🎯 Conclusion:")
if p_val < 0.05:
    print(f"   p < 0.05 → REJECT H₀")
    print("   ✅ Butterflies show habitat preference!")
    # Describe the largest surplus and deficit relative to expectation (data-driven, not hard-coded)
    most = np.argmax(observed_counts - expected_counts)
    least = np.argmin(observed_counts - expected_counts)
    print(f"   → {habitats[most]} appears preferred ({observed_counts[most]} vs {expected_counts[most]:.0f} expected)")
    print(f"   → {habitats[least]} appears avoided ({observed_counts[least]} vs {expected_counts[least]:.0f} expected)")
else:
    print(f"   p ≥ 0.05 → FAIL TO REJECT H₀")
    print("   Distribution is consistent with random choice")
    print("   Distribution is consistent with random choice")

# Visualize
fig = go.Figure()

fig.add_trace(go.Bar(
    x=habitats,
    y=expected_counts,
    name='Expected (random)',
    marker_color='lightgray',
    opacity=0.6
))

fig.add_trace(go.Bar(
    x=habitats,
    y=observed_counts,
    name='Observed',
    marker_color='purple',
    opacity=0.8
))

fig.update_layout(
    title=f"ü¶ã Butterfly Distribution Across Habitats<br><sub>œá¬≤ = {chi_sq:.2f}, p = {p_val:.4f}</sub>",
    xaxis_title="Habitat Type",
    yaxis_title="Number of Butterflies",
    barmode='group',
    height=450,
    template='plotly_white'
)

fig.show()
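
The overall χ² test only says the counts are uneven; it does not say which habitats drive the result. One common follow-up, sketched here as an illustration rather than the notebook's own method, is to inspect Pearson residuals, (O - E)/√E, alongside `scipy.stats.chisquare`, which tests against equal expected counts when `f_exp` is omitted. The |residual| > 2 flag is an informal rule of thumb, not a formal test.

```python
import numpy as np
from scipy import stats

habitats = ['Forest', 'Meadow', 'Wetland', 'Urban']
observed_counts = np.array([45, 62, 28, 15])

# With no f_exp, chisquare compares against equal expected counts (150 / 4 = 37.5 each)
result = stats.chisquare(observed_counts)
expected = np.full(len(habitats), observed_counts.sum() / len(habitats))

# Pearson residuals: signed contribution of each habitat; their squares sum to chi2
residuals = (observed_counts - expected) / np.sqrt(expected)

print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.2e}")
for habitat, r in zip(habitats, residuals):
    # |r| > 2 is an informal rule of thumb for a noteworthy cell, not a formal test
    flag = "  <-- notable" if abs(r) > 2 else ""
    print(f"{habitat:8} residual = {r:+.2f}{flag}")
```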

---

## 📊 Part 3: t-Test

### When to Use:
**Comparing MEANS of two groups (continuous data)**

### Three Types:
1. **One-sample t-test**: Compare a sample mean to a known or hypothesized value
2. **Independent t-test**: Compare the means of two separate groups
3. **Paired t-test**: Compare two measurements taken on the same individuals (e.g., before vs. after)

### Ecological Examples:
- Does fertilizer affect plant growth? (independent)
- Do birds gain weight during migration? (paired)
- Is average tree height different from 15m? (one-sample)
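
The one-sample case is listed above but not worked through later, so here is a minimal sketch for the tree-height question. The heights are **simulated** (assumed mean 16 m, SD 2 m, n = 25) and stand in for field data the notebook does not provide. The code computes t = (x̄ - μ₀)/(s/√n) by hand and checks it against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
heights = rng.normal(16, 2, 25)   # hypothetical tree heights (m), simulated for illustration
mu_0 = 15                         # value stated in H0: mean height = 15 m

# By hand: t = (sample mean - mu_0) / standard error, with df = n - 1
n = len(heights)
se = np.std(heights, ddof=1) / np.sqrt(n)
t_manual = (np.mean(heights) - mu_0) / se
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)   # two-sided p-value

t_scipy, p_scipy = stats.ttest_1samp(heights, popmean=mu_0)

print(f"Sample mean = {np.mean(heights):.2f} m (n = {n})")
print(f"By hand: t = {t_manual:.3f}, p = {p_manual:.4f}")
print(f"SciPy:   t = {t_scipy:.3f}, p = {p_scipy:.4f}")
```

Replacing `heights` with real measurements is the only change needed to apply this to field data.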

---

### Example 3: Independent t-test (Treatment vs Control)

**Scenario**: Does a new fertilizer affect plant growth?

**H₀**: μ_treatment = μ_control (no difference)

**H₁**: μ_treatment ≠ μ_control (there IS a difference, in either direction)

In [None]:
# Generate realistic data
np.random.seed(42)

# Control group (no fertilizer)
control = np.random.normal(25, 5, 30)  # mean=25cm, sd=5, n=30

# Treatment group (with fertilizer): simulated with a 4 cm higher true mean
treatment = np.random.normal(29, 5, 30)  # mean=29cm, sd=5, n=30

print("🌱 Fertilizer Effect on Plant Growth\n")
print("H₀: μ_treatment = μ_control (fertilizer has NO effect)")
print("H₁: μ_treatment ≠ μ_control (fertilizer HAS an effect)\n")

# Calculate descriptive statistics
print("Group     | n  | Mean  | Std Dev")
print("----------|----|-------|--------")
print(f"Control   | {len(control):2} | {np.mean(control):5.2f} | {np.std(control, ddof=1):5.2f}")
print(f"Treatment | {len(treatment):2} | {np.mean(treatment):5.2f} | {np.std(treatment, ddof=1):5.2f}")
print(f"Difference|    | {np.mean(treatment) - np.mean(control):5.2f} |")

# Perform independent (Student's) t-test; SciPy's default assumes equal variances
t_stat, p_value = stats.ttest_ind(treatment, control)

print(f"\n📊 t-test Results:")
print(f"   t-statistic = {t_stat:.3f}")
print(f"   p-value = {p_value:.4f}")
print(f"   df = {len(control) + len(treatment) - 2}")

print(f"\n🎯 Conclusion (α = 0.05):")
if p_value < 0.05:
    print(f"   p < 0.05 → REJECT H₀")
    print("   ✅ Fertilizer DOES affect plant growth!")
    diff = np.mean(treatment) - np.mean(control)
    direction = "taller" if diff > 0 else "shorter"
    print(f"   → Treatment plants are {abs(diff):.2f}cm {direction} on average")
else:
    print(f"   p ≥ 0.05 → FAIL TO REJECT H₀")
    print("   No significant evidence that fertilizer affects growth")

print(f"\n💡 Interpretation:")
print(f"   If fertilizer had NO effect, there would be a {p_value*100:.2f}% chance")
print(f"   of seeing a difference this large (or larger) by random chance.")

In [None]:
# Visualize the comparison
fig = go.Figure()

fig.add_trace(go.Box(
    y=control,
    name='Control',
    marker_color='lightcoral',
    boxmean='sd'
))

fig.add_trace(go.Box(
    y=treatment,
    name='Treatment (Fertilizer)',
    marker_color='lightgreen',
    boxmean='sd'
))

fig.update_layout(
    title=f"üå± Plant Height: Control vs Treatment<br><sub>t = {t_stat:.2f}, p = {p_value:.4f}</sub>",
    yaxis_title="Plant Height (cm)",
    height=500,
    template='plotly_white'
)

fig.show()

# Show distributions
fig2 = go.Figure()

fig2.add_trace(go.Histogram(
    x=control,
    name='Control',
    marker_color='lightcoral',
    opacity=0.6,
    nbinsx=15
))

fig2.add_trace(go.Histogram(
    x=treatment,
    name='Treatment',
    marker_color='lightgreen',
    opacity=0.6,
    nbinsx=15
))

fig2.add_vline(x=np.mean(control), line_dash="dash", line_color="red",
               annotation_text=f"Control mean = {np.mean(control):.1f}")
fig2.add_vline(x=np.mean(treatment), line_dash="dash", line_color="green",
               annotation_text=f"Treatment mean = {np.mean(treatment):.1f}")

fig2.update_layout(
    title="Distribution of Plant Heights",
    xaxis_title="Height (cm)",
    yaxis_title="Frequency",
    barmode='overlay',
    height=400,
    template='plotly_white'
)

fig2.show()
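
`stats.ttest_ind` as called in Example 3 uses SciPy's default `equal_var=True`, i.e. the classic Student's t-test with a pooled variance. The sketch below regenerates the same simulated plant heights (seed 42, same draws as Example 3), reproduces the pooled t-statistic by hand, and runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption. With equal group sizes (30 each) the two t-statistics coincide; only the degrees of freedom, and hence the p-value, differ slightly. This illustrates the choice rather than prescribing one.

```python
import numpy as np
from scipy import stats

np.random.seed(42)                       # same seed and draws as Example 3
control = np.random.normal(25, 5, 30)
treatment = np.random.normal(29, 5, 30)

# Student's t by hand: pooled variance, df = n1 + n2 - 2
n1, n2 = len(treatment), len(control)
s1, s2 = np.var(treatment, ddof=1), np.var(control, ddof=1)
pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
t_pooled = (np.mean(treatment) - np.mean(control)) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

student = stats.ttest_ind(treatment, control)                  # equal_var=True (default)
welch = stats.ttest_ind(treatment, control, equal_var=False)   # Welch's t-test

print(f"Pooled t by hand: {t_pooled:.3f}")
print(f"Student's t-test: t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
print(f"Welch's t-test:   t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
```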

---

### Example 4: Paired t-test (Before vs After)

**Scenario**: Do birds gain weight during migration stopover?

**H₀**: μ_difference = 0 (no weight change)

**H₁**: μ_difference ≠ 0 (weight changes)

In [None]:
# Generate paired data (same 20 birds, weighed twice)
np.random.seed(42)
n_birds = 20

# Weight at arrival (grams)
weight_arrival = np.random.normal(45, 5, n_birds)

# Weight after 5 days (most birds gain weight)
weight_gain = np.random.normal(3, 2, n_birds)  # average gain 3g
weight_departure = weight_arrival + weight_gain

print("üê¶ Bird Weight Change During Stopover\n")
print("H‚ÇÄ: Œº_difference = 0 (no weight change)")
print("H‚ÇÅ: Œº_difference ‚â† 0 (birds gain or lose weight)\n")

# Create DataFrame
df = pd.DataFrame({
    'Bird_ID': range(1, n_birds+1),
    'Arrival_Weight': weight_arrival,
    'Departure_Weight': weight_departure,
    'Change': weight_departure - weight_arrival
})

print("First 10 birds:")
print(df.head(10).to_string(index=False, float_format='%.2f'))

print(f"\nüìä Summary Statistics:")
print(f"   Mean arrival weight: {np.mean(weight_arrival):.2f} g")
print(f"   Mean departure weight: {np.mean(weight_departure):.2f} g")
print(f"   Mean weight change: {np.mean(df['Change']):.2f} g")
print(f"   SD of change: {np.std(df['Change'], ddof=1):.2f} g")

# Paired t-test
t_stat, p_value = stats.ttest_rel(weight_departure, weight_arrival)

print(f"\nüß™ Paired t-test Results:")
print(f"   t-statistic = {t_stat:.3f}")
print(f"   p-value = {p_value:.4f}")
print(f"   df = {n_birds - 1}")

print(f"\nüéØ Conclusion:")
if p_value < 0.05:
    print(f"   p < 0.05 ‚Üí REJECT H‚ÇÄ")
    print("   ‚úÖ Birds DO change weight during stopover!")
    if np.mean(df['Change']) > 0:
        print(f"   ‚Üí Birds GAIN an average of {np.mean(df['Change']):.2f}g")
    else:
        print(f"   ‚Üí Birds LOSE an average of {abs(np.mean(df['Change'])):.2f}g")
else:
    print(f"   p ‚â• 0.05 ‚Üí FAIL TO REJECT H‚ÇÄ")
    print("   No significant weight change detected")

In [None]:
# Visualize paired data
fig = go.Figure()

# Lines connecting paired measurements
for i in range(n_birds):
    fig.add_trace(go.Scatter(
        x=['Arrival', 'Departure'],
        y=[weight_arrival[i], weight_departure[i]],
        mode='lines+markers',
        line=dict(color='lightgray', width=1),
        marker=dict(size=6),
        showlegend=False,
        hovertemplate=f'Bird {i+1}<br>Weight: %{{y:.1f}}g<extra></extra>'
    ))

# Add mean lines
fig.add_trace(go.Scatter(
    x=['Arrival', 'Departure'],
    y=[np.mean(weight_arrival), np.mean(weight_departure)],
    mode='lines+markers',
    line=dict(color='red', width=4),
    marker=dict(size=12, symbol='diamond'),
    name='Mean'
))

fig.update_layout(
    title=f"üê¶ Bird Weight Change (Paired Data)<br><sub>t = {t_stat:.2f}, p = {p_value:.4f}</sub>",
    xaxis_title="Time Point",
    yaxis_title="Weight (g)",
    height=500,
    template='plotly_white'
)

fig.show()

# Histogram of changes
fig2 = go.Figure()

fig2.add_trace(go.Histogram(
    x=df['Change'],
    nbinsx=15,
    marker_color='skyblue',
    opacity=0.7
))

fig2.add_vline(x=0, line_dash="dash", line_color="red",
               annotation_text="No change")
fig2.add_vline(x=np.mean(df['Change']), line_dash="solid", line_color="blue",
               annotation_text=f"Mean change = {np.mean(df['Change']):.2f}g")

fig2.update_layout(
    title="Distribution of Weight Changes",
    xaxis_title="Weight Change (g)",
    yaxis_title="Frequency",
    height=400,
    template='plotly_white'
)

fig2.show()
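
A paired t-test is exactly a one-sample t-test on the within-bird differences, tested against 0. This sketch regenerates the simulated weights from Example 4 (same seed and draws) and confirms that `scipy.stats.ttest_rel` and `scipy.stats.ttest_1samp` on the differences agree. For contrast only, it also runs an independent t-test on the same numbers, which wrongly treats the two columns as unrelated birds; the large bird-to-bird variation in body weight then swamps the consistent gain, shrinking t.

```python
import numpy as np
from scipy import stats

np.random.seed(42)                                   # same seed and draws as Example 4
weight_arrival = np.random.normal(45, 5, 20)
weight_departure = weight_arrival + np.random.normal(3, 2, 20)
differences = weight_departure - weight_arrival

paired = stats.ttest_rel(weight_departure, weight_arrival)
one_sample = stats.ttest_1samp(differences, popmean=0)
# For contrast only: ignores the pairing, so between-bird variation inflates the noise
unpaired = stats.ttest_ind(weight_departure, weight_arrival)

print(f"Paired t-test:         t = {paired.statistic:.3f}, p = {paired.pvalue:.2e}")
print(f"One-sample on diffs:   t = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.2e}")
print(f"Unpaired (wrong here): t = {unpaired.statistic:.3f}, p = {unpaired.pvalue:.4f}")
```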

---

## 📋 Part 4: Understanding p-values

### What p-value REALLY means:

**p-value = Probability of seeing data this extreme (or more extreme) IF the null hypothesis is true**
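
The definition can be checked by brute force. This minimal sketch reuses the seed-color numbers from Example 1: it simulates many crosses of 100 seeds in a world where H₀ holds (a true 3:1 ratio), computes χ² for each, and reports the fraction at least as large as the observed χ². The seed and the number of simulations are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
observed = np.array([68, 32])               # yellow, green (Example 1)
probs_h0 = np.array([0.75, 0.25])           # 3:1 ratio assumed by H0
n = observed.sum()
expected = probs_h0 * n

obs_chi2 = np.sum((observed - expected) ** 2 / expected)

# Simulate 20,000 crosses in a world where H0 is true
sims = rng.multinomial(n, probs_h0, size=20_000)
sim_chi2 = np.sum((sims - expected) ** 2 / expected, axis=1)

# p-value = share of H0-worlds producing a chi2 at least as extreme as ours
p_simulated = np.mean(sim_chi2 >= obs_chi2)
p_theory = stats.chi2.sf(obs_chi2, df=1)

print(f"Observed chi2 = {obs_chi2:.3f}")
print(f"Simulated p = {p_simulated:.3f}, chi-square distribution p = {p_theory:.3f}")
```

The two numbers should be similar but need not match exactly: the χ² distribution is a large-sample approximation to these discrete counts, and the simulation carries its own random error.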

### Common Misunderstandings:

❌ **WRONG**: "p = 0.03 means there's a 3% chance H₀ is true"  
✅ **RIGHT**: "p = 0.03 means if H₀ were true, we'd see data at least this extreme only 3% of the time"

❌ **WRONG**: "p = 0.08 proves H₀ is true"  
✅ **RIGHT**: "p = 0.08 means we don't have strong enough evidence to reject H₀"

### Significance Levels:

| p-value | Interpretation |
|---------|----------------|
| p < 0.001 | Very strong evidence against H₀ (***) |
| 0.001 ≤ p < 0.01 | Strong evidence against H₀ (**) |
| 0.01 ≤ p < 0.05 | Moderate evidence against H₀ (*) |
| p ≥ 0.05 | Insufficient evidence to reject H₀ (n.s.) |
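
The table maps directly onto a small helper. The function name `significance_label` is hypothetical, and the cut-offs are the conventional ones from the table, not a rule built into SciPy.

```python
def significance_label(p):
    """Hypothetical helper: conventional star code for a p-value, per the table."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."

for p in [0.0004, 0.003, 0.03, 0.08]:
    print(f"p = {p:<7} -> {significance_label(p)}")
```

Reporting the exact p-value next to the stars keeps more information than the label alone.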

In [None]:
# How effect size affects the p-value (one simulated experiment per true difference)
from scipy import stats as sp_stats

# Create scenarios with different effect sizes
np.random.seed(42)

scenarios = []
for diff in [0, 2, 4, 6, 8]:
    group1 = np.random.normal(50, 10, 30)
    group2 = np.random.normal(50 + diff, 10, 30)
    t, p = sp_stats.ttest_ind(group2, group1)
    scenarios.append({
        'difference': diff,
        't_stat': t,
        'p_value': p,
        'group1': group1,
        'group2': group2
    })

print("üìä How Effect Size Affects p-value\n")
print("True Diff | t-stat | p-value | Significant?")
print("----------|--------|---------|-------------")
for s in scenarios:
    sig = "Yes (*)" if s['p_value'] < 0.05 else "No (n.s.)"
    print(f"   {s['difference']:2}     | {s['t_stat']:6.2f} | {s['p_value']:7.4f} | {sig}")

print("\nüí° Pattern:")
print("   ‚Ä¢ Larger true difference ‚Üí Larger t-statistic ‚Üí Smaller p-value")
print("   ‚Ä¢ Small differences may not be statistically significant")
print("   ‚Ä¢ But statistical significance ‚â† biological importance!")
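
To see why significance is not the same as biological importance, the sketch below (made-up parameters, for illustration only) compares two simulated groups whose true means differ by just 0.5 units on a standard deviation of 10, but with 50,000 observations per group. It reports Cohen's d, a standardized effect size (mean difference divided by the pooled SD), next to the p-value; expect a tiny p-value alongside a d of roughly 0.05, a negligible effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 50_000                               # very large sample per group (illustrative)
group1 = rng.normal(50.0, 10, n)
group2 = rng.normal(50.5, 10, n)         # true difference = 0.5, tiny relative to SD = 10

t_stat, p_value = stats.ttest_ind(group2, group1)

# Cohen's d: mean difference in units of the pooled SD
# (equal group sizes, so pooling is a simple average of the two variances)
pooled_sd = np.sqrt((np.var(group1, ddof=1) + np.var(group2, ddof=1)) / 2)
cohens_d = (np.mean(group2) - np.mean(group1)) / pooled_sd

print(f"p-value   = {p_value:.2e}")
print(f"Cohen's d = {cohens_d:.3f}")
```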

---

## 🎓 Summary

### Key Takeaways:

✅ **Hypothesis testing** = Making decisions from data  
✅ **H₀** = No effect (what we try to reject)  
✅ **H₁** = There IS an effect  
✅ **Chi-square test** = Comparing observed vs expected frequencies  
✅ **t-test** = Comparing means of groups  
✅ **p-value** = Probability of data at least this extreme if H₀ is true  
✅ **p < 0.05** = Conventional cutoff for "significant"  

### Decision Framework:

```
1. State hypotheses (H₀ and H₁)
2. Choose appropriate test
3. Calculate test statistic
4. Find p-value
5. Compare to α (usually 0.05)
6. Make decision
7. Interpret in biological context!
```

### Which Test to Use?

| Data Type | Question | Test |
|-----------|----------|------|
| Categorical | Observed vs Expected? | Chi-square |
| Continuous | Two independent groups? | Independent t-test |
| Continuous | Same group, two times? | Paired t-test |
| Continuous | One sample vs value? | One-sample t-test |
| Continuous | 3+ groups? | ANOVA (not covered) |
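
The table can also be written as a lookup. This is a hypothetical helper: `choose_test` and its keys are our own labels, while the values are real SciPy functions, including `scipy.stats.f_oneway` for the one-way ANOVA row this notebook does not cover. The sample numbers in the usage line are made up.

```python
from scipy import stats

# Hypothetical keys (data type, design) in our own wording, mirroring the table rows
TEST_FOR = {
    ("categorical", "observed vs expected"): stats.chisquare,
    ("continuous", "two independent groups"): stats.ttest_ind,
    ("continuous", "same group, two times"): stats.ttest_rel,
    ("continuous", "one sample vs value"): stats.ttest_1samp,
    ("continuous", "3+ groups"): stats.f_oneway,   # one-way ANOVA (not covered here)
}

def choose_test(data_type, design):
    """Return the SciPy function matching a row of the decision table."""
    try:
        return TEST_FOR[(data_type, design)]
    except KeyError:
        raise ValueError(f"No table row for {data_type!r} / {design!r}") from None

chosen = choose_test("continuous", "same group, two times")
# Made-up before/after values for four individuals
result = chosen([48.0, 50.1, 46.9, 49.5], [45.1, 47.3, 44.0, 46.2])
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```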

---

## 🎊 Congratulations!

### You've Completed the 7-Step Pattern Hunter Journey:

```
1. 🔭 OBSERVE → Real biological data
2. 📊 DISCOVER → See the shape (distributions)
3. 🩺 UNDERSTAND → The 8 filters
4. 🔢 MAP → Random variables
5. 🎛️ CHOOSE → Right distribution
6. 📏 MEASURE → Central tendency & dispersion
7. 🧪 TEST → Hypothesis testing ✅
```

**You now have the complete statistical stethoscope toolkit!**

---

<div align="center">

**Made with üíö by The Pattern Hunter Team**

**üéâ You've mastered Unit 4: Biometry! üéâ**

[üè† Repository](https://github.com/The-Pattern-Hunter/interactive-ecology-biometry) | 
[üìì Previous: Sampling](04_sampling_techniques.ipynb) | 
[ü©∫ Unit 4 Home](../../)

</div>