# 🎯 Hypothesis Testing Fundamentals
## Understanding Power, Errors, p-values, and Confidence

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/The-Pattern-Hunter/interactive-ecology-biometry/blob/main/unit-4-biometry/notebooks/06_hypothesis_testing_fundamentals.ipynb)

---

> *"The p-value is NOT the probability that the null hypothesis is true!"*

### 🎯 Learning Objectives

By the end of this notebook, you will deeply understand:
1. **What p-values REALLY mean** (and common misconceptions)
2. **Significance level (α)** - Why 0.05?
3. **Type I and Type II errors** - The two ways to be wrong
4. **Statistical Power** - The probability of detecting real effects
5. **Confidence Intervals** - What they tell us beyond p-values
6. **Degrees of Freedom** - Intuitive understanding
7. **Effect Size** - Statistical vs. biological significance

In [None]:
# Setup
!pip install numpy scipy plotly pandas -q

import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats
import pandas as pd

np.random.seed(42)

print("✅ Ready to deeply understand hypothesis testing!")
print("🎯 Let's demystify p-values, power, and errors!")

---

## 📊 Part 1: What is a p-value? (The Truth!)

### ❌ Common WRONG Interpretations:

1. ❌ "p = 0.03 means there's a 3% chance the null hypothesis is true"
2. ❌ "p = 0.03 means there's a 97% chance the alternative is true"
3. ❌ "p = 0.03 means we've proven the alternative hypothesis"
4. ❌ "p < 0.05 means the result is important"
5. ❌ "p > 0.05 means there's no effect"

### ✅ CORRECT Definition:

**p-value = The probability of observing data this extreme (or more extreme) IF the null hypothesis were true.**

In other words:
> "If there's truly NO effect, how surprising is our observed data?"
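
One way to make this definition concrete is to simulate a world where H₀ is true and count how often chance alone produces a result at least as extreme as the one observed. The sketch below is illustrative only: the population SD of 8 cm, the sample size of 30, and the observed mean of 53 cm are assumed numbers, and a z-test with known SD is used to keep the arithmetic simple.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed illustration values (not taken from real data)
null_mean = 50.0      # mean plant height claimed by H0 (cm)
sd = 8.0              # population SD, treated as known
n = 30                # plants per sample
observed_mean = 53.0  # the sample mean we actually observed

se = sd / np.sqrt(n)

# Simulate 100,000 sample means in a world where H0 is true
sim_means = rng.normal(null_mean, se, size=100_000)

# Two-sided p-value: share of simulated means at least as far from
# null_mean as the observed mean is
observed_gap = abs(observed_mean - null_mean)
p_simulated = np.mean(np.abs(sim_means - null_mean) >= observed_gap)

# Analytic two-sided p-value from the z-test, for comparison
z = (observed_mean - null_mean) / se
p_analytic = 2 * stats.norm.sf(abs(z))

print(f"Simulated p-value: {p_simulated:.4f}")
print(f"Analytic p-value:  {p_analytic:.4f}")
```

The two numbers should agree closely, because both answer the same question: if H₀ were true, how often would sampling noise alone put the mean this far from 50?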

### 🎲 The Courtroom Analogy

| Legal System | Statistical Testing |
|--------------|---------------------|
| **Assumption**: Innocent until proven guilty | **Assumption**: H₀ is assumed true unless the data give strong evidence against it |
| **Evidence**: Witness testimony, DNA, etc. | **Evidence**: Your data |
| **Verdict**: Guilty or Not Guilty | **Decision**: Reject H₀ or Fail to Reject H₀ |
| **"Beyond reasonable doubt"** | **p < 0.05 (conventional threshold)** |
| Not guilty ≠ innocent | Fail to reject ≠ H₀ is true |

---

### Visual Understanding of p-values

In [None]:
# Visualize what p-value represents
from scipy import stats as sp_stats

# Scenario: Testing if mean plant height = 50cm
# H₀: μ = 50
# We observe sample mean = 53cm, with SE = 1.5

null_mean = 50
observed_mean = 53
se = 1.5

# Calculate t-statistic
t_stat = (observed_mean - null_mean) / se
df = 29  # Assume n=30
p_value = 2 * (1 - sp_stats.t.cdf(abs(t_stat), df))  # Two-tailed

# Create distribution under H₀
x = np.linspace(null_mean - 4*se, null_mean + 4*se, 1000)
y = sp_stats.t.pdf((x - null_mean)/se, df) / se

fig = go.Figure()

# Distribution under H₀
fig.add_trace(go.Scatter(
    x=x, y=y,
    mode='lines',
    fill='tozeroy',
    line=dict(color='lightblue', width=2),
    name='Distribution if H₀ is true',
    fillcolor='rgba(173, 216, 230, 0.3)'
))

# Shade p-value region (two-tailed)
critical_value = null_mean + abs(observed_mean - null_mean)
x_right = x[x >= critical_value]
y_right = sp_stats.t.pdf((x_right - null_mean)/se, df) / se

x_left = x[x <= (null_mean - abs(observed_mean - null_mean))]
y_left = sp_stats.t.pdf((x_left - null_mean)/se, df) / se

fig.add_trace(go.Scatter(
    x=x_right, y=y_right,
    fill='tozeroy',
    mode='none',
    fillcolor='rgba(255, 0, 0, 0.3)',
    name=f'p-value region (p={p_value:.4f})',
    showlegend=True
))

fig.add_trace(go.Scatter(
    x=x_left, y=y_left,
    fill='tozeroy',
    mode='none',
    fillcolor='rgba(255, 0, 0, 0.3)',
    showlegend=False
))

# Mark observed value
fig.add_vline(x=observed_mean, line_dash="dash", line_color="red", line_width=3,
              annotation_text=f"Observed = {observed_mean}cm")

# Mark null hypothesis value
fig.add_vline(x=null_mean, line_dash="solid", line_color="black", line_width=2,
              annotation_text=f"H₀: μ = {null_mean}cm")

fig.update_layout(
    title=f"🎯 What p-value Represents<br><sub>If H₀ is true (μ={null_mean}), the red shaded areas show how likely we'd see data this extreme</sub>",
    xaxis_title="Plant Height (cm)",
    yaxis_title="Probability Density",
    height=500,
    template='plotly_white'
)

fig.show()

print(f"\n📊 Interpretation:")
print(f"   Observed sample mean: {observed_mean} cm")
print(f"   Null hypothesis: μ = {null_mean} cm")
print(f"   t-statistic: {t_stat:.2f}")
print(f"   p-value: {p_value:.4f}")
print(f"\n💡 What this means:")
print(f"   IF the true mean were actually {null_mean}cm,")
print(f"   we'd see data this extreme (or more) only {p_value*100:.2f}% of the time.")
print(f"\n   The red areas = {p_value*100:.2f}% probability")
# With t = 2.00 and df = 29 the p-value sits just above 0.05,
# so let the computed value choose the conclusion instead of hard-coding it
if p_value < 0.05:
    print(f"   Since {p_value:.4f} < 0.05, we reject H₀")
else:
    print(f"   Since {p_value:.4f} ≥ 0.05, we fail to reject H₀")

---

## 🎚️ Part 2: Significance Level (α) - Why 0.05?

### What is α?

**α (alpha)** = The threshold we set BEFORE seeing data
- If p < α → Reject H₀
- If p ≥ α → Fail to reject H₀

### Why 0.05?

**Historical Accident!** Ronald Fisher (1920s) suggested it as a convenient cutoff.

**It means**: "We're willing to be wrong 5% of the time when H₀ is actually true"
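
A quick simulation can show what α controls. In the sketch below every experiment is drawn from a population where H₀ is true (the mean of 0, SD of 1, n = 25, and 20,000 repetitions are all assumed values), so every rejection is a false positive by construction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

alpha = 0.05
n_experiments = 20_000   # assumed number of repeated studies
n = 25                   # assumed sample size per study

# Each row is one study sampled from a population where H0 (mean = 0) is TRUE
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n))

# One-sample t-test per row against the true mean
p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

# Share of studies that (wrongly) reject H0
false_positive_rate = np.mean(p_values < alpha)
print(f"Rejected H0 in {false_positive_rate:.1%} of studies (alpha = {alpha:.0%})")
```

The rejection rate should land near α, which is the sense in which α is the long-run Type I error rate rather than a statement about any single study.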

### Different Fields Use Different α:

| Field | Common α | Why? |
|-------|----------|------|
| Ecology, Biology | 0.05 | Historical convention |
| Particle Physics | 0.0000003 | Need very high certainty |
| Social Sciences | 0.05 or 0.10 | Effects often subtle |
| Medical Trials | 0.01 | Patient safety critical |

### The α Level Sets Your Threshold:

In [None]:
# Interactive comparison of different α levels
alpha_levels = [0.10, 0.05, 0.01, 0.001]

fig = go.Figure()

# Standard normal distribution
x = np.linspace(-4, 4, 1000)
y = sp_stats.norm.pdf(x)

for alpha in alpha_levels:
    # Find critical value for two-tailed test
    critical_z = sp_stats.norm.ppf(1 - alpha/2)
    
    # Shade rejection regions
    x_reject_right = x[x >= critical_z]
    y_reject_right = sp_stats.norm.pdf(x_reject_right)
    
    x_reject_left = x[x <= -critical_z]
    y_reject_left = sp_stats.norm.pdf(x_reject_left)
    
    visible = (alpha == 0.05)  # Show 0.05 by default
    
    # Base distribution
    fig.add_trace(go.Scatter(
        x=x, y=y,
        mode='lines',
        line=dict(color='lightblue', width=2),
        fill='tozeroy',
        fillcolor='rgba(173, 216, 230, 0.3)',
        name=f'α = {alpha}',
        visible=visible,
        showlegend=False
    ))
    
    # Rejection region
    fig.add_trace(go.Scatter(
        x=np.concatenate([x_reject_left, x_reject_right]),
        y=np.concatenate([y_reject_left, y_reject_right]),
        fill='tozeroy',
        mode='none',
        fillcolor='rgba(255, 0, 0, 0.5)',
        name=f'Reject H₀ (α={alpha})',
        visible=visible
    ))

# Create buttons
buttons = []
for i, alpha in enumerate(alpha_levels):
    visible = [False] * (len(alpha_levels) * 2)
    visible[i*2] = True
    visible[i*2 + 1] = True
    
    critical_z = sp_stats.norm.ppf(1 - alpha/2)
    
    buttons.append(
        dict(
            label=f'α = {alpha}',
            method='update',
            args=[{'visible': visible},
                  {'title': f'🎚️ Significance Level α = {alpha}<br><sub>Red areas = {alpha*100:g}% total, Critical value = ±{critical_z:.2f}</sub>'}]
        )
    )

fig.update_layout(
    updatemenus=[dict(
        type='buttons',
        direction='down',
        x=0.7, y=1.15,
        buttons=buttons
    )],
    title='üéöÔ∏è Significance Level Œ± = 0.05<br><sub>Red areas = 5% total, Critical value = ¬±1.96</sub>',
    xaxis_title='z-score',
    yaxis_title='Probability Density',
    height=500,
    template='plotly_white'
)

fig.show()

print("\n💡 Key Points:")
print("   • Smaller α = More stringent (harder to reject H₀)")
print("   • Smaller α = Smaller red rejection regions")
print("   • α = 0.05 means we reject H₀ if p < 0.05")
print("   • α is chosen BEFORE collecting data!")

---

## ⚠️ Part 3: Type I and Type II Errors

### The Four Possible Outcomes

When we test a hypothesis, there are 4 possible situations:

|  | **H₀ is Actually TRUE** | **H₀ is Actually FALSE** |
|---|------------------------|-------------------------|
| **We Reject H₀** | 🚨 **Type I Error** (α) | ✅ **Correct Decision** (Power) |
| **We Fail to Reject H₀** | ✅ **Correct Decision** (1-α) | 🚨 **Type II Error** (β) |

### Type I Error (False Positive) 🚨

**Definition**: Rejecting H₀ when it's actually true

**Probability**: α (significance level)

**Example**:
- H₀: Fertilizer has no effect
- Reality: Fertilizer truly has no effect
- Our conclusion: "Fertilizer works!" ❌ WRONG

**Real-world impact**: False discoveries, wasted resources

### Type II Error (False Negative) 🚨

**Definition**: Failing to reject H₀ when it's actually false

**Probability**: β (beta)

**Example**:
- H₀: Pesticide is safe
- Reality: Pesticide is actually harmful
- Our conclusion: "Pesticide is safe" ❌ WRONG

**Real-world impact**: Missed discoveries, continued harm

### The Medical Diagnosis Analogy

|  | **Patient is HEALTHY** | **Patient is SICK** |
|---|----------------------|--------------------|
| **Test says SICK** | Type I Error (False Alarm) | Correct (True Positive) |
| **Test says HEALTHY** | Correct (True Negative) | Type II Error (Missed Disease) |

In [None]:
# Visualize Type I and Type II errors
from scipy.stats import norm

# Two scenarios
mu_null = 0    # H₀: mean = 0
mu_alt = 0.5   # H₁: mean = 0.5 (a true effect small enough that the two curves overlap)
sigma = 1
alpha = 0.05
n = 30
se = sigma / np.sqrt(n)

# Critical value on the sample-mean scale (one-tailed for simplicity)
critical_value = mu_null + norm.ppf(1 - alpha) * se

# Create x values covering both sampling distributions
x = np.linspace(-1, 1.5, 1000)

# Sampling distributions of the mean under each hypothesis
y_null = norm.pdf(x, mu_null, se)
y_alt = norm.pdf(x, mu_alt, se)

fig = go.Figure()

# Distribution under H₀
fig.add_trace(go.Scatter(
    x=x, y=y_null,
    mode='lines',
    line=dict(color='blue', width=2),
    fill='tozeroy',
    fillcolor='rgba(0, 0, 255, 0.2)',
    name='H₀ is true (no effect)'
))

# Type I error region (α)
x_type1 = x[x >= critical_value]
y_type1 = norm.pdf(x_type1, mu_null, se)
fig.add_trace(go.Scatter(
    x=x_type1, y=y_type1,
    fill='tozeroy',
    mode='none',
    fillcolor='rgba(255, 0, 0, 0.5)',
    name=f'Type I Error (α={alpha})'
))

# Distribution under H₁
fig.add_trace(go.Scatter(
    x=x, y=y_alt,
    mode='lines',
    line=dict(color='green', width=2),
    fill='tozeroy',
    fillcolor='rgba(0, 255, 0, 0.2)',
    name='H₁ is true (effect exists)'
))

# Type II error region (β)
x_type2 = x[x < critical_value]
y_type2 = norm.pdf(x_type2, mu_alt, se)
fig.add_trace(go.Scatter(
    x=x_type2, y=y_type2,
    fill='tozeroy',
    mode='none',
    fillcolor='rgba(255, 165, 0, 0.5)',
    name='Type II Error (β)'
))

# Power region
x_power = x[x >= critical_value]
y_power = norm.pdf(x_power, mu_alt, se)
fig.add_trace(go.Scatter(
    x=x_power, y=y_power,
    fill='tozeroy',
    mode='none',
    fillcolor='rgba(0, 128, 0, 0.6)',
    name='Power (1-β)'
))

# Critical value line
fig.add_vline(x=critical_value, line_dash="dash", line_color="black", line_width=3,
              annotation_text=f"Critical value = {critical_value:.2f}")

# Calculate beta and power
beta = norm.cdf(critical_value, mu_alt, se)
power = 1 - beta

fig.update_layout(
    title=f"⚠️ Type I and Type II Errors<br><sub>α={alpha} (red), β={beta:.3f} (orange), Power={power:.3f} (dark green)</sub>",
    xaxis_title="Sample Mean",
    yaxis_title="Probability Density",
    height=600,
    template='plotly_white'
)

fig.show()

print(f"\n📊 Error Probabilities:")
print(f"   Type I Error (α): {alpha*100:.0f}% - False positive rate")
print(f"   Type II Error (β): {beta*100:.1f}% - False negative rate")
print(f"   Statistical Power (1-β): {power*100:.1f}% - True positive rate")

print(f"\n💡 Interpretation:")
print(f"   • RED (α): If no effect exists, we'll wrongly claim one {alpha*100:.0f}% of the time")
print(f"   • ORANGE (β): If effect exists, we'll miss it {beta*100:.1f}% of the time")
print(f"   • DARK GREEN (Power): If effect exists, we'll detect it {power*100:.1f}% of the time")

### The Trade-off:

**For a fixed sample size, you CANNOT reduce both errors simultaneously!**

- ⬇️ Decrease α (be more conservative) → ⬆️ Increase β (more false negatives)
- ⬆️ Increase α (be less conservative) → ⬇️ Decrease β (fewer false negatives)

**Solution**: Increase sample size! At a fixed α, a larger n lowers β, and it lets you tighten α without giving up power.
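
The trade-off can be checked with a few lines of arithmetic. This is a minimal sketch for a one-sided z-test with known σ; the helper name `beta_one_sided`, the true effect of 0.5, and σ = 1 are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

def beta_one_sided(alpha, n, effect=0.5, sigma=1.0):
    """Type II error rate of a one-sided z-test of H0: mu = 0 when the true mean is `effect`."""
    se = sigma / np.sqrt(n)
    critical = norm.ppf(1 - alpha) * se          # rejection threshold on the sample-mean scale
    return norm.cdf(critical, loc=effect, scale=se)  # chance the mean falls below the threshold

for n in (30, 60):
    for alpha in (0.10, 0.05, 0.01):
        beta = beta_one_sided(alpha, n)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Within each n, β rises as α shrinks. Moving from n = 30 to n = 60 lowers β at every α, which is why sample size is the usual way out of the trade-off.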

---

## 💪 Part 4: Statistical Power (1 - β)

### What is Power?

**Power = The probability of CORRECTLY rejecting H₀ when it's actually false**

In other words: "If there's a real effect, what's the chance we'll detect it?"

**Formula**: Power = 1 - β

### What Affects Power?

1. **Sample Size (n)** ⬆️ n → ⬆️ Power
2. **Effect Size** (how big the difference is) ⬆️ Effect → ⬆️ Power
3. **Significance Level (α)** ⬆️ α → ⬆️ Power (but more Type I errors)
4. **Variability (σ)** ⬇️ σ → ⬆️ Power

### Recommended Power:

**Convention: Power ≥ 0.80 (80%)**

This means: "If there's a real effect, we have at least 80% chance of detecting it"
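
For planning, a common normal approximation gives the sample size for a two-sided, one-sample test as n ≈ ((z₁₋α/₂ + z_power) / d)², where d is Cohen's d. The helper `required_n` below is a hypothetical name for this sketch, and because it ignores the t-distribution it tends to come out slightly below an exact t-based calculation.

```python
import math
from scipy.stats import norm

def required_n(d, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample test (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = norm.ppf(power)          # quantile matching the target power
    return math.ceil(((z_alpha + z_power) / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n ≈ {required_n(d)}")
```

Halving the effect size roughly quadruples the required n, because n scales with 1/d².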

In [None]:
# Interactive power analysis - Effect of sample size
effect_size = 0.5  # Cohen's d
alpha = 0.05
sample_sizes = np.arange(10, 201, 10)

powers = []
for n in sample_sizes:
    # Calculate power using t-distribution
    ncp = effect_size * np.sqrt(n)  # non-centrality parameter
    critical_t = sp_stats.t.ppf(1 - alpha/2, n-1)
    power = 1 - sp_stats.nct.cdf(critical_t, n-1, ncp) + sp_stats.nct.cdf(-critical_t, n-1, ncp)
    powers.append(power)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=sample_sizes,
    y=powers,
    mode='lines+markers',
    line=dict(color='green', width=3),
    marker=dict(size=8),
    name='Statistical Power'
))

# Add 80% power line
fig.add_hline(y=0.80, line_dash="dash", line_color="red",
              annotation_text="Recommended Power = 0.80")

# Find the smallest n on the grid that reaches 80% power
n_for_80 = sample_sizes[np.argmax(np.array(powers) >= 0.80)]
fig.add_vline(x=n_for_80, line_dash="dot", line_color="blue",
              annotation_text=f"n ≈ {n_for_80} needed")

fig.update_layout(
    title=f"💪 Power Analysis: Effect of Sample Size<br><sub>Effect size (Cohen's d) = {effect_size}, α = {alpha}</sub>",
    xaxis_title="Sample Size (n)",
    yaxis_title="Statistical Power (1 - β)",
    height=500,
    template='plotly_white',
    yaxis=dict(range=[0, 1])
)

fig.show()

print(f"\n📊 Power Analysis Results:")
print(f"   To achieve 80% power with effect size d={effect_size}:")
print(f"   You need approximately n={n_for_80} observations (one-sample t-test)")
print(f"\n💡 Key Insights:")
print(f"   • Small samples have low power (n={sample_sizes[0]}: {powers[0]:.0%})")
print(f"   • Power increases rapidly at first, then plateaus")
print(f"   • Doubling n doesn't double power")
print(f"   • Always do power analysis BEFORE collecting data!")

In [None]:
# Interactive power analysis - Effect of effect size
n = 30  # Fixed sample size
effect_sizes = np.linspace(0.1, 2.0, 50)
alpha = 0.05

powers_by_effect = []
for d in effect_sizes:
    ncp = d * np.sqrt(n)
    critical_t = sp_stats.t.ppf(1 - alpha/2, n-1)
    power = 1 - sp_stats.nct.cdf(critical_t, n-1, ncp) + sp_stats.nct.cdf(-critical_t, n-1, ncp)
    powers_by_effect.append(power)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=effect_sizes,
    y=powers_by_effect,
    mode='lines',
    line=dict(color='purple', width=3),
    fill='tozeroy',
    fillcolor='rgba(128, 0, 128, 0.2)'
))

# Add reference lines for Cohen's d categories
fig.add_vline(x=0.2, line_dash="dot", line_color="gray", annotation_text="Small (d=0.2)")
fig.add_vline(x=0.5, line_dash="dot", line_color="gray", annotation_text="Medium (d=0.5)")
fig.add_vline(x=0.8, line_dash="dot", line_color="gray", annotation_text="Large (d=0.8)")

fig.add_hline(y=0.80, line_dash="dash", line_color="red",
              annotation_text="80% Power")

fig.update_layout(
    title=f"💪 Power vs Effect Size<br><sub>Sample size n={n}, α={alpha}</sub>",
    xaxis_title="Effect Size (Cohen's d)",
    yaxis_title="Statistical Power",
    height=500,
    template='plotly_white'
)

fig.show()

print("\n💡 Cohen's d Effect Size Guidelines:")
print("   • Small: d = 0.2 (subtle difference)")
print("   • Medium: d = 0.5 (moderate difference)")
print("   • Large: d = 0.8 (obvious difference)")
print(f"\n   With n={n} samples:")
small_idx = np.argmin(np.abs(effect_sizes - 0.2))
med_idx = np.argmin(np.abs(effect_sizes - 0.5))
large_idx = np.argmin(np.abs(effect_sizes - 0.8))
print(f"   • Small effect (d=0.2): Power = {powers_by_effect[small_idx]:.1%}")
print(f"   • Medium effect (d=0.5): Power = {powers_by_effect[med_idx]:.1%}")
print(f"   • Large effect (d=0.8): Power = {powers_by_effect[large_idx]:.1%}")

---

## 📏 Part 5: Confidence Intervals - Beyond p-values

### What is a Confidence Interval?

**Definition**: A range of plausible values for the population parameter

**95% CI = A range that would contain the true parameter in 95% of repeated experiments**

### ❌ Common WRONG Interpretations:

1. ❌ "There's a 95% probability the true value is in this interval"
2. ❌ "95% of the data falls in this interval"

### ✅ CORRECT Interpretation:

"If we repeated this study 100 times, about 95 of the resulting confidence intervals would contain the true population parameter."

### Why CIs are Better Than p-values:

1. **Show magnitude of effect** (not just significance)
2. **Show precision** (width indicates uncertainty)
3. **Give range of plausible values**
4. **More informative than binary yes/no**
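
A confidence interval and a two-sided test carry linked information: a 95% t-interval excludes the H₀ value exactly when the one-sample t-test gives p < 0.05. The sketch below checks this on simulated data; the sample (mean 52, SD 10, n = 30) and the H₀ value of 50 are assumed for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical sample of 30 plant heights (cm)
sample = rng.normal(loc=52.0, scale=10.0, size=30)
null_mean = 50.0

n = sample.size
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)

# 95% CI built from the t critical value (SD is estimated from the sample)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_lower, ci_upper = mean - t_crit * se, mean + t_crit * se

# Two-sided one-sample t-test against the same null value
p_value = stats.ttest_1samp(sample, popmean=null_mean).pvalue

ci_excludes_null = not (ci_lower <= null_mean <= ci_upper)
print(f"Mean = {mean:.2f}, 95% CI = [{ci_lower:.2f}, {ci_upper:.2f}]")
print(f"p-value = {p_value:.4f}")
print(f"CI excludes {null_mean}: {ci_excludes_null}; p < 0.05: {p_value < 0.05}")
```

The last line should always show two matching answers. The interval adds what the p-value cannot: the size of the plausible effect and how precisely it is estimated.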

In [None]:
# Demonstrate confidence intervals
np.random.seed(42)

# True population
true_mean = 50
true_sd = 10
n_samples = 30

# Take 20 different samples and calculate CIs
n_experiments = 20
cis = []
sample_means = []

for i in range(n_experiments):
    sample = np.random.normal(true_mean, true_sd, n_samples)
    mean = np.mean(sample)
    se = np.std(sample, ddof=1) / np.sqrt(n_samples)
    
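    # 95% CI using the t critical value, because the SD is estimated from the sample
    t_crit = sp_stats.t.ppf(0.975, n_samples - 1)
    ci_lower = mean - t_crit * se
    ci_upper = mean + t_crit * se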
    
    contains_true = (ci_lower <= true_mean <= ci_upper)
    
    cis.append((ci_lower, ci_upper, contains_true))
    sample_means.append(mean)

# Plot
fig = go.Figure()

for i, (ci_lower, ci_upper, contains) in enumerate(cis):
    color = 'blue' if contains else 'red'
    
    # CI line
    fig.add_trace(go.Scatter(
        x=[ci_lower, ci_upper],
        y=[i, i],
        mode='lines',
        line=dict(color=color, width=2),
        showlegend=False,
        hovertemplate=f'Experiment {i+1}<br>CI: [{ci_lower:.1f}, {ci_upper:.1f}]<extra></extra>'
    ))
    
    # Sample mean
    fig.add_trace(go.Scatter(
        x=[sample_means[i]],
        y=[i],
        mode='markers',
        marker=dict(color=color, size=10, symbol='circle'),
        showlegend=False
    ))

# True mean line
fig.add_vline(x=true_mean, line_dash="dash", line_color="black", line_width=3,
              annotation_text=f"True Mean = {true_mean}")

n_contain = sum(c[2] for c in cis)

fig.update_layout(
    title=f"📏 {n_experiments} Different Experiments, Each with 95% Confidence Interval<br><sub>{n_contain}/{n_experiments} ({n_contain/n_experiments*100:.0f}%) intervals contain the true mean</sub>",
    xaxis_title="Value",
    yaxis_title="Experiment Number",
    height=600,
    template='plotly_white',
    yaxis=dict(range=[-1, n_experiments])
)

fig.show()

print(f"\n📊 Results:")
print(f"   True population mean: {true_mean}")
print(f"   Number of experiments: {n_experiments}")
print(f"   CIs containing true mean: {n_contain}/{n_experiments} ({n_contain/n_experiments*100:.0f}%)")
print(f"   CIs missing true mean (RED): {n_experiments-n_contain}")
print(f"\n💡 This is what '95% confidence' means:")
print(f"   In the long run, about 95% of CIs will contain the true parameter")
print(f"   Any single CI either contains it (100%) or doesn't (0%) - we just don't know which!")

### CI Width Tells Us About Precision:

In [None]:
# Compare CI width with different sample sizes
true_mean = 50
true_sd = 10
sample_sizes = [10, 30, 100, 300]

fig = go.Figure()

for i, n in enumerate(sample_sizes):
    sample = np.random.normal(true_mean, true_sd, n)
    mean = np.mean(sample)
    se = true_sd / np.sqrt(n)  # Using known SD for comparison
    
    ci_lower = mean - 1.96 * se
    ci_upper = mean + 1.96 * se
    ci_width = ci_upper - ci_lower
    
    # Draw CI
    fig.add_trace(go.Scatter(
        x=[ci_lower, ci_upper],
        y=[i, i],
        mode='lines',
        line=dict(color='blue', width=4),
        name=f'n={n} (width={ci_width:.1f})'
    ))
    
    # Mean point
    fig.add_trace(go.Scatter(
        x=[mean],
        y=[i],
        mode='markers',
        marker=dict(color='red', size=12),
        showlegend=False,
        hovertemplate=f'n={n}<br>Mean={mean:.2f}<br>CI: [{ci_lower:.2f}, {ci_upper:.2f}]<extra></extra>'
    ))

fig.add_vline(x=true_mean, line_dash="dash", line_color="black",
              annotation_text="True Mean")

fig.update_layout(
    title="📏 Confidence Interval Width vs Sample Size<br><sub>Larger n = Narrower CI = More Precision</sub>",
    xaxis_title="Value",
    yaxis=dict(ticktext=[f'n={n}' for n in sample_sizes],
               tickvals=list(range(len(sample_sizes)))),
    height=500,
    template='plotly_white'
)

fig.show()

print("\n💡 Key Insight:")
print("   • Larger sample size → Narrower CI → More precision")
print("   • Narrow CI = We know the true value more precisely")
print("   • Wide CI = More uncertainty about true value")
print("   • CI width decreases in proportion to 1/√n")

---

## 🔢 Part 6: Degrees of Freedom (df) - Intuitive Understanding

### What Are Degrees of Freedom?

**Simple Definition**: The number of values that are free to vary after we impose certain constraints.

### The Ice Cream Analogy:

Imagine you have 5 scoops of ice cream to distribute among 5 friends, and the average must be exactly 1 scoop per person:

- Friend 1: You can choose ANY amount (0, 2, 3, etc.) ✅ FREE
- Friend 2: You can choose ANY amount ✅ FREE
- Friend 3: You can choose ANY amount ✅ FREE
- Friend 4: You can choose ANY amount ✅ FREE
- Friend 5: MUST get whatever's left ❌ NOT FREE

**Degrees of Freedom = 5 - 1 = 4**

Once you know 4 values and the mean, the 5th value is determined!

### Why "n - 1" for Sample Standard Deviation?

When calculating sample SD:
1. We use the sample mean (x̄) - this is a CONSTRAINT
2. Deviations from x̄ always sum to zero, so once we know (n-1) of them, the last one is determined
3. Therefore: df = n - 1
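
The practical consequence can be seen by simulation: dividing the sum of squared deviations by n underestimates the population variance on average, while dividing by n - 1 does not. The sketch below assumes a normal population with variance 4 and many samples of size 5.

```python
import numpy as np

rng = np.random.default_rng(3)

true_var = 4.0
n = 5
samples = rng.normal(0.0, np.sqrt(true_var), size=(50_000, n))

# The constraint: deviations from each sample's own mean sum to zero
deviations = samples - samples.mean(axis=1, keepdims=True)
max_abs_sum = np.abs(deviations.sum(axis=1)).max()

# Average variance estimate across all samples, dividing by n vs. n - 1
var_div_n = samples.var(axis=1, ddof=0).mean()
var_div_n_minus_1 = samples.var(axis=1, ddof=1).mean()

print(f"Largest |sum of deviations|: {max_abs_sum:.2e}")
print(f"True variance:            {true_var:.2f}")
print(f"Average estimate (÷ n):   {var_div_n:.2f}")
print(f"Average estimate (÷ n-1): {var_div_n_minus_1:.2f}")
```

With n = 5 the ÷ n estimate averages about (n - 1)/n = 80% of the true variance, which is the bias that `ddof=1` corrects.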

In [None]:
# Demonstrate degrees of freedom with constraints
n = 5
target_mean = 10
target_sum = n * target_mean  # = 50

print("🍦 The Ice Cream Distribution Example:")
print(f"   You have {target_sum} scoops to distribute among {n} friends")
print(f"   Constraint: Average must be exactly {target_mean} scoops")
print(f"   Total sum must be: {target_sum} scoops\n")

# Choose values for first 4 friends
values = np.array([8, 12, 15, 7])  # First 4 values (chosen freely)
sum_so_far = np.sum(values)
last_value = target_sum - sum_so_far  # 5th value is DETERMINED

print("   Friend 1: 8 scoops ✅ (your choice)")
print("   Friend 2: 12 scoops ✅ (your choice)")
print("   Friend 3: 15 scoops ✅ (your choice)")
print("   Friend 4: 7 scoops ✅ (your choice)")
print(f"   Friend 5: {last_value} scoops ❌ (MUST be this value!)")

all_values = np.append(values, last_value)
actual_mean = np.mean(all_values)

print(f"\n   Check: Mean = {actual_mean} ✅")
print(f"   Degrees of Freedom: n - 1 = {n} - 1 = {n-1}")
print(f"\n💡 Once you know {n-1} values and the mean, the last value is determined!")

### Why Does df Matter?

**Different df → Different t-distributions**

In [None]:
# Show how t-distribution changes with df
x = np.linspace(-4, 4, 1000)
df_values = [1, 3, 10, 30, 100]

fig = go.Figure()

# Standard normal (infinite df)
fig.add_trace(go.Scatter(
    x=x,
    y=sp_stats.norm.pdf(x),
    mode='lines',
    line=dict(color='black', width=3, dash='dash'),
    name='Normal (df=∞)'
))

colors = ['red', 'orange', 'green', 'blue', 'purple']
for df, color in zip(df_values, colors):
    fig.add_trace(go.Scatter(
        x=x,
        y=sp_stats.t.pdf(x, df),
        mode='lines',
        line=dict(color=color, width=2),
        name=f't-distribution (df={df})'
    ))

fig.update_layout(
    title="🔢 t-Distribution vs Degrees of Freedom<br><sub>As df increases, t-distribution approaches Normal</sub>",
    xaxis_title="Value",
    yaxis_title="Probability Density",
    height=500,
    template='plotly_white'
)

fig.show()

print("\n💡 Key Observations:")
print("   • Low df (small n): Fatter tails, more spread out")
print("   • High df (large n): Approaches Normal distribution")
print("   • df=30+: Very close to Normal")
print("   • Lower df → Need larger t-value to be significant")

# Show critical values
print("\n📊 Critical t-values for α=0.05 (two-tailed):")
for df in [5, 10, 30, 100]:
    crit = sp_stats.t.ppf(0.975, df)
    print(f"   df={df:3}: t_crit = {crit:.3f}")
print(f"   Normal:  z_crit = {sp_stats.norm.ppf(0.975):.3f}")

### Common df in Different Tests:

| Test | Degrees of Freedom |
|------|-------------------|
| **One-sample t-test** | df = n - 1 |
| **Independent t-test** (equal variances) | df = n₁ + n₂ - 2 |
| **Paired t-test** | df = n - 1 (n = number of pairs) |
| **Chi-square** (contingency table) | df = (rows - 1) × (columns - 1) |
| **ANOVA** (one-way) | df_between = k - 1, df_within = N - k |
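
These formulas can be checked against what SciPy reports. The sketch below uses a made-up 2 × 3 table of counts (two habitats by three species) and an assumed one-way ANOVA layout of three groups with 24 observations in total.

```python
import numpy as np
from scipy import stats

# Hypothetical counts: 2 habitats (rows) x 3 species (columns)
table = np.array([[12, 18, 30],
                  [20, 15, 25]])

# chi2_contingency returns the statistic, p-value, df, and expected counts
chi2, p, dof, expected = stats.chi2_contingency(table)

rows, cols = table.shape
print(f"Chi-square df from SciPy: {dof}")
print(f"(rows - 1) x (columns - 1) = {(rows - 1) * (cols - 1)}")

# One-way ANOVA with k groups and N observations in total (assumed values)
k, N = 3, 24
df_between, df_within = k - 1, N - k
print(f"ANOVA: df_between = {df_between}, df_within = {df_within}")
```

In each case df is the number of cells or observations left free once the totals or group means have been fixed.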

---

## 📊 Part 7: Effect Size - Statistical vs Biological Significance

### The Problem with p-values Alone:

**p-value depends on sample size!**

With large enough n:
- Even TINY differences become "statistically significant" (p < 0.05)
- But they might be biologically meaningless!

### Example: Height Difference

In [None]:
# Demonstrate: tiny effect, large n → significant p-value
np.random.seed(42)

# Two groups with TINY difference
group1_mean = 170.0  # cm
group2_mean = 170.5  # cm (only 5mm difference!)
sd = 10

# A true d of 0.05 is rarely significant at n=1000 per group, so extend the range
sample_sizes = [10, 100, 1000, 10000, 100000]
results = []

for n in sample_sizes:
    group1 = np.random.normal(group1_mean, sd, n)
    group2 = np.random.normal(group2_mean, sd, n)
    
    t_stat, p_val = sp_stats.ttest_ind(group1, group2)
    
    # Calculate Cohen's d (equal group sizes, so the pooled variance is a simple average)
    pooled_sd = np.sqrt((np.var(group1, ddof=1) + np.var(group2, ddof=1)) / 2)
    cohens_d = (np.mean(group2) - np.mean(group1)) / pooled_sd
    
    results.append({
        'n': n,
        'p_value': p_val,
        'cohens_d': cohens_d,
        'significant': 'Yes*' if p_val < 0.05 else 'No'
    })

df_results = pd.DataFrame(results)

print("🎯 Same True Effect, Different Sample Sizes:\n")
print(f"   True difference: {group2_mean - group1_mean} cm (only 5mm!)\n")
print("Sample Size | p-value  | Cohen's d | Significant?")
print("------------|----------|-----------|-------------")
for _, row in df_results.iterrows():
    print(f"  n={row['n']:6}   | {row['p_value']:8.4f} |   {row['cohens_d']:6.3f}  |    {row['significant']}")

# Report what this run actually found instead of hard-coding the outcome
true_d = (group2_mean - group1_mean) / sd
significant_n = df_results.loc[df_results['p_value'] < 0.05, 'n'].tolist()
print("\n💡 Key Lesson:")
print(f"   • Sample sizes with p<0.05 in this run: {significant_n}")
print("   • But the actual difference is only 5mm (0.3% of height)")
print(f"   • True Cohen's d = {true_d:.2f} (trivial effect size)")
print("   • Statistically significant ≠ Biologically important!")

# Visualize
fig = make_subplots(rows=1, cols=2, subplot_titles=('p-value vs Sample Size', "Cohen's d (Effect Size)"))

fig.add_trace(go.Scatter(
    x=df_results['n'],
    y=df_results['p_value'],
    mode='lines+markers',
    marker=dict(size=10, color='blue'),
    name='p-value'
), row=1, col=1)

fig.add_hline(y=0.05, line_dash="dash", line_color="red", row=1, col=1,
              annotation_text="α=0.05")

fig.add_trace(go.Scatter(
    x=df_results['n'],
    y=df_results['cohens_d'],
    mode='lines+markers',
    marker=dict(size=10, color='green'),
    name="Cohen's d"
), row=1, col=2)

fig.add_hline(y=0.2, line_dash="dot", line_color="gray", row=1, col=2,
              annotation_text="Small effect")

fig.update_xaxes(title_text="Sample Size", type="log", row=1, col=1)
fig.update_xaxes(title_text="Sample Size", type="log", row=1, col=2)
fig.update_yaxes(title_text="p-value", row=1, col=1)
fig.update_yaxes(title_text="Cohen's d", row=1, col=2)

fig.update_layout(height=400, template='plotly_white', showlegend=False)
fig.show()

### Cohen's d Effect Size Guidelines:

| Cohen's d | Interpretation | Example |
|-----------|----------------|----------|
| d < 0.2 | **Trivial** | Barely noticeable |
| d = 0.2 | **Small** | Subtle but detectable |
| d = 0.5 | **Medium** | Clearly noticeable |
| d = 0.8 | **Large** | Obvious difference |
| d > 1.2 | **Very Large** | Dramatic difference |
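
A small helper can compute Cohen's d and attach a label from the table. This is a sketch under two assumptions: the function names are hypothetical, and each listed value (0.2, 0.5, 0.8, 1.2) is treated as the lower edge of its band, which the table itself does not spell out. The two data sets are invented for illustration.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

def label_effect(d):
    """Map |d| to a guideline label (band edges are an assumption of this sketch)."""
    d = abs(d)
    if d < 0.2:
        return "trivial"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    if d <= 1.2:
        return "large"
    return "very large"

# Invented example data: plant heights (cm) in two plots
control = [10, 12, 11, 13, 14]
treated = [12, 14, 13, 15, 16]

d = cohens_d(control, treated)
print(f"Cohen's d = {d:.2f} ({label_effect(d)})")
```

The labels are rules of thumb; whether a given d matters still depends on the biology of the system.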

### The Right Approach:

**Always report ALL THREE:**
1. ✅ **Statistical significance** (p-value)
2. ✅ **Effect size** (Cohen's d, confidence interval)
3. ✅ **Biological significance** (does it matter in practice?)

---

## 🎓 Summary: The Complete Picture

### What We've Learned:

#### 1. **p-value**
✅ = Probability of data this extreme IF H₀ is true  
❌ ≠ Probability that H₀ is true  
❌ ≠ Proof that H₁ is true  

#### 2. **Significance Level (α)**
- Chosen BEFORE data collection
- Convention: α = 0.05
- = Acceptable Type I error rate

#### 3. **Type I Error (α)**
- False positive
- Rejecting true H₀
- "Crying wolf"

#### 4. **Type II Error (β)**
- False negative
- Failing to reject false H₀
- "Missing the signal"

#### 5. **Statistical Power (1-β)**
- Probability of detecting real effects
- Aim for ≥ 80%
- Increases with: larger n, larger effect, higher α

#### 6. **Confidence Intervals**
- Range of plausible values
- Shows magnitude + precision
- More informative than p-values alone

#### 7. **Degrees of Freedom**
- Number of independent values
- = n - (number of constraints)
- Affects critical values

#### 8. **Effect Size**
- Magnitude of difference
- Independent of sample size
- Cohen's d: 0.2=small, 0.5=medium, 0.8=large

### The Golden Rules:

1. ✅ **Set α before collecting data**
2. ✅ **Do power analysis before study**
3. ✅ **Report effect sizes AND p-values**
4. ✅ **Report confidence intervals**
5. ✅ **Consider biological significance**
6. ❌ **Never p-hack** (don't try multiple tests until significant)
7. ❌ **Never cherry-pick** (report all tests, not just significant ones)
8. ❌ **Never confuse** statistical with biological significance
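
The cost of trying many tests can be worked out directly. Assuming k independent tests in which H₀ is true every time, the chance of at least one p < α is 1 - (1 - α)^k. The function name below is hypothetical, and real analyses are rarely fully independent, so treat the numbers as a rough guide.

```python
def prob_any_false_positive(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests when H0 is always true."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20):
    print(f"{k:2d} tests: {prob_any_false_positive(k):.1%} chance of at least one 'significant' result")
```

With 20 tests the chance of a spurious "discovery" is well over half, which is why running tests until one is significant undermines the meaning of α.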

---

## 📚 Recommended Next Steps:

1. Review Notebook 05 (Hypothesis Testing Applications)
2. Practice with real datasets
3. Always calculate effect sizes
4. Always report confidence intervals
5. Think critically about biological meaning

---

<div align="center">

**Made with 💚 by The Pattern Hunter Team**

**🎉 You now understand hypothesis testing at a deep level! 🎉**

[🏠 Repository](https://github.com/The-Pattern-Hunter/interactive-ecology-biometry) | 
[📓 Previous: Hypothesis Testing](05_hypothesis_testing.ipynb) | 
[🩺 Unit 4 Home](../../)

</div>