# The Shape of Uncertainty and Hypothesis Testing

**An Interactive Exploration**

*Developed for students at Kuchinda College, Sambalpur University*

---

## Core Question

**How do we decide if an observation is "unusual" or "expected"?**

The answer lies in understanding the **shape of uncertainty**: the probability distribution that describes what we expect to see under different scenarios.

In this notebook, you'll:
1. See how different distributions create different "shapes of uncertainty"
2. Understand how these shapes define what's "rare" vs "common"
3. Explore how changing parameters affects hypothesis testing decisions
4. Apply these concepts to real biological scenarios

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from ipywidgets import interact, FloatSlider, IntSlider, Dropdown, fixed
import warnings
warnings.filterwarnings('ignore')

# Set style for better visualization
plt.style.use('seaborn-v0_8-darkgrid')
np.random.seed(42)

## Part 1: The Shape of Uncertainty - Coin Flipping Example

Let's start with the simplest case: flipping a coin multiple times.

**Question:** If we flip a coin 10 times, how many heads do we expect?

The answer depends on whether the coin is fair or biased. The **shape of uncertainty** shows us all possible outcomes and their probabilities.
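To make this concrete, here is a minimal sketch that lists the shape for 10 flips of a fair coin. It uses Python's standard-library `math.comb` instead of `stats.binom.pmf`, so each probability is visibly "number of arrangements × p^k × (1 - p)^(n - k)". The helper name `binom_pmf` is our own and is introduced only for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10
shape = [binom_pmf(k, n, 0.5) for k in range(n + 1)]

# Print the distribution as a crude text histogram
for k, prob in enumerate(shape):
    print(f"{k:2d} heads  {prob:.4f}  {'#' * round(prob * 100)}")

# One-tail probability: 7 or more heads
p_at_least_7 = sum(shape[7:])
print(f"P(X >= 7) = {p_at_least_7:.4f}")
```

The bars peak at 5 heads and thin out symmetrically. Because the fair-coin shape is symmetric, doubling this one-tail probability (176/1024) gives the two-sided p-value for 7 heads, 0.34375.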

In [None]:
def plot_binomial_hypothesis(n_flips=10, true_p=0.5, observed_heads=7, null_p=0.5):
    """
    Visualize how the shape of uncertainty helps in hypothesis testing.
    
    Parameters:
    - n_flips: number of coin flips
    - true_p: true probability of heads (what nature actually is)
    - observed_heads: what we actually observed
    - null_p: probability under null hypothesis (usually 0.5 for fair coin)
    """
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Possible outcomes (an observation cannot exceed the number of flips)
    observed_heads = min(observed_heads, n_flips)
    x = np.arange(0, n_flips + 1)
    
    # Left plot: Distribution under NULL hypothesis
    null_probs = stats.binom.pmf(x, n_flips, null_p)
    axes[0].bar(x, null_probs, alpha=0.7, color='steelblue', edgecolor='black')
    
    # Calculate p-value (two-tailed, exact binomial test)
    p_value = stats.binomtest(observed_heads, n_flips, null_p, alternative='two-sided').pvalue
    
    # Shade the rejection region (alpha = 0.05): every outcome whose own exact
    # two-sided p-value is below 0.05, so the shading always agrees with the decision
    for i in x:
        if stats.binomtest(int(i), n_flips, null_p).pvalue < 0.05:
            axes[0].bar(i, null_probs[i], alpha=0.3, color='orange', edgecolor='black')
    
    # Highlight the observed value (drawn last so it stays on top)
    axes[0].bar(observed_heads, stats.binom.pmf(observed_heads, n_flips, null_p), 
                color='red', alpha=0.8, edgecolor='black', linewidth=2)
    
    axes[0].set_xlabel('Number of Heads', fontsize=12)
    axes[0].set_ylabel('Probability', fontsize=12)
    axes[0].set_title(f'Shape of Uncertainty under NULL (p={null_p})\nObserved: {observed_heads} heads', 
                     fontsize=13, fontweight='bold')
    axes[0].set_xticks(x)
    axes[0].axvline(observed_heads, color='red', linestyle='--', linewidth=2, alpha=0.7)
    
    # Add text annotation
    decision = "REJECT NULL" if p_value < 0.05 else "FAIL TO REJECT NULL"
    color = 'red' if p_value < 0.05 else 'green'
    axes[0].text(0.5, 0.95, f'p-value = {p_value:.4f}\n{decision}', 
                transform=axes[0].transAxes, fontsize=11, verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor=color, alpha=0.3))
    
    # Right plot: Distribution under TRUE parameter (what nature actually is)
    true_probs = stats.binom.pmf(x, n_flips, true_p)
    axes[1].bar(x, true_probs, alpha=0.7, color='green', edgecolor='black')
    axes[1].bar(observed_heads, stats.binom.pmf(observed_heads, n_flips, true_p), 
                color='red', alpha=0.8, edgecolor='black', linewidth=2)
    
    axes[1].set_xlabel('Number of Heads', fontsize=12)
    axes[1].set_ylabel('Probability', fontsize=12)
    axes[1].set_title(f'Shape of Uncertainty under TRUE (p={true_p:.2f})\nObserved: {observed_heads} heads', 
                     fontsize=13, fontweight='bold')
    axes[1].set_xticks(x)
    axes[1].axvline(observed_heads, color='red', linestyle='--', linewidth=2, alpha=0.7)
    
    # Calculate probability of observing this under true distribution
    prob_under_true = stats.binom.pmf(observed_heads, n_flips, true_p)
    axes[1].text(0.5, 0.95, f'P(observe {observed_heads}|true p={true_p:.2f}) = {prob_under_true:.4f}', 
                transform=axes[1].transAxes, fontsize=11, verticalalignment='top',
                bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5))
    
    plt.tight_layout()
    plt.show()
    
    # Print interpretation
    print("\n" + "="*80)
    print("INTERPRETATION")
    print("="*80)
    print(f"\nYou flipped a coin {n_flips} times and got {observed_heads} heads.\n")
    print(f"NULL HYPOTHESIS: The coin is fair (p = {null_p})")
    print(f"ALTERNATIVE: The coin is NOT fair (p ≠ {null_p})\n")
    print(f"Under the null hypothesis, the probability of a result at least as extreme as {observed_heads} heads (in either direction) is {p_value:.4f}")
    print(f"\nDecision (α = 0.05): {decision}\n")
    
    if p_value < 0.05:
        print(f"✗ The observed result ({observed_heads} heads) falls in the REJECTION REGION (shaded orange).")
        print(f"  This outcome is too rare under the null hypothesis to be explained by chance alone.")
        print(f"  We have evidence that the coin is NOT fair.\n")
    else:
        print(f"✓ The observed result ({observed_heads} heads) is CONSISTENT with the null hypothesis.")
        print(f"  This outcome is reasonably common under a fair coin.")
        print(f"  We don't have sufficient evidence to say the coin is biased.\n")
    
    print(f"The TRUE probability is p = {true_p:.2f} (but in real experiments, we don't know this!)")
    print(f"Under the true distribution, observing {observed_heads} heads has probability {prob_under_true:.4f}")
    print("="*80)

### Interactive Exploration: Play with the parameters!

**Instructions:**
- `n_flips`: How many times to flip the coin
- `true_p`: The ACTUAL probability of heads (what nature really is, but unknown to us). It changes only the right-hand plot; the test decision depends only on the observed data.
- `observed_heads`: What we actually observed in our experiment (keep it no larger than `n_flips`)
- `null_p`: What we assume under the null hypothesis (fixed at 0.5, a fair coin, in this widget)

**Try these scenarios:**
1. Set true_p = 0.5 (fair coin), observed_heads = 5. See how "expected" it looks.
2. Set true_p = 0.5, observed_heads = 9. See how it becomes "unusual".
3. Set true_p = 0.7 (biased coin), observed_heads = 7. Even though the coin IS biased, 7 heads in 10 flips is not enough evidence to detect it!
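Scenario 3 is easier to understand once you list which head counts can reject the null at all. The sketch below uses only the standard library; `binom_pmf`, `two_sided_p` and `rejection_region` are illustrative helpers, not part of the notebook. It sums the probabilities of every outcome that is no more likely than the observed one, which is, as far as we understand, the two-sided rule scipy's `binomtest` applies. For 10 flips only 0, 1, 9 or 10 heads reject at α = 0.05, so 7 heads never can, whatever `true_p` is.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value: add up every outcome that is no more probable
    than the observed one (the small tolerance guards against round-off)."""
    observed = binom_pmf(k, n, p)
    return sum(binom_pmf(i, n, p) for i in range(n + 1)
               if binom_pmf(i, n, p) <= observed * (1 + 1e-7))

def rejection_region(n, p=0.5, alpha=0.05):
    """Head counts that would lead us to reject H0: P(heads) = p."""
    return [k for k in range(n + 1) if two_sided_p(k, n, p) < alpha]

for n in (10, 20, 50):
    print(n, "flips -> reject if heads in", rejection_region(n))
```

As `n` grows, the non-rejection band around n/2 gets relatively narrower, which is why more flips make a biased coin easier to detect.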

In [None]:
interact(plot_binomial_hypothesis,
         n_flips=IntSlider(min=5, max=50, step=5, value=10, description='n_flips:'),
         true_p=FloatSlider(min=0.1, max=0.9, step=0.1, value=0.5, description='true_p:'),
         observed_heads=IntSlider(min=0, max=50, step=1, value=7, description='observed:'),
         null_p=fixed(0.5));

## Part 2: Comparing Two Groups - The t-test

Now let's move to a more realistic biological scenario:

**Scenario:** You're studying earthworm body weight at two sites:
- **Site A:** Clean agricultural land (control)
- **Site B:** Near coal mining area (potentially contaminated)

**Question:** Is there a significant difference in body weight?

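The **shape of uncertainty** here is the sampling distribution of the t-statistic: the difference in sample means divided by its standard error. If the two sites truly do not differ (and the data are roughly normal with equal variances), this statistic follows a t-distribution with n₁ + n₂ - 2 degrees of freedom.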
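Here is a minimal sketch of where that t-statistic comes from. It computes the statistic by hand and checks it against `stats.ttest_ind`, which assumes equal variances by default. The weights are simulated values rather than field measurements, and the names `site_a` and `site_b` are our own.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
site_a = rng.normal(5.0, 1.0, size=20)   # simulated clean-site weights (g)
site_b = rng.normal(5.5, 1.0, size=20)   # simulated mining-site weights (g)

n1, n2 = len(site_a), len(site_b)
diff = site_b.mean() - site_a.mean()

# Pooled variance: sample variances (ddof=1) weighted by their degrees of freedom
pooled_var = ((n1 - 1) * site_a.var(ddof=1) + (n2 - 1) * site_b.var(ddof=1)) / (n1 + n2 - 2)
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = diff / standard_error
df = n1 + n2 - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)   # area in both tails of t(df)

t_scipy, p_scipy = stats.ttest_ind(site_b, site_a)
print(f"by hand: t = {t_manual:.3f}, p = {p_manual:.4f}")
print(f"scipy:   t = {t_scipy:.3f}, p = {p_scipy:.4f}")
```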

In [None]:
def plot_ttest_hypothesis(n_per_group=20, mean_diff=0.5, std_dev=1.0, alpha=0.05):
    """
    Visualize t-test hypothesis testing with the shape of uncertainty.
    
    Parameters:
    - n_per_group: sample size per group
    - mean_diff: true difference in means (Site B - Site A)
    - std_dev: standard deviation (assumed equal for both groups)
    - alpha: significance level
    """
    
    # Simulate data (fixed seed: the same settings always reproduce the same simulated study)
    np.random.seed(42)
    control = np.random.normal(loc=5.0, scale=std_dev, size=n_per_group)  # Site A (clean)
    treatment = np.random.normal(loc=5.0 + mean_diff, scale=std_dev, size=n_per_group)  # Site B (contaminated)
    
    # Perform t-test
    t_stat, p_value = stats.ttest_ind(treatment, control)
    
    # Calculate degrees of freedom
    df = 2 * n_per_group - 2
    
    # Critical value
    t_critical = stats.t.ppf(1 - alpha/2, df)
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Plot 1: Raw data
    axes[0, 0].scatter(np.ones(n_per_group) + np.random.normal(0, 0.05, n_per_group), 
                      control, alpha=0.6, s=100, label='Site A (Control)', color='blue')
    axes[0, 0].scatter(2 * np.ones(n_per_group) + np.random.normal(0, 0.05, n_per_group), 
                      treatment, alpha=0.6, s=100, label='Site B (Mining)', color='orange')
    
    axes[0, 0].hlines(control.mean(), 0.7, 1.3, colors='blue', linewidth=3, label=f'Mean A: {control.mean():.2f}')
    axes[0, 0].hlines(treatment.mean(), 1.7, 2.3, colors='orange', linewidth=3, label=f'Mean B: {treatment.mean():.2f}')
    
    axes[0, 0].set_xlim(0.5, 2.5)
    axes[0, 0].set_xticks([1, 2])
    axes[0, 0].set_xticklabels(['Site A\n(Clean)', 'Site B\n(Mining)'])
    axes[0, 0].set_ylabel('Earthworm Body Weight (g)', fontsize=11)
    axes[0, 0].set_title('Raw Data: Earthworm Weights at Two Sites', fontsize=12, fontweight='bold')
    axes[0, 0].legend(loc='upper left')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Plot 2: Distribution of each group
    axes[0, 1].hist(control, bins=15, alpha=0.5, label='Site A', color='blue', edgecolor='black')
    axes[0, 1].hist(treatment, bins=15, alpha=0.5, label='Site B', color='orange', edgecolor='black')
    axes[0, 1].axvline(control.mean(), color='blue', linestyle='--', linewidth=2)
    axes[0, 1].axvline(treatment.mean(), color='orange', linestyle='--', linewidth=2)
    axes[0, 1].set_xlabel('Body Weight (g)', fontsize=11)
    axes[0, 1].set_ylabel('Frequency', fontsize=11)
    axes[0, 1].set_title('Distribution of Weights\n(The variability within each group)', fontsize=12, fontweight='bold')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)
    
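    # Plot 3: Sampling distribution under NULL (difference = 0)
    # Widen the axis when the observed t lies beyond ±4 so the curve is not squeezed
    x_limit = max(4, abs(t_stat) + 1)
    x_range = np.linspace(-x_limit, x_limit, 1000)
    null_dist = stats.t.pdf(x_range, df)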
    
    axes[1, 0].plot(x_range, null_dist, 'b-', linewidth=2, label='Null Distribution')
    axes[1, 0].fill_between(x_range, null_dist, where=(x_range < -t_critical), alpha=0.3, color='red', label='Rejection Region')
    axes[1, 0].fill_between(x_range, null_dist, where=(x_range > t_critical), alpha=0.3, color='red')
    axes[1, 0].axvline(t_stat, color='darkred', linestyle='--', linewidth=2.5, label=f'Observed t = {t_stat:.2f}')
    axes[1, 0].axvline(-t_critical, color='red', linestyle=':', linewidth=1.5, alpha=0.7)
    axes[1, 0].axvline(t_critical, color='red', linestyle=':', linewidth=1.5, alpha=0.7, label=f'Critical t = ±{t_critical:.2f}')
    
    axes[1, 0].set_xlabel('t-statistic', fontsize=11)
    axes[1, 0].set_ylabel('Probability Density', fontsize=11)
    axes[1, 0].set_title(f'Shape of Uncertainty under NULL\n(No difference between sites)', fontsize=12, fontweight='bold')
    axes[1, 0].legend(loc='upper right', fontsize=9)
    axes[1, 0].grid(True, alpha=0.3)
    
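    # Plot 4: Summary panel with effect size (Cohen's d) and the test decision
    observed_diff = treatment.mean() - control.mean()
    # Pooled SD from sample variances (ddof=1); a simple average works because n1 = n2
    pooled_std = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    cohen_d = observed_diff / pooled_std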
    
    axes[1, 1].axis('off')
    
    decision = "REJECT NULL" if p_value < alpha else "FAIL TO REJECT NULL"
    color = 'red' if p_value < alpha else 'green'
    
    summary_text = f"""
    HYPOTHESIS TEST SUMMARY
    {'='*50}
    
    Sample sizes: n₁ = n₂ = {n_per_group}
    
    Site A mean: {control.mean():.3f} g
    Site B mean: {treatment.mean():.3f} g
    Observed difference: {observed_diff:.3f} g
    
    True SD used in simulation: {std_dev:.2f} g (equal in both groups)
    Pooled SD: {pooled_std:.3f} g
    
    t-statistic: {t_stat:.3f}
    Degrees of freedom: {df}
    p-value: {p_value:.4f}
    
    Significance level α: {alpha:.2f}
    Critical t: ±{t_critical:.3f}
    
    Cohen's d (effect size): {cohen_d:.3f}
    
    DECISION: {decision}
    
    {'='*50}
    
    INTERPRETATION:
    """
    
    if p_value < alpha:
        summary_text += f"""
    ✗ The observed t-statistic ({t_stat:.2f}) falls in the
      REJECTION REGION (|t| > {t_critical:.2f}).
      
    ✗ p-value ({p_value:.4f}) < α ({alpha:.2f})
    
    ✗ CONCLUSION: There IS a statistically significant
      difference in earthworm body weights between
      the clean site and mining site.
      
    This is consistent with mining contamination
    affecting earthworm body weight, although other
    differences between the sites could contribute.
        """
    else:
        summary_text += f"""
    ✓ The observed t-statistic ({t_stat:.2f}) falls within
      the NON-REJECTION REGION (|t| ≤ {t_critical:.2f}).
      
    ✓ p-value ({p_value:.4f}) ≥ α ({alpha:.2f})
    
    ✓ CONCLUSION: We found NO statistically significant
      difference in earthworm body weights between
      the clean site and mining site.
      
    The observed difference could be due to random
    sampling variation; a small real effect may
    still have gone undetected.
        """
    
    axes[1, 1].text(0.1, 0.5, summary_text, fontsize=10, family='monospace',
                   verticalalignment='center',
                   bbox=dict(boxstyle='round', facecolor=color, alpha=0.2))
    
    plt.tight_layout()
    plt.show()

### Interactive Exploration: Earthworm Body Weight Study

**Instructions:**
- `n_per_group`: Number of earthworms sampled from each site
- `mean_diff`: TRUE difference in mean weights (Site B - Site A), unknown to us in a real study!
- `std_dev`: Biological variability (how much weights vary within each site)
- `alpha`: Significance level (usually 0.05)

**Explore these scenarios:**
1. **No effect:** mean_diff = 0, n_per_group = 20. With no true effect we should usually fail to reject. (The random seed is fixed, so each setting shows one simulated study; Activity 3 below repeats the experiment many times.)
2. **Small effect:** mean_diff = 0.3, n_per_group = 20. Might not detect it!
3. **Increase sample size:** mean_diff = 0.3, n_per_group = 50. Better chance of detection!
4. **High variability:** mean_diff = 0.5, std_dev = 2.0. Harder to detect even real effects.
5. **Large effect:** mean_diff = 1.5, n_per_group = 20. Easy to detect!

In [None]:
interact(plot_ttest_hypothesis,
         n_per_group=IntSlider(min=10, max=100, step=10, value=20, description='n per group:'),
         mean_diff=FloatSlider(min=0, max=2.0, step=0.1, value=0.5, description='true diff:'),
         std_dev=FloatSlider(min=0.5, max=3.0, step=0.5, value=1.0, description='std dev:'),
         alpha=FloatSlider(min=0.01, max=0.10, step=0.01, value=0.05, description='alpha:'));

## Part 3: The Shape Changes Everything - Different Distributions

Different types of data follow different probability distributions. The **shape** determines:
- What's considered "rare" vs "common"
- Where rejection regions fall
- The power to detect effects

Let's compare three common distributions.
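Before plotting, here is a quick numeric comparison of where each shape puts its α = 0.05 cutoff. This sketch uses only `scipy.stats` functions the notebook already imports. The `actual_tail` column shows how much of a t-distribution lies beyond ±1.96, that is, how often you would wrongly reject if you applied the normal cutoff to a small-sample t statistic.

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)          # about 1.96
print(f"Normal: two-tailed critical value = ±{z_crit:.3f}")

t_crits, actual_tails = [], []
for df in (2, 5, 10, 30):
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Share of t(df) beyond ±1.96: the real false-alarm rate if the
    # normal cutoff were (wrongly) used for a t statistic
    actual_tail = 2 * stats.t.sf(z_crit, df)
    # The chi-square panel shades only the upper tail, so all of alpha goes there
    chi_crit = stats.chi2.ppf(1 - alpha, df)
    t_crits.append(t_crit)
    actual_tails.append(actual_tail)
    print(f"df={df:2d}: t critical = ±{t_crit:.3f}, "
          f"P(|t| > {z_crit:.2f}) = {actual_tail:.3f}, "
          f"chi-square critical = {chi_crit:.2f}")
```

For example, with df = 5 more than 10% of the t-distribution lies beyond ±1.96, which is over double the nominal 5%.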

In [None]:
def compare_distributions(df_param=5):
    """
    Compare Normal, t-distribution, and Chi-square distributions.
    Shows how different shapes affect what we consider "extreme".
    """
    
    fig, axes = plt.subplots(1, 3, figsize=(16, 5))
    
    x = np.linspace(-4, 4, 1000)
    
    # Normal distribution
    normal = stats.norm.pdf(x, 0, 1)
    axes[0].plot(x, normal, 'b-', linewidth=2, label='Normal(0,1)')
    axes[0].fill_between(x, normal, where=(np.abs(x) > 1.96), alpha=0.3, color='red', 
                         label='Rejection region\n(α=0.05, two-tailed)')
    axes[0].axvline(-1.96, color='red', linestyle='--', linewidth=1.5, alpha=0.7)
    axes[0].axvline(1.96, color='red', linestyle='--', linewidth=1.5, alpha=0.7)
    axes[0].set_xlabel('Value', fontsize=11)
    axes[0].set_ylabel('Probability Density', fontsize=11)
    axes[0].set_title('Normal Distribution\n(Large samples or known variance)', fontsize=12, fontweight='bold')
    axes[0].legend(fontsize=8)
    axes[0].grid(True, alpha=0.3)
    # Annotation in the empty upper-left corner of the plot
    axes[0].text(0.02, 0.97, '95% of the probability\nlies within ±1.96 SD', ha='left', va='top',
                 transform=axes[0].transAxes,
                 bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.5))
    
    # t-distribution
    t_dist = stats.t.pdf(x, df_param)
    normal_compare = stats.norm.pdf(x, 0, 1)
    t_critical = stats.t.ppf(0.975, df_param)
    
    axes[1].plot(x, t_dist, 'g-', linewidth=2, label=f't-distribution (df={df_param})')
    axes[1].plot(x, normal_compare, 'b--', linewidth=1.5, alpha=0.5, label='Normal (for comparison)')
    axes[1].fill_between(x, t_dist, where=(np.abs(x) > t_critical), alpha=0.3, color='red',
                        label=f'Rejection region\n(α=0.05, critical t=±{t_critical:.2f})')
    axes[1].axvline(-t_critical, color='red', linestyle='--', linewidth=1.5, alpha=0.7)
    axes[1].axvline(t_critical, color='red', linestyle='--', linewidth=1.5, alpha=0.7)
    axes[1].set_xlabel('Value', fontsize=11)
    axes[1].set_ylabel('Probability Density', fontsize=11)
    axes[1].set_title('t-Distribution\n(Small samples, unknown variance)', fontsize=12, fontweight='bold')
    axes[1].legend(fontsize=8)
    axes[1].grid(True, alpha=0.3)
    # Annotation in the empty upper-left corner of the plot
    axes[1].text(0.02, 0.97, 'Fatter tails→\nmore extreme values\nthan Normal', ha='left', va='top',
                 transform=axes[1].transAxes,
                 bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5))
    
    # Chi-square distribution (x-range grows with df so the critical value stays visible)
    x_chi = np.linspace(0, max(20, stats.chi2.ppf(0.999, df_param)), 1000)
    chi_sq = stats.chi2.pdf(x_chi, df_param)
    chi_critical = stats.chi2.ppf(0.95, df_param)
    
    axes[2].plot(x_chi, chi_sq, 'orange', linewidth=2, label=f'χ² (df={df_param})')
    axes[2].fill_between(x_chi, chi_sq, where=(x_chi > chi_critical), alpha=0.3, color='red',
                        label=f'Rejection region\n(α=0.05, critical χ²={chi_critical:.2f})')
    axes[2].axvline(chi_critical, color='red', linestyle='--', linewidth=1.5, alpha=0.7)
    axes[2].set_xlabel('Value', fontsize=11)
    axes[2].set_ylabel('Probability Density', fontsize=11)
    axes[2].set_title('Chi-Square Distribution\n(Variance tests, goodness-of-fit)', fontsize=12, fontweight='bold')
    axes[2].legend(loc='center right', fontsize=8)
    axes[2].grid(True, alpha=0.3)
    # Annotation in the upper-right corner, where the density is low for every df
    axes[2].text(0.97, 0.97, 'Skewed, only positive→\nused for variance,\ncategorical data',
                 ha='right', va='top', transform=axes[2].transAxes,
                 bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.5))
    
    plt.tight_layout()
    plt.show()
    
    print("\n" + "="*80)
    print("KEY INSIGHTS: How Shape Affects Hypothesis Testing")
    print("="*80)
    print("\n1. NORMAL DISTRIBUTION:")
    print("   - Symmetric, bell-shaped")
    print("   - Extreme values (beyond ±1.96 SD) are rare: 5% in total, 2.5% in each tail")
    print("   - Used when: Large samples, continuous data, symmetric variation\n")
    
    print("2. t-DISTRIBUTION:")
    print("   - Symmetric but FATTER TAILS than normal")
    print(f"   - With df={df_param}, critical value is ±{t_critical:.2f} (vs ±1.96 for normal)")
    print("   - Extreme values are MORE COMMON than in normal distribution")
    print("   - Used when: Small samples, estimating population variance from sample\n")
    
    print("3. CHI-SQUARE DISTRIBUTION:")
    print("   - SKEWED (not symmetric), only positive values")
    print(f"   - With df={df_param}, critical value (95th percentile) is {chi_critical:.2f}")
    print("   - Shape changes dramatically with degrees of freedom")
    print("   - Used when: Testing variances, categorical data, goodness-of-fit\n")
    
    print("BOTTOM LINE:")
    print("The SHAPE determines what counts as 'rare' or 'extreme'.")
    print("Using the wrong shape can give misleading p-values and wrong conclusions!")
    print("="*80)

### Interactive Comparison: How Does Sample Size (df) Affect the Shape?

In [None]:
interact(compare_distributions,
         df_param=IntSlider(min=2, max=30, step=1, value=5, 
                           description='Degrees of Freedom:'));

## Part 4: Real Data Example - Gene Expression in Contaminated Sites

Let's apply everything to a realistic genomics scenario.

**Scenario:** You measured stress gene expression in earthworms from mining vs. control sites.
Gene expression data is often **log-normally distributed** (skewed), not normal!

This shows why understanding the shape is crucial.
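Here is a minimal sketch of why the log scale helps, using simulated values. The sample size, the seed and the exact 2-fold effect are illustrative assumptions. `sample_skewness` is a small helper defined here: it is near 0 for symmetric data and positive for data with a long right tail.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness: about 0 if symmetric, > 0 for a long right tail."""
    centered = x - x.mean()
    return np.mean(centered**3) / np.std(x) ** 3

rng = np.random.default_rng(0)
control = rng.lognormal(mean=np.log(100), sigma=0.5, size=5000)  # simulated expression
treated = control * 2.0                                           # an exact 2-fold change

print(f"skewness raw:  {sample_skewness(control):.2f}")
print(f"skewness log2: {sample_skewness(np.log2(control)):.2f}")

# A multiplicative 2x effect becomes an additive shift of 1 on the log2 scale
shift = np.log2(treated) - np.log2(control)
print(f"log2 shift: min {shift.min():.3f}, max {shift.max():.3f}")
```

The raw values should show clear right skew, while the log₂ values should be close to symmetric. The 2-fold change also turns into a constant shift of 1, the kind of additive difference a t-test compares.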

In [None]:
def gene_expression_example(n_sample=30, fold_change=2.0, cv=0.5, transform='log'):
    """
    Simulate gene expression data and show effect of transformation.
    
    Parameters:
    - n_sample: sample size per group
    - fold_change: multiplicative effect of contamination
    - cv: biological noise, used as the log-scale SD (close to the coefficient of variation when small)
    - transform: 'none' or 'log' transformation
    """
    
    np.random.seed(42)
    
    # Generate log-normal data (realistic for gene expression)
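    baseline_mean = 100  # baseline expression level (the median of the log-normal)
    
    # `cv` is passed as sigma, the SD of log(expression); for small values it
    # is close to the coefficient of variation on the raw scale
    control = np.random.lognormal(mean=np.log(baseline_mean), sigma=cv, size=n_sample)
    treatment = np.random.lognormal(mean=np.log(baseline_mean * fold_change), sigma=cv, size=n_sample)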
    
    fig, axes = plt.subplots(2, 3, figsize=(16, 10))
    
    if transform == 'log':
        control_transformed = np.log2(control)
        treatment_transformed = np.log2(treatment)
        ylabel = 'log₂(Expression)'
    else:
        control_transformed = control
        treatment_transformed = treatment
        ylabel = 'Raw Expression'
    
    # Row 1: Original scale
    # Plot 1: Raw data points
    axes[0, 0].scatter(np.ones(n_sample) + np.random.normal(0, 0.05, n_sample), 
                      control, alpha=0.6, s=80, label='Control', color='blue')
    axes[0, 0].scatter(2 * np.ones(n_sample) + np.random.normal(0, 0.05, n_sample), 
                      treatment, alpha=0.6, s=80, label='Mining Site', color='red')
    axes[0, 0].hlines(control.mean(), 0.7, 1.3, colors='blue', linewidth=3)
    axes[0, 0].hlines(treatment.mean(), 1.7, 2.3, colors='red', linewidth=3)
    axes[0, 0].set_xlim(0.5, 2.5)
    axes[0, 0].set_xticks([1, 2])
    axes[0, 0].set_xticklabels(['Control', 'Mining'])
    axes[0, 0].set_ylabel('Raw Expression Level', fontsize=11)
    axes[0, 0].set_title('Original Data\n(Log-normal distribution)', fontsize=12, fontweight='bold')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # Plot 2: Histogram of raw data
    axes[0, 1].hist(control, bins=20, alpha=0.5, color='blue', edgecolor='black', label='Control')
    axes[0, 1].hist(treatment, bins=20, alpha=0.5, color='red', edgecolor='black', label='Mining')
    axes[0, 1].axvline(control.mean(), color='blue', linestyle='--', linewidth=2)
    axes[0, 1].axvline(treatment.mean(), color='red', linestyle='--', linewidth=2)
    axes[0, 1].set_xlabel('Raw Expression', fontsize=11)
    axes[0, 1].set_ylabel('Frequency', fontsize=11)
    axes[0, 1].set_title('Distribution Shape\n(SKEWED - violates t-test assumption)', 
                        fontsize=12, fontweight='bold')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)
    
    # Plot 3: t-test on raw data
    t_raw, p_raw = stats.ttest_ind(treatment, control)
    axes[0, 2].axis('off')
    result_text_raw = f"""
    T-TEST ON RAW DATA
    {'='*40}
    
    Control mean: {control.mean():.2f}
    Mining mean: {treatment.mean():.2f}
    
    t-statistic: {t_raw:.3f}
    p-value: {p_raw:.4f}
    
    Decision: {'REJECT' if p_raw < 0.05 else 'FAIL TO REJECT'}
    
    ⚠️  WARNING: Data is skewed!
    t-test assumes normality.
    This p-value may be unreliable.
    
    {'='*40}
    """
    axes[0, 2].text(0.1, 0.5, result_text_raw, fontsize=10, family='monospace',
                   verticalalignment='center',
                   bbox=dict(boxstyle='round', facecolor='orange', alpha=0.3))
    
    # Row 2: After log transformation
    # Plot 4: Transformed data
    axes[1, 0].scatter(np.ones(n_sample) + np.random.normal(0, 0.05, n_sample), 
                      control_transformed, alpha=0.6, s=80, label='Control', color='blue')
    axes[1, 0].scatter(2 * np.ones(n_sample) + np.random.normal(0, 0.05, n_sample), 
                      treatment_transformed, alpha=0.6, s=80, label='Mining Site', color='red')
    axes[1, 0].hlines(control_transformed.mean(), 0.7, 1.3, colors='blue', linewidth=3)
    axes[1, 0].hlines(treatment_transformed.mean(), 1.7, 2.3, colors='red', linewidth=3)
    axes[1, 0].set_xlim(0.5, 2.5)
    axes[1, 0].set_xticks([1, 2])
    axes[1, 0].set_xticklabels(['Control', 'Mining'])
    axes[1, 0].set_ylabel(ylabel, fontsize=11)
    axes[1, 0].set_title('After LOG₂ Transform\n(More symmetric)' if transform == 'log'
                         else 'No Transform Applied\n(Still skewed)', fontsize=12, fontweight='bold')
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)
    
    # Plot 5: Histogram of transformed data
    axes[1, 1].hist(control_transformed, bins=20, alpha=0.5, color='blue', 
                   edgecolor='black', label='Control')
    axes[1, 1].hist(treatment_transformed, bins=20, alpha=0.5, color='red', 
                   edgecolor='black', label='Mining')
    axes[1, 1].axvline(control_transformed.mean(), color='blue', linestyle='--', linewidth=2)
    axes[1, 1].axvline(treatment_transformed.mean(), color='red', linestyle='--', linewidth=2)
    axes[1, 1].set_xlabel(ylabel, fontsize=11)
    axes[1, 1].set_ylabel('Frequency', fontsize=11)
    axes[1, 1].set_title('Distribution Shape\n(More NORMAL - better for t-test)' if transform == 'log'
                         else 'Distribution Shape\n(Still SKEWED)', fontsize=12, fontweight='bold')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
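    # Plot 6: t-test on the (possibly) transformed data
    t_trans, p_trans = stats.ttest_ind(treatment_transformed, control_transformed)
    axes[1, 2].axis('off')
    if transform == 'log':
        # A difference of log2 means back-transforms to a ratio of geometric means
        fold_est = 2 ** (treatment_transformed.mean() - control_transformed.mean())
        notes = "✓ Data is more symmetric\n    ✓ t-test assumptions better met\n    ✓ p-value is more reliable"
    else:
        # No transform: report the ratio of arithmetic means instead
        fold_est = treatment.mean() / control.mean()
        notes = "⚠️  No transform applied: data are still\n    skewed, so this p-value may be unreliable"
    result_text_trans = f"""
    T-TEST ON {'LOG₂-TRANSFORMED' if transform == 'log' else 'UNTRANSFORMED'} DATA
    {'='*40}
    
    Control mean: {control_transformed.mean():.3f}
    Mining mean: {treatment_transformed.mean():.3f}
    
    t-statistic: {t_trans:.3f}
    p-value: {p_trans:.4f}
    
    Decision: {'REJECT' if p_trans < 0.05 else 'FAIL TO REJECT'}
    
    {notes}
    
    Estimated fold-change: {fold_est:.2f}x
    
    {'='*40}
    """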
    color_trans = 'lightgreen' if p_trans < 0.05 else 'lightyellow'
    axes[1, 2].text(0.1, 0.5, result_text_trans, fontsize=10, family='monospace',
                   verticalalignment='center',
                   bbox=dict(boxstyle='round', facecolor=color_trans, alpha=0.5))
    
    plt.tight_layout()
    plt.show()
    
    print("\n" + "="*80)
    print("LESSON: The Shape of Your Data Matters!")
    print("="*80)
    print(f"\nRaw data p-value: {p_raw:.4f}")
    print(f"Transformed data p-value: {p_trans:.4f}")
    print(f"\nDifference in p-values: {abs(p_raw - p_trans):.4f}")
    print("\nWhy transformation helps:")
    print("1. Gene expression data is multiplicative (fold-changes)")
    print("2. Log transformation converts multiplication to addition")
    print("3. Makes the distribution more symmetric (more normal)")
    print("4. t-test assumptions are better satisfied")
    print("5. Statistical inference becomes more reliable")
    print("\nThis is why bioinformaticians almost always log-transform expression data!")
    print("="*80)

### Interactive Example: Gene Expression Analysis

**Try different scenarios:**
- Increase `fold_change` to see stronger effects
- Increase `cv` (coefficient of variation) to add more biological noise
- Toggle between 'none' and 'log' transformation to see the difference
- Increase sample size to improve power

In [None]:
interact(gene_expression_example,
         n_sample=IntSlider(min=10, max=100, step=10, value=30, description='Sample size:'),
         fold_change=FloatSlider(min=1.0, max=5.0, step=0.5, value=2.0, description='Fold change:'),
         cv=FloatSlider(min=0.2, max=1.0, step=0.1, value=0.5, description='Variability:'),
         transform=Dropdown(options=['none', 'log'], value='log', description='Transform:'));

## Summary: Key Takeaways

### The Shape of Uncertainty is Essential for Hypothesis Testing Because:

1. **Defines "Rare" vs "Common"**
   - The distribution tells us what outcomes to expect under the null hypothesis
   - Without knowing the shape, we can't judge if our observation is unusual

2. **Determines Critical Values**
   - Different distributions have different tails
   - The shape determines where we draw the line between "fail to reject" and "reject"

3. **Affects Statistical Power**
   - Narrow distributions (low variance) → easier to detect effects
   - Wide distributions (high variance) → need larger samples

4. **Reveals Assumption Violations**
   - If your data shape doesn't match the assumed distribution, p-values can be misleading
   - This is why we check assumptions (normality, equal variance, etc.)

5. **Guides Appropriate Transformations**
   - Log-normal data → log transform
   - Count data → discrete distributions such as the Poisson or negative binomial
   - Proportions → logit transform
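For the proportions item, here is a minimal sketch of the logit (log-odds) transform. The clipping constant `eps` is our own assumption to keep exact 0s and 1s finite. How to handle those boundary values in your own data deserves separate thought.

```python
import numpy as np

def logit(p, eps=1e-6):
    """Log-odds transform; values are clipped into [eps, 1 - eps] so 0 and 1 stay finite."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))

# Hypothetical proportions, e.g. the fraction of earthworms surviving in each plot
proportions = np.array([0.0, 0.05, 0.25, 0.5, 0.75, 0.95, 1.0])
print(np.round(logit(proportions), 3))
```

A proportion of 0.5 maps to 0, while 0.05 and 0.95 map to values of equal size and opposite sign. The squeezed ends of the 0-to-1 scale are stretched out.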

### Practical Implications for Your Research:

- **Always visualize your data first** - look at the shape!
- **Check assumptions** - is the distribution appropriate?
- **Consider transformations** - especially for gene expression, enzyme activity, or count data
- **Sample size matters** - especially with high variability
- **Effect size matters** - small effects need larger samples to detect

---

### Questions for Reflection:

1. Why might a significant p-value with raw data become non-significant after transformation (or vice versa)?
2. In the coin-flipping example, why does increasing the number of flips make a biased coin easier to detect?
3. For your earthworm research, what kind of transformations might be appropriate for different measurements (weight, length, enzyme activity, gene expression)?
4. How would you explain to a non-statistician why we need to check data distribution before running a t-test?

---

*This notebook was designed for students at Kuchinda College, Department of Zoology.*
*For questions or suggestions, contact your instructor.*

## Extension Activities

### Activity 1: Design Your Own Study
Use the interactive widgets above to design a study for detecting earthworm weight differences:
- What sample size do you need to detect a 0.5g difference?
- How does biological variability affect your power?
- What if the effect is only 0.2g - is it detectable?
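For the first question, a quick back-of-the-envelope answer comes from the usual normal-approximation formula for comparing two means: n per group = 2 × ((z₁₋α/₂ + z_power) × σ / Δ)², rounded up. The sketch below uses only the standard library, and `approx_n_per_group` is a hypothetical helper. It assumes a two-sided test at α = 0.05, 80% power and equal SDs. Because the formula ignores the extra width of the t-distribution, the exact t-test answer is usually slightly larger, so treat these numbers as a rough lower bound to check against the widget.

```python
from math import ceil
from statistics import NormalDist

def approx_n_per_group(mean_diff, std_dev, power=0.80, alpha=0.05):
    """Normal-approximation sample size per group for a two-sided comparison of two means."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)             # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_power) * std_dev / mean_diff) ** 2)

for diff in (0.5, 0.2):
    for sd in (1.0, 2.0):
        print(f"difference {diff} g, SD {sd} g -> about {approx_n_per_group(diff, sd)} worms per site")
```

With Δ = 0.5 g and σ = 1 g this gives 63 worms per site. Doubling the SD roughly quadruples the required sample size.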

### Activity 2: Real Data Challenge
Bring your own data from lab measurements:
- Plot histograms to assess the shape
- Identify if transformations are needed
- Run appropriate tests
- Interpret results in biological context

### Activity 3: Type I and Type II Errors
Modify the code to:
- Simulate 1000 experiments with no true effect (mean_diff=0)
- Count how many times you reject the null (should be ~5%)
- Simulate 1000 experiments with a real effect
- Count how many times you correctly detect it (statistical power)
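Here is one possible sketch of Activity 3. It assumes the same simulation settings as Part 2: a 5.0 g baseline, equal SDs and 20 worms per site. Instead of looping, it simulates all 1000 experiments at once as rows of a matrix and calls `stats.ttest_ind(..., axis=1)`. `rejection_rate` is a helper name introduced here.

```python
import numpy as np
from scipy import stats

def rejection_rate(mean_diff, n_per_group=20, std_dev=1.0, n_experiments=1000,
                   alpha=0.05, seed=0):
    """Fraction of simulated two-site studies in which the t-test rejects H0."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment
    site_a = rng.normal(5.0, std_dev, size=(n_experiments, n_per_group))
    site_b = rng.normal(5.0 + mean_diff, std_dev, size=(n_experiments, n_per_group))
    _, p_values = stats.ttest_ind(site_b, site_a, axis=1)
    return np.mean(p_values < alpha)

type_1 = rejection_rate(mean_diff=0.0)
power = rejection_rate(mean_diff=0.5)
print(f"Type I error rate (no true effect): {type_1:.3f}")
print(f"Power (true difference 0.5 g):      {power:.3f}")
```

With no true effect, the rejection rate estimates the Type I error rate and should land near α. With a real 0.5 g difference, it estimates power, and 1 - power is the Type II error rate. Try other values of `mean_diff`, `n_per_group` and `std_dev` to connect these numbers to the scenarios in Part 2.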