# Hypothesis Testing Guide

A comprehensive exploration of statistical inference and hypothesis testing using the RustLab ecosystem. This notebook demonstrates parametric and non-parametric tests, A/B testing workflows, and statistical decision-making.

## Learning Objectives

- **Statistical Inference**: Understanding p-values, significance levels, and test statistics
- **Parametric Tests**: t-tests for means comparison and their assumptions
- **Non-Parametric Tests**: Distribution-free alternatives for robust analysis
- **A/B Testing**: Complete workflow for business experimentation
- **Power Analysis**: Sample size determination and effect size calculation

## Mathematical Foundation

Hypothesis testing framework:

- **Null Hypothesis (H₀)**: No effect or difference exists
- **Alternative Hypothesis (H₁)**: An effect or difference exists
- **Test Statistic**: Quantifies the evidence against H₀
- **P-value**: Probability, assuming H₀ is true, of observing a test statistic at least as extreme as the one actually observed
- **Significance Level (α)**: Threshold for rejecting H₀ (typically 0.05)
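
The framework above can be sketched in a few lines of standard-library Rust. This is a minimal illustration, not part of RustLab: it assumes the test statistic follows a standard normal distribution under the null hypothesis (a large-sample approximation; the t-tests later in this notebook use the t distribution instead), and the names `erf_approx`, `two_sided_p_from_z`, and `reject_null` are hypothetical helpers introduced only for this example.

```rust
/// Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation
/// (absolute error on the order of 1e-7). Hypothetical helper for illustration.
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t
        * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

/// Two-sided p-value for a z statistic: P(|Z| >= |z|) when Z is standard
/// normal under the null hypothesis (an assumption of this sketch).
fn two_sided_p_from_z(z: f64) -> f64 {
    1.0 - erf_approx(z.abs() / std::f64::consts::SQRT_2)
}

/// Decision rule: reject the null hypothesis when the p-value is below alpha.
fn reject_null(p_value: f64, alpha: f64) -> bool {
    p_value < alpha
}
```

Calling `two_sided_p_from_z(1.96)` gives a value close to 0.05, which is why 1.96 is the familiar two-sided cutoff at the 5% significance level under the normal approximation.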

In [2]:
// 📦 Setup: Dependencies and Imports
:dep rustlab-stats = { path = ".." }
:dep rustlab-math = { path = "../../rustlab-math" }
:dep rustlab-plotting = { path = "../../rustlab-plotting" }

// Global imports - these persist across all cells
use rustlab_stats::prelude::*;
use rustlab_math::*;
use rustlab_plotting::*;

// Test that everything is working
{
    let test_sample = vec64![23.1, 24.2, 22.8, 23.9, 24.1];
    let test_mean = test_sample.mean();
    let test_std = test_sample.std(None);
    
    let setup_msg = format!("🎯 Setup complete! Test sample mean: {:.2}, std: {:.2}", test_mean, test_std);
    println!("{}", setup_msg);
    println!("📊 Ready for hypothesis testing and statistical inference");
}

🎯 Setup complete! Test sample mean: 23.62, std: 0.63
📊 Ready for hypothesis testing and statistical inference


()

## 1. One-Sample t-Test: Testing Against a Known Value

Testing whether a sample mean differs significantly from a hypothesized population value.

In [7]:
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    // Quality control scenario: testing if parts meet specification
    let part_measurements = vec64![23.1, 24.2, 22.8, 23.9, 24.1, 23.5, 22.9, 24.3, 23.8, 23.2];
    let specification_target = 24.0;  // Target dimension in mm
    let alpha = 0.05;  // Significance level
    
    println!("🔬 One-Sample t-Test: Quality Control Analysis");
    println!("{}", "=".repeat(50));
    
    // Calculate sample statistics
    let sample_mean = part_measurements.mean();
    let sample_std = part_measurements.std(None);
    let n = part_measurements.len() as f64;
    
    // Perform a two-sided one-sample t-test against the specification target
    let t_result = part_measurements.ttest_1samp(specification_target, Alternative::TwoSided);
    
    println!("📊 Sample Statistics:");
    let stats_n = format!("   Sample size (n): {}", n as usize);
    println!("{}", stats_n);
    let stats_mean = format!("   Sample mean: {:.3} mm", sample_mean);
    println!("{}", stats_mean);
    let stats_std = format!("   Sample std dev: {:.3} mm", sample_std);
    println!("{}", stats_std);
    let stats_target = format!("   Target value: {:.1} mm", specification_target);
    println!("{}", stats_target);
    
    println!();
    println!("🧪 Hypothesis Test Results:");
    let hyp_h0 = format!("   H₀: μ = {:.1} (parts meet specification)", specification_target);
    println!("{}", hyp_h0);
    let hyp_h1 = format!("   H₁: μ ≠ {:.1} (parts don't meet specification)", specification_target);
    println!("{}", hyp_h1);
    let hyp_t = format!("   t-statistic: {:.3}", t_result.statistic);
    println!("{}", hyp_t);
    let hyp_df = format!("   Degrees of freedom: {}", n as usize - 1);
    println!("{}", hyp_df);
    let hyp_p = format!("   p-value: {:.6}", t_result.p_value);
    println!("{}", hyp_p);
    
    // Statistical decision
    let decision = if t_result.p_value < alpha {
        "Reject H₀"
    } else {
        "Fail to reject H₀"
    };
    
    println!();
    println!("⚖️ Statistical Decision (α = {}):", alpha);
    let decision_msg = format!("   Decision: {}", decision);
    println!("{}", decision_msg);
    
    let interpretation = if t_result.p_value < alpha {
        "Parts do NOT meet specification (significant difference)"
    } else {
        "Parts meet specification (no significant difference)"
    };
    let interp_msg = format!("   Interpretation: {}", interpretation);
    println!("{}", interp_msg);
    
    // Effect size (Cohen's d)
    let cohens_d = (sample_mean - specification_target) / sample_std;
    let effect_size_interpretation = match cohens_d.abs() {
        d if d < 0.2 => "negligible",
        d if d < 0.5 => "small",
        d if d < 0.8 => "medium",
        _ => "large",
    };
    
    println!();
    println!("📈 Effect Size Analysis:");
    let effect_d = format!("   Cohen's d: {:.3}", cohens_d);
    println!("{}", effect_d);
    let effect_size = format!("   Effect size: {} effect", effect_size_interpretation);
    println!("{}", effect_size);
    
    println!();
    println!("💡 Key Insight: Effect size provides practical significance beyond statistical significance!");


🔬 One-Sample t-Test: Quality Control Analysis
📊 Sample Statistics:
   Sample size (n): 10
   Sample mean: 23.580 mm
   Sample std dev: 0.555 mm
   Target value: 24.0 mm

🧪 Hypothesis Test Results:
   H₀: μ = 24.0 (parts meet specification)
   H₁: μ ≠ 24.0 (parts don't meet specification)
   t-statistic: -2.391
   Degrees of freedom: 9
   p-value: 0.083937

⚖️ Statistical Decision (α = 0.05):
   Decision: Fail to reject H₀
   Interpretation: Parts meet specification (no significant difference)

📈 Effect Size Analysis:
   Cohen's d: -0.756
   Effect size: medium effect

💡 Key Insight: Effect size provides practical significance beyond statistical significance!
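
The t-statistic and Cohen's d reported above can be cross-checked by hand with the textbook formulas t = (x̄ − μ₀) / (s / √n) and d = (x̄ − μ₀) / s. The sketch below uses only the standard library; the function names are hypothetical and it makes no claim about how `ttest_1samp` is implemented internally. It assumes the sample standard deviation uses the n − 1 denominator, which matches the printed value of 0.555.

```rust
/// Sample mean.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample standard deviation with Bessel's correction (divides by n - 1).
fn sample_std(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let sum_sq: f64 = xs.iter().map(|x| (x - m).powi(2)).sum();
    (sum_sq / (xs.len() as f64 - 1.0)).sqrt()
}

/// One-sample t statistic: (mean - mu0) / (s / sqrt(n)).
fn one_sample_t(xs: &[f64], mu0: f64) -> f64 {
    let n = xs.len() as f64;
    (mean(xs) - mu0) / (sample_std(xs) / n.sqrt())
}

/// Cohen's d for a single sample: (mean - mu0) / s.
fn cohens_d_one_sample(xs: &[f64], mu0: f64) -> f64 {
    (mean(xs) - mu0) / sample_std(xs)
}
```

Applied to the ten part measurements with a target of 24.0, `one_sample_t` reproduces the t-statistic of about -2.391 and `cohens_d_one_sample` the effect size of about -0.756. The two-sided 5% critical value of the t distribution with 9 degrees of freedom is about 2.262, so a statistic of this size would ordinarily fall in the rejection region; the printed p-value of 0.0839 is therefore worth confirming against an independent implementation before relying on the decision.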


## 2. Two-Sample t-Test: Comparing Two Groups

Comparing means between two independent groups (e.g., treatment vs control).

In [8]:
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    // A/B Testing scenario: website conversion rates
    let control_group = vec64![0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.10, 0.15, 0.12, 0.13, 0.14, 0.11];
    let treatment_group = vec64![0.16, 0.18, 0.15, 0.19, 0.17, 0.20, 0.14, 0.18, 0.16, 0.17, 0.19, 0.15];
    let alpha = 0.05;
    
    println!("🔬 Two-Sample t-Test: A/B Testing Analysis");
    println!("{}", "=".repeat(45));
    
    // Calculate group statistics
    let control_mean = control_group.mean();
    let control_std = control_group.std(None);
    let control_n = control_group.len();
    
    let treatment_mean = treatment_group.mean();
    let treatment_std = treatment_group.std(None);
    let treatment_n = treatment_group.len();
    
    println!("📊 Group Statistics:");
    println!("   Control Group:");
    let ctrl_stats = format!("     n = {}, mean = {:.3}, std = {:.3}", control_n, control_mean, control_std);
    println!("{}", ctrl_stats);
    
    println!("   Treatment Group:");
    let treat_stats = format!("     n = {}, mean = {:.3}, std = {:.3}", treatment_n, treatment_mean, treatment_std);
    println!("{}", treat_stats);
    
    // Perform a two-sided independent two-sample t-test
    let t_result = control_group.ttest_ind(&treatment_group, Alternative::TwoSided);
    
    println!();
    println!("🧪 Hypothesis Test Results:");
    println!("   H₀: μ₁ = μ₂ (no difference between groups)");
    println!("   H₁: μ₁ ≠ μ₂ (significant difference exists)");
    let t_stat = format!("   t-statistic: {:.3}", t_result.statistic);
    println!("{}", t_stat);
    let df = control_n + treatment_n - 2;
    let df_msg = format!("   Degrees of freedom: {}", df);
    println!("{}", df_msg);
    let p_val = format!("   p-value: {:.6}", t_result.p_value);
    println!("{}", p_val);
    
    // Statistical decision
    let decision = if t_result.p_value < alpha {
        "Reject H₀"
    } else {
        "Fail to reject H₀"
    };
    
    println!();
    println!("⚖️ Statistical Decision (α = {}):", alpha);
    let decision_msg = format!("   Decision: {}", decision);
    println!("{}", decision_msg);
    
    let interpretation = if t_result.p_value < alpha {
        "Treatment significantly outperforms control"
    } else {
        "No significant difference between groups"
    };
    let interp_msg = format!("   Interpretation: {}", interpretation);
    println!("{}", interp_msg);
    
    // Effect size and practical significance
    let pooled_std = {
        let n1 = control_n as f64;
        let n2 = treatment_n as f64;
        let var1 = control_std.powi(2);
        let var2 = treatment_std.powi(2);
        (((n1 - 1.0) * var1 + (n2 - 1.0) * var2) / (n1 + n2 - 2.0)).sqrt()
    };
    
    let cohens_d = (treatment_mean - control_mean) / pooled_std;
    let improvement_pct = ((treatment_mean - control_mean) / control_mean) * 100.0;
    
    println!();
    println!("📈 Effect Size and Business Impact:");
    let effect_d = format!("   Cohen's d: {:.3}", cohens_d);
    println!("{}", effect_d);
    let improvement = format!("   Relative improvement: {:.1}%", improvement_pct);
    println!("{}", improvement);
    let abs_diff = format!("   Absolute difference: {:.4}", treatment_mean - control_mean);
    println!("{}", abs_diff);
    
    // Business recommendation
    println!();
    println!("💼 Business Recommendation:");
    if t_result.p_value < alpha && improvement_pct > 5.0 {
        println!("   ✅ Implement treatment - statistically and practically significant");
    } else if t_result.p_value < alpha {
        println!("   ⚠️ Statistically significant but small practical impact");
    } else {
        println!("   ❌ Continue testing - no significant improvement detected");
    }


🔬 Two-Sample t-Test: A/B Testing Analysis
📊 Group Statistics:
   Control Group:
     n = 12, mean = 0.130, std = 0.019
   Treatment Group:
     n = 12, mean = 0.170, std = 0.019

🧪 Hypothesis Test Results:
   H₀: μ₁ = μ₂ (no difference between groups)
   H₁: μ₁ ≠ μ₂ (significant difference exists)
   t-statistic: -5.272
   Degrees of freedom: 22
   p-value: 0.000183

⚖️ Statistical Decision (α = 0.05):
   Decision: Reject H₀
   Interpretation: Treatment significantly outperforms control

📈 Effect Size and Business Impact:
   Cohen's d: 2.152
   Relative improvement: 30.8%
   Absolute difference: 0.0400

💼 Business Recommendation:
   ✅ Implement treatment - statistically and practically significant


()
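
As a cross-check on the two-sample result, the pooled-variance t statistic can be computed directly: the pooled variance is a degrees-of-freedom-weighted average of the two sample variances, and the standard error of the mean difference is √(s²ₚ (1/n₁ + 1/n₂)). The sketch below is standard-library Rust with hypothetical function names; it assumes, based on the printed 22 degrees of freedom, that the notebook's test is the classic equal-variance version, and it does not reflect how `ttest_ind` is actually implemented.

```rust
/// Sample mean and unbiased sample variance (n - 1 denominator).
fn mean_and_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let m = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (n - 1.0);
    (m, var)
}

/// Pooled-variance (Student) two-sample t statistic and its degrees of freedom.
/// The sign follows (mean of `a`) - (mean of `b`).
fn pooled_t(a: &[f64], b: &[f64]) -> (f64, f64) {
    let (n1, n2) = (a.len() as f64, b.len() as f64);
    let (m1, v1) = mean_and_var(a);
    let (m2, v2) = mean_and_var(b);
    let df = n1 + n2 - 2.0;
    // Weighted average of the two variances, weights n1 - 1 and n2 - 1.
    let pooled_var = ((n1 - 1.0) * v1 + (n2 - 1.0) * v2) / df;
    let std_err = (pooled_var * (1.0 / n1 + 1.0 / n2)).sqrt();
    ((m1 - m2) / std_err, df)
}
```

With the control group passed first and the treatment group second, `pooled_t` returns a statistic of about -5.272 with 22 degrees of freedom, matching the output above. The negative sign only reflects the argument order: the control mean is lower than the treatment mean.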

## 3. Non-Parametric Testing: Mann-Whitney U Test

A distribution-free alternative for when the t-test's normality assumption is doubtful, for example with heavily skewed data or extreme outliers.

In [9]:
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    // Highly skewed data: response times (milliseconds)
    let old_system = vec64![120.0, 130.0, 125.0, 340.0, 135.0, 128.0, 890.0, 145.0, 132.0, 127.0];
    let new_system = vec64![95.0, 102.0, 98.0, 88.0, 105.0, 92.0, 110.0, 89.0, 96.0, 101.0];
    let alpha = 0.05;
    
    println!("🔬 Non-Parametric Testing: Mann-Whitney U Test");
    println!("{}", "=".repeat(50));
    
    // Calculate descriptive statistics (including robust measures)
    let old_mean = old_system.mean();
    let old_median = old_system.median();
    let old_std = old_system.std(None);
    let old_mad = old_system.mad();
    
    let new_mean = new_system.mean();
    let new_median = new_system.median();
    let new_std = new_system.std(None);
    let new_mad = new_system.mad();
    
    println!("📊 Descriptive Statistics:");
    println!("   Old System (with outliers):");
    let old_classical = format!("     Mean: {:.1} ms, Std: {:.1} ms (affected by outliers)", old_mean, old_std);
    println!("{}", old_classical);
    let old_robust = format!("     Median: {:.1} ms, MAD: {:.1} ms (robust measures)", old_median, old_mad);
    println!("{}", old_robust);
    
    println!("   New System:");
    let new_classical = format!("     Mean: {:.1} ms, Std: {:.1} ms", new_mean, new_std);
    println!("{}", new_classical);
    let new_robust = format!("     Median: {:.1} ms, MAD: {:.1} ms", new_median, new_mad);
    println!("{}", new_robust);
    
    // Check for outliers using IQR method
    let old_q1 = old_system.quantile(0.25, None);
    let old_q3 = old_system.quantile(0.75, None);
    let old_iqr = old_q3 - old_q1;
    let old_outlier_threshold = old_q3 + 1.5 * old_iqr;
    
    let outliers_count = old_system.iter().filter(|&&x| x > old_outlier_threshold).count();
    
    println!();
    println!("🎯 Why Non-Parametric Testing?");
    let outlier_msg = format!("   Old system has {} outliers (> {:.1} ms)", outliers_count, old_outlier_threshold);
    println!("{}", outlier_msg);
    println!("   Data is heavily right-skewed (mean >> median)");
    println!("   t-test assumptions violated → Use Mann-Whitney U");
    
    // Perform a two-sided Mann-Whitney U test
    let u_result = old_system.mannwhitneyu(&new_system, Alternative::TwoSided);
    
    println!();
    println!("🧪 Mann-Whitney U Test Results:");
    println!("   H₀: Distributions are identical (no location shift)");
    println!("   H₁: New system has different response times");
    let u_stat = format!("   U-statistic: {:.1}", u_result.statistic);
    println!("{}", u_stat);
    let u_p = format!("   p-value: {:.6}", u_result.p_value);
    println!("{}", u_p);
    
    // Statistical decision
    let decision = if u_result.p_value < alpha {
        "Reject H₀"
    } else {
        "Fail to reject H₀"
    };
    
    println!();
    println!("⚖️ Statistical Decision (α = {}):", alpha);
    let decision_msg = format!("   Decision: {}", decision);
    println!("{}", decision_msg);
    
    let interpretation = if u_result.p_value < alpha {
        "New system significantly different (robust to outliers)"
    } else {
        "No significant difference in performance"
    };
    let interp_msg = format!("   Interpretation: {}", interpretation);
    println!("{}", interp_msg);
    
    // Effect size for non-parametric test (rank-biserial correlation)
    let n1 = old_system.len() as f64;
    let n2 = new_system.len() as f64;
    let r = 1.0 - (2.0 * u_result.statistic) / (n1 * n2);
    
    println!();
    println!("📈 Non-Parametric Effect Size:");
    let effect_r = format!("   Rank-biserial correlation (r): {:.3}", r);
    println!("{}", effect_r);
    let median_diff = format!("   Median difference: {:.1} ms", old_median - new_median);
    println!("{}", median_diff);
    let improvement_pct = ((old_median - new_median) / old_median) * 100.0;
    let improvement_msg = format!("   Performance improvement: {:.1}%", improvement_pct);
    println!("{}", improvement_msg);
    
    println!();
    println!("💡 Key Insight: Non-parametric tests are robust to outliers and distribution shape!");


🔬 Non-Parametric Testing: Mann-Whitney U Test
📊 Descriptive Statistics:
   Old System (with outliers):
     Mean: 227.2 ms, Std: 242.1 ms (affected by outliers)
     Median: 131.0 ms, MAD: 5.0 ms (robust measures)
   New System:
     Mean: 97.6 ms, Std: 7.0 ms
     Median: 97.0 ms, MAD: 5.0 ms

🎯 Why Non-Parametric Testing?
   Old system has 2 outliers (> 165.4 ms)
   Data is heavily right-skewed (mean >> median)
   t-test assumptions violated → Use Mann-Whitney U

🧪 Mann-Whitney U Test Results:
   H₀: Distributions are identical (no location shift)
   H₁: New system has different response times
   U-statistic: 0.0
   p-value: 0.000000

⚖️ Statistical Decision (α = 0.05):
   Decision: Reject H₀
   Interpretation: New system significantly different (robust to outliers)

📈 Non-Parametric Effect Size:
   Rank-biserial correlation (r): 1.000
   Median difference: 34.0 ms
   Performance improvement: 26.0%

💡 Key Insight: Non-parametric tests are robust to outliers and distribution shape!
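
The U statistic has a direct counting interpretation: for samples A and B, U_A is the number of (a, b) pairs in which a exceeds b, with ties counted as one half, and U_A + U_B = n₁n₂. The sketch below uses that definition with hypothetical function names and only the standard library. The reported U of 0.0 is consistent with a convention that returns the smaller of the two U values; that is an inference from the output, not a confirmed property of `mannwhitneyu`.

```rust
/// U statistic for `a` versus `b`: the number of pairs (x, y) with x > y,
/// counting each tie as one half.
fn u_statistic(a: &[f64], b: &[f64]) -> f64 {
    let mut u = 0.0;
    for &x in a {
        for &y in b {
            if x > y {
                u += 1.0;
            } else if x == y {
                u += 0.5;
            }
        }
    }
    u
}

/// Rank-biserial correlation computed from the smaller of the two U values:
/// r = 1 - 2 * U_min / (n1 * n2). The result lies in [0, 1].
fn rank_biserial(a: &[f64], b: &[f64]) -> f64 {
    let n1n2 = (a.len() * b.len()) as f64;
    let u_a = u_statistic(a, b);
    let u_min = u_a.min(n1n2 - u_a);
    1.0 - 2.0 * u_min / n1n2
}
```

For the response-time data, every old-system value (minimum 120 ms) exceeds every new-system value (maximum 110 ms), so `u_statistic(old, new)` is 100, `u_statistic(new, old)` is 0, and the rank-biserial correlation is 1.0: complete separation of the two samples. The quadratic pair count is fine for small samples; rank-based formulas are the usual choice for larger ones.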

## 4. Summary and Best Practices

Key takeaways for conducting robust hypothesis tests and making sound statistical decisions.
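
The workflow in the next cell switches to Welch's t-test when the variance ratio falls outside the range 0.5 to 2.0. Welch's version does not pool the variances: the standard error is √(s₁²/n₁ + s₂²/n₂), and the degrees of freedom come from the Welch-Satterthwaite approximation, so they are generally not a whole number. The sketch below shows both quantities with standard-library Rust; the function names are hypothetical and it makes no claim about the internals of `ttest_welch`.

```rust
/// Sample mean and unbiased sample variance (n - 1 denominator).
fn mean_and_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let m = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (n - 1.0);
    (m, var)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
/// The sign follows (mean of `a`) - (mean of `b`).
fn welch_t(a: &[f64], b: &[f64]) -> (f64, f64) {
    let (n1, n2) = (a.len() as f64, b.len() as f64);
    let (m1, v1) = mean_and_var(a);
    let (m2, v2) = mean_and_var(b);
    // Squared standard error contributed by each group.
    let (se1_sq, se2_sq) = (v1 / n1, v2 / n2);
    let t = (m1 - m2) / (se1_sq + se2_sq).sqrt();
    let df = (se1_sq + se2_sq).powi(2)
        / (se1_sq.powi(2) / (n1 - 1.0) + se2_sq.powi(2) / (n2 - 1.0));
    (t, df)
}
```

For the two demonstration groups used below, `welch_t` gives a statistic of about -4.610, and the approximate degrees of freedom land between 11 and 12 rather than the 14 a pooled test would use. Fewer effective degrees of freedom is the price paid for not assuming equal variances.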

In [10]:
    use rustlab_stats::prelude::*;
    use rustlab_math::*;
    use rustlab_plotting::*;
    
    println!("🎯 Hypothesis Testing: Summary and Best Practices");
    println!("{}", "=".repeat(55));
    
    // Demonstration dataset for best practices
    let demo_data_a = vec64![12.1, 11.8, 12.5, 13.2, 12.0, 11.9, 12.8, 12.3];
    let demo_data_b = vec64![13.1, 12.9, 13.4, 13.8, 13.2, 13.0, 13.5, 13.1];
    
    println!("📋 Complete Hypothesis Testing Workflow:");
    println!("{}", "-".repeat(40));
    
    // Step 1: Data exploration
    println!("1. 📊 Data Exploration and Assumption Checking:");
    let mean_a = demo_data_a.mean();
    let mean_b = demo_data_b.mean();
    let std_a = demo_data_a.std(None);
    let std_b = demo_data_b.std(None);
    
    let explore_a = format!("   Group A: n = {}, mean = {:.2}, std = {:.2}", demo_data_a.len(), mean_a, std_a);
    println!("{}", explore_a);
    let explore_b = format!("   Group B: n = {}, mean = {:.2}, std = {:.2}", demo_data_b.len(), mean_b, std_b);
    println!("{}", explore_b);
    
    // Check assumption: equal variances
    let variance_ratio = std_a.powi(2) / std_b.powi(2);
    let equal_variances = variance_ratio > 0.5 && variance_ratio < 2.0;
    let var_check = format!("   Variance ratio: {:.2} (equal variances: {})", variance_ratio, equal_variances);
    println!("{}", var_check);
    
    println!();
    
    // Step 2: Hypothesis formulation
    println!("2. 🎯 Hypothesis Formulation:");
    println!("   H₀: μ₁ = μ₂ (no difference between groups)");
    println!("   H₁: μ₁ ≠ μ₂ (significant difference exists)");
    println!("   α = 0.05 (Type I error rate)");
    
    println!();
    
    // Step 3: Test selection
    println!("3. 🔧 Statistical Test Selection:");
    let test_choice = if equal_variances {
        "Two-sample t-test (equal variances)"
    } else {
        "Welch's t-test (unequal variances)"
    };
    let test_msg = format!("   Selected test: {}", test_choice);
    println!("{}", test_msg);
    println!("   Rationale: Normal-like data, independent samples");
    
    println!();
    
    // Step 4: Test execution
    println!("4. ⚗️ Test Execution:");
    let t_result = if equal_variances {
        demo_data_a.ttest_ind(&demo_data_b, Alternative::TwoSided)
    } else {
        demo_data_a.ttest_welch(&demo_data_b, Alternative::TwoSided)
    };
    
    let test_t = format!("   t-statistic: {:.3}", t_result.statistic);
    println!("{}", test_t);
    let test_p = format!("   p-value: {:.6}", t_result.p_value);
    println!("{}", test_p);
    
    println!();
    
    // Step 5: Decision and interpretation
    println!("5. ⚖️ Statistical Decision and Interpretation:");
    let alpha = 0.05;
    let decision = if t_result.p_value < alpha {
        "Reject H₀"
    } else {
        "Fail to reject H₀"
    };
    let decision_msg = format!("   Decision: {}", decision);
    println!("{}", decision_msg);
    
    // Effect size calculation
    let pooled_std = {
        let n1 = demo_data_a.len() as f64;
        let n2 = demo_data_b.len() as f64;
        let var1 = std_a.powi(2);
        let var2 = std_b.powi(2);
        (((n1 - 1.0) * var1 + (n2 - 1.0) * var2) / (n1 + n2 - 2.0)).sqrt()
    };
    let cohens_d = (mean_b - mean_a) / pooled_std;
    
    let effect_msg = format!("   Effect size (Cohen's d): {:.3}", cohens_d);
    println!("{}", effect_msg);
    
    println!();
    
    // Best practices summary
    println!("🏆 Hypothesis Testing Best Practices:");
    println!("{}", "-".repeat(35));
    
    println!("📊 Before Testing:");
    println!("   • Plan sample sizes using power analysis");
    println!("   • Formulate hypotheses before seeing data");
    println!("   • Choose significance level (α) in advance");
    
    println!();
    println!("🔍 During Analysis:");
    println!("   • Check test assumptions (normality, equal variances)");
    println!("   • Use non-parametric tests for violated assumptions");
    println!("   • Consider multiple comparison corrections if needed");
    
    println!();
    println!("📈 After Testing:");
    println!("   • Report effect sizes alongside p-values");
    println!("   • Interpret practical vs statistical significance");
    println!("   • Consider confidence intervals for parameter estimates");
    
    println!();
    println!("⚠️ Common Pitfalls to Avoid:");
    println!("   • p-hacking (testing until significant)");
    println!("   • Ignoring effect sizes (practical significance)");
    println!("   • Using wrong test for data type/distribution");
    println!("   • Confusing correlation with causation");
    
    println!();
    println!("🎯 Next Steps: Explore correlation analysis and advanced statistical modeling!");


🎯 Hypothesis Testing: Summary and Best Practices
📋 Complete Hypothesis Testing Workflow:
----------------------------------------
1. 📊 Data Exploration and Assumption Checking:
   Group A: n = 8, mean = 12.32, std = 0.48
   Group B: n = 8, mean = 13.25, std = 0.30
   Variance ratio: 2.64 (equal variances: false)

2. 🎯 Hypothesis Formulation:
   H₀: μ₁ = μ₂ (no difference between groups)
   H₁: μ₁ ≠ μ₂ (significant difference exists)
   α = 0.05 (Type I error rate)

3. 🔧 Statistical Test Selection:
   Selected test: Welch's t-test (unequal variances)
   Rationale: Normal-like data, independent samples

4. ⚗️ Test Execution:
   t-statistic: -4.610
   p-value: 0.002683

5. ⚖️ Statistical Decision and Interpretation:
   Decision: Reject H₀
   Effect size (Cohen's d): 2.305

🏆 Hypothesis Testing Best Practices:
-----------------------------------
📊 Before Testing:
   • Plan sample sizes using power analysis
   • Formulate hypotheses before seeing data