# DTSA 5003: Statistical Inference and Hypothesis Testing in Data Science Applications

## Course Overview and Quick Reference Guide

This notebook is a quick reference for the key concepts, techniques, and implementations covered in this course.

### Course Objectives
- Understanding hypothesis testing principles
- Implementing various statistical tests
- Analyzing test results
- Applying hypothesis testing in data science

In [None]:
# Import common libraries
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Tuple, Dict, Optional

# Display settings
%matplotlib inline
sns.set_theme()  # replaces plt.style.use('seaborn'), which was deprecated in Matplotlib 3.6 and later removed
np.random.seed(42)

## Week 1: Introduction to Hypothesis Testing

### Key Concepts
- 

### Important Terms
- 

### Code Examples

In [None]:
def one_sample_ttest(data: np.ndarray, mu0: float, alpha: float = 0.05) -> Dict:
    """Perform one-sample t-test"""
    t_stat, p_value = stats.ttest_1samp(data, mu0)
    return {
        't_statistic': t_stat,
        'p_value': p_value,
        'reject_null': p_value < alpha
    }
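
As a quick check on what `stats.ttest_1samp` computes, the sketch below rebuilds the statistic by hand from $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ and compares it with SciPy. The sample values and `mu0` are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and null mean, for illustration only
sample = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2])
mu0 = 5.0

n = len(sample)
sample_mean = sample.mean()
sample_sd = sample.std(ddof=1)  # ddof=1 gives the sample standard deviation
t_manual = (sample_mean - mu0) / (sample_sd / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, mu0)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

The two lines should agree, which confirms that the default test is two-sided with `n - 1` degrees of freedom.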

## Week 2: Z-tests and T-tests

### Key Concepts
- 

### Important Tests
- 

### Code Examples

In [None]:
def two_sample_ttest(group1: np.ndarray, group2: np.ndarray, alpha: float = 0.05) -> Dict:
    """Perform two-sample (Student's) t-test; SciPy's default assumes equal variances"""
    t_stat, p_value = stats.ttest_ind(group1, group2)
    return {
        't_statistic': t_stat,
        'p_value': p_value,
        'reject_null': p_value < alpha
    }
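
`stats.ttest_ind` pools the two variances by default. When the groups may have different spreads, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below runs both on simulated data; the group sizes, means, and standard deviations are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated groups with equal means but very different spreads and sizes
group_a = rng.normal(loc=10.0, scale=1.0, size=50)
group_b = rng.normal(loc=10.0, scale=5.0, size=10)

student = stats.ttest_ind(group_a, group_b)                 # pooled variance
welch = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test

print(f"Student: t = {student.statistic:.3f}, p = {student.pvalue:.3f}")
print(f"Welch:   t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
```

With unequal sample sizes and unequal variances the two statistics differ, because Welch's version uses $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ as the standard error instead of a pooled estimate.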

## Week 3: ANOVA and F-tests

### Key Concepts
- 

### Important Tests
- 

### Code Examples

In [None]:
def one_way_anova(*groups: np.ndarray, alpha: float = 0.05) -> Dict:
    """Perform one-way ANOVA"""
    f_stat, p_value = stats.f_oneway(*groups)
    return {
        'f_statistic': f_stat,
        'p_value': p_value,
        'reject_null': p_value < alpha
    }
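
To make the F-statistic less of a black box, the sketch below computes it from the between-group and within-group sums of squares and compares the result with `stats.f_oneway`. The three groups are invented numbers.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups
groups = [
    np.array([4.2, 4.8, 5.1, 4.5]),
    np.array([5.9, 6.3, 5.5, 6.1]),
    np.array([4.9, 5.2, 5.0, 5.4]),
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Variation of group means around the grand mean, weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"manual: F = {f_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```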

## Week 4: Chi-Square Tests

### Key Concepts
- 

### Important Tests
- 

### Code Examples

In [None]:
def chi_square_test(observed: np.ndarray, expected: Optional[np.ndarray] = None, alpha: float = 0.05) -> Dict:
    """Perform chi-square goodness-of-fit test (equal expected counts when `expected` is None)"""
    # f_exp=None tells SciPy to assume all categories are equally likely
    chi2, p_value = stats.chisquare(observed, f_exp=expected)
        
    return {
        'chi2_statistic': chi2,
        'p_value': p_value,
        'reject_null': p_value < alpha
    }
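
A small goodness-of-fit example, assuming a die rolled 120 times (the counts are made up). It computes $\chi^2 = \sum (O_i - E_i)^2 / E_i$ by hand and checks it against `stats.chisquare`. Note that the expected counts should sum to the same total as the observed counts; recent SciPy versions raise an error otherwise.

```python
import numpy as np
from scipy import stats

# Hypothetical counts from 120 rolls of a die
observed = np.array([25, 17, 15, 23, 24, 16])
expected = np.full(6, observed.sum() / 6)  # fair die: 20 per face

chi2_manual = ((observed - expected) ** 2 / expected).sum()
# Degrees of freedom: number of categories minus one
p_manual = stats.chi2.sf(chi2_manual, df=len(observed) - 1)

chi2_scipy, p_scipy = stats.chisquare(observed, f_exp=expected)
print(f"manual: chi2 = {chi2_manual:.3f}, p = {p_manual:.4f}")
print(f"scipy:  chi2 = {chi2_scipy:.3f}, p = {p_scipy:.4f}")
```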

## Week 5: Non-parametric Tests

### Key Concepts
- 

### Important Tests
- 

### Code Examples

In [None]:
def mann_whitney_test(group1: np.ndarray, group2: np.ndarray, alpha: float = 0.05) -> Dict:
    """Perform Mann-Whitney U test"""
    stat, p_value = stats.mannwhitneyu(group1, group2)
    return {
        'statistic': stat,
        'p_value': p_value,
        'reject_null': p_value < alpha
    }
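
The U statistic has a direct interpretation: it counts how many (first group, second group) pairs have the first value larger, with ties counted as one half. The sketch below computes that count on invented data and compares it with `stats.mannwhitneyu`. It assumes SciPy 1.7 or later, where the returned statistic is U for the first sample; older versions reported a different convention.

```python
import numpy as np
from scipy import stats

# Hypothetical samples
x = np.array([1.2, 3.4, 2.2, 5.9, 4.1])
y = np.array([6.3, 7.1, 4.8, 8.0, 5.5, 9.2])

# Compare every x_i with every y_j via broadcasting
greater = (x[:, None] > y[None, :]).sum()
ties = (x[:, None] == y[None, :]).sum()
u_manual = greater + 0.5 * ties

result = stats.mannwhitneyu(x, y, alternative="two-sided")
print(f"manual U = {u_manual}, scipy U = {result.statistic}, p = {result.pvalue:.4f}")
```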

## Week 6: Multiple Testing

### Key Concepts
- 

### Important Methods
- 

### Code Examples

In [None]:
def bonferroni_correction(p_values: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Apply Bonferroni correction for multiple testing"""
    n_tests = len(p_values)
    return p_values < (alpha / n_tests)

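def benjamini_hochberg(p_values: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Apply Benjamini-Hochberg step-up procedure (controls the false discovery rate)"""
    p_values = np.asarray(p_values)
    n_tests = len(p_values)
    sorted_indices = np.argsort(p_values)
    sorted_p = p_values[sorted_indices]

    # Compare the k-th smallest p-value with its threshold k * alpha / m
    threshold = np.arange(1, n_tests + 1) * alpha / n_tests
    below = np.nonzero(sorted_p <= threshold)[0]

    # Step-up rule: reject every hypothesis up to the largest rank k that passes
    reject = np.zeros(n_tests, dtype=bool)
    if below.size > 0:
        reject[: below[-1] + 1] = True

    result = np.zeros(n_tests, dtype=bool)
    result[sorted_indices] = reject
    return result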
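
The functions above return reject/keep decisions for a fixed `alpha`. Another common way to report the same corrections is as adjusted p-values that can be compared with `alpha` directly. The sketch below is one way to compute them with NumPy; `bh_adjusted_p_values` is a hypothetical helper and the p-values are made up.

```python
import numpy as np

def bh_adjusted_p_values(p_values):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper, not course code)."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)  # m * p_(k) / k
    # Enforce monotonicity from the largest p-value downwards, then cap at 1
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

p_values = np.array([0.5, 0.021, 0.02, 0.03])  # made-up p-values
bonferroni_adj = np.minimum(p_values * len(p_values), 1.0)
bh_adj = bh_adjusted_p_values(p_values)

print("Bonferroni adjusted:", bonferroni_adj)
print("BH adjusted:        ", bh_adj)
```

In this example no Bonferroni-adjusted value falls below 0.05, while three BH-adjusted values do. That includes the smallest p-value, which on its own exceeds its rank threshold of `alpha / m`; it is the step-up rule that carries it through.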

## Week 7: Power Analysis

### Key Concepts
- 

### Important Methods
- 

### Code Examples

In [None]:
def power_analysis_ttest(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Calculate power for one-sample t-test"""
    from statsmodels.stats.power import TTestPower
    power_analysis = TTestPower()
    return power_analysis.power(effect_size=effect_size, nobs=n, alpha=alpha)
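
For intuition about what a power calculation does, the sketch below computes two-sided power for a one-sample t-test directly from the noncentral t distribution in SciPy. `one_sample_t_power` is a hypothetical helper written for this note, and the effect size of 0.5 is an arbitrary choice.

```python
import numpy as np
from scipy import stats

def one_sample_t_power(effect_size, n, alpha=0.05):
    """Two-sided power of a one-sample t-test (hypothetical helper)."""
    df = n - 1
    ncp = effect_size * np.sqrt(n)           # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    # Probability that |T| exceeds the critical value under the alternative
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

for n in (10, 20, 34, 50):
    print(f"n = {n:2d}: power = {one_sample_t_power(0.5, n):.3f}")
```

Power rises with the sample size; scanning `n` like this is a simple way to find the smallest sample that reaches a target such as 0.8.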

## Week 8: Advanced Topics in Hypothesis Testing

### Key Concepts
- 

### Important Methods
- 

### Code Examples

In [None]:
def permutation_test(group1: np.ndarray, group2: np.ndarray, n_permutations: int = 1000, alpha: float = 0.05) -> Dict:
    """Perform permutation test"""
    observed_diff = np.mean(group1) - np.mean(group2)
    combined = np.concatenate([group1, group2])
    n1 = len(group1)
    
    perm_diffs = []
    for _ in range(n_permutations):
        np.random.shuffle(combined)
        perm_diff = np.mean(combined[:n1]) - np.mean(combined[n1:])
        perm_diffs.append(perm_diff)
        
    p_value = np.mean(np.abs(perm_diffs) >= np.abs(observed_diff))
    
    return {
        'observed_difference': observed_diff,
        'p_value': p_value,
        'reject_null': p_value < alpha
    }
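
A Monte Carlo permutation p-value computed as a plain proportion can come out as exactly 0 when no shuffled difference reaches the observed one. A commonly used adjustment counts the observed labelling as one of the permutations. The sketch below shows that variant with a local random generator instead of the global seed; `permutation_p_value` and its `seed` parameter are hypothetical names.

```python
import numpy as np

def permutation_p_value(group1, group2, n_permutations=1000, seed=0):
    """Two-sided permutation p-value for a difference in means (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    observed = abs(group1.mean() - group2.mean())
    combined = np.concatenate([group1, group2])
    n1 = len(group1)

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(combined)  # returns a shuffled copy
        diff = abs(shuffled[:n1].mean() - shuffled[n1:].mean())
        count += int(diff >= observed)
    # Count the observed labelling as one permutation so the p-value is never 0
    return (count + 1) / (n_permutations + 1)

# Made-up groups: clearly separated, then identical
p_separated = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(p_separated, p_identical)
```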

## Additional Resources and References

### Useful Libraries
- SciPy: Statistical tests
- StatsModels: Advanced statistical analysis
- Scikit-learn: Machine learning metrics
- Pingouin: Statistical testing

### External Links
- Course materials
- Statistical testing resources
- Practice problems

### Personal Notes
- Key formulas
- Test selection guide
- Common pitfalls