# Module 03: Statistical Inference

**Difficulty**: ⭐⭐ Intermediate

**Estimated Time**: 90 minutes

**Prerequisites**: 
- Module 01: Descriptive Statistics
- Module 02: Probability Fundamentals
- Understanding of mean, standard deviation, and normal distribution

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand sampling distributions and their relationship to population parameters
2. Apply the Central Limit Theorem to make inferences about populations
3. Calculate and interpret confidence intervals
4. Conduct hypothesis tests using p-values
5. Understand Type I and Type II errors
6. Apply statistical inference to A/B testing scenarios

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Configure visualization
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Set random seed for reproducibility
np.random.seed(42)

# Display options
np.set_printoptions(precision=4, suppress=True)
pd.set_option('display.precision', 4)

print("Setup complete!")

## 1. Population vs Sample

### Key Concepts:

**Population**: The entire group we want to study
- Population mean: $\mu$
- Population standard deviation: $\sigma$
- Usually impossible or impractical to measure

**Sample**: A subset of the population
- Sample mean: $\bar{x}$
- Sample standard deviation: $s$
- Used to estimate population parameters

**Statistical Inference**: Using sample data to make conclusions about the population

**Why sampling?**
- Cost-effective
- Time-efficient  
- Testing is sometimes destructive (you can't test all lightbulbs!)
- Population may be infinite
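
The distinction between $\sigma$ and $s$ shows up in code as the `ddof` argument of `np.std`. The following is a minimal sketch, using a small made-up array of heights (hypothetical values, not data from this module), of how the two conventions differ:

```python
import numpy as np

# Hypothetical sample of five heights in cm (illustrative values only)
heights = np.array([168.0, 172.0, 165.0, 180.0, 175.0])

n = len(heights)
mean = heights.mean()
sum_sq_dev = np.sum((heights - mean) ** 2)

# Population formula: divide the summed squared deviations by n (ddof=0)
std_population_formula = np.sqrt(sum_sq_dev / n)

# Sample formula: divide by n - 1 (ddof=1), the usual choice when estimating sigma
std_sample_formula = np.sqrt(sum_sq_dev / (n - 1))

print(f"np.std(ddof=0): {np.std(heights, ddof=0):.4f}  (manual: {std_population_formula:.4f})")
print(f"np.std(ddof=1): {np.std(heights, ddof=1):.4f}  (manual: {std_sample_formula:.4f})")
```

Dividing by $n - 1$ always gives the slightly larger value, and the gap shrinks as $n$ grows.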

In [None]:
# Example: Population vs Sample
# Imagine the TRUE population of heights (which we normally don't know)

# Create a population of 10,000 people
population_mean = 170  # cm
population_std = 10  # cm
population_size = 10000

population = np.random.normal(population_mean, population_std, population_size)

# Take a sample of 100 people
sample_size = 100
sample = np.random.choice(population, size=sample_size, replace=False)

print("=== Population Parameters (TRUE values) ===")
print(f"Population mean (μ): {population_mean} cm")
print(f"Population std (σ): {population_std} cm")
print(f"Actual population mean: {np.mean(population):.2f} cm")
print(f"Actual population std: {np.std(population):.2f} cm")

print(f"\n=== Sample Statistics (estimates) ===")
print(f"Sample size (n): {sample_size}")
print(f"Sample mean (x̄): {np.mean(sample):.2f} cm")
print(f"Sample std (s): {np.std(sample, ddof=1):.2f} cm")  # ddof=1 for sample std

print(f"\nDifference between sample mean and true mean: {abs(np.mean(sample) - population_mean):.2f} cm")
print("\nThe sample mean is our ESTIMATE of the population mean.")

In [None]:
# Visualize population vs sample

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Population
axes[0].hist(population, bins=50, edgecolor='black', alpha=0.7, color='lightblue')
axes[0].axvline(np.mean(population), color='red', linestyle='--', linewidth=2.5,
               label=f'Population μ = {np.mean(population):.2f}')
axes[0].set_title(f'Population (N = {population_size:,})', fontsize=13, fontweight='bold')
axes[0].set_xlabel('Height (cm)')
axes[0].set_ylabel('Frequency')
axes[0].legend(fontsize=11)

# Sample
axes[1].hist(sample, bins=20, edgecolor='black', alpha=0.7, color='lightgreen')
axes[1].axvline(np.mean(sample), color='blue', linestyle='--', linewidth=2.5,
               label=f'Sample x̄ = {np.mean(sample):.2f}')
axes[1].axvline(np.mean(population), color='red', linestyle='--', linewidth=2.5,
               label=f'True μ = {np.mean(population):.2f}', alpha=0.7)
axes[1].set_title(f'Sample (n = {sample_size})', fontsize=13, fontweight='bold')
axes[1].set_xlabel('Height (cm)')
axes[1].set_ylabel('Frequency')
axes[1].legend(fontsize=11)

plt.tight_layout()
plt.show()

## 2. Sampling Distribution and the Central Limit Theorem

### Sampling Distribution
The **sampling distribution** is the distribution that a statistic (such as the sample mean) would have if we repeated our sampling many times.

### Central Limit Theorem (CLT)
**One of the most important theorems in statistics!**

For a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample mean $\bar{x}$ approaches a normal distribution as sample size $n$ increases:

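$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(approximately, for large } n\text{)}$$

Here the second argument is the variance, so the standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$.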

**Key insights**:
1. The mean of the sampling distribution equals the population mean: $E[\bar{x}] = \mu$
2. The standard deviation of the sampling distribution (standard error): $SE = \frac{\sigma}{\sqrt{n}}$
3. Holds even if the population isn't normally distributed, provided its variance is finite and $n$ is large enough
4. Larger samples give more precise estimates (smaller SE)
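
To see the third insight in isolation, here is a rough sketch that draws from a clearly non-normal population. The exponential distribution, the seed, and the sizes `n` and `num_samples` are all assumptions chosen for illustration; they are not part of this module's height example.

```python
import numpy as np

rng = np.random.default_rng(0)  # local generator; seed chosen arbitrarily

# Assumed skewed population: exponential with mean 2 (its sigma is also 2)
scale = 2.0
n = 50               # size of each sample
num_samples = 5000   # number of repeated samples

# Each row is one sample of size n; take the mean of every row
sample_means = rng.exponential(scale=scale, size=(num_samples, n)).mean(axis=1)

predicted_se = scale / np.sqrt(n)  # sigma / sqrt(n)

print(f"Mean of sample means: {sample_means.mean():.3f} (population mean: {scale})")
print(f"Std of sample means:  {sample_means.std(ddof=1):.3f} (predicted SE: {predicted_se:.3f})")
```

Even though individual exponential draws are strongly right-skewed, the sample means should cluster around the population mean with a spread close to $\sigma/\sqrt{n}$.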

In [None]:
# Demonstrate the Central Limit Theorem
# Take many samples and look at the distribution of their means

num_samples = 10000  # Number of samples to take
sample_size = 30  # Size of each sample

sample_means = []

# Take many samples and calculate their means
for i in range(num_samples):
    sample = np.random.choice(population, size=sample_size, replace=False)
    sample_means.append(np.mean(sample))

sample_means = np.array(sample_means)

# Theoretical values from CLT
theoretical_mean = population_mean
theoretical_se = population_std / np.sqrt(sample_size)

print("=== Central Limit Theorem Demonstration ===")
print(f"Number of samples: {num_samples:,}")
print(f"Sample size: {sample_size}")
print(f"\nTheoretical (from CLT):")
print(f"  Mean of sampling distribution: {theoretical_mean:.2f}")
print(f"  Standard error (SE): {theoretical_se:.2f}")
print(f"\nActual (from simulation):")
print(f"  Mean of sample means: {np.mean(sample_means):.2f}")
print(f"  Standard deviation of sample means: {np.std(sample_means):.2f}")
print(f"\nThe simulation confirms the CLT predictions!")

In [None]:
# Visualize the sampling distribution

plt.figure(figsize=(12, 6))
plt.hist(sample_means, bins=50, density=True, edgecolor='black', 
         alpha=0.7, label='Sampling distribution (simulated)')

# Overlay the theoretical normal distribution
x = np.linspace(sample_means.min(), sample_means.max(), 100)
theoretical_dist = stats.norm.pdf(x, theoretical_mean, theoretical_se)
plt.plot(x, theoretical_dist, 'r-', linewidth=3, 
         label=f'Theoretical Normal(μ={theoretical_mean}, SE={theoretical_se:.2f})')

plt.axvline(theoretical_mean, color='green', linestyle='--', linewidth=2,
           label=f'Population mean = {theoretical_mean}')
plt.xlabel('Sample Mean', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title(f'Sampling Distribution of Sample Mean (n={sample_size})', 
         fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("The histogram matches the theoretical normal curve - this is the CLT in action!")

In [None]:
# Effect of sample size on standard error

sample_sizes = [5, 10, 30, 50, 100, 200]
standard_errors = []

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for idx, n in enumerate(sample_sizes):
    # Take many samples of size n
    means = []
    for _ in range(1000):
        sample = np.random.choice(population, size=n, replace=False)
        means.append(np.mean(sample))
    
    means = np.array(means)
    se = np.std(means)
    standard_errors.append(se)
    
    # Plot
    axes[idx].hist(means, bins=30, edgecolor='black', alpha=0.7)
    axes[idx].axvline(population_mean, color='red', linestyle='--', linewidth=2)
    axes[idx].set_title(f'n={n}, SE={se:.2f}', fontweight='bold')
    axes[idx].set_xlabel('Sample Mean')
    axes[idx].set_xlim(150, 190)

plt.tight_layout()
plt.show()

print("=== Sample Size Effect ===")
for n, se in zip(sample_sizes, standard_errors):
    print(f"n={n:3d}: SE={se:.2f}")
print("\nAs sample size increases, the standard error DECREASES!")
print("Larger samples give more precise estimates.")

## 3. Confidence Intervals

A **confidence interval** gives a range of plausible values for a population parameter.

### For a population mean (when σ is known):

$$CI = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

### For a population mean (when σ is unknown - use t-distribution):

$$CI = \bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}$$

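**Interpretation**: If we repeated the sampling many times and built an X% interval from each sample, about X% of those intervals would contain the true population mean. This is what "X% confident" means.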

**Common confidence levels**:
- 90% CI: $z = 1.645$
- 95% CI: $z = 1.96$ (most common)
- 99% CI: $z = 2.576$
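
The critical values listed above need not be memorized; they can be recovered from the inverse CDF. The sketch below also compares them with t critical values and checks the t-interval formula against `stats.t.interval`. The summary statistics (`x_bar`, `s`, `n`) are hypothetical values picked for illustration.

```python
import numpy as np
from scipy import stats

# Critical values for common two-sided confidence levels
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    z_crit = stats.norm.ppf(1 - alpha / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=29)  # df = n - 1 for an assumed n = 30
    print(f"{level:.0%} CI: z = {z_crit:.3f}, t (df=29) = {t_crit:.3f}")

# Hypothetical summary statistics (illustrative values only)
x_bar, s, n = 170.0, 10.0, 30
se = s / np.sqrt(n)

# 95% t-interval computed from the formula
t_crit_95 = stats.t.ppf(0.975, df=n - 1)
manual_ci = (x_bar - t_crit_95 * se, x_bar + t_crit_95 * se)

# The same interval from SciPy's helper (confidence level passed positionally)
scipy_ci = stats.t.interval(0.95, n - 1, loc=x_bar, scale=se)

print(f"Manual 95% CI: [{manual_ci[0]:.2f}, {manual_ci[1]:.2f}]")
print(f"SciPy 95% CI:  [{scipy_ci[0]:.2f}, {scipy_ci[1]:.2f}]")
```

The t critical values are always a little larger than the corresponding z values, which widens the interval to account for estimating $\sigma$ with $s$.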

In [None]:
# Calculate a 95% confidence interval

# Take a sample
sample_size = 50
sample_data = np.random.choice(population, size=sample_size, replace=False)

# Sample statistics
sample_mean = np.mean(sample_data)
sample_std = np.std(sample_data, ddof=1)
sample_se = sample_std / np.sqrt(sample_size)

# 95% confidence interval using t-distribution
confidence_level = 0.95
alpha = 1 - confidence_level
t_critical = stats.t.ppf(1 - alpha/2, df=sample_size - 1)

margin_of_error = t_critical * sample_se
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print("=== 95% Confidence Interval ===")
print(f"Sample size: {sample_size}")
print(f"Sample mean: {sample_mean:.2f} cm")
print(f"Sample std: {sample_std:.2f} cm")
print(f"Standard error: {sample_se:.2f} cm")
print(f"\nt-critical value (df={sample_size-1}): {t_critical:.3f}")
print(f"Margin of error: {margin_of_error:.2f} cm")
print(f"\n95% Confidence Interval: [{ci_lower:.2f}, {ci_upper:.2f}] cm")
print(f"\nInterpretation: We are 95% confident that the true population mean")
print(f"is between {ci_lower:.2f} cm and {ci_upper:.2f} cm.")

# Check if the true mean is in the interval
contains_true_mean = ci_lower <= population_mean <= ci_upper
print(f"\nTrue population mean: {population_mean} cm")
print(f"Does the CI contain the true mean? {contains_true_mean}")

In [None]:
# Visualize what "95% confident" means
# If we repeated the sampling 100 times, about 95 CIs would contain the true mean

num_trials = 100
sample_size = 30
confidence_level = 0.95

ci_contains_mean = []
ci_lowers = []
ci_uppers = []
sample_means_list = []

for i in range(num_trials):
    # Take a sample
    sample = np.random.choice(population, size=sample_size, replace=False)
    s_mean = np.mean(sample)
    s_std = np.std(sample, ddof=1)
    s_se = s_std / np.sqrt(sample_size)
    
    # Calculate CI
    t_crit = stats.t.ppf(1 - (1-confidence_level)/2, df=sample_size - 1)
    moe = t_crit * s_se
    ci_low = s_mean - moe
    ci_high = s_mean + moe
    
    ci_lowers.append(ci_low)
    ci_uppers.append(ci_high)
    sample_means_list.append(s_mean)
    
    # Check if CI contains true mean
    ci_contains_mean.append(ci_low <= population_mean <= ci_high)

# Count how many CIs contain the true mean
num_containing = sum(ci_contains_mean)
percentage = (num_containing / num_trials) * 100

print(f"=== Confidence Interval Simulation ===")
print(f"Number of samples: {num_trials}")
print(f"Confidence level: {confidence_level*100:.0f}%")
print(f"\nCIs containing true mean: {num_containing}/{num_trials} ({percentage:.1f}%)")
print(f"Expected: ~{confidence_level*100:.0f}%")

In [None]:
# Visualize the confidence intervals

plt.figure(figsize=(12, 10))

# Plot only first 50 CIs for clarity
for i in range(min(50, num_trials)):
    color = 'green' if ci_contains_mean[i] else 'red'
    line_alpha = 0.7 if ci_contains_mean[i] else 1.0  # separate name so the significance level `alpha` is not overwritten
    plt.plot([ci_lowers[i], ci_uppers[i]], [i, i], color=color, alpha=line_alpha, linewidth=1.5)
    plt.scatter(sample_means_list[i], i, color=color, s=20, zorder=3)

plt.axvline(population_mean, color='blue', linestyle='--', linewidth=2.5, 
           label=f'True population mean = {population_mean}')
plt.xlabel('Height (cm)', fontsize=12)
plt.ylabel('Sample Number', fontsize=12)
plt.title(f'95% Confidence Intervals from {min(50, num_trials)} Samples\nGreen = Contains true mean, Red = Misses', 
         fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print(f"\nAbout {confidence_level*100:.0f}% of the intervals (green) contain the true mean.")
print(f"About {(1-confidence_level)*100:.0f}% miss the true mean (red).")

## 4. Hypothesis Testing

**Hypothesis testing** is a formal process for making decisions based on data.

### The Process:

1. **State hypotheses**:
   - Null hypothesis ($H_0$): The status quo, no effect
   - Alternative hypothesis ($H_a$ or $H_1$): What we're testing for

2. **Choose significance level** ($\alpha$): Usually 0.05 (5%)

3. **Calculate test statistic**:
   $$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

4. **Find p-value**: Probability of observing data at least this extreme if $H_0$ is true

5. **Make decision**:
   - If p-value < $\alpha$: Reject $H_0$ (result is statistically significant)
   - If p-value ≥ $\alpha$: Fail to reject $H_0$ (insufficient evidence)

### Types of Tests:
- **Two-tailed**: $H_a: \mu \neq \mu_0$
- **Right-tailed**: $H_a: \mu > \mu_0$
- **Left-tailed**: $H_a: \mu < \mu_0$
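
The three test types differ only in which tail of the t-distribution is used for the p-value. The following sketch, built on a small hypothetical data set and an assumed null value `mu_0`, computes all three and cross-checks the right-tailed case against `stats.ttest_1samp` (the `alternative` argument assumes SciPy 1.6 or newer).

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (illustrative values only)
data = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.4])
mu_0 = 5.0  # value assumed under H0

n = len(data)
t_manual = (data.mean() - mu_0) / (data.std(ddof=1) / np.sqrt(n))

# sf(t) = 1 - cdf(t), computed accurately even far out in the tail
p_right = stats.t.sf(t_manual, df=n - 1)          # Ha: mu > mu_0
p_left = stats.t.cdf(t_manual, df=n - 1)          # Ha: mu < mu_0
p_two = 2 * stats.t.sf(abs(t_manual), df=n - 1)   # Ha: mu != mu_0

# Cross-check the right-tailed case against SciPy
result = stats.ttest_1samp(data, popmean=mu_0, alternative='greater')

print(f"t = {t_manual:.4f}")
print(f"right-tailed p = {p_right:.4f}, left-tailed p = {p_left:.4f}, two-tailed p = {p_two:.4f}")
print(f"SciPy (greater): t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```

The tail should be chosen before looking at the data, since the one-tailed p-value in the observed direction is half the two-tailed one.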

In [None]:
# Example: Testing if a new diet increases average weight loss
# H0: μ = 0 kg (no weight loss)
# Ha: μ > 0 kg (positive weight loss)

# Sample of 30 people on the diet
weight_loss = np.array([2.1, 3.5, 1.8, 4.2, 2.9, 3.1, 1.5, 2.8, 3.9, 2.4,
                       3.3, 2.7, 1.9, 3.6, 2.2, 3.8, 2.5, 3.4, 1.7, 2.9,
                       3.2, 2.6, 3.7, 2.3, 3.5, 2.8, 3.1, 2.4, 3.3, 2.7])

n = len(weight_loss)
sample_mean = np.mean(weight_loss)
sample_std = np.std(weight_loss, ddof=1)
hypothesized_mean = 0  # H0: no weight loss

# Calculate t-statistic
t_statistic = (sample_mean - hypothesized_mean) / (sample_std / np.sqrt(n))

# Calculate p-value (one-tailed test)
# sf (survival function) equals 1 - cdf but stays accurate for very small tail probabilities
p_value = stats.t.sf(t_statistic, df=n-1)

alpha = 0.05

print("=== Hypothesis Test: Weight Loss Diet ===")
print(f"H0: μ = {hypothesized_mean} kg (no effect)")
print(f"Ha: μ > {hypothesized_mean} kg (diet works)")
print(f"\nSample size: {n}")
print(f"Sample mean: {sample_mean:.2f} kg")
print(f"Sample std: {sample_std:.2f} kg")
print(f"\nt-statistic: {t_statistic:.4f}")
print(f"p-value: {p_value:.6f}")
print(f"Significance level (α): {alpha}")

print(f"\n{'='*50}")
if p_value < alpha:
    print(f"CONCLUSION: Reject H0 (p < {alpha})")
    print(f"The diet DOES lead to significant weight loss!")
else:
    print(f"CONCLUSION: Fail to reject H0 (p ≥ {alpha})")
    print(f"Insufficient evidence that the diet works.")

print(f"\nInterpretation: There's only a {p_value*100:.4f}% chance of seeing")
print(f"this much weight loss (or more) if the diet has no effect.")

In [None]:
# Visualize the hypothesis test

# Create t-distribution
x = np.linspace(-4, 6, 1000)
y = stats.t.pdf(x, df=n-1)

plt.figure(figsize=(12, 6))
plt.plot(x, y, linewidth=2.5, label=f't-distribution (df={n-1})')

# Shade rejection region (right tail)
t_critical = stats.t.ppf(1 - alpha, df=n-1)
x_reject = x[x >= t_critical]
y_reject = stats.t.pdf(x_reject, df=n-1)
plt.fill_between(x_reject, y_reject, alpha=0.3, color='red', 
                label=f'Rejection region (α={alpha})')

# Shade p-value region
x_pvalue = x[x >= t_statistic]
y_pvalue = stats.t.pdf(x_pvalue, df=n-1)
plt.fill_between(x_pvalue, y_pvalue, alpha=0.5, color='orange',
                label=f'p-value = {p_value:.4f}')

# Mark critical value and test statistic
plt.axvline(t_critical, color='red', linestyle='--', linewidth=2,
           label=f't-critical = {t_critical:.3f}')
plt.axvline(t_statistic, color='blue', linestyle='--', linewidth=2.5,
           label=f't-statistic = {t_statistic:.3f}')

plt.xlabel('t-value', fontsize=12)
plt.ylabel('Probability Density', fontsize=12)
plt.title('One-Tailed Hypothesis Test Visualization', fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Since t-statistic ({t_statistic:.3f}) > t-critical ({t_critical:.3f}),")
print(f"we reject H0 at the {alpha} significance level.")

## 5. Type I and Type II Errors

### Error Types:

|  | $H_0$ is True | $H_0$ is False |
|---|---|---|
| **Reject $H_0$** | Type I Error (α) | Correct! (Power = 1-β) |
| **Fail to Reject $H_0$** | Correct! | Type II Error (β) |

**Type I Error (False Positive)**:
- Rejecting $H_0$ when it's actually true
- Probability = $\alpha$ (significance level)
- Example: Concluding a drug works when it doesn't

**Type II Error (False Negative)**:
- Failing to reject $H_0$ when it's actually false
- Probability = $\beta$
- Example: Missing a real effect

**Statistical Power**:
- Probability of correctly rejecting a false $H_0$
- Power = $1 - \beta$
- Higher power = better ability to detect real effects
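
Power can also be approximated analytically rather than by simulation. The sketch below does this for an assumed scenario: a two-tailed z-test of $H_0: p = 0.5$ on 100 coin flips when the true probability of heads is 0.6. It relies on the normal approximation to the binomial, so treat the result as approximate.

```python
import numpy as np
from scipy import stats

# Assumed scenario: H0 says p = 0.5, but the coin truly lands heads with p = 0.6
p_null, p_true = 0.5, 0.6
n_flips = 100
alpha = 0.05

# Under H0 the test rejects when p_hat is more than z_crit null-SEs away from 0.5
z_crit = stats.norm.ppf(1 - alpha / 2)
se_null = np.sqrt(p_null * (1 - p_null) / n_flips)
lower_cut = p_null - z_crit * se_null
upper_cut = p_null + z_crit * se_null

# Under the true p, p_hat is approximately Normal(p_true, se_true)
se_true = np.sqrt(p_true * (1 - p_true) / n_flips)
power = (stats.norm.sf(upper_cut, loc=p_true, scale=se_true)
         + stats.norm.cdf(lower_cut, loc=p_true, scale=se_true))
beta = 1 - power

print(f"Approximate power (1 - beta): {power:.3f}")
print(f"Approximate Type II error rate (beta): {beta:.3f}")
```

Because the number of heads is discrete, a simulation will not match this approximation exactly. Increasing `n_flips` or widening the gap between `p_true` and `p_null` raises the power.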

In [None]:
# Simulate Type I and Type II errors

# Scenario: Testing if a coin is biased
# H0: p = 0.5 (fair coin)
# Ha: p ≠ 0.5 (biased coin)

num_simulations = 10000
n_flips = 100
alpha = 0.05

# Type I Error: H0 is TRUE (coin is fair, p=0.5)
type_i_errors = 0
for _ in range(num_simulations):
    # Flip a FAIR coin
    flips = np.random.binomial(n_flips, 0.5)
    p_hat = flips / n_flips
    
    # Two-tailed test
    se = np.sqrt(0.5 * 0.5 / n_flips)
    z = (p_hat - 0.5) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    
    # If we reject H0 (but it's true!), that's a Type I error
    if p_value < alpha:
        type_i_errors += 1

type_i_rate = type_i_errors / num_simulations

print("=== Type I Error Simulation ===")
print(f"Scenario: Coin IS fair (p = 0.5)")
print(f"Number of tests: {num_simulations:,}")
print(f"Significance level (α): {alpha}")
print(f"\nType I errors: {type_i_errors:,}")
print(f"Type I error rate: {type_i_rate:.4f}")
print(f"Expected rate: {alpha}")
print(f"\nInterpretation: About {type_i_rate*100:.1f}% of the time, we incorrectly")
print(f"concluded the fair coin was biased!")

In [None]:
# Type II Error: H0 is FALSE (coin is biased, p=0.6)
true_p = 0.6  # Coin is actually biased
type_ii_errors = 0

for _ in range(num_simulations):
    # Flip a BIASED coin (p=0.6)
    flips = np.random.binomial(n_flips, true_p)
    p_hat = flips / n_flips
    
    # Two-tailed test (using null hypothesis p=0.5)
    se = np.sqrt(0.5 * 0.5 / n_flips)
    z = (p_hat - 0.5) / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    
    # If we fail to reject H0 (but it's false!), that's a Type II error
    if p_value >= alpha:
        type_ii_errors += 1

type_ii_rate = type_ii_errors / num_simulations
power = 1 - type_ii_rate

print("=== Type II Error Simulation ===")
print(f"Scenario: Coin IS biased (p = {true_p})")
print(f"Number of tests: {num_simulations:,}")
print(f"\nType II errors (β): {type_ii_errors:,}")
print(f"Type II error rate: {type_ii_rate:.4f}")
print(f"Statistical Power (1-β): {power:.4f}")
print(f"\nInterpretation: {type_ii_rate*100:.1f}% of the time, we failed to detect")
print(f"that the coin was biased. Our test has {power*100:.1f}% power.")

## 6. A/B Testing

**A/B testing** compares two groups to determine if there's a significant difference.

**Common applications**:
- Website design changes
- Marketing campaigns
- Product features
- Medical treatments

### Two-Sample t-test:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

$H_0$: The two groups have the same mean ($\mu_1 = \mu_2$)

$H_a$: The two groups have different means ($\mu_1 \neq \mu_2$)
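
The statistic above uses an unpooled standard error, which corresponds to what is usually called Welch's t-test. As a sketch, the code below computes it by hand, with the Welch-Satterthwaite approximation for the degrees of freedom, and compares the result with `stats.ttest_ind(..., equal_var=False)`. The two groups are simulated with assumed means, spreads, and sizes purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # seed chosen arbitrarily

# Hypothetical groups with different spreads and sizes (illustrative only)
group_1 = rng.normal(loc=5.5, scale=2.0, size=80)
group_2 = rng.normal(loc=5.0, scale=1.0, size=120)

n1, n2 = len(group_1), len(group_2)
v1, v2 = group_1.var(ddof=1), group_2.var(ddof=1)

# t-statistic with the unpooled standard error
se_diff = np.sqrt(v1 / n1 + v2 / n2)
t_manual = (group_1.mean() - group_2.mean()) / se_diff

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)
p_manual = 2 * stats.t.sf(abs(t_manual), df=df_welch)

# SciPy runs the same test when equal_var=False
t_scipy, p_scipy = stats.ttest_ind(group_1, group_2, equal_var=False)

print(f"Manual: t = {t_manual:.4f}, df = {df_welch:.1f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```

Note that `stats.ttest_ind` defaults to `equal_var=True`, which pools the variances and uses $n_1 + n_2 - 2$ degrees of freedom instead.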

In [None]:
# A/B Testing Example: Website redesign
# Does the new design increase time spent on site?

np.random.seed(42)

# Control group (old design) - average 5 minutes
control_group = np.random.normal(loc=5.0, scale=1.5, size=200)

# Treatment group (new design) - average 5.5 minutes
treatment_group = np.random.normal(loc=5.5, scale=1.5, size=200)

# Calculate statistics
control_mean = np.mean(control_group)
control_std = np.std(control_group, ddof=1)
treatment_mean = np.mean(treatment_group)
treatment_std = np.std(treatment_group, ddof=1)

print("=== A/B Test: Website Redesign ===")
print(f"\nControl Group (Old Design):")
print(f"  n = {len(control_group)}")
print(f"  Mean = {control_mean:.2f} minutes")
print(f"  Std = {control_std:.2f} minutes")

print(f"\nTreatment Group (New Design):")
print(f"  n = {len(treatment_group)}")
print(f"  Mean = {treatment_mean:.2f} minutes")
print(f"  Std = {treatment_std:.2f} minutes")

print(f"\nDifference in means: {treatment_mean - control_mean:.2f} minutes")

In [None]:
# Perform two-sample t-test

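# Using scipy's ttest_ind; equal_var=False selects Welch's t-test,
# which matches the unpooled standard error in the formula above
t_stat, p_value = stats.ttest_ind(treatment_group, control_group, equal_var=False)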

alpha = 0.05

print("=== Two-Sample t-Test Results ===")
print(f"H0: μ_treatment = μ_control (no difference)")
print(f"Ha: μ_treatment ≠ μ_control (there is a difference)")
print(f"\nt-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.6f}")
print(f"Significance level: {alpha}")

print(f"\n{'='*50}")
if p_value < alpha:
    print(f"CONCLUSION: Reject H0 (p < {alpha})")
    print(f"The new design SIGNIFICANTLY increases time on site!")
    print(f"Average increase: {treatment_mean - control_mean:.2f} minutes")
else:
    print(f"CONCLUSION: Fail to reject H0 (p ≥ {alpha})")
    print(f"No significant difference between designs.")

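# Calculate effect size (Cohen's d)
# Averaging the two variances is appropriate here because the groups are the same size
pooled_std = np.sqrt((control_std**2 + treatment_std**2) / 2)
cohens_d = (treatment_mean - control_mean) / pooled_std

print(f"\nEffect Size (Cohen's d): {cohens_d:.3f}")
# Cohen's conventional benchmarks: 0.2 = small, 0.5 = medium, 0.8 = large
if abs(cohens_d) < 0.2:
    print("Effect size: Negligible")
elif abs(cohens_d) < 0.5:
    print("Effect size: Small")
elif abs(cohens_d) < 0.8:
    print("Effect size: Medium")
else:
    print("Effect size: Large")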

In [None]:
# Visualize the A/B test results

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histograms
axes[0].hist(control_group, bins=30, alpha=0.6, label='Control (Old)', 
            edgecolor='black', color='blue')
axes[0].hist(treatment_group, bins=30, alpha=0.6, label='Treatment (New)', 
            edgecolor='black', color='red')
axes[0].axvline(control_mean, color='blue', linestyle='--', linewidth=2.5,
               label=f'Control mean = {control_mean:.2f}')
axes[0].axvline(treatment_mean, color='red', linestyle='--', linewidth=2.5,
               label=f'Treatment mean = {treatment_mean:.2f}')
axes[0].set_xlabel('Time on Site (minutes)', fontsize=12)
axes[0].set_ylabel('Frequency', fontsize=12)
axes[0].set_title('Distribution Comparison', fontsize=13, fontweight='bold')
axes[0].legend(fontsize=10)

# Box plots
data_for_box = [control_group, treatment_group]
bp = axes[1].boxplot(data_for_box, patch_artist=True)
axes[1].set_xticklabels(['Control', 'Treatment'])  # avoids the `labels` keyword, deprecated in newer Matplotlib
bp['boxes'][0].set_facecolor('lightblue')
bp['boxes'][1].set_facecolor('lightcoral')
axes[1].set_ylabel('Time on Site (minutes)', fontsize=12)
axes[1].set_title('Box Plot Comparison', fontsize=13, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print(f"The treatment group shows a higher mean time on site, although the distributions overlap considerably.")
print(f"The statistical test quantifies this difference (p = {p_value:.6f}).")

## 7. Practice Exercises

### Exercise 1: Confidence Interval Calculation

A sample of 40 customers spent an average of $\$85$ with a standard deviation of $\$12$ at a store.

Tasks:
1. Calculate a 90% confidence interval for the population mean
2. Calculate a 99% confidence interval for the population mean
3. Explain why the 99% CI is wider than the 90% CI

In [None]:
# Your code here
n = 40
sample_mean = 85
sample_std = 12
se = sample_std / np.sqrt(n)

print("=== Exercise 1 Solution ===")
print(f"Sample size: {n}")
print(f"Sample mean: ${sample_mean}")
print(f"Sample std: ${sample_std}")
print(f"Standard error: ${se:.2f}")

# 1. 90% CI
conf_90 = 0.90
alpha_90 = 1 - conf_90
t_crit_90 = stats.t.ppf(1 - alpha_90/2, df=n-1)
moe_90 = t_crit_90 * se
ci_90_lower = sample_mean - moe_90
ci_90_upper = sample_mean + moe_90

print(f"\n1. 90% Confidence Interval:")
print(f"   t-critical (df={n-1}): {t_crit_90:.3f}")
print(f"   Margin of error: ${moe_90:.2f}")
print(f"   CI: [${ci_90_lower:.2f}, ${ci_90_upper:.2f}]")

# 2. 99% CI
conf_99 = 0.99
alpha_99 = 1 - conf_99
t_crit_99 = stats.t.ppf(1 - alpha_99/2, df=n-1)
moe_99 = t_crit_99 * se
ci_99_lower = sample_mean - moe_99
ci_99_upper = sample_mean + moe_99

print(f"\n2. 99% Confidence Interval:")
print(f"   t-critical (df={n-1}): {t_crit_99:.3f}")
print(f"   Margin of error: ${moe_99:.2f}")
print(f"   CI: [${ci_99_lower:.2f}, ${ci_99_upper:.2f}]")

print(f"\n3. Why is the 99% CI wider?")
print(f"   90% CI width: ${ci_90_upper - ci_90_lower:.2f}")
print(f"   99% CI width: ${ci_99_upper - ci_99_lower:.2f}")
print(f"\n   Explanation: To be MORE confident (99% vs 90%) that we've")
print(f"   captured the true population mean, we need a WIDER interval.")
print(f"   Higher confidence comes at the cost of precision (a wider range).")

### Exercise 2: Hypothesis Test

A coffee shop claims their average wait time is 5 minutes. You suspect it's actually longer.
You measure wait times for 25 customers and get: mean = 5.8 minutes, std = 1.2 minutes.

Tasks:
1. Set up the null and alternative hypotheses
2. Calculate the t-statistic and p-value
3. Make a decision at α = 0.05
4. Visualize the hypothesis test

In [None]:
# Your code here
n = 25
sample_mean = 5.8
sample_std = 1.2
claimed_mean = 5.0
alpha = 0.05

print("=== Exercise 2 Solution ===")

# 1. Hypotheses
print("1. Hypotheses:")
print(f"   H0: μ = {claimed_mean} minutes (wait time is as claimed)")
print(f"   Ha: μ > {claimed_mean} minutes (wait time is longer)")
print(f"   This is a ONE-TAILED (right-tailed) test.")

# 2. Calculate t-statistic and p-value
se = sample_std / np.sqrt(n)
t_stat = (sample_mean - claimed_mean) / se
p_value = stats.t.sf(t_stat, df=n-1)  # sf = 1 - cdf, more accurate in the tail

print(f"\n2. Test Statistics:")
print(f"   Sample mean: {sample_mean} minutes")
print(f"   Standard error: {se:.4f}")
print(f"   t-statistic: {t_stat:.4f}")
print(f"   p-value: {p_value:.6f}")

# 3. Decision
print(f"\n3. Decision at α = {alpha}:")
if p_value < alpha:
    print(f"   p-value ({p_value:.6f}) < α ({alpha})")
    print(f"   REJECT H0")
    print(f"   Conclusion: The wait time IS significantly longer than claimed.")
else:
    print(f"   p-value ({p_value:.6f}) ≥ α ({alpha})")
    print(f"   FAIL TO REJECT H0")
    print(f"   Conclusion: Insufficient evidence that wait time is longer.")

# 4. Visualization
x = np.linspace(-4, 6, 1000)
y = stats.t.pdf(x, df=n-1)

plt.figure(figsize=(12, 6))
plt.plot(x, y, linewidth=2.5, label=f't-distribution (df={n-1})')

# Critical value
t_crit = stats.t.ppf(1 - alpha, df=n-1)
x_reject = x[x >= t_crit]
y_reject = stats.t.pdf(x_reject, df=n-1)
plt.fill_between(x_reject, y_reject, alpha=0.3, color='red',
                label=f'Rejection region (α={alpha})')

# p-value region
x_pvalue = x[x >= t_stat]
y_pvalue = stats.t.pdf(x_pvalue, df=n-1)
plt.fill_between(x_pvalue, y_pvalue, alpha=0.5, color='orange',
                label=f'p-value = {p_value:.4f}')

plt.axvline(t_crit, color='red', linestyle='--', linewidth=2,
           label=f't-critical = {t_crit:.3f}')
plt.axvline(t_stat, color='blue', linestyle='--', linewidth=2.5,
           label=f't-statistic = {t_stat:.3f}')

plt.xlabel('t-value', fontsize=12)
plt.ylabel('Probability Density', fontsize=12)
plt.title('One-Tailed Hypothesis Test: Coffee Shop Wait Times', 
         fontsize=14, fontweight='bold')
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Exercise 3: A/B Test

An e-commerce site tests two checkout button colors:
- Red button: 180 users, 54 conversions (30%)
- Blue button: 200 users, 70 conversions (35%)

Tasks:
1. State the hypotheses
2. Perform a two-proportion z-test
3. Calculate the p-value
4. Make a conclusion at α = 0.05

In [None]:
# Your code here
n_red = 180
conversions_red = 54
n_blue = 200
conversions_blue = 70

p_red = conversions_red / n_red
p_blue = conversions_blue / n_blue

alpha = 0.05

print("=== Exercise 3 Solution: A/B Test ===")

# 1. Hypotheses
print("1. Hypotheses:")
print("   H0: p_red = p_blue (no difference in conversion rates)")
print("   Ha: p_red ≠ p_blue (conversion rates are different)")

print(f"\nData:")
print(f"   Red button: {conversions_red}/{n_red} = {p_red:.4f} ({p_red*100:.2f}%)")
print(f"   Blue button: {conversions_blue}/{n_blue} = {p_blue:.4f} ({p_blue*100:.2f}%)")
print(f"   Difference: {(p_blue - p_red)*100:.2f} percentage points")

# 2. Two-proportion z-test
# Pooled proportion
p_pool = (conversions_red + conversions_blue) / (n_red + n_blue)

# Standard error
se = np.sqrt(p_pool * (1 - p_pool) * (1/n_red + 1/n_blue))

# Z-statistic
z_stat = (p_blue - p_red) / se

# P-value (two-tailed)
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

print(f"\n2. Test Statistics:")
print(f"   Pooled proportion: {p_pool:.4f}")
print(f"   Standard error: {se:.4f}")
print(f"   z-statistic: {z_stat:.4f}")

print(f"\n3. P-value: {p_value:.6f}")

# 4. Conclusion
print(f"\n4. Decision at α = {alpha}:")
if p_value < alpha:
    print(f"   p-value ({p_value:.6f}) < α ({alpha})")
    print(f"   REJECT H0")
    print(f"   Conclusion: Blue button has SIGNIFICANTLY higher conversion rate!")
    print(f"   Recommendation: Use the BLUE button.")
else:
    print(f"   p-value ({p_value:.6f}) ≥ α ({alpha})")
    print(f"   FAIL TO REJECT H0")
    print(f"   Conclusion: No significant difference in conversion rates.")
    print(f"   Recommendation: Either button is fine, or collect more data.")

# Visualization
categories = ['Red Button', 'Blue Button']
conversion_rates = [p_red * 100, p_blue * 100]
sample_sizes = [n_red, n_blue]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Conversion rates
bars = axes[0].bar(categories, conversion_rates, color=['red', 'blue'], 
                   alpha=0.7, edgecolor='black')
axes[0].set_ylabel('Conversion Rate (%)', fontsize=12)
axes[0].set_title('Conversion Rate Comparison', fontsize=13, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')

# Add value labels
for bar, rate in zip(bars, conversion_rates):
    height = bar.get_height()
    axes[0].text(bar.get_x() + bar.get_width()/2., height,
                f'{rate:.1f}%', ha='center', va='bottom', fontsize=11, fontweight='bold')

# Stacked bar for conversions
converted = [conversions_red, conversions_blue]
not_converted = [n_red - conversions_red, n_blue - conversions_blue]

axes[1].bar(categories, converted, label='Converted', color='green', alpha=0.7, edgecolor='black')
axes[1].bar(categories, not_converted, bottom=converted, label='Not Converted', 
           color='gray', alpha=0.7, edgecolor='black')
axes[1].set_ylabel('Number of Users', fontsize=12)
axes[1].set_title('Conversion Counts', fontsize=13, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## 8. Summary and Key Takeaways

In this module, you learned:

✅ **Population vs Sample**
- Populations are complete groups; samples are subsets
- Sample statistics estimate population parameters
- $\bar{x}$ estimates $\mu$, $s$ estimates $\sigma$

✅ **Central Limit Theorem**
- Sample means are approximately normally distributed for large $n$
- Standard error: $SE = \sigma / \sqrt{n}$
- Larger samples → smaller standard error → more precise estimates

✅ **Confidence Intervals**
- Range of plausible values for population parameter
- Higher confidence → wider interval
- 95% CI: About 95% of such intervals contain true parameter

✅ **Hypothesis Testing**
- Formal framework for making decisions from data
- p-value: Probability of data at least as extreme as observed, assuming H₀ is true
- If p < α: Reject H₀ (statistically significant)
- Type I error (α): False positive
- Type II error (β): False negative

✅ **A/B Testing**
- Compare two groups to find significant differences
- Two-sample t-test for means
- Two-proportion z-test for proportions
- Critical for data-driven decision making

### What's Next?

In **Module 04: Linear Algebra Foundations**, you'll learn:
- Vectors and vector operations
- Matrices and matrix operations
- Systems of linear equations
- Applications to data science and machine learning

### Additional Resources

- [Khan Academy - Inference](https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample)
- [StatQuest - Hypothesis Testing](https://www.youtube.com/watch?v=0oc49DyA3hU)
- [Evan Miller - A/B Testing](https://www.evanmiller.org/ab-testing/)
- [3Blue1Brown - Central Limit Theorem](https://www.youtube.com/watch?v=zeJD6dqJ5lo)

---

**Fantastic progress!** You now understand how to make statistical inferences from sample data - a critical skill for data science.

**Next**: Proceed to `04_linear_algebra_foundations.ipynb`