# Statistics for ML — FAANG-Level Lab

**Goal:** Confidence intervals, hypothesis testing, and interpretation for ML engineering.

**Outcome:** You can quantify uncertainty and avoid p-value traps.


In [None]:
import numpy as np

def check(name: str, cond: bool):
    if not cond:
        raise AssertionError(f'Failed: {name}')
    print(f'OK: {name}')

rng = np.random.default_rng(0)

## Section 1 — Estimators (Mean/Variance)

### Task 1.1: Unbiased sample variance
Implement sample mean and unbiased sample variance (ddof=1) without calling np.var(..., ddof=1).

**Hint:**
- mean = sum(x)/n
- unbiased var = sum((x-mean)^2)/(n-1)

**Explain:** Why divide by (n-1) instead of n?
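
As a starting point for the question above, a small simulation can make the bias visible. This is a minimal sketch, not part of the graded task: the names `demo_rng`, `n_small`, and `n_trials` are illustrative assumptions, and the sample size is kept tiny on purpose so the gap between the two divisors is easy to see.

```python
import numpy as np

demo_rng = np.random.default_rng(42)  # separate generator; leaves the lab's rng untouched
n_small, n_trials = 5, 100_000        # hypothetical settings chosen for illustration

# Each row is one sample of size n_small from N(0, 1), so the true variance is 1.
samples = demo_rng.standard_normal((n_trials, n_small))
centered = samples - samples.mean(axis=1, keepdims=True)
sum_sq = (centered ** 2).sum(axis=1)

avg_var_div_n = (sum_sq / n_small).mean()          # divides by n
avg_var_div_n1 = (sum_sq / (n_small - 1)).mean()   # divides by n - 1

print('average estimate, divide by n    :', round(avg_var_div_n, 3))
print('average estimate, divide by n - 1:', round(avg_var_div_n1, 3))
```

Averaged over many samples, the divide-by-n estimate lands near (n-1)/n of the true variance (about 0.8 here), while the divide-by-(n-1) estimate lands near 1. Deviations are measured from the sample mean, which is itself fitted to the data, so they are systematically a little too small.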

In [None]:
def sample_mean(x):
    # TODO
    ...

def sample_var_unbiased(x):
    # TODO
    ...

x = rng.standard_normal(1000)
m = sample_mean(x)
v = sample_var_unbiased(x)
check('mean_close', abs(m - x.mean()) < 1e-10)
check('var_close', abs(v - x.var(ddof=1)) < 1e-8)

## Section 2 — Confidence Interval for Mean (Normal approx)

### Task 2.1: 95% CI for mean
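Compute a 95% CI for the mean using the normal approximation:
CI = mean ± z * s/sqrt(n), where s is the sample standard deviation and z is the (1 - alpha/2) quantile of the standard normal (z ≈ 1.96 for alpha = 0.05).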

**Hint:**
- Use unbiased sample std

**FAANG gotcha:** The CI describes uncertainty about the population mean. It is not a range that contains 95% of individual outcomes.
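
The value z ≈ 1.96 is the 0.975 quantile of the standard normal distribution, which leaves alpha/2 = 0.025 in each tail. Since the lab avoids scipy, one way to get z for an arbitrary `alpha` is the standard library's `statistics.NormalDist`. The helper name `z_critical` below is a hypothetical one chosen for this sketch.

```python
from statistics import NormalDist

def z_critical(alpha: float = 0.05) -> float:
    """Two-sided critical value: the (1 - alpha/2) quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for a in (0.10, 0.05, 0.01):
    print(f'alpha={a:.2f} -> z={z_critical(a):.3f}')
```

A smaller `alpha` gives a larger z and therefore a wider interval.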

In [None]:
def mean_ci_normal(x, alpha=0.05):
    # TODO: return (lo, hi)
    ...

x = rng.normal(loc=2.0, scale=3.0, size=500)
lo, hi = mean_ci_normal(x)
print('CI', (lo, hi), 'mean', x.mean())
check('order', lo < x.mean() < hi)

### Task 2.2: Coverage simulation
Simulate repeated sampling from Normal(mu=0, sigma=1).
Estimate how often the 95% CI contains the true mean.

**Hint:**
- Run many trials
- Count coverage

**Explain:** Why isn't coverage exactly 0.95 in a finite simulation?
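
One ingredient of the answer is Monte Carlo noise: the estimated coverage is itself a sample proportion, so it has a binomial standard error. The sketch below computes that error for the default `trials=2000`. The helper name `mc_standard_error` is an assumption made for illustration.

```python
import math

def mc_standard_error(p: float, trials: int) -> float:
    """Standard error of a proportion estimated from `trials` independent trials."""
    return math.sqrt(p * (1.0 - p) / trials)

coverage_se = mc_standard_error(0.95, 2000)
print('standard error of estimated coverage:', round(coverage_se, 4))
print('rough 2-sigma band:', (round(0.95 - 2 * coverage_se, 3), round(0.95 + 2 * coverage_se, 3)))
```

With 2000 trials the standard error is roughly 0.005, so estimates about a percentage point away from 0.95 are unsurprising. Separately, using z with an estimated s (instead of a t quantile) tends to push the true coverage slightly below 0.95 at n=50.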

In [None]:
def estimate_ci_coverage(trials=2000, n=50):
    # TODO
    ...

cov = estimate_ci_coverage(trials=2000, n=50)
print('coverage', cov)
check('reasonable', 0.92 < cov < 0.98)

## Section 3 — Hypothesis Testing (Two-sample test intuition)

### Task 3.1: Permutation test for A/B (no scipy)
Given samples A and B, use a permutation test to check whether the difference mean(B) - mean(A) is statistically significant.

**Hint:**
- Combine samples
- Shuffle and split
- Compute diff distribution
- p-value = fraction of permuted diffs >= observed diff (one-sided); for a two-sided test, compare absolute values: fraction of |diff| >= |observed diff|

**FAANG gotcha:** The p-value is P(data at least this extreme | H0), not P(H0 is true | data).
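
To make this gotcha concrete, the sketch below simulates many hypothetical A/B tests and asks: among the "significant" ones, how often was H0 actually true? Everything here is an assumption chosen for illustration: 10% of experiments have a real effect of 0.2, sigma is known to be 1, and a two-sided z-test stands in for the permutation test to keep the simulation fast.

```python
import math
import numpy as np

sim_rng = np.random.default_rng(1)  # separate generator; leaves the lab's rng untouched
n_experiments = 5000                # hypothetical number of A/B tests
n_per_arm = 200
prior_real = 0.10                   # assumed share of experiments with a real effect
effect = 0.2                        # assumed true lift when the effect is real

is_real = sim_rng.random(n_experiments) < prior_real
true_diff = np.where(is_real, effect, 0.0)

# With sigma = 1 in both arms, mean(B) - mean(A) has standard error sqrt(2 / n).
std_err = math.sqrt(2.0 / n_per_arm)
observed_diff = sim_rng.normal(true_diff, std_err)

# Two-sided z-test p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
z_scores = observed_diff / std_err
p_values = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in z_scores])

significant = p_values < 0.05
share_null_among_significant = float(np.mean(~is_real[significant]))
print('significant results:', int(significant.sum()))
print('share of significant results where H0 was true:',
      round(share_null_among_significant, 3))
```

Under these assumed settings the share of significant results that come from true nulls is far above 0.05. The answer depends on the base rate of real effects and on power, neither of which the p-value knows about.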

In [None]:
def permutation_pvalue(A, B, n_perm=5000, two_sided=True):
    # TODO
    ...

A = rng.normal(0.0, 1.0, size=200)
B = rng.normal(0.2, 1.0, size=200)
p = permutation_pvalue(A, B, n_perm=2000)
print('p-value', p)
check('p_range', 0 <= p <= 1)

## Section 4 — Multiple Comparisons (Gotcha)

### Task 4.1: Bonferroni correction
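If you run m tests and want the family-wise error rate (the chance of at least one false positive) to stay at or below alpha=0.05, Bonferroni tests each one at alpha/m.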

Compute adjusted alpha for m=20 and explain why this matters in feature slicing / metric dashboards.
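
To see why this matters, the sketch below computes the chance of at least one false positive across several tests when every null hypothesis is true. It assumes the tests are independent, which dashboards with correlated slices will not satisfy exactly, and the helper name `familywise_error_rate` is a hypothetical one for this example.

```python
def familywise_error_rate(alpha_per_test: float, n_tests: int) -> float:
    """P(at least one false positive) for independent tests when all nulls are true."""
    return 1.0 - (1.0 - alpha_per_test) ** n_tests

fwer_uncorrected = familywise_error_rate(0.05, 20)
fwer_bonferroni = familywise_error_rate(0.05 / 20, 20)
print('FWER, 20 tests at alpha=0.05   :', round(fwer_uncorrected, 3))
print('FWER, 20 tests at alpha=0.05/20:', round(fwer_bonferroni, 3))
```

With 20 uncorrected tests, the chance of at least one spurious "win" is about 64%. With the Bonferroni threshold it drops back to just under 5%.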


In [None]:
m = 20
alpha = 0.05
# TODO
alpha_adj = ...
print('alpha_adj', alpha_adj)
check('alpha_adj', abs(alpha_adj - 0.0025) < 1e-12)

---
## Submission Checklist
- All TODOs completed
- Checks pass
- Explain prompts answered
