# Probability & Statistics for ML

## Key Concepts
- Random variables, expectations, variance
- Common distributions (Gaussian, Bernoulli, Categorical)
- Bayes' rule
- Maximum Likelihood Estimation (MLE)

## References
- Bishop PRML: Chapters 1-2
- FOML: Probability sections

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

## 1. Gaussian Distribution

PDF: $p(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

Log-likelihood: $\log p(x|\mu, \sigma^2) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}$
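
As a quick sanity check, the log-density formula above can be coded term by term and compared with `scipy.stats.norm.logpdf`. This is a minimal sketch: the helper name `gaussian_log_pdf` and the evaluation points are assumptions chosen for illustration. Note that SciPy's `scale` argument is the standard deviation $\sigma$, not the variance $\sigma^2$.

```python
import numpy as np
from scipy import stats

def gaussian_log_pdf(x, mu, sigma2):
    """Log-density of N(mu, sigma2), written term by term as in the formula."""
    return -0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)

# Illustrative evaluation points and parameters (assumed, not from the notebook)
x_check = np.array([-1.0, 0.0, 2.5])
mu_check, sigma_check = 1.0, 1.5

ours = gaussian_log_pdf(x_check, mu_check, sigma_check ** 2)
# scipy expects the standard deviation in `scale`
reference = stats.norm.logpdf(x_check, loc=mu_check, scale=sigma_check)

print("Formula matches scipy:", np.allclose(ours, reference))
```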

In [None]:
# Visualize Gaussian distributions
x = np.linspace(-5, 5, 100)

plt.figure(figsize=(10, 4))
for mu, sigma in [(0, 1), (0, 0.5), (1, 1.5)]:
    y = stats.norm.pdf(x, mu, sigma)
    plt.plot(x, y, label=f'μ={mu}, σ={sigma}')

plt.xlabel('x')
plt.ylabel('p(x)')
plt.title('Gaussian Distributions')
plt.legend()
plt.show()

## 2. Maximum Likelihood Estimation (MLE)

For a Gaussian with i.i.d. samples $\{x_1, \dots, x_n\}$, the maximum likelihood estimates are:

$$\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^n x_i$$

$$\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu}_{MLE})^2$$
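
A short derivation sketch of these estimates, using the single-sample log-likelihood from Section 1. The symbol $\ell$ for the total log-likelihood is introduced here for convenience. Because the samples are i.i.d., the log-likelihoods add:

$$\ell(\mu, \sigma^2) = \sum_{i=1}^n \log p(x_i|\mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2$$

Setting the partial derivatives to zero gives the two estimates:

$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0 \;\Rightarrow\; \hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^n x_i$$

$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (x_i - \mu)^2 = 0 \;\Rightarrow\; \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu}_{MLE})^2$$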

In [None]:
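def gaussian_mle(X):
    """Return the MLE of the mean and variance for a 1-D Gaussian sample X."""
    mu_mle = np.mean(X)
    # Divide by n (not n - 1): this is the MLE, which is biased for finite n
    sigma2_mle = np.mean((X - mu_mle)**2)
    return mu_mle, sigma2_mle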

# Generate samples and estimate
np.random.seed(42)
true_mu, true_sigma = 2.0, 1.5
X = np.random.normal(true_mu, true_sigma, size=1000)

mu_mle, sigma2_mle = gaussian_mle(X)
print(f"True: μ={true_mu}, σ²={true_sigma**2}")
print(f"MLE:  μ={mu_mle:.3f}, σ²={sigma2_mle:.3f}")
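
The MLE of the variance divides by $n$, so it underestimates the true variance on average; dividing by $n - 1$ gives the unbiased estimator. The sketch below illustrates the relationship on a small sample, where the difference is visible. The sample size, seed, and variable names are assumptions for illustration. In NumPy, `np.var` defaults to `ddof=0` (the MLE), while `ddof=1` gives the unbiased version.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
sample = rng.normal(loc=2.0, scale=1.5, size=20)  # small n makes the gap visible
n_sample = sample.size

# MLE: divide by n (same as np.var with the default ddof=0)
var_mle = np.mean((sample - sample.mean()) ** 2)
# Unbiased estimator: divide by n - 1
var_unbiased = np.var(sample, ddof=1)

print(f"MLE variance:      {var_mle:.3f}")
print(f"Unbiased variance: {var_unbiased:.3f}")
print("Ratio unbiased / MLE equals n / (n - 1):",
      np.isclose(var_unbiased / var_mle, n_sample / (n_sample - 1)))
```

For large $n$ (such as the 1000 samples used above) the two estimators are nearly identical.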

## 3. Bayes' Rule

$$P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}$$

- $P(\theta|D)$: Posterior (what we want)
- $P(D|\theta)$: Likelihood
- $P(\theta)$: Prior
- $P(D)$: Evidence (normalizing constant)
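
To make the four terms concrete, here is a minimal sketch with a discrete parameter. It assumes a hypothetical setup in which $\theta$ (the probability of heads) can only be 0.5 or 0.8, both equally likely a priori, and the data $D$ are 7 heads in 10 flips. With a discrete $\theta$, the evidence is simply a sum over the candidate values.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: two candidate values of theta = P(heads)
thetas = np.array([0.5, 0.8])
prior = np.array([0.5, 0.5])          # P(theta)

# Data D: 7 heads in 10 flips; binomial likelihood P(D | theta)
likelihood = stats.binom.pmf(7, 10, thetas)

# Evidence P(D): sum of likelihood * prior over all candidate thetas
evidence = np.sum(likelihood * prior)

# Posterior P(theta | D) via Bayes' rule
posterior = likelihood * prior / evidence

for theta, post in zip(thetas, posterior):
    print(f"P(theta={theta} | D) = {post:.3f}")
```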

In [None]:
# Bayesian coin flip example
# Prior: Beta(a, b) on probability p of heads
# Likelihood: Binomial
# Posterior: Beta(a + heads, b + tails)

def bayesian_coin_update(prior_a, prior_b, n_heads, n_tails):
    """Update Beta prior with observed coin flips"""
    post_a = prior_a + n_heads
    post_b = prior_b + n_tails
    return post_a, post_b

# Start with uniform prior Beta(1, 1)
prior_a, prior_b = 1, 1

# Observe 7 heads, 3 tails
n_heads, n_tails = 7, 3
post_a, post_b = bayesian_coin_update(prior_a, prior_b, n_heads, n_tails)

p = np.linspace(0, 1, 100)
prior_pdf = stats.beta.pdf(p, prior_a, prior_b)
posterior_pdf = stats.beta.pdf(p, post_a, post_b)

plt.figure(figsize=(10, 4))
plt.plot(p, prior_pdf, label=f'Prior Beta({prior_a},{prior_b})')
plt.plot(p, posterior_pdf, label=f'Posterior Beta({post_a},{post_b})')
plt.axvline(n_heads/(n_heads+n_tails), color='r', linestyle='--', label='MLE')
plt.xlabel('p (probability of heads)')
plt.ylabel('Density')
plt.title('Bayesian Update for Coin Flip')
plt.legend()
plt.show()
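
Beyond plotting, the posterior can be summarized numerically. The sketch below uses the Beta(8, 4) posterior from the example above (uniform prior plus 7 heads and 3 tails); the variable names are assumptions for illustration. The mean and mode use the standard closed forms for a Beta distribution (the mode formula requires both parameters to exceed 1), and the interval comes from `scipy.stats.beta.ppf`.

```python
import numpy as np
from scipy import stats

# Posterior parameters from the coin example: Beta(1 + 7, 1 + 3)
a_post, b_post = 8, 4

# Closed-form summaries of a Beta(a, b) distribution
post_mean = a_post / (a_post + b_post)
post_mode = (a_post - 1) / (a_post + b_post - 2)  # MAP estimate, valid for a, b > 1

# Central 95% credible interval from the posterior quantiles
lower, upper = stats.beta.ppf([0.025, 0.975], a_post, b_post)

print(f"Posterior mean: {post_mean:.3f}")
print(f"Posterior mode (MAP): {post_mode:.3f}")
print(f"95% credible interval: [{lower:.3f}, {upper:.3f}]")
```

With a uniform prior, the posterior mode coincides with the MLE of 0.7, while the posterior mean is pulled slightly toward 0.5.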

## 4. Monte Carlo Estimation

Approximate an expectation by averaging $f$ over samples drawn from $p(x)$:
$$E[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(x_i) \quad \text{where } x_i \sim p(x)$$

In [None]:
def monte_carlo_expectation(f, samples):
    """Estimate E[f(X)] via Monte Carlo"""
    return np.mean(f(samples))

# Estimate E[X^2] for X ~ N(0, 1); the true value is Var(X) + E[X]^2 = 1
X = np.random.normal(0, 1, size=10000)
f = lambda x: x**2

estimate = monte_carlo_expectation(f, X)
print(f"Monte Carlo estimate of E[X²]: {estimate:.4f}")
print("True value: 1.0")
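
A Monte Carlo estimate is itself random, so it helps to report its standard error, which shrinks as $1/\sqrt{n}$. The following is a minimal sketch: the helper name `monte_carlo_with_error`, the seed, and the sample sizes are assumptions for illustration, and the standard error is estimated from the sample standard deviation of $f(x_i)$.

```python
import numpy as np

def monte_carlo_with_error(f, samples):
    """Return the Monte Carlo estimate of E[f(X)] and its standard error."""
    values = f(samples)
    estimate = np.mean(values)
    # Standard error of the mean: sample std of f(x_i) divided by sqrt(n)
    std_error = np.std(values, ddof=1) / np.sqrt(values.size)
    return estimate, std_error

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
for n in [100, 10_000, 1_000_000]:
    draws = rng.normal(0.0, 1.0, size=n)
    est, se = monte_carlo_with_error(np.square, draws)
    print(f"n={n:>9,}: E[X²] ≈ {est:.4f} ± {se:.4f}")
```

Each 100-fold increase in $n$ should cut the standard error by roughly a factor of 10.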

## Exercises

1. Derive the MLE for the Bernoulli distribution
2. Implement the MLE for a multivariate Gaussian
3. Use Monte Carlo to estimate π (hint: sample points uniformly in the unit square and count the fraction that land inside the quarter unit circle)
4. Implement Bayesian linear regression with a conjugate prior