# Bayesian Statistics for Machine Learning
## Uncertainty Estimation and Probabilistic Models

Welcome to the **science of reasoning under uncertainty**! Bayesian statistics provides the mathematical framework for incorporating prior knowledge and updating beliefs as new evidence arrives.

### What You'll Master
By the end of this notebook, you'll understand:
1. **Bayes' theorem** - The foundation of probabilistic reasoning
2. **Prior and posterior distributions** - Encoding and updating beliefs
3. **Bayesian inference** - Learning from data with uncertainty
4. **Conjugate priors** - Mathematical convenience in Bayesian analysis
5. **MCMC methods** - Sampling from complex distributions
6. **Bayesian neural networks** - Deep learning with uncertainty

### Why This is Revolutionary
- **Uncertainty quantification** - Know how confident your model is
- **Prior knowledge** - Incorporate domain expertise
- **Small data learning** - Make inferences with limited samples
- **Model selection** - Compare models probabilistically

### Real-World Applications
- **Medical diagnosis**: Updating disease probability with test results
- **A/B testing**: Deciding which variant is better
- **Autonomous vehicles**: Reasoning about uncertain environments
- **Finance**: Portfolio optimization under uncertainty

Let's embrace uncertainty and turn it into knowledge! 🎲

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import norm, beta, gamma, poisson, binom, uniform
from scipy.optimize import minimize_scalar
import pandas as pd
from sklearn.datasets import make_classification, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette("viridis")
np.random.seed(42)

print("🎲 Bayesian Statistics toolkit loaded!")
print("Ready to reason under uncertainty!")

## 1. Bayes' Theorem: The Foundation

### The Most Important Equation in Statistics
```
P(θ|D) = P(D|θ) × P(θ) / P(D)
```

Where:
- **P(θ|D)**: Posterior - What we believe after seeing data
- **P(D|θ)**: Likelihood - How probable the data is given our hypothesis
- **P(θ)**: Prior - What we believed before seeing data
- **P(D)**: Evidence - Total probability of observing the data
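
To make the four terms concrete, here is a minimal sketch for a finite set of hypotheses. The helper `discrete_posterior` and the two-urn setup are hypothetical illustrations, not part of the notebook's toolkit.

```python
import numpy as np

def discrete_posterior(prior, likelihood):
    """Apply Bayes' theorem over a finite set of hypotheses.

    Returns the posterior P(theta | D) and the evidence P(D).
    """
    prior = np.asarray(prior, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    unnormalized = likelihood * prior   # P(D | theta) * P(theta)
    evidence = unnormalized.sum()       # P(D), summed over all hypotheses
    return unnormalized / evidence, evidence

# Hypothetical setup: two urns chosen with equal probability.
# Urn A holds 75% red balls, urn B holds 25% red balls, and we draw a red ball.
prior = [0.5, 0.5]
likelihood = [0.75, 0.25]
posterior, evidence = discrete_posterior(prior, likelihood)

print(f"P(D) = {evidence:.3f}")
print(f"P(urn A | red) = {posterior[0]:.3f}, P(urn B | red) = {posterior[1]:.3f}")
```

The evidence only rescales the numerator so the posterior sums to 1, which is why it can often be ignored when comparing hypotheses.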

### The Bayesian Philosophy
1. **Start with prior beliefs** (even if uninformative)
2. **Observe data** and calculate likelihood
3. **Update beliefs** to get posterior distribution
4. **Quantify uncertainty** throughout the process
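
The four steps above can be sketched with a grid approximation, which needs no conjugacy. The data (7 heads in 10 flips) and the grid size are illustrative assumptions.

```python
import numpy as np

# Step 1: prior beliefs over a grid of candidate coin biases (flat prior here)
theta = np.linspace(0, 1, 1001)
prior = np.ones_like(theta)

# Step 2: likelihood of the assumed data, 7 heads and 3 tails.
# The binomial coefficient is omitted because it cancels when normalizing.
heads, tails = 7, 3
likelihood = theta**heads * (1 - theta)**tails

# Step 3: update beliefs by multiplying and normalizing over the grid
unnormalized = prior * likelihood
posterior = unnormalized / unnormalized.sum()

# Step 4: quantify uncertainty with a posterior mean and a 95% credible interval
post_mean = np.sum(theta * posterior)
cdf = np.cumsum(posterior)
lower = theta[np.searchsorted(cdf, 0.025)]
upper = theta[np.searchsorted(cdf, 0.975)]

print(f"Posterior mean: {post_mean:.3f}")
print(f"95% credible interval: [{lower:.3f}, {upper:.3f}]")
```

With a flat prior the result should closely match the exact Beta(8, 4) posterior, whose mean is 8/12.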

### Bayesian vs Frequentist
| Aspect | Bayesian | Frequentist |
|--------|----------|-------------|
| Parameters | Random variables | Fixed constants |
| Probability | Degree of belief | Long-run frequency |
| Inference | Posterior distribution | Point estimates + confidence intervals |
| Prior knowledge | Explicitly incorporated via the prior | Not formally part of the model |
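
As a rough illustration of the "Inference" row, the sketch below computes both kinds of interval for the same assumed data (7 heads in 10 flips). The Wald interval is only one of several frequentist choices, and the uniform Beta(1, 1) prior is an assumption.

```python
import numpy as np
from scipy.stats import beta, norm

heads, n = 7, 10

# Frequentist: point estimate plus a Wald (normal-approximation) confidence interval
p_hat = heads / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
z = norm.ppf(0.975)
wald_ci = (p_hat - z * se, p_hat + z * se)

# Bayesian: a full posterior distribution, Beta(1 + heads, 1 + tails),
# assuming a uniform Beta(1, 1) prior
posterior = beta(1 + heads, 1 + (n - heads))
credible = posterior.interval(0.95)

print(f"Point estimate: {p_hat:.3f}, 95% confidence interval: [{wald_ci[0]:.3f}, {wald_ci[1]:.3f}]")
print(f"Posterior mean: {posterior.mean():.3f}, 95% credible interval: [{credible[0]:.3f}, {credible[1]:.3f}]")
```

The two intervals answer different questions: the confidence interval describes the long-run behavior of the procedure, while the credible interval is a direct probability statement about θ given the data and the prior.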

### Real-World Analogy
Think of Bayes' theorem as **detective work**:
- **Prior**: Your initial suspicion about the suspect
- **Likelihood**: How well the evidence fits your theory
- **Posterior**: Your updated belief after considering evidence
- **Evidence**: How likely the clues are overall, averaged across every possible explanation

In [None]:
def demonstrate_bayes_theorem():
    """Explore Bayes' theorem with classic examples"""
    
    print("🎯 Bayes' Theorem: Updating Beliefs with Evidence")
    print("=" * 55)
    
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    
    # 1. Medical diagnosis example
    print("\n1. Medical Diagnosis: The Base Rate Fallacy")
    print("   Disease prevalence: 1 in 1000 people")
    print("   Test accuracy: 99% (both sensitivity and specificity)")
    print("   Question: If you test positive, what's P(disease)?")
    
    # Prior probability
    p_disease = 0.001
    p_no_disease = 1 - p_disease
    
    # Likelihood
    p_positive_given_disease = 0.99  # Sensitivity
    p_positive_given_no_disease = 0.01  # 1 - Specificity
    
    # Evidence (total probability)
    p_positive = (p_positive_given_disease * p_disease + 
                  p_positive_given_no_disease * p_no_disease)
    
    # Posterior
    p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive
    
    print(f"   P(Disease) = {p_disease:.3f} (prior)")
    print(f"   P(+|Disease) = {p_positive_given_disease:.3f} (likelihood)")
    print(f"   P(+|No Disease) = {p_positive_given_no_disease:.3f}")
    print(f"   P(+) = {p_positive:.3f} (evidence)")
    print(f"   P(Disease|+) = {p_disease_given_positive:.3f} (posterior)")
    print(f"   Surprising result: Only {p_disease_given_positive*100:.1f}% chance of disease!")
    
    # Visualize the medical test scenario
    categories = ['True Positive\n(Disease & +)', 'False Positive\n(No Disease & +)', 
                  'True Negative\n(No Disease & -)', 'False Negative\n(Disease & -)']
    
    values = [p_disease * p_positive_given_disease,
              p_no_disease * p_positive_given_no_disease,
              p_no_disease * (1 - p_positive_given_no_disease),
              p_disease * (1 - p_positive_given_disease)]
    
    colors = ['green', 'red', 'lightblue', 'orange']
    
    wedges, texts, autotexts = axes[0, 0].pie(values, labels=categories, colors=colors, 
                                             autopct='%1.3f%%', startangle=90)
    axes[0, 0].set_title('Medical Test Outcomes\n(% of Population)')
    
    # Make text readable
    for autotext in autotexts:
        autotext.set_color('black')
        autotext.set_fontsize(8)
    
    # 2. Coin flip with uncertain fairness
    print("\n2. Coin Fairness: Learning from Flips")
    print("   Prior: Uniform belief about coin bias")
    print("   Data: 7 heads out of 10 flips")
    
    # Beta-Binomial conjugate prior
    # Prior: Beta(1, 1) - uniform
    alpha_prior, beta_prior = 1, 1
    
    # Data
    heads, total_flips = 7, 10
    tails = total_flips - heads
    
    # Posterior: Beta(alpha + heads, beta + tails)
    alpha_posterior = alpha_prior + heads
    beta_posterior = beta_prior + tails
    
    # Plot prior and posterior
    p_values = np.linspace(0, 1, 1000)
    prior_dist = beta.pdf(p_values, alpha_prior, beta_prior)
    posterior_dist = beta.pdf(p_values, alpha_posterior, beta_posterior)
    
    axes[0, 1].plot(p_values, prior_dist, 'b--', label='Prior Beta(1,1)', linewidth=2)
    axes[0, 1].plot(p_values, posterior_dist, 'r-', label=f'Posterior Beta({alpha_posterior},{beta_posterior})', linewidth=2)
    axes[0, 1].axvline(x=0.5, color='gray', linestyle=':', alpha=0.7, label='Fair coin')
    axes[0, 1].axvline(x=heads/total_flips, color='orange', linestyle=':', alpha=0.7, label='Observed rate')
    axes[0, 1].fill_between(p_values, posterior_dist, alpha=0.3, color='red')
    axes[0, 1].set_xlabel('Probability of Heads (θ)')
    axes[0, 1].set_ylabel('Density')
    axes[0, 1].set_title('Bayesian Coin Flip Analysis')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)
    
    # Calculate credible interval
    credible_interval = beta.interval(0.95, alpha_posterior, beta_posterior)
    posterior_mean = alpha_posterior / (alpha_posterior + beta_posterior)
    
    print(f"   Prior: Uniform over [0, 1]")
    print(f"   Likelihood: Binomial with {heads} heads in {total_flips} flips")
    print(f"   Posterior mean: {posterior_mean:.3f}")
    print(f"   95% credible interval: [{credible_interval[0]:.3f}, {credible_interval[1]:.3f}]")
    
    # 3. Sequential updating
    print("\n3. Sequential Learning: Evidence Accumulation")
    print("   How beliefs evolve as we see more data")
    
    # Simulate sequential coin flips
    true_p = 0.3  # True bias
    n_flips = 50
    flips = np.random.binomial(1, true_p, n_flips)
    
    # Track evolution of posterior
    posterior_means = []
    credible_intervals = []
    
    alpha, beta_param = 1, 1  # Start with uniform prior
    
    for i in range(1, n_flips + 1):
        # Update with each flip
        if flips[i-1] == 1:  # Heads
            alpha += 1
        else:  # Tails
            beta_param += 1
        
        # Calculate posterior statistics
        mean = alpha / (alpha + beta_param)
        interval = beta.interval(0.95, alpha, beta_param)
        
        posterior_means.append(mean)
        credible_intervals.append(interval)
    
    flip_numbers = np.arange(1, n_flips + 1)
    posterior_means = np.array(posterior_means)
    credible_intervals = np.array(credible_intervals)
    
    axes[0, 2].plot(flip_numbers, posterior_means, 'b-', linewidth=2, label='Posterior mean')
    axes[0, 2].fill_between(flip_numbers, credible_intervals[:, 0], credible_intervals[:, 1], 
                           alpha=0.3, color='blue', label='95% credible interval')
    axes[0, 2].axhline(y=true_p, color='red', linestyle='--', label=f'True value ({true_p})')
    axes[0, 2].set_xlabel('Number of Flips')
    axes[0, 2].set_ylabel('Estimated P(Heads)')
    axes[0, 2].set_title('Sequential Bayesian Learning')
    axes[0, 2].legend()
    axes[0, 2].grid(True, alpha=0.3)
    
    print(f"   True bias: {true_p}")
    print(f"   Final estimate: {posterior_means[-1]:.3f}")
    print(f"   Final credible interval: [{credible_intervals[-1, 0]:.3f}, {credible_intervals[-1, 1]:.3f}]")
    
    # 4. Effect of different priors
    print("\n4. Prior Sensitivity Analysis")
    print("   How different priors affect conclusions")
    
    # Same data: 7 heads in 10 flips
    heads, total = 7, 10
    
    priors = {
        'Uniform': (1, 1),
        'Optimistic': (2, 1),  # Expect more heads
        'Pessimistic': (1, 2),  # Expect more tails
        'Strong Fair': (10, 10)  # Strong belief in fairness
    }
    
    p_range = np.linspace(0, 1, 1000)
    colors = ['blue', 'green', 'red', 'purple']
    
    for i, (name, (a_prior, b_prior)) in enumerate(priors.items()):
        # Posterior parameters
        a_post = a_prior + heads
        b_post = b_prior + (total - heads)
        
        # Plot posterior
        posterior = beta.pdf(p_range, a_post, b_post)
        axes[1, 0].plot(p_range, posterior, color=colors[i], linewidth=2, label=f'{name} Prior')
        
        # Calculate statistics
        post_mean = a_post / (a_post + b_post)
        post_interval = beta.interval(0.95, a_post, b_post)
        
        print(f"   {name}: Mean = {post_mean:.3f}, 95% credible interval = [{post_interval[0]:.3f}, {post_interval[1]:.3f}]")
    
    axes[1, 0].axvline(x=0.5, color='gray', linestyle=':', alpha=0.7, label='Fair coin')
    axes[1, 0].axvline(x=heads/total, color='orange', linestyle=':', alpha=0.7, label='Observed rate')
    axes[1, 0].set_xlabel('Probability of Heads')
    axes[1, 0].set_ylabel('Posterior Density')
    axes[1, 0].set_title('Effect of Different Priors')
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)
    
    # 5. Bayesian A/B testing
    print("\n5. Bayesian A/B Testing")
    print("   Which variant is better?")
    
    # A/B test data
    visitors_A, conversions_A = 1000, 120
    visitors_B, conversions_B = 1100, 140
    
    # Prior for conversion rates (uniform)
    alpha_prior, beta_prior = 1, 1
    
    # Posteriors
    alpha_A = alpha_prior + conversions_A
    beta_A = beta_prior + (visitors_A - conversions_A)
    
    alpha_B = alpha_prior + conversions_B
    beta_B = beta_prior + (visitors_B - conversions_B)
    
    # Sample from posteriors
    n_samples = 100000
    samples_A = beta.rvs(alpha_A, beta_A, size=n_samples)
    samples_B = beta.rvs(alpha_B, beta_B, size=n_samples)
    
    # Probability that B > A
    prob_B_better = np.mean(samples_B > samples_A)
    
    # Plot posterior distributions
    conv_range = np.linspace(0.08, 0.18, 1000)
    posterior_A = beta.pdf(conv_range, alpha_A, beta_A)
    posterior_B = beta.pdf(conv_range, alpha_B, beta_B)
    
    axes[1, 1].plot(conv_range, posterior_A, 'b-', linewidth=2, label=f'Variant A ({conversions_A}/{visitors_A})')
    axes[1, 1].plot(conv_range, posterior_B, 'r-', linewidth=2, label=f'Variant B ({conversions_B}/{visitors_B})')
    axes[1, 1].fill_between(conv_range, posterior_A, alpha=0.3, color='blue')
    axes[1, 1].fill_between(conv_range, posterior_B, alpha=0.3, color='red')
    axes[1, 1].set_xlabel('Conversion Rate')
    axes[1, 1].set_ylabel('Posterior Density')
    axes[1, 1].set_title('Bayesian A/B Test Comparison')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
    print(f"   Variant A: {conversions_A}/{visitors_A} = {conversions_A/visitors_A:.3f}")
    print(f"   Variant B: {conversions_B}/{visitors_B} = {conversions_B/visitors_B:.3f}")
    print(f"   P(B better than A) = {prob_B_better:.3f}")
    
    # Check the stricter threshold first so that every branch is reachable
    if prob_B_better > 0.95:
        print("   Decision: Strong evidence for B")
    elif prob_B_better > 0.9:
        print("   Decision: Moderate evidence for B")
    elif prob_B_better < 0.05:
        print("   Decision: Strong evidence for A")
    elif prob_B_better < 0.1:
        print("   Decision: Moderate evidence for A")
    else:
        print("   Decision: Inconclusive - need more data")
    
    # 6. Bayesian model comparison
    print("\n6. Bayesian Model Comparison")
    print("   Using a BIC approximation to the Bayes factor")
    
    # Generate some data that clearly follows a trend
    np.random.seed(42)
    x = np.linspace(0, 10, 20)
    y_true = 2 * x + 1  # Linear trend
    y_observed = y_true + np.random.normal(0, 1, len(x))  # Add noise
    
    # Model comparison: constant vs linear
    # Constant model: y = c
    constant_pred = np.mean(y_observed)
    constant_mse = np.mean((y_observed - constant_pred)**2)
    
    # Linear model: y = ax + b
    coeffs = np.polyfit(x, y_observed, 1)
    linear_pred = coeffs[0] * x + coeffs[1]
    linear_mse = np.mean((y_observed - linear_pred)**2)
    
    # Plot data and models
    axes[1, 2].scatter(x, y_observed, alpha=0.7, color='black', label='Observed data')
    axes[1, 2].axhline(y=constant_pred, color='red', linestyle='--', linewidth=2, label=f'Constant model (MSE={constant_mse:.2f})')
    axes[1, 2].plot(x, linear_pred, color='blue', linewidth=2, label=f'Linear model (MSE={linear_mse:.2f})')
    axes[1, 2].plot(x, y_true, color='green', linestyle=':', linewidth=2, label='True relationship')
    axes[1, 2].set_xlabel('x')
    axes[1, 2].set_ylabel('y')
    axes[1, 2].set_title('Bayesian Model Comparison')
    axes[1, 2].legend()
    axes[1, 2].grid(True, alpha=0.3)
    
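    # Rough BIC approximation to the Bayes factor, assuming Gaussian noise:
    # BIC = n*ln(MSE) + k*ln(n), and BF(linear/constant) ≈ exp((BIC_constant - BIC_linear) / 2)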
    n = len(x)
    k_constant = 1  # Number of parameters
    k_linear = 2
    
    bic_constant = n * np.log(constant_mse) + k_constant * np.log(n)
    bic_linear = n * np.log(linear_mse) + k_linear * np.log(n)
    
    bayes_factor_approx = np.exp((bic_constant - bic_linear) / 2)
    
    print(f"   Constant model MSE: {constant_mse:.3f}")
    print(f"   Linear model MSE: {linear_mse:.3f}")
    print(f"   Approximate Bayes factor (Linear/Constant): {bayes_factor_approx:.2e}")
    
    if bayes_factor_approx > 10:
        print(f"   Decision: Strong evidence for linear model")
    elif bayes_factor_approx > 3:
        print(f"   Decision: Moderate evidence for linear model")
    else:
        print(f"   Decision: Weak evidence for model preference")
    
    plt.tight_layout()
    plt.show()
    
    print("\n🎯 Key Bayesian Concepts:")
    print("• Prior beliefs get updated with evidence via Bayes' theorem")
    print("• Uncertainty is quantified through probability distributions")
    print("• Conjugate priors make calculations analytically tractable")
    print("• Sequential learning: beliefs evolve as more data arrives")
    print("• Model comparison uses Bayes factors, not just point estimates")

demonstrate_bayes_theorem()
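
One property worth checking for the sequential example: with a conjugate Beta prior, updating one flip at a time gives the same posterior as updating once on all the data. The sketch below assumes simulated flips with a hypothetical bias of 0.3 and its own random seed, so the numbers will differ from the run above.

```python
import numpy as np

# Simulated data (assumption: 50 flips of a coin with bias 0.3)
rng = np.random.default_rng(0)
flips = rng.binomial(1, 0.3, size=50)

# Sequential updating: start from Beta(1, 1) and add one observation at a time
a_seq, b_seq = 1, 1
for flip in flips:
    if flip == 1:
        a_seq += 1   # heads increments alpha
    else:
        b_seq += 1   # tails increments beta

# Batch updating: add all heads and tails to the prior in a single step
heads = int(flips.sum())
a_batch = 1 + heads
b_batch = 1 + (len(flips) - heads)

print(f"Sequential posterior: Beta({a_seq}, {b_seq})")
print(f"Batch posterior:      Beta({a_batch}, {b_batch})")
```

Because the posterior after each flip serves as the prior for the next, the order in which the flips arrive does not change the final result.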