# Week 20: Bayesian Methods for Quantitative Trading

## Comprehensive Theory and Implementation Guide

---

### 📚 **Learning Objectives**

By the end of this module, you will understand:

1. **Bayesian Inference Fundamentals** - Bayes' theorem, priors, likelihoods, and posteriors
2. **Conjugate Priors** - Efficient closed-form posterior computation
3. **MCMC Methods** - Sampling from complex posterior distributions
4. **Bayesian Regression** - Uncertainty quantification in predictions
5. **Portfolio Optimization** - Black-Litterman and robust allocation
6. **Risk Management** - Bayesian VaR and tail risk estimation

---

### 🎯 **Why Bayesian Methods in Quantitative Finance?**

| Advantage | Description |
|-----------|-------------|
| **Uncertainty Quantification** | Full probability distributions, not just point estimates |
| **Incorporating Prior Knowledge** | Systematic way to include domain expertise |
| **Small Sample Robustness** | Better performance with limited data |
| **Sequential Learning** | Natural framework for updating beliefs |
| **Regularization** | Priors act as regularizers preventing overfitting |
| **Model Comparison** | Principled approach via Bayes factors |

---

### 📖 **Table of Contents**

1. Import Required Libraries
2. Bayes' Theorem Fundamentals
3. Prior Distributions for Trading Parameters
4. Likelihood Functions for Market Data
5. Posterior Distribution Computation
6. Conjugate Priors in Finance
7. Bayesian Parameter Estimation for Returns
8. Bayesian Updating with New Market Data
9. Credible Intervals vs Confidence Intervals
10. Bayesian Linear Regression for Price Prediction
11. Markov Chain Monte Carlo (MCMC) Sampling
12. Bayesian Model Comparison and Selection
13. Bayesian Volatility Estimation
14. Bayesian Portfolio Optimization (Black-Litterman)
15. Bayesian Risk Management

---

## 1. Import Required Libraries

We'll use several key libraries for Bayesian analysis:
- **NumPy/SciPy**: Numerical computing and statistical distributions
- **PyMC**: Probabilistic programming for Bayesian inference
- **ArviZ**: Visualization and diagnostics for Bayesian models
- **Matplotlib/Seaborn**: Visualization

In [None]:
# Core libraries
import numpy as np
import pandas as pd
from scipy import stats
from scipy.special import gamma as gamma_func, beta as beta_func
from scipy.optimize import minimize
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

# Set random seed for reproducibility
np.random.seed(42)

print("=" * 60)
print("WEEK 20: BAYESIAN METHODS FOR QUANTITATIVE TRADING")
print("=" * 60)
print("\n✓ Libraries imported successfully")
print("\nNote: PyMC and ArviZ will be imported when needed for MCMC sections")

---

## 2. Bayes' Theorem Fundamentals

### 📐 **The Foundation of Bayesian Inference**

Bayes' theorem provides the mathematical framework for updating beliefs:

$$P(\theta | D) = \frac{P(D | \theta) \cdot P(\theta)}{P(D)}$$

Where:
- $P(\theta | D)$ = **Posterior**: Updated belief about parameters after seeing data
- $P(D | \theta)$ = **Likelihood**: Probability of data given parameters
- $P(\theta)$ = **Prior**: Initial belief about parameters before seeing data
- $P(D)$ = **Evidence/Marginal Likelihood**: Normalizing constant

### 🔑 **Key Insight for Trading**

> "In conjugate models, the posterior mean is a weighted average of the prior mean and the estimate from the observed data."

$$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$$
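
For a Beta prior with binomial data, the weighted-average reading of the posterior can be checked directly. The sketch below uses illustrative numbers (a Beta(2, 2) prior and 42 correct calls out of 60); the variable names are assumptions of this sketch, not part of any library.

```python
# Posterior mean of a Beta-Binomial model written as a weighted average
# of the prior mean and the observed hit rate (the MLE).
prior_alpha, prior_beta = 2.0, 2.0   # illustrative prior: Beta(2, 2)
n_correct, n_predictions = 42, 60    # illustrative data

prior_mean = prior_alpha / (prior_alpha + prior_beta)
mle = n_correct / n_predictions

# The prior behaves like (alpha + beta) pseudo-observations.
prior_weight = (prior_alpha + prior_beta) / (prior_alpha + prior_beta + n_predictions)

weighted_average = prior_weight * prior_mean + (1 - prior_weight) * mle
conjugate_mean = (prior_alpha + n_correct) / (prior_alpha + prior_beta + n_predictions)

print(f"prior weight     = {prior_weight:.4f}")
print(f"weighted average = {weighted_average:.4f}")
print(f"conjugate mean   = {conjugate_mean:.4f}")
```

With only four pseudo-observations in the prior against 60 real ones, the data carries most of the weight, so the posterior mean sits close to the observed hit rate.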

### **Example: Trading Signal Classification**

Consider a trading signal that predicts market direction. We want to estimate its true accuracy.

In [None]:
def bayes_theorem_demo():
    """
    Demonstrate Bayes' theorem with a trading signal example.
    
    Scenario: You have a trading signal. You observe it making predictions
    and want to estimate its true accuracy (probability of correct prediction).
    """
    print("=" * 60)
    print("BAYES' THEOREM: TRADING SIGNAL ACCURACY ESTIMATION")
    print("=" * 60)
    
    # Grid of possible accuracy values (theta)
    theta = np.linspace(0.001, 0.999, 1000)
    
    # PRIOR: Our initial belief about signal accuracy
    # We'll use a Beta(2, 2) prior - mild belief that accuracy is around 50%
    prior_alpha, prior_beta = 2, 2
    prior = stats.beta.pdf(theta, prior_alpha, prior_beta)
    
    # OBSERVED DATA: Signal made 60 predictions, 42 were correct
    n_predictions = 60
    n_correct = 42
    
    # LIKELIHOOD: Binomial probability
    # P(42 correct out of 60 | accuracy = theta)
    likelihood = stats.binom.pmf(n_correct, n_predictions, theta)
    
    # POSTERIOR: Unnormalized posterior = likelihood × prior
    posterior_unnorm = likelihood * prior
    
    # Normalize to get proper probability distribution
    posterior = posterior_unnorm / np.trapz(posterior_unnorm, theta)
    
    # For Beta-Binomial conjugate pair, we have closed form:
    # Posterior is Beta(alpha + successes, beta + failures)
    post_alpha = prior_alpha + n_correct
    post_beta = prior_beta + (n_predictions - n_correct)
    posterior_exact = stats.beta.pdf(theta, post_alpha, post_beta)
    
    # Visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Prior
    axes[0].fill_between(theta, prior, alpha=0.3, color='blue')
    axes[0].plot(theta, prior, 'b-', linewidth=2, label=f'Beta({prior_alpha}, {prior_beta})')
    axes[0].axvline(0.5, color='red', linestyle='--', alpha=0.7, label='Prior mean')
    axes[0].set_xlabel('Signal Accuracy (θ)', fontsize=12)
    axes[0].set_ylabel('Probability Density', fontsize=12)
    axes[0].set_title('PRIOR Distribution\n(Before seeing data)', fontsize=14)
    axes[0].legend()
    axes[0].set_xlim(0, 1)
    
    # Likelihood
    axes[1].fill_between(theta, likelihood / np.max(likelihood), alpha=0.3, color='green')
    axes[1].plot(theta, likelihood / np.max(likelihood), 'g-', linewidth=2)
    axes[1].axvline(n_correct / n_predictions, color='red', linestyle='--', alpha=0.7, 
                    label=f'MLE = {n_correct/n_predictions:.3f}')
    axes[1].set_xlabel('Signal Accuracy (θ)', fontsize=12)
    axes[1].set_ylabel('Likelihood (scaled)', fontsize=12)
    axes[1].set_title(f'LIKELIHOOD\n({n_correct}/{n_predictions} correct predictions)', fontsize=14)
    axes[1].legend()
    axes[1].set_xlim(0, 1)
    
    # Posterior
    axes[2].fill_between(theta, posterior_exact, alpha=0.3, color='purple')
    axes[2].plot(theta, posterior_exact, 'purple', linewidth=2, 
                 label=f'Beta({post_alpha}, {post_beta})')
    posterior_mean = post_alpha / (post_alpha + post_beta)
    axes[2].axvline(posterior_mean, color='red', linestyle='--', alpha=0.7,
                    label=f'Posterior mean = {posterior_mean:.3f}')
    axes[2].set_xlabel('Signal Accuracy (θ)', fontsize=12)
    axes[2].set_ylabel('Probability Density', fontsize=12)
    axes[2].set_title('POSTERIOR Distribution\n(After seeing data)', fontsize=14)
    axes[2].legend()
    axes[2].set_xlim(0, 1)
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    print("\n📊 SUMMARY STATISTICS")
    print("-" * 40)
    print(f"Prior: Beta({prior_alpha}, {prior_beta})")
    print(f"  - Prior mean: {prior_alpha/(prior_alpha+prior_beta):.3f}")
    print(f"  - Prior std:  {np.sqrt(prior_alpha*prior_beta/((prior_alpha+prior_beta)**2*(prior_alpha+prior_beta+1))):.3f}")
    print(f"\nData observed: {n_correct}/{n_predictions} correct")
    print(f"  - MLE estimate: {n_correct/n_predictions:.3f}")
    print(f"\nPosterior: Beta({post_alpha}, {post_beta})")
    print(f"  - Posterior mean: {posterior_mean:.3f}")
    post_std = np.sqrt(post_alpha*post_beta/((post_alpha+post_beta)**2*(post_alpha+post_beta+1)))
    print(f"  - Posterior std:  {post_std:.3f}")
    
    # 95% Credible Interval
    ci_low = stats.beta.ppf(0.025, post_alpha, post_beta)
    ci_high = stats.beta.ppf(0.975, post_alpha, post_beta)
    print(f"\n95% Credible Interval: [{ci_low:.3f}, {ci_high:.3f}]")
    print(f"Probability signal accuracy > 60%: {1 - stats.beta.cdf(0.6, post_alpha, post_beta):.3f}")

bayes_theorem_demo()

### 📊 **Bayesian vs Frequentist Comparison**

| Aspect | Frequentist | Bayesian |
|--------|-------------|----------|
| **Parameters** | Fixed, unknown constants | Random variables with distributions |
| **Probability** | Long-run frequency | Degree of belief |
| **Inference** | Point estimate + confidence interval | Full posterior distribution |
| **Prior information** | Ignored (or implicit) | Explicitly incorporated |
| **Small samples** | Often unreliable | More robust via regularization |
| **Interpretation** | "95% of such intervals contain true value" | "95% probability parameter is in interval" |
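
The difference in interpretation is easier to see with both intervals computed on the same data. This is a minimal sketch assuming 42 correct predictions out of 60 and a Beta(2, 2) prior; the Wald interval is used as the frequentist example only because it is the simplest, not because it is the best choice for proportions.

```python
import numpy as np
from scipy import stats

k, n = 42, 60                      # illustrative: 42 correct out of 60
p_hat = k / n

# Frequentist 95% Wald confidence interval for a proportion
se = np.sqrt(p_hat * (1 - p_hat) / n)
z = stats.norm.ppf(0.975)
wald_ci = (p_hat - z * se, p_hat + z * se)

# Bayesian 95% equal-tailed credible interval under a Beta(2, 2) prior
a_post, b_post = 2 + k, 2 + (n - k)
cred_low, cred_high = stats.beta.ppf([0.025, 0.975], a_post, b_post)

# Only the posterior supports a direct probability statement about theta
prob_above_half = stats.beta.sf(0.5, a_post, b_post)

print(f"Wald CI:           [{wald_ci[0]:.3f}, {wald_ci[1]:.3f}]")
print(f"Credible interval: [{cred_low:.3f}, {cred_high:.3f}]")
print(f"P(theta > 0.5 | data) = {prob_above_half:.3f}")
```

The two intervals are numerically similar here, but only the credible interval and `prob_above_half` are statements about the parameter given the observed data.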

---

## 3. Prior Distributions for Trading Parameters

### 📈 **Choosing Priors for Financial Parameters**

The choice of prior distribution encodes our beliefs before seeing data. In finance, we often have domain knowledge that should inform our priors.

### **Types of Priors**

| Prior Type | Description | Use Case |
|------------|-------------|----------|
| **Non-informative** | Minimal assumptions (uniform, Jeffreys) | Little domain knowledge |
| **Weakly informative** | Gentle regularization | Most common in practice |
| **Informative** | Strong domain knowledge | Expert elicitation, previous studies |
| **Skeptical** | Centered at null hypothesis | Testing "significant" effects |

### **Common Priors for Trading Parameters**

| Parameter | Prior | Justification |
|-----------|-------|---------------|
| Expected return μ | Normal(0, σ) | Returns are roughly symmetric around zero |
| Volatility σ | Half-Normal, InverseGamma | Must be positive |
| Sharpe ratio | Normal(0, 1) | Most strategies cluster around 0-2 |
| Win rate | Beta(α, β) | Bounded between 0 and 1 |
| Correlation ρ | Uniform(-1, 1) or Beta-transformed | Bounded |
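
Before committing to a prior, it helps to check what it actually implies. The sketch below summarizes three candidate win-rate priors with SciPy; the parameter choices and the 55% threshold are illustrative assumptions.

```python
from scipy import stats

# Candidate win-rate priors (illustrative parameter choices)
priors = {
    "Beta(1, 1)": stats.beta(1, 1),
    "Beta(5, 5)": stats.beta(5, 5),
    "Beta(6, 4)": stats.beta(6, 4),
}

summary = {}
for name, dist in priors.items():
    low, high = dist.ppf([0.025, 0.975])   # central 95% prior interval
    p_edge = dist.sf(0.55)                 # prior probability that win rate > 55%
    summary[name] = (dist.mean(), low, high, p_edge)
    print(f"{name}: mean={dist.mean():.2f}, "
          f"95% interval=[{low:.2f}, {high:.2f}], "
          f"P(win rate > 0.55)={p_edge:.2f}")
```

If the implied interval or tail probability looks implausible for the strategy at hand, that is a sign the prior needs adjusting before any data is seen.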

In [None]:
def visualize_trading_priors():
    """
    Visualize common prior distributions used for trading parameters.
    """
    print("=" * 60)
    print("PRIOR DISTRIBUTIONS FOR TRADING PARAMETERS")
    print("=" * 60)
    
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    # 1. Prior for Expected Return (annualized)
    ax = axes[0, 0]
    mu_values = np.linspace(-0.5, 0.5, 1000)
    
    # Different prior choices
    prior_uninformative = stats.uniform.pdf(mu_values, -0.5, 1)
    prior_skeptical = stats.norm.pdf(mu_values, 0, 0.05)  # Skeptical - centered at 0
    prior_informative = stats.norm.pdf(mu_values, 0.07, 0.10)  # Historical equity premium
    
    ax.plot(mu_values, prior_uninformative, 'b-', label='Uniform (non-informative)', alpha=0.7)
    ax.plot(mu_values, prior_skeptical, 'r-', label='N(0, 0.05) - Skeptical', linewidth=2)
    ax.plot(mu_values, prior_informative, 'g-', label='N(0.07, 0.10) - Informative', linewidth=2)
    ax.set_xlabel('Expected Annual Return (μ)', fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.set_title('Priors for Expected Return', fontsize=12, fontweight='bold')
    ax.legend(fontsize=9)
    ax.axvline(0, color='gray', linestyle='--', alpha=0.5)
    
    # 2. Prior for Volatility
    ax = axes[0, 1]
    sigma_values = np.linspace(0.001, 0.6, 1000)
    
    prior_halfnormal = stats.halfnorm.pdf(sigma_values, scale=0.2)
    prior_invgamma = stats.invgamma.pdf(sigma_values, a=3, scale=0.3)
    prior_lognormal = stats.lognorm.pdf(sigma_values, s=0.5, scale=0.15)
    
    ax.plot(sigma_values, prior_halfnormal, 'b-', label='Half-Normal(0.2)', linewidth=2)
    ax.plot(sigma_values, prior_invgamma, 'r-', label='InvGamma(3, 0.3)', linewidth=2)
    ax.plot(sigma_values, prior_lognormal, 'g-', label='LogNormal(s=0.5)', linewidth=2)
    ax.set_xlabel('Annual Volatility (σ)', fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.set_title('Priors for Volatility', fontsize=12, fontweight='bold')
    # Draw the reference line before building the legend so its label is included
    ax.axvline(0.2, color='gray', linestyle='--', alpha=0.5, label='Typical equity vol')
    ax.legend(fontsize=9)
    
    # 3. Prior for Sharpe Ratio
    ax = axes[0, 2]
    sr_values = np.linspace(-2, 4, 1000)
    
    prior_sr_skeptical = stats.norm.pdf(sr_values, 0, 0.5)
    prior_sr_moderate = stats.norm.pdf(sr_values, 0.5, 1.0)
    prior_sr_optimistic = stats.norm.pdf(sr_values, 1.0, 0.5)
    
    ax.plot(sr_values, prior_sr_skeptical, 'r-', label='N(0, 0.5) - Skeptical', linewidth=2)
    ax.plot(sr_values, prior_sr_moderate, 'b-', label='N(0.5, 1.0) - Moderate', linewidth=2)
    ax.plot(sr_values, prior_sr_optimistic, 'g-', label='N(1.0, 0.5) - Optimistic', linewidth=2)
    ax.set_xlabel('Sharpe Ratio', fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.set_title('Priors for Sharpe Ratio', fontsize=12, fontweight='bold')
    ax.legend(fontsize=9)
    ax.axvline(0.5, color='gray', linestyle='--', alpha=0.5)
    
    # 4. Prior for Win Rate (probability of profitable trade)
    ax = axes[1, 0]
    p_values = np.linspace(0.001, 0.999, 1000)
    
    prior_uniform = stats.beta.pdf(p_values, 1, 1)
    prior_symmetric = stats.beta.pdf(p_values, 5, 5)  # Centered at 0.5
    prior_skilled = stats.beta.pdf(p_values, 6, 4)  # Slight edge
    
    ax.plot(p_values, prior_uniform, 'b-', label='Beta(1,1) - Uniform', linewidth=2)
    ax.plot(p_values, prior_symmetric, 'r-', label='Beta(5,5) - 50% prior', linewidth=2)
    ax.plot(p_values, prior_skilled, 'g-', label='Beta(6,4) - 60% prior', linewidth=2)
    ax.set_xlabel('Win Rate', fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.set_title('Priors for Trade Win Rate', fontsize=12, fontweight='bold')
    ax.legend(fontsize=9)
    ax.axvline(0.5, color='gray', linestyle='--', alpha=0.5)
    
    # 5. Prior for Correlation
    ax = axes[1, 1]
    rho_values = np.linspace(-0.99, 0.99, 1000)
    
    # Transform to use beta distribution on [-1, 1]
    def beta_on_correlation(rho, a, b):
        # Transform rho from [-1,1] to [0,1]
        x = (rho + 1) / 2
        return stats.beta.pdf(x, a, b) / 2  # Divide by 2 for Jacobian
    
    prior_uniform_rho = np.ones_like(rho_values) * 0.5
    prior_near_zero = beta_on_correlation(rho_values, 5, 5)
    prior_positive = beta_on_correlation(rho_values, 6, 4)
    
    ax.plot(rho_values, prior_uniform_rho, 'b-', label='Uniform(-1, 1)', linewidth=2)
    ax.plot(rho_values, prior_near_zero, 'r-', label='Centered at 0', linewidth=2)
    ax.plot(rho_values, prior_positive, 'g-', label='Slightly positive', linewidth=2)
    ax.set_xlabel('Correlation (ρ)', fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.set_title('Priors for Correlation', fontsize=12, fontweight='bold')
    ax.legend(fontsize=9)
    ax.axvline(0, color='gray', linestyle='--', alpha=0.5)
    
    # 6. Prior for Maximum Drawdown (as fraction)
    ax = axes[1, 2]
    dd_values = np.linspace(0.01, 0.8, 1000)
    
    prior_dd_mild = stats.beta.pdf(dd_values, 2, 5)  # Expect small drawdowns
    prior_dd_moderate = stats.beta.pdf(dd_values, 2, 3)  # Moderate drawdowns
    prior_dd_heavy = stats.beta.pdf(dd_values, 3, 2)  # Expect larger drawdowns
    
    ax.plot(dd_values, prior_dd_mild, 'g-', label='Beta(2,5) - Mild', linewidth=2)
    ax.plot(dd_values, prior_dd_moderate, 'b-', label='Beta(2,3) - Moderate', linewidth=2)
    ax.plot(dd_values, prior_dd_heavy, 'r-', label='Beta(3,2) - Heavy', linewidth=2)
    ax.set_xlabel('Maximum Drawdown', fontsize=11)
    ax.set_ylabel('Density', fontsize=11)
    ax.set_title('Priors for Maximum Drawdown', fontsize=12, fontweight='bold')
    ax.legend(fontsize=9)
    ax.axvline(0.2, color='gray', linestyle='--', alpha=0.5)
    
    plt.tight_layout()
    plt.show()
    
    # Print recommendations
    print("\n📋 PRIOR SELECTION GUIDELINES FOR TRADING")
    print("-" * 50)
    print("""
    1. EXPECTED RETURNS:
       - Skeptical prior N(0, small σ) prevents overfitting
       - Historical equity premium ~7% can inform prior
       - For alpha research, center at 0 (market efficiency prior)
    
    2. VOLATILITY:
       - Must be positive: use Half-Normal, InverseGamma
       - Typical equity volatility ~15-25% annually
       - Crypto: prior allowing 50%+ volatility
    
    3. SHARPE RATIO:
       - Most real strategies: 0 to 2
       - Skeptical prior N(0, 1) is common
       - Sharpe > 2 sustained is rare in practice
    
    4. WIN RATE:
       - Random: 50%, so Beta(5,5) is reasonable
       - Skilled trader: Beta(6,4) gives slight edge
       - Avoid uniform - it's too weak
    
    5. CORRELATIONS:
       - Use LKJ prior for correlation matrices
       - Individual correlations: center near 0
       - Market regime affects correlations
    """)

visualize_trading_priors()

---

## 4. Likelihood Functions for Market Data

### 📊 **The Data Generating Process**

The likelihood function $P(D|\theta)$ describes how data is generated given parameters. For financial returns, common choices include:

### **Distribution Models for Returns**

| Distribution | PDF | Use Case |
|--------------|-----|----------|
| **Gaussian** | $\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(r-\mu)^2}{2\sigma^2}}$ | First approximation, tractable |
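| **Student-t** | $\frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$, where $x$ is the return after subtracting location and dividing by scale | Fat tails, better fit |
| **Mixture** | $\sum_k \pi_k \cdot f_k(r)$ | Regime switching |
| **Skew-Normal** | $2\phi(x)\Phi(\alpha x)$, where $x$ is the return after subtracting location and dividing by scale | Asymmetric returns |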

### **Log-Likelihood for Numerical Stability**

We work with log-likelihoods to avoid numerical underflow:

$$\log L(\theta | D) = \sum_{i=1}^{n} \log P(r_i | \theta)$$
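
A quick way to see why the log form matters is to compute both versions on the same data. This is a small sketch on simulated returns expressed in percent; the sample size and parameters are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative data: 1,000 daily returns expressed in percent
returns_pct = rng.normal(0.05, 1.5, size=1000)

# Naive likelihood: a product of 1,000 densities, each below 0.27,
# underflows to exactly 0.0 in double precision.
naive_likelihood = np.prod(stats.norm.pdf(returns_pct, 0.05, 1.5))

# Log-likelihood: a sum of log densities stays finite.
log_likelihood = np.sum(stats.norm.logpdf(returns_pct, 0.05, 1.5))

print(f"Product of densities: {naive_likelihood}")
print(f"Sum of log densities: {log_likelihood:.2f}")
```

With returns in decimal units the scale is so small that individual densities exceed 1, and the product can overflow to infinity instead. Summing log densities avoids both failure modes.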

In [None]:
def likelihood_functions_demo():
    """
    Compare different likelihood functions for modeling financial returns.
    """
    print("=" * 60)
    print("LIKELIHOOD FUNCTIONS FOR FINANCIAL RETURNS")
    print("=" * 60)
    
    # Generate synthetic returns with fat tails
    np.random.seed(42)
    n_obs = 500
    
    # Mix of normal returns with occasional extreme moves
    normal_returns = np.random.normal(0.0005, 0.015, n_obs)
    extreme_idx = np.random.choice(n_obs, size=int(n_obs * 0.05), replace=False)
    returns = normal_returns.copy()
    returns[extreme_idx] = np.random.normal(0, 0.05, len(extreme_idx))
    
    # Define likelihood functions
    class LikelihoodFunctions:
        @staticmethod
        def gaussian_loglik(params, data):
            """Gaussian log-likelihood"""
            mu, sigma = params
            if sigma <= 0:
                return -np.inf
            return np.sum(stats.norm.logpdf(data, mu, sigma))
        
        @staticmethod
        def student_t_loglik(params, data):
            """Student-t log-likelihood"""
            mu, sigma, nu = params
            if sigma <= 0 or nu <= 2:
                return -np.inf
            return np.sum(stats.t.logpdf(data, df=nu, loc=mu, scale=sigma))
        
        @staticmethod
        def mixture_loglik(params, data):
            """Mixture of two Gaussians log-likelihood"""
            mu1, sigma1, mu2, sigma2, pi = params
            if sigma1 <= 0 or sigma2 <= 0 or not (0 < pi < 1):
                return -np.inf
            
            # Log-sum-exp trick for numerical stability
            log_p1 = np.log(pi) + stats.norm.logpdf(data, mu1, sigma1)
            log_p2 = np.log(1 - pi) + stats.norm.logpdf(data, mu2, sigma2)
            
            # log(exp(a) + exp(b)) = a + log(1 + exp(b-a))
            max_log = np.maximum(log_p1, log_p2)
            log_sum = max_log + np.log(np.exp(log_p1 - max_log) + np.exp(log_p2 - max_log))
            return np.sum(log_sum)
    
    # Fit models using MLE
    from scipy.optimize import minimize
    
    # Gaussian fit
    def neg_gaussian_ll(params):
        return -LikelihoodFunctions.gaussian_loglik(params, returns)
    
    gauss_result = minimize(neg_gaussian_ll, [0, 0.02], method='L-BFGS-B',
                           bounds=[(-0.1, 0.1), (0.001, 0.1)])
    
    # Student-t fit
    def neg_student_ll(params):
        return -LikelihoodFunctions.student_t_loglik(params, returns)
    
    t_result = minimize(neg_student_ll, [0, 0.015, 5], method='L-BFGS-B',
                       bounds=[(-0.1, 0.1), (0.001, 0.1), (2.1, 30)])
    
    # Mixture fit
    def neg_mixture_ll(params):
        return -LikelihoodFunctions.mixture_loglik(params, returns)
    
    mix_result = minimize(neg_mixture_ll, [0, 0.01, 0, 0.04, 0.9], method='L-BFGS-B',
                         bounds=[(-0.1, 0.1), (0.001, 0.1), (-0.1, 0.1), (0.001, 0.1), (0.01, 0.99)])
    
    # Visualization
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Histogram of returns
    ax = axes[0]
    x_range = np.linspace(returns.min() - 0.01, returns.max() + 0.01, 200)
    
    ax.hist(returns, bins=50, density=True, alpha=0.6, color='gray', label='Data')
    
    # Fitted distributions
    mu_g, sig_g = gauss_result.x
    ax.plot(x_range, stats.norm.pdf(x_range, mu_g, sig_g), 
            'b-', linewidth=2, label=f'Gaussian (μ={mu_g:.4f}, σ={sig_g:.4f})')
    
    mu_t, sig_t, nu_t = t_result.x
    ax.plot(x_range, stats.t.pdf(x_range, df=nu_t, loc=mu_t, scale=sig_t),
            'r-', linewidth=2, label=f'Student-t (ν={nu_t:.1f})')
    
    mu1, sig1, mu2, sig2, pi = mix_result.x
    mixture_pdf = pi * stats.norm.pdf(x_range, mu1, sig1) + (1-pi) * stats.norm.pdf(x_range, mu2, sig2)
    ax.plot(x_range, mixture_pdf, 'g-', linewidth=2, label=f'Mixture (π={pi:.2f})')
    
    ax.set_xlabel('Return', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Fitted Distributions', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # QQ plots
    ax = axes[1]
    
    # Gaussian QQ
    sorted_returns = np.sort(returns)
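    # Plotting positions (i - 0.5) / n, so the theoretical quantiles cover the whole sample
    # rather than being clipped to the 1%-99% range
    n_points = len(returns)
    theoretical_quantiles = stats.norm.ppf((np.arange(1, n_points + 1) - 0.5) / n_points)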
    
    ax.scatter(theoretical_quantiles, sorted_returns, alpha=0.5, s=10, label='Data')
    ax.plot([theoretical_quantiles.min(), theoretical_quantiles.max()],
            [theoretical_quantiles.min() * sig_g + mu_g, theoretical_quantiles.max() * sig_g + mu_g],
            'r--', linewidth=2, label='Gaussian fit')
    ax.set_xlabel('Theoretical Quantiles (Normal)', fontsize=12)
    ax.set_ylabel('Sample Quantiles', fontsize=12)
    ax.set_title('Q-Q Plot: Gaussian', fontsize=14, fontweight='bold')
    ax.legend()
    
    # Tail comparison
    ax = axes[2]
    
    # Empirical tail probabilities
    tail_thresholds = np.linspace(2, 4, 20) * returns.std()
    empirical_left = [np.mean(returns < -t) for t in tail_thresholds]
    empirical_right = [np.mean(returns > t) for t in tail_thresholds]
    
    gaussian_left = [stats.norm.cdf(-t, mu_g, sig_g) for t in tail_thresholds]
    gaussian_right = [1 - stats.norm.cdf(t, mu_g, sig_g) for t in tail_thresholds]
    
    student_left = [stats.t.cdf(-t, df=nu_t, loc=mu_t, scale=sig_t) for t in tail_thresholds]
    student_right = [1 - stats.t.cdf(t, df=nu_t, loc=mu_t, scale=sig_t) for t in tail_thresholds]
    
    ax.semilogy(tail_thresholds / returns.std(), empirical_left, 'ko', markersize=8, label='Empirical left tail')
    ax.semilogy(tail_thresholds / returns.std(), gaussian_left, 'b-', linewidth=2, label='Gaussian')
    ax.semilogy(tail_thresholds / returns.std(), student_left, 'r-', linewidth=2, label='Student-t')
    ax.set_xlabel('Standard Deviations from Mean', fontsize=12)
    ax.set_ylabel('Tail Probability (log scale)', fontsize=12)
    ax.set_title('Tail Behavior Comparison', fontsize=14, fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Model comparison
    print("\n📊 MODEL COMPARISON (Log-Likelihood)")
    print("-" * 50)
    ll_gauss = LikelihoodFunctions.gaussian_loglik(gauss_result.x, returns)
    ll_t = LikelihoodFunctions.student_t_loglik(t_result.x, returns)
    ll_mix = LikelihoodFunctions.mixture_loglik(mix_result.x, returns)
    
    n = len(returns)
    aic_gauss = 2 * 2 - 2 * ll_gauss
    aic_t = 2 * 3 - 2 * ll_t
    aic_mix = 2 * 5 - 2 * ll_mix
    
    bic_gauss = np.log(n) * 2 - 2 * ll_gauss
    bic_t = np.log(n) * 3 - 2 * ll_t
    bic_mix = np.log(n) * 5 - 2 * ll_mix
    
    print(f"{'Model':<20} {'Log-Lik':>12} {'AIC':>12} {'BIC':>12}")
    print("-" * 60)
    print(f"{'Gaussian':<20} {ll_gauss:>12.2f} {aic_gauss:>12.2f} {bic_gauss:>12.2f}")
    print(f"{'Student-t':<20} {ll_t:>12.2f} {aic_t:>12.2f} {bic_t:>12.2f}")
    print(f"{'Mixture':<20} {ll_mix:>12.2f} {aic_mix:>12.2f} {bic_mix:>12.2f}")
    
    print("\n🎯 KEY INSIGHT:")
    print("Student-t and mixture models better capture fat tails in returns")
    print("This matters for risk management (VaR, CVaR) and option pricing")

likelihood_functions_demo()

---

## 5. Posterior Distribution Computation

### 📐 **Computing Posteriors**

Depending on model complexity, a posterior can be computed in one of three ways:
1. **Analytically** - Using conjugate priors (exact)
2. **Grid Approximation** - Discretize parameter space
3. **MCMC Sampling** - For complex models (Section 11)

### **Grid Approximation Algorithm**

```
1. Define grid of parameter values: θ₁, θ₂, ..., θₖ
2. Compute prior P(θᵢ) at each grid point
3. Compute likelihood P(D|θᵢ) at each grid point
4. Multiply: P(θᵢ|D) ∝ P(D|θᵢ) × P(θᵢ)
5. Normalize: P(θᵢ|D) = [P(D|θᵢ) × P(θᵢ)] / Σⱼ [P(D|θⱼ) × P(θⱼ)]
```

In [None]:
def posterior_grid_approximation():
    """
    Demonstrate grid approximation for posterior computation.
    Example: Estimating mean return with known variance.
    """
    print("=" * 60)
    print("GRID APPROXIMATION FOR POSTERIOR COMPUTATION")
    print("=" * 60)
    
    # Simulate some return data
    np.random.seed(42)
    true_mu = 0.0008  # True daily mean return (~20% annual)
    true_sigma = 0.015  # Known daily volatility
    n_obs = 60  # 60 days of data (limited sample)
    
    returns = np.random.normal(true_mu, true_sigma, n_obs)
    sample_mean = returns.mean()
    
    print(f"\n📈 Data Generation:")
    print(f"   True mean: {true_mu:.5f} ({true_mu*252*100:.1f}% annualized)")
    print(f"   Known σ: {true_sigma:.5f}")
    print(f"   Sample size: {n_obs}")
    print(f"   Sample mean: {sample_mean:.5f}")
    
    # Grid approximation
    mu_grid = np.linspace(-0.002, 0.003, 1000)
    
    # Prior: Normal with mean 0 (skeptical) and std 0.001
    prior_mu = 0
    prior_sigma = 0.001
    prior = stats.norm.pdf(mu_grid, prior_mu, prior_sigma)
    
    # Likelihood: Product of normals (or equivalently, normal for sample mean)
    # The likelihood for sample mean given true mean μ:
    # L(μ) ∝ N(sample_mean | μ, σ/√n)
    likelihood_sigma = true_sigma / np.sqrt(n_obs)
    likelihood = stats.norm.pdf(sample_mean, mu_grid, likelihood_sigma)
    
    # Posterior (unnormalized)
    posterior_unnorm = likelihood * prior
    
    # Normalize
    posterior = posterior_unnorm / np.trapz(posterior_unnorm, mu_grid)
    
    # Analytical posterior (Normal-Normal conjugate)
    # Posterior precision = prior precision + data precision
    prior_precision = 1 / prior_sigma**2
    data_precision = n_obs / true_sigma**2
    post_precision = prior_precision + data_precision
    post_sigma = np.sqrt(1 / post_precision)
    
    # Posterior mean = weighted average
    post_mu = (prior_precision * prior_mu + data_precision * sample_mean) / post_precision
    
    posterior_analytical = stats.norm.pdf(mu_grid, post_mu, post_sigma)
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Panel 1: Prior, Likelihood, Posterior
    ax = axes[0, 0]
    ax.plot(mu_grid * 252 * 100, prior / np.max(prior), 'b-', linewidth=2, 
            label=f'Prior: N({prior_mu}, {prior_sigma})')
    ax.plot(mu_grid * 252 * 100, likelihood / np.max(likelihood), 'g-', linewidth=2,
            label=f'Likelihood (from {n_obs} obs)')
    ax.fill_between(mu_grid * 252 * 100, posterior / np.max(posterior), alpha=0.3, color='purple')
    ax.plot(mu_grid * 252 * 100, posterior / np.max(posterior), 'purple', linewidth=2,
            label='Posterior (Grid)')
    ax.axvline(true_mu * 252 * 100, color='red', linestyle='--', alpha=0.7, label='True μ')
    ax.axvline(sample_mean * 252 * 100, color='orange', linestyle=':', alpha=0.7, label='Sample mean')
    ax.set_xlabel('Annualized Return (%)', fontsize=12)
    ax.set_ylabel('Density (scaled)', fontsize=12)
    ax.set_title('Bayesian Inference via Grid Approximation', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    ax.set_xlim(-50, 80)
    
    # Panel 2: Grid vs Analytical
    ax = axes[0, 1]
    ax.plot(mu_grid * 252 * 100, posterior, 'purple', linewidth=2, label='Grid approximation')
    ax.plot(mu_grid * 252 * 100, posterior_analytical, 'k--', linewidth=2, label='Analytical')
    ax.fill_between(mu_grid * 252 * 100, posterior, alpha=0.3, color='purple')
    ax.set_xlabel('Annualized Return (%)', fontsize=12)
    ax.set_ylabel('Posterior Density', fontsize=12)
    ax.set_title('Grid vs Analytical Posterior', fontsize=14, fontweight='bold')
    ax.legend()
    
    # Panel 3: Effect of sample size
    ax = axes[1, 0]
    colors = plt.cm.viridis(np.linspace(0.2, 0.8, 5))
    
    for i, n in enumerate([10, 30, 60, 120, 252]):
        # Simulate data
        returns_n = np.random.normal(true_mu, true_sigma, n)
        sample_mean_n = returns_n.mean()
        
        # Posterior
        data_prec_n = n / true_sigma**2
        post_prec_n = prior_precision + data_prec_n
        post_sig_n = np.sqrt(1 / post_prec_n)
        post_mu_n = (prior_precision * prior_mu + data_prec_n * sample_mean_n) / post_prec_n
        
        post_n = stats.norm.pdf(mu_grid, post_mu_n, post_sig_n)
        ax.plot(mu_grid * 252 * 100, post_n, color=colors[i], linewidth=2, label=f'n={n}')
    
    ax.axvline(true_mu * 252 * 100, color='red', linestyle='--', alpha=0.7, label='True μ')
    ax.set_xlabel('Annualized Return (%)', fontsize=12)
    ax.set_ylabel('Posterior Density', fontsize=12)
    ax.set_title('Posterior Sharpens with More Data', fontsize=14, fontweight='bold')
    ax.legend()
    
    # Panel 4: Effect of prior strength
    ax = axes[1, 1]
    prior_sigmas = [0.0003, 0.0007, 0.001, 0.002, 0.01]
    labels = ['Very strong', 'Strong', 'Moderate', 'Weak', 'Very weak']
    
    for i, (ps, label) in enumerate(zip(prior_sigmas, labels)):
        prior_prec_i = 1 / ps**2
        post_prec_i = prior_prec_i + data_precision
        post_sig_i = np.sqrt(1 / post_prec_i)
        post_mu_i = (prior_prec_i * prior_mu + data_precision * sample_mean) / post_prec_i
        
        post_i = stats.norm.pdf(mu_grid, post_mu_i, post_sig_i)
        ax.plot(mu_grid * 252 * 100, post_i, color=colors[i], linewidth=2, label=f'{label} (σ={ps})')
    
    ax.axvline(prior_mu * 252 * 100, color='blue', linestyle=':', alpha=0.7, label='Prior mean')
    ax.axvline(sample_mean * 252 * 100, color='orange', linestyle=':', alpha=0.7, label='Sample mean')
    ax.set_xlabel('Annualized Return (%)', fontsize=12)
    ax.set_ylabel('Posterior Density', fontsize=12)
    ax.set_title('Effect of Prior Strength', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    print("\n📊 POSTERIOR SUMMARY")
    print("-" * 50)
    print(f"Prior: N(μ₀={prior_mu:.5f}, σ₀={prior_sigma:.5f})")
    print(f"Data: n={n_obs}, x̄={sample_mean:.5f}")
    print(f"\nPosterior: N(μ={post_mu:.5f}, σ={post_sigma:.5f})")
    print(f"Annualized: {post_mu*252*100:.2f}% ± {1.96*post_sigma*252*100:.2f}%")
    
    # Shrinkage factor
    shrinkage = prior_precision / post_precision
    print(f"\nShrinkage toward prior: {shrinkage*100:.1f}%")
    print(f"Weight on data: {(1-shrinkage)*100:.1f}%")

posterior_grid_approximation()

---

## 6. Conjugate Priors in Finance

### 🔗 **What are Conjugate Priors?**

A prior distribution is **conjugate** to a likelihood if the posterior belongs to the same family as the prior. This allows:
- **Closed-form posteriors** (no numerical integration)
- **Efficient sequential updating**
- **Interpretable parameters**

### **Key Conjugate Pairs for Finance**

| Likelihood | Prior | Posterior | Application |
|------------|-------|-----------|-------------|
| Normal (known σ²) | Normal | Normal | Return estimation |
| Normal (unknown σ²) | Normal-Inverse-Gamma | Normal-Inverse-Gamma | Mean + volatility |
| Binomial | Beta | Beta | Win rates, hit ratios |
| Poisson | Gamma | Gamma | Trade counts |
| Multinomial | Dirichlet | Dirichlet | Portfolio weights |
| Exponential | Gamma | Gamma | Time between trades |
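
The Exponential-Gamma row of the table follows the same pattern as the other pairs. The sketch below is a minimal illustration, assuming a Gamma prior in the shape/rate parameterization on the exponential rate; the function name `gamma_exponential_update` and the waiting times are hypothetical and are not part of the notebook's `ConjugatePriors` class.

```python
def gamma_exponential_update(prior_alpha, prior_beta, waiting_times):
    """Gamma(shape, rate) prior on the rate of an Exponential likelihood.

    Posterior: Gamma(prior_alpha + n, prior_beta + sum of waiting times).
    """
    post_alpha = prior_alpha + len(waiting_times)  # one event per waiting time
    post_beta = prior_beta + sum(waiting_times)    # total observed time
    return post_alpha, post_beta


# Hypothetical data: minutes between consecutive trades
waits = [12.0, 7.5, 20.0, 3.5, 17.0]
post_alpha, post_beta = gamma_exponential_update(2.0, 10.0, waits)

# Posterior mean of the rate (trades per minute)
post_mean_rate = post_alpha / post_beta
print(post_alpha, post_beta, round(post_mean_rate, 4))
```

The prior acts like `prior_alpha` earlier events observed over `prior_beta` minutes, so the posterior simply adds the new event count and the new elapsed time.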

### **Mathematical Details**

**Beta-Binomial (Win Rate Estimation):**
- Prior: $\theta \sim \text{Beta}(\alpha, \beta)$
- Data: $k$ successes in $n$ trials
- Posterior: $\theta | k, n \sim \text{Beta}(\alpha + k, \beta + n - k)$

**Normal-Normal (Return Mean Estimation):**
- Prior: $\mu \sim N(\mu_0, \sigma_0^2)$
- Data: $\bar{x}$ from $n$ observations with known $\sigma^2$
- Posterior: $\mu | \bar{x} \sim N(\mu_n, \sigma_n^2)$ where:
  - $\sigma_n^2 = (1/\sigma_0^2 + n/\sigma^2)^{-1}$
  - $\mu_n = \sigma_n^2 (\mu_0/\sigma_0^2 + n\bar{x}/\sigma^2)$
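
As a worked example with illustrative numbers (assumed for this example, not taken from a dataset): a $\text{Beta}(5, 5)$ prior combined with $k = 55$ wins in $n = 100$ trades gives

$$\theta | k, n \sim \text{Beta}(5 + 55, 5 + 45) = \text{Beta}(60, 50), \qquad E[\theta | k, n] = \frac{60}{110} \approx 0.545$$

For the Normal-Normal case, take $\mu_0 = 0$, $\sigma_0 = 0.001$, $\sigma = 0.015$, $n = 100$ and $\bar{x} = 0.0006$:

$$\sigma_n^2 = \left(\frac{1}{0.001^2} + \frac{100}{0.015^2}\right)^{-1} \approx \left(10^6 + 444{,}444\right)^{-1} \approx 6.92 \times 10^{-7}, \qquad \sigma_n \approx 0.00083$$

$$\mu_n = \sigma_n^2 \left(0 + \frac{100 \times 0.0006}{0.015^2}\right) \approx 6.92 \times 10^{-7} \times 266.7 \approx 0.000185$$

The data carry about 31% of the total precision, so the posterior mean is pulled from the sample mean of 0.0006 down to roughly 0.000185 per day (about 4.7% annualized).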

In [None]:
class ConjugatePriors:
    """
    Implementation of conjugate prior-posterior pairs for finance applications.
    """
    
    @staticmethod
    def beta_binomial_update(prior_alpha, prior_beta, successes, trials):
        """
        Beta-Binomial conjugate update for win rate estimation.
        
        Parameters:
        -----------
        prior_alpha, prior_beta : float
            Parameters of Beta prior
        successes : int
            Number of successes (winning trades)
        trials : int
            Total number of trials (trades)
            
        Returns:
        --------
        post_alpha, post_beta : float
            Parameters of Beta posterior
        """
        post_alpha = prior_alpha + successes
        post_beta = prior_beta + (trials - successes)
        return post_alpha, post_beta
    
    @staticmethod
    def normal_normal_update(prior_mu, prior_sigma, data_mean, data_sigma, n):
        """
        Normal-Normal conjugate update for mean estimation.
        
        Parameters:
        -----------
        prior_mu, prior_sigma : float
            Prior mean and std
        data_mean : float
            Sample mean
        data_sigma : float
            Known data std
        n : int
            Sample size
            
        Returns:
        --------
        post_mu, post_sigma : float
            Posterior mean and std
        """
        prior_precision = 1 / prior_sigma**2
        data_precision = n / data_sigma**2
        
        post_precision = prior_precision + data_precision
        post_sigma = np.sqrt(1 / post_precision)
        post_mu = (prior_mu * prior_precision + data_mean * data_precision) / post_precision
        
        return post_mu, post_sigma
    
    @staticmethod
    def gamma_poisson_update(prior_alpha, prior_beta, total_counts, n_periods):
        """
        Gamma-Poisson conjugate update for rate estimation.
        
        Parameters:
        -----------
        prior_alpha, prior_beta : float
            Parameters of Gamma prior (shape, rate)
        total_counts : int
            Total observed counts
        n_periods : int
            Number of observation periods
            
        Returns:
        --------
        post_alpha, post_beta : float
            Parameters of Gamma posterior
        """
        post_alpha = prior_alpha + total_counts
        post_beta = prior_beta + n_periods
        return post_alpha, post_beta
    
    @staticmethod
    def normal_inverse_gamma_update(prior_mu0, prior_n0, prior_alpha, prior_beta, data, sample_mean, sample_var):
        """
        Normal-Inverse-Gamma conjugate update for unknown mean and variance.
        
        This is for the case where both μ and σ² are unknown.
        
        Parameters:
        -----------
        prior_mu0 : float
            Prior mean for μ
        prior_n0 : float
            Prior "sample size" (strength of belief in mean)
        prior_alpha, prior_beta : float
            Prior parameters for σ² (Inverse-Gamma shape and scale)
        data : array
            Observed data
        sample_mean : float
            Sample mean of data
        sample_var : float
            Sample variance of data (n-1 denominator); not used in the update,
            which recomputes the sum of squares from `data`
            
        Returns:
        --------
        post_mu0, post_n0, post_alpha, post_beta : float
            Posterior parameters
        """
        n = len(data)
        
        # Update parameters
        post_n0 = prior_n0 + n
        post_mu0 = (prior_n0 * prior_mu0 + n * sample_mean) / post_n0
        post_alpha = prior_alpha + n / 2
        
        # Sum of squares term
        ss = np.sum((data - sample_mean)**2)
        post_beta = prior_beta + 0.5 * ss + (prior_n0 * n * (sample_mean - prior_mu0)**2) / (2 * post_n0)
        
        return post_mu0, post_n0, post_alpha, post_beta


def conjugate_priors_demo():
    """
    Demonstrate conjugate priors for trading applications.
    """
    print("=" * 60)
    print("CONJUGATE PRIORS FOR TRADING APPLICATIONS")
    print("=" * 60)
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # =========================================
    # 1. Beta-Binomial: Win Rate Estimation
    # =========================================
    ax = axes[0, 0]
    
    # Prior: Skeptical about edge, centered at 50%
    prior_alpha, prior_beta = 5, 5
    
    # Simulate trading results
    true_win_rate = 0.55
    np.random.seed(42)
    
    p_grid = np.linspace(0.01, 0.99, 200)
    colors = plt.cm.Blues(np.linspace(0.3, 1, 5))
    
    # Show prior
    prior_dist = stats.beta.pdf(p_grid, prior_alpha, prior_beta)
    ax.plot(p_grid, prior_dist, 'k--', linewidth=2, label='Prior')
    
    # Sequential updating
    cumulative_wins = 0
    cumulative_trades = 0
    trade_batches = [20, 40, 60, 80, 100]
    
    for i, n_trades in enumerate(trade_batches):
        new_trades = n_trades - cumulative_trades
        wins = np.sum(np.random.random(new_trades) < true_win_rate)
        cumulative_wins += wins
        cumulative_trades = n_trades
        
        post_alpha, post_beta = ConjugatePriors.beta_binomial_update(
            prior_alpha, prior_beta, cumulative_wins, cumulative_trades
        )
        
        post_dist = stats.beta.pdf(p_grid, post_alpha, post_beta)
        ax.plot(p_grid, post_dist, color=colors[i], linewidth=2,
                label=f'n={n_trades} (wins={cumulative_wins})')
    
    ax.axvline(true_win_rate, color='red', linestyle='--', alpha=0.7, label=f'True rate = {true_win_rate}')
    ax.set_xlabel('Win Rate', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Beta-Binomial: Win Rate Estimation', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # =========================================
    # 2. Normal-Normal: Return Estimation
    # =========================================
    ax = axes[0, 1]
    
    # Prior: Skeptical, centered at 0
    prior_mu, prior_sigma = 0, 0.001  # ~25% annualized std on prior
    
    # True parameters
    true_mu = 0.0006  # ~15% annual
    data_sigma = 0.015  # Known daily volatility
    
    mu_grid = np.linspace(-0.002, 0.002, 200)
    
    # Prior distribution
    ax.plot(mu_grid * 252 * 100, stats.norm.pdf(mu_grid, prior_mu, prior_sigma), 
            'k--', linewidth=2, label='Prior')
    
    # Posterior for independent samples of increasing size (each n is a fresh draw, not cumulative)
    np.random.seed(123)
    sample_sizes = [20, 50, 100, 200, 500]
    colors = plt.cm.Greens(np.linspace(0.3, 1, len(sample_sizes)))
    
    for i, n in enumerate(sample_sizes):
        data = np.random.normal(true_mu, data_sigma, n)
        sample_mean = data.mean()
        
        post_mu, post_sigma = ConjugatePriors.normal_normal_update(
            prior_mu, prior_sigma, sample_mean, data_sigma, n
        )
        
        post_dist = stats.norm.pdf(mu_grid, post_mu, post_sigma)
        ax.plot(mu_grid * 252 * 100, post_dist, color=colors[i], linewidth=2,
                label=f'n={n}')
    
    ax.axvline(true_mu * 252 * 100, color='red', linestyle='--', alpha=0.7, 
               label=f'True μ = {true_mu*252*100:.1f}%')
    ax.set_xlabel('Annualized Return (%)', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Normal-Normal: Mean Return Estimation', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # =========================================
    # 3. Gamma-Poisson: Trade Count Modeling
    # =========================================
    ax = axes[1, 0]
    
    # Prior: Expect about 5 trades per day
    prior_alpha, prior_beta = 5, 1  # Mean = 5, variance = 5
    
    # True rate
    true_rate = 7
    
    lambda_grid = np.linspace(0.1, 15, 200)
    
    # Prior
    ax.plot(lambda_grid, stats.gamma.pdf(lambda_grid, prior_alpha, scale=1/prior_beta),
            'k--', linewidth=2, label='Prior')
    
    # Posterior for independent samples over an increasing number of days (not cumulative)
    np.random.seed(456)
    n_days_list = [5, 10, 20, 50, 100]
    colors = plt.cm.Oranges(np.linspace(0.3, 1, len(n_days_list)))
    
    for i, n_days in enumerate(n_days_list):
        daily_trades = np.random.poisson(true_rate, n_days)
        total_trades = daily_trades.sum()
        
        post_alpha, post_beta = ConjugatePriors.gamma_poisson_update(
            prior_alpha, prior_beta, total_trades, n_days
        )
        
        post_dist = stats.gamma.pdf(lambda_grid, post_alpha, scale=1/post_beta)
        ax.plot(lambda_grid, post_dist, color=colors[i], linewidth=2,
                label=f'{n_days} days (total={total_trades})')
    
    ax.axvline(true_rate, color='red', linestyle='--', alpha=0.7, label=f'True λ = {true_rate}')
    ax.set_xlabel('Trades per Day (λ)', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Gamma-Poisson: Trade Count Rate', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # =========================================
    # 4. Summary Statistics Comparison
    # =========================================
    ax = axes[1, 1]
    ax.axis('off')
    
    summary_text = """
    CONJUGATE PRIOR SUMMARY FOR FINANCE
    ====================================
    
    1. BETA-BINOMIAL (Win Rate)
       • Prior: Beta(α, β) with mean = α/(α+β)
       • After k wins in n trades:
         Posterior: Beta(α+k, β+n-k)
       • Equivalent to starting with α-1 "pseudo-wins"
         and β-1 "pseudo-losses"
    
    2. NORMAL-NORMAL (Mean Return)
       • Prior: N(μ₀, σ₀²) on mean return
       • Data: n observations with known σ²
       • Posterior mean is precision-weighted average:
         μₙ = (μ₀/σ₀² + nx̄/σ²) / (1/σ₀² + n/σ²)
       • Posterior variance: σₙ² = 1/(1/σ₀² + n/σ²)
    
    3. GAMMA-POISSON (Trade Counts)
       • Prior: Gamma(α, β) on rate λ
       • After observing total T events in n periods:
         Posterior: Gamma(α+T, β+n)
       • Expected rate: (α+T)/(β+n)
    
    KEY INSIGHT: Conjugate priors allow efficient
    sequential updating as new data arrives!
    """
    
    ax.text(0.05, 0.95, summary_text, transform=ax.transAxes,
            fontsize=11, verticalalignment='top', fontfamily='monospace',
            bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))
    
    plt.tight_layout()
    plt.show()

conjugate_priors_demo()

---

## 7. Bayesian Parameter Estimation for Returns

### 📈 **Estimating Return Parameters with Uncertainty**

In frequentist analysis, we get point estimates (MLE). In Bayesian analysis, we get full posterior distributions capturing parameter uncertainty.

### **Why This Matters for Trading**

1. **Small samples**: Limited historical data makes point estimates unreliable
2. **Risk management**: Need uncertainty in estimates for conservative risk measures
3. **Portfolio optimization**: Incorporate estimation error in allocation
4. **Strategy evaluation**: Distinguish skill from luck with proper uncertainty

### **Joint Estimation of Mean and Variance**

When both μ and σ² are unknown, we use the Normal-Inverse-Gamma prior:

$$p(\mu, \sigma^2) = p(\mu | \sigma^2) \cdot p(\sigma^2)$$

where:
- $\mu | \sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0)$
- $\sigma^2 \sim \text{Inverse-Gamma}(\alpha_0, \beta_0)$
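
The factorization above suggests a two-step sampler: draw $\sigma^2$ from the Inverse-Gamma, then draw $\mu$ given that $\sigma^2$. The sketch below is a minimal illustration using only the standard library; the function name `sample_normal_inverse_gamma` and the hyperparameter values are assumptions chosen for the example. The same routine works for the posterior by passing the updated parameters instead of the prior ones.

```python
import math
import random


def sample_normal_inverse_gamma(mu0, kappa0, alpha0, beta0, n_draws, seed=0):
    """Draw (mu, sigma2) pairs from a Normal-Inverse-Gamma distribution.

    sigma2 ~ Inverse-Gamma(alpha0, beta0)   (shape, scale)
    mu | sigma2 ~ N(mu0, sigma2 / kappa0)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # If G ~ Gamma(shape=alpha0, scale=1/beta0), then 1/G ~ Inverse-Gamma(alpha0, beta0)
        sigma2 = 1.0 / rng.gammavariate(alpha0, 1.0 / beta0)
        mu = rng.gauss(mu0, math.sqrt(sigma2 / kappa0))
        draws.append((mu, sigma2))
    return draws


# Illustrative hyperparameters (daily return scale)
draws = sample_normal_inverse_gamma(mu0=0.0, kappa0=1.0, alpha0=3.0, beta0=0.0005, n_draws=20000)

mean_mu = sum(mu for mu, _ in draws) / len(draws)
mean_sigma2 = sum(s2 for _, s2 in draws) / len(draws)

# The Monte Carlo mean of sigma2 should be near beta0 / (alpha0 - 1) = 0.00025
print(f"mean mu = {mean_mu:.6f}, mean sigma2 = {mean_sigma2:.6f}")
```

Joint draws like these make it straightforward to propagate uncertainty into derived quantities such as $\mu / \sigma$, which have no simple closed-form distribution.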

In [None]:
def bayesian_return_estimation():
    """
    Bayesian estimation of return parameters (mean and variance).
    Compare with frequentist MLE estimates.
    """
    print("=" * 60)
    print("BAYESIAN RETURN PARAMETER ESTIMATION")
    print("=" * 60)
    
    # Simulate asset returns
    np.random.seed(42)
    true_mu = 0.0006  # Daily mean (~15% annual)
    true_sigma = 0.018  # Daily vol (~28% annual)
    n_obs = 100
    
    returns = np.random.normal(true_mu, true_sigma, n_obs)
    
    # Frequentist (MLE) estimates
    mle_mu = returns.mean()
    mle_sigma = returns.std(ddof=0)  # MLE uses n, not n-1
    
    # Standard errors (frequentist)
    se_mu = mle_sigma / np.sqrt(n_obs)
    se_sigma = mle_sigma / np.sqrt(2 * n_obs)
    
    print(f"\n📈 TRUE PARAMETERS:")
    print(f"   μ = {true_mu:.6f} ({true_mu*252*100:.2f}% annualized)")
    print(f"   σ = {true_sigma:.6f} ({true_sigma*np.sqrt(252)*100:.2f}% annualized)")
    
    print(f"\n📊 FREQUENTIST (MLE) ESTIMATES:")
    print(f"   μ̂ = {mle_mu:.6f} ± {1.96*se_mu:.6f}")
    print(f"   σ̂ = {mle_sigma:.6f} ± {1.96*se_sigma:.6f}")
    
    # =========================================
    # Bayesian Estimation with Normal-Inverse-Gamma Prior
    # =========================================
    
    # Prior parameters (weakly informative)
    # Prior on μ|σ²: N(μ₀, σ²/κ₀)
    mu0 = 0  # Prior mean for μ (skeptical)
    kappa0 = 1  # Prior "sample size" for mean
    
    # Prior on σ²: Inverse-Gamma(α₀, β₀)
    alpha0 = 3  # Shape
    beta0 = 0.0005  # Scale (implies prior mean σ² = β₀/(α₀-1) ≈ 0.00025)
    
    # Posterior parameters (Normal-Inverse-Gamma update)
    n = len(returns)
    sample_mean = returns.mean()
    sample_var = returns.var(ddof=0)
    
    # Update
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * sample_mean) / kappa_n
    alpha_n = alpha0 + n / 2
    
    ss = np.sum((returns - sample_mean)**2)
    beta_n = beta0 + 0.5 * ss + (kappa0 * n * (sample_mean - mu0)**2) / (2 * kappa_n)
    
    # Marginal posterior of σ² is Inverse-Gamma(α_n, β_n)
    # Marginal posterior of μ is Student-t
    
    print(f"\n🎯 BAYESIAN ESTIMATES:")
    print(f"   Prior: μ₀={mu0}, κ₀={kappa0}, α₀={alpha0}, β₀={beta0}")
    print(f"   Posterior: μₙ={mu_n:.6f}, κₙ={kappa_n}, αₙ={alpha_n:.1f}, βₙ={beta_n:.6f}")
    
    # Posterior mean of σ²; its square root serves as the volatility point estimate
    post_mean_sigma2 = beta_n / (alpha_n - 1)
    post_sigma = np.sqrt(post_mean_sigma2)
    
    # Posterior mean of μ (same as μ_n for Normal-Inverse-Gamma)
    post_mu = mu_n
    
    print(f"\n   Posterior mean μ = {post_mu:.6f}")
    print(f"   Posterior σ (sqrt of posterior mean σ²) = {post_sigma:.6f}")
    
    # =========================================
    # Visualization
    # =========================================
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Panel 1: Data histogram
    ax = axes[0, 0]
    ax.hist(returns, bins=30, density=True, alpha=0.6, color='steelblue', label='Data')
    
    x_range = np.linspace(returns.min() - 0.02, returns.max() + 0.02, 200)
    ax.plot(x_range, stats.norm.pdf(x_range, true_mu, true_sigma), 'r-', 
            linewidth=2, label=f'True N({true_mu:.4f}, {true_sigma:.4f}²)')
    ax.plot(x_range, stats.norm.pdf(x_range, mle_mu, mle_sigma), 'g--',
            linewidth=2, label=f'MLE N({mle_mu:.4f}, {mle_sigma:.4f}²)')
    ax.set_xlabel('Daily Return', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Return Distribution', fontsize=14, fontweight='bold')
    ax.legend()
    
    # Panel 2: Posterior of μ (marginal is Student-t)
    ax = axes[0, 1]
    
    mu_grid = np.linspace(-0.003, 0.003, 500)
    
    # Prior on μ (integrated over σ²): Student-t
    prior_mu_df = 2 * alpha0
    prior_mu_scale = np.sqrt(beta0 / (alpha0 * kappa0))
    prior_mu_dist = stats.t.pdf(mu_grid, df=prior_mu_df, loc=mu0, scale=prior_mu_scale)
    
    # Posterior on μ: Student-t
    post_mu_df = 2 * alpha_n
    post_mu_scale = np.sqrt(beta_n / (alpha_n * kappa_n))
    post_mu_dist = stats.t.pdf(mu_grid, df=post_mu_df, loc=mu_n, scale=post_mu_scale)
    
    # MLE distribution (frequentist)
    mle_dist = stats.norm.pdf(mu_grid, mle_mu, se_mu)
    
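    # Rescale the prior to the posterior's peak so both curves are visible on one axis
    ax.plot(mu_grid * 252 * 100, prior_mu_dist / np.max(prior_mu_dist) * np.max(post_mu_dist), 'b--', 
            linewidth=2, alpha=0.7, label='Prior (scaled)')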
    ax.fill_between(mu_grid * 252 * 100, post_mu_dist, alpha=0.3, color='purple')
    ax.plot(mu_grid * 252 * 100, post_mu_dist, 'purple', linewidth=2, 
            label='Posterior (Bayesian)')
    ax.plot(mu_grid * 252 * 100, mle_dist, 'g--', linewidth=2, 
            label='MLE Sampling Dist')
    ax.axvline(true_mu * 252 * 100, color='red', linestyle=':', 
               linewidth=2, label='True μ')
    ax.set_xlabel('Annualized Return (%)', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Posterior Distribution of Mean Return', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # Panel 3: Posterior of σ² (Inverse-Gamma)
    ax = axes[1, 0]
    
    sigma2_grid = np.linspace(0.0001, 0.0008, 500)
    sigma_grid = np.sqrt(sigma2_grid)
    
    # Prior (Inverse-Gamma for σ²)
    prior_sigma2 = stats.invgamma.pdf(sigma2_grid, alpha0, scale=beta0)
    
    # Posterior
    post_sigma2 = stats.invgamma.pdf(sigma2_grid, alpha_n, scale=beta_n)
    
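    # Convert to σ scale via Jacobian (dσ²/dσ = 2σ); rescale the prior to the posterior's peak
    prior_sigma_dens = prior_sigma2 * 2 * sigma_grid
    post_sigma_dens = post_sigma2 * 2 * sigma_grid
    ax.plot(sigma_grid * np.sqrt(252) * 100, prior_sigma_dens / np.max(prior_sigma_dens) * np.max(post_sigma_dens), 
            'b--', linewidth=2, alpha=0.7, label='Prior (scaled)')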
    ax.fill_between(sigma_grid * np.sqrt(252) * 100, post_sigma2 * 2 * sigma_grid, alpha=0.3, color='orange')
    ax.plot(sigma_grid * np.sqrt(252) * 100, post_sigma2 * 2 * sigma_grid, 'orange', 
            linewidth=2, label='Posterior')
    ax.axvline(true_sigma * np.sqrt(252) * 100, color='red', linestyle=':', 
               linewidth=2, label='True σ')
    ax.axvline(mle_sigma * np.sqrt(252) * 100, color='green', linestyle='--',
               linewidth=2, label='MLE σ')
    ax.set_xlabel('Annualized Volatility (%)', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Posterior Distribution of Volatility', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # Panel 4: Comparison summary
    ax = axes[1, 1]
    
    # Credible intervals
    ci_mu_low = stats.t.ppf(0.025, df=post_mu_df, loc=mu_n, scale=post_mu_scale)
    ci_mu_high = stats.t.ppf(0.975, df=post_mu_df, loc=mu_n, scale=post_mu_scale)
    
    ci_sigma2_low = stats.invgamma.ppf(0.025, alpha_n, scale=beta_n)
    ci_sigma2_high = stats.invgamma.ppf(0.975, alpha_n, scale=beta_n)
    
    # Shrinkage analysis
    shrinkage_mu = (mle_mu - post_mu) / (mle_mu - mu0) if mle_mu != mu0 else 0
    
    summary_data = {
        'Parameter': ['μ (daily)', 'μ (annual %)', 'σ (daily)', 'σ (annual %)'],
        'True': [f'{true_mu:.5f}', f'{true_mu*252*100:.2f}', 
                 f'{true_sigma:.5f}', f'{true_sigma*np.sqrt(252)*100:.2f}'],
        'MLE': [f'{mle_mu:.5f}', f'{mle_mu*252*100:.2f}',
                f'{mle_sigma:.5f}', f'{mle_sigma*np.sqrt(252)*100:.2f}'],
        'Bayesian': [f'{post_mu:.5f}', f'{post_mu*252*100:.2f}',
                    f'{post_sigma:.5f}', f'{post_sigma*np.sqrt(252)*100:.2f}'],
        '95% CI': [f'[{ci_mu_low:.5f}, {ci_mu_high:.5f}]', 
                   f'[{ci_mu_low*252*100:.1f}%, {ci_mu_high*252*100:.1f}%]',
                   f'[{np.sqrt(ci_sigma2_low):.5f}, {np.sqrt(ci_sigma2_high):.5f}]',
                   f'[{np.sqrt(ci_sigma2_low)*np.sqrt(252)*100:.1f}%, {np.sqrt(ci_sigma2_high)*np.sqrt(252)*100:.1f}%]']
    }
    
    df_summary = pd.DataFrame(summary_data)
    ax.axis('off')
    
    table = ax.table(cellText=df_summary.values,
                     colLabels=df_summary.columns,
                     cellLoc='center',
                     loc='center',
                     colColours=['lightblue']*5)
    table.auto_set_font_size(False)
    table.set_fontsize(10)
    table.scale(1.2, 1.8)
    ax.set_title('Estimation Summary', fontsize=14, fontweight='bold', pad=20)
    
    plt.tight_layout()
    plt.show()
    
    print(f"\n🎯 KEY INSIGHTS:")
    print(f"   1. Bayesian estimate of μ is shrunk toward prior ({shrinkage_mu*100:.1f}% shrinkage)")
    print(f"   2. 95% credible interval for annualized μ: {ci_mu_low*252*100:.1f}% to {ci_mu_high*252*100:.1f}%")
    print(f"   3. Uncertainty in σ affects confidence in Sharpe ratio!")

bayesian_return_estimation()

---

## 8. Bayesian Updating with New Market Data

### 🔄 **Sequential Bayesian Learning**

One of the most powerful features of Bayesian inference is **sequential updating**:
- Today's posterior becomes tomorrow's prior
- No need to re-analyze all historical data
- Natural framework for online learning

### **The Update Equation**

$$P(\theta | D_1, D_2) = \frac{P(D_2 | \theta) \cdot P(\theta | D_1)}{P(D_2 | D_1)}$$

When $D_1$ and $D_2$ are conditionally independent given $\theta$, this is equivalent to analyzing all data at once:
$$P(\theta | D_1, D_2) \propto P(D_2 | \theta) \cdot P(D_1 | \theta) \cdot P(\theta)$$
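
The equivalence between sequential and batch updating can be checked numerically for the Beta-Binomial model. This is a small sketch with hypothetical win/loss outcomes; the helper `beta_update` is illustrative and not part of the notebook's classes.

```python
def beta_update(alpha, beta, outcomes):
    """Update a Beta(alpha, beta) prior with a list of 0/1 trade outcomes."""
    wins = sum(outcomes)
    return alpha + wins, beta + len(outcomes) - wins


prior = (2, 2)
d1 = [1, 0, 1, 1, 0, 1]  # hypothetical first batch of trades (1 = win)
d2 = [0, 1, 1, 0]        # hypothetical second batch

# Sequential: the posterior after d1 becomes the prior for d2
sequential = beta_update(*beta_update(*prior, d1), d2)

# Batch: analyze all trades at once from the original prior
batch = beta_update(*prior, d1 + d2)

print(sequential, batch)
```

Both routes end at the same Beta parameters because the update only adds win and loss counts, and addition does not depend on the order or grouping of the data.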

### **Applications in Trading**

1. **Adaptive parameter estimation**: Update strategy parameters as market evolves
2. **Regime detection**: Track changing market conditions
3. **Strategy selection**: Learn which strategy performs best (Thompson Sampling)
4. **Risk monitoring**: Update risk estimates in real-time
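
Thompson Sampling, mentioned in the list above, can be sketched with Beta posteriors over each strategy's win rate. The example below is a minimal simulation under assumed conditions: the true win rates are hypothetical, outcomes are independent wins and losses, and the function name `thompson_sampling` is illustrative.

```python
import random


def thompson_sampling(true_win_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling over a set of strategies."""
    rng = random.Random(seed)
    n_arms = len(true_win_rates)
    alpha = [1.0] * n_arms  # Beta(1, 1) prior on each strategy's win rate
    beta = [1.0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible win rate per strategy from its current posterior
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: draws[i])

        # Trade the chosen strategy and record a simulated win or loss
        if rng.random() < true_win_rates[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1

    return pulls, alpha, beta


# Hypothetical true win rates, unknown to the algorithm
pulls, alpha, beta = thompson_sampling([0.45, 0.50, 0.60], n_rounds=5000)
posterior_means = [a / (a + b) for a, b in zip(alpha, beta)]
print("Trades per strategy:", pulls)
print("Posterior mean win rates:", [round(m, 3) for m in posterior_means])
```

Because each round samples from the posteriors rather than using their means, strategies with wide posteriors still get tried occasionally, while trades concentrate on the strategy that looks best as evidence accumulates.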

In [None]:
def bayesian_online_learning():
    """
    Demonstrate sequential Bayesian updating for trading.
    """
    print("=" * 60)
    print("SEQUENTIAL BAYESIAN UPDATING FOR TRADING")
    print("=" * 60)
    
    # Simulate a trading strategy with regime change
    np.random.seed(42)
    
    # Regime 1: Lower win rate (first 100 trades)
    # Regime 2: Higher win rate (next 100 trades)
    true_win_rate_1 = 0.48
    true_win_rate_2 = 0.58
    
    trades_regime_1 = np.random.random(100) < true_win_rate_1
    trades_regime_2 = np.random.random(100) < true_win_rate_2
    all_trades = np.concatenate([trades_regime_1, trades_regime_2])
    
    # Initialize prior: Beta(2, 2) - mild skepticism
    alpha, beta_param = 2, 2
    
    # Store history for visualization
    history = {
        'trade_num': [0],
        'alpha': [alpha],
        'beta': [beta_param],
        'posterior_mean': [alpha / (alpha + beta_param)],
        'ci_low': [stats.beta.ppf(0.025, alpha, beta_param)],
        'ci_high': [stats.beta.ppf(0.975, alpha, beta_param)],
        'cumulative_win_rate': [0.5]
    }
    
    cumulative_wins = 0
    
    # Sequential update
    for i, win in enumerate(all_trades):
        # Update posterior
        if win:
            alpha += 1
            cumulative_wins += 1
        else:
            beta_param += 1
        
        # Store
        history['trade_num'].append(i + 1)
        history['alpha'].append(alpha)
        history['beta'].append(beta_param)
        history['posterior_mean'].append(alpha / (alpha + beta_param))
        history['ci_low'].append(stats.beta.ppf(0.025, alpha, beta_param))
        history['ci_high'].append(stats.beta.ppf(0.975, alpha, beta_param))
        history['cumulative_win_rate'].append(cumulative_wins / (i + 1))
    
    df_history = pd.DataFrame(history)
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Panel 1: Posterior mean with credible interval
    ax = axes[0, 0]
    ax.fill_between(df_history['trade_num'], df_history['ci_low'], df_history['ci_high'],
                    alpha=0.3, color='blue', label='95% Credible Interval')
    ax.plot(df_history['trade_num'], df_history['posterior_mean'], 'b-', 
            linewidth=2, label='Posterior Mean')
    ax.plot(df_history['trade_num'], df_history['cumulative_win_rate'], 'g--',
            linewidth=1.5, alpha=0.7, label='Cumulative Win Rate (MLE)')
    ax.axhline(true_win_rate_1, color='red', linestyle=':', alpha=0.7, 
               label=f'True Rate Regime 1 = {true_win_rate_1}')
    ax.axhline(true_win_rate_2, color='orange', linestyle=':', alpha=0.7,
               label=f'True Rate Regime 2 = {true_win_rate_2}')
    ax.axvline(100, color='black', linestyle='--', alpha=0.5, label='Regime Change')
    ax.set_xlabel('Number of Trades', fontsize=12)
    ax.set_ylabel('Win Rate', fontsize=12)
    ax.set_title('Sequential Bayesian Updating of Win Rate', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9, loc='lower right')
    ax.set_xlim(0, 200)
    ax.set_ylim(0.3, 0.75)
    
    # Panel 2: Evolution of posterior distribution
    ax = axes[0, 1]
    p_grid = np.linspace(0.01, 0.99, 200)
    
    snapshots = [0, 10, 50, 100, 150, 200]
    colors = plt.cm.viridis(np.linspace(0.2, 0.9, len(snapshots)))
    
    for i, n in enumerate(snapshots):
        a = df_history.loc[n, 'alpha']
        b = df_history.loc[n, 'beta']
        posterior = stats.beta.pdf(p_grid, a, b)
        ax.plot(p_grid, posterior, color=colors[i], linewidth=2, label=f'n={n}')
    
    ax.axvline(true_win_rate_1, color='red', linestyle=':', alpha=0.7)
    ax.axvline(true_win_rate_2, color='orange', linestyle=':', alpha=0.7)
    ax.set_xlabel('Win Rate', fontsize=12)
    ax.set_ylabel('Posterior Density', fontsize=12)
    ax.set_title('Posterior Evolution Over Time', fontsize=14, fontweight='bold')
    ax.legend(fontsize=10)
    
    # Panel 3: Credible interval width (uncertainty reduction)
    ax = axes[1, 0]
    ci_width = np.array(df_history['ci_high']) - np.array(df_history['ci_low'])
    ax.plot(df_history['trade_num'], ci_width, 'purple', linewidth=2)
    ax.set_xlabel('Number of Trades', fontsize=12)
    ax.set_ylabel('95% CI Width', fontsize=12)
    ax.set_title('Uncertainty Reduction with More Data', fontsize=14, fontweight='bold')
    ax.axvline(100, color='black', linestyle='--', alpha=0.5)
    ax.set_xlim(0, 200)
    
    # Panel 4: Bayesian Forgetting (Exponential weighting)
    ax = axes[1, 1]
    
    # Implement Bayesian updating with exponential forgetting
    # This gives more weight to recent observations
    decay_factor = 0.99  # "Forget" 1% per observation
    
    alpha_forget, beta_forget = 2, 2
    forget_history = {'trade_num': [0], 'posterior_mean': [0.5]}
    
    for i, win in enumerate(all_trades):
        # Decay pseudo-counts toward a uniform Beta(1, 1) before update
        alpha_forget = 1 + decay_factor * (alpha_forget - 1)
        beta_forget = 1 + decay_factor * (beta_forget - 1)
        
        # Update
        if win:
            alpha_forget += 1
        else:
            beta_forget += 1
        
        forget_history['trade_num'].append(i + 1)
        forget_history['posterior_mean'].append(alpha_forget / (alpha_forget + beta_forget))
    
    df_forget = pd.DataFrame(forget_history)
    
    ax.plot(df_history['trade_num'], df_history['posterior_mean'], 'b-',
            linewidth=2, label='Standard Bayesian', alpha=0.7)
    ax.plot(df_forget['trade_num'], df_forget['posterior_mean'], 'r-',
            linewidth=2, label=f'With Forgetting (λ={decay_factor})')
    ax.axhline(true_win_rate_1, color='blue', linestyle=':', alpha=0.5)
    ax.axhline(true_win_rate_2, color='orange', linestyle=':', alpha=0.5)
    ax.axvline(100, color='black', linestyle='--', alpha=0.5)
    ax.set_xlabel('Number of Trades', fontsize=12)
    ax.set_ylabel('Posterior Mean', fontsize=12)
    ax.set_title('Bayesian Forgetting for Non-Stationary Data', fontsize=14, fontweight='bold')
    ax.legend(fontsize=10)
    ax.set_xlim(0, 200)
    
    plt.tight_layout()
    plt.show()
    
    print("\n📊 KEY OBSERVATIONS:")
    print("-" * 50)
    print("1. Standard Bayesian: Averages all data, slow to detect regime change")
    print("2. Bayesian with forgetting: Adapts faster to new regime")
    print("3. Credible intervals shrink as we observe more data")
    print("4. Posterior mean is more stable than cumulative win rate (shrinkage)")

bayesian_online_learning()

---

## 9. Credible Intervals vs Confidence Intervals

### 📏 **Understanding the Difference**

| Aspect | Frequentist CI | Bayesian Credible Interval |
|--------|----------------|---------------------------|
| **Definition** | Interval procedure | Interval of posterior |
| **Interpretation** | 95% of such intervals contain θ | 95% probability θ is in interval |
| **Requires** | Repeated sampling concept | Prior distribution |
| **Computation** | Sampling distribution | Posterior distribution |
| **Fixed** | True parameter θ | Observed data D |
| **Random** | Interval endpoints | Parameter θ |

### **Types of Bayesian Intervals**

1. **Equal-tailed Interval (ETI)**: 2.5% in each tail
2. **Highest Density Interval (HDI)**: Narrowest interval containing 95%

For symmetric, unimodal posteriors, ETI = HDI. For skewed posteriors, HDI is often preferred.
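
The difference between the two intervals shows up clearly on a skewed posterior. The sketch below is a minimal illustration on samples from a right-skewed Beta(2, 20) distribution; the distribution, sample size, and the helper name `eti_and_hdi` are assumptions chosen for the example.

```python
import random


def eti_and_hdi(samples, prob=0.95):
    """Return the equal-tailed interval and the highest density interval from samples."""
    s = sorted(samples)
    n = len(s)
    k = round(prob * n)   # index gap spanned by an interval holding about prob of the draws
    tail = (n - k) // 2   # number of draws left in each tail for the ETI

    eti = (s[tail], s[tail + k])

    # HDI: the narrowest window among all windows spanning k sorted draws
    start = min(range(n - k), key=lambda i: s[i + k] - s[i])
    hdi = (s[start], s[start + k])
    return eti, hdi


rng = random.Random(0)
samples = [rng.betavariate(2, 20) for _ in range(20000)]
eti, hdi = eti_and_hdi(samples)

print(f"ETI: [{eti[0]:.4f}, {eti[1]:.4f}]  width = {eti[1] - eti[0]:.4f}")
print(f"HDI: [{hdi[0]:.4f}, {hdi[1]:.4f}]  width = {hdi[1] - hdi[0]:.4f}")
```

The ETI is one of the candidate windows the HDI search considers, so the HDI can never be wider than the ETI at the same coverage.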

In [None]:
def credible_intervals_demo():
    """
    Compare credible intervals with confidence intervals.
    """
    print("=" * 60)
    print("CREDIBLE INTERVALS vs CONFIDENCE INTERVALS")
    print("=" * 60)
    
    def compute_hdi(samples, prob=0.95):
        """
        Compute Highest Density Interval from samples.
        """
        samples = np.sort(samples)
        n = len(samples)
        interval_idx = int(np.floor(prob * n))
        n_intervals = n - interval_idx
        
        interval_widths = samples[interval_idx:] - samples[:n_intervals]
        min_idx = np.argmin(interval_widths)
        
        return samples[min_idx], samples[min_idx + interval_idx]
    
    # Scenario: Estimate Sharpe ratio from limited data
    np.random.seed(42)
    true_sharpe = 0.8
    n_obs = 50
    
    # Generate returns consistent with Sharpe = 0.8 (annual)
    daily_sharpe = true_sharpe / np.sqrt(252)
    returns = np.random.normal(daily_sharpe * 0.02, 0.02, n_obs)  # 2% daily vol
    
    sample_mean = returns.mean()
    sample_std = returns.std(ddof=1)
    sample_sharpe = (sample_mean / sample_std) * np.sqrt(252)
    
    # =========================================
    # Frequentist Confidence Interval
    # =========================================
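    # Standard error of the Sharpe ratio under IID returns (Lo, 2002).
    # The formula applies to the per-period (daily) Sharpe; annualize afterwards.
    sample_sharpe_daily = sample_mean / sample_std
    se_sharpe = np.sqrt((1 + 0.5 * sample_sharpe_daily**2) / n_obs) * np.sqrt(252)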
    freq_ci_low = sample_sharpe - 1.96 * se_sharpe
    freq_ci_high = sample_sharpe + 1.96 * se_sharpe
    
    # =========================================
    # Bayesian Credible Interval (via sampling)
    # =========================================
    # Prior on Sharpe ratio: Normal(0, 1) - skeptical
    prior_mu = 0
    prior_sigma = 1
    
    # Heuristic posterior approximation: bootstrap the sample Sharpe, then blend
    # each draw with a draw from the prior. This is not an exact posterior.
    n_bootstrap = 10000
    sharpe_samples = []
    
    for _ in range(n_bootstrap):
        # Resample returns with replacement and recompute the annualized Sharpe
        boot_idx = np.random.choice(n_obs, n_obs, replace=True)
        boot_returns = returns[boot_idx]
        boot_sharpe = (boot_returns.mean() / boot_returns.std(ddof=1)) * np.sqrt(252)
        
        # Add prior influence (ad hoc weight, not derived from the prior variance)
        prior_weight = 1 / (1 + n_obs/10)  # Prior gets less weight with more data
        posterior_sharpe = (1 - prior_weight) * boot_sharpe + prior_weight * np.random.normal(prior_mu, prior_sigma)
        sharpe_samples.append(posterior_sharpe)
    
    sharpe_samples = np.array(sharpe_samples)
    
    # Equal-tailed credible interval
    eti_low = np.percentile(sharpe_samples, 2.5)
    eti_high = np.percentile(sharpe_samples, 97.5)
    
    # Highest Density Interval
    hdi_low, hdi_high = compute_hdi(sharpe_samples, 0.95)
    
    # =========================================
    # Visualization
    # =========================================
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Panel 1: Posterior distribution with intervals
    ax = axes[0, 0]
    ax.hist(sharpe_samples, bins=50, density=True, alpha=0.6, color='steelblue',
            label='Posterior samples')
    
    # Mark intervals
    ax.axvline(sample_sharpe, color='green', linestyle='-', linewidth=2, label='Sample Sharpe')
    ax.axvline(true_sharpe, color='red', linestyle='--', linewidth=2, label='True Sharpe')
    
    # ETI
    ax.axvline(eti_low, color='purple', linestyle=':', linewidth=2)
    ax.axvline(eti_high, color='purple', linestyle=':', linewidth=2, label=f'95% ETI [{eti_low:.2f}, {eti_high:.2f}]')
    
    # HDI
    ax.axvline(hdi_low, color='orange', linestyle='-.', linewidth=2)
    ax.axvline(hdi_high, color='orange', linestyle='-.', linewidth=2, label=f'95% HDI [{hdi_low:.2f}, {hdi_high:.2f}]')
    
    ax.set_xlabel('Sharpe Ratio (Annualized)', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Posterior Distribution of Sharpe Ratio', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # Panel 2: Frequentist vs Bayesian intervals
    ax = axes[0, 1]
    
    y_positions = [0, 1, 2]
    labels = ['Frequentist 95% CI', '95% ETI (Bayesian)', '95% HDI (Bayesian)']
    
    # Frequentist CI
    ax.barh(y_positions[0], freq_ci_high - freq_ci_low, left=freq_ci_low, 
            height=0.4, color='green', alpha=0.6)
    ax.plot(sample_sharpe, y_positions[0], 'g^', markersize=12)
    
    # ETI
    ax.barh(y_positions[1], eti_high - eti_low, left=eti_low,
            height=0.4, color='purple', alpha=0.6)
    # '^' alone is the marker; 'p^' names two markers and raises a ValueError
    ax.plot(np.mean(sharpe_samples), y_positions[1], '^', markersize=12, color='purple')
    
    # HDI
    ax.barh(y_positions[2], hdi_high - hdi_low, left=hdi_low,
            height=0.4, color='orange', alpha=0.6)
    # '^' alone is the marker; 'o^' names two markers and raises a ValueError
    ax.plot(np.median(sharpe_samples), y_positions[2], '^', markersize=12, color='orange')
    
    ax.axvline(true_sharpe, color='red', linestyle='--', linewidth=2, label='True Sharpe')
    ax.set_yticks(y_positions)
    ax.set_yticklabels(labels)
    ax.set_xlabel('Sharpe Ratio', fontsize=12)
    ax.set_title('Comparison of Interval Estimates', fontsize=14, fontweight='bold')
    ax.legend()
    
    # Panel 3: Repeated sampling demonstration
    ax = axes[1, 0]
    
    n_experiments = 100
    freq_contains_true = 0
    bayes_contains_true = 0
    
    freq_intervals = []
    bayes_intervals = []
    
    for exp in range(n_experiments):
        # Generate new data
        exp_returns = np.random.normal(daily_sharpe * 0.02, 0.02, n_obs)
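        exp_daily_sharpe = exp_returns.mean() / exp_returns.std(ddof=1)
        exp_sharpe = exp_daily_sharpe * np.sqrt(252)
        
        # Frequentist CI (Lo, 2002: SE computed on the daily Sharpe, then annualized)
        exp_se = np.sqrt((1 + 0.5 * exp_daily_sharpe**2) / n_obs) * np.sqrt(252)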
        f_low = exp_sharpe - 1.96 * exp_se
        f_high = exp_sharpe + 1.96 * exp_se
        
        if f_low <= true_sharpe <= f_high:
            freq_contains_true += 1
        freq_intervals.append((f_low, f_high, f_low <= true_sharpe <= f_high))
        
        # Bayesian (simplified)
        prior_weight = 1 / (1 + n_obs/10)
        b_mean = (1 - prior_weight) * exp_sharpe + prior_weight * prior_mu
        b_std = np.sqrt((1 - prior_weight)**2 * exp_se**2 + prior_weight**2 * prior_sigma**2)
        b_low = b_mean - 1.96 * b_std
        b_high = b_mean + 1.96 * b_std
        
        if b_low <= true_sharpe <= b_high:
            bayes_contains_true += 1
        bayes_intervals.append((b_low, b_high, b_low <= true_sharpe <= b_high))
    
    # Plot first 30 intervals
    n_show = 30
    for i in range(n_show):
        color = 'green' if freq_intervals[i][2] else 'red'
        ax.plot([freq_intervals[i][0], freq_intervals[i][1]], [i, i], 
                color=color, linewidth=1.5, alpha=0.7)
    
    ax.axvline(true_sharpe, color='red', linestyle='--', linewidth=2)
    ax.set_xlabel('Sharpe Ratio', fontsize=12)
    ax.set_ylabel('Experiment', fontsize=12)
    ax.set_title(f'Frequentist CI Coverage: {100 * freq_contains_true / n_experiments:.0f}% (Expected: 95%)', 
                 fontsize=14, fontweight='bold')
    
    # Panel 4: Interpretation summary
    ax = axes[1, 1]
    ax.axis('off')
    
    summary_text = f"""
    INTERPRETATION COMPARISON
    ========================
    
    FREQUENTIST 95% CI: [{freq_ci_low:.2f}, {freq_ci_high:.2f}]
    ---------------------------------------------------------
    "If we repeated this experiment many times, 95% of 
     the constructed intervals would contain the true value."
    
    • The interval is random (depends on data)
    • The parameter is fixed (but unknown)
    • This particular interval either contains θ or doesn't!
    
    
    BAYESIAN 95% CREDIBLE INTERVAL: [{eti_low:.2f}, {eti_high:.2f}]
    ---------------------------------------------------------
    "Given the observed data and our prior beliefs, there is
     a 95% probability that the true Sharpe ratio lies in 
     this interval."
    
    • The interval is fixed once the data and prior are given
    • The parameter is treated as random
    • Direct probability statement about θ!
    
    
    KEY INSIGHT FOR TRADING
    -----------------------
    Sample Sharpe: {sample_sharpe:.2f}
    True Sharpe: {true_sharpe:.2f}
    
    The Bayesian interval is often narrower and centered
    closer to 0: the skeptical prior adds information and
    shrinks the estimate, which guards against
    overconfidence from noisy small samples!
    """
    
    ax.text(0.05, 0.95, summary_text, transform=ax.transAxes,
            fontsize=10, verticalalignment='top', fontfamily='monospace',
            bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.5))
    
    plt.tight_layout()
    plt.show()
    
    print(f"\n📊 COVERAGE RESULTS ({n_experiments} experiments):")
    print(f"   Frequentist CI coverage: {100 * freq_contains_true / n_experiments:.0f}%")
    print(f"   Bayesian credible interval coverage: {100 * bayes_contains_true / n_experiments:.0f}%")

credible_intervals_demo()
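
The bootstrap-plus-prior blend used in the demo is a heuristic rather than an exact posterior. A minimal sketch of a more standard alternative, assuming the Sharpe estimate is approximately normal with Lo's IID standard error, is the normal-normal conjugate update below. The helper name `sharpe_posterior_normal` and the simulated returns are illustrative and not part of the original notebook.

```python
import numpy as np

def sharpe_posterior_normal(returns, prior_mu=0.0, prior_sigma=1.0, periods=252):
    """Normal-normal update for an annualized Sharpe ratio (illustrative sketch)."""
    returns = np.asarray(returns, dtype=float)
    n = returns.size
    sr_period = returns.mean() / returns.std(ddof=1)
    sr_annual = sr_period * np.sqrt(periods)
    # Lo (2002) IID standard error, computed per period and then annualized
    se_annual = np.sqrt((1 + 0.5 * sr_period**2) / n) * np.sqrt(periods)

    # Precision-weighted combination of the prior and the (approximate) likelihood
    prior_prec = 1.0 / prior_sigma**2
    data_prec = 1.0 / se_annual**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mu + data_prec * sr_annual)
    return post_mean, np.sqrt(post_var), sr_annual, se_annual

rng = np.random.default_rng(0)
demo_returns = rng.normal(0.001, 0.02, 50)   # assumed toy data: 50 daily returns
post_mean, post_sd, sr_hat, se_hat = sharpe_posterior_normal(demo_returns)
print(f"Sample Sharpe {sr_hat:.2f} (SE {se_hat:.2f})")
print(f"Posterior mean {post_mean:.2f}, 95% interval "
      f"[{post_mean - 1.96 * post_sd:.2f}, {post_mean + 1.96 * post_sd:.2f}]")
```

With only 50 observations the standard error of the annualized Sharpe is larger than the prior standard deviation, so the posterior is pulled strongly toward the prior mean and is narrower than the frequentist interval.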

---

## 10. Bayesian Linear Regression for Price Prediction

### 📈 **Why Bayesian Linear Regression?**

Standard OLS gives point estimates. Bayesian linear regression provides:
- **Posterior distributions** over all coefficients
- **Predictive distributions** with uncertainty
- **Natural regularization** via priors
- **Model comparison** via marginal likelihood

### **Mathematical Framework**

**Model:** $y = X\beta + \epsilon$ where $\epsilon \sim N(0, \sigma^2 I)$

**Prior on coefficients:** $\beta \sim N(\beta_0, \Sigma_0)$

**Posterior (for known $\sigma^2$):** $\beta | y, X \sim N(\beta_n, \Sigma_n)$

Where:
- $\Sigma_n = (\Sigma_0^{-1} + \sigma^{-2} X^T X)^{-1}$
- $\beta_n = \Sigma_n (\Sigma_0^{-1} \beta_0 + \sigma^{-2} X^T y)$
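
A short derivation sketch, assuming $\sigma^2$ is known, shows where these two expressions come from. Multiply the Gaussian likelihood by the Gaussian prior and keep only the terms that involve $\beta$:

$$
\log p(\beta \mid y, X) = -\frac{1}{2\sigma^2}(y - X\beta)^T (y - X\beta) - \frac{1}{2}(\beta - \beta_0)^T \Sigma_0^{-1} (\beta - \beta_0) + \text{const}
$$

$$
= -\frac{1}{2}\beta^T\left(\Sigma_0^{-1} + \sigma^{-2} X^T X\right)\beta + \beta^T\left(\Sigma_0^{-1}\beta_0 + \sigma^{-2} X^T y\right) + \text{const}
$$

A Gaussian $N(\beta_n, \Sigma_n)$ has log density $-\frac{1}{2}\beta^T \Sigma_n^{-1} \beta + \beta^T \Sigma_n^{-1} \beta_n + \text{const}$. Matching the quadratic terms gives $\Sigma_n^{-1} = \Sigma_0^{-1} + \sigma^{-2} X^T X$, and matching the linear terms gives $\Sigma_n^{-1} \beta_n = \Sigma_0^{-1}\beta_0 + \sigma^{-2} X^T y$.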

### **Connection to Ridge Regression**

With prior $\beta \sim N(0, \tau^2 I)$:
- Posterior mean = Ridge estimate with $\lambda = \sigma^2/\tau^2$
- Bayesian interpretation: $\lambda$ controls prior strength

In [None]:
class BayesianLinearRegression:
    """
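    Bayesian linear regression with a Normal prior on the coefficients and an
    Inverse-Gamma update for the noise variance.

    The coefficient posterior uses a plug-in noise variance (an OLS estimate or
    `known_sigma`), so this approximates the full conjugate
    Normal-Inverse-Gamma model rather than implementing it exactly.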
    """
    
    def __init__(self, prior_beta_mean=None, prior_beta_precision=None,
                 prior_sigma_alpha=1, prior_sigma_beta=1):
        """
        Initialize Bayesian Linear Regression.
        
        Parameters:
        -----------
        prior_beta_mean : array-like
            Prior mean for coefficients (default: zeros)
        prior_beta_precision : array-like
            Prior precision matrix for coefficients (default: weak)
        prior_sigma_alpha, prior_sigma_beta : float
            Inverse-Gamma prior parameters for noise variance
        """
        self.prior_beta_mean = prior_beta_mean
        self.prior_beta_precision = prior_beta_precision
        self.prior_sigma_alpha = prior_sigma_alpha
        self.prior_sigma_beta = prior_sigma_beta
        
        self.posterior_beta_mean = None
        self.posterior_beta_cov = None
        self.posterior_sigma_alpha = None
        self.posterior_sigma_beta = None
        self.n_features = None
        
    def fit(self, X, y, known_sigma=None):
        """
        Fit Bayesian linear regression.
        
        Parameters:
        -----------
        X : array-like, shape (n_samples, n_features)
            Feature matrix
        y : array-like, shape (n_samples,)
            Target vector
        known_sigma : float, optional
            If provided, use this as known noise std
        """
        X = np.asarray(X)
        y = np.asarray(y)
        
        n_samples, n_features = X.shape
        self.n_features = n_features
        
        # Set default priors if not specified
        if self.prior_beta_mean is None:
            self.prior_beta_mean = np.zeros(n_features)
        if self.prior_beta_precision is None:
            # Weak prior: precision = 0.01 * I
            self.prior_beta_precision = 0.01 * np.eye(n_features)
        
        prior_beta_mean = np.asarray(self.prior_beta_mean)
        prior_beta_precision = np.asarray(self.prior_beta_precision)
        
        # OLS estimate for sigma (if not known)
        if known_sigma is None:
            ols_beta = np.linalg.lstsq(X, y, rcond=None)[0]
            ols_residuals = y - X @ ols_beta
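            # Unbiased residual variance: divide by degrees of freedom, not n
            sigma_squared_estimate = np.sum(ols_residuals**2) / max(n_samples - n_features, 1)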
        else:
            sigma_squared_estimate = known_sigma ** 2
        
        # Posterior for beta (assuming known sigma for simplicity)
        data_precision = (1 / sigma_squared_estimate) * (X.T @ X)
        
        self.posterior_beta_cov = np.linalg.inv(prior_beta_precision + data_precision)
        self.posterior_beta_mean = self.posterior_beta_cov @ (
            prior_beta_precision @ prior_beta_mean + 
            (1 / sigma_squared_estimate) * X.T @ y
        )
        
        # Posterior for sigma (Inverse-Gamma update)
        self.posterior_sigma_alpha = self.prior_sigma_alpha + n_samples / 2
        residuals = y - X @ self.posterior_beta_mean
        self.posterior_sigma_beta = self.prior_sigma_beta + 0.5 * np.sum(residuals**2)
        
        return self
    
    def predict(self, X_new, return_std=True):
        """
        Make predictions with uncertainty.
        
        Parameters:
        -----------
        X_new : array-like
            New feature matrix
        return_std : bool
            Whether to return predictive std
            
        Returns:
        --------
        mean : array
            Predicted mean
        std : array (if return_std=True)
            Predictive standard deviation
        """
        X_new = np.asarray(X_new)
        
        # Predictive mean
        mean = X_new @ self.posterior_beta_mean
        
        if return_std:
            # Predictive variance = noise variance + coefficient uncertainty
            sigma_squared = self.posterior_sigma_beta / (self.posterior_sigma_alpha - 1)
            
            # Variance from coefficient uncertainty
            var_coef = np.sum((X_new @ self.posterior_beta_cov) * X_new, axis=1)
            
            # Total predictive variance
            var_total = sigma_squared + var_coef
            std = np.sqrt(var_total)
            
            return mean, std
        
        return mean
    
    def sample_coefficients(self, n_samples=1000):
        """
        Sample from posterior distribution of coefficients.
        """
        return np.random.multivariate_normal(
            self.posterior_beta_mean, 
            self.posterior_beta_cov, 
            size=n_samples
        )


def bayesian_regression_demo():
    """
    Demonstrate Bayesian linear regression for price prediction.
    """
    print("=" * 60)
    print("BAYESIAN LINEAR REGRESSION FOR PRICE PREDICTION")
    print("=" * 60)
    
    # Generate synthetic data: predict returns from factors
    np.random.seed(42)
    n_train = 100
    n_test = 30
    
    # True factor loadings
    true_beta = np.array([0.001, 0.5, -0.3, 0.2])  # intercept, momentum, value, size
    noise_sigma = 0.015
    
    # Generate factor data
    def generate_factors(n):
        momentum = np.random.normal(0, 0.02, n)
        value = np.random.normal(0, 0.03, n)
        size = np.random.normal(0, 0.01, n)
        return np.column_stack([np.ones(n), momentum, value, size])
    
    X_train = generate_factors(n_train)
    y_train = X_train @ true_beta + np.random.normal(0, noise_sigma, n_train)
    
    X_test = generate_factors(n_test)
    y_test = X_test @ true_beta + np.random.normal(0, noise_sigma, n_test)
    
    # Fit OLS
    ols_beta = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
    ols_pred = X_test @ ols_beta
    
    # Fit Bayesian regression with skeptical prior
    # Prior: coefficients centered at 0 with moderate precision
    prior_precision = np.diag([0.1, 10, 10, 10])  # Weaker on intercept
    
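    # The class default Inverse-Gamma(1, 1) prior on the noise variance is far too
    # heavy for daily-return scales (variance ~1e-4) and would inflate the
    # predictive intervals, so use a near-flat prior here.
    bayes_reg = BayesianLinearRegression(
        prior_beta_mean=np.zeros(4),
        prior_beta_precision=prior_precision,
        prior_sigma_alpha=1e-3,
        prior_sigma_beta=1e-6
    )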
    bayes_reg.fit(X_train, y_train)
    bayes_pred_mean, bayes_pred_std = bayes_reg.predict(X_test)
    
    # Visualization
    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
    
    # Panel 1: Coefficient comparison
    ax = axes[0, 0]
    
    labels = ['Intercept', 'Momentum', 'Value', 'Size']
    x_pos = np.arange(len(labels))
    width = 0.25
    
    ax.bar(x_pos - width, true_beta, width, label='True', color='green', alpha=0.7)
    ax.bar(x_pos, ols_beta, width, label='OLS', color='blue', alpha=0.7)
    ax.bar(x_pos + width, bayes_reg.posterior_beta_mean, width, label='Bayesian', color='red', alpha=0.7)
    
    # Add error bars for Bayesian
    bayes_std = np.sqrt(np.diag(bayes_reg.posterior_beta_cov))
    ax.errorbar(x_pos + width, bayes_reg.posterior_beta_mean, yerr=1.96*bayes_std,
                fmt='none', color='black', capsize=5)
    
    ax.set_xticks(x_pos)
    ax.set_xticklabels(labels)
    ax.set_ylabel('Coefficient Value', fontsize=12)
    ax.set_title('Coefficient Estimates', fontsize=14, fontweight='bold')
    ax.legend()
    ax.axhline(0, color='gray', linestyle='--', alpha=0.5)
    
    # Panel 2: Posterior distributions of coefficients
    ax = axes[0, 1]
    
    samples = bayes_reg.sample_coefficients(5000)
    colors = plt.cm.Set1(np.linspace(0, 1, 4))
    
    for i, (label, color) in enumerate(zip(labels[1:], colors[1:])):  # Skip intercept
        ax.hist(samples[:, i+1], bins=50, density=True, alpha=0.4, color=color, label=label)
        ax.axvline(true_beta[i+1], color=color, linestyle='--', linewidth=2)
    
    ax.set_xlabel('Coefficient Value', fontsize=12)
    ax.set_ylabel('Density', fontsize=12)
    ax.set_title('Posterior Distribution of Factor Loadings', fontsize=14, fontweight='bold')
    ax.legend()
    
    # Panel 3: Predictions with uncertainty
    ax = axes[1, 0]
    
    idx = np.argsort(y_test)
    
    ax.scatter(range(n_test), y_test[idx], color='black', s=50, label='Actual', zorder=3)
    ax.plot(range(n_test), ols_pred[idx], 'b-', linewidth=2, label='OLS', alpha=0.7)
    ax.plot(range(n_test), bayes_pred_mean[idx], 'r-', linewidth=2, label='Bayesian mean')
    ax.fill_between(range(n_test), 
                    bayes_pred_mean[idx] - 1.96*bayes_pred_std[idx],
                    bayes_pred_mean[idx] + 1.96*bayes_pred_std[idx],
                    alpha=0.3, color='red', label='95% Predictive Interval')
    
    ax.set_xlabel('Test Sample (sorted)', fontsize=12)
    ax.set_ylabel('Return', fontsize=12)
    ax.set_title('Predictions with Uncertainty', fontsize=14, fontweight='bold')
    ax.legend(fontsize=9)
    
    # Panel 4: Model evaluation
    ax = axes[1, 1]
    
    # Compute metrics
    ols_mse = np.mean((y_test - ols_pred)**2)
    bayes_mse = np.mean((y_test - bayes_pred_mean)**2)
    
    # Coverage: proportion of actual values within 95% PI
    coverage = np.mean((y_test >= bayes_pred_mean - 1.96*bayes_pred_std) & 
                       (y_test <= bayes_pred_mean + 1.96*bayes_pred_std))
    
    # Shrinkage
    shrinkage = np.mean(np.abs(ols_beta - bayes_reg.posterior_beta_mean) / 
                        (np.abs(ols_beta) + 1e-10))
    
    metrics_text = f"""
    MODEL EVALUATION
    ================
    
    PREDICTION ACCURACY
    -------------------
    OLS Test MSE:      {ols_mse:.8f}
    Bayesian Test MSE: {bayes_mse:.8f}
    Improvement:       {(ols_mse - bayes_mse)/ols_mse * 100:.1f}%
    
    UNCERTAINTY CALIBRATION
    -----------------------
    95% Predictive Interval Coverage: {coverage*100:.1f}%
    (Should be ~95% if well-calibrated)
    
    REGULARIZATION (SHRINKAGE)
    --------------------------
    Average coefficient shrinkage: {shrinkage*100:.1f}%
    
    COEFFICIENT SUMMARY
    -------------------
    """
    
    for i, label in enumerate(labels):
        metrics_text += f"    {label}: True={true_beta[i]:.4f}, OLS={ols_beta[i]:.4f}, "
        metrics_text += f"Bayes={bayes_reg.posterior_beta_mean[i]:.4f}±{bayes_std[i]:.4f}\n"
    
    ax.axis('off')
    ax.text(0.05, 0.95, metrics_text, transform=ax.transAxes,
            fontsize=10, verticalalignment='top', fontfamily='monospace',
            bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.3))
    
    plt.tight_layout()
    plt.show()
    
    print("\n🎯 KEY INSIGHTS:")
    print("   1. Bayesian estimates are shrunk toward prior (0)")
    print("   2. Predictive intervals capture uncertainty in both coefficients AND noise")
    print("   3. Regularization prevents overfitting on small samples")

bayesian_regression_demo()

---

## 11. Markov Chain Monte Carlo (MCMC) Sampling

### 🎲 **Why MCMC?**

For models without conjugate structure, the posterior has no closed form and its normalizing constant is usually intractable. MCMC algorithms sidestep this by constructing a Markov chain whose stationary distribution is the target posterior, so the chain's draws can be used as posterior samples.

### **Key MCMC Algorithms**

| Algorithm | Mechanism | Pros | Cons |
|-----------|-----------|------|------|
| **Metropolis-Hastings** | Proposal + accept/reject | Simple, general | Can be slow, tuning required |
| **Gibbs Sampling** | Sample each parameter conditionally | No rejection, good for conjugate | Requires conditional distributions |
| **Hamiltonian MC** | Uses gradient information | Efficient in high dimensions | Requires differentiable model |
| **NUTS** | Auto-tuned HMC | Best general-purpose | Computationally intensive |

### **Metropolis-Hastings Algorithm**

```
1. Start at θ₀
2. For t = 1, 2, ..., T:
   a. Propose θ* ~ q(θ*|θₜ₋₁)
   b. Compute acceptance ratio:
      α = min(1, [P(θ*|D) × q(θₜ₋₁|θ*)] / [P(θₜ₋₁|D) × q(θ*|θₜ₋₁)])
   c. Accept θₜ = θ* with probability α
      Otherwise θₜ = θₜ₋₁
3. Discard burn-in, return samples
```
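
The steps above can be sketched as a random-walk Metropolis sampler, the special case in which the proposal is symmetric and the $q$ terms cancel. The toy target (a mean with a $N(0, 1)$ prior and unit-variance data) is an assumption chosen because its exact posterior is known. The function name, step size, and data are illustrative.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_steps=5000, step=0.5, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler for a scalar parameter (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta = theta0
    current_lp = log_post(theta)
    samples = np.empty(n_steps)
    n_accept = 0
    for t in range(n_steps):
        proposal = theta + step * rng.normal()
        proposal_lp = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, posterior ratio),
        # evaluated on the log scale for numerical stability
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            theta, current_lp = proposal, proposal_lp
            n_accept += 1
        samples[t] = theta
    return samples[burn_in:], n_accept / n_steps

# Toy target: unnormalized log posterior of a mean, N(0, 1) prior, unit-variance data
data = np.array([0.8, 1.2, 0.5, 1.0, 0.9])

def log_post(mu):
    return -0.5 * mu**2 - 0.5 * np.sum((data - mu) ** 2)

draws, acc_rate = metropolis_hastings(log_post, theta0=0.0)
exact_mean = data.sum() / (len(data) + 1)    # conjugate result for comparison
print(f"MH mean {draws.mean():.3f} vs exact {exact_mean:.3f}, acceptance {acc_rate:.2f}")
```

Only the unnormalized log posterior is needed, because the normalizing constant cancels in the acceptance ratio. The step size trades acceptance rate against how far each accepted move travels, which is the tuning burden noted in the table above.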

### **Convergence Diagnostics**

- **Trace plots**: Visual inspection for stationarity
- **R-hat (Gelman-Rubin)**: Compare within-chain and between-chain variance (target < 1.01)
- **Effective Sample Size (ESS)**: Account for autocorrelation (target > 400)
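
As a rough illustration of the last two diagnostics, the sketch below computes the classic (non-split) Gelman-Rubin statistic and a crude autocorrelation-based ESS. Both are simplified versions of what libraries such as ArviZ report, and the AR(1) chains are an assumed stand-in for real MCMC output.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic (non-split) R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # B: scaled variance of chain means
    var_hat = (n - 1) / n * within + between / n
    return np.sqrt(var_hat / within)

def effective_sample_size(x, max_lag=200):
    """Crude ESS: n / (1 + 2 * sum of initial positive autocorrelations)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    var = np.dot(x, x) / n
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1)):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0:          # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

rng = np.random.default_rng(0)
phi, n_draws = 0.7, 2000
# Four AR(1) chains sharing one stationary distribution (assumed toy "MCMC output")
chains = np.zeros((4, n_draws))
for c in range(4):
    for t in range(1, n_draws):
        chains[c, t] = phi * chains[c, t - 1] + rng.normal()

print(f"R-hat: {gelman_rubin(chains):.3f}")
print(f"ESS of chain 0: {effective_sample_size(chains[0]):.0f} of {n_draws} draws")
```

Chains that agree give an R-hat close to 1, while chains stuck in different regions inflate the between-chain term and push it well above the 1.01 target. Positive autocorrelation is what makes the ESS smaller than the raw number of draws.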