# XGBoost: Point Prediction vs Distribution Prediction

This notebook demonstrates the core thesis of `temporalpdf`:

**Pipeline 1**: XGBoost → single number (expected return) → trade if positive

**Pipeline 2**: XGBoost → 4 distribution parameters → VaR/CVaR/Kelly → risk-filtered trades

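Both pipelines use the same features and the same model architecture. The difference is that **distribution prediction quantifies uncertainty**, which enables risk management that a point prediction alone cannot support.

Note: the code below uses scikit-learn's `GradientBoostingRegressor` as the gradient-boosted tree model; `xgboost.XGBRegressor` exposes the same scikit-learn-style `fit`/`predict` interface and can be substituted.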

In [None]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent / "src"))

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
import temporalpdf as tpdf

%matplotlib inline
plt.rcParams['figure.figsize'] = (14, 5)

COST_BPS = 2  # Transaction cost in basis points (1 bp = 0.01%)

## Load Data & Create Features

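S&P 500 daily returns (sourced from yfinance), loaded from `data/equity_returns.csv`. The `return_pct` column holds returns in percent, so a value of 2.0 means 2%.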

In [None]:
df = pd.read_csv(Path.cwd().parent / "data" / "equity_returns.csv")
returns = df["return_pct"].values
print(f"S&P 500: {len(returns):,} days")

# Create features from lookback window
lookback = 20
X, y = [], []
for i in range(lookback, len(returns)):
    # Window covers days i-lookback .. i-1, so day i is the next-day target
    window = returns[i-lookback:i]
    X.append([np.mean(window), np.std(window), window[-1], window[-2],
              np.min(window), np.max(window), np.sum(window > 0) / lookback])
    y.append(returns[i])  # was returns[i + 1], which skipped a day
X, y = np.array(X), np.array(y)

# Train/test split
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
print(f"Train: {len(y_train):,}, Test: {len(y_test):,}")

## Pipeline 1: XGBoost → Point Prediction

Model predicts next-day return as a single number. Trading rule: go long if prediction > 0.

**What you get**: A point estimate E[X].

**What you're missing**: Any sense of how uncertain that estimate is.
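
To make the gap concrete, here is a minimal sketch with two hypothetical forecasts that share the same point estimate but carry very different downside risk. It uses a normal distribution from Python's standard library purely for simplicity (the notebook itself uses NIG), and the numbers are illustrative assumptions, not values from the data.

```python
from statistics import NormalDist

# Two hypothetical next-day forecasts (returns in percent) with the same mean.
calm = NormalDist(mu=0.05, sigma=0.5)
stressed = NormalDist(mu=0.05, sigma=2.0)

def var_5pct(dist):
    """5% VaR as a positive loss: the negated 5th percentile of returns."""
    return -dist.inv_cdf(0.05)

for name, dist in [("calm", calm), ("stressed", stressed)]:
    print(f"{name:>8}: E[X] = {dist.mean:+.2f}%, 5% VaR = {var_5pct(dist):.2f}%")
```

A point model would treat both days identically because the expected return is the same; only the distribution reveals that the second day's 5% tail loss is several times larger.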

In [None]:
model_point = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=42)
model_point.fit(X_train, y_train)
pred_point = model_point.predict(X_test)

decisions_p1 = pred_point > 0
print(f"Pipeline 1: {np.sum(decisions_p1)} / {len(y_test)} trades ({np.mean(decisions_p1):.1%})")

## Pipeline 2: XGBoost → Distribution Parameters

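Model predicts the four parameters of a normal inverse Gaussian (NIG) distribution: mu (location), delta (scale), alpha (tail heaviness), and beta (skewness). The training targets are NIG parameters fitted to each 20-day lookback window, and the predicted distribution is then used as the forecast for the next day's return.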
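
NIG parameters are constrained (delta > 0, alpha > |beta|), while a regressor outputs unconstrained real numbers. The sketch below illustrates the reparameterization idea that the training cell relies on: log for positive parameters and arctanh for the bounded ratio beta/alpha. The helper names `to_unconstrained` and `to_constrained` are hypothetical and not part of `temporalpdf`.

```python
import numpy as np

def to_unconstrained(delta, alpha, beta):
    """Map valid NIG shape parameters to unconstrained real numbers."""
    ratio = np.clip(beta / alpha, -0.99, 0.99)  # keep arctanh finite
    return np.log(delta), np.log(alpha), np.arctanh(ratio)

def to_constrained(log_delta, log_alpha, beta_raw):
    """Inverse map: real inputs yield delta > 0 and |beta| <= alpha."""
    delta = np.exp(log_delta)
    alpha = np.exp(log_alpha)
    beta = alpha * np.tanh(beta_raw)
    return delta, alpha, beta

# Round trip on hypothetical parameter values
raw = to_unconstrained(delta=1.2, alpha=2.0, beta=-0.5)
delta, alpha, beta = to_constrained(*raw)
print(delta, alpha, beta)

# A regressor output outside the training range still maps to valid parameters
d, a, b = to_constrained(-3.0, 0.7, 4.2)
print(d > 0, a > abs(b))
```

For very large `beta_raw`, `tanh` saturates to exactly 1.0 in floating point, which is one reason to additionally clip beta slightly inside (-alpha, alpha), as the notebook does.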

**What you get**: A full probability distribution over possible outcomes.

This unlocks the entire `temporalpdf` decision toolkit:
- `tpdf.var()` - Value at Risk (the loss threshold exceeded only with probability alpha, e.g. 5%)
- `tpdf.cvar()` - Conditional VaR / Expected Shortfall (expected loss given that the loss exceeds VaR)
- `tpdf.kelly_fraction()` - Optimal position sizing
- `tpdf.prob_greater_than()` - P(return > threshold)
- `tpdf.prob_less_than()` - P(return < threshold)
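
As a rough illustration of what these quantities mean, the sketch below estimates them by Monte Carlo from NIG samples using only NumPy. This is not how `temporalpdf` computes them (its internals are not shown here); `sample_nig` is a hypothetical helper based on the standard normal variance-mean mixture representation of the NIG distribution, and the parameter values are made up.

```python
import numpy as np

def sample_nig(mu, delta, alpha, beta, size, rng):
    """Draw NIG samples via the normal variance-mean mixture representation."""
    gamma = np.sqrt(alpha**2 - beta**2)
    # Mixing variable: inverse Gaussian with mean delta/gamma and shape delta^2
    z = rng.wald(mean=delta / gamma, scale=delta**2, size=size)
    return mu + beta * z + np.sqrt(z) * rng.standard_normal(size)

rng = np.random.default_rng(0)
# Hypothetical parameters for daily returns in percent
samples = sample_nig(mu=0.05, delta=1.0, alpha=1.5, beta=-0.1,
                     size=200_000, rng=rng)

q05 = np.quantile(samples, 0.05)
var_5 = -q05                              # 5% VaR, reported as a positive loss
cvar_5 = -samples[samples <= q05].mean()  # average loss beyond the VaR threshold
p_pos = np.mean(samples > 0.0)            # P(return > 0)
p_big_loss = np.mean(samples < -2.0)      # P(return < -2%)

print(f"VaR(5%)  = {var_5:.2f}%")
print(f"CVaR(5%) = {cvar_5:.2f}%")
print(f"P(return > 0)   = {p_pos:.3f}")
print(f"P(return < -2%) = {p_big_loss:.3f}")
```

By construction CVaR is at least as large as VaR at the same level, since it averages only the outcomes at or beyond the VaR threshold.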

In [None]:
# Create distribution targets: fit NIG to each training window
print("Fitting NIG distributions to training windows...")
param_targets = []
for i in range(lookback, lookback + len(y_train)):
    window = returns[i-lookback:i]
    params = tpdf.fit_nig(window)
    # Store in transformed space (log for positive params, arctanh for bounded)
    beta_ratio = np.clip(params.beta / params.alpha, -0.99, 0.99)
    param_targets.append([
        params.mu,
        np.log(params.delta),
        np.log(params.alpha),
        np.arctanh(beta_ratio)
    ])
param_targets = np.array(param_targets)
print(f"Created {len(param_targets)} distribution targets")

In [None]:
# Train model to predict distribution parameters
model_dist = MultiOutputRegressor(
    GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=42)
)
model_dist.fit(X_train, param_targets)
pred_params_raw = model_dist.predict(X_test)
print("Distribution model trained")

In [None]:
# Convert predicted parameters to NIGParameters and compute risk metrics
nig = tpdf.NIG()

decisions_p2 = []
var_estimates = []
cvar_estimates = []
kelly_estimates = []
prob_positive = []
prob_big_loss = []  # P(loss > 2%)

for mu, log_delta, log_alpha, beta_raw in pred_params_raw:
    # Transform back to NIG parameter space
    delta = max(np.exp(log_delta), 0.01)
    alpha = max(np.exp(log_alpha), 0.1)
    beta = np.clip(alpha * np.tanh(beta_raw), -alpha + 0.01, alpha - 0.01)
    
    params = tpdf.NIGParameters(mu=mu, delta=delta, alpha=alpha, beta=beta)
    
    # Use library functions for risk metrics
    var_5 = tpdf.var(nig, params, alpha=0.05)     # 95% VaR (positive = loss)
    cvar_5 = tpdf.cvar(nig, params, alpha=0.05)   # Expected Shortfall
    kelly = tpdf.kelly_fraction(nig, params)       # Optimal position size
    p_pos = tpdf.prob_greater_than(nig, params, 0.0)  # P(return > 0)
    p_loss = tpdf.prob_less_than(nig, params, -2.0)   # P(return < -2%)
    
    var_estimates.append(var_5)
    cvar_estimates.append(cvar_5)
    kelly_estimates.append(kelly)
    prob_positive.append(p_pos)
    prob_big_loss.append(p_loss)
    
    # Decision rule: E[return] > 0 AND VaR(5%) < 2%
    expected = nig.mean(0.0, params)
    decisions_p2.append(expected > 0 and var_5 < 2.0)

decisions_p2 = np.array(decisions_p2)
var_estimates = np.array(var_estimates)
cvar_estimates = np.array(cvar_estimates)
kelly_estimates = np.array(kelly_estimates)
prob_positive = np.array(prob_positive)
prob_big_loss = np.array(prob_big_loss)

print(f"Pipeline 2: {np.sum(decisions_p2)} / {len(y_test)} trades ({np.mean(decisions_p2):.1%})")

## Results Comparison

In [None]:
def sharpe(rets):
    """Annualized Sharpe ratio (252 trading days, zero risk-free rate)."""
    if len(rets) == 0 or np.std(rets) == 0:
        return 0.0
    return np.mean(rets) / np.std(rets) * np.sqrt(252)

def bootstrap_ci(rets, n=1000):
    """95% percentile bootstrap CI for the Sharpe ratio (i.i.d. resampling of days)."""
    if len(rets) == 0:
        return (0.0, 0.0)
    rng = np.random.default_rng(42)
    sharpes = [sharpe(rng.choice(rets, len(rets), replace=True)) for _ in range(n)]
    return np.percentile(sharpes, 2.5), np.percentile(sharpes, 97.5)

cost = COST_BPS / 100  # bps -> percent (1 bp = 0.01%), matching returns in percent
strat_bh = y_test
strat_p1 = np.where(decisions_p1, y_test - cost, 0)
strat_p2 = np.where(decisions_p2, y_test - cost, 0)

sharpe_bh, ci_bh = sharpe(strat_bh), bootstrap_ci(strat_bh)
sharpe_p1, ci_p1 = sharpe(strat_p1), bootstrap_ci(strat_p1)
sharpe_p2, ci_p2 = sharpe(strat_p2), bootstrap_ci(strat_p2)

print("STRATEGY COMPARISON")
print("=" * 70)
print(f"{'Strategy':<25} {'Trades':>8} {'PnL':>10} {'Sharpe':>10} {'95% CI':>16}")
print("-" * 70)
print(f"{'Buy & Hold':<25} {len(y_test):>8} {np.sum(strat_bh):>+9.1f}% {sharpe_bh:>10.2f} [{ci_bh[0]:.2f}, {ci_bh[1]:.2f}]")
print(f"{'XGBoost -> Point':<25} {np.sum(decisions_p1):>8} {np.sum(strat_p1):>+9.1f}% {sharpe_p1:>10.2f} [{ci_p1[0]:.2f}, {ci_p1[1]:.2f}]")
print(f"{'XGBoost -> Distribution':<25} {np.sum(decisions_p2):>8} {np.sum(strat_p2):>+9.1f}% {sharpe_p2:>10.2f} [{ci_p2[0]:.2f}, {ci_p2[1]:.2f}]")
print("-" * 70)
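if sharpe_p1 > 0:
    # A relative change is only meaningful when the baseline Sharpe is positive
    print(f"\nDistribution vs Point: {(sharpe_p2/sharpe_p1 - 1)*100:+.0f}% Sharpe change")
else:
    print(f"\nDistribution vs Point: {sharpe_p2 - sharpe_p1:+.2f} Sharpe difference")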

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Cumulative PnL
ax = axes[0]
days = np.arange(len(y_test))
ax.plot(days, np.cumsum(strat_bh), 'k-', alpha=0.5, lw=1.5, label=f'Buy & Hold ({sharpe_bh:.2f})')
ax.plot(days, np.cumsum(strat_p1), 'r-', lw=2, label=f'Point Prediction ({sharpe_p1:.2f})')
ax.plot(days, np.cumsum(strat_p2), 'b-', lw=2, label=f'Distribution ({sharpe_p2:.2f})')
ax.axhline(0, color='gray', ls=':', lw=1)
ax.set_xlabel('Day')
ax.set_ylabel('Cumulative PnL (%)')
ax.set_title('Cumulative Returns (Sharpe in legend)', fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# Sharpe comparison
ax = axes[1]
x = [0, 1, 2]
bars = ax.bar(x, [sharpe_bh, sharpe_p1, sharpe_p2], color=['gray', 'red', 'blue'], alpha=0.7)
ax.errorbar(x, [sharpe_bh, sharpe_p1, sharpe_p2],
            yerr=[[sharpe_bh-ci_bh[0], sharpe_p1-ci_p1[0], sharpe_p2-ci_p2[0]],
                  [ci_bh[1]-sharpe_bh, ci_p1[1]-sharpe_p1, ci_p2[1]-sharpe_p2]],
            fmt='none', color='black', capsize=5)
ax.set_xticks(x)
ax.set_xticklabels(['Buy & Hold', 'Point\nPrediction', 'Distribution\nPrediction'])
ax.set_ylabel('Annualized Sharpe')
ax.set_title('Risk-Adjusted Returns (95% CI)', fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')
for i, s in enumerate([sharpe_bh, sharpe_p1, sharpe_p2]):
    ax.text(i, s + 0.15, f'{s:.2f}', ha='center', fontweight='bold')

plt.tight_layout()
plt.savefig('comparison.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

## Why Distribution Prediction Wins: The VaR Filter

The key insight: Pipeline 1 sees "positive expected return" and trades.

Pipeline 2 sees the full distribution and asks:
- What's my VaR? (How much could I lose?)
- What's my CVaR? (If things go wrong, how bad?)
- What's the probability of a large loss?

It rejects trades where expected return is positive but risk is unacceptable.

In [None]:
# Days where Pipeline 1 trades but Pipeline 2 doesn't (rejected by VaR filter)
rejected = decisions_p1 & ~decisions_p2
kept = decisions_p2

print("VaR FILTER ANALYSIS")
print("=" * 60)
print(f"\nDays rejected by VaR filter: {np.sum(rejected)}")
if np.sum(rejected) > 0:
    print(f"  Mean actual return: {np.mean(y_test[rejected]):+.3f}%")
    print(f"  Std actual return:  {np.std(y_test[rejected]):.3f}%")
    print(f"  Mean VaR estimate:  {np.mean(var_estimates[rejected]):.3f}%")
    print(f"  Mean CVaR estimate: {np.mean(cvar_estimates[rejected]):.3f}%")

print(f"\nDays kept by Pipeline 2: {np.sum(kept)}")
if np.sum(kept) > 0:
    print(f"  Mean actual return: {np.mean(y_test[kept]):+.3f}%")
    print(f"  Std actual return:  {np.std(y_test[kept]):.3f}%")
    print(f"  Mean VaR estimate:  {np.mean(var_estimates[kept]):.3f}%")
    print(f"  Mean CVaR estimate: {np.mean(cvar_estimates[kept]):.3f}%")

print("\n" + "-" * 60)
print("If the VaR filter works as intended, rejected days show higher volatility")
print("and worse risk-adjusted returns than kept days (compare the figures above).")

## Rich Risk Metrics from Distribution Prediction

Distribution prediction unlocks queries that are impossible with point predictions:

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# VaR distribution
ax = axes[0, 0]
ax.hist(var_estimates, bins=50, alpha=0.7, color='red', edgecolor='black')
ax.axvline(2.0, color='black', ls='--', lw=2, label='VaR threshold (2%)')
ax.set_xlabel('5% VaR (%)')
ax.set_ylabel('Frequency')
ax.set_title('Distribution of Predicted VaR', fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# CVaR vs VaR (CVaR is always >= VaR)
ax = axes[0, 1]
ax.scatter(var_estimates, cvar_estimates, alpha=0.3, s=10)
ax.plot([0, max(var_estimates)], [0, max(var_estimates)], 'k--', label='CVaR = VaR')
ax.set_xlabel('VaR (%)')
ax.set_ylabel('CVaR (%)')
ax.set_title('CVaR vs VaR (CVaR captures tail severity)', fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# Kelly fraction distribution
ax = axes[1, 0]
kelly_clipped = np.clip(kelly_estimates, -5, 5)  # Clip for visualization
ax.hist(kelly_clipped, bins=50, alpha=0.7, color='green', edgecolor='black')
ax.axvline(0, color='black', ls='-', lw=1)
ax.axvline(0.5, color='blue', ls='--', lw=2, label='Half-Kelly example')
ax.set_xlabel('Kelly Fraction')
ax.set_ylabel('Frequency')
ax.set_title('Optimal Position Sizing (Kelly Criterion)', fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

# P(return > 0) vs actual outcome
ax = axes[1, 1]
correct = (prob_positive > 0.5) == (y_test > 0)
ax.scatter(prob_positive[correct], y_test[correct], alpha=0.3, s=10, c='green', label='Correct')
ax.scatter(prob_positive[~correct], y_test[~correct], alpha=0.3, s=10, c='red', label='Wrong')
ax.axhline(0, color='black', ls='-', lw=1)
ax.axvline(0.5, color='black', ls='--', lw=1)
ax.set_xlabel('P(return > 0)')
ax.set_ylabel('Actual Return (%)')
ax.set_title('P(return > 0) vs Actual Return (directional accuracy)', fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('risk_metrics.png', dpi=150, bbox_inches='tight', facecolor='white')
plt.show()

## Alternative Decision Rules

With distribution prediction, we can design sophisticated decision rules.
Let's compare different risk thresholds:

In [None]:
# Compare different VaR thresholds
thresholds = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]

print("SENSITIVITY TO VAR THRESHOLD")
print("=" * 70)
print(f"{'VaR Threshold':>15} {'Trades':>10} {'PnL':>10} {'Sharpe':>10}")
print("-" * 70)

# Recompute expected returns using nig.mean()
expected_returns = []
for mu, log_delta, log_alpha, beta_raw in pred_params_raw:
    delta = max(np.exp(log_delta), 0.01)
    alpha = max(np.exp(log_alpha), 0.1)
    beta = np.clip(alpha * np.tanh(beta_raw), -alpha + 0.01, alpha - 0.01)
    params = tpdf.NIGParameters(mu=mu, delta=delta, alpha=alpha, beta=beta)
    expected_returns.append(nig.mean(0.0, params))
expected_returns = np.array(expected_returns)

for thresh in thresholds:
    decisions = (expected_returns > 0) & (var_estimates < thresh)
    strat = np.where(decisions, y_test - cost, 0)
    s = sharpe(strat)
    print(f"{thresh:>14.1f}% {np.sum(decisions):>10} {np.sum(strat):>+9.1f}% {s:>10.2f}")

print("-" * 70)
print("Note: tighter thresholds mean fewer trades; check the Sharpe column to see whether risk-adjusted returns improve.")

## Kelly-Weighted Returns

Instead of a binary trade/no-trade decision, the Kelly fraction can be used to size each position.
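
As background, the Kelly fraction is the position size that maximizes expected log growth of capital. The sketch below shows the idea on simulated returns; it is an assumption-laden illustration, not the implementation behind `tpdf.kelly_fraction()`. The returns are hypothetical, expressed as decimals, and drawn from a normal distribution for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily returns as decimals (0.001 = 0.1%), not the notebook's data
r = rng.normal(loc=0.001, scale=0.02, size=100_000)

# Full Kelly: the fraction f that maximizes expected log growth E[log(1 + f * r)]
grid = np.linspace(0.0, 5.0, 101)
growth = np.array([np.mean(np.log1p(f * r)) for f in grid])
f_grid = grid[np.argmax(growth)]

# Small-return approximation: f* is roughly mean / variance
f_approx = r.mean() / r.var()

print(f"Grid-search Kelly fraction:  {f_grid:.2f}")
print(f"Mean/variance approximation: {f_approx:.2f}")
print(f"Half-Kelly, capped at 1.0:   {min(0.5 * f_grid, 1.0):.2f}")
```

Full Kelly is sensitive to estimation error in the mean, which is why fractional Kelly (such as the half-Kelly used in the next cell) and a leverage cap are common safeguards.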

In [None]:
# Kelly-weighted strategy: size positions by Kelly fraction
# Cap at 1.0 (no leverage) and use half-Kelly for safety
kelly_weights = np.clip(kelly_estimates * 0.5, 0, 1)  # Half-Kelly, long only, max 100%

# Only trade when VaR is acceptable
var_filter = var_estimates < 2.0
kelly_weights_filtered = np.where(var_filter, kelly_weights, 0)

strat_kelly = kelly_weights_filtered * (y_test - cost)
sharpe_kelly = sharpe(strat_kelly)
ci_kelly = bootstrap_ci(strat_kelly)

print("KELLY-WEIGHTED STRATEGY")
print("=" * 60)
print("Half-Kelly, VaR<2% filter, long-only, max 100% position")
active = kelly_weights_filtered > 0
print(f"\nTrades (non-zero weight): {np.sum(active)}")
if np.any(active):  # avoid a NaN mean when no position is taken
    print(f"Mean position size: {np.mean(kelly_weights_filtered[active]):.1%}")
print(f"PnL: {np.sum(strat_kelly):+.1f}%")
print(f"Sharpe: {sharpe_kelly:.2f} [{ci_kelly[0]:.2f}, {ci_kelly[1]:.2f}]")

## Summary

| Pipeline | Output | Decision Rule | Risk Metrics |
|----------|--------|---------------|-------------|
| Point | E[return] | Trade if E[X] > 0 | None |
| Distribution | (μ, δ, α, β) | Trade if E[X] > 0 AND VaR < 2% | VaR, CVaR, Kelly, P(loss) |

**Key insight**: Predicting a distribution instead of a point gives you:

1. **VaR** - Loss threshold exceeded only with probability alpha, e.g. 5% (`tpdf.var`)
2. **CVaR** - Expected loss given that you are in the tail beyond VaR (`tpdf.cvar`)
3. **Kelly** - Optimal position sizing based on edge and variance (`tpdf.kelly_fraction`)
4. **Probabilities** - P(return > x) and P(return < y) for any threshold (`tpdf.prob_greater_than`, `tpdf.prob_less_than`)

This enables risk-aware trading that point predictions cannot achieve.

## Systematic Validation: Barrier Probability Experiments

The trading example above is one use case. To validate `temporalpdf` more broadly, we ran a systematic comparison across **12 configurations** (3 horizons × 2 barriers × 2 feature sets):

- **Horizons**: 10, 20, 30 days
- **Barriers**: 3%, 5% (the cumulative-return level that must be reached within the horizon)
- **Feature sets**: 8 features (basic), 32 features (extended with tail/autocorrelation features)

**Task**: Predict P(max cumulative return over horizon ≥ barrier)

| Pipeline | Method |
|----------|--------|
| P1 | XGBoost Classifier → P(hit) directly |
| P2 | XGBoost → Distribution Parameters → Monte Carlo → P(hit) |

**Evaluation**: Brier Score (lower = better probabilistic accuracy)
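
The P2 path and the evaluation metric can be sketched as follows. This is a minimal illustration under stated assumptions, not the experiment's actual code: daily returns are drawn i.i.d. from a single NIG with made-up parameters, cumulative return is approximated by summing daily percent returns, and `sample_nig`, `barrier_hit_prob`, and `brier_score` are hypothetical helper names.

```python
import numpy as np

def sample_nig(mu, delta, alpha, beta, size, rng):
    """Draw NIG samples via the normal variance-mean mixture representation."""
    gamma = np.sqrt(alpha**2 - beta**2)
    z = rng.wald(mean=delta / gamma, scale=delta**2, size=size)
    return mu + beta * z + np.sqrt(z) * rng.standard_normal(size)

def barrier_hit_prob(params, horizon, barrier, n_paths, rng):
    """Monte Carlo estimate of P(max cumulative return over horizon >= barrier)."""
    daily = sample_nig(*params, size=(n_paths, horizon), rng=rng)
    running_max = np.cumsum(daily, axis=1).max(axis=1)  # best level reached per path
    return np.mean(running_max >= barrier)

def brier_score(p_hit, hit):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    p_hit = np.asarray(p_hit, dtype=float)
    hit = np.asarray(hit, dtype=float)
    return np.mean((p_hit - hit) ** 2)

rng = np.random.default_rng(0)
params = (0.05, 1.0, 1.5, -0.1)  # hypothetical (mu, delta, alpha, beta), percent units

p3 = barrier_hit_prob(params, horizon=20, barrier=3.0, n_paths=20_000, rng=rng)
p5 = barrier_hit_prob(params, horizon=20, barrier=5.0, n_paths=20_000, rng=rng)
print(f"P(hit 3% within 20 days) = {p3:.3f}")
print(f"P(hit 5% within 20 days) = {p5:.3f}")

# An uninformative forecast of 0.5 scores 0.25 against any 0/1 outcome
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
```

A Brier score of 0 corresponds to perfect probabilistic forecasts, while a constant forecast of 0.5 scores 0.25, which gives a simple reference point when comparing P1 and P2.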