# Time Series Fundamentals for sklearn Practitioners

## The Missing Bridge: Why Everything You Know About ML Validation is Wrong for Time Series

---

## 🎯 Who This Notebook is For

You should read this notebook if:
- ✅ You're comfortable with sklearn's `KFold`, `train_test_split`, and metrics like MAE/RMSE
- ✅ You've built classification or regression models on tabular data
- ❌ You've never worked with time series (or tried and got weird results)
- ❌ You don't understand why "shuffling" is suddenly dangerous

**Time required:** ~30 minutes

**After this notebook, you will understand:**
1. WHY time series violates standard ML assumptions
2. What autocorrelation means and how to read ACF plots
3. The three types of data leakage to watch for in time series
4. How to set up your first correct temporal validation

---

## Section 1: What You Know vs What Changes

Let's start with a clear comparison. Everything in the left column is what you learned for standard ML. The right column shows what's different for time series—and **why**.

| Standard ML (Classification/Regression) | Time Series | Why It's Different |
|----------------------------------------|-------------|-------------------|
| **Data points are independent** | Data points are **dependent** | Today's stock price ≈ yesterday's |
| Each row = different entity (Customer A, B, C) | Each row = **same entity** at different times | All rows are "Customer A" over time |
| `shuffle=True` protects against ordering bias | `shuffle=True` **creates fake results** | Shuffling lets the model train on the future to predict the past |
| KFold randomly splits data | KFold is **dangerous** | Random split = "time travel" |
| MAE tells you model quality | MAE can be **meaningless** | Trivial baselines may achieve tiny MAE |
| 80% test accuracy = good model | Need to compare to **naive baseline** | Persistence baseline sets the floor |
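
The last two rows of the table can be made concrete with a small sketch. It assumes a synthetic, highly persistent series (the `make_sticky_series` helper and its parameters are illustrative, not part of this notebook's later cells) and compares a persistence forecast with a forecast that always predicts the training mean.

```python
import numpy as np

def make_sticky_series(n=1000, phi=0.98, seed=0):
    """Hypothetical helper: AR(1) series y[t] = phi * y[t-1] + noise."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + rng.normal()
    return y

y = make_sticky_series()
split = 800
train, test = y[:split], y[split:]

# Persistence: predict y[t] with y[t-1], the last value known at forecast time
persistence_preds = y[split - 1:-1]
persistence_mae = np.mean(np.abs(test - persistence_preds))

# Mean forecast: always predict the training-set average
mean_mae = np.mean(np.abs(test - train.mean()))

print(f"Persistence MAE:   {persistence_mae:.3f}")
print(f"Mean-forecast MAE: {mean_mae:.3f}")
```

A model whose MAE merely matches the persistence figure has learned nothing beyond "repeat the last value", so a small MAE on its own is not evidence of skill.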

### The Core Problem: i.i.d. Assumption

sklearn's default validation tools (`KFold`, `train_test_split`) assume your data is **i.i.d.** (independent and identically distributed):
- **Independent**: Knowing row 1 tells you nothing about row 2
- **Identically distributed**: All rows come from the same distribution

Time series **violates independence**:
- Knowing today's temperature tells you a LOT about tomorrow's temperature
- Knowing this week's stock price tells you a LOT about next week's
- This correlation is called **autocorrelation**
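
A quick way to see this dependence, and what shuffling does to it, is to compare the lag-1 correlation of a series before and after permuting its rows. The sketch below assumes a simulated persistent series; the `lag1_corr` helper is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated persistent series: each value is 0.9 * previous value + noise
n = 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.9 * y[t - 1] + rng.normal()

def lag1_corr(x):
    """Correlation between consecutive elements x[t-1] and x[t]."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

ordered = lag1_corr(y)
shuffled = lag1_corr(rng.permutation(y))

print(f"Lag-1 correlation, original order: {ordered:.2f}")
print(f"Lag-1 correlation, shuffled rows:  {shuffled:.2f}")
```

The dependence between time-adjacent observations is still in the data after shuffling; it is just no longer visible in row order, which is why tooling built for i.i.d. data cannot see the problem.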

In [None]:
# Setup: imports and configuration
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

# Set random seed for reproducibility
np.random.seed(42)
plt.style.use('seaborn-v0_8-whitegrid')

print("Setup complete!")

---

## Section 2: Understanding Autocorrelation (The Key Concept)

**Autocorrelation** measures how similar a time series is to itself at different time lags.

- **ACF(1)** = correlation between y[t] and y[t-1] (consecutive observations)
- **ACF(2)** = correlation between y[t] and y[t-2] (observations 2 steps apart)
- And so on...
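
Formally, assuming a stationary series with mean $\mu$, the autocorrelation at lag $k$ is

$$
\mathrm{ACF}(k) = \frac{\mathrm{Cov}(y_t,\, y_{t-k})}{\mathrm{Var}(y_t)} = \frac{\mathbb{E}\left[(y_t - \mu)(y_{t-k} - \mu)\right]}{\mathbb{E}\left[(y_t - \mu)^2\right]}
$$

so $\mathrm{ACF}(0) = 1$ and every value lies in $[-1, 1]$. For the AR(1) process $y_t = \phi\, y_{t-1} + \varepsilon_t$ with $|\phi| < 1$, which the code cells in this section simulate, this works out to $\mathrm{ACF}(k) = \phi^k$: the closer $\phi$ is to 1, the more slowly the ACF decays.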

### Intuitive Interpretation

| ACF(1) Value | What It Means | Example |
|--------------|---------------|----------|
| **0.99** | Tomorrow ≈ today (very sticky) | Interest rates, unemployment |
| **0.90** | Strong persistence | Stock volatility, temperature |
| **0.50** | Moderate persistence | Some economic indicators |
| **0.00** | No linear autocorrelation (what i.i.d. data looks like) | White noise, shuffled data |
| **-0.50** | Negative autocorrelation | Alternating patterns |
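
To see why the "sticky" rows of this table make naive forecasts hard to beat, consider a stationary AR(1) process $y_t = \phi\, y_{t-1} + \varepsilon_t$ with noise variance $\sigma^2$ (an assumption made for this worked example; real series are rarely exactly AR(1)). Its variance is $\gamma_0 = \sigma^2 / (1 - \phi^2)$, and the one-step mean squared errors of the persistence forecast $\hat{y}_t = y_{t-1}$ and of the best possible forecast $\hat{y}_t = \phi\, y_{t-1}$ are

$$
\begin{aligned}
\mathrm{MSE}_{\text{persistence}} &= \mathbb{E}\left[(y_t - y_{t-1})^2\right] = (1-\phi)^2 \gamma_0 + \sigma^2 = 2\gamma_0 (1 - \phi), \\
\mathrm{MSE}_{\text{optimal}} &= \mathbb{E}\left[\varepsilon_t^2\right] = \sigma^2 = \gamma_0 (1 - \phi^2), \\
\frac{\mathrm{MSE}_{\text{optimal}}}{\mathrm{MSE}_{\text{persistence}}} &= \frac{1 + \phi}{2}.
\end{aligned}
$$

With $\phi = 0.5$ a perfect model cuts the squared error by 25% relative to persistence; with $\phi = 0.99$ the best achievable reduction is only 0.5%.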

### Why This Matters for Validation

When ACF is high:
1. Nearby observations are nearly identical
2. Shuffling spreads these "twins" across train/test
3. The model sees answers (future data) disguised as training data
4. Validation metrics look amazing... but they're fake
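
The "twins" argument in the list above can be checked directly. This sketch counts, for each shuffled fold, how many test points have an immediate time neighbor (t-1 or t+1) in the training set. The series length and fold settings are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold

n = 500  # number of time-ordered observations (illustrative)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fractions = []
for train_idx, test_idx in kfold.split(np.arange(n)):
    in_train = np.zeros(n, dtype=bool)
    in_train[train_idx] = True

    # A test point at time t has a "twin" if t-1 or t+1 is in the training set
    prev_in_train = in_train[np.clip(test_idx - 1, 0, n - 1)] & (test_idx > 0)
    next_in_train = in_train[np.clip(test_idx + 1, 0, n - 1)] & (test_idx < n - 1)
    fractions.append(np.mean(prev_in_train | next_in_train))

print(f"Test points with an adjacent training neighbor: {np.mean(fractions):.0%}")
```

With five shuffled folds, each neighbor of a test point has roughly a four-in-five chance of landing in training, so nearly every test point has a near-duplicate the model has already seen.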

In [None]:
def generate_ar1(n=500, phi=0.9, sigma=1.0, seed=42):
    """
    Generate AR(1) process: y[t] = phi * y[t-1] + noise
    
    The parameter 'phi' controls autocorrelation:
    - phi = 0.0: white noise (no autocorrelation)
    - phi = 0.5: moderate autocorrelation
    - phi = 0.9: high autocorrelation (sticky)
    - phi = 0.99: very high (like interest rates)
    
    ACF(1) ≈ phi for large samples.
    """
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    # Start from stationary distribution
    y[0] = rng.normal(0, sigma / np.sqrt(1 - phi**2)) if abs(phi) < 1 else 0.0
    for t in range(1, n):
        y[t] = phi * y[t-1] + sigma * rng.normal()
    return y

def compute_acf(series, max_lag=20):
    """Compute autocorrelation function (ACF) for given lags."""
    n = len(series)
    mean = np.mean(series)
    var = np.var(series)
    acf = []
    for lag in range(max_lag + 1):
        if lag == 0:
            acf.append(1.0)
        else:
            cov = np.mean((series[lag:] - mean) * (series[:-lag] - mean))
            acf.append(cov / var)
    return np.array(acf)

# Generate series with different persistence levels
series_low = generate_ar1(n=500, phi=0.3, seed=42)
series_medium = generate_ar1(n=500, phi=0.7, seed=42)
series_high = generate_ar1(n=500, phi=0.95, seed=42)

print("Generated three series with different autocorrelation levels:")
print(f"  Low persistence (φ=0.3):    ACF(1) = {np.corrcoef(series_low[1:], series_low[:-1])[0,1]:.3f}")
print(f"  Medium persistence (φ=0.7): ACF(1) = {np.corrcoef(series_medium[1:], series_medium[:-1])[0,1]:.3f}")
print(f"  High persistence (φ=0.95):  ACF(1) = {np.corrcoef(series_high[1:], series_high[:-1])[0,1]:.3f}")

In [None]:
# Visualize: What autocorrelation LOOKS like
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

series_list = [series_low, series_medium, series_high]
phi_list = [0.3, 0.7, 0.95]
titles = ['Low Persistence (φ=0.3)', 'Medium Persistence (φ=0.7)', 'High Persistence (φ=0.95)']

# Top row: Time series plots
for i, (series, title) in enumerate(zip(series_list, titles)):
    ax = axes[0, i]
    ax.plot(series[:100], 'steelblue', linewidth=1.5)
    ax.set_title(title, fontsize=12, fontweight='bold')
    ax.set_xlabel('Time')
    ax.set_ylabel('Value')
    if i == 0:
        ax.annotate('Rapid changes\n(naive forecast is weak)', xy=(50, 0), fontsize=9,
                   bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.7))
    elif i == 2:
        ax.annotate('Slow drift\n(hard to beat naive)', xy=(50, 0), fontsize=9,
                   bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.7))

# Bottom row: ACF plots
for i, (series, phi) in enumerate(zip(series_list, phi_list)):
    ax = axes[1, i]
    acf = compute_acf(series, max_lag=15)
    lags = np.arange(len(acf))
    
    # Bar plot for ACF
    colors = ['steelblue' if a > 0 else 'salmon' for a in acf]
    ax.bar(lags, acf, color=colors, edgecolor='black', linewidth=0.5)
    ax.axhline(y=0, color='black', linewidth=0.5)
    
    # Significance bounds (approximate 95% CI)
    sig_bound = 1.96 / np.sqrt(len(series))
    ax.axhline(y=sig_bound, color='red', linestyle='--', alpha=0.7)
    ax.axhline(y=-sig_bound, color='red', linestyle='--', alpha=0.7)
    
    ax.set_xlabel('Lag')
    ax.set_ylabel('ACF')
    ax.set_title(f'Autocorrelation Function (φ={phi})', fontsize=11)
    ax.set_ylim(-0.3, 1.1)

plt.tight_layout()
plt.suptitle('Understanding Autocorrelation: How "Sticky" is Your Data?', 
             fontsize=14, fontweight='bold', y=1.02)
plt.show()

print("\n★ Key Insight: Higher persistence = slower ACF decay = harder to beat naive predictions")
print("  - φ=0.3: ACF drops quickly → models CAN add value")
print("  - φ=0.95: ACF stays high → naive forecast is almost unbeatable")

In [None]:
# Demonstrate: Scatter plot shows WHY high ACF is a problem
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for i, (series, phi, title) in enumerate(zip(series_list, phi_list, titles)):
    ax = axes[i]
    ax.scatter(series[:-1], series[1:], alpha=0.5, s=20)
    
    # Add least-squares fit line; its slope approximates ACF(1)
    z = np.polyfit(series[:-1], series[1:], 1)
    p = np.poly1d(z)
    x_line = np.linspace(series.min(), series.max(), 100)
    ax.plot(x_line, p(x_line), 'r-', linewidth=2, label=f'fit slope = {z[0]:.2f}')
    
    ax.set_xlabel('y[t]', fontsize=11)
    ax.set_ylabel('y[t+1]', fontsize=11)
    ax.set_title(f'{title}\nACF(1) = {np.corrcoef(series[1:], series[:-1])[0,1]:.3f}', fontsize=11)
    ax.legend(loc='upper left')

plt.tight_layout()
plt.suptitle('Consecutive Values: y[t] vs y[t+1]', fontsize=14, fontweight='bold', y=1.02)
plt.show()

print("\n★ Key Insight: With high ACF, y[t+1] ≈ y[t]")
print("  This means 'predict yesterday' is a very strong baseline!")

---

## Section 3: Why Random Shuffling Creates Fake Results

Now we'll see the **exact mechanism** by which shuffling breaks validation.

### The Setup
- We have 500 time points in a series
- We want to predict y[t] from lag features: y[t-1], y[t-2], etc.
- The series has high autocorrelation (φ=0.9)

### What KFold Does (Simulation)

In [None]:
# Create high-persistence series
series = generate_ar1(n=500, phi=0.9, sigma=1.0, seed=42)

# Create lag features
def create_lag_features(series, n_lags=3):
    """Create lag features: y[t-1], y[t-2], ..., y[t-n_lags]"""
    n = len(series)
    X = np.column_stack([
        np.concatenate([[np.nan]*lag, series[:-lag]]) 
        for lag in range(1, n_lags + 1)
    ])
    valid = ~np.isnan(X).any(axis=1)
    return X[valid], series[valid], np.where(valid)[0]

X, y, valid_indices = create_lag_features(series, n_lags=3)
print(f"Features shape: {X.shape}")
print(f"Feature columns: y[t-1], y[t-2], y[t-3]")
print(f"Target: y[t]")

In [None]:
# THE PROBLEM: Visualize what KFold does
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fig, ax = plt.subplots(figsize=(14, 6))

# Plot the series
ax.plot(valid_indices, y, 'gray', linewidth=1, alpha=0.5, label='Full series')

# Show one fold's train/test split
train_idx, test_idx = list(kfold.split(X))[0]

# Highlight where test points came from in time
test_times = valid_indices[test_idx]
train_times = valid_indices[train_idx]

ax.scatter(test_times, y[test_idx], c='red', s=30, alpha=0.8, label='Test set (Fold 1)', zorder=5)

# Show a few "problematic" test points and their neighbors
# Find test points that have very close training neighbors
problems = []
for ti, t_time in enumerate(test_times[:10]):
    neighbors = train_times[(train_times >= t_time - 5) & (train_times <= t_time + 5)]
    if len(neighbors) > 0:
        problems.append((t_time, ti))

# Highlight problems
for t_time, ti in problems[:5]:
    ax.annotate('', xy=(t_time, y[test_idx[ti]]), 
               xytext=(t_time, y[test_idx[ti]] + 1.5),
               arrowprops=dict(arrowstyle='->', color='red', lw=2))

ax.set_xlabel('Time Index', fontsize=12)
ax.set_ylabel('Value', fontsize=12)
ax.set_title('KFold SCATTERS Test Points Throughout Time\n(Red dots are spread across all time periods)', 
            fontsize=14, fontweight='bold', color='red')
ax.legend(loc='upper right')

# Add annotation
ax.annotate('A test point at time t usually has\ntraining points at t-1 and t+1.\n\nModel sees "answers" in training.',
           xy=(250, y.min()), fontsize=11, color='red', fontweight='bold',
           bbox=dict(boxstyle='round', facecolor='lightyellow', edgecolor='red', alpha=0.9))

plt.tight_layout()
plt.show()

print("\n★ The Problem: Test point at t=100 has target y[100].")
print("  With shuffled KFold, rows t=99 and t=101 are probably in the TRAINING set,")
print("  and row t=101 even carries y[100] as its lag-1 feature.")
print("  Since y[101] ≈ y[100] (high ACF), the model has effectively seen the answer.")

In [None]:
# Quantify the problem: compare MAE under each validation scheme
from sklearn.model_selection import cross_val_score

model = Ridge(alpha=1.0)

# WRONG: KFold with shuffle
kfold_shuffled = KFold(n_splits=5, shuffle=True, random_state=42)
kfold_scores = -cross_val_score(model, X, y, cv=kfold_shuffled, scoring='neg_mean_absolute_error')

# Also WRONG: KFold without shuffle (still mixes time)
kfold_no_shuffle = KFold(n_splits=5, shuffle=False)
kfold_ns_scores = -cross_val_score(model, X, y, cv=kfold_no_shuffle, scoring='neg_mean_absolute_error')

# CORRECT: Forward-only split (simple version)
# Train on first 80%, test on last 20%
train_size = int(0.8 * len(y))
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

model.fit(X_train, y_train)
preds = model.predict(X_test)
forward_mae = mean_absolute_error(y_test, preds)

print("THE SHOCKING COMPARISON")
print("=" * 55)
print(f"KFold (shuffled) MAE:     {np.mean(kfold_scores):.4f}  <- Looks great!")
print(f"KFold (no shuffle) MAE:   {np.mean(kfold_ns_scores):.4f}  <- Still wrong!")
print(f"Forward split MAE:        {forward_mae:.4f}  <- REALITY")
print()
gap_pct = (forward_mae - np.mean(kfold_scores)) / forward_mae * 100
print(f"Forward-split MAE exceeds shuffled KFold MAE by {gap_pct:.1f}% (positive = KFold was too optimistic)")
print()
print("Both KFold variants are wrong because most folds train on data")
print("that comes AFTER their test set!")

### Why Even `shuffle=False` is Wrong

You might think: "I'll just set `shuffle=False`!"

**This is still wrong.** Here's why:

```
KFold(n_splits=5, shuffle=False) creates:

Fold 1: Train on [100-500], Test on [0-100]   <- Model trained on FUTURE!
Fold 2: Train on [0-100, 200-500], Test on [100-200]
Fold 3: Train on [0-200, 300-500], Test on [200-300]
...
```

In Fold 1, you're training on observations 100-500 to predict observations 0-100. **The model is trained on the future to predict the past!**
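
The fold layout above can be verified with a few lines. This sketch assumes 500 time-ordered rows and flags every fold whose training set contains rows that come after its test block.

```python
import numpy as np
from sklearn.model_selection import KFold

n = 500  # illustrative number of time-ordered rows
kfold = KFold(n_splits=5, shuffle=False)

trains_on_future = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(np.arange(n)), start=1):
    # Leakage occurs if any training row comes after the end of the test block
    leaks = bool(train_idx.max() > test_idx.max())
    trains_on_future.append(leaks)
    print(f"Fold {fold}: test rows [{test_idx.min()}, {test_idx.max()}], "
          f"trains on later data: {leaks}")
```

Only the last fold respects time order, and that fold is the single split a plain forward holdout would have given you anyway.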

---

## Section 4: The Three Types of Time Series Leakage [T1]

Now that you understand WHY shuffling is dangerous, let's categorize the three distinct ways leakage occurs in time series:

### Type 1: Data Snooping (Train/Test Contamination)
- **What it is**: Future observations in training set
- **Caused by**: KFold, random train_test_split
- **Detection**: `gate_signal_verification()` from temporalcv

### Type 2: Lookahead Bias (Gap Violations)
- **What it is**: Training targets reach into the test period: an h-step-ahead target built at time t depends on values up to t+h
- **Caused by**: Multi-step forecasts without proper gaps
- **Example**: Predicting 12-week ahead returns without 12-week gap
- **Detection**: `gate_temporal_boundary()` from temporalcv
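
A minimal sketch of why the gap matters, assuming a 12-step-ahead target and a single train/test boundary (the sizes and the `count_overlap` helper are illustrative, not part of temporalcv). It counts training rows whose target is only observed once the test period has begun.

```python
import numpy as np

n, horizon = 200, 12           # illustrative: 200 weeks, 12-week-ahead target
t = np.arange(n - horizon)     # rows for which a target exists
target_time = t + horizon      # the target of row t is only observed at t + horizon

split = 150                    # first test row

def count_overlap(gap):
    """Count training rows whose target is observed at or after the first test row."""
    train_rows = t[t < split - gap]
    return int(np.sum(target_time[train_rows] >= split))

print("Overlapping training targets, no gap:       ", count_overlap(gap=0))
print("Overlapping training targets, gap = horizon:", count_overlap(gap=horizon))
```

Without a gap, the last 12 training rows have targets that fall inside the test period; dropping `horizon` rows from the end of the training window removes the overlap.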

### Type 3: Feature Leakage (Engineering Errors)
- **What it is**: Features computed using future information
- **Caused by**: Centered rolling windows, full-series normalization, target encoding
- **Example**: `series.rolling(window=5, center=True).mean()` (pandas) uses t+1, t+2
- **Detection**: Careful code review + `gate_signal_verification()`
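
One way to support the code review is a perturbation check: change only the values after time t and see whether the feature at time t moves. The sketch below uses pandas; the `uses_future` helper and the three feature functions are illustrative and not part of temporalcv.

```python
import numpy as np
import pandas as pd

def uses_future(feature_fn, n=100, t=50, seed=0):
    """Return True if the feature at time t changes when only data AFTER t changes."""
    rng = np.random.default_rng(seed)
    original = pd.Series(rng.normal(size=n))
    perturbed = original.copy()
    perturbed.iloc[t + 1:] += 10.0  # alter the future only

    before = feature_fn(original).iloc[t]
    after = feature_fn(perturbed).iloc[t]
    return not np.isclose(before, after)

def backward_mean(s):
    return s.shift(1).rolling(window=4).mean()      # uses t-4 .. t-1

def centered_mean(s):
    return s.rolling(window=5, center=True).mean()  # uses t-2 .. t+2

def zscore_full(s):
    return (s - s.mean()) / s.std()                 # uses the whole series

for fn in (backward_mean, centered_mean, zscore_full):
    print(f"{fn.__name__:15s} uses future data: {uses_future(fn)}")
```

A feature that passes this check at one time index is not proven safe everywhere, so it complements rather than replaces reading the feature code.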

In [None]:
# Demonstrate Type 3: Feature Leakage

def create_safe_features(series):
    """Features that only use PAST information."""
    n = len(series)
    features = {}
    
    # Safe: Lag features
    features['lag_1'] = np.concatenate([[np.nan], series[:-1]])
    features['lag_2'] = np.concatenate([[np.nan, np.nan], series[:-2]])
    
    # Safe: Backward-only rolling mean (uses t-4 to t-1)
    rolling = np.full(n, np.nan)
    for t in range(4, n):
        rolling[t] = np.mean(series[t-4:t])  # Only past!
    features['rolling_mean_4'] = rolling
    
    return features

def create_leaky_features(series):
    """Features that LEAK future information!"""
    n = len(series)
    features = {}
    
    # Leaky: Centered rolling mean (uses t-2 to t+2)
    rolling = np.full(n, np.nan)
    for t in range(2, n - 2):
        rolling[t] = np.mean(series[t-2:t+3])  # Includes t+1, t+2!
    features['centered_rolling_5'] = rolling
    
    # Leaky: Z-score using full series mean/std
    full_mean = np.mean(series)  # Uses ALL data including future!
    full_std = np.std(series)
    features['zscore_full'] = (series - full_mean) / full_std
    
    # Leaky: Percentile rank using full series
    from scipy.stats import rankdata
    features['pct_rank_full'] = rankdata(series) / len(series)
    
    return features

print("SAFE FEATURES (backward-looking only):")
print("  - lag_1: y[t-1]")
print("  - lag_2: y[t-2]")
print("  - rolling_mean_4: mean of y[t-4:t] (excludes current!)")
print()
print("LEAKY FEATURES (use future information):")
print("  - centered_rolling_5: mean of y[t-2:t+3] (includes t+1, t+2!)")
print("  - zscore_full: (y - mean(ALL)) / std(ALL)")
print("  - pct_rank_full: rank(y) / n using full series")

In [None]:
# Visual: Show the difference
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Generate series
series = generate_ar1(n=200, phi=0.9, seed=42)
t_focus = 100  # Focus point

# Top left: Safe backward rolling
ax = axes[0, 0]
ax.plot(series, 'gray', alpha=0.5, linewidth=1)
ax.axvline(x=t_focus, color='black', linestyle='--', label='Current time t')
ax.axvspan(t_focus-4, t_focus-1, alpha=0.3, color='green', label='Used for feature')
ax.scatter([t_focus], [series[t_focus]], c='blue', s=100, zorder=5)
ax.set_title('✓ SAFE: Backward Rolling Mean\nUses only t-4 to t-1', fontsize=11, color='green', fontweight='bold')
ax.set_xlabel('Time')
ax.legend(loc='upper right')

# Top right: Leaky centered rolling
ax = axes[0, 1]
ax.plot(series, 'gray', alpha=0.5, linewidth=1)
ax.axvline(x=t_focus, color='black', linestyle='--', label='Current time t')
ax.axvspan(t_focus-2, t_focus+2, alpha=0.3, color='red', label='Used for feature')
ax.scatter([t_focus], [series[t_focus]], c='blue', s=100, zorder=5)
ax.set_title('✗ LEAKY: Centered Rolling Mean\nIncludes t+1 and t+2!', fontsize=11, color='red', fontweight='bold')
ax.set_xlabel('Time')
ax.legend(loc='upper right')

# Bottom left: Safe percentile (training only)
ax = axes[1, 0]
train_end = 150
ax.plot(series, 'gray', alpha=0.5, linewidth=1)
ax.axvline(x=train_end, color='black', linestyle='--', label='Train/test boundary')
ax.axvspan(0, train_end, alpha=0.2, color='green', label='Percentiles from training')
ax.set_title('✓ SAFE: Percentile from Training Only', fontsize=11, color='green', fontweight='bold')
ax.set_xlabel('Time')
ax.legend(loc='upper right')

# Bottom right: Leaky percentile (full series)
ax = axes[1, 1]
ax.plot(series, 'gray', alpha=0.5, linewidth=1)
ax.axvline(x=train_end, color='black', linestyle='--', label='Train/test boundary')
ax.axvspan(0, len(series), alpha=0.2, color='red', label='Percentiles from ALL data')
ax.set_title('✗ LEAKY: Percentile from Full Series\n(includes test data!)', fontsize=11, color='red', fontweight='bold')
ax.set_xlabel('Time')
ax.legend(loc='upper right')

plt.tight_layout()
plt.show()

print("\n★ Rule: Features must only use information available at prediction time.")
print("  At time t, you can only use data from times 0, 1, 2, ..., t-1.")

---

## Section 5: Your First Correct Temporal Validation

Now let's set up validation the RIGHT way using **Walk-Forward Cross-Validation**.

### The Key Principles

1. **Training data must PRECEDE test data** (no exceptions!)
2. **Gap between train and test** equals your forecast horizon
3. **Use expanding or sliding windows** to simulate production

### Walk-Forward Visualization

```
Fold 1: [=====TRAIN=====][GAP][TEST]
Fold 2: [=======TRAIN=======][GAP][TEST]
Fold 3: [=========TRAIN=========][GAP][TEST]
Fold 4: [===========TRAIN===========][GAP][TEST]
```

The training window expands (or slides) forward in time. Test always comes AFTER train.
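
If temporalcv is not installed, the same expanding-window layout can be sketched with scikit-learn's own `TimeSeriesSplit` (a substitute for illustration, not the `WalkForwardCV` class used in the rest of this section). The `test_size` and `gap` arguments assume scikit-learn 0.24 or newer; the data here is a placeholder.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n = 500                          # illustrative number of time-ordered rows
X = np.arange(n).reshape(-1, 1)  # placeholder features

# gap=0 suits a 1-step forecast; for an h-step forecast, widen the gap so that
# training targets cannot overlap the test period (principle 2 above)
tscv = TimeSeriesSplit(n_splits=4, test_size=50, gap=0)
splits = list(tscv.split(X))

for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    assert train_idx.max() < test_idx.min()  # training always precedes test
    print(f"Fold {fold}: train [0, {train_idx.max()}] -> "
          f"test [{test_idx.min()}, {test_idx.max()}]")
```

Each fold trains on everything before its test block, so the layout matches the diagram: the training window grows while the test block moves forward.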

In [None]:
# Import temporalcv for proper validation
from temporalcv.cv import WalkForwardCV

# Generate our data
series = generate_ar1(n=500, phi=0.9, sigma=1.0, seed=42)
X, y, valid_indices = create_lag_features(series, n_lags=3)

# Set up proper walk-forward CV
cv = WalkForwardCV(
    n_splits=5,
    window_type='expanding',  # Training window grows over time
    horizon=1,                # 1-step-ahead forecasting
    extra_gap=0,              # 0 for 1-step forecasting (use extra_gap=h for h-step)
    test_size=50,             # Test on 50 observations per fold
)

# Visualize the splits
print("Walk-Forward CV Splits:")
print("=" * 60)
for info in cv.get_split_info(X):
    train_bar = '█' * (info.train_size // 10)
    gap_bar = '░' * (info.gap)
    test_bar = '▓' * (info.test_size // 10)
    print(f"Fold {info.split_idx+1}: Train[{info.train_start}:{info.train_end}] → "
          f"Test[{info.test_start}:{info.test_end}]  "
          f"(train={info.train_size}, test={info.test_size})")

In [None]:
# Visual comparison: KFold vs WalkForward
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Top: KFold (WRONG)
ax = axes[0]
ax.set_title('❌ KFold: Test Points Scattered in Time (WRONG)', fontsize=12, fontweight='bold', color='red')

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kfold.split(X)):
    ax.scatter(valid_indices[train_idx], [fold]*len(train_idx), c='steelblue', s=5, alpha=0.5, marker='s')
    ax.scatter(valid_indices[test_idx], [fold]*len(test_idx), c='salmon', s=10, alpha=0.8, marker='o')

ax.set_xlabel('Time Index')
ax.set_ylabel('Fold')
ax.set_yticks(range(5))
ax.set_yticklabels([f'Fold {i+1}' for i in range(5)])

# Bottom: WalkForward (CORRECT)
ax = axes[1]
ax.set_title('✓ WalkForward: Test Always After Train (CORRECT)', fontsize=12, fontweight='bold', color='green')

for fold, (train_idx, test_idx) in enumerate(cv.split(X)):
    train_times = valid_indices[train_idx]
    test_times = valid_indices[test_idx]
    
    # Draw as bars for clarity
    ax.barh(fold, train_times.max() - train_times.min(), left=train_times.min(), 
           height=0.6, color='steelblue', alpha=0.7, label='Train' if fold==0 else '')
    ax.barh(fold, test_times.max() - test_times.min(), left=test_times.min(), 
           height=0.6, color='salmon', alpha=0.9, label='Test' if fold==0 else '')
    
    # Draw boundary (data coordinates, spanning the height of this fold's bar)
    ax.vlines(train_times.max(), fold - 0.3, fold + 0.3,
              colors='black', linestyles='--', linewidth=1.5)

ax.set_xlabel('Time Index')
ax.set_ylabel('Fold')
ax.set_yticks(range(5))
ax.set_yticklabels([f'Fold {i+1}' for i in range(5)])
ax.legend(loc='upper left')

plt.tight_layout()
plt.show()

In [None]:
# Run proper validation
model = Ridge(alpha=1.0)

# Collect scores from walk-forward CV
wf_scores = []
for train_idx, test_idx in cv.split(X):
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[test_idx])
    mae = mean_absolute_error(y[test_idx], preds)
    wf_scores.append(mae)

# Compare to KFold
kfold_scores = -cross_val_score(model, X, y, cv=kfold, scoring='neg_mean_absolute_error')

# Compare to persistence baseline
persistence_errors = []
for train_idx, test_idx in cv.split(X):
    persistence_preds = X[test_idx, 0]  # y[t-1]
    mae = mean_absolute_error(y[test_idx], persistence_preds)
    persistence_errors.append(mae)

print("VALIDATION RESULTS")
print("=" * 55)
print(f"KFold MAE:           {np.mean(kfold_scores):.4f} ± {np.std(kfold_scores):.4f}  <- FAKE")
print(f"Walk-Forward MAE:    {np.mean(wf_scores):.4f} ± {np.std(wf_scores):.4f}  <- REAL")
print(f"Persistence MAE:     {np.mean(persistence_errors):.4f} ± {np.std(persistence_errors):.4f}  <- BASELINE")
print()
improvement = (np.mean(persistence_errors) - np.mean(wf_scores)) / np.mean(persistence_errors) * 100
print(f"Model improvement over persistence: {improvement:.1f}%")
print()
if improvement > 5:
    print("✓ Model adds value beyond naive baseline!")
else:
    print("⚠ Model barely beats persistence. Consider simpler approach.")

---

## Summary: What You've Learned

### 1. Time Series Violates i.i.d. [T1]
- Observations are NOT independent
- Autocorrelation (ACF) measures this dependence
- High ACF = sticky data = harder to beat naive

### 2. Shuffling Creates Fake Results [T1]
- KFold with `shuffle=True` causes temporal leakage
- Even `shuffle=False` is wrong (future trains past)
- Walk-Forward CV is the correct approach

### 3. Three Types of Leakage [T1]
- **Data snooping**: Future in training set
- **Lookahead bias**: Gap < forecast horizon
- **Feature leakage**: Features use future information

### 4. How to Read ACF Plots
- ACF(1) > 0.9: High persistence, naive is strong
- ACF decays slowly: Hard to beat persistence
- ACF decays fast: Models can add value

### 5. The Correct Validation Setup
```python
from temporalcv.cv import WalkForwardCV

cv = WalkForwardCV(
    n_splits=5,
    window_type='expanding',
    horizon=1,    # forecast horizon (1-step here, as in Section 5)
    extra_gap=0,  # increase for multi-step forecasts so gap >= forecast horizon
    test_size=50,
)
```

---

## Next Steps

Continue your learning path:

1. **00b_real_world_motivation.ipynb**: See realistic synthetic examples (Treasury rates, stock returns)
2. **01_why_temporal_cv.ipynb**: Deep dive into validation gates and detection
3. **Feature Engineering Safety Guide**: Decision tree for safe vs dangerous features

---

*"The most common mistake in time series ML is treating it like regular ML. Once you understand autocorrelation, everything else follows."*