# Validation Demo

This notebook demonstrates the validation modules in `timeseries_toolkit`:

1. **Convergent Cross Mapping (CCM)** - Nonlinear causal inference
2. **Granger Causality** - Linear predictive causality testing
3. **Forensic Diagnostics** - 7-test suite for forecast evaluation

The examples use real market data, except Section 2.3, which uses a synthetic system with a known causal structure.

In [1]:
import sys, os
sys.path.insert(0, os.path.abspath('..'))

import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd

print('Setup complete.')

Setup complete.


## 1. Fetch Data

In [2]:
from timeseries_toolkit.data_sources import CryptoDataLoader, EquityDataLoader, VolatilityDataLoader

crypto = CryptoDataLoader()
btc_df = crypto.get_prices(['BTC-USD'], period='2y')
# Select the first column whose name contains 'Close' or 'close'
btc_close = btc_df[[c for c in btc_df.columns if 'Close' in c or 'close' in c][0]]
btc_close.name = 'BTC-USD'

equities = EquityDataLoader()
spy_df = equities.get_prices(['SPY'], period='2y')
spy_close = spy_df[[c for c in spy_df.columns if 'Close' in c or 'close' in c][0]]
spy_close.name = 'SPY'

vol_loader = VolatilityDataLoader()
vix = vol_loader.get_vix(period='2y')
vix.name = 'VIX'

print(f'BTC-USD: {len(btc_close)} days')
print(f'SPY:     {len(spy_close)} days')
print(f'VIX:     {len(vix)} days')

# Align all series to common dates
common_idx = btc_close.index.intersection(spy_close.index).intersection(vix.index)
btc_aligned = btc_close.loc[common_idx]
spy_aligned = spy_close.loc[common_idx]
vix_aligned = vix.loc[common_idx]
print(f'\nCommon dates: {len(common_idx)}')

BTC-USD: 732 days
SPY:     502 days
VIX:     503 days

Common dates: 502


---
## 2. Convergent Cross Mapping (CCM)

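CCM detects **nonlinear causal coupling** between two time series. Granger causality assumes that the information a cause carries can be separated from the effect and used as an independent predictor. CCM instead targets deterministic dynamical systems in which the variables are coupled through a shared attractor, so that the history of one variable encodes the state of the other.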
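
The idea can be illustrated with a minimal NumPy sketch. It is not the implementation behind `ccm_test`; the helper names `delay_embed`, `cross_map_skill`, and `coupled_logistic` are hypothetical, and the weighting scheme is one common choice. The sketch builds a delay embedding of the target, finds nearest neighbours on it, and checks how well those neighbours reconstruct the source. If the source drives the target, the reconstruction would typically be noticeably better than in the reverse direction.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Rows are delay vectors [x[t], x[t - tau], ..., x[t - (dim - 1) * tau]]."""
    x = np.asarray(x, dtype=float)
    start = (dim - 1) * tau
    return np.column_stack(
        [x[start - i * tau : len(x) - i * tau] for i in range(dim)]
    )

def cross_map_skill(source, target, dim=3, tau=1):
    """Reconstruct `source` from the delay embedding of `target`; return Pearson r."""
    manifold = delay_embed(target, dim, tau)
    src = np.asarray(source, dtype=float)[(dim - 1) * tau :]

    # Pairwise Euclidean distances between delay vectors (self-matches excluded)
    dists = np.linalg.norm(manifold[:, None, :] - manifold[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)

    # dim + 1 nearest neighbours, weighted by distance relative to the closest one
    k = dim + 1
    nn = np.argsort(dists, axis=1)[:, :k]
    nn_d = np.take_along_axis(dists, nn, axis=1)
    weights = np.exp(-nn_d / np.maximum(nn_d[:, :1], 1e-12))
    weights /= weights.sum(axis=1, keepdims=True)

    estimate = (weights * src[nn]).sum(axis=1)
    return np.corrcoef(src, estimate)[0, 1]

def coupled_logistic(n, coupling=0.1):
    """Toy system (an assumption for illustration): x evolves alone and drives y."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = 0.4, 0.2
    for t in range(n - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - coupling * x[t])
    return x, y

x, y = coupled_logistic(400)
skill_x_drives_y = cross_map_skill(source=x, target=y)
skill_y_drives_x = cross_map_skill(source=y, target=x)
print(f"x -> y skill: {skill_x_drives_y:.3f}")
print(f"y -> x skill: {skill_y_drives_x:.3f}")
```

A full CCM test also checks that the skill converges as the library of points grows and compares it against surrogate series, which is presumably what the `n_surrogates` argument used below controls.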

### 2.1 Test BTC -> SPY Causality

In [3]:
from timeseries_toolkit.validation import ccm_test

# Daily percentage changes; dropna() removes the leading NaN from pct_change()
btc_ret = btc_aligned.pct_change().dropna()
spy_ret = spy_aligned.pct_change().dropna()
vix_ret = vix_aligned.pct_change().dropna()

result_btc_spy = ccm_test(
    source=btc_ret.values, target=spy_ret.values,
    embedding_dim=3, tau=1, n_surrogates=30
)

print('CCM Test: BTC-USD -> SPY')
print(f'  CCM score:            {result_btc_spy["ccm_score"]:.4f}')
print(f'  Surrogate threshold:  {result_btc_spy["surrogate_threshold"]:.4f}')
print(f'  p-value:              {result_btc_spy["p_value"]:.4f}')
print(f'  Significant:          {result_btc_spy["is_significant"]}')

CCM Test: BTC-USD -> SPY
  CCM score:            0.1764
  Surrogate threshold:  0.0911
  p-value:              0.0000
  Significant:          True


### 2.2 Test SPY -> BTC and VIX -> BTC

In [4]:
result_spy_btc = ccm_test(
    source=spy_ret.values, target=btc_ret.values,
    embedding_dim=3, tau=1, n_surrogates=30
)
print('CCM Test: SPY -> BTC-USD')
print(f'  CCM score: {result_spy_btc["ccm_score"]:.4f}, Significant: {result_spy_btc["is_significant"]}')

common_ret_idx = btc_ret.index.intersection(vix_ret.index)
result_vix_btc = ccm_test(
    source=vix_ret.loc[common_ret_idx].values,
    target=btc_ret.loc[common_ret_idx].values,
    embedding_dim=3, tau=1, n_surrogates=30
)
print(f'\nCCM Test: VIX -> BTC-USD')
print(f'  CCM score: {result_vix_btc["ccm_score"]:.4f}, Significant: {result_vix_btc["is_significant"]}')

CCM Test: SPY -> BTC-USD
  CCM score: 0.2072, Significant: True



CCM Test: VIX -> BTC-USD
  CCM score: 0.2048, Significant: True


### 2.3 Synthetic Causal System (Ground Truth)

In [5]:
from timeseries_toolkit.validation import generate_causal_system

synth_data, causal_map = generate_causal_system(n=1000, seed=42)
print('Synthetic causal system:')
print(f'  Variables: {list(synth_data.columns)}')
print(f'  Known causal structure: {causal_map}')

cols = list(synth_data.columns)
result_synth = ccm_test(
    source=synth_data[cols[0]].values,
    target=synth_data[cols[1]].values,
    embedding_dim=3
)
print(f'\nCCM test on known causal pair ({cols[0]} -> {cols[1]}):')
print(f'  CCM score: {result_synth["ccm_score"]:.4f}, Significant: {result_synth["is_significant"]}')

Synthetic causal system:
  Variables: ['X1', 'X2', 'X3', 'X4', 'Y']
  Known causal structure: {'X1': 'indirect_cause (driver of X2 and X3)', 'X2': 'direct_linear_cause', 'X3': 'direct_nonlinear_cause', 'X4': 'noise (no causal relationship)'}



CCM test on known causal pair (X1 -> X2):
  CCM score: 0.8528, Significant: True


---
## 3. Granger Causality

Granger causality tests whether lagged values of X improve predictions of Y beyond Y's own lags.
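
The comparison can be sketched as a pair of nested least-squares regressions. This is a minimal NumPy illustration, not the code behind `granger_causality_test`; the helpers `lag_matrix` and `granger_f_stat` are hypothetical names. The restricted model predicts Y from its own lags, the unrestricted model adds the lags of X, and the F-statistic measures how much the residual sum of squares drops.

```python
import numpy as np

def lag_matrix(x, n_lags):
    """Columns are x[t-1], ..., x[t-n_lags] for t = n_lags .. len(x) - 1."""
    x = np.asarray(x, dtype=float)
    return np.column_stack(
        [x[n_lags - k : len(x) - k] for k in range(1, n_lags + 1)]
    )

def granger_f_stat(y, x, n_lags=2):
    """F-statistic for 'lags of x add nothing to an AR(n_lags) model of y'."""
    y = np.asarray(y, dtype=float)
    target = y[n_lags:]
    ones = np.ones((len(target), 1))

    restricted = np.hstack([ones, lag_matrix(y, n_lags)])            # y lags only
    unrestricted = np.hstack([restricted, lag_matrix(x, n_lags)])    # plus x lags

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / n_lags) / (rss_u / dof)

# Toy data (an assumption for illustration): y depends on yesterday's x
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.5 * rng.normal(size=499)

f_x_to_y = granger_f_stat(y, x)
f_y_to_x = granger_f_stat(x, y)
print(f"F (x -> y): {f_x_to_y:.2f}")
print(f"F (y -> x): {f_y_to_x:.2f}")
```

Under the null hypothesis the statistic follows an F distribution with `n_lags` and `dof` degrees of freedom, so a p-value can be read from that distribution.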

### 3.1 Test Multiple Drivers of BTC Returns

In [6]:
from timeseries_toolkit.validation import granger_causality_test

returns_df = pd.DataFrame({'BTC': btc_ret, 'SPY': spy_ret, 'VIX': vix_ret}).dropna()
print(f'Returns DataFrame: {returns_df.shape[0]} rows, {returns_df.shape[1]} columns')

granger_result = granger_causality_test(
    data=returns_df, target_col='BTC',
    source_cols=['SPY', 'VIX'], max_lags=5
)

print('\nGranger Causality: SPY + VIX -> BTC')
print(f'  Improvement over AR baseline: {granger_result.get("improvement_pct", 0):.2f}%')
print(f'  Optimal lags: {granger_result.get("optimal_lag", "N/A")}')

if 'per_source' in granger_result:
    print('\n  Per-source results:')
    for src, details in granger_result['per_source'].items():
        print(f'    {src}: F={details.get("f_statistic", 0):.4f}, p={details.get("p_value", 1):.4f}')

Returns DataFrame: 501 rows, 3 columns

Granger Causality: SPY + VIX -> BTC
  Improvement over AR baseline: 0.00%
  Optimal lags: N/A


### 3.2 Reverse Test: Does BTC Granger-Cause SPY?

In [7]:
granger_reverse = granger_causality_test(
    data=returns_df, target_col='SPY',
    source_cols=['BTC'], max_lags=5
)
print('Granger Causality: BTC -> SPY')
print(f'  Improvement: {granger_reverse.get("improvement_pct", 0):.2f}%')

Granger Causality: BTC -> SPY
  Improvement: 0.00%


---
## 4. Forensic Diagnostics (7-Test Suite)

The `ForensicEnsembleAnalyzer` runs 7 statistical tests on model forecasts:

1. **Baseline check** - Does the model beat naive persistence?
2. **Ljung-Box** - Are residuals white noise?
3. **Shapiro-Wilk** - Are residuals normally distributed?
4. **Spectral analysis** - Any periodic structure in errors?
5. **Hurst exponent** - Do errors show long memory?
6. **Entropy ratio** - How predictable are the errors?
7. **Feature leakage** - Are errors correlated with features?
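
Two of these checks can be sketched in a few lines of NumPy. This is a hedged illustration, not the logic inside `ForensicEnsembleAnalyzer`: the function names `beats_naive` and `ljung_box_q` are hypothetical, and the naive baseline is assumed here to be "predict the previous actual value".

```python
import numpy as np

def beats_naive(actual, forecast):
    """Test 1 sketch: compare model MAE with naive persistence MAE."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mae_model = np.mean(np.abs(actual[1:] - forecast[1:]))
    mae_naive = np.mean(np.abs(actual[1:] - actual[:-1]))  # yesterday's value
    return bool(mae_model < mae_naive), mae_model, mae_naive

def ljung_box_q(residuals, n_lags=10):
    """Test 2 sketch: Ljung-Box Q = n (n + 2) * sum_k rho_k^2 / (n - k)."""
    r = np.asarray(residuals, dtype=float)
    r = r - r.mean()
    n = len(r)
    denom = r @ r
    q = 0.0
    for k in range(1, n_lags + 1):
        rho_k = (r[k:] @ r[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k**2 / (n - k)
    return n * (n + 2) * q

# Toy data (an assumption for illustration)
rng = np.random.default_rng(1)
actual = np.cumsum(rng.normal(size=300))               # random-walk "price"
good_forecast = actual + rng.normal(scale=0.1, size=300)
bad_forecast = actual + rng.normal(scale=5.0, size=300)

good_passes, _, _ = beats_naive(actual, good_forecast)
bad_passes, _, _ = beats_naive(actual, bad_forecast)
q_noise = ljung_box_q(rng.normal(size=300))            # uncorrelated residuals
q_walk = ljung_box_q(actual)                           # strongly autocorrelated
print(good_passes, bad_passes)
print(f"Q (white noise): {q_noise:.1f}, Q (random walk): {q_walk:.1f}")
```

For white-noise residuals Q is approximately chi-squared with `n_lags` degrees of freedom, so a large Q relative to that distribution indicates leftover autocorrelation in the errors.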

### 4.1 Create Forecast Models to Evaluate

In [8]:
from timeseries_toolkit.models import AutoKalmanFilter

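# Resample to calendar days, forward-filling weekends and holidays
spy_daily = spy_close.asfreq('D', method='ffill').dropna()
# Chronological 80/20 split (no shuffling, so the test set follows the training set)
split_idx = int(len(spy_daily) * 0.8)
train = spy_daily.iloc[:split_idx]
test = spy_daily.iloc[split_idx:]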

print(f'Train: {len(train)} days')
print(f'Test:  {len(test)} days')

kf = AutoKalmanFilter(level='local linear trend')
kf.fit(train)
kf_preds = kf.predict(start=test.index[0], end=test.index[-1])

# Flat benchmark: mean of the last 20 training days, repeated over the test window
sma_preds = train.iloc[-20:].mean() * np.ones(len(test))
sma_series = pd.Series(sma_preds, index=test.index, name='SMA')

print(f'\nKalman predictions: {len(kf_preds)}')
print(f'SMA predictions:    {len(sma_series)}')

Train: 584 days
Test:  147 days

Kalman predictions: 147
SMA predictions:    147


### 4.2 Run Full Forensic Analysis

In [9]:
from timeseries_toolkit.validation import ForensicEnsembleAnalyzer

eval_df = pd.DataFrame({'actual': test, 'kalman': kf_preds, 'sma': sma_series}).dropna()
print(f'Evaluation DataFrame: {eval_df.shape}')

analyzer = ForensicEnsembleAnalyzer(
    df=eval_df, target_col='actual', model_cols=['kalman', 'sma']
)

summary_df = analyzer.run_full_analysis()
print('\nForensic Summary:')
summary_df

Evaluation DataFrame: (147, 3)

Forensic Summary:


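| | Model | Forensic_Score (/6) | MAE | RMSE | 1_Baseline_Beat | 2_WhiteNoise_Pass | 3_Normality_Pass | 4_Spectral_Pass | 5_Hurst_Pass | 6_Entropy_Pass | 7_Leakage_Pass |
|---|-------|---------------------|-----|------|-----------------|-------------------|------------------|-----------------|--------------|----------------|----------------|
| 0 | kalman | 1 | 13.7187 | 14.9938 | False | False | False | False | False | True | Skipped |
| 1 | sma | 1 | 33.7096 | 36.3912 | False | False | False | False | False | True | Skipped |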


### 4.3 Detailed Test Results

In [10]:
for model_name in ['kalman', 'sma']:
    print(f'\n{"=" * 50}')
    print(f'  MODEL: {model_name.upper()}')
    print(f'{"=" * 50}')
    
    details = analyzer.get_detailed_results(model_name)
    
    t1 = details.get('test_1_baseline', {})
    print(f'  1. Baseline check:    {"PASS" if t1.get("passed") else "FAIL"}'
          f' (MAE: {t1.get("mae_model", 0):.2f} vs naive: {t1.get("mae_naive", 0):.2f})')
    
    t2 = details.get('test_2_ljung_box', {})
    print(f'  2. Ljung-Box:         p={t2.get("p_value", 0):.4f} ({"PASS" if t2.get("passed") else "FAIL"})')
    
    t3 = details.get('test_3_shapiro_wilk', {})
    print(f'  3. Shapiro-Wilk:      p={t3.get("p_value", 0):.4f} ({"PASS" if t3.get("passed") else "FAIL"})')
    
    t4 = details.get('test_4_spectral', {})
    print(f'  4. Spectral:          CV(PSD)={t4.get("cv_psd", 0):.4f} ({"PASS" if t4.get("passed") else "FAIL"})')
    
    t5 = details.get('test_5_hurst', {})
    h = t5.get('hurst_exponent', 0)
    hi = 'random walk' if abs(h - 0.5) < 0.1 else ('persistent' if h > 0.5 else 'anti-persistent')
    print(f'  5. Hurst exponent:    H={h:.4f} ({hi}) ({"PASS" if t5.get("passed") else "FAIL"})')
    
    t6 = details.get('test_6_entropy', {})
    print(f'  6. Entropy ratio:     {t6.get("entropy_ratio", 0):.4f} ({"PASS" if t6.get("passed") else "FAIL"})')
    
    t7 = details.get('test_7_leakage', {})
    p7 = t7.get('passed')
    print(f'  7. Feature leakage:   {"PASS" if p7 else ("SKIPPED" if p7 is None else "FAIL")}')


  MODEL: KALMAN
  1. Baseline check:    FAIL (MAE: 13.72 vs naive: 2.45)
  2. Ljung-Box:         p=0.0000 (FAIL)
  3. Shapiro-Wilk:      p=0.0006 (FAIL)
  4. Spectral:          CV(PSD)=2.6393 (FAIL)
  5. Hurst exponent:    H=0.1472 (anti-persistent) (FAIL)
  6. Entropy ratio:     1.0582 (PASS)
  7. Feature leakage:   SKIPPED

  MODEL: SMA
  1. Baseline check:    FAIL (MAE: 33.71 vs naive: 2.45)
  2. Ljung-Box:         p=0.0000 (FAIL)
  3. Shapiro-Wilk:      p=0.0002 (FAIL)
  4. Spectral:          CV(PSD)=2.7907 (FAIL)
  5. Hurst exponent:    H=0.1472 (anti-persistent) (FAIL)
  6. Entropy ratio:     1.0000 (PASS)
  7. Feature leakage:   SKIPPED


---
## Summary

| Module | Tool | Purpose |
|--------|------|--------|
| `causality` | `ccm_test()` | Detect nonlinear causal coupling |
| `causality` | `granger_causality_test()` | Test linear predictive causality |
| `causality` | `generate_causal_system()` | Synthetic data with known causal structure |
| `diagnostics` | `ForensicEnsembleAnalyzer` | 7-test forecast evaluation suite |