# Project3_v2: Optimized Event-Driven Calendar Spread Backtest

**Goals**: speed, reproducibility, modularization, strategy correctness.

This notebook is a full refactor of the original workflow, built on the optimized backtesting framework in `project3/`.
It keeps framework code out of the notebook and standardizes configuration, validation, and reporting.

## 0) Setup & Reproducibility

In [None]:
from pathlib import Path
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from project3.optimized_integration import (
    run_optimized_backtest,
    plot_optimized_performance,
    summarize_performance,
)

# Fixed seeds for reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

print(f"Reproducibility seed set: {SEED}")
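The cell above seeds the global `random` and NumPy generators. As a hedged complement, the sketch below uses a local `numpy.random.default_rng` generator, which keeps the random stream explicit instead of relying on global state. The helper name `draw_sample` is hypothetical, and nothing here is known to be used by `project3/`.

```python
import numpy as np


def draw_sample(seed, n):
    """Draw n standard-normal values from a generator local to this call."""
    rng = np.random.default_rng(seed)  # independent of np.random.seed(...)
    return rng.normal(size=n)


# Two calls with the same seed yield identical draws.
first = draw_sample(42, 5)
second = draw_sample(42, 5)
print(np.array_equal(first, second))
```

Passing a generator (or a seed) into each function that needs randomness makes it easier to see which steps of a backtest are stochastic.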

## 1) Configuration

In [None]:
# Data selection
BASE_DIR = Path('.')
CANDIDATE_FILES = [
    BASE_DIR / 'real_akshare_spread_data.csv',
    BASE_DIR / 'crude_oil_wti_spread_data.csv',
    BASE_DIR / 'demo_spread_data.csv',
]

csv_path = next((p for p in CANDIDATE_FILES if p.exists()), None)
if csv_path is None:
    raise FileNotFoundError(
        'No spread CSV found. Place a CSV with NEAR/FAR columns in this folder.'
    )

symbol = 'CALENDAR_SPREAD'
initial_capital = 500_000.0
quantity = 10
lookback_window = 30
z_threshold = 1.5
commission = 5.0
slippage = 0.01

# Baseline metrics (fill in manually if you want metric consistency checks)
baseline_total_return = None
baseline_sharpe_ratio = None
baseline_max_drawdown = None

tolerance = 0.05  # 5% relative tolerance for baseline comparisons

print(f'Using data file: {csv_path}')
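The `lookback_window` and `z_threshold` settings are consumed inside `project3/`, whose code is not shown in this notebook. The sketch below illustrates how such parameters are commonly used in a mean-reverting spread strategy. It assumes the spread is NEAR minus FAR and that the signal comes from a rolling z-score; the function name `zscore_signal` is hypothetical and the framework's actual logic may differ.

```python
import numpy as np
import pandas as pd


def zscore_signal(near, far, lookback_window=30, z_threshold=1.5):
    """Return (zscore, signal) for the spread NEAR - FAR.

    signal is -1 (short the spread) when z > z_threshold,
    +1 (long the spread) when z < -z_threshold, and 0 otherwise.
    """
    spread = near - far
    rolling = spread.rolling(lookback_window)
    # Shift by one bar so each z-score only uses statistics from earlier bars.
    mean = rolling.mean().shift(1)
    std = rolling.std().shift(1)
    # A zero standard deviation would divide by zero; treat it as undefined.
    zscore = (spread - mean) / std.replace(0.0, np.nan)
    signal = pd.Series(0, index=spread.index, dtype=int)
    signal[zscore > z_threshold] = -1
    signal[zscore < -z_threshold] = 1
    return zscore, signal
```

With the defaults above, no signal can be produced for the first 30 bars, because the shifted rolling statistics are undefined until a full window of past data exists.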

## 2) Data Validation & Preview

In [None]:
df = pd.read_csv(csv_path, index_col=0, parse_dates=True)
required_columns = {'NEAR', 'FAR'}
if not required_columns.issubset(df.columns):
    raise ValueError(f'Missing required columns: {required_columns - set(df.columns)}')

df = df.sort_index()
missing_count = df[['NEAR', 'FAR']].isna().sum().sum()
if missing_count > 0:
    print(f'Warning: {missing_count} missing values found in NEAR/FAR columns.')

print(df[['NEAR', 'FAR']].head())
print(f'Data shape: {df.shape}, Date range: {df.index.min()} -> {df.index.max()}')
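The validation cell checks for required columns and missing values. A few further checks are often useful before a backtest; the sketch below counts duplicate timestamps, non-positive prices, and rows with gaps. The helper name `extra_spread_checks` is hypothetical and is not part of `project3/`. A non-positive price is only a flag to review, not necessarily an error (WTI front-month futures settled below zero in April 2020).

```python
import pandas as pd


def extra_spread_checks(frame):
    """Return simple data-quality counts for the NEAR/FAR price columns."""
    legs = frame[['NEAR', 'FAR']]
    return {
        # Repeated index values can double-count bars in an event loop.
        'duplicate_timestamps': int(frame.index.duplicated().sum()),
        # Count individual cells at or below zero (NaN compares as False).
        'non_positive_prices': int((legs <= 0).sum().sum()),
        # Count rows where at least one leg is missing.
        'rows_with_missing': int(legs.isna().any(axis=1).sum()),
    }
```

Calling `extra_spread_checks(df)` after the validation cell returns a small dictionary that can be printed or asserted against before running the backtest.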

## 3) Run Optimized Backtest

In [None]:
result = run_optimized_backtest(
    csv_path=str(csv_path),
    symbol=symbol,
    initial_capital=initial_capital,
    quantity=quantity,
    lookback_window=lookback_window,
    z_threshold=z_threshold,
    commission=commission,
    slippage=slippage,
)

summarize_performance(result)

## 4) Baseline Metric Consistency Check

In [None]:
metrics = result.performance
new_total_return = metrics['total_return']
new_sharpe_ratio = metrics['sharpe_ratio']
new_max_drawdown = metrics['max_drawdown']

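def within_tolerance(new, baseline, tol):
    """Return True if `new` is within relative tolerance `tol` of `baseline`.

    Returns None when no baseline is provided. A zero baseline has no
    relative error, so the check falls back to an absolute one.
    """
    if baseline is None:
        return None
    if baseline == 0:
        return abs(new) <= tol
    return abs(new - baseline) / abs(baseline) <= tol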

checks = {
    'total_return': within_tolerance(new_total_return, baseline_total_return, tolerance),
    'sharpe_ratio': within_tolerance(new_sharpe_ratio, baseline_sharpe_ratio, tolerance),
    'max_drawdown': within_tolerance(new_max_drawdown, baseline_max_drawdown, tolerance),
}

print('Baseline check results (None = no baseline provided):')
for k, v in checks.items():
    print(f'  {k}: {v}')
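To make the three branches of `within_tolerance` concrete, the sketch below repeats the function so it runs on its own and applies it to made-up numbers. None of these values are results from this backtest.

```python
def within_tolerance(new, baseline, tol):
    # Same logic as the notebook cell, repeated so this example is standalone.
    if baseline is None:
        return None
    if baseline == 0:
        return abs(new) <= tol
    return abs(new - baseline) / abs(baseline) <= tol


# Illustrative values only.
print(within_tolerance(0.104, 0.10, 0.05))    # relative error about 4%
print(within_tolerance(0.12, 0.10, 0.05))     # relative error about 20%
print(within_tolerance(-0.25, -0.20, 0.05))   # negative baseline, about 25% off
print(within_tolerance(0.03, 0.0, 0.05))      # zero baseline: absolute check
print(within_tolerance(1.2, None, 0.05))      # no baseline provided
```

The first and fourth calls pass, the second and third fail, and the last returns `None`. Note that the zero-baseline branch compares `abs(new)` directly against `tol`, so the same 0.05 acts as an absolute bound there instead of a percentage.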

## 5) Plots

In [None]:
plot_optimized_performance(
    result,
    lookback_window=lookback_window,
    z_threshold=z_threshold,
    title='Optimized Calendar Spread Backtest (Project3_v2)',
)

## 6) References (Best Practices)
- Zipline: event-driven processing and reproducible backtest configuration options. https://zipline-trader.readthedocs.io/en/latest/beginner-tutorial.html
- QuantConnect: data model testing and validation guidance. https://www.quantconnect.com/docs/v2/lean-engine/contributions/datasets/testing-data-models
- Backtrader: slippage and commission modeling. https://www.backtrader.com/docu/slippage/slippage/ , https://www.backtrader.com/docu/commission-schemes/commission-schemes/