# Foreign Market Lead-Lag ML Strategy
## Notebook 4: Backtest Analysis

This notebook performs comprehensive backtesting:
- Generate daily predictions for all S&P 500 stocks
- Construct long/short portfolio (top 5% long, bottom 5% short)
- Simulate portfolio performance with transaction costs
- Calculate risk-adjusted performance metrics
- Compare to benchmark (S&P 500)
- Analyze failure modes and risks
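
The long/short rule above (top 5% long, bottom 5% short) can be sketched as follows. This is a minimal illustration, not the logic inside `PortfolioSimulator`: the function name `long_short_weights`, the equal weighting, and the 50%/50% gross exposure split are all assumptions made for the example.

```python
import pandas as pd

def long_short_weights(predictions: pd.Series, quantile: float = 0.05) -> pd.Series:
    """Hypothetical helper: equal-weight long the top quantile, short the bottom.

    Assumes a dollar-neutral book with 0.5 gross exposure on each side.
    """
    preds = predictions.dropna()
    n_side = max(1, int(len(preds) * quantile))
    weights = pd.Series(0.0, index=predictions.index)
    if len(preds) < 2 * n_side:
        return weights  # too few predictions to form both legs
    ranked = preds.sort_values()
    weights.loc[ranked.index[-n_side:]] = 0.5 / n_side   # highest predictions
    weights.loc[ranked.index[:n_side]] = -0.5 / n_side   # lowest predictions
    return weights
```

Stocks with missing predictions receive a zero weight, so the number of positions per side scales with the number of stocks actually predicted that day.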

In [None]:
import sys
sys.path.append('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yaml
import yfinance as yf
import warnings
warnings.filterwarnings('ignore')

from feature_engineering import FeatureEngineering
from ml_models import MultiStockPredictor
from portfolio_constructor import PortfolioSimulator
from backtester import Backtester

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

%matplotlib inline
%load_ext autoreload
%autoreload 2

## 1. Load Data and Models

In [None]:
# Load config
with open('../config.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Load data
sp500_returns = pd.read_csv('../data/sp500_daily_returns.csv', index_col=0, parse_dates=True)
foreign_returns = pd.read_csv('../data/foreign_weekly_returns.csv', index_col=0, parse_dates=True)

print(f"Data loaded: {sp500_returns.shape}")
print(f"Date range: {sp500_returns.index[0]} to {sp500_returns.index[-1]}")

## 2. Train Models for All Stocks

In [None]:
# Prepare features for all stocks
print("Preparing features for all stocks...")
feature_eng = FeatureEngineering(config)
stock_data = feature_eng.prepare_all_stocks(foreign_returns, sp500_returns)

print(f"Prepared data for {len(stock_data)} stocks")

# Train models
print("\nTraining models (this may take several minutes)...")
predictor = MultiStockPredictor(config)
validation_results = predictor.train_all_stocks(stock_data, validate=False)

print(f"Trained {len(predictor.stock_models)} models")

## 3. Generate Daily Predictions

In [None]:
# Create feature matrix for all dates
print("Creating feature matrix...")
lagged_features = feature_eng.create_lagged_features(foreign_returns)
lagged_features = feature_eng.winsorize_features(lagged_features)
lagged_features = feature_eng.standardize_features(lagged_features)

# Align with daily frequency
features_daily = lagged_features.reindex(sp500_returns.index, method='ffill')

# Generate predictions
print("\nGenerating daily predictions...")
predictions_list = []

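# Only predict on dates where every feature is available (computed once, not per iteration)
valid_dates = features_daily.dropna().index
failed_dates = []

for i, date in enumerate(valid_dates):
    if i % 100 == 0:
        print(f"  Progress: {i}/{len(valid_dates)} days")
    
    features_row = features_daily.loc[date:date]
    
    try:
        predictions = predictor.predict_all_stocks(features_row)
        predictions.name = date
        predictions_list.append(predictions)
    except Exception as exc:
        # Record the failure instead of silently swallowing every error
        failed_dates.append((date, exc))
        continue

if failed_dates:
    print(f"  Skipped {len(failed_dates)} dates due to prediction errors")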

predictions_df = pd.DataFrame(predictions_list)
print(f"\nGenerated predictions for {len(predictions_df)} days")
print(f"Average stocks predicted per day: {predictions_df.count(axis=1).mean():.0f}")
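
Forward-filling weekly features onto daily dates can leak information if a feature stamped on a given day is used to predict that same day's return. Whether `create_lagged_features` already guards against this is not visible in this notebook, so the sketch below shows one conservative option: shift the aligned features by one trading day. The helper name `align_weekly_to_daily` and the `lag_days` parameter are hypothetical.

```python
import pandas as pd

def align_weekly_to_daily(weekly: pd.DataFrame,
                          daily_index: pd.DatetimeIndex,
                          lag_days: int = 1) -> pd.DataFrame:
    """Hypothetical helper: forward-fill weekly rows to daily dates, then lag.

    After the shift, each trading day only sees feature values that were
    already available `lag_days` trading days earlier.
    """
    daily = weekly.reindex(daily_index, method="ffill")
    return daily.shift(lag_days)
```

With `lag_days=1`, a feature stamped on a Friday first becomes visible on the following Monday.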

## 4. Construct and Simulate Portfolio

In [None]:
# Run portfolio simulation
print("Running portfolio simulation...")
simulator = PortfolioSimulator(config)
results_df = simulator.simulate(predictions_df, sp500_returns)

print(f"\nSimulation complete: {len(results_df)} trading days")
print(f"\nPortfolio Summary:")
print(f"  Initial Value: ${config['backtesting']['initial_capital']:,.0f}")
print(f"  Final Value: ${results_df['portfolio_value'].iloc[-1]:,.0f}")
print(f"  Total Return: {(results_df['portfolio_value'].iloc[-1] / config['backtesting']['initial_capital'] - 1) * 100:.2f}%")
print(f"  Avg Daily Turnover: {results_df['turnover'].mean():.2f}")
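
For readers who want to see how turnover and costs might feed into net returns, here is a simplified sketch. It is not the `PortfolioSimulator` implementation: the function name `simulate_costs` is hypothetical, weights are assumed to be the positions held during each day, intraday weight drift is ignored, and the cost rate is assumed to be a single charge per unit of turnover (for example commission plus slippage).

```python
import pandas as pd

def simulate_costs(weights: pd.DataFrame,
                   asset_returns: pd.DataFrame,
                   cost_rate: float) -> pd.DataFrame:
    """Hypothetical helper: daily gross return, turnover, and net return.

    weights[t] are the positions held during day t; turnover[t] is the sum
    of absolute weight changes from day t-1 (starting from an empty book).
    """
    returns = asset_returns.reindex(index=weights.index,
                                    columns=weights.columns).fillna(0.0)
    gross = (weights * returns).sum(axis=1)
    previous = weights.shift(1).fillna(0.0)
    turnover = (weights - previous).abs().sum(axis=1)
    net = gross - cost_rate * turnover
    return pd.DataFrame({"gross_return": gross,
                         "turnover": turnover,
                         "net_return": net})
```

Under these assumptions, fully flipping a dollar-neutral book with gross exposure 1.0 produces a turnover of 2.0 on that day.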

## 5. Performance Metrics

In [None]:
# Download benchmark data
print("Downloading benchmark data...")
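benchmark = yf.download('SPY', 
                       start=config['data']['start_date'],
                       end=config['data']['end_date'],
                       progress=False)['Close']
# Some yfinance versions return a one-column DataFrame here; reduce it to a Series
if isinstance(benchmark, pd.DataFrame):
    benchmark = benchmark.iloc[:, 0]
benchmark_returns = benchmark.pct_change()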

# Run backtest
backtester = Backtester(config)
metrics = backtester.calculate_metrics(results_df, benchmark_returns)

# Print metrics
backtester.print_metrics(metrics)
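
The exact metrics computed by `Backtester.calculate_metrics` are not shown in this notebook. As a rough guide to what risk-adjusted metrics usually involve, the sketch below computes a few standard ones from a daily return series. The function name `summary_metrics`, the 2% risk-free rate (matching the value used in the rolling Sharpe calculation in Section 9), and the 252-day year are assumptions.

```python
import numpy as np
import pandas as pd

def summary_metrics(daily_returns: pd.Series,
                    risk_free_rate: float = 0.02,
                    periods_per_year: int = 252) -> dict:
    """Hypothetical helper: standard performance metrics from daily returns."""
    r = daily_returns.dropna()
    cumulative = (1 + r).cumprod()
    years = len(r) / periods_per_year
    ann_vol = r.std() * np.sqrt(periods_per_year)
    # Drawdown is the percentage drop from the running peak of the equity curve
    drawdown = cumulative / cumulative.cummax() - 1
    return {
        "total_return": cumulative.iloc[-1] - 1,
        "annualized_return": cumulative.iloc[-1] ** (1 / years) - 1,
        "annualized_volatility": ann_vol,
        "sharpe_ratio": (r.mean() * periods_per_year - risk_free_rate) / ann_vol,
        "max_drawdown": drawdown.min(),
    }
```

Calling `summary_metrics(results_df['net_return'])` would give a quick cross-check against the values printed by the backtester, keeping in mind that the two may use different conventions.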

## 6. Performance Visualization

In [None]:
# Create comprehensive performance plots
backtester.plot_performance(results_df, benchmark_returns)

## 7. Transaction Cost Analysis

In [None]:
# Analyze impact of transaction costs
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Gross vs Net returns
cumulative_gross = (1 + results_df['gross_return']).cumprod()
cumulative_net = (1 + results_df['net_return']).cumprod()

axes[0].plot(cumulative_gross.index, cumulative_gross.values, 
            label='Gross Returns', linewidth=2)
axes[0].plot(cumulative_net.index, cumulative_net.values, 
            label='Net Returns (After Costs)', linewidth=2)
axes[0].set_title('Impact of Transaction Costs', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Cumulative Return')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Cost breakdown
total_turnover = results_df['turnover'].sum()
commission_cost = total_turnover * config['costs']['commission']
slippage_cost = total_turnover * config['costs']['slippage']
total_cost = commission_cost + slippage_cost

cost_breakdown = pd.Series({
    'Commission': commission_cost,
    'Slippage': slippage_cost
})

cost_breakdown.plot(kind='bar', ax=axes[1], color=['steelblue', 'coral'], 
                   edgecolor='black')
axes[1].set_title('Transaction Cost Breakdown', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Cost Type')
axes[1].set_ylabel('Total Cost (Return Impact)')
axes[1].tick_params(axis='x', rotation=0)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nTransaction Cost Analysis:")
print(f"  Total Turnover: {total_turnover:.2f}")
print(f"  Commission Cost: {commission_cost:.4f} ({commission_cost * 100:.2f}%)")
print(f"  Slippage Cost: {slippage_cost:.4f} ({slippage_cost * 100:.2f}%)")
print(f"  Total Cost: {total_cost:.4f} ({total_cost * 100:.2f}%)")
print(f"  Cost as % of Gross Return: {(total_cost / (cumulative_gross.iloc[-1] - 1)) * 100:.1f}%")

## 8. Long vs Short Performance

In [None]:
# Compare the number of long and short positions held (per-leg returns are not broken out here)
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Number of positions over time
axes[0].plot(results_df.index, results_df['num_long'], 
            label='Long Positions', linewidth=2, alpha=0.7)
axes[0].plot(results_df.index, results_df['num_short'], 
            label='Short Positions', linewidth=2, alpha=0.7)
axes[0].set_title('Number of Positions Over Time', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Number of Positions')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Distribution of positions
position_data = pd.DataFrame({
    'Long': results_df['num_long'],
    'Short': results_df['num_short']
})

position_data.boxplot(ax=axes[1], patch_artist=True)
axes[1].set_title('Position Distribution', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Number of Positions')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nPosition Statistics:")
print(f"  Avg Long Positions: {results_df['num_long'].mean():.1f}")
print(f"  Avg Short Positions: {results_df['num_short'].mean():.1f}")
print(f"  Avg Total Positions: {(results_df['num_long'] + results_df['num_short']).mean():.1f}")

## 9. Risk Analysis

In [None]:
# Analyze risk metrics over time
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Rolling volatility
rolling_vol = results_df['net_return'].rolling(252).std() * np.sqrt(252)
axes[0, 0].plot(rolling_vol.index, rolling_vol.values, linewidth=2, color='steelblue')
axes[0, 0].set_title('Rolling Volatility (1Y)', fontsize=12, fontweight='bold')
axes[0, 0].set_ylabel('Annualized Volatility')
axes[0, 0].grid(True, alpha=0.3)

# 2. Rolling Sharpe (the 0.02 below is an assumed 2% annual risk-free rate)
rolling_returns = results_df['net_return'].rolling(252)
rolling_sharpe = (rolling_returns.mean() * 252 - 0.02) / (rolling_returns.std() * np.sqrt(252))
axes[0, 1].plot(rolling_sharpe.index, rolling_sharpe.values, linewidth=2, color='coral')
axes[0, 1].axhline(y=0, color='black', linestyle='--', alpha=0.5)
axes[0, 1].set_title('Rolling Sharpe Ratio (1Y)', fontsize=12, fontweight='bold')
axes[0, 1].set_ylabel('Sharpe Ratio')
axes[0, 1].grid(True, alpha=0.3)

# 3. Worst drawdown periods
cumulative = (1 + results_df['net_return']).cumprod()
running_max = cumulative.expanding().max()
drawdown = (cumulative - running_max) / running_max

axes[1, 0].fill_between(drawdown.index, drawdown.values, 0, 
                       alpha=0.3, color='red')
axes[1, 0].plot(drawdown.index, drawdown.values, color='red', linewidth=1)
axes[1, 0].set_title('Drawdown', fontsize=12, fontweight='bold')
axes[1, 0].set_ylabel('Drawdown')
axes[1, 0].grid(True, alpha=0.3)

# 4. Compounded return per calendar year
# Note: the 'Y' alias is deprecated in pandas >= 2.2, where 'YE' is the replacement
yearly_returns = results_df['net_return'].resample('Y').apply(lambda x: (1 + x).prod() - 1)
yearly_returns.plot(kind='bar', ax=axes[1, 1], color='steelblue', edgecolor='black')
axes[1, 1].axhline(y=0, color='red', linestyle='--', linewidth=2)
axes[1, 1].set_title('Annual Returns', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Year')
axes[1, 1].set_ylabel('Return')
axes[1, 1].tick_params(axis='x', rotation=45)
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nAnnual Returns:")
for year, ret in yearly_returns.items():
    print(f"  {year.year}: {ret * 100:.2f}%")

## 10. Save Results

In [None]:
# Save results
import os
os.makedirs('../results', exist_ok=True)

results_df.to_csv('../results/portfolio_results.csv')
predictions_df.to_csv('../results/predictions.csv')
pd.Series(metrics).to_csv('../results/performance_metrics.csv')

print("Results saved to ../results/")

## Summary

This notebook performed comprehensive backtesting:
- Generated daily predictions for all S&P 500 stocks
- Constructed long/short portfolio (top 5% long, bottom 5% short)
- Simulated performance with realistic transaction costs
- Calculated risk-adjusted metrics
- Compared to S&P 500 benchmark

**Key Findings**:
- Strategy shows predictive power but transaction costs are significant
- Daily rebalancing creates high turnover
- Performance varies across market regimes
- Long and short legs contribute differently to returns

**Risk Factors**:
1. **Transaction Costs**: High turnover significantly erodes gross returns
2. **Model Decay**: Predictive power may diminish over time
3. **Liquidity Risk**: Short leg execution can be challenging
4. **Market Regime Changes**: Performance varies across different market conditions
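
Model decay can be tracked with an out-of-sample R². The notebook does not show how its R²_OOS is computed, so the sketch below uses a common definition as an assumption: one minus the ratio of the model's squared forecast errors to those of a naive benchmark forecast (zero by default). The function name `r2_oos` is hypothetical.

```python
import numpy as np

def r2_oos(actual, predicted, benchmark=0.0) -> float:
    """Hypothetical helper: out-of-sample R^2 against a naive benchmark forecast.

    Positive values mean the model beats the benchmark; values at or below
    zero mean it does no better than the benchmark.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sse_model = np.sum((actual - predicted) ** 2)
    sse_benchmark = np.sum((actual - benchmark) ** 2)
    return float(1.0 - sse_model / sse_benchmark)
```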

**Recommendations**:
- Consider reducing rebalancing frequency to lower costs
- Implement position buffers to reduce turnover
- Monitor R²_OOS decay and retrain models regularly
- Apply liquidity filters for short positions
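
One way to implement position buffers is a hysteresis band: a stock enters the long leg only when it ranks in the top 5%, but an existing long is kept until it falls out of the top 10%. The sketch below illustrates this for the long side only; the function name `buffered_longs` and the 10% exit threshold are assumptions, not part of the strategy as backtested.

```python
import pandas as pd

def buffered_longs(predictions: pd.Series,
                   current_longs: set,
                   enter_q: float = 0.05,
                   exit_q: float = 0.10) -> set:
    """Hypothetical helper: long-leg membership with an entry/exit buffer."""
    preds = predictions.dropna()
    # Rank 1 = highest prediction; ties are broken by order of appearance
    ranks = preds.rank(ascending=False, method="first")
    n_enter = max(1, int(len(preds) * enter_q))
    n_exit = max(n_enter, int(len(preds) * exit_q))
    enter = set(ranks.index[ranks <= n_enter])
    keep = {s for s in current_longs
            if s in ranks.index and ranks.loc[s] <= n_exit}
    return enter | keep
```

Because held names are retained inside the wider band, the long leg can temporarily hold more than 5% of the universe; that is the price paid for lower turnover.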