# Optimizer Testing Matrix

**Purpose**: Systematically compare portfolio optimization strategies under consistent market conditions.

This notebook implements the testing matrix from `/docs/Portfolio_Weight_Optimization.md`.

## Available Optimizers
1. **Equal Weight** - Simple 1/N benchmark
2. **Score-Weighted** - Weights proportional to ML scores
3. **Inverse Volatility** - Naive risk parity (weights ∝ 1/volatility, correlations ignored)
4. **MVO** - Mean-variance optimization (max Sharpe)
5. **MVO_REG** - Regularized MVO with an L2 penalty on the weights
6. **HRP** - Hierarchical Risk Parity
7. **Hybrid** - Combines ML scores with a risk-based allocation
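The weighting rules behind the simpler optimizers (everything except HRP, which needs hierarchical clustering) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in `src.portfolio`: all function names are hypothetical, MVO is shown in its unconstrained closed form (no long-only or position limits), and the Hybrid rule is an assumed convex blend with a hypothetical `alpha`, since the exact hybrid formula is not given here.

```python
import numpy as np

def equal_weights(n: int) -> np.ndarray:
    """Equal Weight: 1/N in every asset."""
    return np.full(n, 1.0 / n)

def score_weights(scores) -> np.ndarray:
    """Score-Weighted: long-only weights proportional to ML scores.
    Negative scores are clipped to zero; falls back to 1/N if none is positive."""
    s = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    if s.sum() == 0:
        return equal_weights(len(s))
    return s / s.sum()

def inverse_vol_weights(vols) -> np.ndarray:
    """Inverse Volatility: w_i proportional to 1 / sigma_i (correlations ignored)."""
    inv = 1.0 / np.asarray(vols, dtype=float)
    return inv / inv.sum()

def mvo_reg_weights(mu, cov, l2: float = 0.0) -> np.ndarray:
    """MVO / MVO_REG: maximizing w'mu - 0.5 w'Sigma w - 0.5 * l2 * ||w||^2
    gives w = (Sigma + l2 * I)^-1 mu, rescaled here to sum to 1 (long-short allowed).
    l2 = 0 recovers the unconstrained max-Sharpe (tangency) direction."""
    mu = np.asarray(mu, dtype=float)
    raw = np.linalg.solve(np.asarray(cov, dtype=float) + l2 * np.eye(len(mu)), mu)
    # Budget normalization only yields the efficient portfolio if raw weights sum > 0
    if raw.sum() <= 1e-12:
        raise ValueError("raw weights do not sum to a positive number")
    return raw / raw.sum()

def hybrid_weights(scores, vols, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical Hybrid: alpha * score-weighted + (1 - alpha) * inverse-vol."""
    return alpha * score_weights(scores) + (1 - alpha) * inverse_vol_weights(vols)
```

For example, with equal expected returns and variances of 0.04 and 0.01, `mvo_reg_weights([0.1, 0.1], [[0.04, 0], [0, 0.01]])` puts 20% in the first asset and 80% in the second; increasing `l2` pulls the weights toward proportional-to-`mu`, which is why the L2 penalty makes MVO less sensitive to a noisy covariance matrix.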

---

## Setup

In [None]:
# Standard imports
import sys
import os
import yaml
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Add project root to path (assumes the notebook runs from a directory one level below the root)
project_root = os.path.abspath('..')
if project_root not in sys.path:
    sys.path.insert(0, project_root)

print(f"Working directory: {os.getcwd()}")
print(f"Project root: {project_root}")

In [None]:
# Load configuration
config_path = os.path.join(project_root, 'config/config.yaml')
with open(config_path) as f:
    config = yaml.safe_load(f)

print("✓ Configuration loaded")

In [None]:
# Import project modules
from src.io.ingest_ohlcv import OHLCVIngester
from src.features.ta_features import create_technical_features
from src.labeling.labels import generate_forward_returns
from src.ml.dataset import MLDataset, create_time_based_split
from src.ml.train import ModelTrainer
from src.portfolio.optimizer_comparison import (
    run_optimizer_comparison,
    generate_optimizer_recommendations
)

print("✓ Modules imported")

---
## Load Pre-computed Features & Scores

**Option 1**: Use existing data from `01_interactive_pipeline.ipynb`  
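**Option 2**: Regenerate the data with the ingestion, feature, and labeling modules imported above (`OHLCVIngester`, `create_technical_features`, `generate_forward_returns`); this notebook does not include those generation cells.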
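Whichever option you use, the rest of the notebook relies on a few columns: `date`, `symbol`, `close` (for the price panel), and the label `forward_return_{horizon}d`. A quick sanity check along these lines can catch a stale or partial file early. `check_feature_frame` is a hypothetical helper, not part of `src`, and the checks are assumptions about what the downstream code needs.

```python
import pandas as pd

def check_feature_frame(df: pd.DataFrame, horizon: int) -> list:
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    required = ["date", "symbol", "close", f"forward_return_{horizon}d"]
    missing = [c for c in required if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns
    if df.duplicated(subset=["date", "symbol"]).any():
        problems.append("duplicate (date, symbol) rows")
    if not pd.api.types.is_datetime64_any_dtype(df["date"]):
        problems.append("'date' is not a datetime dtype")
    return problems
```

In the notebook this would be called as `check_feature_frame(df, config['labels']['horizon'])` right after loading.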

In [None]:
# Load pre-computed features from a previous run (resolved from the project root,
# consistent with config_path above)
features_path = os.path.join(project_root, 'data/features/all_features.parquet')
try:
    df = pd.read_parquet(features_path)
    print(f"✓ Loaded {len(df):,} rows of features")
    print(f"Date range: {df['date'].min()} to {df['date'].max()}")
except FileNotFoundError:
    print(f"⚠️ Features file not found at {features_path}. Please run 01_interactive_pipeline.ipynb first.")
    raise

## Generate ML Scores

Train model and generate predictions (or load existing model)
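The split cell below delegates to `create_time_based_split`, whose exact embargo semantics live in `src.ml.dataset`. One plausible interpretation, sketched here with a hypothetical `time_split_with_embargo`, splits on unique trading dates and leaves a gap of `embargo_days` dates between the last training date and the first test date. With a gap at least as long as the label horizon, a training label's forward window cannot reach into test-period prices.

```python
import pandas as pd

def time_split_with_embargo(df: pd.DataFrame, test_size: float = 0.2,
                            embargo_days: int = 5, date_col: str = "date"):
    """Chronological split on unique dates with a gap of `embargo_days` dates
    between train and test (the gap dates are dropped from both sides)."""
    dates = df[date_col].drop_duplicates().sort_values().to_numpy()
    n_test = int(round(len(dates) * test_size))
    cut = len(dates) - n_test                     # first index of the test block
    train_dates = dates[:cut]
    test_dates = dates[cut + embargo_days:]       # skip the embargo gap
    train = df[df[date_col].isin(train_dates)]
    test = df[df[date_col].isin(test_dates)]
    return train, test
```

Splitting on dates rather than rows keeps every symbol on the same date on the same side of the split, so the test set never contains a cross-section the model partly saw in training.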

In [None]:
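# Time-based split. Keep the embargo at least as long as the label horizon so that
# forward-return labels at the end of the training window do not overlap the test period.
horizon = config['labels']['horizon']
label_col = f"forward_return_{horizon}d"
train_df, test_df = create_time_based_split(df, test_size=0.2, embargo_days=max(5, horizon))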

print(f"Train: {len(train_df):,} rows ({train_df['date'].min()} to {train_df['date'].max()})")
print(f"Test:  {len(test_df):,} rows ({test_df['date'].min()} to {test_df['date'].max()})")

In [None]:
# Prepare ML dataset
dataset = MLDataset(label_col=label_col)
X_train, y_train = dataset.prepare(train_df, auto_select_features=True)
X_test, y_test = dataset.prepare(test_df, auto_select_features=False)

print(f"Training set: {len(X_train):,} samples, {len(X_train.columns)} features")
print(f"Test set:     {len(X_test):,} samples")

In [None]:
# Train model (or load existing)
trainer = ModelTrainer(config)

# Option 1: Train new model
trainer.train(X_train, y_train)
print("✓ Model trained")

# Option 2: Load existing model
# trainer.load_model(os.path.join(project_root, 'data/models/latest_model.pkl'))
# print("✓ Model loaded")

In [None]:
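# Generate ML scores for the test period (reuses X_test prepared above)
scores = trainer.predict(X_test)

# Build the scored DataFrame. This assumes prepare() keeps exactly the rows with a
# non-null label, in their original order; the check below catches a mismatch.
test_df_clean = test_df[test_df[label_col].notna()].copy()
assert len(scores) == len(test_df_clean), (
    f"{len(scores)} scores vs {len(test_df_clean)} labeled rows: rows were dropped inconsistently"
)
scored_df = test_df_clean[['date', 'symbol']].copy()
scored_df['ml_score'] = np.asarray(scores)  # positional assignment, no index alignment

print(f"✓ Generated scores for {len(scored_df):,} observations")
print("\nScore statistics:")
print(scored_df['ml_score'].describe())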

In [None]:
# Get price panel
price_panel = df[['date', 'symbol', 'close']].copy()
print(f"✓ Price panel: {len(price_panel):,} rows")

---
## Run Optimizer Comparison

This runs each optimizer selected below on the same ML scores and price panel and reports comparable performance, turnover, and concentration metrics.
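The metrics in the comparison output (the columns shown in the Results Analysis table) can be computed from a daily portfolio return series and a weight history roughly as follows. The definitions are assumptions and may differ from those in `src.portfolio.optimizer_comparison`: 252 trading days per year, a zero risk-free rate, turnover as the sum of absolute weight changes per rebalance (some conventions halve this), and `effective_n` as 1 / mean HHI. `summarize_portfolio` is a hypothetical name.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumption: daily returns

def summarize_portfolio(returns: pd.Series, weights: pd.DataFrame) -> dict:
    """returns: daily portfolio returns (decimals).
    weights: one row per rebalance date, one column per asset, rows summing to 1."""
    std = returns.std(ddof=1)
    equity = (1 + returns).cumprod()
    peak = equity.cummax().clip(lower=1.0)        # include the starting value of 1
    drawdown = equity / peak - 1                  # <= 0 at every date
    # Turnover: total absolute weight change between consecutive rebalances
    turnover = weights.diff().abs().sum(axis=1).iloc[1:]
    hhi = (weights ** 2).sum(axis=1).mean()       # Herfindahl index, between 1/N and 1
    return {
        "sharpe_ratio": returns.mean() / std * np.sqrt(TRADING_DAYS),  # risk-free = 0
        "annual_return": equity.iloc[-1] ** (TRADING_DAYS / len(returns)) - 1,
        "volatility": std * np.sqrt(TRADING_DAYS),
        "max_drawdown": drawdown.min(),
        "avg_turnover": turnover.mean(),
        "hhi": hhi,
        "effective_n": 1.0 / hhi,
    }
```

`effective_n` reads as "the number of equally weighted positions with the same concentration": a 1/N portfolio over 20 names has HHI = 0.05 and effective N = 20.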

In [None]:
# Define optimizers to test
optimizers_to_test = [
    'equal',
    'score_weighted',
    'inv_vol',
    'mvo',
    'hybrid',
    # 'hrp',  # Uncomment if you want to test HRP (may be slower)
]

print(f"Testing {len(optimizers_to_test)} optimizers: {optimizers_to_test}")
print("\nThis may take 2-5 minutes depending on data size...\n")

In [None]:
# Run comparison
comparison_results = run_optimizer_comparison(
    scored_df=scored_df,
    price_panel=price_panel,
    config=config,
    optimizers=optimizers_to_test,
    score_col='ml_score'
)

print("\n" + "="*80)
print("✓ Comparison complete!")
print("="*80)

---
## Results Analysis

In [None]:
# Display full results table
print("\n" + "="*80)
print("OPTIMIZER COMPARISON - DETAILED RESULTS")
print("="*80 + "\n")

display_df = comparison_results[[
    'optimizer', 'sharpe_ratio', 'annual_return', 'volatility',
    'max_drawdown', 'avg_turnover', 'hhi', 'effective_n'
]].copy()

# Format for display: percentages for return, risk, and turnover; fixed decimals elsewhere
formats = {
    'annual_return': '{:.2%}',
    'volatility': '{:.2%}',
    'max_drawdown': '{:.2%}',
    'avg_turnover': '{:.2%}',
    'sharpe_ratio': '{:.2f}',
    'hhi': '{:.4f}',
    'effective_n': '{:.1f}',
}
for col, fmt in formats.items():
    display_df[col] = display_df[col].map(fmt.format)

print(display_df.to_string(index=False))

In [None]:
# Generate recommendations
recommendations = generate_optimizer_recommendations(comparison_results)

print("\n" + "="*80)
print("OPTIMIZER RECOMMENDATIONS")
print("="*80 + "\n")

for objective, recommendation in recommendations.items():
    print(f"🎯 {objective.replace('_', ' ').title()}:")
    print(f"   → {recommendation}\n")

---
## Visualizations

In [None]:
# Performance comparison charts
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Sharpe Ratio
comparison_results_sorted = comparison_results.sort_values('sharpe_ratio', ascending=False)
axes[0, 0].barh(comparison_results_sorted['optimizer'], comparison_results_sorted['sharpe_ratio'])
axes[0, 0].set_xlabel('Sharpe Ratio')
axes[0, 0].set_title('Risk-Adjusted Performance', fontsize=14, fontweight='bold')
axes[0, 0].grid(True, alpha=0.3)

# 2. Return vs Volatility
axes[0, 1].scatter(
    comparison_results['volatility'] * 100,
    comparison_results['annual_return'] * 100,
    s=200,
    alpha=0.6
)
for _, row in comparison_results.iterrows():
    axes[0, 1].annotate(
        row['optimizer'],
        (row['volatility'] * 100, row['annual_return'] * 100),
        fontsize=9
    )
axes[0, 1].set_xlabel('Volatility (%)')
axes[0, 1].set_ylabel('Annual Return (%)')
axes[0, 1].set_title('Return vs Risk Profile', fontsize=14, fontweight='bold')
axes[0, 1].grid(True, alpha=0.3)

# 3. Max Drawdown
comparison_results_sorted_dd = comparison_results.sort_values('max_drawdown', ascending=False)
axes[1, 0].barh(comparison_results_sorted_dd['optimizer'], comparison_results_sorted_dd['max_drawdown'] * 100)
axes[1, 0].set_xlabel('Max Drawdown (%)')
axes[1, 0].set_title('Downside Risk', fontsize=14, fontweight='bold')
axes[1, 0].grid(True, alpha=0.3)

# 4. Concentration (HHI)
comparison_results_sorted_hhi = comparison_results.sort_values('hhi')
axes[1, 1].barh(comparison_results_sorted_hhi['optimizer'], comparison_results_sorted_hhi['hhi'])
axes[1, 1].set_xlabel('Herfindahl Index (Lower = More Diversified)')
axes[1, 1].set_title('Portfolio Concentration', fontsize=14, fontweight='bold')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Turnover vs Sharpe trade-off
fig, ax = plt.subplots(1, 1, figsize=(10, 6))

scatter = ax.scatter(
    comparison_results['avg_turnover'] * 100,
    comparison_results['sharpe_ratio'],
    s=300,
    c=comparison_results['hhi'],
    cmap='viridis',
    alpha=0.7
)

for _, row in comparison_results.iterrows():
    ax.annotate(
        row['optimizer'],
        (row['avg_turnover'] * 100, row['sharpe_ratio']),
        fontsize=10,
        ha='center'
    )

ax.set_xlabel('Average Turnover (%)', fontsize=12)
ax.set_ylabel('Sharpe Ratio', fontsize=12)
ax.set_title('Turnover vs Performance Trade-off\n(Color = Concentration)', 
             fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('HHI (Concentration)', fontsize=10)

plt.tight_layout()
plt.show()

---
## Interpretation Framework

Based on the results, consider the following:

| Observation | Interpretation | Action |
|-------------|----------------|--------|
| High Sharpe + low drawdown | Efficient allocation | Prioritize for production |
| Higher turnover with similar Sharpe | Extra trading adds no return: overfitting or excessive rebalancing | Increase the rebalance interval |
| HRP results more stable than MVO | Covariance estimates are too noisy for MVO | Use covariance shrinkage or HRP |
| Hybrid outperforms risk-only optimizers | ML scores likely add predictive alpha on top of risk weighting | Keep hybrid weighting as the default |
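For the HRP-vs-MVO row, "shrinkage" means pulling the noisy sample covariance toward a simpler, structured target before optimizing. The sketch below shrinks toward the diagonal with a fixed, hand-picked intensity `delta` (a hypothetical parameter); it is not the data-driven Ledoit-Wolf estimator, which `sklearn.covariance.LedoitWolf` provides, and `shrink_to_diagonal` is not part of `src`.

```python
import numpy as np

def shrink_to_diagonal(returns: np.ndarray, delta: float = 0.2) -> np.ndarray:
    """Sigma_shrunk = (1 - delta) * S + delta * diag(S), with S the sample covariance.
    Variances are unchanged; every covariance is scaled by (1 - delta)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must be in [0, 1]")
    S = np.cov(returns, rowvar=False)   # rows = dates, columns = assets
    return (1.0 - delta) * S + delta * np.diag(np.diag(S))
```

Because the result is a PSD matrix plus a positive-definite diagonal, any `delta > 0` gives a positive-definite covariance even when there are fewer dates than assets, so the matrix solve inside MVO stays well posed.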

---

## Save Results

In [None]:
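# Save comparison results (resolved from the project root; creates the directory if needed)
output_path = os.path.join(project_root, 'data/reports/optimizer_comparison.csv')
os.makedirs(os.path.dirname(output_path), exist_ok=True)
comparison_results.to_csv(output_path, index=False)
print(f"✓ Results saved to {output_path}")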

---
## Next Steps

1. **Select Best Optimizer**: Update `config.yaml` with the best-performing optimizer
2. **Fine-tune Parameters**: Adjust optimizer-specific settings (e.g., hybrid weights, HRP linkage)
3. **Test Different Regimes**: Run comparison on different market periods
4. **Production Deployment**: Use selected optimizer in live trading system
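For step 3, one way to compare regimes is to slice the scored data into date windows and rerun the comparison on each. This sketch assumes `scored_df['date']` is a datetime column; `split_by_period` and the boundary dates are hypothetical, and the full `price_panel` is passed unchanged on the assumption that the optimizers need lookback history from before each window.

```python
import pandas as pd

def split_by_period(df: pd.DataFrame, boundaries, date_col: str = "date") -> dict:
    """Slice df into consecutive [start, end) windows between sorted boundary dates."""
    edges = pd.to_datetime(sorted(boundaries))
    periods = {}
    for start, end in zip(edges[:-1], edges[1:]):
        mask = (df[date_col] >= start) & (df[date_col] < end)
        periods[f"{start.date()}_to_{end.date()}"] = df.loc[mask]
    return periods

# Usage inside this notebook (boundary dates are placeholders):
# for name, period_scores in split_by_period(scored_df, ["2021-01-01", "2022-01-01", "2023-01-01"]).items():
#     period_results = run_optimizer_comparison(
#         scored_df=period_scores, price_panel=price_panel, config=config,
#         optimizers=optimizers_to_test, score_col='ml_score',
#     )
#     period_results['period'] = name
```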

**To change optimizer in config:**
```yaml
portfolio:
  optimizer: "hybrid"  # Change to your preferred optimizer
```

---