<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [8]</a>'.</span>

# Universal Strategy Analysis

This notebook provides comprehensive analysis across all strategies tested in a parameter sweep.

**Key Features:**
- Cross-strategy performance comparison
- Parameter sensitivity analysis
- Correlation analysis for ensemble building
- Regime-specific performance breakdown
- Automatic identification of optimal strategies and ensembles

In [1]:
# parameters
run_dir = "/path/to/results/run_20250623_143030"
config_name = "my_sweep"
symbols = ["SPY"]
timeframe = "5m"
min_strategies_to_analyze = 20
sharpe_threshold = 1.0
correlation_threshold = 0.7
top_n_strategies = 10
ensemble_size = 5

In [2]:
# Parameters
run_dir = "/Users/daws/ADMF-PC/config/bollinger/results/20250624_150142"
config_name = "unnamed"
symbols = ["SPY"]
timeframe = "5m"
min_strategies_to_analyze = 20
sharpe_threshold = 1.0
correlation_threshold = 0.7
top_n_strategies = 10
ensemble_size = 5


## Setup

In [3]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import duckdb
import json
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

# Initialize DuckDB
con = duckdb.connect()

# Convert run_dir to Path
run_dir = Path(run_dir)
print(f"Analyzing run: {run_dir.name}")
print(f"Config: {config_name}")
print(f"Symbol(s): {symbols}")
print(f"Timeframe: {timeframe}")

Analyzing run: 20250624_150142
Config: unnamed
Symbol(s): ['SPY']
Timeframe: 5m


In [4]:
# Setup path for loading analysis snippets
import sys
from pathlib import Path

# Find the project root (where src/ directory is)
current_path = Path(run_dir).resolve()
project_root = None

# Search up the directory tree for src/analytics/snippets
for parent in current_path.parents:
    if (parent / 'src' / 'analytics' / 'snippets').exists():
        project_root = parent
        break

if project_root:
    # Add to Python path if not already there
    if str(project_root) not in sys.path:
        sys.path.insert(0, str(project_root))
    snippets_path = project_root / 'src' / 'analytics' / 'snippets'
    queries_path = project_root / 'src' / 'analytics' / 'queries'
    print(f"✅ Analysis snippets available at: {snippets_path}")
    print(f"✅ SQL queries available at: {queries_path}")
    print("\nUse %load to load any snippet, e.g.:")
    print("  %load src/analytics/snippets/exploratory/signal_frequency.py")
    print("  %load src/analytics/snippets/ensembles/find_uncorrelated.py")
else:
    print("⚠️ Could not find project root with src/analytics/snippets")

✅ Analysis snippets available at: /Users/daws/ADMF-PC/src/analytics/snippets
✅ SQL queries available at: /Users/daws/ADMF-PC/src/analytics/queries

Use %load to load any snippet, e.g.:
  %load src/analytics/snippets/exploratory/signal_frequency.py
  %load src/analytics/snippets/ensembles/find_uncorrelated.py


## Load Strategy Index

In [5]:
# Load strategy index - the catalog of all strategies tested
strategy_index_path = run_dir / 'strategy_index.parquet'

if strategy_index_path.exists():
    strategy_index = pd.read_parquet(strategy_index_path)
    print(f"✅ Loaded {len(strategy_index)} strategies")
    
    # Show strategy type distribution
    by_type = strategy_index['strategy_type'].value_counts()
    print("\nStrategies by type:")
    for stype, count in by_type.items():
        print(f"  {stype}: {count}")
else:
    # Fallback for legacy format
    print("⚠️ No strategy_index.parquet found, using legacy format")
    strategy_index = None

✅ Loaded 1640 strategies

Strategies by type:
  bollinger_bands: 1640


## Performance Calculation

In [6]:
def calculate_performance(strategy_hash, trace_path, market_data):
    """Calculate performance metrics for a strategy"""
    try:
        # Load sparse signals
        signals = pd.read_parquet(run_dir / trace_path)
        signals['ts'] = pd.to_datetime(signals['ts'])
        
        # Merge with market data
        df = market_data.merge(
            signals[['ts', 'val']], 
            left_on='timestamp', 
            right_on='ts', 
            how='left'
        )
        
        # Forward fill signals (sparse to dense)
        df['signal'] = df['val'].fillna(method='ffill').fillna(0)
        
        # Calculate returns
        df['returns'] = df['close'].pct_change()
        df['strategy_returns'] = df['returns'] * df['signal'].shift(1)
        df['cum_returns'] = (1 + df['strategy_returns']).cumprod()
        
        # Metrics
        total_return = df['cum_returns'].iloc[-1] - 1
        
        if df['strategy_returns'].std() > 0:
            sharpe = df['strategy_returns'].mean() / df['strategy_returns'].std() * np.sqrt(252 * 78)
        else:
            sharpe = 0
            
        cummax = df['cum_returns'].expanding().max()
        drawdown = (df['cum_returns'] / cummax - 1)
        max_dd = drawdown.min()
        
        # Count trades
        trades = (df['signal'] != df['signal'].shift()).sum()
        
        return {
            'total_return': total_return,
            'sharpe_ratio': sharpe,
            'max_drawdown': max_dd,
            'num_trades': trades,
            'df': df  # For later analysis
        }
    except Exception as e:
        print(f"Error calculating performance for {strategy_hash}: {e}")
        return None

In [7]:
# Load market data
market_data_paths = [
    Path(f'data/{symbols[0]}_{timeframe}.parquet'),
    Path(f'../data/{symbols[0]}_{timeframe}.parquet'),
    Path(f'../../data/{symbols[0]}_{timeframe}.parquet'),
    Path(f'../../../data/{symbols[0]}_{timeframe}.parquet'),
    Path(f'../../../../data/{symbols[0]}_{timeframe}.parquet'),
]

market_data = None
for path in market_data_paths:
    if path.exists():
        market_data = pd.read_parquet(path)
        print(f'✅ Loaded market data from: {path}')
        break

if market_data is None:
    print('❌ Could not find market data file')

✅ Loaded market data from: data/SPY_5m.parquet


<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [8]:
# Calculate performance for all strategies
if strategy_index is not None and market_data is not None:
    performance_results = []
    
    # Limit analysis for performance if too many strategies
    strategies_to_analyze = strategy_index
    if len(strategy_index) > min_strategies_to_analyze * 2:
        print(f"Note: Limiting initial analysis to {min_strategies_to_analyze * 2} strategies for performance")
        # Sample diverse strategies
        strategies_to_analyze = strategy_index.groupby('strategy_type').apply(
            lambda x: x.sample(min=len(x), n=min(len(x), min_strategies_to_analyze // strategy_index['strategy_type'].nunique()))
        ).reset_index(drop=True)
    
    print(f"\nCalculating performance for {len(strategies_to_analyze)} strategies...")
    
    for idx, row in strategies_to_analyze.iterrows():
        if idx % 10 == 0:
            print(f"  Progress: {idx}/{len(strategies_to_analyze)}")
            
        perf = calculate_performance(row['strategy_hash'], row['trace_path'], market_data)
        
        if perf:
            # Combine strategy info with performance
            result = {**row.to_dict(), **perf}
            # Remove the full dataframe from results
            result.pop('df', None)
            performance_results.append(result)
    
    performance_df = pd.DataFrame(performance_results)
    print(f"\n✅ Calculated performance for {len(performance_df)} strategies")
else:
    performance_df = pd.DataFrame()
    print("⚠️ Skipping performance calculation")

Note: Limiting initial analysis to 40 strategies for performance


TypeError: NDFrame.sample() got an unexpected keyword argument 'min'

## Cross-Strategy Performance Analysis

In [None]:
if len(performance_df) > 0:
    # Top performers across ALL strategy types
    top_overall = performance_df.nlargest(top_n_strategies, 'sharpe_ratio')
    
    print(f"\n🏆 Top {top_n_strategies} Strategies (All Types):")
    print("=" * 80)
    
    display_cols = ['strategy_type', 'strategy_hash', 'sharpe_ratio', 'total_return', 'max_drawdown', 'num_trades']
    # Add parameter columns if they exist
    param_cols = [col for col in top_overall.columns if col.startswith('param_')]
    display_cols.extend(param_cols[:3])  # Show first 3 parameters
    
    for idx, row in top_overall.iterrows():
        print(f"\n{row['strategy_type']} - {row['strategy_hash'][:8]}")
        print(f"  Sharpe: {row['sharpe_ratio']:.2f} | Return: {row['total_return']:.1%} | Drawdown: {row['max_drawdown']:.1%}")
        if param_cols:
            params_str = " | ".join([f"{col.replace('param_', '')}: {row[col]}" for col in param_cols[:3] if pd.notna(row[col])])
            if params_str:
                print(f"  Params: {params_str}")

In [None]:
# Performance by strategy type
if len(performance_df) > 0:
    type_summary = performance_df.groupby('strategy_type').agg({
        'sharpe_ratio': ['mean', 'std', 'max'],
        'total_return': ['mean', 'std', 'max'],
        'strategy_hash': 'count'
    }).round(3)
    
    type_summary.columns = ['_'.join(col).strip() for col in type_summary.columns]
    type_summary = type_summary.rename(columns={'strategy_hash_count': 'count'})
    type_summary = type_summary.sort_values('sharpe_ratio_mean', ascending=False)
    
    print("\n📊 Performance by Strategy Type:")
    print(type_summary)

## Visualizations

In [None]:
# Sharpe distribution by strategy type
if len(performance_df) > 0 and performance_df['strategy_type'].nunique() > 1:
    plt.figure(figsize=(14, 6))
    
    # Box plot of Sharpe by type
    plt.subplot(1, 2, 1)
    performance_df.boxplot(column='sharpe_ratio', by='strategy_type', ax=plt.gca())
    plt.xticks(rotation=45, ha='right')
    plt.title('Sharpe Ratio Distribution by Strategy Type')
    plt.suptitle('')  # Remove default title
    plt.ylabel('Sharpe Ratio')
    
    # Scatter: Return vs Sharpe
    plt.subplot(1, 2, 2)
    for stype in performance_df['strategy_type'].unique():
        mask = performance_df['strategy_type'] == stype
        plt.scatter(performance_df.loc[mask, 'total_return'], 
                   performance_df.loc[mask, 'sharpe_ratio'],
                   label=stype, alpha=0.6)
    plt.xlabel('Total Return')
    plt.ylabel('Sharpe Ratio')
    plt.title('Return vs Risk-Adjusted Return')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

## Correlation Analysis for Ensemble Building

In [None]:
def calculate_strategy_correlations(strategies_df, market_data, run_dir):
    """Calculate correlation matrix between strategies"""
    returns_dict = {}
    
    for idx, row in strategies_df.iterrows():
        try:
            # Load signals
            signals = pd.read_parquet(run_dir / row['trace_path'])
            signals['ts'] = pd.to_datetime(signals['ts'])
            
            # Merge and calculate returns
            df = market_data.merge(signals[['ts', 'val']], left_on='timestamp', right_on='ts', how='left')
            df['signal'] = df['val'].fillna(method='ffill').fillna(0)
            df['returns'] = df['close'].pct_change()
            df['strategy_returns'] = df['returns'] * df['signal'].shift(1)
            
            returns_dict[row['strategy_hash']] = df['strategy_returns']
        except:
            pass
    
    # Create returns DataFrame and calculate correlation
    if returns_dict:
        returns_df = pd.DataFrame(returns_dict)
        return returns_df.corr()
    return pd.DataFrame()

In [None]:
# Calculate correlations among top performers
if len(performance_df) > 0 and len(top_overall) > 1:
    print("\n🔗 Calculating correlations among top strategies...")
    
    corr_matrix = calculate_strategy_correlations(top_overall, market_data, run_dir)
    
    if not corr_matrix.empty:
        # Find uncorrelated strategies
        uncorrelated_pairs = []
        for i in range(len(corr_matrix)):
            for j in range(i+1, len(corr_matrix)):
                corr_val = corr_matrix.iloc[i, j]
                if abs(corr_val) < correlation_threshold:
                    uncorrelated_pairs.append({
                        'strategy1': corr_matrix.index[i],
                        'strategy2': corr_matrix.columns[j],
                        'correlation': corr_val
                    })
        
        print(f"\n✅ Found {len(uncorrelated_pairs)} uncorrelated pairs (correlation < {correlation_threshold})")
        
        # Visualize correlation matrix
        if len(corr_matrix) <= 20:  # Only plot if reasonable size
            plt.figure(figsize=(10, 8))
            sns.heatmap(corr_matrix, cmap='coolwarm', center=0, vmin=-1, vmax=1, 
                       xticklabels=[h[:8] for h in corr_matrix.columns],
                       yticklabels=[h[:8] for h in corr_matrix.index])
            plt.title('Strategy Correlation Matrix')
            plt.tight_layout()
            plt.show()

## Ensemble Recommendations

In [None]:
# Build optimal ensemble
if len(performance_df) > 0 and 'corr_matrix' in locals() and not corr_matrix.empty:
    # Start with best strategy
    ensemble = [top_overall.iloc[0]['strategy_hash']]
    ensemble_data = [top_overall.iloc[0]]
    
    # Add uncorrelated strategies
    for idx, candidate in top_overall.iloc[1:].iterrows():
        if len(ensemble) >= ensemble_size:
            break
            
        # Check correlation with existing ensemble members
        candidate_hash = candidate['strategy_hash']
        if candidate_hash in corr_matrix.columns:
            max_corr = 0
            for existing in ensemble:
                if existing in corr_matrix.index:
                    corr = abs(corr_matrix.loc[existing, candidate_hash])
                    max_corr = max(max_corr, corr)
            
            if max_corr < correlation_threshold:
                ensemble.append(candidate_hash)
                ensemble_data.append(candidate)
    
    print(f"\n🎯 Recommended Ensemble ({len(ensemble)} strategies):")
    print("=" * 80)
    
    ensemble_df = pd.DataFrame(ensemble_data)
    for idx, row in ensemble_df.iterrows():
        print(f"\n{idx+1}. {row['strategy_type']} - {row['strategy_hash'][:8]}")
        print(f"   Sharpe: {row['sharpe_ratio']:.2f} | Return: {row['total_return']:.1%}")
    
    # Calculate ensemble metrics
    print(f"\nEnsemble Statistics:")
    print(f"  Average Sharpe: {ensemble_df['sharpe_ratio'].mean():.2f}")
    print(f"  Average Return: {ensemble_df['total_return'].mean():.1%}")
    print(f"  Strategy Types: {', '.join(ensemble_df['strategy_type'].unique())}")

## Export Results

In [None]:
# Export recommendations
if len(performance_df) > 0:
    recommendations = {
        'run_info': {
            'run_id': run_dir.name,
            'config_name': config_name,
            'generated_at': datetime.now().isoformat(),
            'total_strategies': len(strategy_index) if strategy_index is not None else 0,
            'strategies_analyzed': len(performance_df)
        },
        'best_individual': {},
        'best_by_type': {},
        'ensemble': []
    }
    
    # Best overall
    if len(top_overall) > 0:
        best = top_overall.iloc[0]
        recommendations['best_individual'] = {
            'strategy_hash': best['strategy_hash'],
            'strategy_type': best['strategy_type'],
            'sharpe_ratio': float(best['sharpe_ratio']),
            'total_return': float(best['total_return']),
            'max_drawdown': float(best['max_drawdown']),
            'parameters': {col.replace('param_', ''): best[col] 
                          for col in best.index if col.startswith('param_') and pd.notna(best[col])}
        }
    
    # Best by type
    for stype in performance_df['strategy_type'].unique():
        type_best = performance_df[performance_df['strategy_type'] == stype].nlargest(1, 'sharpe_ratio')
        if len(type_best) > 0:
            row = type_best.iloc[0]
            recommendations['best_by_type'][stype] = {
                'strategy_hash': row['strategy_hash'],
                'sharpe_ratio': float(row['sharpe_ratio']),
                'total_return': float(row['total_return'])
            }
    
    # Ensemble
    if 'ensemble_df' in locals():
        for idx, row in ensemble_df.iterrows():
            recommendations['ensemble'].append({
                'strategy_hash': row['strategy_hash'],
                'strategy_type': row['strategy_type'],
                'sharpe_ratio': float(row['sharpe_ratio']),
                'weight': 1.0 / len(ensemble_df)  # Equal weight for now
            })
    
    # Save files
    with open(run_dir / 'recommendations.json', 'w') as f:
        json.dump(recommendations, f, indent=2)
    
    performance_df.to_csv(run_dir / 'performance_analysis.csv', index=False)
    
    print("\n✅ Results exported:")
    print(f"  - recommendations.json")
    print(f"  - performance_analysis.csv")
else:
    print("⚠️ No results to export")

## Additional Analysis with Snippets

You can now extend this analysis using pre-built snippets. Examples:

### Exploratory Analysis
```python
%load src/analytics/snippets/exploratory/signal_frequency.py
# Then edit parameters and run

%load src/analytics/snippets/exploratory/parameter_sweep.py
# Analyze specific strategy type parameters
```

### Ensemble Building
```python
%load src/analytics/snippets/ensembles/find_uncorrelated.py
# Advanced correlation analysis

%load src/analytics/snippets/ensembles/optimize_weights.py
# Optimize portfolio weights
```

### Regime Analysis
```python
%load src/analytics/snippets/regime/volatility_regimes.py
# Performance in different volatility environments
```

### Helper Functions
```python
%load src/analytics/snippets/helpers.py
# Load utility functions for custom analysis
```

Each snippet contains editable parameters at the top. Modify them before running to customize the analysis.

## Summary

Analysis complete! Key files generated:
- `recommendations.json` - Best strategies and ensemble recommendations
- `performance_analysis.csv` - Full performance data for all strategies

Next steps:
1. Use the recommended ensemble for live trading
2. Deep dive into specific strategy types if needed
3. Run regime-specific analysis to understand performance drivers