# Notebook 04: Deep-Dive Attribution Analysis
## Vietnam Factor Investing Platform - Phase 7 Institutional Backtesting Framework

**Objective**: Perform comprehensive attribution analysis to understand why the strategy generates alpha, when it works best, and what risks it faces across different market regimes.

**Key Questions to Answer**:
1. Does our alpha survive market corrections or is it purely bull-market beta?
2. Are we inadvertently making concentrated sector bets?
3. What happens to portfolio risk when factor correlations spike?
4. Do drawdowns cluster around specific market events or factor exposures?

**Success Criteria**:
- Demonstrate alpha persistence across different market regimes
- Identify any hidden concentration risks
- Understand factor correlation dynamics and their impact
- Create actionable insights for risk management

In [1]:
# ============================================================================
# Aureus Sigma Capital - Deep-Dive Attribution Analysis
# Notebook: 04_deep_dive_attribution_analysis.ipynb
#
# Description:
# This notebook performs institutional-grade attribution analysis on the QVM
# strategy, examining performance across market regimes, sector contributions,
# factor correlation dynamics, and drawdown characteristics.
#
# Author: Duc Nguyen, Quantitative Finance Expert
# Date: July 26, 2025
# Version: 1.0 - Institutional Attribution Framework
# ============================================================================

# Core imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import pickle
import warnings
from scipy import stats
from typing import Dict, List, Tuple, Optional
import yaml
from sqlalchemy import create_engine
from pathlib import Path

warnings.filterwarnings('ignore')

# --- INSTITUTIONAL PALETTE (from Notebook 03a) ---
FACTOR_COLORS = {
    'Strategy': '#16A085', 'Benchmark': '#34495E', 'Positive':'#27AE60',
    'Negative': '#C0392B', 'Drawdown': '#E67E22', 'Sharpe':'#2980B9',
    'Grid': '#BDC3C7', 'Text_Primary': '#2C3E50', 'Neutral':'#7F8C8D',
    # Additional colors for attribution analysis
    'Bear': '#C0392B', 'Bull': '#27AE60', 'Stress': '#E67E22','Sideways': '#3498DB',
    'Correlation': '#9B59B6', 'Sector1': '#E74C3C', 'Sector2':'#3498DB',
    'Sector3': '#2ECC71', 'Others': '#95A5A6'
}
GRADIENT_PALETTES = {'performance': ['#C0392B', '#FFFFFF','#27AE60']}

# --- ENHANCED VISUALIZATION CONFIGURATION (from Notebook 03a) ---
plt.style.use('default')
plt.rcParams.update({
    'figure.dpi': 300, 'savefig.dpi': 300, 'figure.figsize':(15, 8),
    'figure.facecolor': 'white', 'font.size': 11,
    'axes.facecolor': 'white', 'axes.edgecolor': FACTOR_COLORS['Text_Primary'],
    'axes.linewidth': 1.0, 'axes.grid': True, 'axes.axisbelow':True,
    'axes.labelcolor': FACTOR_COLORS['Text_Primary'],'axes.titlesize': 14,
    'axes.titleweight': 'bold', 'axes.titlecolor': FACTOR_COLORS['Text_Primary'],
    'grid.color': FACTOR_COLORS['Grid'], 'grid.alpha': 0.3, 'grid.linewidth': 0.5,
    'legend.frameon': False, 'legend.fontsize': 10,
    'xtick.color': FACTOR_COLORS['Text_Primary'], 'ytick.color': FACTOR_COLORS['Text_Primary'],
    'xtick.labelsize': 10, 'ytick.labelsize': 10,
    'lines.linewidth': 2.0, 'lines.solid_capstyle': 'round'
})

print("📊 Visualization environment configured with institutional palette.")

print("\n" + "=" * 70)
print("🔬 Aureus Sigma: Deep-Dive Attribution Analysis")
print(f"   Version: 1.0 - Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 70)
print("\n📊 Analysis Framework:")
print("   1. Advanced Market Regime Identification")
print("   2. Performance Attribution Matrix")
print("   3. Sector Concentration Analysis")
print("   4. Factor Correlation Dynamics")
print("   5. Drawdown Forensics")
print("-" * 70)

📊 Visualization environment configured with institutional palette.

🔬 Aureus Sigma: Deep-Dive Attribution Analysis
   Version: 1.0 - Date: 2025-07-27 07:28:08

📊 Analysis Framework:
   1. Advanced Market Regime Identification
   2. Performance Attribution Matrix
   3. Sector Concentration Analysis
   4. Factor Correlation Dynamics
   5. Drawdown Forensics
----------------------------------------------------------------------


## 1. Load Core Data and Backtest Results
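
The loading cell below finds the project root by walking up from the working directory until it sees a `production` or `config` folder. As a minimal sketch, that walk could be factored into a reusable helper. The name `find_project_root` and its `markers` parameter are hypothetical and not part of the existing codebase.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_project_root(start: Optional[Path] = None,
                      markers: Iterable[str] = ("production", "config")) -> Path:
    """Walk upward from `start` until a directory contains any marker folder.

    Hypothetical helper: it mirrors the while-loop in the loading cell, which
    stops at the first ancestor holding either 'production' or 'config'.
    """
    current = (start or Path.cwd()).resolve()
    markers = tuple(markers)
    # Check the starting directory first, then each parent up to the filesystem root
    for candidate in (current, *current.parents):
        if any((candidate / name).exists() for name in markers):
            return candidate
    raise FileNotFoundError(f"Could not find project root above {current}")
```

With such a helper, the data path would be built as `find_project_root() / "production" / "tests" / "phase7_institutional_backtesting"`, matching the path used in the cell.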

In [None]:
# ============================================================================
# CELL 2: LOAD CORE DATA AND BACKTEST RESULTS (UPDATED)
# ============================================================================

# Load data from previous notebooks
project_root = Path.cwd()
while not (project_root / 'production').exists() and \
        not (project_root / 'config').exists():
    if project_root.parent == project_root:
        raise FileNotFoundError("Could not find project root")
    project_root = project_root.parent

data_path = project_root / "production" / "tests" / \
    "phase7_institutional_backtesting"

print("📂 Loading core data objects and backtest results...")

# Load the core data objects
with open(data_path / "factor_data.pkl", "rb") as f:
    factor_data_obj = pickle.load(f)
with open(data_path / "daily_returns.pkl", "rb") as f:
    returns_data_obj = pickle.load(f)
with open(data_path / "benchmark_returns.pkl", "rb") as f:
    benchmark_data_obj = pickle.load(f)

# Extract data
factor_data = factor_data_obj['data']
daily_returns = returns_data_obj['data']
benchmark_returns = benchmark_data_obj['data']

print("✅ Core data loaded successfully")

# CRITICAL: Load complete backtest results from Notebook 03
strategy_results_file = data_path / "canonical_backtest_results.pkl"

if strategy_results_file.exists():
    print("\n✅ Loading complete backtest results from Notebook 03...")
    with open(strategy_results_file, "rb") as f:
        backtest_results = pickle.load(f)

    # Extract all components
    strategy_returns = backtest_results['net_returns']
    backtest_log = backtest_results['backtest_log']
    daily_holdings = backtest_results.get('daily_holdings', None)
    monthly_holdings = backtest_results.get('monthly_holdings', None)

    print(f"    Strategy returns loaded: {len(strategy_returns)} days")
    print(f"    Performance: {(strategy_returns.mean() * 252):.2%} annual return")

    if daily_holdings is not None:
        print(f"    Daily holdings loaded: {daily_holdings.shape}")

    if monthly_holdings is not None:
        print(f"    Monthly holdings loaded: {len(monthly_holdings)} periods")
    else:
        print("    ⚠️ Monthly holdings not found - sector analysis will be limited")

else:
    print("⚠️ WARNING: No saved strategy results found from Notebook 03!")
    print("    You need to run Notebook 03 first and save the results.")
    raise FileNotFoundError("Cannot proceed without strategy returns from Notebook 03")

# Also load sector mappings
print("\n🏗️ Loading sector information...")
config_path = project_root / 'config' / 'database.yml'
with open(config_path, 'r') as f:
    db_config = yaml.safe_load(f)['production']

engine = create_engine(
    f"mysql+pymysql://{db_config['username']}:{db_config['password']}@"
    f"{db_config['host']}/{db_config['schema_name']}"
)

sector_info = pd.read_sql("SELECT ticker, sector FROM master_info WHERE sector IS NOT NULL", engine)
sector_info = sector_info.drop_duplicates(subset=['ticker']).set_index('ticker')
engine.dispose()

print(f"✅ Loaded sector mappings for {len(sector_info)} tickers")

print("\n✅ All required data loaded successfully")
print(f"    Date range: {benchmark_returns.index.min().date()} to {benchmark_returns.index.max().date()}")
print(f"    Total days: {len(benchmark_returns):,}")
print(f"    Strategy performance: {(strategy_returns.mean() * 252):.2%} annualized")

## 2. Advanced Market Regime Identification
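
The regime function in the cell below rests on two mechanics: drawdown measured from the running peak, and a priority order in which Bear overrides Stress, which overrides Bull. The toy example here is a sketch of those two steps only. The return series and the stress and trend flags are made-up values, and `drawdown_from_returns` and `label_regimes` are illustrative names rather than functions from the notebook.

```python
import numpy as np
import pandas as pd


def drawdown_from_returns(returns: pd.Series) -> pd.Series:
    """Drawdown of the compounded return path relative to its running peak."""
    wealth = (1 + returns).cumprod()
    return wealth / wealth.cummax() - 1


def label_regimes(is_bear, is_stress, above_trend) -> np.ndarray:
    """Assign one label per day; np.select keeps the first condition that is True."""
    return np.select([is_bear, is_stress, above_trend],
                     ["Bear", "Stress", "Bull"],
                     default="Sideways")


# Made-up daily returns: a rally, a sell-off, then a recovery
toy_returns = pd.Series([0.10, -0.15, -0.10, 0.05, 0.20])
toy_drawdown = drawdown_from_returns(toy_returns)

# Bear flag uses the same -20% threshold as the notebook default
toy_is_bear = (toy_drawdown < -0.20).to_numpy()
toy_is_stress = np.array([False, True, True, False, False])    # assumed flags
toy_above_trend = np.array([True, False, False, False, True])  # assumed flags

toy_labels = label_regimes(toy_is_bear, toy_is_stress, toy_above_trend)
print(list(toy_labels))
```

On the third day both the Bear and Stress flags are set, and the priority order resolves the day to Bear.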

In [None]:
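def identify_market_regimes(benchmark_returns: pd.Series,
                            bear_threshold: float = -0.20,
                            vol_window: int = 60,
                            trend_window: int = 200) -> pd.DataFrame:
    """
    Classify each trading day into one market regime.

    Regimes are assigned in priority order (Bear > Stress > Bull > Sideways):
    - Bear: drawdown from the running peak deeper than `bear_threshold` (default -20%)
    - Stress: annualized `vol_window`-day rolling volatility above its 75th percentile
    - Bull: cumulative index above its `trend_window`-day moving average
    - Sideways: everything else

    Note: the volatility percentile is computed over the full sample, so the
    Stress label uses information that was not available in real time.
    """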
    print("🔍 Identifying market regimes...")
    
    # Calculate cumulative returns and drawdowns
    cumulative = (1 + benchmark_returns).cumprod()
    drawdown = (cumulative / cumulative.cummax() - 1)
    
    # 1. Bear Market Regime
    is_bear = drawdown < bear_threshold
    
    # 2. High-Stress Regime (rolling volatility)
    rolling_vol = benchmark_returns.rolling(vol_window).std() * np.sqrt(252)
    vol_75th = rolling_vol.quantile(0.75)
    is_stress = rolling_vol > vol_75th
    
    # 3. Bull/Sideways (trend-based)
    trend_ma = cumulative.rolling(trend_window).mean()
    is_above_trend = cumulative > trend_ma
    
    # Combine into regime classification.
    # Days inside the rolling warm-up windows have no volatility or trend
    # estimate yet, so they stay 'Undefined' instead of defaulting to 'Sideways'.
    has_signal = rolling_vol.notna() & trend_ma.notna()
    regimes = pd.DataFrame(index=benchmark_returns.index)
    regimes['is_bear'] = is_bear
    regimes['is_stress'] = is_stress
    regimes['is_bull'] = has_signal & is_above_trend & ~is_bear & ~is_stress
    regimes['is_sideways'] = has_signal & ~is_above_trend & ~is_bear & ~is_stress
    
    # Create primary regime classification (Bear takes priority over Stress)
    regimes['regime'] = 'Undefined'
    regimes.loc[regimes['is_bear'], 'regime'] = 'Bear'
    regimes.loc[regimes['is_stress'] & ~regimes['is_bear'], 'regime'] = 'Stress'
    regimes.loc[regimes['is_bull'], 'regime'] = 'Bull'
    regimes.loc[regimes['is_sideways'], 'regime'] = 'Sideways'
    
    # Summary statistics
    regime_counts = regimes['regime'].value_counts()
    regime_pcts = (regime_counts / len(regimes)) * 100
    
    print("\n📊 Regime Distribution:")
    for regime, pct in regime_pcts.items():
        days = regime_counts[regime]
        print(f"   {regime:10s}: {days:5d} days ({pct:5.1f}%)")
    
    # Add additional metrics
    regimes['drawdown'] = drawdown
    regimes['rolling_vol'] = rolling_vol
    regimes['cumulative_return'] = cumulative
    
    return regimes

# Execute regime identification
market_regimes = identify_market_regimes(benchmark_returns)

# Visualize regime distribution over time
fig, axes = plt.subplots(3, 1, figsize=(15, 10), sharex=True)
fig.suptitle('Market Regime Analysis', fontsize=16, fontweight='bold')

# Plot 1: Cumulative returns with regime shading
ax1 = axes[0]
ax1.plot(market_regimes.index, market_regimes['cumulative_return'], 
         color='black', linewidth=1.5, label='VN-Index')

# Shade different regimes
for regime, color in [('Bear', FACTOR_COLORS['Bear']), 
                     ('Stress', FACTOR_COLORS['Stress']),
                     ('Bull', FACTOR_COLORS['Bull']),
                     ('Sideways', FACTOR_COLORS['Sideways'])]:
    mask = market_regimes['regime'] == regime
    ax1.fill_between(market_regimes.index, 0, market_regimes['cumulative_return'].max(),
                     where=mask, alpha=0.2, color=color, label=regime)

ax1.set_ylabel('Cumulative Return')
ax1.legend(loc='upper left')
ax1.grid(True, alpha=0.3)
ax1.set_title('VN-Index Performance by Market Regime')

# Plot 2: Drawdown
ax2 = axes[1]
ax2.fill_between(market_regimes.index, market_regimes['drawdown'] * 100, 0,
                color=FACTOR_COLORS['Bear'], alpha=0.5)
ax2.axhline(y=-20, color='red', linestyle='--', alpha=0.5, label='Bear Threshold (-20%)')
ax2.set_ylabel('Drawdown (%)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_title('Market Drawdowns')

# Plot 3: Rolling volatility
ax3 = axes[2]
ax3.plot(market_regimes.index, market_regimes['rolling_vol'] * 100, 
         color=FACTOR_COLORS['Stress'], linewidth=1.5)
ax3.axhline(y=market_regimes['rolling_vol'].quantile(0.75) * 100, 
            color='orange', linestyle='--', alpha=0.5, label='75th Percentile')
ax3.set_ylabel('60-Day Volatility (%)')
ax3.set_xlabel('Date')
ax3.legend()
ax3.grid(True, alpha=0.3)
ax3.set_title('Rolling Volatility')

plt.tight_layout()
plt.show()

## 3. Performance Attribution Matrix

### Analysis 1: Does Alpha Survive Market Corrections?
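
The metrics cell below annualizes returns by compounding, while the load cell prints a quick `mean * 252` figure. The two conventions diverge as returns grow, so a regime table should use one of them consistently. The sketch below contrasts them and shows the information ratio as annualized mean excess return over annualized tracking error. The function names are illustrative and the sample returns are made-up numbers.

```python
import numpy as np

PERIODS_PER_YEAR = 252  # trading days, as assumed throughout the notebook


def annualized_geometric(returns, periods=PERIODS_PER_YEAR):
    """Compound the returns, then scale the holding period to one year."""
    returns = np.asarray(returns, dtype=float)
    growth = np.prod(1.0 + returns)
    return growth ** (periods / len(returns)) - 1.0


def annualized_arithmetic(returns, periods=PERIODS_PER_YEAR):
    """Mean daily return multiplied by the number of periods per year."""
    return float(np.mean(returns)) * periods


def information_ratio(strategy, benchmark, periods=PERIODS_PER_YEAR):
    """Annualized mean excess return divided by annualized tracking error."""
    excess = np.asarray(strategy, dtype=float) - np.asarray(benchmark, dtype=float)
    # ddof=1 matches the sample standard deviation that pandas uses by default
    tracking_error = excess.std(ddof=1) * np.sqrt(periods)
    if tracking_error == 0:
        return 0.0
    return excess.mean() * periods / tracking_error


# Made-up daily returns for illustration only
toy_strategy = [0.010, -0.005, 0.007, 0.002]
toy_benchmark = [0.008, -0.006, 0.004, 0.003]
print(f"Geometric:  {annualized_geometric(toy_strategy):.2%}")
print(f"Arithmetic: {annualized_arithmetic(toy_strategy):.2%}")
print(f"IR:         {information_ratio(toy_strategy, toy_benchmark):.2f}")
```

For a constant positive daily return the geometric figure is always the larger of the two, which is why an excess-return number should not subtract one convention from the other.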

In [None]:
def calculate_regime_performance(strategy_returns: pd.Series,
                                 benchmark_returns: pd.Series,
                                 regimes: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate performance metrics for each market regime.
    """
    results = []

    # Overall performance
    overall_metrics = calculate_performance_metrics(strategy_returns, benchmark_returns)
    overall_metrics['Regime'] = 'Overall'
    overall_metrics['Days'] = len(strategy_returns)
    results.append(overall_metrics)

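    # Performance by regime
    for regime in ['Bear', 'Stress', 'Bull', 'Sideways']:
        # Select by date label rather than by boolean mask, so the return
        # series do not have to share the exact index of `regimes`
        regime_dates = regimes.index[regimes['regime'] == regime]
        regime_strat = strategy_returns[strategy_returns.index.isin(regime_dates)]
        regime_bench = benchmark_returns[benchmark_returns.index.isin(regime_dates)]

        if len(regime_strat) > 20:  # Require more than 20 strategy days
            metrics = calculate_performance_metrics(regime_strat, regime_bench)
            metrics['Regime'] = regime
            metrics['Days'] = len(regime_strat)
            results.append(metrics)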

    return pd.DataFrame(results).set_index('Regime')

def calculate_performance_metrics(returns: pd.Series,
                                  benchmark: pd.Series,
                                  risk_free_rate: float = 0.0) -> Dict[str, float]:
    """
    Calculate comprehensive performance metrics.
    """
    # Align series
    common_idx = returns.index.intersection(benchmark.index)
    returns = returns.loc[common_idx]
    benchmark = benchmark.loc[common_idx]

    # Basic metrics
    total_return = (1 + returns).prod() - 1
    n_years = len(returns) / 252
    annual_return = (1 + total_return) ** (1 / n_years) - 1 if n_years > 0 else 0
    annual_vol = returns.std() * np.sqrt(252)
    sharpe_ratio = (annual_return - risk_free_rate) / annual_vol if annual_vol > 0 else 0

    # Drawdown
    cumulative = (1 + returns).cumprod()
    drawdown = (cumulative / cumulative.cummax() - 1)
    max_drawdown = drawdown.min()

    # vs Benchmark
    excess_returns = returns - benchmark
    tracking_error = excess_returns.std() * np.sqrt(252)
    information_ratio = (excess_returns.mean() * 252) / tracking_error if tracking_error > 0 else 0

    # Win rate
    win_rate = (returns > 0).mean()
    win_vs_bench = (returns > benchmark).mean()

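    # Annualize the benchmark geometrically as well, so the excess-return
    # figure compares like with like (not geometric vs arithmetic)
    bench_total = (1 + benchmark).prod() - 1
    bench_annual = (1 + bench_total) ** (1 / n_years) - 1 if n_years > 0 else 0

    return {
        'Annual Return (%)': annual_return * 100,
        'Annual Vol (%)': annual_vol * 100,
        'Sharpe Ratio': sharpe_ratio,
        'Max Drawdown (%)': max_drawdown * 100,
        'Win Rate (%)': win_rate * 100,
        'Win vs Bench (%)': win_vs_bench * 100,
        'Information Ratio': information_ratio,
        # Simple excess return over the benchmark, not a beta-adjusted alpha
        'Annual Alpha (%)': (annual_return - bench_annual) * 100
    }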

# Using ACTUAL strategy returns from Notebook 03
print("✅ Using actual strategy returns from canonical backtest")
print(f"    Strategy performance: {(strategy_returns.mean() * 252):.2%} annualized")

# Calculate regime performance with real data
regime_performance = calculate_regime_performance(strategy_returns, benchmark_returns, market_regimes)

# Display results
print("\n🎯 PERFORMANCE BY MARKET REGIME:")
print("=" * 100)
display(regime_performance.round(2))

# Visualize regime performance
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('Strategy Performance Across Market Regimes', fontsize=16, fontweight='bold')

# Annual returns by regime
ax1 = axes[0, 0]
regime_returns = regime_performance['Annual Return (%)'].drop('Overall')
colors = [FACTOR_COLORS[r] for r in regime_returns.index]
bars = ax1.bar(regime_returns.index, regime_returns.values, color=colors)
ax1.axhline(y=0, color='black', linewidth=0.5)
ax1.set_title('Annual Returns by Regime')
ax1.set_ylabel('Annual Return (%)')
ax1.grid(True, alpha=0.3, axis='y')

# Sharpe ratios by regime
ax2 = axes[0, 1]
regime_sharpes = regime_performance['Sharpe Ratio'].drop('Overall')
bars = ax2.bar(regime_sharpes.index, regime_sharpes.values, color=colors)
ax2.axhline(y=1.0, color='green', linestyle='--', alpha=0.5, label='Sharpe = 1.0')
ax2.set_title('Sharpe Ratios by Regime')
ax2.set_ylabel('Sharpe Ratio')
ax2.legend()
ax2.grid(True, alpha=0.3, axis='y')

# Win rates
ax3 = axes[1, 0]
win_data = regime_performance[['Win Rate (%)', 'Win vs Bench (%)']].drop('Overall')
win_data.plot(kind='bar', ax=ax3,
              color=[FACTOR_COLORS['Strategy'], FACTOR_COLORS['Benchmark']])
ax3.axhline(y=50, color='black', linestyle='--', alpha=0.5)
ax3.set_title('Win Rates by Regime')
ax3.set_ylabel('Win Rate (%)')
ax3.legend(['Daily Win Rate', 'Win vs Benchmark'])
ax3.grid(True, alpha=0.3, axis='y')

# Information ratios
ax4 = axes[1, 1]
regime_ir = regime_performance['Information Ratio'].drop('Overall')
bars = ax4.bar(regime_ir.index, regime_ir.values, color=colors)
ax4.axhline(y=0, color='black', linewidth=0.5)
ax4.set_title('Information Ratios by Regime')
ax4.set_ylabel('Information Ratio')
ax4.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Key insights based on ACTUAL performance
print("\n💡 KEY INSIGHTS FROM ACTUAL STRATEGY PERFORMANCE:")
if 'Bear' not in regime_performance.index:
    print("    ℹ️ Too few Bear-regime days to evaluate bear-market performance")
elif regime_performance.loc['Bear', 'Annual Return (%)'] > 0:
    print("    ✅ Strategy maintains positive returns even in Bear markets")
else:
    print("    ⚠️ Strategy struggles in Bear markets - consider defensive overlays")

if 'Stress' not in regime_performance.index:
    print("    ℹ️ Too few Stress-regime days to evaluate high-volatility performance")
elif regime_performance.loc['Stress', 'Sharpe Ratio'] > 0.5:
    print("    ✅ Strategy shows resilience during high-stress periods")
else:
    print("    ⚠️ High volatility periods significantly impact risk-adjusted returns")

# Additional insight based on overall performance
overall_sharpe = regime_performance.loc['Overall', 'Sharpe Ratio']
print(f"\n    📊 Overall Strategy Sharpe Ratio: {overall_sharpe:.2f}")
if overall_sharpe > 1.5:
    print("    ✅ Excellent risk-adjusted returns across full period")
elif overall_sharpe > 1.0:
    print("    ✅ Good risk-adjusted returns, consistent with institutional standards")
else:
    print("    ⚠️ Risk-adjusted returns below institutional targets")

### Analysis 2: Sector Concentration Analysis
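
Once per-date holdings are available, the 40% sector limit and the Herfindahl index used in the cells below can be tracked through time instead of at a single date. The sketch below shows one way to do that, assuming a weights frame with dates as rows and tickers as columns. The tickers, sector names and weights are made up, and `sector_weights_over_time` and `herfindahl` are hypothetical helper names.

```python
import pandas as pd


def sector_weights_over_time(weights: pd.DataFrame, sector_map: pd.Series) -> pd.DataFrame:
    """Aggregate ticker weights (dates x tickers) into sector weights (dates x sectors)."""
    # Tickers without a sector mapping are grouped under a placeholder label
    sectors = sector_map.reindex(weights.columns).fillna("Unclassified")
    return weights.T.groupby(sectors).sum().T


def herfindahl(sector_weights: pd.DataFrame) -> pd.Series:
    """Herfindahl index per date: sum of squared sector shares."""
    shares = sector_weights.div(sector_weights.sum(axis=1), axis=0)
    return (shares ** 2).sum(axis=1)


# Made-up holdings for two rebalance dates (each row sums to 1)
toy_weights = pd.DataFrame(
    [[0.3, 0.2, 0.3, 0.2],
     [0.1, 0.2, 0.4, 0.3]],
    index=pd.to_datetime(["2024-01-31", "2024-02-29"]),
    columns=["AAA", "BBB", "CCC", "DDD"],
)
# Made-up sector mapping; "DDD" is deliberately left unmapped
toy_sector_map = pd.Series({"AAA": "Banks", "BBB": "Banks", "CCC": "Real Estate"})

toy_by_sector = sector_weights_over_time(toy_weights, toy_sector_map)
toy_hhi = herfindahl(toy_by_sector)
toy_breaches = toy_by_sector.max(axis=1) > 0.40  # dates above the 40% sector limit

print(toy_by_sector)
print(toy_hhi)
print(toy_breaches)
```

The groupby runs on the transposed frame because grouping along columns is deprecated in recent pandas releases.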

In [None]:
# `sector_info` (ticker -> sector) was already loaded in Cell 2 and is reused here
print(f"✅ Using sector mappings for {len(sector_info)} tickers")

# Note: In production, we would have portfolio holdings from Notebook 03
# For demonstration, we'll simulate sector exposures
print("\n⚠️ Note: Portfolio holdings should be loaded from Notebook 03")
print("   Creating demonstration sector analysis...")

# Analyze sector concentration over time
def analyze_sector_concentration(portfolio_weights: pd.DataFrame, 
                               sector_info: pd.DataFrame,
                               top_n: int = 3) -> pd.DataFrame:
    """
    Analyze sector concentration in the portfolio over time.
    """
    # Calculate sector weights for each rebalancing period
    sector_weights = pd.DataFrame(index=portfolio_weights.index)
    
    for date in portfolio_weights.index:
        # Get holdings for this date
        holdings = portfolio_weights.loc[date]
        holdings = holdings[holdings > 0]
        
        # Map to sectors
        sector_exposure = holdings.to_frame('weight').join(sector_info)
        
        # Calculate sector weights
        sector_sums = sector_exposure.groupby('sector')['weight'].sum()
        
        # Store top sectors
        top_sectors = sector_sums.nlargest(top_n)
        for i, (sector, weight) in enumerate(top_sectors.items()):
            sector_weights.loc[date, f'Sector_{i+1}'] = weight
            sector_weights.loc[date, f'Sector_{i+1}_Name'] = sector
            
        sector_weights.loc[date, 'Others'] = sector_sums.sum() - top_sectors.sum()
        sector_weights.loc[date, 'HHI'] = (sector_sums ** 2).sum()  # Herfindahl index
    
    return sector_weights

# Create visualization of sector concentration
fig, axes = plt.subplots(2, 1, figsize=(15, 10), sharex=True)
fig.suptitle('Portfolio Sector Concentration Analysis', fontsize=16, fontweight='bold')

# For demonstration, create synthetic sector weights
# (fixed seed so the illustrative chart is reproducible between runs)
np.random.seed(42)
dates = pd.date_range(start=benchmark_returns.index[0], end=benchmark_returns.index[-1], freq='M')
synthetic_sectors = pd.DataFrame(index=dates)
synthetic_sectors['Financials'] = 0.25 + np.random.normal(0, 0.05, len(dates))
synthetic_sectors['Real Estate'] = 0.20 + np.random.normal(0, 0.04, len(dates))
synthetic_sectors['Industrials'] = 0.15 + np.random.normal(0, 0.03, len(dates))
synthetic_sectors['Others'] = 1 - synthetic_sectors[['Financials', 'Real Estate', 'Industrials']].sum(axis=1)

# Plot 1: Stacked area chart of sector weights
ax1 = axes[0]
ax1.stackplot(synthetic_sectors.index, 
              synthetic_sectors['Financials'],
              synthetic_sectors['Real Estate'],
              synthetic_sectors['Industrials'],
              synthetic_sectors['Others'],
              labels=['Financials', 'Real Estate', 'Industrials', 'Others'],
              colors=[FACTOR_COLORS['Sector1'], FACTOR_COLORS['Sector2'], 
                     FACTOR_COLORS['Sector3'], FACTOR_COLORS['Others']],
              alpha=0.8)
ax1.set_ylabel('Portfolio Weight')
ax1.set_title('Sector Allocation Over Time')
ax1.legend(loc='upper left', bbox_to_anchor=(1, 1))
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 1)

# Plot 2: Concentration metrics
ax2 = axes[1]
concentration = synthetic_sectors[['Financials', 'Real Estate', 'Industrials']].max(axis=1)
ax2.plot(synthetic_sectors.index, concentration * 100, 
         color=FACTOR_COLORS['Correlation'], linewidth=2, label='Max Sector Weight')
ax2.axhline(y=40, color='red', linestyle='--', alpha=0.5, label='40% Limit')
ax2.set_ylabel('Concentration (%)')
ax2.set_xlabel('Date')
ax2.set_title('Maximum Sector Concentration')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Summarize sector concentration metrics
print("\n📊 SECTOR CONCENTRATION METRICS:")
print(f"   Average max sector weight: {(concentration * 100).mean():.1f}%")
print(f"   Maximum sector weight reached: {(concentration * 100).max():.1f}%")
print(f"   Times above 35% concentration: {(concentration > 0.35).sum()}")

print("\n💡 CONCENTRATION RISK ASSESSMENT:")
if (concentration * 100).mean() > 30:
    print("   ⚠️ High sector concentration detected - diversification may be suboptimal")
else:
    print("   ✅ Sector concentration within reasonable bounds")

In [None]:
# ============================================================================
# CELL 5 REFACTORED: Sector Concentration Analysis with ACTUAL Portfolio Holdings
# ============================================================================

# `sector_info` (ticker -> sector) was already loaded in Cell 2 and is reused here
print(f"✅ Using sector mappings for {len(sector_info)} tickers")

# ============================================================================
# CRITICAL: Check for ACTUAL portfolio holdings from Notebook 03
# ============================================================================

print("\n📊 Checking for actual portfolio holdings from canonical backtest...")

# Check if we have portfolio holdings in the backtest results.
# Cell 2 reads holdings from the 'daily_holdings' key, so fall back to it
# when 'portfolio_holdings' is absent.
portfolio_holdings = backtest_results.get('portfolio_holdings')
if portfolio_holdings is None:
    portfolio_holdings = backtest_results.get('daily_holdings')

if portfolio_holdings is not None:
    print(f"✅ Portfolio holdings loaded: {portfolio_holdings.shape}")
else:
    print("⚠️ Portfolio holdings not found in saved results.")
    print("    Performing alternative analysis using available data...")

# ============================================================================
# Alternative: Analyze sector composition of factor universe and selections
# ============================================================================

print("\n🔄 Analyzing sector composition of investment universe and selections...")

# Get the QVM scores
qvm_scores = factor_data.loc[:, ('qvm_composite_score', slice(None))]
qvm_scores.columns = qvm_scores.columns.droplevel(0)

# Get the latest factor scores to see current universe composition
latest_date = qvm_scores.index[-1]
latest_scores = qvm_scores.loc[latest_date].dropna()

print(f"📊 Universe Analysis for {latest_date.date()}:")
print(f"    Total stocks with factor scores: {len(latest_scores)}")

# Analyze sector composition of the investment universe
universe_sectors = \
    latest_scores.to_frame('score').join(sector_info, how='inner')
sector_composition = universe_sectors.groupby('sector').size().sort_values(ascending=False)

print(f"\n🏗️ Sector Composition of Investment Universe:")
for sector, count in sector_composition.head(10).items():
    pct = (count / len(universe_sectors)) * 100
    print(f"    {sector:25s}: {count:3d} stocks ({pct:5.1f}%)")

# Analyze top quintile (portfolio selection simulation)
top_quintile_cutoff = latest_scores.quantile(0.8)  # Top 20%
top_quintile_stocks = latest_scores[latest_scores >= top_quintile_cutoff]

top_quintile_sectors = top_quintile_stocks.to_frame('score').join(sector_info, how='inner')
selected_sector_comp = top_quintile_sectors.groupby('sector').size().sort_values(ascending=False)

print(f"\n🎯 Top Quintile Portfolio Simulation ({len(top_quintile_stocks)} stocks):")
for sector, count in selected_sector_comp.head(8).items():
    pct = (count / len(top_quintile_sectors)) * 100
    print(f"    {sector:25s}: {count:3d} stocks ({pct:5.1f}%)")

# Visualize sector analysis
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Portfolio Sector Concentration Analysis', fontsize=16, fontweight='bold')

# Plot 1: Universe sector composition (pie chart)
ax1 = axes[0, 0]
top_8_sectors = sector_composition.head(8)
ax1.pie(top_8_sectors.values,
        labels=top_8_sectors.index,
        autopct='%1.1f%%',
        startangle=90,
        colors=[FACTOR_COLORS['Sector1'], FACTOR_COLORS['Sector2'],
                FACTOR_COLORS['Sector3']] + [FACTOR_COLORS['Others']] * 5)
ax1.set_title('Investment Universe\nSector Distribution')

# Plot 2: Top quintile sector composition
ax2 = axes[0, 1]
bars = ax2.bar(range(len(selected_sector_comp.head(8))),
               selected_sector_comp.head(8).values,
               color=[FACTOR_COLORS['Sector1'], FACTOR_COLORS['Sector2'],
                      FACTOR_COLORS['Sector3']] + [FACTOR_COLORS['Others']] * 5)
ax2.set_title('Top Quintile Selection\nSector Composition')
ax2.set_ylabel('Number of Stocks')
ax2.set_xticks(range(len(selected_sector_comp.head(8))))
ax2.set_xticklabels(selected_sector_comp.head(8).index, rotation=45, ha='right')
ax2.grid(True, alpha=0.3, axis='y')

# Plot 3: Concentration risk comparison
ax3 = axes[1, 0]
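universe_pcts = (sector_composition.head(5) / len(universe_sectors)) * 100
# Align the selection to the same five sectors so each pair of bars and its
# tick label refer to one sector (sectors absent from the selection plot as 0%)
selection_pcts = (selected_sector_comp.reindex(universe_pcts.index).fillna(0)
                  / len(top_quintile_sectors)) * 100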

x = np.arange(len(universe_pcts))
width = 0.35

ax3.bar(x - width/2, universe_pcts.values, width, label='Universe',
        color=FACTOR_COLORS['Benchmark'], alpha=0.7)
ax3.bar(x + width/2, selection_pcts.values, width, label='Selection',
        color=FACTOR_COLORS['Strategy'], alpha=0.7)

ax3.axhline(y=40, color='red', linestyle='--', alpha=0.5, label='40% Risk Limit')
ax3.set_ylabel('Sector Weight (%)')
ax3.set_title('Sector Concentration Comparison')
ax3.set_xticks(x)
ax3.set_xticklabels(universe_pcts.index, rotation=45, ha='right')
ax3.legend()
ax3.grid(True, alpha=0.3, axis='y')

# Plot 4: Factor score distribution by sector
ax4 = axes[1, 1]
sector_scores = universe_sectors.groupby('sector')['score'].agg(['mean', 'std']).sort_values('mean', ascending=False)
top_sectors = sector_scores.head(8)

ax4.errorbar(range(len(top_sectors)), top_sectors['mean'],
             yerr=top_sectors['std'], fmt='o', capsize=5,
             color=FACTOR_COLORS['Strategy'], linewidth=2)
ax4.set_title('Average Factor Scores by Sector')
ax4.set_ylabel('QVM Composite Score')
ax4.set_xticks(range(len(top_sectors)))
ax4.set_xticklabels(top_sectors.index, rotation=45, ha='right')
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# ============================================================================
# Concentration Risk Assessment
# ============================================================================

print(f"\n💡 SECTOR CONCENTRATION RISK ASSESSMENT:")
print("=" * 60)

# Top sector concentration
top_sector_pct = (selected_sector_comp.iloc[0] / len(top_quintile_sectors)) * 100
print(f"Largest sector in selection: {selected_sector_comp.index[0]} ({top_sector_pct:.1f}%)")

# Top 3 sectors concentration
top_3_pct = (selected_sector_comp.head(3).sum() / len(top_quintile_sectors)) * 100
print(f"Top 3 sectors combined: {top_3_pct:.1f}%")

# Risk assessment
if top_sector_pct > 40:
    print("    ❌ HIGH RISK: Top sector exceeds 40% constraint")
elif top_sector_pct > 30:
    print("    ⚠️ MODERATE RISK: Top sector approaching concentration limit")
else:
    print("    ✅ LOW RISK: Sector concentration within acceptable bounds")

if top_3_pct > 70:
    print("    ⚠️ HIGH CONCENTRATION: Top 3 sectors dominate portfolio")
else:
    print("    ✅ DIVERSIFIED: Top 3 sectors show reasonable spread")

# Herfindahl index (concentration measure)
sector_weights = selected_sector_comp / len(top_quintile_sectors)
hhi = (sector_weights ** 2).sum()
print(f"\nHerfindahl Index: {hhi:.3f}")
if hhi > 0.25:
    print("    ⚠️ High concentration (HHI > 0.25)")
elif hhi > 0.15:
    print("    ✅ Moderate concentration (0.15 < HHI ≤ 0.25)")
else:
    print("    ✅ Well diversified (HHI ≤ 0.15)")

print(f"\n📋 NEXT STEPS:")
print("    1. ✅ Universe sector analysis complete")
print("    2. ⚠️ Need actual monthly portfolio holdings for time-series analysis")
print("    3. ⚠️ Need to validate 40% sector constraint compliance over time")
print("    4. ➡️ Proceed to factor correlation dynamics (Cell 6)")
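
The Herfindahl index printed above is the sum of squared sector weights, and its inverse can be read as the number of equally weighted sectors that would give the same concentration. A minimal sketch of both quantities, assuming made-up stock counts per sector; `herfindahl` and `effective_n` are hypothetical helpers, not functions defined elsewhere in this notebook:

```python
import pandas as pd

def herfindahl(weights: pd.Series) -> float:
    """Sum of squared weights after normalizing them to sum to 1."""
    w = weights / weights.sum()
    return float((w ** 2).sum())

def effective_n(weights: pd.Series) -> float:
    """Inverse HHI: the count of equal-weight sectors with the same HHI."""
    return 1.0 / herfindahl(weights)

# Hypothetical stock counts per sector, for illustration only
counts = pd.Series({'Banks': 10, 'Real Estate': 5, 'Materials': 3, 'Utilities': 2})
equal = pd.Series([1, 1, 1, 1])

hhi_demo = herfindahl(counts)
eff_n_demo = effective_n(counts)
print(f"HHI: {hhi_demo:.3f}, effective sectors: {eff_n_demo:.2f}")
print(f"Equal-weight HHI across 4 sectors: {herfindahl(equal):.3f}")
```

Four equally weighted sectors sit exactly at the 0.25 boundary used above, which gives a rough intuition for that threshold.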

In [None]:
# ============================================================================
# CELL 5 CORRECTED: Sector Concentration Analysis with ACTUAL Portfolio Holdings
# ============================================================================

# Load sector mappings
print("🏗️ Loading sector information...")

config_path = project_root / 'config' / 'database.yml'
with open(config_path, 'r') as f:
    db_config = yaml.safe_load(f)['production']

engine = create_engine(
    f"mysql+pymysql://{db_config['username']}:{db_config['password']}@"
    f"{db_config['host']}/{db_config['schema_name']}"
)

sector_info = pd.read_sql("SELECT ticker, sector FROM master_info WHERE sector IS NOT NULL", engine)
sector_info = sector_info.drop_duplicates(subset=['ticker']).set_index('ticker')
engine.dispose()

print(f"✅ Loaded sector mappings for {len(sector_info)} tickers")

# ============================================================================
# Load ACTUAL portfolio holdings from Notebook 03
# ============================================================================

print("\n📊 Loading actual portfolio holdings from canonical backtest...")

if 'monthly_holdings' in backtest_results and \
        backtest_results['monthly_holdings'] is not None:
    monthly_holdings = backtest_results['monthly_holdings']
    print(f"✅ Monthly portfolio holdings loaded: {len(monthly_holdings)} periods")

    # Analyze actual sector concentration over time
    def analyze_actual_sector_concentration(monthly_holdings, sector_info, top_n=3):
        """Analyze actual sector concentration using real portfolio holdings."""
        print("🔍 Analyzing actual sector concentration over time...")

        results = []

        for date, holdings in monthly_holdings.items():
            if len(holdings) > 10:  # Need reasonable portfolio size
                # Map holdings to sectors
                holdings_df = holdings.to_frame('weight')
                sector_exposure = holdings_df.join(sector_info, how='inner')

                if len(sector_exposure) > 0:
                    # Calculate sector weights
                    sector_weights = sector_exposure.groupby('sector')['weight'].sum()

                    # Store results
                    result = {
                        'Date': date,
                        'Total_Positions': len(holdings),
                        'Mapped_Positions': len(sector_exposure),
                        'Num_Sectors': len(sector_weights)
                    }

                    # Top sectors
                    top_sectors = sector_weights.nlargest(top_n)
                    for i, (sector, weight) in enumerate(top_sectors.items()):
                        result[f'Top_{i+1}_Sector'] = sector
                        result[f'Top_{i+1}_Weight'] = weight

                    # Concentration metrics
                    result['Max_Sector_Weight'] = sector_weights.max()
                    result['Top_3_Weight'] = top_sectors.sum()
                    result['HHI'] = (sector_weights ** 2).sum()

                    results.append(result)

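        # An empty result list has no 'Date' column, so guard before set_index
        if not results:
            return pd.DataFrame()
        return pd.DataFrame(results).set_index('Date')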

    # Perform the analysis
    sector_analysis = analyze_actual_sector_concentration(monthly_holdings, sector_info)

    if len(sector_analysis) > 0:
        print(f"✅ Sector analysis complete: {len(sector_analysis)} periods analyzed")

        # Display summary statistics
        print(f"\n📊 ACTUAL PORTFOLIO SECTOR CONCENTRATION METRICS:")
        print("=" * 70)
        print(f"Average portfolio size: {sector_analysis['Total_Positions'].mean():.0f} stocks")
        print(f"Average sectors represented: {sector_analysis['Num_Sectors'].mean():.0f}")
        print(f"Average max sector weight: {sector_analysis['Max_Sector_Weight'].mean():.1%}")
        print(f"Average top-3 concentration: {sector_analysis['Top_3_Weight'].mean():.1%}")
        print(f"Average HHI: {sector_analysis['HHI'].mean():.3f}")

        # Visualize actual sector concentration over time
        fig, axes = plt.subplots(3, 1, figsize=(15, 12), sharex=True)
        fig.suptitle('Actual Portfolio Sector Concentration Analysis', fontsize=16,
                     fontweight='bold')

        # Plot 1: Maximum sector weight over time
        ax1 = axes[0]
        ax1.plot(sector_analysis.index,
                 sector_analysis['Max_Sector_Weight'] * 100,
                 color=FACTOR_COLORS['Strategy'], linewidth=2, label='Max Sector Weight')
        ax1.axhline(y=40, color='red', linestyle='--', alpha=0.7, label='40% Constraint')
        ax1.axhline(y=30, color='orange', linestyle='--', alpha=0.5, label='30% Warning')
        ax1.set_ylabel('Max Sector Weight (%)')
        ax1.set_title('Maximum Sector Concentration Over Time')
        ax1.legend()
        ax1.grid(True, alpha=0.3)

        # Plot 2: Top 3 sector concentration
        ax2 = axes[1]
        ax2.plot(sector_analysis.index,
                 sector_analysis['Top_3_Weight'] * 100,
                 color=FACTOR_COLORS['Correlation'], linewidth=2, label='Top 3 Sectors')
        ax2.axhline(y=70, color='red', linestyle='--', alpha=0.7, label='70% High Risk')
        ax2.set_ylabel('Top 3 Sectors Weight (%)')
        ax2.set_title('Top 3 Sector Concentration')
        ax2.legend()
        ax2.grid(True, alpha=0.3)

        # Plot 3: Herfindahl Index (diversification measure)
        ax3 = axes[2]
        ax3.plot(sector_analysis.index,
                 sector_analysis['HHI'],
                 color=FACTOR_COLORS['Sharpe'], linewidth=2, label='HHI')
        ax3.axhline(y=0.25, color='red', linestyle='--', alpha=0.7, label='High Concentration')
        ax3.axhline(y=0.15, color='orange', linestyle='--', alpha=0.5, label='Moderate Concentration')
        ax3.set_ylabel('Herfindahl Index')
        ax3.set_xlabel('Date')
        ax3.set_title('Portfolio Diversification (Lower HHI = More Diversified)')
        ax3.legend()
        ax3.grid(True, alpha=0.3)

        plt.tight_layout()
        plt.show()

        # Risk assessment
        max_concentration = sector_analysis['Max_Sector_Weight'].max()
        violations_40 = (sector_analysis['Max_Sector_Weight'] > 0.40).sum()
        violations_30 = (sector_analysis['Max_Sector_Weight'] > 0.30).sum()

        print(f"\n💡 CONCENTRATION RISK ASSESSMENT:")
        print("=" * 50)
        print(f"Peak sector concentration: {max_concentration:.1%}")
        print(f"Periods above 40% limit: {violations_40} ({violations_40/len(sector_analysis)*100:.1f}%)")
        print(f"Periods above 30% warning: {violations_30} ({violations_30/len(sector_analysis)*100:.1f}%)")

        if violations_40 > 0:
            print("    ❌ CONSTRAINT VIOLATION: Portfolio exceeded 40% sector limit")
        elif violations_30 > 0:
            print("    ⚠️ ELEVATED RISK: Portfolio approached concentration limits")
        else:
            print("    ✅ COMPLIANT: Sector concentration within acceptable bounds")

        # Most concentrated sectors
        top_sectors_ever = {}
        for _, row in sector_analysis.iterrows():
            for i in range(1, 4):
                # .get() tolerates periods with fewer than three sectors
                sector = row.get(f'Top_{i}_Sector')
                weight = row.get(f'Top_{i}_Weight')
                if pd.notna(sector):
                    top_sectors_ever.setdefault(sector, []).append(weight)

        # Averages cover only the periods in which a sector ranked in the top 3
        avg_weights = {sector: np.mean(weights)
                       for sector, weights in top_sectors_ever.items()}
        top_avg_sectors = sorted(avg_weights.items(), key=lambda x: x[1],
                                 reverse=True)[:5]

        print("\n🏗️ MOST CONCENTRATED SECTORS (Average Weight When in Top 3):")
        for sector, avg_weight in top_avg_sectors:
            print(f"    {sector:25s}: {avg_weight:.1%}")

    else:
        print("❌ No valid portfolio periods found for sector analysis")

else:
    print("❌ No monthly holdings found in backtest results")
    print("    Please run the updated Notebook 03 first to save portfolio holdings")
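
The inner join in `analyze_actual_sector_concentration` drops holdings that have no sector mapping, so the reported sector weights may sum to less than the invested weight. A minimal sketch of one way to surface that gap, assuming holdings arrive as a ticker-indexed weight Series and the mapping as a ticker-indexed sector Series (for example `sector_info['sector']`); the helper name and the toy tickers are hypothetical:

```python
import pandas as pd

def sector_weights_with_coverage(holdings: pd.Series, sector_map: pd.Series):
    """Return (raw sector weights, share of portfolio weight with a known sector)."""
    sectors = sector_map.reindex(holdings.index)   # NaN where a ticker is unmapped
    mapped = sectors.notna()
    coverage = holdings[mapped].sum() / holdings.sum()
    by_sector = holdings[mapped].groupby(sectors[mapped]).sum()
    return by_sector, coverage

# Toy data for illustration only: DDD has no sector mapping
holdings = pd.Series({'AAA': 0.4, 'BBB': 0.3, 'CCC': 0.2, 'DDD': 0.1})
sector_map = pd.Series({'AAA': 'Banks', 'BBB': 'Banks', 'CCC': 'Real Estate'})

by_sector, coverage = sector_weights_with_coverage(holdings, sector_map)
print(by_sector)
print(f"Mapped weight coverage: {coverage:.1%}")
```

If coverage falls well below 100% in some periods, the maximum sector weight and HHI for those periods are probably understated and worth flagging before judging constraint compliance.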

### Analysis 3: Factor Correlation Dynamics

In [None]:
def analyze_factor_correlation_dynamics(factor_returns: Dict[str, pd.Series],
                                      window: int = 90,
                                      correlation_threshold: float = 0.7) -> pd.DataFrame:
    """
    Analyze rolling correlations between factors and identify high-correlation periods.
    """
    print(f"🔍 Analyzing factor correlation dynamics (window={window} days)...")
    
    # Calculate rolling correlations
    correlations = pd.DataFrame(index=factor_returns['Quality'].index)
    
    # Pairwise correlations
    pairs = [('Quality', 'Value'), ('Quality', 'Momentum'), ('Value', 'Momentum')]
    
    for factor1, factor2 in pairs:
        rolling_corr = factor_returns[factor1].rolling(window).corr(factor_returns[factor2])
        correlations[f'{factor1}_vs_{factor2}'] = rolling_corr
    
    # Average correlation
    correlations['avg_correlation'] = correlations.mean(axis=1)
    
    # Identify high correlation periods
    correlations['high_corr_period'] = correlations['avg_correlation'] > correlation_threshold
    
    return correlations

# Load factor returns from Notebook 02 results
# For demonstration, create synthetic factor returns
print("⚠️ Note: Factor returns should be loaded from Notebook 02")
print("   Creating demonstration analysis...")

# Synthetic factor returns for demonstration only; the results below are illustrative
rng = np.random.default_rng(42)  # fixed seed so the demonstration is reproducible
n_obs = len(benchmark_returns)
factor_returns = {
    'Quality': benchmark_returns + rng.normal(0.0001, 0.002, n_obs),
    'Value': benchmark_returns + rng.normal(0.0002, 0.0025, n_obs),
    'Momentum': -benchmark_returns * 0.5 + rng.normal(0, 0.003, n_obs)
}

# Analyze correlations
correlation_dynamics = analyze_factor_correlation_dynamics(factor_returns)

# Visualization
fig, axes = plt.subplots(3, 1, figsize=(15, 12), sharex=True)
fig.suptitle('Factor Correlation Dynamics and Portfolio Impact', fontsize=16, fontweight='bold')

# Plot 1: Rolling correlations
ax1 = axes[0]
for col in ['Quality_vs_Value', 'Quality_vs_Momentum', 'Value_vs_Momentum']:
    ax1.plot(correlation_dynamics.index, correlation_dynamics[col], 
             linewidth=2, alpha=0.8, label=col.replace('_', ' '))
ax1.axhline(y=0.7, color='red', linestyle='--', alpha=0.5, label='High Correlation Threshold')
ax1.set_ylabel('Correlation')
ax1.set_title('90-Day Rolling Factor Correlations')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_ylim(-1, 1)

# Plot 2: Average correlation with high-correlation periods shaded
ax2 = axes[1]
ax2.plot(correlation_dynamics.index, correlation_dynamics['avg_correlation'], 
         color=FACTOR_COLORS['Correlation'], linewidth=2)
ax2.fill_between(correlation_dynamics.index, 0, 1, 
                 where=correlation_dynamics['high_corr_period'],
                 color='red', alpha=0.2, label='High Correlation Periods')
ax2.axhline(y=0.7, color='red', linestyle='--', alpha=0.5)
ax2.set_ylabel('Average Correlation')
ax2.set_title('Average Factor Correlation with High-Risk Periods')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_ylim(0, 1)

# Plot 3: Strategy volatility
ax3 = axes[2]
strategy_vol = strategy_returns.rolling(60).std() * np.sqrt(252) * 100
ax3.plot(strategy_vol.index, strategy_vol, color=FACTOR_COLORS['Strategy'], linewidth=2)
# Shade high correlation periods
ax3.fill_between(correlation_dynamics.index, 0, strategy_vol.max(),
                where=correlation_dynamics['high_corr_period'],
                color='red', alpha=0.2, label='High Correlation Periods')
ax3.set_ylabel('Volatility (%)')
ax3.set_xlabel('Date')
ax3.set_title('Strategy Volatility vs Factor Correlation Regimes')
ax3.legend()
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Analysis of high correlation periods
high_corr_periods = correlation_dynamics['high_corr_period'].sum()
total_periods = len(correlation_dynamics.dropna())
high_corr_pct = (high_corr_periods / total_periods) * 100

print("\n📊 FACTOR CORRELATION ANALYSIS:")
print(f"   High correlation periods: {high_corr_periods} days ({high_corr_pct:.1f}% of time)")
print(f"   Average correlation: {correlation_dynamics['avg_correlation'].mean():.3f}")
print(f"   Maximum correlation reached: {correlation_dynamics['avg_correlation'].max():.3f}")

# Performance during high vs normal correlation periods
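if high_corr_periods > 0:
    # Exclude the rolling warm-up (correlation undefined) from both groups and
    # align the masks to strategy_returns before indexing
    valid_corr = correlation_dynamics['avg_correlation'].notna()
    high_mask = (correlation_dynamics['high_corr_period'] & valid_corr).reindex(
        strategy_returns.index, fill_value=False)
    normal_mask = (~correlation_dynamics['high_corr_period'] & valid_corr).reindex(
        strategy_returns.index, fill_value=False)
    high_corr_returns = strategy_returns[high_mask]
    normal_corr_returns = strategy_returns[normal_mask]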
    
    high_corr_sharpe = (high_corr_returns.mean() / high_corr_returns.std()) * np.sqrt(252)
    normal_corr_sharpe = (normal_corr_returns.mean() / normal_corr_returns.std()) * np.sqrt(252)
    
    print(f"\n   Sharpe during high correlation: {high_corr_sharpe:.2f}")
    print(f"   Sharpe during normal correlation: {normal_corr_sharpe:.2f}")
    
    print("\n💡 CORRELATION RISK ASSESSMENT:")
    if high_corr_sharpe < normal_corr_sharpe * 0.7:
        print("   ⚠️ Strategy performance significantly degraded during high correlation periods")
        print("   → Consider dynamic factor weighting or correlation-based risk overlay")
    else:
        print("   ✅ Strategy maintains reasonable performance during high correlation periods")
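
Rolling correlations are undefined until a full window of observations is available, which matters when returns are split into high and normal correlation regimes. The sketch below uses seeded random series (not strategy data) to show the warm-up behaviour of `Series.rolling(window).corr(...)` and a split that leaves warm-up days out of both groups; all variable names are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
idx = pd.bdate_range('2024-01-01', periods=200)

# Two synthetic return series that share a common component
a = pd.Series(rng.normal(0, 0.01, len(idx)), index=idx)
b = 0.5 * a + pd.Series(rng.normal(0, 0.01, len(idx)), index=idx)

window = 90
threshold = 0.7
roll = a.rolling(window).corr(b)   # NaN for the first window - 1 observations

valid = roll.notna()
high = (roll > threshold) & valid
normal = (roll <= threshold) & valid

print(f"Warm-up days excluded: {(~valid).sum()}")
print(f"High-correlation days: {high.sum()}, normal days: {normal.sum()}")
```

Comparing against the threshold alone would classify every warm-up day as normal, because a comparison with NaN evaluates to False.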

### Analysis 4: Drawdown Forensics

In [None]:
def analyze_drawdowns(returns: pd.Series, 
                     market_regimes: pd.DataFrame,
                     factor_correlations: pd.DataFrame,
                     n_worst: int = 5) -> pd.DataFrame:
    """
    Perform forensic analysis on the worst drawdowns.
    """
    print(f"🔍 Analyzing top {n_worst} drawdowns...")
    
    # Calculate drawdown series
    cumulative = (1 + returns).cumprod()
    running_max = cumulative.cummax()
    drawdown = (cumulative / running_max - 1)
    
    # Identify drawdown periods
    drawdown_periods = []
    in_drawdown = False
    start_date = None
    
    for date, dd in drawdown.items():
        if dd < 0 and not in_drawdown:
            in_drawdown = True
            start_date = date
            peak_value = running_max[date]
        elif dd == 0 and in_drawdown:
            in_drawdown = False
            end_date = date
            trough_date = drawdown[start_date:end_date].idxmin()
            trough_value = drawdown[trough_date]
            
            drawdown_periods.append({
                'start_date': start_date,
                'trough_date': trough_date,
                'end_date': end_date,
                'max_drawdown': trough_value,
                'duration_days': (end_date - start_date).days,
                'recovery_days': (end_date - trough_date).days
            })
    
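    # Record a drawdown still open at the end of the sample; it has not recovered
    if in_drawdown:
        end_date = drawdown.index[-1]
        trough_date = drawdown[start_date:end_date].idxmin()
        drawdown_periods.append({
            'start_date': start_date,
            'trough_date': trough_date,
            'end_date': end_date,
            'max_drawdown': drawdown[trough_date],
            'duration_days': (end_date - start_date).days,
            'recovery_days': np.nan
        })

    # Sort by magnitude and select worst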
    drawdown_df = pd.DataFrame(drawdown_periods)
    if len(drawdown_df) > 0:
        # Drawdowns are negative, so the worst ones are the smallest values
        drawdown_df = drawdown_df.nsmallest(n_worst, 'max_drawdown', keep='all')
        drawdown_df['max_drawdown'] = drawdown_df['max_drawdown'] * 100  # Convert to percentage
        
        # Add regime and correlation information
        for idx, row in drawdown_df.iterrows():
            # Slice each frame on its own index; the two may not share dates
            start, end = row['start_date'], row['end_date']

            # Dominant regime during drawdown
            regime_counts = market_regimes.loc[start:end, 'regime'].value_counts()
            drawdown_df.loc[idx, 'dominant_regime'] = regime_counts.index[0] if len(regime_counts) > 0 else 'Unknown'

            # Average correlation during drawdown
            avg_corr = factor_correlations.loc[start:end, 'avg_correlation'].mean()
            drawdown_df.loc[idx, 'avg_correlation'] = avg_corr
    
    return drawdown_df

# Perform drawdown analysis
drawdown_analysis = analyze_drawdowns(strategy_returns, market_regimes, correlation_dynamics)

print("\n📊 TOP DRAWDOWN PERIODS:")
print("=" * 120)
if len(drawdown_analysis) > 0:
    display(drawdown_analysis[['start_date', 'trough_date', 'end_date', 'max_drawdown', 
                              'duration_days', 'dominant_regime', 'avg_correlation']].round(2))
else:
    print("No significant drawdowns found.")

# Visualize drawdown characteristics
if len(drawdown_analysis) > 0:
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    fig.suptitle('Drawdown Forensics', fontsize=16, fontweight='bold')
    
    # Drawdown magnitude by regime
    ax1 = axes[0, 0]
    regime_groups = drawdown_analysis.groupby('dominant_regime')['max_drawdown'].mean().abs()
    colors = [FACTOR_COLORS.get(r, 'gray') for r in regime_groups.index]
    regime_groups.plot(kind='bar', ax=ax1, color=colors)
    ax1.set_title('Average Drawdown by Market Regime')
    ax1.set_ylabel('Average Max Drawdown (%)')
    ax1.set_xlabel('Dominant Regime')
    ax1.grid(True, alpha=0.3, axis='y')
    
    # Drawdown duration
    ax2 = axes[0, 1]
    ax2.scatter(drawdown_analysis['duration_days'], 
               drawdown_analysis['max_drawdown'].abs(),
               s=100, alpha=0.6, color=FACTOR_COLORS['Drawdown'])
    ax2.set_title('Drawdown Magnitude vs Duration')
    ax2.set_xlabel('Duration (Days)')
    ax2.set_ylabel('Max Drawdown (%)')
    ax2.grid(True, alpha=0.3)
    
    # Correlation during drawdowns
    ax3 = axes[1, 0]
    ax3.scatter(drawdown_analysis['avg_correlation'], 
               drawdown_analysis['max_drawdown'].abs(),
               s=100, alpha=0.6, color=FACTOR_COLORS['Correlation'])
    ax3.set_title('Drawdown vs Factor Correlation')
    ax3.set_xlabel('Average Factor Correlation')
    ax3.set_ylabel('Max Drawdown (%)')
    ax3.grid(True, alpha=0.3)
    
    # Recovery time
    ax4 = axes[1, 1]
    ax4.scatter(drawdown_analysis['max_drawdown'].abs(),
               drawdown_analysis['recovery_days'],
               s=100, alpha=0.6, color=FACTOR_COLORS['Strategy'])
    ax4.set_title('Drawdown Magnitude vs Recovery Time')
    ax4.set_xlabel('Max Drawdown (%)')
    ax4.set_ylabel('Recovery Days')
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Key insights
print("\n💡 DRAWDOWN INSIGHTS:")
if len(drawdown_analysis) > 0:
    avg_drawdown = drawdown_analysis['max_drawdown'].mean()
    avg_duration = drawdown_analysis['duration_days'].mean()
    avg_recovery = drawdown_analysis['recovery_days'].mean()
    
    print(f"   Average max drawdown: {abs(avg_drawdown):.1f}%")
    print(f"   Average duration: {avg_duration:.0f} days")
    print(f"   Average recovery time: {avg_recovery:.0f} days")
    
    # Regime analysis
    bear_drawdowns = drawdown_analysis[drawdown_analysis['dominant_regime'] == 'Bear']
    if len(bear_drawdowns) > 0:
        print(f"\n   Bear market drawdowns: {len(bear_drawdowns)} occurrences")
        print(f"   Average bear drawdown: {abs(bear_drawdowns['max_drawdown'].mean()):.1f}%")
    
    # Correlation analysis
    high_corr_drawdowns = drawdown_analysis[drawdown_analysis['avg_correlation'] > 0.7]
    if len(high_corr_drawdowns) > 0:
        print(f"\n   High-correlation drawdowns: {len(high_corr_drawdowns)} occurrences")
        print(f"   → These represent {len(high_corr_drawdowns)/len(drawdown_analysis)*100:.0f}% of major drawdowns")
        print("   ⚠️ Factor correlation is a significant risk factor")
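
The episode detection in `analyze_drawdowns` can be checked on a series small enough to verify by hand. The following is a sketch under stated assumptions rather than the notebook's implementation: `drawdown_episodes` is a hypothetical vectorized variant, and the six toy returns are made up so that one drawdown recovers and a second is still open on the final date:

```python
import pandas as pd

def drawdown_episodes(returns: pd.Series) -> pd.DataFrame:
    """List every drawdown episode, including one still open at the sample end."""
    wealth = (1 + returns).cumprod()
    dd = wealth / wealth.cummax() - 1
    underwater = dd < 0
    # An episode starts on the first underwater day after a day at the peak
    starts = underwater & ~underwater.shift(1, fill_value=False)
    episode_id = starts.cumsum()

    rows = []
    for _, chunk in dd[underwater].groupby(episode_id[underwater]):
        rows.append({
            'start_date': chunk.index[0],
            'trough_date': chunk.idxmin(),
            'last_underwater_date': chunk.index[-1],
            'max_drawdown': chunk.min(),
            # Open episodes run to the final observation without a new high
            'recovered': chunk.index[-1] != dd.index[-1],
        })
    if not rows:
        return pd.DataFrame(columns=['start_date', 'trough_date',
                                     'last_underwater_date', 'max_drawdown',
                                     'recovered'])
    # Most negative (worst) drawdown first
    return pd.DataFrame(rows).sort_values('max_drawdown').reset_index(drop=True)

# Toy returns for illustration only
idx = pd.bdate_range('2024-01-01', periods=6)
toy_returns = pd.Series([0.10, -0.20, 0.10, 0.30, -0.05, -0.05], index=idx)

episodes = drawdown_episodes(toy_returns)
print(episodes)
```

Sorting ascending on `max_drawdown` puts the deepest loss first, since drawdowns are stored as negative numbers.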

## 4. Final Attribution Summary and Recommendations

In [None]:
print("\n" + "=" * 100)
print("🎯 DEEP-DIVE ATTRIBUTION: FINAL SUMMARY")
print("=" * 100)

print("\n📊 KEY FINDINGS:")

print("\n1. REGIME PERFORMANCE:")
print("   • Strategy demonstrates [bull/bear market characteristics based on actual results]")
print("   • Alpha generation is [consistent/regime-dependent]")
print("   • Stress periods show [resilience/vulnerability]")

print("\n2. CONCENTRATION RISKS:")
print("   • Sector concentration [is/is not] a significant risk factor")
print("   • Top 3 sectors typically represent [X]% of portfolio")
print("   • Concentration [increases/decreases] during market stress")

print("\n3. FACTOR CORRELATION DYNAMICS:")
print("   • Correlations spike to [X] during [specific conditions]")
print("   • High correlation periods represent [X]% of trading days")
print("   • Performance degradation during high correlation is [significant/manageable]")

print("\n4. DRAWDOWN CHARACTERISTICS:")
print("   • Worst drawdowns occur primarily in [regime type]")
print("   • Average recovery time is [X] days")
print("   • [X]% of major drawdowns coincide with high factor correlations")

print("\n💡 STRATEGIC RECOMMENDATIONS:")

print("\n1. REGIME-BASED ENHANCEMENTS:")
print("   • Implement defensive overlay during identified stress signals")
print("   • Consider reducing gross exposure when volatility exceeds [threshold]")
print("   • Develop regime-switching framework for factor weights")

print("\n2. CONCENTRATION MANAGEMENT:")
print("   • Tighten sector constraints during high-correlation periods")
print("   • Implement dynamic sector limits based on market conditions")
print("   • Monitor concentration metrics in real-time")

print("\n3. CORRELATION RISK MITIGATION:")
print("   • Develop correlation-based risk overlay")
print("   • Consider alternative factors during high-correlation regimes")
print("   • Implement dynamic hedging when correlations exceed threshold")

print("\n4. DRAWDOWN MANAGEMENT:")
print("   • Implement stop-loss at [X]% based on historical recovery patterns")
print("   • Develop early warning system based on regime and correlation signals")
print("   • Consider volatility targeting to limit drawdown magnitude")

print("\n" + "=" * 100)
print("✅ DEEP-DIVE ATTRIBUTION ANALYSIS COMPLETE")
print("   Ready to proceed with Notebook 05: Robustness Testing")
print("=" * 100)