# Phase 2 Action-Trace Analysis - Stairways V3
## OOS Training Results & Top 10 Checkpoint Analysis

**Team B Template for Team A Customization**

This notebook analyzes Phase 2 out-of-sample (OOS) training results to:
- Rank top 10 checkpoints by ep_rew_mean (secondary: Sharpe ratio)
- Generate required visualizations for stakeholder reporting
- Validate success criteria: Sharpe ≥ 0.3, ep_rew_mean ≥ 0.1

**Usage**: Run after Phase 2 OOS training completion (3 seeds × 50K steps)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import json
import warnings
warnings.filterwarnings('ignore')

# Set style for professional reports
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

print("📊 Phase 2 Action-Trace Analysis - Stairways V3")
print("=" * 60)
print("🎯 Team B Template - Ready for Team A Customization")

## 1. Configuration & Data Loading

In [None]:
# Phase 2 OOS Training Paths (Team A: Update these paths after training completion)
PHASE2_PATHS = [
    Path('../train_runs/phase2_oos_seed0'),
    Path('../train_runs/phase2_oos_seed1'), 
    Path('../train_runs/phase2_oos_seed2')
]

# Results output directory
RESULTS_PATH = Path('../results/phase2')
RESULTS_PATH.mkdir(parents=True, exist_ok=True)

# Success criteria thresholds
SUCCESS_CRITERIA = {
    'oos_sharpe_min': 0.3,
    'ep_rew_mean_min': 0.1,
    'top_n_checkpoints': 10
}

print(f"📁 Configured {len(PHASE2_PATHS)} Phase 2 training paths")
print(f"📊 Success criteria: Sharpe ≥ {SUCCESS_CRITERIA['oos_sharpe_min']}, ep_rew_mean ≥ {SUCCESS_CRITERIA['ep_rew_mean_min']}")
print(f"🏆 Will analyze top {SUCCESS_CRITERIA['top_n_checkpoints']} checkpoints")
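
Because the loaders in this notebook fall back to synthetic sample data when files are missing, it can help to confirm that each run directory exists before loading. The following is a minimal sketch of such a pre-flight check; the helper name `check_run_dirs` and the `required` file list are assumptions for illustration and are not part of the template.

```python
from pathlib import Path
import tempfile


def check_run_dirs(paths, required=("monitor.csv",)):
    """Return {path: [missing items]} for each run directory (hypothetical helper)."""
    report = {}
    for p in paths:
        p = Path(p)
        if not p.is_dir():
            report[str(p)] = ["<directory missing>"]
            continue
        # List every required file that is absent from the run directory
        report[str(p)] = [name for name in required if not (p / name).exists()]
    return report


# Demo on a temporary directory tree (stand-in for ../train_runs/...)
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "phase2_oos_seed0"
    good.mkdir()
    (good / "monitor.csv").write_text("r,l,t\n")
    bad = Path(tmp) / "phase2_oos_seed1"  # never created
    demo_report = check_run_dirs([good, bad])
    print(demo_report)
```

An empty list for a path means all required files were found; calling `check_run_dirs(PHASE2_PATHS)` before loading would show which seeds are about to be replaced by sample data.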

In [None]:
def load_phase2_action_traces(seed_paths):
    """
    Load action traces from all Phase 2 OOS runs
    
    Args:
        seed_paths: List of Path objects to seed run directories
        
    Returns:
        dict: {seed_id: action_trace_dataframe}
    """
    action_traces = {}
    
    for i, seed_path in enumerate(seed_paths):
        seed_id = f"seed_{i}"
        
        # Try multiple possible action trace file formats
        possible_files = [
            seed_path / 'action_traces.csv',
            seed_path / 'action_traces.parquet',
            seed_path / 'detailed_logs.csv'
        ]
        
        loaded = False
        for trace_file in possible_files:
            if trace_file.exists():
                try:
                    if trace_file.suffix == '.parquet':
                        df = pd.read_parquet(trace_file)
                    else:
                        df = pd.read_csv(trace_file)
                    
                    df['seed_id'] = seed_id
                    action_traces[seed_id] = df
                    print(f"✅ Loaded {len(df)} action records from {seed_id}")
                    loaded = True
                    break
                except Exception as e:
                    print(f"⚠️ Failed to load {trace_file}: {e}")
        
        if not loaded:
            print(f"❌ No action traces found for {seed_id}")
            # Create sample data for template demonstration
            print(f"📝 Creating sample data for {seed_id}")
            np.random.seed(42 + i)
            n_episodes = 50
            n_steps_per_ep = 100
            n_samples = n_episodes * n_steps_per_ep
            
            sample_df = pd.DataFrame({
                'episode_id': np.repeat(range(n_episodes), n_steps_per_ep),
                'step': np.tile(range(n_steps_per_ep), n_episodes),
                'episode_reward': np.repeat(np.random.normal(0.15, 0.8, n_episodes), n_steps_per_ep),
                'step_reward': np.random.normal(0.001, 0.05, n_samples),
                'nvda_position': np.random.choice([-1, 0, 1], n_samples, p=[0.25, 0.5, 0.25]),
                'msft_position': np.random.choice([-1, 0, 1], n_samples, p=[0.25, 0.5, 0.25]),
                'step_pnl': np.random.normal(0, 25, n_samples),
                'nvda_price': 485 + np.cumsum(np.random.normal(0, 2, n_samples)),
                'msft_price': 412 + np.cumsum(np.random.normal(0, 1.5, n_samples)),
                'action': np.random.randint(0, 5, n_samples),
                'timestamp': pd.date_range('2024-01-01', periods=n_samples, freq='1min'),
                'seed_id': seed_id
            })
            action_traces[seed_id] = sample_df
    
    return action_traces

# Load all Phase 2 action traces
phase2_traces = load_phase2_action_traces(PHASE2_PATHS)
print(f"\n📈 Total seeds loaded: {len(phase2_traces)}")
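
The plotting functions later in this notebook read a fixed set of columns from each trace DataFrame. A small schema check, sketched below, can surface a mismatch before any plot fails. The column set is taken from the sample data generated above; the helper name `missing_trace_columns` is an assumption for illustration.

```python
import pandas as pd

# Columns the downstream plots in this template read from each trace
REQUIRED_TRACE_COLUMNS = {
    "episode_id", "step", "episode_reward", "step_reward", "step_pnl",
    "nvda_position", "msft_position", "nvda_price", "msft_price", "action",
}


def missing_trace_columns(df):
    """Return the sorted list of required columns absent from df (hypothetical helper)."""
    return sorted(REQUIRED_TRACE_COLUMNS - set(df.columns))


# Demo: a trace that only logs three of the expected columns
demo = pd.DataFrame({"episode_id": [0, 0], "step": [0, 1], "step_pnl": [1.5, -0.5]})
print(missing_trace_columns(demo))
```

If the real logging format uses different column names, renaming them once after loading is likely simpler than editing every plot.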

## 2. Monitor.csv Analysis & Checkpoint Ranking

In [None]:
def load_monitor_data(seed_paths):
    """
    Load monitor.csv files from all Phase 2 runs for checkpoint ranking
    
    Returns:
        dict: {seed_id: monitor_dataframe}
    """
    monitor_data = {}
    
    for i, seed_path in enumerate(seed_paths):
        seed_id = f"seed_{i}"
        monitor_file = seed_path / 'monitor.csv'
        
        if monitor_file.exists():
            try:
                # Load monitor.csv (skip comment lines starting with #)
                df = pd.read_csv(monitor_file, comment='#')
                df['seed_id'] = seed_id
                monitor_data[seed_id] = df
                print(f"✅ Loaded {len(df)} episodes from {seed_id} monitor.csv")
            except Exception as e:
                print(f"❌ Failed to load monitor.csv for {seed_id}: {e}")
        else:
            print(f"⚠️ No monitor.csv found for {seed_id}, creating sample data")
            # Create sample monitor data
            np.random.seed(42 + i)
            n_episodes = 50
            sample_monitor = pd.DataFrame({
                'r': np.random.normal(0.15, 0.8, n_episodes),  # episode rewards
                'l': np.random.randint(60, 120, n_episodes),    # episode lengths
                't': np.cumsum(np.random.randint(60, 120, n_episodes)),  # timestamps
                'seed_id': seed_id
            })
            monitor_data[seed_id] = sample_monitor
    
    return monitor_data

# Load monitor data
monitor_data = load_monitor_data(PHASE2_PATHS)
print(f"\n📊 Monitor data loaded for {len(monitor_data)} seeds")
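
The `comment='#'` argument matters if the files follow the layout written by the Stable-Baselines3 `Monitor` wrapper, which is an assumption here: a first line starting with `#` that holds JSON metadata, followed by an `r,l,t` header. The sketch below parses an in-memory sample of that layout; the metadata values are illustrative only.

```python
import io
import pandas as pd

# Illustrative file content: a '#'-prefixed JSON metadata line, then CSV rows
sample_monitor_text = (
    '#{"t_start": 1700000000.0, "env_id": "demo-env"}\n'
    "r,l,t\n"
    "0.25,80,12.5\n"
    "-0.10,95,27.1\n"
)

# Fully commented lines are skipped, so "r,l,t" becomes the header row
df = pd.read_csv(io.StringIO(sample_monitor_text), comment="#")
print(df)
```

Here `r` is the episode reward, `l` the episode length in steps, and `t` the elapsed time when the episode ended.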

In [None]:
def rank_top10_checkpoints(monitor_data, success_criteria):
    """
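    Rank checkpoints by ep_rew_mean, secondary by Sharpe ratio.

    Note: this template computes one row per seed run, so each seed is
    treated as a single checkpoint (at most len(monitor_data) rows).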
    
    Args:
        monitor_data: Dict of monitor dataframes by seed
        success_criteria: Dict with ranking criteria
        
    Returns:
        pd.DataFrame: Ranked checkpoint results
    """
    checkpoint_results = []
    
    for seed_id, df in monitor_data.items():
        if len(df) == 0:
            continue
            
        # Calculate key metrics
        ep_rew_mean = df['r'].mean()
        ep_rew_std = df['r'].std()
        ep_len_mean = df['l'].mean()
        
        # Calculate Sharpe ratio (episode-level returns, annualized)
        if ep_rew_std > 0:
            sharpe = (ep_rew_mean / ep_rew_std) * np.sqrt(252)  # Assuming daily episodes
        else:
            sharpe = 0
        
        # Success criteria checks
        meets_sharpe = sharpe >= success_criteria['oos_sharpe_min']
        meets_reward = ep_rew_mean >= success_criteria['ep_rew_mean_min']
        
        checkpoint_results.append({
            'seed_id': seed_id,
            'ep_rew_mean': ep_rew_mean,
            'ep_rew_std': ep_rew_std,
            'sharpe_ratio': sharpe,
            'ep_len_mean': ep_len_mean,
            'total_episodes': len(df),
            'meets_sharpe_criteria': meets_sharpe,
            'meets_reward_criteria': meets_reward,
            'overall_success': meets_sharpe and meets_reward
        })
    
    # Convert to DataFrame and rank
    results_df = pd.DataFrame(checkpoint_results)
    
    if len(results_df) > 0:
        # Primary ranking: ep_rew_mean (descending)
        # Secondary ranking: sharpe_ratio (descending)
        results_df = results_df.sort_values(
            ['ep_rew_mean', 'sharpe_ratio'], 
            ascending=[False, False]
        ).reset_index(drop=True)
        
        # Add ranking
        results_df['rank'] = range(1, len(results_df) + 1)
    
    return results_df

# Rank checkpoints
checkpoint_rankings = rank_top10_checkpoints(monitor_data, SUCCESS_CRITERIA)

print("\n🏆 TOP 10 CHECKPOINT RANKINGS")
print("=" * 80)
if len(checkpoint_rankings) > 0:
    top_10 = checkpoint_rankings.head(SUCCESS_CRITERIA['top_n_checkpoints'])
    
    for _, row in top_10.iterrows():
        success_icon = "✅" if row['overall_success'] else "❌"
        print(f"{success_icon} Rank {row['rank']:2d}: {row['seed_id']} | "
              f"Reward: {row['ep_rew_mean']:6.3f} | "
              f"Sharpe: {row['sharpe_ratio']:6.3f} | "
              f"Episodes: {row['total_episodes']:3d}")
    
    # Success summary
    successful_checkpoints = checkpoint_rankings[checkpoint_rankings['overall_success']]
    print(f"\n📊 SUCCESS SUMMARY:")
    print(f"   Checkpoints meeting both criteria: {len(successful_checkpoints)}/{len(checkpoint_rankings)}")
    print(f"   Best ep_rew_mean: {checkpoint_rankings['ep_rew_mean'].max():.3f}")
    print(f"   Best Sharpe ratio: {checkpoint_rankings['sharpe_ratio'].max():.3f}")
else:
    print("❌ No checkpoint data available for ranking")

# Save the top-N rankings to file (N = SUCCESS_CRITERIA['top_n_checkpoints'])
checkpoint_rankings.head(SUCCESS_CRITERIA['top_n_checkpoints']).to_csv(RESULTS_PATH / 'top10_checkpoints.csv', index=False)
print(f"\n💾 Rankings saved to: {RESULTS_PATH / 'top10_checkpoints.csv'}")
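
The Sharpe figure used for ranking is the mean episode reward divided by its sample standard deviation, scaled by √252 on the assumption that one episode corresponds to one trading day. A minimal standalone sketch of that calculation follows; the function name `episode_sharpe` is hypothetical.

```python
import numpy as np


def episode_sharpe(rewards, periods_per_year=252):
    """Annualized mean/std ratio of episode rewards (hypothetical helper)."""
    rewards = np.asarray(rewards, dtype=float)
    if rewards.size < 2:
        return 0.0  # sample std is undefined for fewer than two episodes
    std = rewards.std(ddof=1)  # ddof=1 matches pandas' default Series.std()
    if std == 0:
        return 0.0
    return float(rewards.mean() / std * np.sqrt(periods_per_year))


rewards = [0.1, 0.3, -0.1, 0.2, 0.0]
print(f"Sharpe: {episode_sharpe(rewards):.3f}")
```

With the √252 factor, a per-episode mean-to-std ratio of roughly 0.019 already reaches the 0.3 threshold, so it is worth confirming that the daily-episode assumption holds for the actual environment.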

## 3. Required Visualizations
### Team A: Customize these visualization functions based on your specific data structure

In [None]:
def plot_reward_components_timeseries(action_traces, save_path):
    """
    Time-series of reward components per episode
    
    Team A: Customize this function based on your reward system components
    Expected components: pnl_reward, holding_bonus, exit_tax, smoothed_penalty
    """
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Reward Components Time Series Analysis', fontsize=16, fontweight='bold')
    
    # Combine all seeds for analysis
    all_traces = pd.concat(action_traces.values(), ignore_index=True)
    
    # Episode-level aggregation
    episode_rewards = all_traces.groupby(['seed_id', 'episode_id']).agg({
        'episode_reward': 'first',
        'step_reward': 'sum',
        'step_pnl': 'sum',
        'step': 'count'
    }).reset_index()
    
    # 1. Episode rewards over time
    for seed_id in episode_rewards['seed_id'].unique():
        seed_data = episode_rewards[episode_rewards['seed_id'] == seed_id]
        axes[0,0].plot(seed_data['episode_id'], seed_data['episode_reward'], 
                      alpha=0.7, label=seed_id, linewidth=1)
    
    axes[0,0].set_title('Episode Rewards Over Time')
    axes[0,0].set_xlabel('Episode ID')
    axes[0,0].set_ylabel('Episode Reward')
    axes[0,0].legend()
    axes[0,0].grid(True, alpha=0.3)
    
    # 2. Cumulative P&L
    for seed_id in episode_rewards['seed_id'].unique():
        seed_data = episode_rewards[episode_rewards['seed_id'] == seed_id]
        cumulative_pnl = seed_data['step_pnl'].cumsum()
        axes[0,1].plot(seed_data['episode_id'], cumulative_pnl, 
                      alpha=0.7, label=seed_id, linewidth=2)
    
    axes[0,1].set_title('Cumulative P&L Over Episodes')
    axes[0,1].set_xlabel('Episode ID')
    axes[0,1].set_ylabel('Cumulative P&L')
    axes[0,1].legend()
    axes[0,1].grid(True, alpha=0.3)
    
    # 3. Episode length distribution
    axes[1,0].hist(episode_rewards['step'], bins=20, alpha=0.7, edgecolor='black')
    axes[1,0].axvline(episode_rewards['step'].mean(), color='red', linestyle='--', 
                     label=f'Mean: {episode_rewards["step"].mean():.1f}')
    axes[1,0].set_title('Episode Length Distribution')
    axes[1,0].set_xlabel('Episode Length (steps)')
    axes[1,0].set_ylabel('Frequency')
    axes[1,0].legend()
    
    # 4. Reward vs Episode Length scatter
    scatter = axes[1,1].scatter(episode_rewards['step'], episode_rewards['episode_reward'], 
                               alpha=0.6, c=episode_rewards['step_pnl'], cmap='RdYlGn')
    axes[1,1].set_title('Episode Reward vs Length (colored by P&L)')
    axes[1,1].set_xlabel('Episode Length (steps)')
    axes[1,1].set_ylabel('Episode Reward')
    plt.colorbar(scatter, ax=axes[1,1], label='Total P&L')
    
    plt.tight_layout()
    plt.savefig(save_path / 'reward_components_timeseries.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    return fig

# Generate reward components visualization
if phase2_traces:
    reward_fig = plot_reward_components_timeseries(phase2_traces, RESULTS_PATH)
    print("✅ Reward components time series saved")
else:
    print("❌ No action traces available for reward analysis")
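
The docstring of `plot_reward_components_timeseries()` lists the expected components (`pnl_reward`, `holding_bonus`, `exit_tax`, `smoothed_penalty`), but the template plots only aggregate reward and P&L. Assuming the traces log these components as per-step columns under exactly those names, which is not confirmed by the template, a per-episode aggregation could be sketched as:

```python
import pandas as pd

# Component names taken from the docstring; their presence as columns is assumed
COMPONENTS = ["pnl_reward", "holding_bonus", "exit_tax", "smoothed_penalty"]


def aggregate_reward_components(traces):
    """Sum whichever component columns exist, per (seed, episode)."""
    present = [c for c in COMPONENTS if c in traces.columns]
    return (
        traces.groupby(["seed_id", "episode_id"])[present]
        .sum()
        .reset_index()
    )


# Demo with three of the four components logged
demo = pd.DataFrame({
    "seed_id": ["seed_0"] * 4,
    "episode_id": [0, 0, 1, 1],
    "pnl_reward": [0.5, -0.25, 0.25, 0.25],
    "holding_bonus": [0.0, 0.125, 0.0, 0.0],
    "exit_tax": [0.0, -0.5, 0.0, 0.0],
})
per_episode = aggregate_reward_components(demo)
print(per_episode)
```

The resulting frame has one row per episode and one column per logged component, which can be passed to a line or stacked-area plot in place of the single `episode_reward` series.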

In [None]:
def plot_pnl_actions_overlay(action_traces, save_path):
    """
    P&L vs action overlay with trade markers on price chart
    
    Team A: Customize based on your action space and position tracking
    """
    fig, axes = plt.subplots(3, 1, figsize=(16, 12))
    fig.suptitle('P&L vs Actions Overlay Analysis', fontsize=16, fontweight='bold')
    
    # Use best performing seed for detailed analysis
    # (reads from the action_traces argument rather than the global phase2_traces)
    if len(checkpoint_rankings) > 0 and checkpoint_rankings.iloc[0]['seed_id'] in action_traces:
        best_seed = checkpoint_rankings.iloc[0]['seed_id']
        trace_data = action_traces[best_seed]
        print(f"📊 Analyzing best performing seed: {best_seed}")
    else:
        # Use first available seed
        best_seed = list(action_traces.keys())[0]
        trace_data = action_traces[best_seed]
        print(f"📊 Analyzing seed: {best_seed}")
    
    # Select a representative episode for detailed view
    episode_rewards = trace_data.groupby('episode_id')['episode_reward'].first()
    best_episode_id = episode_rewards.idxmax()
    episode_data = trace_data[trace_data['episode_id'] == best_episode_id].copy()
    episode_data = episode_data.sort_values('step')
    
    print(f"📈 Analyzing best episode: {best_episode_id} (reward: {episode_rewards[best_episode_id]:.3f})")
    
    # 1. NVDA Price with position markers
    axes[0].plot(episode_data['step'], episode_data['nvda_price'], 'b-', linewidth=2, label='NVDA Price')
    
    # Mark position changes
    position_changes = episode_data[episode_data['nvda_position'].diff() != 0]
    for _, row in position_changes.iterrows():
        color = 'green' if row['nvda_position'] > 0 else 'red' if row['nvda_position'] < 0 else 'gray'
        marker = '^' if row['nvda_position'] > 0 else 'v' if row['nvda_position'] < 0 else 'o'
        axes[0].scatter(row['step'], row['nvda_price'], color=color, marker=marker, s=100, alpha=0.8)
    
    axes[0].set_title(f'NVDA Price with Position Changes (Episode {best_episode_id})')
    axes[0].set_ylabel('NVDA Price')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # 2. MSFT Price with position markers
    axes[1].plot(episode_data['step'], episode_data['msft_price'], 'r-', linewidth=2, label='MSFT Price')
    
    position_changes = episode_data[episode_data['msft_position'].diff() != 0]
    for _, row in position_changes.iterrows():
        color = 'green' if row['msft_position'] > 0 else 'red' if row['msft_position'] < 0 else 'gray'
        marker = '^' if row['msft_position'] > 0 else 'v' if row['msft_position'] < 0 else 'o'
        axes[1].scatter(row['step'], row['msft_price'], color=color, marker=marker, s=100, alpha=0.8)
    
    axes[1].set_title(f'MSFT Price with Position Changes (Episode {best_episode_id})')
    axes[1].set_ylabel('MSFT Price')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    # 3. Cumulative P&L with action markers
    episode_data['cumulative_pnl'] = episode_data['step_pnl'].cumsum()
    axes[2].plot(episode_data['step'], episode_data['cumulative_pnl'], 'g-', linewidth=2, label='Cumulative P&L')
    
    # Mark significant actions
    action_changes = episode_data[episode_data['action'].diff() != 0]
    for _, row in action_changes.iterrows():
        axes[2].scatter(row['step'], row['cumulative_pnl'], color='orange', marker='D', s=60, alpha=0.7)
    
    axes[2].set_title(f'Cumulative P&L with Action Changes (Episode {best_episode_id})')
    axes[2].set_xlabel('Time Step')
    axes[2].set_ylabel('Cumulative P&L')
    axes[2].legend()
    axes[2].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(save_path / 'pnl_actions_overlay.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    return fig

# Generate P&L vs actions overlay
if phase2_traces:
    pnl_fig = plot_pnl_actions_overlay(phase2_traces, RESULTS_PATH)
    print("✅ P&L vs actions overlay saved")
else:
    print("❌ No action traces available for P&L analysis")
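
One detail of the position-change detection in `plot_pnl_actions_overlay()`: `Series.diff()` returns `NaN` for the first row, and `NaN != 0` evaluates to `True`, so the first step of every episode is marked as a change. The sketch below shows that behaviour and one possible way to exclude the first row, if that is the desired convention.

```python
import pandas as pd

positions = pd.Series([0, 0, 1, 1, -1, 0])

# As in the template: the first row is flagged because diff() yields NaN there
naive = positions.diff() != 0

# Variant: treat the first row as "no change" by filling the NaN with 0
changed = positions.diff().fillna(0) != 0

print(pd.DataFrame({"position": positions, "naive": naive, "changed": changed}))
```

Whether the opening step should carry a marker depends on whether the environment starts each episode flat; keeping the marker is reasonable if the initial position is itself a trade.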

In [None]:
def plot_drawdown_holding_scatter(action_traces, save_path):
    """
    Drawdown vs holding time scatter plot
    
    Team A: Customize based on your drawdown calculation method
    """
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Drawdown vs Holding Time Analysis', fontsize=16, fontweight='bold')
    
    # Combine all traces for comprehensive analysis
    all_traces = pd.concat(action_traces.values(), ignore_index=True)
    
    # Calculate episode-level metrics
    episode_metrics = []
    
    for (seed_id, episode_id), episode_data in all_traces.groupby(['seed_id', 'episode_id']):
        episode_data = episode_data.sort_values('step')
        
        # Calculate cumulative P&L and drawdown
        cumulative_pnl = episode_data['step_pnl'].cumsum()
        running_max = cumulative_pnl.expanding().max()
        drawdown = cumulative_pnl - running_max
        max_drawdown = drawdown.min()
        
        # Holding time per symbol: total number of steps with a non-zero
        # position (not the length of consecutive holding runs)
        nvda_positions = episode_data['nvda_position'].values
        msft_positions = episode_data['msft_position'].values

        nvda_holding_time = np.sum(nvda_positions != 0)
        msft_holding_time = np.sum(msft_positions != 0)
        total_holding_time = nvda_holding_time + msft_holding_time
        
        episode_metrics.append({
            'seed_id': seed_id,
            'episode_id': episode_id,
            'max_drawdown': max_drawdown,
            'nvda_holding_time': nvda_holding_time,
            'msft_holding_time': msft_holding_time,
            'total_holding_time': total_holding_time,
            'episode_reward': episode_data['episode_reward'].iloc[0],
            'episode_length': len(episode_data)
        })
    
    metrics_df = pd.DataFrame(episode_metrics)
    
    # 1. Total holding time vs max drawdown
    scatter1 = axes[0,0].scatter(metrics_df['total_holding_time'], metrics_df['max_drawdown'], 
                                alpha=0.6, c=metrics_df['episode_reward'], cmap='RdYlGn')
    axes[0,0].set_title('Total Holding Time vs Max Drawdown')
    axes[0,0].set_xlabel('Total Holding Time (steps)')
    axes[0,0].set_ylabel('Max Drawdown')
    plt.colorbar(scatter1, ax=axes[0,0], label='Episode Reward')
    
    # 2. NVDA holding time vs drawdown
    axes[0,1].scatter(metrics_df['nvda_holding_time'], metrics_df['max_drawdown'], 
                     alpha=0.6, color='green', label='NVDA')
    axes[0,1].set_title('NVDA Holding Time vs Max Drawdown')
    axes[0,1].set_xlabel('NVDA Holding Time (steps)')
    axes[0,1].set_ylabel('Max Drawdown')
    
    # 3. MSFT holding time vs drawdown
    axes[1,0].scatter(metrics_df['msft_holding_time'], metrics_df['max_drawdown'], 
                     alpha=0.6, color='blue', label='MSFT')
    axes[1,0].set_title('MSFT Holding Time vs Max Drawdown')
    axes[1,0].set_xlabel('MSFT Holding Time (steps)')
    axes[1,0].set_ylabel('Max Drawdown')
    
    # 4. Holding time distribution by performance quartiles
    # Divide episodes into performance quartiles
    quartiles = metrics_df['episode_reward'].quantile([0.25, 0.5, 0.75])
    
    q1_data = metrics_df[metrics_df['episode_reward'] <= quartiles[0.25]]
    q2_data = metrics_df[(metrics_df['episode_reward'] > quartiles[0.25]) & 
                        (metrics_df['episode_reward'] <= quartiles[0.5])]
    q3_data = metrics_df[(metrics_df['episode_reward'] > quartiles[0.5]) & 
                        (metrics_df['episode_reward'] <= quartiles[0.75])]
    q4_data = metrics_df[metrics_df['episode_reward'] > quartiles[0.75]]
    
    axes[1,1].hist([q1_data['total_holding_time'], q2_data['total_holding_time'], 
                   q3_data['total_holding_time'], q4_data['total_holding_time']], 
                  bins=15, alpha=0.7, label=['Q1 (worst)', 'Q2', 'Q3', 'Q4 (best)'])
    axes[1,1].set_title('Holding Time Distribution by Performance Quartile')
    axes[1,1].set_xlabel('Total Holding Time (steps)')
    axes[1,1].set_ylabel('Frequency')
    axes[1,1].legend()
    
    plt.tight_layout()
    plt.savefig(save_path / 'drawdown_holding_scatter.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    # Print summary statistics
    print("\n📊 DRAWDOWN vs HOLDING TIME SUMMARY:")
    print(f"   Average max drawdown: {metrics_df['max_drawdown'].mean():.2f}")
    print(f"   Average total holding time: {metrics_df['total_holding_time'].mean():.1f} steps")
    print(f"   Correlation (holding time vs drawdown): {metrics_df['total_holding_time'].corr(metrics_df['max_drawdown']):.3f}")
    
    return fig

# Generate drawdown vs holding time analysis
if phase2_traces:
    drawdown_fig = plot_drawdown_holding_scatter(phase2_traces, RESULTS_PATH)
    print("✅ Drawdown vs holding time scatter saved")
else:
    print("❌ No action traces available for drawdown analysis")
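
The drawdown used above is the gap between cumulative P&L and its running maximum, and the holding time is a count of steps with a non-zero position. The following sketch works both through on a small hand-made series and adds a consecutive-run variant; `longest_holding_run` is a hypothetical helper, not part of the template.

```python
import numpy as np
import pandas as pd

# Max drawdown on a small P&L series
step_pnl = pd.Series([10.0, -4.0, -8.0, 5.0, 12.0, -3.0])
cumulative_pnl = step_pnl.cumsum()         # 10, 6, -2, 3, 15, 12
running_max = cumulative_pnl.cummax()      # 10, 10, 10, 10, 15, 15
drawdown = cumulative_pnl - running_max    # 0, -4, -12, -7, 0, -3
max_drawdown = drawdown.min()


def longest_holding_run(positions):
    """Length of the longest run of consecutive non-zero positions."""
    longest = current = 0
    for p in positions:
        current = current + 1 if p != 0 else 0
        longest = max(longest, current)
    return longest


positions = [0, 1, 1, 0, -1, -1, -1, 0]
total_holding = int(np.count_nonzero(positions))  # what the template counts
print(max_drawdown, total_holding, longest_holding_run(positions))
```

Note that this run counter treats a direct flip from long to short as one continuous holding period; a stricter definition would also reset the counter when the sign changes.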

In [None]:
def plot_episode_distributions(monitor_data, save_path):
    """
    Distribution histograms of ep_len, ep_rew
    """
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    fig.suptitle('Episode Distributions Analysis', fontsize=16, fontweight='bold')
    
    # Combine all monitor data
    all_monitor = pd.concat(monitor_data.values(), ignore_index=True)
    
    # 1. Episode reward distribution
    axes[0,0].hist(all_monitor['r'], bins=30, alpha=0.7, edgecolor='black', color='skyblue')
    axes[0,0].axvline(all_monitor['r'].mean(), color='red', linestyle='--', 
                     label=f'Mean: {all_monitor["r"].mean():.3f}')
    axes[0,0].axvline(SUCCESS_CRITERIA['ep_rew_mean_min'], color='green', linestyle='--', 
                     label=f'Target: {SUCCESS_CRITERIA["ep_rew_mean_min"]}')
    axes[0,0].set_title('Episode Reward Distribution')
    axes[0,0].set_xlabel('Episode Reward')
    axes[0,0].set_ylabel('Frequency')
    axes[0,0].legend()
    
    # 2. Episode length distribution
    axes[0,1].hist(all_monitor['l'], bins=30, alpha=0.7, edgecolor='black', color='lightgreen')
    axes[0,1].axvline(all_monitor['l'].mean(), color='red', linestyle='--', 
                     label=f'Mean: {all_monitor["l"].mean():.1f}')
    axes[0,1].axvline(80, color='orange', linestyle='--', label='Target: 80')
    axes[0,1].set_title('Episode Length Distribution')
    axes[0,1].set_xlabel('Episode Length (steps)')
    axes[0,1].set_ylabel('Frequency')
    axes[0,1].legend()
    
    # 3. Reward vs Length scatter
    scatter = axes[0,2].scatter(all_monitor['l'], all_monitor['r'], alpha=0.6)
    axes[0,2].set_title('Episode Reward vs Length')
    axes[0,2].set_xlabel('Episode Length (steps)')
    axes[0,2].set_ylabel('Episode Reward')
    
    # Add correlation coefficient
    correlation = all_monitor['l'].corr(all_monitor['r'])
    axes[0,2].text(0.05, 0.95, f'Correlation: {correlation:.3f}', 
                  transform=axes[0,2].transAxes, verticalalignment='top',
                  bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
    
    # 4. Seed-wise reward comparison
    seed_rewards = []
    seed_labels = []
    for seed_id, df in monitor_data.items():
        seed_rewards.append(df['r'].values)
        seed_labels.append(seed_id)
    
    axes[1,0].boxplot(seed_rewards, labels=seed_labels)
    axes[1,0].set_title('Episode Rewards by Seed')
    axes[1,0].set_ylabel('Episode Reward')
    axes[1,0].tick_params(axis='x', rotation=45)
    
    # 5. Seed-wise length comparison
    seed_lengths = []
    for seed_id, df in monitor_data.items():
        seed_lengths.append(df['l'].values)
    
    axes[1,1].boxplot(seed_lengths, labels=seed_labels)
    axes[1,1].set_title('Episode Lengths by Seed')
    axes[1,1].set_ylabel('Episode Length (steps)')
    axes[1,1].tick_params(axis='x', rotation=45)
    
    # 6. Success criteria summary
    axes[1,2].axis('off')
    
    # Calculate success metrics
    mean_reward = all_monitor['r'].mean()
    mean_length = all_monitor['l'].mean()
    reward_success = mean_reward >= SUCCESS_CRITERIA['ep_rew_mean_min']
    length_success = mean_length >= 80
    
    # Calculate Sharpe ratio
    sharpe = (all_monitor['r'].mean() / all_monitor['r'].std()) * np.sqrt(252) if all_monitor['r'].std() > 0 else 0
    sharpe_success = sharpe >= SUCCESS_CRITERIA['oos_sharpe_min']
    
    success_text = f"""
PHASE 2 SUCCESS CRITERIA:

📊 Episode Reward Mean:
   Current: {mean_reward:.3f}
   Target:  {SUCCESS_CRITERIA['ep_rew_mean_min']:.3f}
   Status:  {'✅ PASS' if reward_success else '❌ FAIL'}

📈 Sharpe Ratio:
   Current: {sharpe:.3f}
   Target:  {SUCCESS_CRITERIA['oos_sharpe_min']:.3f}
   Status:  {'✅ PASS' if sharpe_success else '❌ FAIL'}

⏱️ Episode Length Mean:
   Current: {mean_length:.1f}
   Target:  80.0
   Status:  {'✅ PASS' if length_success else '❌ FAIL'}

🎯 OVERALL STATUS:
   {'✅ PHASE 2 SUCCESS' if (reward_success and sharpe_success) else '❌ PHASE 2 NEEDS WORK'}
"""
    
    axes[1,2].text(0.1, 0.9, success_text, transform=axes[1,2].transAxes, 
                  verticalalignment='top', fontfamily='monospace', fontsize=10,
                  bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.8))
    
    plt.tight_layout()
    plt.savefig(save_path / 'episode_distributions.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    return fig

# Generate episode distributions
if monitor_data:
    dist_fig = plot_episode_distributions(monitor_data, RESULTS_PATH)
    print("✅ Episode distributions saved")
else:
    print("❌ No monitor data available for distribution analysis")

## 4. Phase 2 Summary Report Generation

In [None]:
def generate_phase2_summary_report():
    """
    Generate comprehensive Phase 2 summary report
    """
    from datetime import datetime
    
    # Collect all metrics
    all_monitor = pd.concat(monitor_data.values(), ignore_index=True)
    
    mean_reward = all_monitor['r'].mean()
    mean_length = all_monitor['l'].mean()
    sharpe = (all_monitor['r'].mean() / all_monitor['r'].std()) * np.sqrt(252) if all_monitor['r'].std() > 0 else 0
    
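    # Success criteria evaluation (cast to built-in bool: comparisons on
    # NumPy floats return np.bool_, which json.dump cannot serialize)
    reward_success = bool(mean_reward >= SUCCESS_CRITERIA['ep_rew_mean_min'])
    sharpe_success = bool(sharpe >= SUCCESS_CRITERIA['oos_sharpe_min'])
    overall_success = reward_success and sharpe_success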
    
    # Create summary report
    report = {
        "phase": "Phase 2: Curriculum Training & Validation",
        "completion_date": datetime.now().isoformat(),
        "training_configuration": {
            "seeds_trained": len(monitor_data),
            "total_timesteps": 50000,
            "training_period": "2022-01-01 to 2023-12-31",
            "test_period": "2024-01-01 to 2024-12-31",
            "exit_tax_enabled": True,
            "governor_enabled": True
        },
        "performance_metrics": {
            "ep_rew_mean": float(mean_reward),
            "ep_rew_std": float(all_monitor['r'].std()),
            "sharpe_ratio": float(sharpe),
            "ep_len_mean": float(mean_length),
            "total_episodes": len(all_monitor)
        },
        "success_criteria": {
            "oos_sharpe_target": SUCCESS_CRITERIA['oos_sharpe_min'],
            "oos_sharpe_achieved": float(sharpe),
            "oos_sharpe_success": sharpe_success,
            "ep_rew_mean_target": SUCCESS_CRITERIA['ep_rew_mean_min'],
            "ep_rew_mean_achieved": float(mean_reward),
            "ep_rew_mean_success": reward_success,
            "overall_success": overall_success
        },
        "top_checkpoints": checkpoint_rankings.head(10).to_dict('records') if len(checkpoint_rankings) > 0 else [],
        "next_steps": {
            "proceed_to_phase3": overall_success,
            "recommended_action": "Proceed to Phase 3 Mini-Grid" if overall_success else "Investigate reward system tuning",
            "best_checkpoint": checkpoint_rankings.iloc[0]['seed_id'] if len(checkpoint_rankings) > 0 else None
        },
        "files_generated": [
            "top10_checkpoints.csv",
            "reward_components_timeseries.png",
            "pnl_actions_overlay.png",
            "drawdown_holding_scatter.png",
            "episode_distributions.png"
        ]
    }
    
    # Save JSON report
    with open(RESULTS_PATH / 'phase2_summary_report.json', 'w') as f:
        json.dump(report, f, indent=2)
    
    # Generate markdown summary for stakeholders
    markdown_summary = f"""
# Phase 2 OOS Training Results - Stairways V3

**Completion Date:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

## 🎯 Success Criteria Evaluation

| Metric | Target | Achieved | Status |
|--------|--------|----------|--------|
| OOS Sharpe Ratio | ≥ {SUCCESS_CRITERIA['oos_sharpe_min']} | {sharpe:.3f} | {'✅ PASS' if sharpe_success else '❌ FAIL'} |
| Episode Reward Mean | ≥ {SUCCESS_CRITERIA['ep_rew_mean_min']} | {mean_reward:.3f} | {'✅ PASS' if reward_success else '❌ FAIL'} |
| **Overall Status** | - | - | **{'✅ SUCCESS' if overall_success else '❌ NEEDS WORK'}** |

## 📊 Performance Summary

- **Seeds Trained:** {len(monitor_data)}
- **Total Episodes:** {len(all_monitor):,}
- **Average Episode Length:** {mean_length:.1f} steps
- **Episode Reward Std:** {all_monitor['r'].std():.3f}

## 🏆 Top 3 Checkpoints

"""
    
    if len(checkpoint_rankings) > 0:
        for i, (_, row) in enumerate(checkpoint_rankings.head(3).iterrows()):
            markdown_summary += f"""
**{i+1}. {row['seed_id']}**
- Episode Reward Mean: {row['ep_rew_mean']:.3f}
- Sharpe Ratio: {row['sharpe_ratio']:.3f}
- Total Episodes: {row['total_episodes']}
"""
    
    markdown_summary += f"""

## 🚀 Next Steps

**Recommendation:** {report['next_steps']['recommended_action']}

{'✅ **PROCEED TO PHASE 3**: Mini-grid hyperparameter optimization' if overall_success else '⚠️ **INVESTIGATE**: Reward system requires tuning before Phase 3'}

## 📁 Generated Files

- `phase2_summary_report.json` - Complete results data
- `top10_checkpoints.csv` - Checkpoint rankings
- `reward_components_timeseries.png` - Reward analysis
- `pnl_actions_overlay.png` - Trading behavior analysis
- `drawdown_holding_scatter.png` - Risk analysis
- `episode_distributions.png` - Performance distributions

---
*Generated by Team B Action-Trace Analysis Template*
"""
    
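    # Save markdown report (explicit UTF-8: the summary contains emoji, which
    # can fail under a non-UTF-8 platform default encoding)
    with open(RESULTS_PATH / 'phase2_summary_report.md', 'w', encoding='utf-8') as f: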
        f.write(markdown_summary)
    
    print("\n📋 PHASE 2 SUMMARY REPORT GENERATED")
    print("=" * 50)
    print(f"📊 Overall Success: {'✅ YES' if overall_success else '❌ NO'}")
    print(f"📁 Reports saved to: {RESULTS_PATH}")
    print(f"   - phase2_summary_report.json")
    print(f"   - phase2_summary_report.md")
    
    return report

# Generate final report
final_report = generate_phase2_summary_report()
print("\n🎉 Phase 2 Action-Trace Analysis Complete!")
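
Values produced by pandas and NumPy reductions are often NumPy scalars, and the standard `json` encoder rejects some of them (for example `np.bool_` and `np.int64`). One defensive option, sketched here, is a `default=` hook that converts them to built-in types; the name `to_builtin` is hypothetical.

```python
import json
import numpy as np


def to_builtin(obj):
    """json `default=` hook: convert NumPy scalars and arrays to built-ins."""
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


metrics = {
    "sharpe": np.float64(0.42),
    "passed": np.float64(0.42) >= 0.3,  # np.bool_, not a built-in bool
    "episodes": np.int64(150),
}

encoded = json.dumps(metrics, default=to_builtin)
decoded = json.loads(encoded)
print(decoded)
```

Passing `default=to_builtin` to the `json.dump` call in the report function would apply the same conversion to every nested value, including any entries in `top_checkpoints`.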

## 5. Team A Customization Notes

**🔧 CUSTOMIZATION REQUIRED:**

1. **Data Loading Paths**: Update `PHASE2_PATHS` with actual training run directories
2. **Action Trace Format**: Modify `load_phase2_action_traces()` based on your actual logging format
3. **Reward Components**: Customize `plot_reward_components_timeseries()` for your specific reward system
4. **Action Space**: Update action interpretation in `plot_pnl_actions_overlay()` based on your 5-action space
5. **Position Tracking**: Verify position change detection logic matches your environment
6. **Drawdown Calculation**: Implement your specific drawdown methodology in `plot_drawdown_holding_scatter()`

**✅ READY TO USE:**
- Checkpoint ranking by ep_rew_mean (primary) and Sharpe (secondary)
- Success criteria validation (Sharpe ≥ 0.3, ep_rew_mean ≥ 0.1)
- Professional visualization templates
- Automated report generation (JSON + Markdown)
- Results saved to `../results/phase2/`

**🎯 TEAM A TODO:**
1. Run Phase 2 OOS training (3 seeds × 50K steps)
2. Update paths in Configuration section
3. Customize visualization functions for your data structure
4. Execute notebook after training completion
5. Review generated reports and visualizations
6. Share results with stakeholders for Phase 3 approval
