# 🔀 Objective 2: Voting Method Comparison - Core Analysis
## MCM Problem C 2026

**Goal:** Apply BOTH voting methods (rank and percent) to ALL 34 seasons and create a "counterfactual history" of DWTS.

### Key Questions:
1. How often do the two methods produce different elimination outcomes?
2. Which method favors fan votes more?
3. What patterns emerge in method disagreement?

### Table of Contents
1. [Setup & Data Loading](#1-setup)
2. [Method Simulation Functions](#2-methods)
3. [Counterfactual History Generation](#3-counterfactual)
4. [Basic Divergence Statistics](#4-divergence)
5. [Uncertainty-Aware Analysis (Monte Carlo)](#5-uncertainty)

---

## 1. Setup & Data Loading <a id='1-setup'></a>

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from scipy.stats import rankdata, spearmanr
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Display settings
pd.set_option('display.max_columns', 60)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

print("✓ Libraries loaded")

In [None]:
# Load data
DATA_PATH = Path('../../data')

# Original contestant data
df = pd.read_csv(DATA_PATH / '2026_MCM_Problem_C_Data.csv', na_values=['N/A', 'n/a', ''])

# Fan vote estimates from Objective 1
fan_votes_df = pd.read_csv(DATA_PATH / 'obj1' / 'fan_vote_estimates.csv')

print(f"Original data: {df.shape[0]} contestants")
print(f"Fan vote estimates: {fan_votes_df.shape[0]} records")
print(f"\nFan votes columns: {list(fan_votes_df.columns)}")

In [None]:
# Check for uncertainty bounds from Objective 1
has_uncertainty = 'fan_votes_min' in fan_votes_df.columns and 'fan_votes_max' in fan_votes_df.columns

if has_uncertainty:
    print("✓ Uncertainty bounds available")
    print(f"  - fan_votes_min: {fan_votes_df['fan_votes_min'].notna().sum()} values")
    print(f"  - fan_votes_max: {fan_votes_df['fan_votes_max'].notna().sum()} values")
else:
    print("⚠️ No uncertainty bounds - will use point estimates only")

# Overview
print(f"\nSeasons: {fan_votes_df['season'].min()} to {fan_votes_df['season'].max()}")
# Count distinct (season, week) pairs with an elimination, not eliminated rows
elim_weeks = fan_votes_df.loc[fan_votes_df['was_eliminated'], ['season', 'week']].drop_duplicates()
print(f"Unique weeks with eliminations: {len(elim_weeks)}")

---

## 2. Method Simulation Functions <a id='2-methods'></a>

### Mathematical Definitions

**Rank-Based Method (S1-2, S28-34):**
$$S_i = R_i^{judge} + R_i^{fan}$$
Eliminated: $\arg\max_i S_i$ (highest combined rank = worst)

**Percent-Based Method (S3-27):**
$$S_i = P_i^{judge} + P_i^{fan} = \frac{J_i}{\sum J} + \frac{F_i}{\sum F}$$
Eliminated: $\arg\min_i S_i$ (lowest combined percentage = worst)
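
To make the two definitions concrete, here is a minimal dependency-free sketch with three hypothetical contestants. The scores and vote counts are invented for illustration, and `average_ranks_desc` is a helper name introduced here, not part of the notebook. Ties share the average rank, which is assumed to match the `tie_method='average'` default used in the simulation functions. In this toy case the two rules eliminate different contestants: the percent rule rewards the size of a fan-vote lead, while the rank rule only sees its order.

```python
def average_ranks_desc(values):
    """Rank values so the largest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical week: contestant 0 is the judges' favorite but has few fan votes
judge = [30, 20, 29]
fans = [1.0e6, 3.0e6, 6.0e6]

# Rank method: S_i = R_i^judge + R_i^fan, eliminate the largest sum
rank_sum = [rj + rf for rj, rf in zip(average_ranks_desc(judge), average_ranks_desc(fans))]
rank_out = max(range(len(judge)), key=rank_sum.__getitem__)

# Percent method: S_i = J_i / sum(J) + F_i / sum(F), eliminate the smallest sum
pct_sum = [j / sum(judge) + f / sum(fans) for j, f in zip(judge, fans)]
pct_out = min(range(len(judge)), key=pct_sum.__getitem__)

print("rank sums:", rank_sum, "-> eliminates index", rank_out)
print("pct sums:", [round(s, 3) for s in pct_sum], "-> eliminates index", pct_out)
```

Here the rank method eliminates contestant 1 (rank sum 5), while the percent method eliminates contestant 0, whose 10% fan share outweighs the top judge score.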

In [None]:
def simulate_rank_elimination(judge_scores, fan_votes, tie_method='average'):
    """
    Simulate elimination under RANK-BASED method.
    
    Args:
        judge_scores: Array of judge scores (higher = better)
        fan_votes: Array of fan votes (higher = better)
        tie_method: How to handle ties in ranking
        
    Returns:
        dict with elimination index, combined ranks, and metadata
    """
    n = len(judge_scores)
    
    # Rank in descending order (1 = highest score/most votes)
    judge_ranks = rankdata(-np.array(judge_scores), method=tie_method)
    fan_ranks = rankdata(-np.array(fan_votes), method=tie_method)
    
    # Combined rank sum (higher = worse)
    combined_ranks = judge_ranks + fan_ranks
    
    # Eliminated = highest combined rank
    eliminated_idx = np.argmax(combined_ranks)
    
    # Margin: how much higher than 2nd-worst?
    sorted_ranks = np.sort(combined_ranks)
    margin = combined_ranks[eliminated_idx] - sorted_ranks[-2] if n > 1 else 0
    
    return {
        'eliminated_idx': eliminated_idx,
        'judge_ranks': judge_ranks,
        'fan_ranks': fan_ranks,
        'combined_ranks': combined_ranks,
        'margin': margin
    }


def simulate_percent_elimination(judge_scores, fan_votes):
    """
    Simulate elimination under PERCENT-BASED method.
    
    Args:
        judge_scores: Array of judge scores
        fan_votes: Array of fan votes
        
    Returns:
        dict with elimination index, combined percentages, and metadata
    """
    judge_scores = np.array(judge_scores)
    fan_votes = np.array(fan_votes)
    
    # Convert to percentages
    judge_pct = judge_scores / judge_scores.sum()
    fan_pct = fan_votes / fan_votes.sum()
    
    # Combined percentage (higher = better)
    combined_pct = judge_pct + fan_pct
    
    # Eliminated = lowest combined percentage
    eliminated_idx = np.argmin(combined_pct)
    
    # Margin: how much lower than 2nd-lowest?
    sorted_pct = np.sort(combined_pct)
    margin = sorted_pct[1] - combined_pct[eliminated_idx] if len(judge_scores) > 1 else 0
    
    return {
        'eliminated_idx': eliminated_idx,
        'judge_pct': judge_pct,
        'fan_pct': fan_pct,
        'combined_pct': combined_pct,
        'margin': margin
    }


def simulate_judges_bottom2(judge_scores, fan_votes, method='percent'):
    """
    Simulate the S28+ rule: identify bottom 2, then judges choose.
    
    Bottom 2 is determined by combined scores. The judges' choice is
    modeled as eliminating whichever of the two has the lower judge score.
    """
    judge_scores = np.array(judge_scores)
    fan_votes = np.array(fan_votes)
    
    if method == 'percent':
        result = simulate_percent_elimination(judge_scores, fan_votes)
        combined = result['combined_pct']
        # Bottom 2 = two lowest combined percentages (worst first)
        bottom2_idx = np.argsort(combined)[:2]
    else:  # rank
        result = simulate_rank_elimination(judge_scores, fan_votes)
        combined = result['combined_ranks']
        # Bottom 2 = two highest combined ranks (worst first)
        bottom2_idx = np.argsort(combined)[-2:][::-1]
    
    # Judges choose: eliminate the one with the lower judge score.
    # On a judge-score tie, fall back to the worse combined score (index 0).
    if judge_scores[bottom2_idx[0]] <= judge_scores[bottom2_idx[1]]:
        eliminated_idx = bottom2_idx[0]
    else:
        eliminated_idx = bottom2_idx[1]
    
    return {
        'eliminated_idx': eliminated_idx,
        'bottom2_idx': bottom2_idx,
        'combined': combined
    }


print("✓ Simulation functions defined")

In [None]:
# Test the functions with example data
example_judge = [28, 25, 30, 22]
example_fan = [2.5e6, 3.2e6, 1.8e6, 2.0e6]
names = ['Alice', 'Bob', 'Carol', 'Dave']

rank_result = simulate_rank_elimination(example_judge, example_fan)
pct_result = simulate_percent_elimination(example_judge, example_fan)
b2_result = simulate_judges_bottom2(example_judge, example_fan, method='percent')

print("Example with 4 contestants:")
print(f"  Judge scores: {example_judge}")
print(f"  Fan votes (M): {[v/1e6 for v in example_fan]}")
print()
print(f"RANK method → Eliminates: {names[rank_result['eliminated_idx']]}")
print(f"  Combined ranks: {rank_result['combined_ranks']}")
print()
print(f"PERCENT method → Eliminates: {names[pct_result['eliminated_idx']]}")
print(f"  Combined %: {[f'{p:.1%}' for p in pct_result['combined_pct']]}")
print()
print(f"JUDGES BOTTOM 2 → Eliminates: {names[b2_result['eliminated_idx']]}")
print(f"  Bottom 2: {[names[i] for i in b2_result['bottom2_idx']]}")

---

## 3. Counterfactual History Generation <a id='3-counterfactual'></a>

Apply BOTH methods to ALL seasons, creating a complete counterfactual history.

In [None]:
def generate_counterfactual_history(fan_votes_df):
    """
    Generate counterfactual elimination history under the rank method,
    the percent method, and the judges'-bottom-2 variant of each.
    
    Returns DataFrame with actual vs counterfactual eliminations.
    """
    results = []
    
    # Group by season and week
    grouped = fan_votes_df.groupby(['season', 'week'])
    
    for (season, week), week_df in tqdm(grouped, desc="Processing weeks"):
        # Skip weeks with no elimination
        if week_df['was_eliminated'].sum() == 0:
            continue
        
        # Get data
        judge_scores = week_df['judge_score'].values
        fan_votes = week_df['fan_votes_estimate'].values
        contestants = week_df['celebrity_name'].values
        actual_method = week_df['method'].iloc[0]
        
        # Find actual eliminated contestant
        # (if several are flagged in one week, only the first is compared)
        actual_elim_mask = week_df['was_eliminated'].values
        actual_elim_idx = np.where(actual_elim_mask)[0][0]
        actual_elim_name = contestants[actual_elim_idx]
        
        # Simulate rank, percent, and both judges'-bottom-2 variants
        rank_result = simulate_rank_elimination(judge_scores, fan_votes)
        pct_result = simulate_percent_elimination(judge_scores, fan_votes)
        b2_rank_result = simulate_judges_bottom2(judge_scores, fan_votes, method='rank')
        b2_pct_result = simulate_judges_bottom2(judge_scores, fan_votes, method='percent')
        
        # Store results
        results.append({
            'season': season,
            'week': week,
            'n_contestants': len(contestants),
            'actual_method': actual_method,
            'actual_eliminated': actual_elim_name,
            'actual_elim_idx': actual_elim_idx,
            
            # Counterfactual eliminations
            'rank_would_eliminate': contestants[rank_result['eliminated_idx']],
            'pct_would_eliminate': contestants[pct_result['eliminated_idx']],
            'b2_rank_would_eliminate': contestants[b2_rank_result['eliminated_idx']],
            'b2_pct_would_eliminate': contestants[b2_pct_result['eliminated_idx']],
            
            # Agreement flags
            'rank_matches_actual': contestants[rank_result['eliminated_idx']] == actual_elim_name,
            'pct_matches_actual': contestants[pct_result['eliminated_idx']] == actual_elim_name,
            'methods_agree': rank_result['eliminated_idx'] == pct_result['eliminated_idx'],
            
            # Margins (how clearcut was the elimination?)
            'rank_margin': rank_result['margin'],
            'pct_margin': pct_result['margin'],
            
            # Judge-Fan alignment
            'jfac': spearmanr(judge_scores, fan_votes)[0] if len(judge_scores) > 2 else np.nan
        })
    
    return pd.DataFrame(results)

print("✓ Counterfactual generator defined")

In [None]:
# Generate counterfactual history
print("Generating counterfactual history for all seasons...")
counterfactual_df = generate_counterfactual_history(fan_votes_df)

print(f"\n✓ Generated {len(counterfactual_df)} elimination records")
print(f"  Seasons: {counterfactual_df['season'].min()} to {counterfactual_df['season'].max()}")

counterfactual_df.head(10)

In [None]:
# Save counterfactual history
OUTPUT_PATH = DATA_PATH / 'obj2'
OUTPUT_PATH.mkdir(parents=True, exist_ok=True)  # also create missing parent folders

counterfactual_df.to_csv(OUTPUT_PATH / 'counterfactual_history.csv', index=False)
print(f"✓ Saved to {OUTPUT_PATH / 'counterfactual_history.csv'}")

---

## 4. Basic Divergence Statistics <a id='4-divergence'></a>

How often do the methods disagree?

In [None]:
print("="*70)
print("OUTCOME DIVERGENCE ANALYSIS")
print("="*70)

# Overall statistics
total_weeks = len(counterfactual_df)
methods_agree = counterfactual_df['methods_agree'].sum()
methods_disagree = total_weeks - methods_agree

print(f"\nüìä Overall Results (Point Estimates):")
print(f"   Total elimination weeks: {total_weeks}")
print(f"   Methods AGREE: {methods_agree} ({methods_agree/total_weeks:.1%})")
print(f"   Methods DISAGREE: {methods_disagree} ({methods_disagree/total_weeks:.1%})")

# By actual method used
print(f"\nüìà Disagreement by Actual Method Used:")
for method, group in counterfactual_df.groupby('actual_method'):
    n = len(group)
    disagree = (~group['methods_agree']).sum()
    print(f"   {method.upper()}: {disagree}/{n} = {disagree/n:.1%} disagreement")

In [None]:
# Show all cases where methods disagree
disagreements = counterfactual_df[~counterfactual_df['methods_agree']].copy()
disagreements = disagreements.sort_values(['season', 'week'])

print(f"\nüî¥ ALL {len(disagreements)} DISAGREEMENT CASES:")
print("="*90)

display_cols = ['season', 'week', 'actual_eliminated', 'rank_would_eliminate', 
                'pct_would_eliminate', 'actual_method', 'jfac']

if len(disagreements) > 0:
    print(disagreements[display_cols].to_string(index=False))
else:
    print("No disagreements found!")

In [None]:
# Visualize disagreement patterns
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: Disagreement by season
ax1 = axes[0, 0]
season_disagree = counterfactual_df.groupby('season').agg({
    'methods_agree': lambda x: (~x).sum(),
    'week': 'count'
}).reset_index()
season_disagree.columns = ['season', 'disagreements', 'total_weeks']
season_disagree['disagree_pct'] = season_disagree['disagreements'] / season_disagree['total_weeks'] * 100

# Color by actual method
colors = ['steelblue' if s in [1, 2] or s >= 28 else 'coral' 
          for s in season_disagree['season']]

ax1.bar(season_disagree['season'], season_disagree['disagree_pct'], color=colors, edgecolor='black')
ax1.axhline(methods_disagree/total_weeks*100, color='red', linestyle='--', 
            label=f'Overall: {methods_disagree/total_weeks:.1%}')
ax1.set_xlabel('Season')
ax1.set_ylabel('Disagreement Rate (%)')
ax1.set_title('Method Disagreement by Season\n(Blue=Rank seasons, Orange=Percent seasons)')
ax1.legend()

# Plot 2: Disagreement by week number
ax2 = axes[0, 1]
week_disagree = counterfactual_df.groupby('week').agg({
    'methods_agree': lambda x: (~x).mean() * 100
}).reset_index()
week_disagree.columns = ['week', 'disagree_pct']

ax2.bar(week_disagree['week'], week_disagree['disagree_pct'], color='purple', edgecolor='black')
ax2.set_xlabel('Week')
ax2.set_ylabel('Disagreement Rate (%)')
ax2.set_title('Method Disagreement by Week Number')

# Plot 3: Disagreement vs Judge-Fan Alignment (JFAC)
ax3 = axes[1, 0]
valid_jfac = counterfactual_df[counterfactual_df['jfac'].notna()].copy()
agree_jfac = valid_jfac[valid_jfac['methods_agree']]['jfac']
disagree_jfac = valid_jfac[~valid_jfac['methods_agree']]['jfac']

ax3.hist(agree_jfac, bins=20, alpha=0.5, label=f'Agree (n={len(agree_jfac)})', color='green')
ax3.hist(disagree_jfac, bins=20, alpha=0.5, label=f'Disagree (n={len(disagree_jfac)})', color='red')
ax3.axvline(agree_jfac.mean(), color='green', linestyle='--', linewidth=2)
ax3.axvline(disagree_jfac.mean(), color='red', linestyle='--', linewidth=2)
ax3.set_xlabel('Judge-Fan Alignment (JFAC)')
ax3.set_ylabel('Count')
ax3.set_title('JFAC Distribution: Agreement vs Disagreement Cases')
ax3.legend()

# Plot 4: Disagreement by number of contestants
ax4 = axes[1, 1]
n_disagree = counterfactual_df.groupby('n_contestants').agg({
    'methods_agree': lambda x: (~x).mean() * 100,
    'week': 'count'
}).reset_index()
n_disagree.columns = ['n_contestants', 'disagree_pct', 'count']

ax4.bar(n_disagree['n_contestants'], n_disagree['disagree_pct'], color='teal', edgecolor='black')
ax4.set_xlabel('Number of Contestants')
ax4.set_ylabel('Disagreement Rate (%)')
ax4.set_title('Method Disagreement by Remaining Contestants')

plt.tight_layout()
plt.savefig(OUTPUT_PATH / 'disagreement_patterns.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\n✓ Saved to {OUTPUT_PATH / 'disagreement_patterns.png'}")

In [None]:
# Key insight: When do methods disagree?
print("="*70)
print("KEY INSIGHT: When Do Methods Disagree?")
print("="*70)

if len(disagree_jfac) > 0:
    print(f"\n📊 Judge-Fan Alignment (JFAC):")
    print(f"   When methods AGREE: Mean JFAC = {agree_jfac.mean():.3f}")
    print(f"   When methods DISAGREE: Mean JFAC = {disagree_jfac.mean():.3f}")
    # Report the direction actually observed instead of assuming it
    jfac_dir = "LESS" if disagree_jfac.mean() < agree_jfac.mean() else "NOT less"
    print(f"   → In disagreement weeks, judges and fans are {jfac_dir} aligned on average")

# Margin analysis
print(f"\n📊 Elimination Margins:")
agree_margin_rank = counterfactual_df[counterfactual_df['methods_agree']]['rank_margin'].mean()
disagree_margin_rank = counterfactual_df[~counterfactual_df['methods_agree']]['rank_margin'].mean()
print(f"   Rank margin when AGREE: {agree_margin_rank:.2f}")
print(f"   Rank margin when DISAGREE: {disagree_margin_rank:.2f}")
# Same idea: state the comparison the data supports
margin_dir = "CLOSER" if disagree_margin_rank < agree_margin_rank else "NOT closer"
print(f"   → Disagreement weeks have {margin_dir} rank-method eliminations on average")

---

## 5. Uncertainty-Aware Analysis (Monte Carlo) <a id='5-uncertainty'></a>

**Critical Question:** Our fan vote estimates have uncertainty. How robust are our counterfactual conclusions?

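We sample each contestant's fan votes uniformly between the Objective 1 bounds (falling back to ±20% around the point estimate when bounds are missing), rescale each draw to the estimated vote total, and record how often the two methods disagree.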
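
One thing worth keeping in mind when reading the per-week probabilities: each is a proportion estimated from a finite number of draws, so it carries sampling noise of its own. A rough sketch, assuming independent draws and using the standard binomial formula (`mc_standard_error` is a hypothetical helper name, not defined elsewhere in this notebook):

```python
import math


def mc_standard_error(p_hat, n_samples):
    """Binomial standard error of a proportion estimated from n_samples independent draws."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n_samples)


# The error is largest at p_hat = 0.5, so that value gives an upper bound
for n in (500, 1000):
    se = mc_standard_error(0.5, n)
    print(f"n={n}: SE <= {se:.3f}, approx. 95% half-width {1.96 * se:.3f}")
```

With 500 draws the worst-case standard error is about 0.022, so a week's estimated P(disagree) is good to roughly ±4 percentage points. That is fine for sorting weeks into broad bands, but small differences between weeks should not be over-read.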

In [None]:
def monte_carlo_counterfactual(week_df, n_samples=1000):
    """
    Run Monte Carlo simulation over uncertainty bounds.
    
    For each sample, draw fan votes uniformly from [min, max] bounds
    and check if methods agree/disagree.
    
    Returns probability of disagreement.
    """
    judge_scores = week_df['judge_score'].values
    fan_votes_est = week_df['fan_votes_estimate'].values
    
    # Check that BOTH uncertainty bounds exist for every contestant this week
    has_bounds = (
        {'fan_votes_min', 'fan_votes_max'} <= set(week_df.columns)
        and week_df[['fan_votes_min', 'fan_votes_max']].notna().all().all()
    )
    if has_bounds:
        fan_min = week_df['fan_votes_min'].values
        fan_max = week_df['fan_votes_max'].values
    else:
        # If no bounds, use point estimate ± 20% as proxy
        fan_min = fan_votes_est * 0.8
        fan_max = fan_votes_est * 1.2
    
    n = len(judge_scores)
    
    # Track outcomes
    rank_elims = []
    pct_elims = []
    agree_count = 0
    
    for _ in range(n_samples):
        # Sample fan votes uniformly from bounds
        fan_sample = np.random.uniform(fan_min, fan_max)
        
        # Rescale so each draw keeps the point-estimate vote total
        # (rescaled values can land slightly outside [min, max])
        fan_sample = fan_sample / fan_sample.sum() * fan_votes_est.sum()
        
        # Simulate both methods
        rank_result = simulate_rank_elimination(judge_scores, fan_sample)
        pct_result = simulate_percent_elimination(judge_scores, fan_sample)
        
        rank_elims.append(rank_result['eliminated_idx'])
        pct_elims.append(pct_result['eliminated_idx'])
        
        if rank_result['eliminated_idx'] == pct_result['eliminated_idx']:
            agree_count += 1
    
    # Compute statistics
    rank_elim_probs = np.bincount(rank_elims, minlength=n) / n_samples
    pct_elim_probs = np.bincount(pct_elims, minlength=n) / n_samples
    
    return {
        'p_agree': agree_count / n_samples,
        'p_disagree': 1 - agree_count / n_samples,
        'rank_elim_probs': rank_elim_probs,
        'pct_elim_probs': pct_elim_probs,
        'n_samples': n_samples
    }


print("✓ Monte Carlo counterfactual function defined")

In [None]:
# Run Monte Carlo on all weeks
np.random.seed(42)  # fixed seed so reruns give the same probabilities (value is arbitrary)
print("Running Monte Carlo uncertainty analysis...")
print("(This may take a few minutes)")

mc_results = []

grouped = fan_votes_df.groupby(['season', 'week'])

for (season, week), week_df in tqdm(grouped, desc="Monte Carlo"):
    if week_df['was_eliminated'].sum() == 0:
        continue
    
    mc = monte_carlo_counterfactual(week_df, n_samples=500)
    
    mc_results.append({
        'season': season,
        'week': week,
        'p_agree': mc['p_agree'],
        'p_disagree': mc['p_disagree']
    })

mc_df = pd.DataFrame(mc_results)
print(f"\n✓ Monte Carlo complete for {len(mc_df)} weeks")

In [None]:
# Merge with counterfactual results
counterfactual_df = counterfactual_df.merge(mc_df, on=['season', 'week'], how='left')

# Summary statistics
print("="*70)
print("UNCERTAINTY-AWARE DISAGREEMENT ANALYSIS")
print("="*70)

# Point estimate said disagree - what does MC say?
point_disagree = counterfactual_df[~counterfactual_df['methods_agree']]
point_agree = counterfactual_df[counterfactual_df['methods_agree']]

print(f"\nüìä Point Estimate: {len(point_disagree)} disagreements")
print(f"\n   Of those 'disagreements':")
print(f"   - Always disagree (p_disagree > 0.95): {(point_disagree['p_disagree'] > 0.95).sum()}")
print(f"   - Usually disagree (p_disagree > 0.75): {(point_disagree['p_disagree'] > 0.75).sum()}")
print(f"   - Uncertain (0.25 < p < 0.75): {((point_disagree['p_disagree'] > 0.25) & (point_disagree['p_disagree'] < 0.75)).sum()}")
print(f"   - Usually agree (p_disagree < 0.25): {(point_disagree['p_disagree'] < 0.25).sum()}")

print(f"\nüìä Mean P(disagree) across all weeks: {mc_df['p_disagree'].mean():.1%}")
print(f"   95% CI: [{mc_df['p_disagree'].quantile(0.025):.1%}, {mc_df['p_disagree'].quantile(0.975):.1%}]")

In [None]:
# Visualize uncertainty in disagreement
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Plot 1: Distribution of P(disagree)
ax1 = axes[0]
ax1.hist(mc_df['p_disagree'], bins=30, color='purple', alpha=0.7, edgecolor='black')
ax1.axvline(0.5, color='red', linestyle='--', label='50% threshold')
ax1.axvline(mc_df['p_disagree'].mean(), color='orange', linestyle='-', linewidth=2,
            label=f'Mean: {mc_df["p_disagree"].mean():.1%}')
ax1.set_xlabel('P(Methods Disagree)')
ax1.set_ylabel('Count')
ax1.set_title('Distribution of Disagreement Probability\n(Monte Carlo, 500 samples per week)')
ax1.legend()

# Plot 2: P(disagree) by season
ax2 = axes[1]
# Group on mc_df's own season column rather than relying on index alignment with counterfactual_df
season_mc = mc_df.groupby('season')['p_disagree'].mean()
colors = ['steelblue' if s in [1, 2] or s >= 28 else 'coral' for s in season_mc.index]
ax2.bar(season_mc.index, season_mc.values * 100, color=colors, edgecolor='black')
ax2.axhline(mc_df['p_disagree'].mean() * 100, color='red', linestyle='--',
            label=f'Overall mean: {mc_df["p_disagree"].mean():.1%}')
ax2.set_xlabel('Season')
ax2.set_ylabel('Mean P(Disagree) %')
ax2.set_title('Uncertainty-Aware Disagreement Rate by Season')
ax2.legend()

plt.tight_layout()
plt.savefig(OUTPUT_PATH / 'uncertainty_disagreement.png', dpi=150, bbox_inches='tight')
plt.show()

In [None]:
# Save enhanced counterfactual data
counterfactual_df.to_csv(OUTPUT_PATH / 'counterfactual_history_with_uncertainty.csv', index=False)
print(f"✓ Saved enhanced results to {OUTPUT_PATH / 'counterfactual_history_with_uncertainty.csv'}")

---

## Summary

### Key Findings:

In [None]:
print("="*70)
print("NOTEBOOK SUMMARY: Method Comparison Core Analysis")
print("="*70)

print(f"""
📊 OVERALL STATISTICS:
   - Total elimination weeks analyzed: {len(counterfactual_df)}
   - Point estimate disagreement rate: {(~counterfactual_df['methods_agree']).mean():.1%}
   - Monte Carlo mean P(disagree): {mc_df['p_disagree'].mean():.1%}

🔍 HYPOTHESES CHECKED (see the Section 4 and 5 output for the observed direction):
   1. Methods disagree more when Judge-Fan alignment (JFAC) is LOW
   2. Disagreements tend to occur in CLOSE eliminations (small margins)
   3. Uncertainty bounds matter: some point-estimate "disagreements" may become uncertain under MC

📁 FILES CREATED:
   - {OUTPUT_PATH / 'counterfactual_history.csv'}
   - {OUTPUT_PATH / 'counterfactual_history_with_uncertainty.csv'}
   - {OUTPUT_PATH / 'disagreement_patterns.png'}
   - {OUTPUT_PATH / 'uncertainty_disagreement.png'}

➡️ NEXT: See 06_divergence_analysis.ipynb for deeper divergence metrics
""")