# 📈 04: Modality Analysis

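**Purpose:** Compare Full-Frame, Face-Only and Face+Hands model performance, overall and per class, and use matched full-frame control runs to separate the effect of ROI cropping from the effect of ID filtering.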

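**Sections:**
1. Inline Setup
2. Configure Prediction Paths
3. Load & Compare Overall Metrics (Accuracy, Macro-F1)
4. Per-Class F1 Comparison (table + bar chart)
5. Delta Heatmap (thesis figure)
6. Confusion Matrix Comparison (thesis figure)
7. Control Run Analysis (5-run comparison)
8. Stability Analysis (multi-seed comparison)
9. Thesis Summary Exporter (CSV + LaTeX tables)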

**Prerequisites:** Prediction CSVs from `03_evaluation.ipynb` for every modality you want to compare.
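The loaders in this notebook expect each prediction CSV to have at least a `path` column, whose class folder (`c0` to `c9`) gives the ground-truth label, and a `pred_class_id` column holding either `"c3"`-style strings or plain integers. The sketch below is a standalone version of that parsing logic, assuming StateFarm-style folder names; the rows are hypothetical.

```python
from pathlib import Path

import pandas as pd


def class_from_path(path: str) -> int:
    """Return N for the last path component named 'cN' (single digit), else -1."""
    for part in reversed(Path(path).parts):
        if part.startswith("c") and len(part) == 2 and part[1].isdigit():
            return int(part[1])
    return -1


def parse_pred(value) -> int:
    """Accept 'c3'-style strings or plain integers."""
    if isinstance(value, str) and value.startswith("c"):
        return int(value[1:])
    return int(value)


# Hypothetical rows in the expected format
df = pd.DataFrame({
    "path": ["test/c3/img_001.jpg", "test/c0/img_002.jpg", "test/misc/img_003.jpg"],
    "pred_class_id": ["c3", 1, "c2"],
})
df["true"] = df["path"].apply(class_from_path)
df["pred"] = df["pred_class_id"].apply(parse_pred)
df = df[df["true"] >= 0]  # rows without a class folder are dropped
print(df[["true", "pred"]].values.tolist())  # [[3, 3], [0, 1]]
```

Rows whose path has no `cN` folder are silently dropped, so the `N` column reported later counts only labelled images.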


In [None]:
# --- INLINE SETUP ---
import os, subprocess, sys

REPO_DIRNAME   = "CNNs-distracted-driving"
PROJECT_ROOT   = f"/content/{REPO_DIRNAME}"
DRIVE_PATH     = "/content/drive/MyDrive/TFM"
OUT_ROOT       = f"{DRIVE_PATH}/outputs"
CKPT_ROOT      = f"{DRIVE_PATH}/checkpoints"

from google.colab import drive
drive.mount('/content/drive', force_remount=False)

if not os.path.isdir(PROJECT_ROOT):
    subprocess.call(f"git clone https://github.com/ClaudiaCPach/CNNs-distracted-driving {PROJECT_ROOT}", shell=True)
subprocess.call(f"pip install -q -e {PROJECT_ROOT}", shell=True)

os.environ["OUT_ROOT"] = OUT_ROOT
os.environ["CKPT_ROOT"] = CKPT_ROOT

sys.path.insert(0, PROJECT_ROOT)
sys.path.insert(0, os.path.join(PROJECT_ROOT, "src"))
print("✅ Setup complete")


## 📁 Section 2: Configure Prediction Paths

Set the paths to your prediction CSVs for each modality.

**5-Run Experimental Plan:**
| Run | Description | Prediction File Example |
|-----|-------------|-------------------------|
| 1 | Full-frame (all IDs) | `effb0_fullframe_v1_test.csv` |
| 2 | Face ROI (natural) | `effb0_face_v1_test.csv` |
| 3 | Face+Hands ROI (natural) | `effb0_face_hands_v1_test.csv` |
| 4 | Full-frame (facesubset control) | `effb0_fullframe_facesubset_v1_test.csv` |
| 5 | Full-frame (fhsubset control) | `effb0_fullframe_fhsubset_v1_test.csv` |

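**Key comparisons:**
- ROI vs full-frame: Run 2 or Run 3 vs Run 1 (confounded, because the ROI runs also use a smaller ID set)
- Control analysis: Run 2 vs Run 4 and Run 3 vs Run 5 (same IDs, different input), which isolates the ROI effect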
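Why the control runs matter: the gap between an ROI run and Run 1 mixes two changes, the ID set and the input representation. With a matched control, that gap splits exactly into an ID-filtering term (control minus Run 1) and an ROI term (ROI minus control). A small sketch with made-up accuracies, not results from this project; `decompose_gap` is a hypothetical helper.

```python
def decompose_gap(acc_full: float, acc_ctrl: float, acc_roi: float) -> dict:
    """Split (ROI - full-frame) into ID-filtering and ROI effects, in percentage points."""
    filter_effect = (acc_ctrl - acc_full) * 100  # same input, smaller ID set
    roi_effect = (acc_roi - acc_ctrl) * 100      # same IDs, different input
    total = (acc_roi - acc_full) * 100
    return {"total_pp": total, "filter_pp": filter_effect, "roi_pp": roi_effect}


# Illustrative numbers only
gap = decompose_gap(acc_full=0.90, acc_ctrl=0.92, acc_roi=0.95)
print({k: round(v, 2) for k, v in gap.items()})
# {'total_pp': 5.0, 'filter_pp': 2.0, 'roi_pp': 3.0}
```

The identity is pure arithmetic; reading the filtering term as "the subset is easier" assumes Run 1 and the control differ only in their ID sets.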


In [None]:
# Configure prediction file paths
from pathlib import Path

# ============== PREDICTION PATHS ==============
# Set these to match YOUR prediction files from 03_evaluation.ipynb
# Set any path to None to exclude it from comparison

# --- Natural runs (different ID sets) ---
FULL_FRAME_PREDS = Path(OUT_ROOT) / "preds/test/effb0_fullframe_v1_test.csv"        # Run 1: Full-frame, all IDs
FACE_ONLY_PREDS = Path(OUT_ROOT) / "preds/test/effb0_face_v1_test.csv"              # Run 2: Face ROI
FACE_HANDS_PREDS = Path(OUT_ROOT) / "preds/test/effb0_face_hands_v1_test.csv"       # Run 3: Face+Hands ROI

# --- Control runs (filtered to match ROI ID sets) ---
CONTROL_FACESUBSET_PREDS = Path(OUT_ROOT) / "preds/test/effb0_fullframe_facesubset_v1_test.csv"  # Run 4: Full-frame, face-available IDs
CONTROL_FHSUBSET_PREDS = Path(OUT_ROOT) / "preds/test/effb0_fullframe_fhsubset_v1_test.csv"      # Run 5: Full-frame, FH-available IDs

CLASS_NAMES = {
    0: "Safe driving", 1: "Texting (R)", 2: "Phone (R)", 3: "Texting (L)",
    4: "Phone (L)", 5: "Radio", 6: "Drinking", 7: "Reaching back",
    8: "Hair/makeup", 9: "Passenger",
}

# Verify files exist
preds_dir = Path(OUT_ROOT) / "preds" / "test"
print("Available prediction files:")
for f in sorted(preds_dir.glob("*.csv")):
    print(f"  - {f.name}")

print("\nüìã Configured paths:")
for name, path in [
    ("Full-frame (natural)", FULL_FRAME_PREDS),
    ("Face ROI", FACE_ONLY_PREDS),
    ("Face+Hands ROI", FACE_HANDS_PREDS),
    ("Control (facesubset)", CONTROL_FACESUBSET_PREDS),
    ("Control (fhsubset)", CONTROL_FHSUBSET_PREDS),
]:
    status = "✅" if path and path.exists() else "❌"
    print(f"  {status} {name}: {path.name if path else 'None'}")


## 📊 Section 3: Load & Compare Overall Metrics


In [None]:
# Load predictions and compute metrics
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.metrics import f1_score, precision_score, recall_score

def load_and_compute_metrics(pred_path, name):
    if pred_path is None or not Path(pred_path).exists():
        return None
    
    df = pd.read_csv(pred_path)
    
    def extract_class(path):
        parts = Path(path).parts
        for p in reversed(parts):
            if p.startswith("c") and len(p) == 2 and p[1].isdigit():
                return int(p[1])
        return -1
    
    df["true"] = df["path"].apply(extract_class)
    df["pred"] = df["pred_class_id"].apply(
        lambda x: int(x[1]) if isinstance(x, str) and x.startswith("c") else int(x)
    )
    df = df[df["true"] >= 0]
    
    y_true, y_pred = df["true"].values, df["pred"].values
    
    return {
        "accuracy": (y_true == y_pred).mean(),
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "weighted_f1": f1_score(y_true, y_pred, average="weighted", zero_division=0),
        "macro_precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "macro_recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "per_class_f1": f1_score(y_true, y_pred, average=None, labels=range(10), zero_division=0),
        "n_samples": len(df),
    }

# Load all modalities (natural runs + control runs)
modalities = {}

# Natural runs
if FULL_FRAME_PREDS and FULL_FRAME_PREDS.exists():
    modalities["Full Frame"] = load_and_compute_metrics(FULL_FRAME_PREDS, "full")
if FACE_ONLY_PREDS and FACE_ONLY_PREDS.exists():
    modalities["Face Only"] = load_and_compute_metrics(FACE_ONLY_PREDS, "face")
if FACE_HANDS_PREDS and FACE_HANDS_PREDS.exists():
    modalities["Face+Hands"] = load_and_compute_metrics(FACE_HANDS_PREDS, "face_hands")

# Control runs (for isolating filtering effect)
if CONTROL_FACESUBSET_PREDS and CONTROL_FACESUBSET_PREDS.exists():
    modalities["Ctrl-FaceSub"] = load_and_compute_metrics(CONTROL_FACESUBSET_PREDS, "ctrl_face")
if CONTROL_FHSUBSET_PREDS and CONTROL_FHSUBSET_PREDS.exists():
    modalities["Ctrl-FHSub"] = load_and_compute_metrics(CONTROL_FHSUBSET_PREDS, "ctrl_fh")

print(f"Loaded {len(modalities)} modalities: {list(modalities.keys())}")

# Highlight key comparison if both available
if "Face+Hands" in modalities and "Ctrl-FHSub" in modalities:
    print("\nüéØ KEY COMPARISON: Face+Hands vs Ctrl-FHSub (same IDs, different representation)")
    print(f"   Face+Hands: {modalities['Face+Hands']['accuracy']*100:.2f}% accuracy")
    print(f"   Ctrl-FHSub: {modalities['Ctrl-FHSub']['accuracy']*100:.2f}% accuracy")
    delta = (modalities['Face+Hands']['accuracy'] - modalities['Ctrl-FHSub']['accuracy']) * 100
    print(f"   Œî = {delta:+.2f} pp ({'ROI helps!' if delta > 0 else 'ROI hurts' if delta < 0 else 'No difference'})")


In [None]:
# Display overall metrics table
print("=" * 95)
print("📋 MAIN RESULTS TABLE: Accuracy + Macro-F1 + Weighted-F1")
print("=" * 95)
print(f"{'Model':<15} {'Accuracy':>10} {'Macro-F1':>10} {'Wgt-F1':>10} {'Macro-P':>10} {'Macro-R':>10} {'N':>8}")
print("-" * 95)
for mod_name, mod_data in modalities.items():
    if mod_data:
        print(f"{mod_name:<15} {mod_data['accuracy']*100:>9.2f}% {mod_data['macro_f1']*100:>9.2f}% "
              f"{mod_data['weighted_f1']*100:>9.2f}% {mod_data['macro_precision']*100:>9.2f}% "
              f"{mod_data['macro_recall']*100:>9.2f}% {mod_data['n_samples']:>8d}")
print("-" * 95)

# Best run per metric; runs use different ID sets, so this ranking is only indicative
loaded = {name: data for name, data in modalities.items() if data}
if loaded:
    best_acc = max(loaded.items(), key=lambda x: x[1]["accuracy"])
    best_f1 = max(loaded.items(), key=lambda x: x[1]["macro_f1"])
    print(f"\n🏆 Best Accuracy: {best_acc[0]} ({best_acc[1]['accuracy']*100:.2f}%)")
    print(f"🏆 Best Macro-F1: {best_f1[0]} ({best_f1[1]['macro_f1']*100:.2f}%)")
else:
    print("\n⚠️ No modalities loaded; check the paths in Section 2.")


## 📊 Section 4: Per-Class F1 Comparison


In [None]:
# Per-class F1 comparison table with deltas (including control runs)
import pandas as pd
import numpy as np
from pathlib import Path

def _class_f1(mod_name, c):
    """Per-class F1 for a loaded run, or None if that run is missing."""
    data = modalities.get(mod_name)
    return data["per_class_f1"][c] if data else None

def _fmt(value, signed=False):
    """Format a fraction as a percentage string; 'n/a' when missing."""
    if value is None:
        return "n/a"
    return f"{value*100:+.1f}" if signed else f"{value*100:.1f}"

def _diff(a, b):
    return a - b if a is not None and b is not None else None

table_rows = []
for c in range(10):
    f1_full = _class_f1("Full Frame", c)
    f1_face = _class_f1("Face Only", c)
    f1_fh = _class_f1("Face+Hands", c)
    f1_ctrl_face = _class_f1("Ctrl-FaceSub", c)
    f1_ctrl_fh = _class_f1("Ctrl-FHSub", c)

    table_rows.append({
        "Class": f"c{c}",
        "Name": CLASS_NAMES.get(c, f"Class {c}"),
        "F1 Full": _fmt(f1_full),
        "F1 Face": _fmt(f1_face),
        "F1 F+H": _fmt(f1_fh),
        "F1 Ctrl-Face": _fmt(f1_ctrl_face),
        "F1 Ctrl-FH": _fmt(f1_ctrl_fh),
        "Δ Face−Full": _fmt(_diff(f1_face, f1_full), signed=True),
        "Δ FH−Face": _fmt(_diff(f1_fh, f1_face), signed=True),
        # KEY: ROI vs matched control (same IDs, isolates the ROI effect)
        "Δ Face−Ctrl": _fmt(_diff(f1_face, f1_ctrl_face), signed=True),
        "Δ FH−Ctrl": _fmt(_diff(f1_fh, f1_ctrl_fh), signed=True),
    })

enhanced_df = pd.DataFrame(table_rows)
print("=" * 120)
print("📊 PER-CLASS F1 TABLE WITH DELTAS (for thesis)")
print("=" * 120)
print(enhanced_df.to_string(index=False))

# Explain the key columns
if "Ctrl-FHSub" in modalities or "Ctrl-FaceSub" in modalities:
    print("\n🎯 KEY COLUMNS: 'Δ Face−Ctrl' and 'Δ FH−Ctrl' show the ROI effect vs full-frame on the SAME IDs")
    print("   Positive = ROI representation helps for that class")
    print("   Negative = full-frame is better for that class")

# Save (create the metrics folder if it does not exist yet)
metrics_dir = Path(OUT_ROOT) / "metrics"
metrics_dir.mkdir(parents=True, exist_ok=True)
enhanced_df.to_csv(metrics_dir / "perclass_f1_with_deltas.csv", index=False)
print(f"\n💾 Saved to {metrics_dir / 'perclass_f1_with_deltas.csv'}")


In [None]:
# Per-class F1 bar chart
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

loaded = {name: data for name, data in modalities.items() if data}  # skip runs that failed to load
fig, ax = plt.subplots(figsize=(14, 6))
x = np.arange(10)
n_models = max(len(loaded), 1)  # avoid division by zero when nothing is loaded
width = 0.8 / n_models
offset = -width * (n_models - 1) / 2

# One color per run; controls use orange/green to match their Section 6 colormaps
colors = {
    "Full Frame": "#9B59B6", "Face Only": "#FF6B6B", "Face+Hands": "#45B7D1",
    "Ctrl-FaceSub": "#F39C12", "Ctrl-FHSub": "#27AE60",
}

for mod_name, mod_data in loaded.items():
    f1_scores = mod_data["per_class_f1"] * 100
    ax.bar(x + offset, f1_scores, width, label=mod_name, color=colors.get(mod_name, "gray"), edgecolor="white")
    offset += width

ax.set_xlabel("Class", fontsize=12)
ax.set_ylabel("F1 Score (%)", fontsize=12)
ax.set_title("Per-Class F1 Score by Run (natural and control)", fontsize=14, fontweight="bold")
ax.set_xticks(x)
ax.set_xticklabels([CLASS_NAMES.get(c, f"c{c}") for c in range(10)], rotation=45, ha="right")
ax.legend(loc="lower right")
ax.set_ylim(0, 105)
ax.grid(axis="y", alpha=0.3)

plt.tight_layout()
out_path = Path(OUT_ROOT) / "metrics" / "perclass_f1_comparison.png"
out_path.parent.mkdir(parents=True, exist_ok=True)
plt.savefig(out_path, dpi=150, bbox_inches="tight")
plt.show()
print(f"💾 Saved to {out_path}")


## 🎯 Section 5: Delta Heatmap (Thesis Figure)


In [None]:
# Delta heatmap showing F1 changes across modalities (natural runs only; ID sets differ)
import seaborn as sns

def _f1_or_nan(mod_name):
    """Per-class F1 array for a loaded run; all-NaN if missing, so its cells stay blank instead of faking a delta."""
    data = modalities.get(mod_name)
    return data["per_class_f1"] if data else np.full(10, np.nan)

f1_full = _f1_or_nan("Full Frame")
f1_face = _f1_or_nan("Face Only")
f1_fh = _f1_or_nan("Face+Hands")

delta_matrix = pd.DataFrame(
    {
        "Face − Full": (f1_face - f1_full) * 100,
        "F+H − Full": (f1_fh - f1_full) * 100,
        "F+H − Face": (f1_fh - f1_face) * 100,
    },
    index=pd.Index([CLASS_NAMES.get(c, f"c{c}") for c in range(10)], name="Class"),
)

fig, ax = plt.subplots(figsize=(8, 10))
# Colors saturate beyond ±50 pp; the annotations still show the exact values
sns.heatmap(delta_matrix, annot=True, fmt=".1f", cmap="RdYlGn", center=0, vmin=-50, vmax=50, ax=ax,
            linewidths=0.5, cbar_kws={"label": "F1 Change (pp)", "shrink": 0.8})
ax.set_title("Per-Class F1 Changes Across Modalities\nRed = Drop | Green = Gain", fontsize=12, fontweight="bold")
plt.tight_layout()

metrics_dir = Path(OUT_ROOT) / "metrics"
metrics_dir.mkdir(parents=True, exist_ok=True)
out_path = metrics_dir / "delta_heatmap_f1_modalities.png"
plt.savefig(out_path, dpi=150, bbox_inches="tight")
plt.show()
print(f"💾 Saved to {out_path}")

# Save CSV
delta_matrix.to_csv(metrics_dir / "delta_f1_by_class.csv")


## 🔍 Section 6: Confusion Matrix Comparison (Thesis Figure)


In [None]:
# Confusion matrix comparison: All modalities (natural + control)
from sklearn.metrics import confusion_matrix, f1_score as sklearn_f1

CLASS_NAMES_SHORT = ["Safe", "TxtR", "PhR", "TxtL", "PhL", "Radio", "Drink", "Reach", "Hair", "Pass"]

def load_preds_for_cm(pred_path):
    df = pd.read_csv(pred_path)
    def extract_class(path):
        for p in reversed(Path(path).parts):
            if p.startswith("c") and len(p) == 2 and p[1].isdigit():
                return int(p[1])
        return -1
    df["true"] = df["path"].apply(extract_class)
    df["pred"] = df["pred_class_id"].apply(lambda x: int(x[1]) if isinstance(x, str) and x.startswith("c") else int(x))
    return df[df["true"] >= 0]

# Load all (natural + control)
all_runs = [
    ("Full Frame", FULL_FRAME_PREDS),
    ("Face Only", FACE_ONLY_PREDS),
    ("Face+Hands", FACE_HANDS_PREDS),
    ("Ctrl-FaceSub", CONTROL_FACESUBSET_PREDS),
    ("Ctrl-FHSub", CONTROL_FHSUBSET_PREDS),
]

cms = {}
f1s = {}
for name, path in all_runs:
    if path and path.exists():
        df = load_preds_for_cm(path)
        cm = confusion_matrix(df["true"], df["pred"], labels=range(10))
        cms[name] = cm
        f1s[name] = sklearn_f1(df["true"], df["pred"], average='macro')

# Plot all available
n_plots = len(cms)
if n_plots == 0:
    print("⚠️ No prediction files found!")
else:
    ncols = min(3, n_plots)
    nrows = (n_plots + ncols - 1) // ncols
    fig, axes = plt.subplots(nrows, ncols, figsize=(6*ncols, 5*nrows))
    axes = np.array(axes).flatten() if n_plots > 1 else [axes]
    
    colors = {
        "Full Frame": "Purples", "Face Only": "Reds", "Face+Hands": "Blues",
        "Ctrl-FaceSub": "Oranges", "Ctrl-FHSub": "Greens",
    }
    
    for idx, (name, cm) in enumerate(cms.items()):
        ax = axes[idx]
        cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True) * 100
        cm_norm = np.nan_to_num(cm_norm)
        sns.heatmap(cm_norm, annot=True, fmt=".1f", cmap=colors.get(name, "Blues"), ax=ax,
                    xticklabels=CLASS_NAMES_SHORT, yticklabels=CLASS_NAMES_SHORT, vmin=0, vmax=100)
        ax.set_title(f"{name}\nMacro-F1: {f1s[name]*100:.1f}%", fontweight="bold")
        ax.set_xlabel("Predicted")
        ax.set_ylabel("True")
    
    # Hide empty axes
    for idx in range(n_plots, len(axes)):
        axes[idx].set_visible(False)
    
    plt.tight_layout()
    out_path = Path(OUT_ROOT) / "metrics" / "confusion_matrices_all.png"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    plt.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.show()
    print(f"💾 Saved to {out_path}")


## 🎯 Section 7: Control Run Analysis (5-Run Comparison)

This section compares ROI models against their matched full-frame controls to isolate the effect of ROI cropping vs ID filtering.
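Because each ROI run and its control are scored on the same image IDs, their predictions are paired, and a paired test such as McNemar's can indicate whether an accuracy difference exceeds chance disagreement. The sketch below is an exact (binomial) McNemar test, which is not part of the original notebook. It assumes both prediction tables can be aligned on a shared image ID, since ROI crops may live under different file paths; the `img_id` and `correct` columns and all values are hypothetical.

```python
import math

import pandas as pd


def mcnemar_exact(roi_correct, ctrl_correct) -> tuple:
    """Exact two-sided McNemar test on paired boolean correctness lists.

    b = ROI right & control wrong, c = ROI wrong & control right.
    Under H0 (equal accuracy), b ~ Binomial(b + c, 0.5).
    """
    b = sum(r and not k for r, k in zip(roi_correct, ctrl_correct))
    c = sum(k and not r for r, k in zip(roi_correct, ctrl_correct))
    n = b + c
    if n == 0:
        return b, c, 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical per-image correctness; in practice derive it from the two prediction CSVs
roi = pd.DataFrame({"img_id": [1, 2, 3, 4, 5, 6], "correct": [1, 1, 1, 1, 0, 1]})
ctrl = pd.DataFrame({"img_id": [6, 5, 4, 3, 2, 1], "correct": [0, 0, 1, 0, 1, 1]})
paired = roi.merge(ctrl, on="img_id", suffixes=("_roi", "_ctrl"))  # inner join keeps shared IDs only
b, c, p = mcnemar_exact(
    paired["correct_roi"].astype(bool).tolist(),
    paired["correct_ctrl"].astype(bool).tolist(),
)
print(b, c, round(p, 4))  # 2 0 0.5
```

Only the discordant images (`b` and `c`) carry information here, so two runs with a 1 pp accuracy gap can still be statistically indistinguishable on a small subset.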


In [None]:
# Control Run Analysis: isolate the ROI effect
# Question: is it the crop, or just the data subset, that matters?

# (ROI run, matched full-frame control, label, ID set)
CONTROL_PAIRS = [
    ("Face+Hands", "Ctrl-FHSub", "Face+Hands vs Ctrl-FHSub", "FH-available"),
    ("Face Only", "Ctrl-FaceSub", "Face vs Ctrl-FaceSub", "face-available"),
]
available_pairs = [p for p in CONTROL_PAIRS if modalities.get(p[0]) and modalities.get(p[1])]

if available_pairs:
    print("=" * 80)
    print("🎯 CONTROL RUN ANALYSIS: Isolating ROI Effect")
    print("=" * 80)

    comparisons = []
    for roi_name, ctrl_name, label, id_set in available_pairs:
        roi_data, ctrl_data = modalities[roi_name], modalities[ctrl_name]
        acc_delta = (roi_data["accuracy"] - ctrl_data["accuracy"]) * 100
        f1_delta = (roi_data["macro_f1"] - ctrl_data["macro_f1"]) * 100

        print(f"\n📊 {roi_name} ROI vs Full-Frame Control (same {id_set} IDs):")
        print(f"   {roi_name + ':':<15} Acc={roi_data['accuracy']*100:.2f}%  Macro-F1={roi_data['macro_f1']*100:.2f}%")
        print(f"   {ctrl_name + ':':<15} Acc={ctrl_data['accuracy']*100:.2f}%  Macro-F1={ctrl_data['macro_f1']*100:.2f}%")
        print(f"   Δ (ROI effect): Acc={acc_delta:+.2f}pp  Macro-F1={f1_delta:+.2f}pp")

        # Heuristic ±1 pp band on accuracy; this is not a significance test
        if acc_delta > 1:
            print("   ➡️  ROI cropping HELPS: the crop provides useful signal")
        elif acc_delta < -1:
            print("   ➡️  ROI cropping HURTS: full-frame retains important context")
        else:
            print("   ➡️  Minimal difference: ROI extraction neither helps nor hurts much")

        comparisons.append({
            "Comparison": label,
            "ROI_Acc": roi_data["accuracy"] * 100,
            "Ctrl_Acc": ctrl_data["accuracy"] * 100,
            "Δ_Acc_pp": acc_delta,
            "ROI_F1": roi_data["macro_f1"] * 100,
            "Ctrl_F1": ctrl_data["macro_f1"] * 100,
            "Δ_F1_pp": f1_delta,
        })

    # Save comparison table
    control_df = pd.DataFrame(comparisons)
    out_path = Path(OUT_ROOT) / "metrics" / "control_run_comparison.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    control_df.to_csv(out_path, index=False)
    print(f"\n💾 Saved control analysis to {out_path}")
    print("\n" + control_df.to_string(index=False))
else:
    print("⚠️ Control run analysis requires both an ROI run and its matched control predictions.")
    print("   Run experiments 4-5 (control runs) and generate their predictions first.")


## 📊 Section 8: Stability Analysis (Multi-Seed Comparison)

Compare multiple runs of the same configuration with different random seeds to assess training stability.
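With only a handful of seeds, the spread is usually reported as the sample standard deviation (`ddof=1`, Bessel's correction), which is larger than the population estimate that `np.std` returns by default (`ddof=0`). A short sketch with made-up accuracies:

```python
import numpy as np

accs = np.array([90.0, 91.0, 92.0])  # hypothetical accuracies (%) from three seeds

mean = accs.mean()
std_pop = accs.std()           # ddof=0: divides by n
std_sample = accs.std(ddof=1)  # ddof=1: divides by n - 1

print(f"{mean:.2f} ± {std_sample:.2f} (population std: {std_pop:.2f})")
# 91.00 ± 1.00 (population std: 0.82)
```

Whichever estimator you use, state it in the thesis next to the mean ± std values, since the two differ noticeably at n = 2 or 3.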


In [None]:
# Stability Analysis: Compare runs with different seeds
# This helps determine if results are robust or just lucky initialization

# ============== CONFIGURE SEED RUNS ==============
# Add paths to prediction files from same config but different seeds
# Format: (seed, pred_path)

SEED_RUNS = {
    "Full Frame": [
        # (42, Path(OUT_ROOT) / "preds/test/effb0_fullframe_v1_seed42_test.csv"),
        # (123, Path(OUT_ROOT) / "preds/test/effb0_fullframe_v1_seed123_test.csv"),
        # (456, Path(OUT_ROOT) / "preds/test/effb0_fullframe_v1_seed456_test.csv"),
    ],
    "Face+Hands": [
        # (42, Path(OUT_ROOT) / "preds/test/effb0_face_hands_v1_seed42_test.csv"),
        # (123, Path(OUT_ROOT) / "preds/test/effb0_face_hands_v1_seed123_test.csv"),
    ],
}

# ============== RUN STABILITY ANALYSIS ==============
stability_results = []

for config_name, seed_paths in SEED_RUNS.items():
    # Keep only prediction files that exist
    valid_runs = [(seed, Path(path)) for seed, path in seed_paths if Path(path).exists()]

    if len(valid_runs) < 2:
        print(f"⚠️ {config_name}: need at least 2 seed runs for stability analysis (found {len(valid_runs)})")
        continue

    print(f"\n{'='*70}")
    print(f"📊 STABILITY ANALYSIS: {config_name} ({len(valid_runs)} seeds)")
    print(f"{'='*70}")

    seed_metrics = []
    for seed, pred_path in valid_runs:
        metrics = load_and_compute_metrics(pred_path, f"{config_name}_seed{seed}")
        if metrics:
            metrics["seed"] = seed
            seed_metrics.append(metrics)
            print(f"   Seed {seed}: Acc={metrics['accuracy']*100:.2f}%  F1={metrics['macro_f1']*100:.2f}%")

    if len(seed_metrics) >= 2:
        accs = np.array([m["accuracy"] * 100 for m in seed_metrics])
        seed_f1s = np.array([m["macro_f1"] * 100 for m in seed_metrics])  # not `f1s`, which holds the Section 6 dict
        # Sample standard deviation (ddof=1), appropriate for a small number of seeds
        acc_std, f1_std = accs.std(ddof=1), seed_f1s.std(ddof=1)

        print("\n   📈 SUMMARY:")
        print(f"   Accuracy:  μ={accs.mean():.2f}% ± σ={acc_std:.2f}%  (range: {accs.min():.2f}-{accs.max():.2f}%)")
        print(f"   Macro-F1:  μ={seed_f1s.mean():.2f}% ± σ={f1_std:.2f}%  (range: {seed_f1s.min():.2f}-{seed_f1s.max():.2f}%)")

        # Heuristic labels based on the accuracy std (pp)
        if acc_std < 0.5:
            stability = "Very Stable ✅"
        elif acc_std < 1.0:
            stability = "Stable ✔️"
        elif acc_std < 2.0:
            stability = "Moderate ⚠️"
        else:
            stability = "Unstable ❌"

        print(f"   Stability: {stability}")

        stability_results.append({
            "Config": config_name,
            "N_Seeds": len(seed_metrics),
            "Acc_Mean": accs.mean(),
            "Acc_Std": acc_std,
            "Acc_Min": accs.min(),
            "Acc_Max": accs.max(),
            "F1_Mean": seed_f1s.mean(),
            "F1_Std": f1_std,
            "Stability": stability,
        })

# Save stability results
if stability_results:
    stability_df = pd.DataFrame(stability_results)
    out_path = Path(OUT_ROOT) / "metrics" / "stability_analysis.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    stability_df.to_csv(out_path, index=False)
    print(f"\n💾 Saved stability analysis to {out_path}")
    print("\n" + stability_df.to_string(index=False))
else:
    print("\n📋 To run stability analysis:")
    print("   1. Train the same config with different seeds (e.g., 42, 123, 456)")
    print("   2. Generate predictions for each")
    print("   3. Add the paths to the SEED_RUNS dict above")
    print("   4. Re-run this cell")


## 📝 Section 9: Thesis Summary Exporter

Generate publication-ready tables in CSV and LaTeX format for your thesis.
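The exporter writes cell values straight into LaTeX. The model and class names used here are safe, but characters such as `&`, `%`, `_` or `#` would break compilation if a name ever contained them (the per-class table currently escapes only `&`). A minimal escaping helper, offered as a sketch that the original cells do not use; `latex_escape` and `LATEX_SPECIALS` are hypothetical names. The generated tables also rely on `\toprule`/`\midrule`/`\bottomrule`, which need `\usepackage{booktabs}` in the thesis preamble.

```python
# Map each LaTeX special character to its escaped form
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def latex_escape(text) -> str:
    """Escape LaTeX specials one character at a time (so backslashes are never double-escaped)."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))


print(latex_escape("Hair/makeup & 50% of run_3"))
# Hair/makeup \& 50\% of run\_3
```

Escaping per character, rather than with chained `str.replace` calls, avoids re-escaping the backslashes that earlier replacements insert.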


In [None]:
# Thesis Summary Exporter: Generate LaTeX and CSV tables
# The LaTeX uses booktabs rules (\toprule, \midrule, \bottomrule): add \usepackage{booktabs} to the preamble

def generate_thesis_tables():
    """Generate publication-ready summary tables for the thesis."""

    # ========== TABLE 1: Main Results (5-Run Comparison) ==========
    print("=" * 80)
    print("📊 TABLE 1: Main Experimental Results (5-Run Plan)")
    print("=" * 80)

    main_results = []
    run_order = [
        ("Run 1", "Full Frame", "All IDs", "full-frame"),
        ("Run 2", "Face Only", "Face-available", "face ROI"),
        ("Run 3", "Face+Hands", "FH-available", "face+hands ROI"),
        ("Run 4", "Ctrl-FaceSub", "Face-available", "full-frame"),
        ("Run 5", "Ctrl-FHSub", "FH-available", "full-frame"),
    ]

    for run_id, mod_name, id_set, input_type in run_order:
        m = modalities.get(mod_name)
        if m:
            main_results.append({
                "Run": run_id,
                "Model": mod_name,
                "ID Set": id_set,
                "Input": input_type,
                "N": m["n_samples"],
                "Accuracy (%)": f"{m['accuracy']*100:.2f}",
                "Macro-F1 (%)": f"{m['macro_f1']*100:.2f}",
                "Precision (%)": f"{m['macro_precision']*100:.2f}",
                "Recall (%)": f"{m['macro_recall']*100:.2f}",
            })

    main_df = pd.DataFrame(main_results)
    print(main_df.to_string(index=False))

    # ========== TABLE 2: Control Comparison (ROI Effect) ==========
    print("\n" + "=" * 80)
    print("📊 TABLE 2: ROI Effect Analysis (Same IDs, Different Input)")
    print("=" * 80)

    roi_effect = []
    comparisons = [
        ("Face+Hands", "Ctrl-FHSub", "Face+Hands ROI vs Full-Frame"),
        ("Face Only", "Ctrl-FaceSub", "Face ROI vs Full-Frame"),
    ]

    for roi_name, ctrl_name, desc in comparisons:
        roi_m, ctrl_m = modalities.get(roi_name), modalities.get(ctrl_name)
        if roi_m and ctrl_m:
            roi_effect.append({
                "Comparison": desc,
                "ROI Acc (%)": f"{roi_m['accuracy']*100:.2f}",
                "Ctrl Acc (%)": f"{ctrl_m['accuracy']*100:.2f}",
                "Δ Acc (pp)": f"{(roi_m['accuracy']-ctrl_m['accuracy'])*100:+.2f}",
                "ROI F1 (%)": f"{roi_m['macro_f1']*100:.2f}",
                "Ctrl F1 (%)": f"{ctrl_m['macro_f1']*100:.2f}",
                "Δ F1 (pp)": f"{(roi_m['macro_f1']-ctrl_m['macro_f1'])*100:+.2f}",
            })

    roi_df = pd.DataFrame(roi_effect) if roi_effect else None
    if roi_df is not None:
        print(roi_df.to_string(index=False))

    # ========== SAVE CSV FILES ==========
    out_dir = Path(OUT_ROOT) / "metrics" / "thesis_tables"
    out_dir.mkdir(parents=True, exist_ok=True)

    main_df.to_csv(out_dir / "table1_main_results.csv", index=False)
    print(f"\n💾 Saved: {out_dir / 'table1_main_results.csv'}")

    if roi_df is not None:
        roi_df.to_csv(out_dir / "table2_roi_effect.csv", index=False)
        print(f"💾 Saved: {out_dir / 'table2_roi_effect.csv'}")

    # ========== GENERATE LATEX ==========
    print("\n" + "=" * 80)
    print("📝 LATEX TABLE 1: Main Results")
    print("=" * 80)

    latex_main = r"""\begin{table}[htbp]
\centering
\caption{5-Run Experimental Results: Accuracy and Macro-F1 across modalities}
\label{tab:main_results}
\begin{tabular}{llllrrr}
\toprule
Run & Model & ID Set & Input & N & Accuracy (\%) & Macro-F1 (\%) \\
\midrule
"""
    for _, row in main_df.iterrows():
        latex_main += f"{row['Run']} & {row['Model']} & {row['ID Set']} & {row['Input']} & {row['N']} & {row['Accuracy (%)']} & {row['Macro-F1 (%)']} \\\\\n"

    latex_main += r"""\bottomrule
\end{tabular}
\end{table}
"""
    print(latex_main)

    with open(out_dir / "table1_main_results.tex", "w", encoding="utf-8") as f:
        f.write(latex_main)
    print(f"💾 Saved: {out_dir / 'table1_main_results.tex'}")

    if roi_df is not None:
        print("\n" + "=" * 80)
        print("📝 LATEX TABLE 2: ROI Effect")
        print("=" * 80)

        latex_roi = r"""\begin{table}[htbp]
\centering
\caption{ROI Effect Analysis: Comparing ROI crops vs full-frame on identical image IDs}
\label{tab:roi_effect}
\begin{tabular}{lrrrrrr}
\toprule
Comparison & ROI Acc & Ctrl Acc & $\Delta$ Acc & ROI F1 & Ctrl F1 & $\Delta$ F1 \\
\midrule
"""
        for _, row in roi_df.iterrows():
            latex_roi += f"{row['Comparison']} & {row['ROI Acc (%)']} & {row['Ctrl Acc (%)']} & {row['Δ Acc (pp)']} & {row['ROI F1 (%)']} & {row['Ctrl F1 (%)']} & {row['Δ F1 (pp)']} \\\\\n"

        latex_roi += r"""\bottomrule
\end{tabular}
\end{table}
"""
        print(latex_roi)

        with open(out_dir / "table2_roi_effect.tex", "w", encoding="utf-8") as f:
            f.write(latex_roi)
        print(f"💾 Saved: {out_dir / 'table2_roi_effect.tex'}")

    return main_df, roi_df

# Generate tables
main_table, roi_table = generate_thesis_tables()


In [None]:
# Generate per-class F1 LaTeX table for thesis
print("=" * 80)
print("üìù LATEX TABLE 3: Per-Class F1 Scores")
print("=" * 80)

out_dir = Path(OUT_ROOT) / "metrics" / "thesis_tables"
out_dir.mkdir(parents=True, exist_ok=True)  # in case the previous cell was skipped

# Build per-class data
perclass_data = []
for c in range(10):
    row = {"Class": f"c{c}", "Name": CLASS_NAMES.get(c, f"Class {c}")}
    
    for mod_name in ["Full Frame", "Face Only", "Face+Hands", "Ctrl-FHSub"]:
        if mod_name in modalities and modalities[mod_name]:
            f1 = modalities[mod_name]["per_class_f1"][c] * 100
            row[mod_name] = f"{f1:.1f}"
        else:
            row[mod_name] = "n/a"
    
    perclass_data.append(row)

perclass_df = pd.DataFrame(perclass_data)

# Generate LaTeX
latex_perclass = r"""\begin{table}[htbp]
\centering
\caption{Per-class F1 scores (\%) across modalities}
\label{tab:perclass_f1}
\begin{tabular}{llrrrr}
\toprule
Class & Description & Full-Frame & Face & Face+Hands & Ctrl-FH \\
\midrule
"""

for _, row in perclass_df.iterrows():
    name_escaped = row["Name"].replace("&", r"\&")
    latex_perclass += f"{row['Class']} & {name_escaped} & {row.get('Full Frame', 'n/a')} & {row.get('Face Only', 'n/a')} & {row.get('Face+Hands', 'n/a')} & {row.get('Ctrl-FHSub', 'n/a')} \\\\\n"

latex_perclass += r"""\bottomrule
\end{tabular}
\end{table}
"""

print(latex_perclass)

# Save
with open(out_dir / "table3_perclass_f1.tex", "w") as f:
    f.write(latex_perclass)
print(f"üíæ Saved: {out_dir / 'table3_perclass_f1.tex'}")

perclass_df.to_csv(out_dir / "table3_perclass_f1.csv", index=False)
print(f"üíæ Saved: {out_dir / 'table3_perclass_f1.csv'}")


## ✅ Modality Analysis Complete!

**Outputs saved to Drive:**

📊 **Analysis Results:**
- `OUT_ROOT/metrics/perclass_f1_with_deltas.csv`: per-class F1 table with deltas
- `OUT_ROOT/metrics/perclass_f1_comparison.png`: per-class F1 bar chart
- `OUT_ROOT/metrics/delta_heatmap_f1_modalities.png`: delta heatmap (values also in `delta_f1_by_class.csv`)
- `OUT_ROOT/metrics/confusion_matrices_all.png`: confusion matrices (up to 5 runs)
- `OUT_ROOT/metrics/control_run_comparison.csv`: ROI vs matched-control analysis
- `OUT_ROOT/metrics/stability_analysis.csv`: multi-seed stability results (if available)

📝 **Thesis Tables (LaTeX + CSV):**
- `OUT_ROOT/metrics/thesis_tables/table1_main_results.tex`: main 5-run results
- `OUT_ROOT/metrics/thesis_tables/table2_roi_effect.tex`: ROI effect analysis
- `OUT_ROOT/metrics/thesis_tables/table3_perclass_f1.tex`: per-class F1 scores
- Each table is also saved as a `.csv` file with the same name

**5-Run Experimental Summary:**
| Run | Model | Role |
|-----|-------|------|
| 1 | Full Frame | Baseline (all IDs) |
| 2 | Face ROI | ROI input (face-available IDs) |
| 3 | Face+Hands ROI | ROI input (FH-available IDs) |
| 4 | Ctrl-FaceSub | Full-frame control, same IDs as Run 2 |
| 5 | Ctrl-FHSub | Full-frame control, same IDs as Run 3 |

**Key comparisons for thesis:**
- Run 3 vs Run 5 → isolates the ROI effect for face+hands (same IDs, different input)
- Run 2 vs Run 4 → isolates the ROI effect for face only

**Stability Analysis:**
- Add multi-seed runs to Section 8 to assess training variance
- Report mean ± std across seeds for robust conclusions

**Next steps:**
- Run **05_gradcam.ipynb** for attention visualizations
