# Example Predictions Figure

**STATUS: DEFERRED** - Waiting for evaluations to complete.

This notebook will generate a figure comparing example predictions across architectures:
- Side-by-side comparison of model outputs
- Ground truth vs predicted grids
- Highlighting differences between encoder types

Output: `docs/project-report/figures/example_predictions.png`

In [None]:
import json
import sys
from pathlib import Path

# Make figure_utils (expected in the current working directory) importable
sys.path.insert(0, str(Path.cwd()))

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np

from figure_utils import (
    setup_paper_style,
    save_figure,
    COLORS,
)

# Apply paper styling
setup_paper_style()

## TODO: Implementation

When evaluations complete, implement the following:

1. **Load evaluation results** from each final experiment (F1-F4)
2. **Select representative puzzles** showing:
   - Cases where models succeed
   - Cases where models fail differently
   - Cases highlighting encoder architecture differences
3. **Create visualization** showing:
   - Input grid
   - Ground truth output
   - Each model's prediction
   - Difference highlighting
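
Steps 2 and 3 can be prototyped on toy data before the evaluations finish. The snippet below is a minimal sketch, not the final implementation. It assumes each prediction is a 2-D grid of integer color indices, and the helper names `diff_mask` and `categorize_puzzle` are hypothetical. A prediction whose shape differs from the ground truth is counted as unsolved.

```python
import numpy as np


def diff_mask(pred, truth):
    """Return a boolean array marking cells where pred differs from truth.

    Returns None when the shapes differ, because a cell-wise comparison
    is not defined in that case.
    """
    pred, truth = np.asarray(pred), np.asarray(truth)
    if pred.shape != truth.shape:
        return None
    return pred != truth


def categorize_puzzle(truth, preds):
    """Classify a puzzle by which models reproduced the ground truth exactly.

    preds maps a model name (hypothetical labels here) to its predicted grid.
    Returns 'all_correct', 'all_wrong', or 'mixed'.
    """
    solved = {}
    for name, pred in preds.items():
        mask = diff_mask(pred, truth)
        solved[name] = mask is not None and not mask.any()
    if all(solved.values()):
        return "all_correct"
    if not any(solved.values()):
        return "all_wrong"
    return "mixed"


# Toy data only; real grids would come from the evaluation outputs.
truth = [[1, 0], [0, 1]]
preds = {
    "F1": [[1, 0], [0, 1]],  # exact match
    "F2": [[1, 0], [0, 2]],  # one wrong cell
    "F3": [[1, 0, 0]],       # wrong shape
}
print(categorize_puzzle(truth, preds))  # mixed
```

Puzzles in the `mixed` category are natural candidates for the "fail differently" rows. For the highlighting step, one option is to outline each cell where the mask is true with a `matplotlib.patches.Rectangle`.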

In [None]:
# ARC color palette (official)
ARC_COLORS = {
    0: '#000000',  # Black
    1: '#0074D9',  # Blue
    2: '#FF4136',  # Red
    3: '#2ECC40',  # Green
    4: '#FFDC00',  # Yellow
    5: '#AAAAAA',  # Gray
    6: '#F012BE',  # Fuchsia
    7: '#FF851B',  # Orange
    8: '#7FDBFF',  # Teal
    9: '#870C25',  # Brown
}

def plot_grid(ax, grid, title=None):
    """Plot a single ARC grid on `ax`.

    `grid` is a 2-D array-like of color indices 0-9; cells holding any
    other value are drawn black.
    """
    grid = np.asarray(grid)
    h, w = grid.shape

    # Build an RGB image by mapping each color index to its hex color
    img = np.zeros((h, w, 3))
    for val, hex_color in ARC_COLORS.items():
        rgb = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
        img[grid == val] = rgb

    ax.imshow(img, interpolation='nearest')

    # Draw cell borders on the minor ticks, then hide tick marks and labels
    ax.set_xticks(np.arange(-0.5, w, 1), minor=True)
    ax.set_yticks(np.arange(-0.5, h, 1), minor=True)
    ax.grid(which='minor', color='white', linewidth=0.5)
    ax.tick_params(which='both', length=0)
    ax.set_xticks([])
    ax.set_yticks([])

    if title:
        ax.set_title(title, fontsize=8)

    return ax

In [None]:
# Placeholder: Example figure structure
# This shows the intended layout

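# One row per puzzle; columns are input, ground truth, and one per experiment (F1-F4)
row_labels = ['Puzzle 1', 'Puzzle 2']
col_labels = ['Input', 'Ground Truth', 'F1: Standard', 'F2: Hybrid', 'F3: ETRMTRM', 'F4: LPN']

fig, axes = plt.subplots(len(row_labels), len(col_labels), figsize=(12, 4))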

# Create placeholder grids
for i, row_ax in enumerate(axes):
    for j, ax in enumerate(row_ax):
        # Placeholder grid
        grid = np.random.randint(0, 10, (5, 5))
        plot_grid(ax, grid)
        
        if i == 0:
            ax.set_title(col_labels[j], fontsize=8)
        if j == 0:
            ax.set_ylabel(row_labels[i], fontsize=8)

plt.suptitle('Example Predictions (PLACEHOLDER)', fontsize=10)
plt.tight_layout()
plt.show()

print("\n⚠️  This is a placeholder figure with random data.")
print("    Re-run this notebook after evaluations complete.")

## Data Loading (TODO)

After evaluations complete, load predictions from:
- `outputs/etrm-final/F1_standard/predictions/`
- `outputs/etrm-final/F2_hybrid_var/predictions/`
- `outputs/etrm-final/F3_etrmtrm/predictions/`
- `outputs/etrm-final/F4_lpn_var/predictions/`
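
Because the figure compares models on the same puzzles, it helps to know which puzzle IDs have a prediction in every experiment. The following is a minimal sketch under an assumption the evaluation code has not yet confirmed: that each prediction is stored as one `<puzzle_id>.json` file. The helper name `common_puzzle_ids` and its `root` parameter are hypothetical. It inspects file names only, not the JSON contents.

```python
from pathlib import Path

# Experiment directory names taken from the list above.
EXPERIMENTS = ["F1_standard", "F2_hybrid_var", "F3_etrmtrm", "F4_lpn_var"]


def common_puzzle_ids(root, experiments=EXPERIMENTS):
    """Return sorted puzzle IDs present in every available experiment.

    Assumes one file per puzzle named <puzzle_id>.json (an assumption).
    Experiments without a predictions directory are reported and skipped.
    """
    id_sets = []
    for name in experiments:
        pred_dir = Path(root) / name / "predictions"
        if not pred_dir.is_dir():
            print(f"Predictions not found: {pred_dir}")
            continue
        id_sets.append({p.stem for p in pred_dir.glob("*.json")})
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))
```

Pass the `etrm-final` directory as `root`, using whichever relative path is correct for the working directory. An empty list means either that no predictions exist yet or that the experiments share no puzzle IDs.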

In [None]:
# TODO: Call load_predictions for each experiment once predictions are available

def load_predictions(experiment_name):
    """Load predictions from evaluation output, keyed by puzzle ID.

    Returns None if the predictions directory does not exist yet.
    """
    # Relative path: resolved against the current working directory
    pred_dir = Path(f"../../outputs/etrm-final/{experiment_name}/predictions")
    if not pred_dir.exists():
        print(f"Predictions not found: {pred_dir}")
        return None

    predictions = {}
    for pred_file in sorted(pred_dir.glob("*.json")):
        puzzle_id = pred_file.stem
        with open(pred_file) as f:
            predictions[puzzle_id] = json.load(f)

    return predictions

In [None]:
print("\n" + "="*60)
print("STATUS: WAITING FOR EVALUATIONS")
print("="*60)
print("\nThis notebook is a placeholder.")
print("Once evaluations complete for F1-F4, re-run to generate:")
print("  - docs/project-report/figures/example_predictions.png")
print("\nThe figure will compare predictions across architectures")
print("to highlight differences in model behavior.")