# Interactive Contrastive Activation Explorer

This notebook demonstrates two features:

1. **Interactive 3D visualization** — Contrastive projections across all 80 layers with rotation, layer slider, hover tooltips, pair lines, and pun-boost star markers.
2. **Holdout analysis** — Split-half cross-validation showing that the contrastive direction generalizes to unseen pairs.

In [None]:
from pathlib import Path
import numpy as np
from IPython.display import HTML, display
import matplotlib.pyplot as plt

from analyze_activations import (
    load_activations, analyze_all_layers,
    stable_contrastive_projections, holdout_analysis,
)
from puns_viz import make_layer_viz, layer_scatter_3d

In [None]:
META_FILE = Path("results/raw_activations/llama31_70b_instruct_pred_c_meta.json")
meta, layer_data, layer_indices = load_activations(META_FILE)
print(f"Loaded {len(layer_indices)} layers, {meta['n_prompts']} prompts")
print(f"Hidden dim: {meta['hidden_dim']}, Model: {meta['model']}")

## Part 1: Interactive 3D Layer Visualization

Drag to rotate, use the slider to move through layers, hover for tooltips, click to highlight pairs.

In [None]:
html = make_layer_viz(META_FILE)
HTML(html)

## Part 2: Holdout Analysis

Does the contrastive direction generalize? We split the 50 pairs into two halves, compute the contrastive direction from one half, and evaluate Cohen's d on the other half.
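
The split-half procedure described above can be sketched on synthetic data. This is a minimal illustration, not the implementation of `holdout_analysis`: it assumes the contrastive direction is the normalized mean of the paired differences (funny minus straight) and that Cohen's d uses the pooled standard deviation. The helper names `contrastive_direction`, `cohens_d`, and `split_half_cohens_d` are hypothetical, and the arrays are random stand-ins for real activations.

```python
import numpy as np

def contrastive_direction(funny, straight):
    """Unit vector along the mean paired difference (funny - straight)."""
    diff = (funny - straight).mean(axis=0)
    return diff / np.linalg.norm(diff)

def cohens_d(a, b):
    """Cohen's d for two 1-D samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def split_half_cohens_d(funny, straight, seed=42):
    """Fit the direction on one half of the pairs, score the other half,
    then swap the halves and average the two held-out scores."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(funny))
    half = len(perm) // 2
    folds = [perm[:half], perm[half:]]
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = folds[1 - i]
        direction = contrastive_direction(funny[train_idx], straight[train_idx])
        # Project the held-out activations onto the direction learned elsewhere
        scores.append(cohens_d(funny[test_idx] @ direction,
                               straight[test_idx] @ direction))
    return float(np.mean(scores))

# Synthetic stand-in for one layer: 50 pairs, 16-dim activations (assumed sizes),
# with the "funny" group shifted along the first axis.
rng = np.random.default_rng(0)
n_pairs, hidden_dim = 50, 16
shift = np.zeros(hidden_dim)
shift[0] = 3.0
straight = rng.normal(size=(n_pairs, hidden_dim))
funny = rng.normal(size=(n_pairs, hidden_dim)) + shift

cv_d = split_half_cohens_d(funny, straight)
print(f"Cross-validated Cohen's d on synthetic data: {cv_d:.2f}")
```

Because the direction is fit on pairs that are never scored, a large held-out d here can only come from a shift shared across pairs, which is the property the real analysis tests for.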

In [None]:
holdout = holdout_analysis(layer_data, meta, n_splits=2, seed=42)

print(f"Layers analyzed: {len(holdout['layer_indices'])}")
peak_full_idx = np.argmax(holdout['cohens_d_full'])
peak_cv_idx = np.argmax(holdout['cohens_d_cv'])
print(f"Peak full-data Cohen's d: {holdout['cohens_d_full'][peak_full_idx]:.2f} "
      f"at layer {holdout['layer_indices'][peak_full_idx]}")
print(f"Peak cross-validated Cohen's d: {holdout['cohens_d_cv'][peak_cv_idx]:.2f} "
      f"at layer {holdout['layer_indices'][peak_cv_idx]}")

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

layers = holdout["layer_indices"]

# Panel 1: Cohen's d comparison
ax1.plot(layers, holdout["cohens_d_full"], color="#2EAD6B", linewidth=2,
         label="Full data (50 pairs)")
ax1.plot(layers, holdout["cohens_d_cv"], color="#E85D75", linewidth=2,
         linestyle="--", label="Cross-validated (25-pair holdout)")
ax1.set_ylabel("Cohen's d", fontsize=12)
ax1.legend(fontsize=10, loc="upper left")
ax1.set_title("Contrastive Direction: Full Data vs. Cross-Validated",
              fontsize=14, fontweight="bold")
ax1.spines["top"].set_visible(False)
ax1.spines["right"].set_visible(False)
ax1.axhline(y=0, color="#555", linewidth=0.5)

# Panel 2: Direction angle
ax2.plot(layers, holdout["direction_angle"], color="#7B68EE", linewidth=2)
ax2.set_xlabel("Layer", fontsize=12)
ax2.set_ylabel("Angle between fold\nand full-data direction (deg)", fontsize=11)
ax2.set_title("Direction Stability Across Folds", fontsize=14, fontweight="bold")
ax2.spines["top"].set_visible(False)
ax2.spines["right"].set_visible(False)

fig.tight_layout()
plt.show()

### Interpretation

If the cross-validated Cohen's d closely tracks the full-data curve, the contrastive direction generalizes across pairs: it reflects a systematic funny-vs-straight encoding rather than overfitting to specific pairs.

The direction angle plot shows how stable the contrastive direction is: small angles mean the direction computed from half the data agrees well with the direction from all the data.
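
For reference, the angle between two direction vectors can be computed as below. This is a sketch under the assumption that the angle is the arccosine of the cosine similarity, reported in degrees. `angle_between_deg` is a hypothetical helper, and the exact definition used inside `holdout_analysis` may differ. The second half shows a useful baseline: two unrelated random directions in a high-dimensional space sit close to 90 degrees apart, so angles well below that indicate real agreement.

```python
import numpy as np

def angle_between_deg(u, v):
    """Angle in degrees between two vectors, via clipped cosine similarity."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against floating-point values slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Simple sanity checks in 2-D
right_angle = angle_between_deg([1.0, 0.0], [0.0, 1.0])
diagonal = angle_between_deg([1.0, 0.0], [1.0, 1.0])

# Baseline: two independent random directions in 8192 dimensions (illustrative size)
rng = np.random.default_rng(0)
random_angle = angle_between_deg(rng.normal(size=8192), rng.normal(size=8192))
print(f"Random high-dimensional baseline: {random_angle:.1f} deg")
```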

In [None]:
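# Summary statistics
# Coerce to arrays so the comparison and boolean indexing below also work
# if holdout_analysis returns plain lists.
d_full = np.asarray(holdout["cohens_d_full"], dtype=float)
d_cv = np.asarray(holdout["cohens_d_cv"], dtype=float)
angle = np.asarray(holdout["direction_angle"], dtype=float)

# Floor the denominator at 0.01 so a near-zero full-data d cannot blow up the ratio
ratio = d_cv / np.maximum(d_full, 0.01)
# Focus on layers where there's meaningful separation (d > 0.5)
sig_mask = d_full > 0.5
if sig_mask.any():
    print(f"Layers with Cohen's d > 0.5: {sig_mask.sum()}")
    print(f"Mean CV/full ratio at those layers: {ratio[sig_mask].mean():.2f}")
    print(f"Mean direction angle at those layers: {angle[sig_mask].mean():.1f} deg")
else:
    print("No layers with Cohen's d > 0.5 found.")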