# Funny vs. Serious Single Sentence Explorer

**Research question:** Does an LLM represent "funny" and "serious" sentence endings differently, even when both sentences are grammatically valid and make sense?

## Dataset

We use the **funny_serious_150** dataset: 75 pairs of completed sentences where each pair has:
- A **serious** version: `"The dangerous iPhone was arrested and charged with assault."`
- A **funny** version: `"The dangerous iPhone was arrested and charged with battery."`

Both versions are valid English sentences that differ only in the final word: the funny version ends in a pun, and the serious version ends in a straightforward word.
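Because the analysis relies on each pair differing only in its last word, a quick check on the raw sentences can catch malformed pairs. The helper below is hypothetical (it is not part of `analyze_activations`) and assumes words are separated by whitespace, with the closing period attached to the final word.

```python
def differs_only_in_final_word(serious: str, funny: str) -> bool:
    """Return True if the two sentences share every word except the last one.

    Hypothetical sanity check: splits on whitespace, so trailing punctuation
    stays attached to the final word (e.g. "assault." vs "battery.").
    """
    s_words, f_words = serious.split(), funny.split()
    return (
        len(s_words) == len(f_words)
        and s_words[:-1] == f_words[:-1]   # identical prefix
        and s_words[-1] != f_words[-1]     # different ending
    )


print(differs_only_in_final_word(
    "The dangerous iPhone was arrested and charged with assault.",
    "The dangerous iPhone was arrested and charged with battery.",
))  # True
```

Hyphenated endings such as "carry-on." vs "carrion." still count as a single final word under this split.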

## Method

We collect activations from Llama-3.1-70B-Instruct at the **final period** of each sentence, then analyze whether the model's internal representation differs based on whether it just processed a funny or serious completion.

## Contents
0. Setup: load data and extract group labels
1. Separation metrics across layers (Fisher, Cohen's d)
2. Visualizations at the peak layer
3. Interactive 3D layer explorer
4. Cross-validation to check for overfitting
5. Sample prompts

In [None]:
from pathlib import Path
import json
import numpy as np
from IPython.display import HTML, display
import matplotlib.pyplot as plt
%matplotlib inline

# Data loading
from analyze_activations import load_activations, get_pair_indices

# Explicit-argument analysis functions (show exactly what data is needed)
from analyze_activations import (
    mean_difference,           # (X, is_group_a, is_group_b) → direction
    compute_fisher_separation, # (X, is_group_a, is_group_b) → float
    compute_cohens_d,          # (X, is_group_a, is_group_b, direction) → float
)

# Projection, holdout, and interactive-visualization helpers
from analyze_activations import contrastive_projection, holdout_analysis
from puns_viz import make_layer_viz

In [None]:
# Paths
RAW_DIR = Path("results/raw_activations")
DATASET_FILE = Path("datasets/funny_serious_150.json")
META_FILE = RAW_DIR / "llama31_70b_instruct_funnyserious150_pred_c_meta.json"

# Load activations: returns metadata, per-layer activations, and layer indices
meta, layer_data, layer_indices = load_activations(META_FILE)

# Extract the key arrays we need for analysis:
#   pair_ids:    which pair each prompt belongs to (0-74)
#   is_funny:    True for prompts with the pun completion  
#   is_serious:  True for prompts with the serious completion
pair_ids, is_funny, is_serious = get_pair_indices(meta)

print(f"Model: {meta['model']}")
print(f"Layers: {len(layer_indices)} (indices {layer_indices[0]}-{layer_indices[-1]})")
print(f"Hidden dimension: {meta['hidden_dim']}")
print(f"Total prompts: {len(is_funny)}")
print(f"  - Funny completions: {is_funny.sum()}")
print(f"  - Serious completions: {is_serious.sum()}")

---
## 1. Separation Metrics Across Layers

We'll compute two metrics at each layer:

**Fisher separation** = (distance between group means) / (average within-group spread)
- Measures how well-separated the two groups are in the full high-dimensional space
- Higher = better separation

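**Cohen's d** = (mean_funny - mean_serious) / pooled_std, computed on each prompt's projection onto the "contrastive direction"
- The contrastive direction is simply mean(funny) - mean(serious), normalized to unit length
- Cohen's d gives the effect size along this direction. It is the direction along which the group means differ most, but not necessarily the best separating direction: it ignores within-group covariance, which a method such as linear discriminant analysis would take into account
- Because the direction is estimated from the same prompts it is scored on, these in-sample values are optimistic (Section 4 checks by how much)
- Conventional benchmarks (Cohen): 0.2 = small, 0.5 = medium, 0.8 = large; values above 1.0 are often described as very large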
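To make the two definitions concrete, here is a minimal NumPy sketch. It is an assumption about how `compute_fisher_separation`, `mean_difference`, and `compute_cohens_d` behave, not their actual source: "within-group spread" is taken here as the root-mean-square distance of points to their group mean, and the pooled standard deviation uses the usual (n - 1) weighting. The function names with a `_sketch` suffix are hypothetical.

```python
import numpy as np


def fisher_separation_sketch(X, is_a, is_b):
    """Distance between group means divided by the average within-group spread."""
    mu_a, mu_b = X[is_a].mean(axis=0), X[is_b].mean(axis=0)
    # Assumed spread: RMS distance of each group's points to that group's mean
    spread_a = np.sqrt(((X[is_a] - mu_a) ** 2).sum(axis=1).mean())
    spread_b = np.sqrt(((X[is_b] - mu_b) ** 2).sum(axis=1).mean())
    return np.linalg.norm(mu_a - mu_b) / ((spread_a + spread_b) / 2)


def contrastive_direction_sketch(X, is_a, is_b):
    """Unit vector pointing from the group-b mean to the group-a mean."""
    diff = X[is_a].mean(axis=0) - X[is_b].mean(axis=0)
    return diff / np.linalg.norm(diff)


def cohens_d_sketch(X, is_a, is_b, direction):
    """Cohen's d of the 1D projections onto `direction`, using a pooled SD."""
    proj = X @ direction
    a, b = proj[is_a], proj[is_b]
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


# Toy example: two "serious" and two "funny" points in 2D
X_toy = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]])
funny = np.array([False, False, True, True])
serious = ~funny
u = contrastive_direction_sketch(X_toy, funny, serious)
print(fisher_separation_sketch(X_toy, funny, serious))  # 10.0
print(cohens_d_sketch(X_toy, funny, serious, u))        # ~7.07, i.e. 10 / sqrt(2)
```

In the toy data the direction is the x-axis, the group means sit 10 units apart, each group has an RMS spread of 1 (so Fisher = 10), and the pooled standard deviation of the projections is sqrt(2), which gives d = 10 / sqrt(2).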

In [None]:
# Compute metrics at each layer
fisher_scores = []
cohens_d_scores = []

for layer_idx in layer_indices:
    # Get activations for this layer: shape (150, 8192)
    X = layer_data[layer_idx]
    
    # Fisher separation: uses the full activation vectors
    fisher = compute_fisher_separation(X, is_funny, is_serious)
    fisher_scores.append(fisher)
    
    # Contrastive direction: the normalized mean difference
    direction = mean_difference(X, is_funny, is_serious)
    
    # Cohen's d: effect size along the contrastive direction
    d = compute_cohens_d(X, is_funny, is_serious, direction)
    cohens_d_scores.append(d)

# Find peak layers
peak_fisher_idx = np.argmax(fisher_scores)
peak_cd_idx = np.argmax(cohens_d_scores)

print(f"Fisher separation peaks at layer {layer_indices[peak_fisher_idx]}: {fisher_scores[peak_fisher_idx]:.3f}")
print(f"Cohen's d peaks at layer {layer_indices[peak_cd_idx]}: {cohens_d_scores[peak_cd_idx]:.2f}")

In [None]:
# Plot both metrics across layers
fig, ax1 = plt.subplots(figsize=(12, 4))

ax1.plot(layer_indices, fisher_scores, color='#E85D75', lw=2, label='Fisher separation')
ax1.set_xlabel('Layer', fontsize=11)
ax1.set_ylabel('Fisher separation', fontsize=11, color='#E85D75')
ax1.tick_params(axis='y', labelcolor='#E85D75')
ax1.spines['top'].set_visible(False)
ax1.spines['right'].set_visible(False)

ax2 = ax1.twinx()
ax2.plot(layer_indices, cohens_d_scores, color='#2EAD6B', lw=2, ls='--', label="Cohen's d")
ax2.set_ylabel("Cohen's d", fontsize=11, color='#2EAD6B')
ax2.tick_params(axis='y', labelcolor='#2EAD6B')
ax2.spines['top'].set_visible(False)

lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, fontsize=9, loc='upper left')
ax1.set_title('Funny vs. Serious: Separation by Layer', fontsize=13, fontweight='bold')
fig.tight_layout()
plt.show()

---
## 2. Visualizations at Peak Layer

Let's look at the actual activations at the layer with highest Cohen's d.

The "contrastive projection" shows:
- **X-axis**: Projection onto the contrastive direction (mean_funny - mean_serious)
- **Y-axis**: First principal component of the residual (what's left after removing the contrastive direction)

Lines connect each pair (funny and serious versions of the same joke).
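The two axes can be sketched as follows. This is an assumed re-implementation for illustration; the real `contrastive_projection` takes `meta` and `n_components` and may center, scale, or compute its variance ratios differently. Here the x-coordinate is the projection onto the unit mean-difference direction, the residual is what remains after subtracting that component from every grand-mean-centered row, and the y-coordinate is the residual's first principal component, found with `np.linalg.svd`.

```python
import numpy as np


def contrastive_projection_sketch(X, is_a, is_b):
    """Return (coords, axes, var_ratios) for [contrastive direction, residual PC1].

    Hypothetical re-implementation for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center on the grand mean
    u = X[is_a].mean(axis=0) - X[is_b].mean(axis=0)
    u /= np.linalg.norm(u)                         # unit contrastive direction
    t = Xc @ u                                     # x-coordinate
    R = Xc - np.outer(t, u)                        # remove the contrastive component
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    v = Vt[0]                                      # first principal axis of the residual
    s = R @ v                                      # y-coordinate
    total_var = Xc.var(axis=0).sum()               # total variance over all features
    var_ratios = np.array([t.var(), s.var()]) / total_var
    return np.column_stack([t, s]), np.vstack([u, v]), var_ratios


rng = np.random.default_rng(0)
X_demo = rng.standard_normal((40, 10))
is_a = np.tile([False, True], 20)                 # alternate group b / group a
X_demo[is_a, 0] += 3.0                            # planted shift on feature 0 for group a
coords, comps, ratios = contrastive_projection_sketch(X_demo, is_a, ~is_a)
print(coords.shape)  # (40, 2)
```

Every residual row is orthogonal to the contrastive direction, so the residual PC is orthogonal to it as well, and the two axes never double-count the same variance.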

In [None]:
# Get activations at peak Cohen's d layer
peak_layer = layer_indices[peak_cd_idx]
X_peak = layer_data[peak_layer]

# Compute 2D contrastive projection (uses meta internally for pair structure)
X_proj, components, var_ratios = contrastive_projection(X_peak, meta, n_components=2)

# Plot
fig, ax = plt.subplots(figsize=(10, 8))

# Draw lines connecting each pair
for pid in sorted(set(pair_ids)):
    mask = pair_ids == pid
    if mask.sum() == 2:
        pts = X_proj[mask]
        ax.plot(pts[:, 0], pts[:, 1], color='#888', alpha=0.4, lw=0.8, zorder=1)

# Scatter points
ax.scatter(X_proj[is_serious, 0], X_proj[is_serious, 1], c='#4A90D9', 
           s=40, alpha=0.7, label='Serious', edgecolors='white', lw=0.5, zorder=2)
ax.scatter(X_proj[is_funny, 0], X_proj[is_funny, 1], c='#E85D75',
           s=40, alpha=0.7, label='Funny', edgecolors='white', lw=0.5, zorder=2)

ax.set_xlabel(f'Contrastive direction ({var_ratios[0]:.1%} variance)', fontsize=11)
ax.set_ylabel(f'Residual PC1 ({var_ratios[1]:.1%} variance)', fontsize=11)
ax.set_title(f'Layer {peak_layer}: Contrastive Projection (Cohen\'s d = {cohens_d_scores[peak_cd_idx]:.2f})', 
             fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
plt.show()

In [None]:
# 1D histogram: projection onto contrastive direction
direction = mean_difference(X_peak, is_funny, is_serious)
projections = X_peak @ direction

fig, ax = plt.subplots(figsize=(10, 4))
# Shared bin edges so the two histograms are directly comparable
bins = np.histogram_bin_edges(projections, bins=15)
ax.hist(projections[is_serious], bins=bins, alpha=0.6, color='#4A90D9', label='Serious', edgecolor='white')
ax.hist(projections[is_funny], bins=bins, alpha=0.6, color='#E85D75', label='Funny', edgecolor='white')
ax.set_xlabel('Projection onto contrastive direction', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.set_title(f'Layer {peak_layer}: 1D Projections (Cohen\'s d = {cohens_d_scores[peak_cd_idx]:.2f})',
             fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
plt.show()

---
## 3. Interactive 3D Layer Explorer

- **Drag** to rotate
- **Scroll/pinch** to zoom
- **Shift-drag** to pan
- **Layer slider** to move through all layers

In [None]:
# Note: pred_file=False skips loading predictions because pun boost isn't meaningful
# for this dataset - these are completed sentences, not cloze predictions.
html = make_layer_viz(META_FILE, pred_file=False, width=900, height=600)
HTML(html)

---
## 4. Cross-Validation: Are We Overfitting?

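The contrastive direction is computed from the same data we use to measure Cohen's d, so the full-data estimate can overfit: with 8192 hidden dimensions and only 150 prompts, the mean difference picks up sampling noise that happens to separate *these specific* prompts, and such a direction would not generalize to new ones.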
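A quick simulation shows how much in-sample fitting can inflate Cohen's d. The sketch below uses isotropic Gaussian noise with arbitrary labels, sized to match this notebook (150 prompts, 8192 dimensions). Real activations are far from isotropic, so the amount of inflation on the actual data will differ; the point is only that a large in-sample d is possible with no signal at all. Intuitively, each point contributes to its own group mean, so when dimensions outnumber samples the fitted direction partly points at the training points themselves. All variable names are illustrative and chosen so they do not overwrite the notebook's arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 75, 8192                             # same shape as this notebook's data
X_noise = rng.standard_normal((2 * n_pairs, dim))   # isotropic noise: no real signal
labels_a = np.tile([False, True], n_pairs)          # arbitrary "funny" labels
labels_b = ~labels_a

# Fit the mean-difference direction on ALL points...
u = X_noise[labels_a].mean(axis=0) - X_noise[labels_b].mean(axis=0)
u /= np.linalg.norm(u)

# ...and score Cohen's d on the SAME points
proj = X_noise @ u
a, b = proj[labels_a], proj[labels_b]
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)   # equal group sizes
d_in_sample = (a.mean() - b.mean()) / pooled_sd
print(f"In-sample Cohen's d on pure noise: {d_in_sample:.1f}")
```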

**Holdout analysis**: Split the *pairs* into two groups, compute the contrastive direction from one group, and measure Cohen's d on the other. If the cross-validated Cohen's d stays close to the full-data value, the separation is unlikely to be an artifact of fitting the direction; a large drop means much of the full-data effect is overfit.
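The pair-level split can be sketched as below. This is an assumed stand-in for `holdout_analysis`, whose exact fold assignment and aggregation are not shown in this notebook: the sketch shuffles pair IDs, splits them into two folds with `np.array_split`, fits the direction on one fold, scores Cohen's d on the other, and averages the two scores. The demo data (pure noise, and the same noise with a planted shift on one feature) is synthetic and hypothetical.

```python
import numpy as np


def cohens_d_along(X, is_a, is_b, direction):
    """Cohen's d of the projections onto `direction`, using a pooled SD."""
    proj = X @ direction
    a, b = proj[is_a], proj[is_b]
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def pair_holdout_d(X, pair_ids, is_a, is_b, seed=42):
    """2-fold holdout: fit the direction on half the pairs, score d on the rest."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(np.unique(pair_ids)), 2)
    scores = []
    for k in range(2):
        train = np.isin(pair_ids, folds[k])   # both members of a pair share a fold
        test = ~train
        u = X[train & is_a].mean(axis=0) - X[train & is_b].mean(axis=0)
        u /= np.linalg.norm(u)
        scores.append(cohens_d_along(X[test], is_a[test], is_b[test], u))
    return float(np.mean(scores))


rng = np.random.default_rng(0)
demo_pair_ids = np.repeat(np.arange(75), 2)   # 75 pairs -> 150 prompts
demo_is_funny = np.tile([False, True], 75)
noise = rng.standard_normal((150, 8192))
signal = noise.copy()
signal[demo_is_funny, 0] += 10.0              # planted funny/serious shift on one feature

d_noise = pair_holdout_d(noise, demo_pair_ids, demo_is_funny, ~demo_is_funny)
d_signal = pair_holdout_d(signal, demo_pair_ids, demo_is_funny, ~demo_is_funny)
print(f"held-out d, noise:  {d_noise:.2f}")
print(f"held-out d, signal: {d_signal:.2f}")
```

Splitting by pair rather than by prompt matters for this dataset: the two versions of a sentence share every word but the last, so keeping one version in the training fold while testing on the other would leak most of the test sentence's content into the direction.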

In [None]:
# Run holdout analysis (splits pairs, trains direction on half, tests on other half)
holdout = holdout_analysis(layer_data, meta, n_splits=2, seed=42)

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(holdout['layer_indices'], holdout['cohens_d_full'], color='#2EAD6B', lw=2, label='Full data')
ax.plot(holdout['layer_indices'], holdout['cohens_d_cv'], color='#E85D75', lw=2, ls='--', label='Cross-validated')
ax.set_xlabel('Layer', fontsize=11)
ax.set_ylabel("Cohen's d", fontsize=11)
ax.set_title("Full Data vs. Cross-Validated Cohen's d", fontsize=13, fontweight='bold')
ax.legend(fontsize=10)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
plt.show()

# Report the gap at peak layer
cv_at_peak = holdout['cohens_d_cv'][peak_cd_idx]
full_at_peak = holdout['cohens_d_full'][peak_cd_idx]
print(f"\nAt peak layer {peak_layer}:")
print(f"  Full-data Cohen's d: {full_at_peak:.2f}")
print(f"  Cross-validated:     {cv_at_peak:.2f}")
print(f"  Ratio (CV/full):     {cv_at_peak/full_at_peak:.1%}")

---
## 5. Sample Prompts

View some example prompt pairs to understand the dataset.

In [9]:
with open(DATASET_FILE) as f:
    dataset = json.load(f)

print("Sample prompt pairs:\n")
# Assumes entries alternate serious, funny within each pair
for i in range(0, min(10, len(dataset)), 2):
    serious = dataset[i]
    funny = dataset[i+1]
    print(f"Pair {serious['pair_id']}:")
    print(f"  Serious: \"{serious['prompt']}\"")
    print(f"  Funny:   \"{funny['prompt']}\"")
    print()

Sample prompt pairs:

Pair 0:
  Serious: "The dangerous iPhone was arrested and charged with assault."
  Funny:   "The dangerous iPhone was arrested and charged with battery."

Pair 1:
  Serious: "The bread was afraid to be best man, because he would have to make a speech."
  Funny:   "The bread was afraid to be best man, because he would have to make a toast."

Pair 2:
  Serious: "The wizard ran a fantastic music school because he had a great passion."
  Funny:   "The wizard ran a fantastic music school because he had a great staff."

Pair 3:
  Serious: "The mushroom's salary never went up because there was always a ceiling."
  Funny:   "The mushroom's salary never went up because there was always a cap."

Pair 4:
  Serious: "The vulture boarded the airplane with two dead raccoons but was told he was only allowed one carry-on."
  Funny:   "The vulture boarded the airplane with two dead raccoons but was told he was only allowed one carrion."

