# Pythia Family Sweep: Residual Stream Gain (H25 Validation)

## CORRECTED VERSION - Without Final LayerNorm Artifact

**Paper #3 Experiment:** Dimensional Crowding Hypothesis Validation

**Critical Fix:** Previous versions measured `hidden_states[-1] / hidden_states[-2]`, which INCLUDES the final LayerNorm. This caused ALL models to appear as "dampening" due to the normalization artifact.
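
The normalization artifact can be reproduced on a toy vector. The sketch below is illustrative only and makes two assumptions: the LayerNorm has no learned scale or bias (the real final LayerNorm does), and the pre-LN residual vector is random with a large norm. Under those assumptions the post-LN norm is pinned near `sqrt(d_model)`, so the ratio falls below 1 regardless of what the last block did.

```python
import math
import random

def layer_norm(x, eps=1e-5):
    """LayerNorm over one vector, WITHOUT learned scale/bias (illustrative assumption)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def l2_norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))

random.seed(0)
d_model = 512  # same width as pythia-70m; the vector itself is synthetic
pre_ln = [random.gauss(0.0, 50.0) for _ in range(d_model)]
post_ln = layer_norm(pre_ln)

ratio = l2_norm(post_ln) / l2_norm(pre_ln)
print(f"||pre-LN||  = {l2_norm(pre_ln):.1f}")
print(f"||post-LN|| = {l2_norm(post_ln):.2f}  (sqrt(d_model) = {math.sqrt(d_model):.2f})")
print(f"ratio       = {ratio:.4f}")
```

With learned LayerNorm weights the post-LN norm is no longer exactly `sqrt(d_model)`, but it is still set by the normalization rather than by the preceding block.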

**Correct Methodology:**
```
WRONG:   G = ||hidden_states[-1]|| / ||hidden_states[-2]||  (includes final LN)
CORRECT: G = ||hidden_states[-2]|| / ||hidden_states[-3]||  (true last layer)
```
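
The indexing difference between the two metrics can be shown on a made-up list of norms. This is a minimal sketch: the numbers are hypothetical, and the last entry stands in for the state after the final LayerNorm, as the notebook assumes.

```python
# Hypothetical last-token norms, one per entry of `hidden_states`.
# The final entry plays the role of the post-final-LayerNorm state.
norms = [10.0, 14.0, 21.0, 42.0, 22.6]

# Ratio that spans the final LayerNorm (the metric this notebook rejects).
gain_with_ln = norms[-1] / norms[-2]

# Last ratio that involves no LayerNorm output (the metric this notebook uses).
gain_no_ln = norms[-2] / norms[-3]

print(f"G with final LN:    {gain_with_ln:.3f}")
print(f"G without final LN: {gain_no_ln:.3f}")
```

In this toy case the same stack looks like dampening under the first ratio and expansion under the second, which is the discrepancy the correction targets.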

**Primary Question:** Is the Pythia dampening pattern consistent across ALL 8 Pythia sizes?

**H25 Prediction:**
- High ρ (n_heads/d_head ≥ 0.2) → Dampening (G < 1.0)
- Low ρ (n_heads/d_head < 0.2) → Expansion (G > 1.0)

**Expected Pattern:** Gain should fall as ρ rises, i.e. ρ and G should be negatively correlated.
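
The H25 threshold rule can be written as a small helper. This is a sketch only: the function names `head_density` and `h25_prediction` are hypothetical, and the head counts and head sizes are taken from the `PYTHIA_MODELS` table in this notebook.

```python
def head_density(n_heads: int, d_head: int) -> float:
    """Head density: number of heads divided by per-head dimension."""
    return n_heads / d_head

def h25_prediction(n_heads: int, d_head: int, threshold: float = 0.2) -> str:
    """Regime H25 predicts for a given architecture (helper name is hypothetical)."""
    return "dampening" if head_density(n_heads, d_head) >= threshold else "expansion"

# (n_heads, d_head) pairs from the PYTHIA_MODELS table
examples = {
    "pythia-70m": (8, 64),
    "pythia-1b": (8, 256),
    "pythia-2.8b": (32, 80),
}

for name, (n_heads, d_head) in examples.items():
    rho = head_density(n_heads, d_head)
    print(f"{name:12} rho = {rho:.4f} -> predicted {h25_prediction(n_heads, d_head)}")
```

These are predictions only; the measured gains come from the cells that follow.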

In [None]:
# Install dependencies
!pip install transformers torch matplotlib numpy scipy --quiet

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
import json
from datetime import datetime
import warnings
import gc
from scipy import stats
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU: {gpu_name}")
    print(f"GPU memory: {gpu_mem:.1f} GB")

In [None]:
# All 8 Pythia models with architectural details
PYTHIA_MODELS = {
    'pythia-70m': {'params': 70e6, 'n_layers': 6, 'n_heads': 8, 'd_model': 512, 'd_head': 64, 'memory_gb': 0.5},
    'pythia-160m': {'params': 160e6, 'n_layers': 12, 'n_heads': 12, 'd_model': 768, 'd_head': 64, 'memory_gb': 1},
    'pythia-410m': {'params': 410e6, 'n_layers': 24, 'n_heads': 16, 'd_model': 1024, 'd_head': 64, 'memory_gb': 2},
    'pythia-1b': {'params': 1e9, 'n_layers': 16, 'n_heads': 8, 'd_model': 2048, 'd_head': 256, 'memory_gb': 4},
    'pythia-1.4b': {'params': 1.4e9, 'n_layers': 24, 'n_heads': 16, 'd_model': 2048, 'd_head': 128, 'memory_gb': 6},
    'pythia-2.8b': {'params': 2.8e9, 'n_layers': 32, 'n_heads': 32, 'd_model': 2560, 'd_head': 80, 'memory_gb': 10},
    'pythia-6.9b': {'params': 6.9e9, 'n_layers': 32, 'n_heads': 32, 'd_model': 4096, 'd_head': 128, 'memory_gb': 20},
    'pythia-12b': {'params': 12e9, 'n_layers': 36, 'n_heads': 40, 'd_model': 5120, 'd_head': 128, 'memory_gb': 30},
}

# Compute ρ for each model
for name, config in PYTHIA_MODELS.items():
    config['rho'] = config['n_heads'] / config['d_head']

# Print rho values
print("ρ (Head Density) for Pythia Family:")
print("-" * 40)
for name, config in sorted(PYTHIA_MODELS.items(), key=lambda x: x[1]['rho']):
    print(f"{name:15} | ρ = {config['n_heads']}/{config['d_head']} = {config['rho']:.4f}")

In [None]:
# Select models based on available GPU memory
if torch.cuda.is_available():
    available_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"\nAvailable GPU memory: {available_mem:.1f} GB")
    
    MODELS_TO_TEST = []
    for name, config in PYTHIA_MODELS.items():
        if config['memory_gb'] < (available_mem - 2):
            MODELS_TO_TEST.append(name)
    
    print(f"Models to test: {MODELS_TO_TEST}")
else:
    MODELS_TO_TEST = ['pythia-70m', 'pythia-160m', 'pythia-410m']
    print(f"CPU mode - testing small models only: {MODELS_TO_TEST}")

In [None]:
# Test prompts (same as 8-model benchmark)
TEST_PROMPTS = [
    # Factual
    "The capital of France is",
    "Water freezes at a temperature of",
    "The largest planet in our solar system is",
    "Einstein is famous for the theory of",
    "The chemical symbol for gold is",
    # Syntactic
    "The man who saw the woman who wore the hat that was red and had feathers left the party early because",
    "Despite the rain that had been falling for three days straight the team decided to continue their journey through the forest which seemed to stretch on endlessly and the leader said",
    "After the meeting that was scheduled for Tuesday but moved to Wednesday due to the holiday which fell on Monday the committee announced their decision which was to",
    "The book which the author who won the prize wrote during the summer after the accident happened tells the story of a young girl who discovers that she has the ability to",
    "Although the evidence suggested otherwise and the witnesses testified against him and the prosecutor demanded the harshest penalty the jury surprisingly decided to",
    # Cliche
    "Actions speak louder than",
    "The early bird catches the",
    "A stitch in time saves",
    "When in Rome do as the Romans",
    "Birds of a feather flock",
    # Novel
    "The epistemological implications of quantum decoherence suggest that consciousness might be fundamentally",
    "The Voynich manuscript's undeciphered text has led some researchers to propose that it represents a constructed language designed to",
    "The Banach-Tarski paradox demonstrates that in mathematics with the axiom of choice one can decompose a sphere and reassemble it into",
    "The Mpemba effect remains controversial because it challenges our intuition about thermal dynamics by suggesting that under certain conditions hot water can",
    "The Riemann hypothesis if proven true would have profound implications for our understanding of the distribution of",
    # Nonsense
    "Table sky run blue jump",
    "Syntax of purple dreams calculates the",
    "When squared thoughts evaporate into crystalline networks the",
    "If democracy could photosynthesize under marginal propensity then",
    "The hypotenuse of existential dread interpolates between"
]

print(f"Using {len(TEST_PROMPTS)} test prompts")

In [None]:
def compute_residual_gain(model, tokenizer, prompts):
    """
    Compute Residual Stream Gain - CORRECTED VERSION (no final LayerNorm)
    
    WRONG:   G = ||hidden_states[-1]|| / ||hidden_states[-2]||  (includes final LN)
    CORRECT: G = ||hidden_states[-2]|| / ||hidden_states[-3]||  (true last layer)
    
    Returns: 
        - gain_no_ln_mean: Mean gain WITHOUT final LN (correct)
        - gain_no_ln_std: Std of gains
        - gain_with_ln_mean: Mean gain WITH final LN (for comparison)
        - all_gains_no_ln: All individual gains without LN
        - all_layer_gains: Full layer-by-layer gains for analysis
    """
    gains_no_ln = []
    gains_with_ln = []
    all_layer_gains_list = []
    
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        if torch.cuda.is_available():
            inputs = {k: v.to(model.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            outputs = model(**inputs, output_hidden_states=True)
        
        hidden_states = outputs.hidden_states  # tuple of n_layers + 1 tensors, each (batch, seq_len, d_model)
        
        # Compute norms for all hidden states (last token)
        norms = []
        for h in hidden_states:
            norm = torch.norm(h[:, -1, :].float(), dim=-1).item()
            norms.append(norm)
        
        # Compute all layer-by-layer gains
        layer_gains = []
        for i in range(1, len(norms)):
            gain = norms[i] / (norms[i-1] + 1e-10)
            layer_gains.append(gain)
        
        all_layer_gains_list.append(layer_gains)
        
        # WRONG metric (includes final LN): last gain
        gain_with_ln = layer_gains[-1]
        gains_with_ln.append(gain_with_ln)
        
        # CORRECT metric (no final LN): second-to-last gain
        gain_no_ln = layer_gains[-2] if len(layer_gains) >= 2 else layer_gains[-1]
        gains_no_ln.append(gain_no_ln)
    
    # Average layer gains across prompts
    avg_layer_gains = np.mean(all_layer_gains_list, axis=0).tolist()
    
    return {
        'gain_no_ln_mean': float(np.mean(gains_no_ln)),
        'gain_no_ln_std': float(np.std(gains_no_ln)),
        'gain_with_ln_mean': float(np.mean(gains_with_ln)),
        'gain_with_ln_std': float(np.std(gains_with_ln)),
        'all_gains_no_ln': [float(g) for g in gains_no_ln],
        'all_gains_with_ln': [float(g) for g in gains_with_ln],
        'all_layer_gains': avg_layer_gains
    }

In [None]:
def analyze_model(model_name):
    """Analyze a single Pythia model with CORRECTED methodology."""
    full_name = f"EleutherAI/{model_name}"
    print(f"\n{'='*60}")
    print(f"Analyzing: {full_name}")
    print(f"{'='*60}")
    
    # Get config first (for metadata)
    config = AutoConfig.from_pretrained(full_name)
    
    # Load model
    tokenizer = AutoTokenizer.from_pretrained(full_name)
    model = AutoModelForCausalLM.from_pretrained(
        full_name,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        device_map="auto" if torch.cuda.is_available() else None,
        low_cpu_mem_usage=True
    )
    model.eval()
    
    # Get architecture details
    n_layers = config.num_hidden_layers
    n_heads = config.num_attention_heads
    d_model = config.hidden_size
    d_head = d_model // n_heads
    rho = n_heads / d_head
    rotary_pct = getattr(config, 'rotary_pct', None) or getattr(config, 'rotary_percent', 0.25)
    
    print(f"Layers: {n_layers}, Heads: {n_heads}, d_model: {d_model}, d_head: {d_head}")
    print(f"ρ = {n_heads}/{d_head} = {rho:.4f}")
    
    # Compute residual gain with CORRECTED methodology
    gain_results = compute_residual_gain(model, tokenizer, TEST_PROMPTS)
    
    # Extract metrics
    gain_no_ln = gain_results['gain_no_ln_mean']
    gain_with_ln = gain_results['gain_with_ln_mean']
    
    print(f"\n🔬 RESIDUAL STREAM GAIN:")
    print(f"   WITH final LN (WRONG):    {gain_with_ln:.4f} ± {gain_results['gain_with_ln_std']:.4f}")
    print(f"   WITHOUT final LN (CORRECT): {gain_no_ln:.4f} ± {gain_results['gain_no_ln_std']:.4f}")
    
    is_dampening = gain_no_ln < 1.0
    if is_dampening:
        print(f"   → DAMPENING (G < 1.0)")
    else:
        print(f"   → EXPANSION (G > 1.0)")
    
    # IMPORTANT: Convert all values to native Python types for JSON serialization
    results = {
        'model': str(model_name),
        'n_layers': int(n_layers),
        'n_heads': int(n_heads),
        'd_model': int(d_model),
        'd_head': int(d_head),
        'rho': float(rho),
        'rotary_pct': float(rotary_pct) if rotary_pct is not None else 0.25,
        # CORRECT metric (without final LN)
        'residual_gain_mean': float(gain_no_ln),
        'residual_gain_std': float(gain_results['gain_no_ln_std']),
        'residual_gain_all': gain_results['all_gains_no_ln'],
        # For comparison: WITH final LN (wrong)
        'gain_with_ln_mean': float(gain_with_ln),
        'gain_with_ln_std': float(gain_results['gain_with_ln_std']),
        # Layer-by-layer for analysis
        'all_layer_gains': gain_results['all_layer_gains'],
        'is_dampening': bool(is_dampening)
    }
    
    # Cleanup
    del model, tokenizer
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    
    return results

In [None]:
# Run analysis on all selected models
all_results = []

for model_name in MODELS_TO_TEST:
    try:
        results = analyze_model(model_name)
        all_results.append(results)
    except Exception as e:
        print(f"Error analyzing {model_name}: {e}")
        continue

print(f"\n\nSuccessfully analyzed {len(all_results)} models")

In [None]:
# Summary Table - CORRECTED VERSION
print("\n" + "=" * 100)
print("PYTHIA FAMILY RESIDUAL STREAM GAIN SUMMARY (CORRECTED - No Final LayerNorm)")
print("=" * 100)
print(f"\n{'Model':<15} {'ρ':>8} {'G (no LN)':>12} {'G (with LN)':>12} {'Status':>12}")
print("-" * 60)

for r in sorted(all_results, key=lambda x: x['rho']):
    status = "DAMPENING" if r['is_dampening'] else "EXPANSION"
    marker = "🔵" if r['is_dampening'] else "🔴"
    gain_no_ln = r['residual_gain_mean']
    gain_with_ln = r.get('gain_with_ln_mean', 'N/A')
    
    if isinstance(gain_with_ln, float):
        print(f"{r['model']:<15} {r['rho']:>8.4f} {gain_no_ln:>12.4f} {gain_with_ln:>12.4f} {marker} {status}")
    else:
        print(f"{r['model']:<15} {r['rho']:>8.4f} {gain_no_ln:>12.4f} {'N/A':>12} {marker} {status}")

print("\n⚠️  'G (no LN)' is the CORRECT metric (without final LayerNorm)")
print("   'G (with LN)' is shown for comparison only (includes LN artifact)")

In [None]:
# Correlation Analysis: ρ vs Gain
if len(all_results) >= 3:
    rhos = [r['rho'] for r in all_results]
    gains = [r['residual_gain_mean'] for r in all_results]
    
    # Pearson correlation
    corr, p_value = stats.pearsonr(rhos, gains)
    
    print("\n" + "=" * 60)
    print("CORRELATION ANALYSIS: ρ vs Residual Gain")
    print("=" * 60)
    print(f"\nPearson r = {corr:.4f}")
    print(f"p-value = {p_value:.4e}")
    
    if corr < 0 and p_value < 0.05:
        print(f"\n✅ H25 VALIDATED: Higher ρ → Lower Gain (Dampening)")
    elif corr < 0:
        print(f"\n⚠️ H25 TREND SUPPORTED but p > 0.05 (needs more data points)")
    else:
        print(f"\n❌ H25 NOT SUPPORTED: No negative correlation")
else:
    print("\n⚠️ Need at least 3 models for correlation analysis")

In [None]:
# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
fig.suptitle('Pythia Family: Head Density (ρ) vs Residual Stream Gain', fontsize=14, fontweight='bold')

# Sort by rho
sorted_results = sorted(all_results, key=lambda x: x['rho'])
rhos = [r['rho'] for r in sorted_results]
gains = [r['residual_gain_mean'] for r in sorted_results]
stds = [r['residual_gain_std'] for r in sorted_results]
names = [r['model'].replace('pythia-', '') for r in sorted_results]

# Panel 1: ρ vs Gain scatter
ax1 = axes[0]
colors = ['blue' if g < 1.0 else 'red' for g in gains]
ax1.scatter(rhos, gains, c=colors, s=100, zorder=5)
ax1.errorbar(rhos, gains, yerr=stds, fmt='none', ecolor='gray', alpha=0.5)

# Annotate
for i, name in enumerate(names):
    ax1.annotate(name, (rhos[i], gains[i]), textcoords="offset points", xytext=(5, 5), fontsize=8)

ax1.axhline(y=1.0, color='black', linestyle='--', alpha=0.5, label='G=1.0 (Bentov Point)')
ax1.set_xlabel('ρ = n_heads / d_head')
ax1.set_ylabel('Residual Stream Gain')
ax1.set_title('ρ vs Gain')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Panel 2: Bar chart by model
ax2 = axes[1]
x = np.arange(len(names))
bars = ax2.bar(x, gains, color=colors, alpha=0.7, yerr=stds, capsize=3)
ax2.axhline(y=1.0, color='black', linestyle='--', alpha=0.5)
ax2.set_xticks(x)
ax2.set_xticklabels(names, rotation=45)
ax2.set_xlabel('Model (sorted by ρ)')
ax2.set_ylabel('Residual Stream Gain')
ax2.set_title('Gain by Model')
ax2.grid(True, alpha=0.3, axis='y')

# Panel 3: ρ correlation with trend line
ax3 = axes[2]
ax3.scatter(rhos, gains, c=colors, s=100, zorder=5)

if len(rhos) >= 2:
    # Linear regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(rhos, gains)
    x_fit = np.linspace(min(rhos) * 0.9, max(rhos) * 1.1, 100)
    y_fit = slope * x_fit + intercept
    ax3.plot(x_fit, y_fit, 'g--', linewidth=2, label=f'r = {r_value:.3f}, p = {p_value:.3e}')

ax3.axhline(y=1.0, color='black', linestyle='--', alpha=0.5)
ax3.set_xlabel('ρ = n_heads / d_head')
ax3.set_ylabel('Residual Stream Gain')
ax3.set_title('Correlation: ρ → Dampening?')
ax3.legend()
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('pythia_family_residual_gain.png', dpi=150, bbox_inches='tight')
plt.show()

print("\nSaved: pythia_family_residual_gain.png")

In [None]:
# H25 Verdict
print("\n" + "=" * 70)
print("H25 VALIDATION: DIMENSIONAL CROWDING HYPOTHESIS")
print("=" * 70)

# Check if all high-ρ models dampen
high_rho_models = [r for r in all_results if r['rho'] >= 0.2]
low_rho_models = [r for r in all_results if r['rho'] < 0.2]

print(f"\nHigh ρ models (≥ 0.2): {len(high_rho_models)}")
for r in high_rho_models:
    status = "DAMPEN" if r['is_dampening'] else "EXPAND"
    print(f"  {r['model']}: ρ = {r['rho']:.4f}, G = {r['residual_gain_mean']:.4f} → {status}")

print(f"\nLow ρ models (< 0.2): {len(low_rho_models)}")
for r in low_rho_models:
    status = "DAMPEN" if r['is_dampening'] else "EXPAND"
    print(f"  {r['model']}: ρ = {r['rho']:.4f}, G = {r['residual_gain_mean']:.4f} → {status}")

# Compute verdict
high_rho_dampen_rate = sum(1 for r in high_rho_models if r['is_dampening']) / max(len(high_rho_models), 1)
low_rho_expand_rate = sum(1 for r in low_rho_models if not r['is_dampening']) / max(len(low_rho_models), 1)

print(f"\nHigh ρ dampening rate: {high_rho_dampen_rate*100:.0f}%")
print(f"Low ρ expansion rate: {low_rho_expand_rate*100:.0f}%")

# Check sample size first so a tiny run cannot be reported as validated
if len(all_results) < 3:
    print(f"\n⚠️ INCONCLUSIVE: Need more data points (only {len(all_results)} models)")
    verdict = "INCONCLUSIVE"
elif high_rho_dampen_rate > 0.5 or corr < -0.3:
    print(f"\n✅ H25 VALIDATED: Dimensional Crowding → Dampening")
    verdict = "VALIDATED"
else:
    print(f"\n❌ H25 NOT VALIDATED")
    verdict = "NOT_VALIDATED"

In [None]:
# Save results
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

# Handle case where correlation wasn't computed (< 3 models)
correlation_value = None
p_value_value = None
if len(all_results) >= 3:
    try:
        correlation_value = float(corr)
        p_value_value = float(p_value)
    except NameError:
        pass

results_data = {
    'experiment': 'Pythia Family Residual Stream Gain Sweep - CORRECTED (No Final LN)',
    'hypothesis': 'H25: Dimensional Crowding → Dampening',
    'methodology': {
        'correct_metric': 'hidden_states[-2] / hidden_states[-3] (true last layer, no final LN)',
        'wrong_metric': 'hidden_states[-1] / hidden_states[-2] (includes final LN artifact)'
    },
    'date': datetime.now().isoformat(),
    'n_models': len(all_results),
    'n_prompts': len(TEST_PROMPTS),
    'models': all_results,
    'analysis': {
        'rho_gain_correlation': correlation_value,
        'correlation_p_value': p_value_value,
        'high_rho_dampen_rate': float(high_rho_dampen_rate),
        'low_rho_expand_rate': float(low_rho_expand_rate)
    },
    'verdict': str(verdict)
}

filename = f'pythia_family_NO_FINAL_LN_{timestamp}.json'
with open(filename, 'w') as f:
    json.dump(results_data, f, indent=2)

print(f"\nSaved: {filename}")

In [None]:
# Create archive and auto-download
import zipfile

archive_name = f'pythia_family_NO_FINAL_LN_{timestamp}.zip'

with zipfile.ZipFile(archive_name, 'w') as zf:
    zf.write(filename)
    zf.write('pythia_family_residual_gain.png')

print(f"Created archive: {archive_name}")

# Auto-download in Colab
try:
    from google.colab import files
    print("\nStarting automatic downloads...")
    files.download(filename)
    files.download('pythia_family_residual_gain.png')
    files.download(archive_name)
    print("Downloads triggered!")
except ImportError:
    print("\nNot running in Colab - manual download required.")

In [None]:
# Final Summary
print("\n" + "=" * 70)
print("FINAL SUMMARY: Pythia Family - CORRECTED (No Final LayerNorm)")
print("=" * 70)

print(f"\n📊 Models Tested: {len(all_results)}")
print(f"📝 Prompts per Model: {len(TEST_PROMPTS)}")

dampening_count = sum(1 for r in all_results if r['is_dampening'])
expansion_count = len(all_results) - dampening_count

print(f"\n🔵 DAMPENING (G < 1.0): {dampening_count} models")
print(f"🔴 EXPANSION (G > 1.0): {expansion_count} models")

if len(all_results) >= 3:
    try:
        print(f"\n📈 ρ vs Gain Correlation: r = {corr:.4f} (p = {p_value:.4e})")
    except NameError:
        print("\n📈 ρ vs Gain Correlation: not computed")

print(f"\n🎯 H25 VERDICT: {verdict}")

print(f"\n📁 Output Files:")
print(f"   • {filename}")
print(f"   • pythia_family_residual_gain.png")
print(f"   • {archive_name}")

print(f"\n⚠️  METHODOLOGY: Using CORRECT metric (no final LayerNorm)")
print(f"   This excludes the LayerNorm artifact that caused all models to appear as 'dampening'.")