# Anisotropy Profile Measurement: Gemma-2B

**Paper #3 Empirical Validation - Gauge Theory Test**

## Prediction (Gauge Theory via Gemini Deep Research)

**Hypothesis:** RMSNorm acts as "gauge fixing" that projects onto hypersphere S^{d-1}, restricting diffusion to angular coordinates only.

**Expected Difference from Pythia:**
- **Pythia (LayerNorm):** Clear Bell Curve, strong inversion
- **Gemma (RMSNorm):** Flatter profile, subtle/no inversion

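**Why?** The hypothesis is that RMSNorm leaves fewer radial degrees of freedom than LayerNorm. This would result in:
- Normalized embeddings constrained to a hypersphere (radius sqrt(d) before the learned gain is applied, rather than the unit sphere)
- Only angular diffusion being possible (Connection Laplacian)
- Representation collapse being prevented, at the cost of a subtler inversion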
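
The geometric part of this claim can be checked on synthetic vectors. The following is a minimal NumPy sketch, not Gemma's implementation: it omits the learned gain, and the `rms_norm` helper, the `eps` value, and the random test data are assumptions made for illustration. It shows that the output norm is sqrt(d) whatever the input length, while the direction of each vector is unchanged.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMSNorm without the learned gain: divide each row by its root-mean-square."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms

rng = np.random.default_rng(0)
d = 64
# Hypothetical inputs: random directions with lengths spread over two orders of magnitude
x = rng.normal(size=(1000, d)) * rng.uniform(0.1, 10.0, size=(1000, 1))
y = rms_norm(x)

in_norms = np.linalg.norm(x, axis=1)
out_norms = np.linalg.norm(y, axis=1)
# Cosine similarity between each input and its normalized output
cosines = np.sum(x * y, axis=1) / (in_norms * out_norms)

print(f"input norm CV:  {in_norms.std() / in_norms.mean():.3f}")
print(f"output norm CV: {out_norms.std() / out_norms.mean():.6f}")
print(f"mean output norm vs sqrt(d): {out_norms.mean():.3f} vs {np.sqrt(d):.3f}")
print(f"min cosine(x, rms_norm(x)): {cosines.min():.6f}")
```

The norm coefficient of variation printed here is the same quantity the notebook later tracks per layer as `norm_cv`.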

**Reference:** Gemini Deep Research Report (2026-01-04), Section on Architectural Gauge Theory

**Author:** Davide D'Elia  
**Date:** 2026-01-04

## 1. Setup & Authentication

**Important:** Gemma models require HuggingFace authentication and license acceptance.

1. Go to https://huggingface.co/google/gemma-2b
2. Accept the license agreement
3. Create a token at https://huggingface.co/settings/tokens

In [None]:
# Install dependencies
!pip install -q transformers accelerate einops scipy matplotlib seaborn huggingface_hub

In [None]:
# Authenticate with HuggingFace (required for Gemma)
from huggingface_hub import login

# Option 1: Interactive login (paste token when prompted)
login()

# Option 2: Direct token (uncomment and paste your token)
# login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy import stats
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 2. Load Model

**Architecture Details:**
- **Normalization:** RMSNorm (NOT LayerNorm)
- **Position Encoding:** RoPE (Rotary Position Embeddings)
- **Activation:** GeGLU

All three contribute to "gauge fixing" that may flatten the anisotropy profile.
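
For RoPE, the relevant property is that it acts as a pure rotation. The sketch below is an illustrative NumPy version of a rotary embedding using the split-halves pairing convention; the `rope` helper, the base of 10000, and the random vectors are assumptions for illustration. Note that RoPE is applied to query and key vectors inside attention, not to the residual-stream states measured in this notebook. The sketch shows that the rotation preserves the norm and that the query-key inner product depends only on the position offset.

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Rotate feature pairs (i, i + d/2) of x by position-dependent angles."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)  # one frequency per feature pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=8)

q_rot = rope(q, position=5)
print("norm before:", np.linalg.norm(q))
print("norm after: ", np.linalg.norm(q_rot))

# Same offset (2) at different absolute positions gives the same inner product
score_a = rope(q, 3) @ rope(k, 1)
score_b = rope(q, 7) @ rope(k, 5)
print("offset-only dependence:", np.isclose(score_a, score_b))
```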

In [None]:
# Model configuration
MODEL_NAME = "google/gemma-2b"
# Alternatives:
# MODEL_NAME = "google/gemma-2-2b"  # Gemma 2 variant
# MODEL_NAME = "google/gemma-7b"    # Larger variant (needs more VRAM)

print(f"Loading {MODEL_NAME}...")
print("(This may take a few minutes for first download)")

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    output_hidden_states=True
)
model.eval()

n_layers = model.config.num_hidden_layers
hidden_dim = model.config.hidden_size

print(f"\nLoaded: {n_layers} layers, {hidden_dim} hidden dim")
print(f"\nArchitecture Features:")
print(f"  - Normalization: RMSNorm (gauge fixing)")
print(f"  - Position Encoding: RoPE (flat connection)")
print(f"  - Activation: GeGLU (cohomological gating)")

## 3. Define Test Prompts

Same prompts as Pythia for fair comparison.

In [None]:
# Diverse prompts for robust measurement (SAME AS PYTHIA)
TEST_PROMPTS = [
    # Factual
    "The capital of France is Paris, which is known for",
    "Water boils at 100 degrees Celsius under standard",
    "The speed of light in a vacuum is approximately",
    
    # Reasoning
    "If all mammals are warm-blooded and whales are mammals, then",
    "The probability of rolling a six on a fair die is",
    
    # Creative
    "Once upon a time in a faraway kingdom, there lived",
    "The sunset painted the sky in shades of orange and",
    
    # Technical
    "In Python, you can define a function using the def keyword",
    "Machine learning models learn patterns from data by",
    "The transformer architecture uses self-attention to",
    
    # Abstract
    "The concept of infinity has puzzled philosophers because",
    "Democracy is often considered the best form of government",
    
    # Conversational
    "Hello! How are you doing today? I hope you're having",
    "Thank you for your help with this project. I really",
    
    # From our dataset (Paper #1 examples)
    "Functional programming emphasizes immutability and pure functions",
    "Object-oriented programming uses classes and inheritance for",
]

print(f"Using {len(TEST_PROMPTS)} test prompts (same as Pythia for comparison)")

## 4. Extract Layer-wise Embeddings

In [None]:
def extract_all_layer_embeddings(model, tokenizer, prompts, device=None):
    """
    Extract embeddings from all layers for all prompts.
    
    Returns:
        Dict[layer_idx -> np.array of shape (n_tokens_total, hidden_dim)]
    """
    # Default to the device the model was placed on, so the cell also runs without CUDA
    if device is None:
        device = model.device
    n_layers = model.config.num_hidden_layers
    layer_embeddings = {i: [] for i in range(n_layers + 1)}  # +1 for embedding layer
    
    with torch.no_grad():
        for prompt in tqdm(prompts, desc="Processing prompts"):
            inputs = tokenizer(prompt, return_tensors="pt").to(device)
            outputs = model(**inputs, output_hidden_states=True)
            
            # hidden_states: tuple of (n_layers + 1) tensors
            # Each tensor: (batch=1, seq_len, hidden_dim)
            hidden_states = outputs.hidden_states
            
            for layer_idx, hidden in enumerate(hidden_states):
                # Take all tokens, squeeze batch dimension
                emb = hidden.squeeze(0).cpu().float().numpy()  # (seq_len, hidden_dim)
                layer_embeddings[layer_idx].append(emb)
    
    # Concatenate all embeddings per layer
    for layer_idx in layer_embeddings:
        layer_embeddings[layer_idx] = np.vstack(layer_embeddings[layer_idx])
    
    return layer_embeddings

print("Extracting embeddings from all layers...")
layer_embeddings = extract_all_layer_embeddings(model, tokenizer, TEST_PROMPTS)

print(f"\nExtracted embeddings:")
for layer_idx in [0, n_layers // 2, n_layers]:
    print(f"  Layer {layer_idx}: {layer_embeddings[layer_idx].shape}")

## 5. Compute Anisotropy Metrics
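
To build intuition for the two main spectral metrics before applying them to model activations, here is a small sketch on synthetic Gaussian data. The `spectrum_summary` helper and both toy datasets are assumptions made for illustration, not part of the measurement pipeline. It computes the top-eigenvalue variance share and the effective rank (exponential of the spectral entropy) for an isotropic cloud and for the same cloud stretched along one axis.

```python
import numpy as np

def spectrum_summary(x):
    """Return (top-eigenvalue variance share, effective rank) of the sample covariance."""
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / (x.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 1e-10, None)[::-1]  # descending, floored
    p = eig / eig.sum()
    top1_share = p[0]
    effective_rank = np.exp(-np.sum(p * np.log(p)))
    return top1_share, effective_rank

rng = np.random.default_rng(0)
n, d = 5000, 32

iso = rng.normal(size=(n, d))   # equal variance in every direction
scales = np.ones(d)
scales[0] = 20.0                # stretch a single axis
aniso = iso * scales

iso_top1, iso_rank = spectrum_summary(iso)
aniso_top1, aniso_rank = spectrum_summary(aniso)
print(f"isotropic:   top-1 share = {iso_top1:.3f}, effective rank = {iso_rank:.1f}")
print(f"anisotropic: top-1 share = {aniso_top1:.3f}, effective rank = {aniso_rank:.1f}")
```

In the isotropic case the effective rank is close to the dimension and the top-1 share is close to 1/d. Stretching one axis pushes the top-1 share toward 1 and the effective rank toward 1, which is the sense in which a high variance share is read as high anisotropy below.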

In [None]:
def compute_anisotropy_metrics(embeddings):
    """
    Compute multiple anisotropy metrics for a set of embeddings.
    
    Args:
        embeddings: np.array of shape (n_samples, hidden_dim)
    
    Returns:
        dict with various anisotropy measures
    """
    # Center the data
    centered = embeddings - embeddings.mean(axis=0)
    
    # Compute covariance matrix
    n_samples = embeddings.shape[0]
    cov = (centered.T @ centered) / (n_samples - 1)
    
    # Eigenvalue decomposition
    eigenvalues = np.linalg.eigvalsh(cov)
    eigenvalues = np.sort(eigenvalues)[::-1]  # Descending order
    eigenvalues = np.maximum(eigenvalues, 1e-10)  # Numerical stability
    
    # Metric 1: Eigenvalue Variance
    eigenvalue_variance = np.var(eigenvalues)
    
    # Metric 2: Intrinsic Dimension Ratio (lambda_1 / sum(lambda_i))
    total_var = eigenvalues.sum()
    intrinsic_dim_ratio = eigenvalues[0] / total_var if total_var > 0 else 0
    
    # Metric 3: Effective Rank = exp(entropy)
    normalized = eigenvalues / total_var
    entropy = -np.sum(normalized * np.log(normalized + 1e-10))
    effective_rank = np.exp(entropy)
    
    # Metric 4: Average cosine similarity to the mean vector (higher = more anisotropic)
    mean_vec = embeddings.mean(axis=0)
    mean_norm = np.linalg.norm(mean_vec)
    if mean_norm > 1e-10:
        # Vectorized over all rows instead of a per-row Python loop
        row_norms = np.linalg.norm(embeddings, axis=1)
        cos_sims = (embeddings @ mean_vec) / (row_norms * mean_norm + 1e-10)
        avg_cos_sim = float(np.mean(cos_sims))
    else:
        avg_cos_sim = 0.0
    
    # Metric 5: Explained variance by top-k PCs
    cumsum = np.cumsum(eigenvalues) / total_var
    var_top1 = eigenvalues[0] / total_var
    var_top10 = cumsum[min(9, len(cumsum)-1)]
    var_top50 = cumsum[min(49, len(cumsum)-1)]
    
    # Metric 6: Norms (relevant for RMSNorm analysis)
    norms = np.linalg.norm(embeddings, axis=1)
    mean_norm_value = np.mean(norms)
    std_norm_value = np.std(norms)
    
    return {
        'eigenvalue_variance': eigenvalue_variance,
        'intrinsic_dim_ratio': intrinsic_dim_ratio,
        'effective_rank': effective_rank,
        'avg_cos_sim_to_mean': avg_cos_sim,
        'var_top1': var_top1,
        'var_top10': var_top10,
        'var_top50': var_top50,
        'mean_norm': mean_norm_value,
        'std_norm': std_norm_value,
        'norm_cv': std_norm_value / mean_norm_value if mean_norm_value > 0 else 0,  # Coefficient of variation
        'eigenvalues': eigenvalues[:100]  # Store top 100 for analysis
    }

print("Computing anisotropy metrics for each layer...")
layer_metrics = {}
for layer_idx in tqdm(range(n_layers + 1), desc="Layers"):
    layer_metrics[layer_idx] = compute_anisotropy_metrics(layer_embeddings[layer_idx])

print("\nDone!")

## 6. Plot Anisotropy Profile

In [None]:
# Extract metrics for plotting
layers = list(range(n_layers + 1))
eigenvalue_variance = [layer_metrics[l]['eigenvalue_variance'] for l in layers]
intrinsic_dim_ratio = [layer_metrics[l]['intrinsic_dim_ratio'] for l in layers]
effective_rank = [layer_metrics[l]['effective_rank'] for l in layers]
avg_cos_sim = [layer_metrics[l]['avg_cos_sim_to_mean'] for l in layers]
norm_cv = [layer_metrics[l]['norm_cv'] for l in layers]

# Find L* (maximum of intrinsic_dim_ratio = maximum anisotropy)
L_star = np.argmax(intrinsic_dim_ratio)
print(f"Detected L* (maximum anisotropy): Layer {L_star}")

In [None]:
# Main Plot: Anisotropy Profile
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# Plot 1: Intrinsic Dimension Ratio (Primary Anisotropy Measure)
ax1 = axes[0, 0]
ax1.plot(layers, intrinsic_dim_ratio, 'b-', linewidth=2, marker='o', markersize=4)
ax1.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax1.fill_between(layers, intrinsic_dim_ratio, alpha=0.3)
ax1.set_xlabel('Layer', fontsize=12)
ax1.set_ylabel('lambda_1 / sum(lambda_i)', fontsize=12)
ax1.set_title('Primary Anisotropy: Variance Concentration', fontsize=14)
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Plot 2: Effective Rank (Inverse Anisotropy)
ax2 = axes[0, 1]
ax2.plot(layers, effective_rank, 'g-', linewidth=2, marker='s', markersize=4)
ax2.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax2.set_xlabel('Layer', fontsize=12)
ax2.set_ylabel('Effective Rank', fontsize=12)
ax2.set_title('Effective Rank (lower = more anisotropic)', fontsize=14)
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

# Plot 3: Average Cosine Similarity to Mean
ax3 = axes[0, 2]
ax3.plot(layers, avg_cos_sim, 'm-', linewidth=2, marker='^', markersize=4)
ax3.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax3.set_xlabel('Layer', fontsize=12)
ax3.set_ylabel('Avg Cosine Sim to Mean', fontsize=12)
ax3.set_title('Directional Anisotropy', fontsize=14)
ax3.legend(fontsize=11)
ax3.grid(True, alpha=0.3)

# Plot 4: Eigenvalue Variance
ax4 = axes[1, 0]
ax4.semilogy(layers, eigenvalue_variance, 'r-', linewidth=2, marker='d', markersize=4)
ax4.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax4.set_xlabel('Layer', fontsize=12)
ax4.set_ylabel('Var(lambda_i) [log scale]', fontsize=12)
ax4.set_title('Eigenvalue Variance', fontsize=14)
ax4.legend(fontsize=11)
ax4.grid(True, alpha=0.3)

# Plot 5: Norm Coefficient of Variation (RMSNorm-specific)
ax5 = axes[1, 1]
ax5.plot(layers, norm_cv, 'c-', linewidth=2, marker='p', markersize=4)
ax5.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax5.set_xlabel('Layer', fontsize=12)
ax5.set_ylabel('Norm CV (std/mean)', fontsize=12)
ax5.set_title('Norm Coefficient of Variation\n(RMSNorm Effect)', fontsize=14)
ax5.legend(fontsize=11)
ax5.grid(True, alpha=0.3)

# Plot 6: Summary Comparison
ax6 = axes[1, 2]
# Normalize all metrics for overlay
def normalize(arr):
    arr = np.array(arr)
    return (arr - arr.min()) / (arr.max() - arr.min() + 1e-10)

ax6.plot(layers, normalize(intrinsic_dim_ratio), 'b-', linewidth=2, label='Intrinsic Dim Ratio', alpha=0.8)
ax6.plot(layers, 1 - normalize(effective_rank), 'g-', linewidth=2, label='1 - Effective Rank (norm)', alpha=0.8)
ax6.plot(layers, normalize(avg_cos_sim), 'm-', linewidth=2, label='Avg Cos Sim', alpha=0.8)
ax6.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax6.set_xlabel('Layer', fontsize=12)
ax6.set_ylabel('Normalized Value', fontsize=12)
ax6.set_title('All Metrics Normalized', fontsize=14)
ax6.legend(fontsize=9, loc='best')
ax6.grid(True, alpha=0.3)

plt.suptitle(f'{MODEL_NAME}: Anisotropy Profile\n(Gauge Theory Prediction: FLATTER than Pythia due to RMSNorm)', 
             fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('anisotropy_profile_gemma.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\n>>> Figure saved as 'anisotropy_profile_gemma.png'")

## 7. Bell Curve Analysis & Gauge Theory Test
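
The two-phase slope test used in this section can be tried on synthetic profiles first. The sketch below is illustrative only: the `two_phase_slopes` helper is hypothetical, it uses `np.polyfit` in place of `scipy.stats.linregress` to stay NumPy-only, it includes the peak in both segments, and both 19-point profiles are invented rather than measured.

```python
import numpy as np

def two_phase_slopes(profile):
    """Return (peak index, slope up to the peak, slope from the peak onward)."""
    profile = np.asarray(profile, dtype=float)
    peak = int(np.argmax(profile))

    def slope(segment):
        # A least-squares line needs at least two points
        if len(segment) < 2:
            return 0.0
        return float(np.polyfit(np.arange(len(segment)), segment, 1)[0])

    return peak, slope(profile[:peak + 1]), slope(profile[peak:])

# Synthetic profiles over 19 hypothetical layer indices (not measured data)
layer_idx = np.arange(19)
bell = np.exp(-0.5 * ((layer_idx - 7) / 3.0) ** 2)  # pronounced peak at index 7
flat = 0.5 + 0.01 * np.sin(layer_idx)               # nearly constant profile

for name, profile in [("bell", bell), ("flat", flat)]:
    peak, up, down = two_phase_slopes(profile)
    value_range = profile.max() - profile.min()
    print(f"{name}: peak={peak}, slope_up={up:+.4f}, "
          f"slope_down={down:+.4f}, range={value_range:.4f}")
```

The slope signs only describe the shape around the peak. They say nothing about how pronounced it is, which is why the flatness comparison relies on the profile range rather than on the slopes.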

In [None]:
# Check if profile matches Bell Curve prediction
def analyze_bell_curve(values, L_star):
    """
    Analyze if the values follow a Bell Curve pattern:
    - Rising before L*
    - Falling after L*
    """
    values = np.array(values)
    
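    # Split into phases; both include the peak at L* so each trend is anchored to it
    phase1 = values[:L_star + 1]  # Layers 0..L* (expected to rise)
    phase2 = values[L_star:]      # Layers L*..end (expected to fall)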
    
    # Trend analysis (linear regression slope)
    if len(phase1) > 1:
        slope1, _, r1, p1, _ = stats.linregress(range(len(phase1)), phase1)
    else:
        slope1, r1, p1 = 0, 0, 1
    
    if len(phase2) > 1:
        slope2, _, r2, p2, _ = stats.linregress(range(len(phase2)), phase2)
    else:
        slope2, r2, p2 = 0, 0, 1
    
    # Check Bell Curve pattern
    is_bell_curve = (slope1 > 0) and (slope2 < 0)
    
    # Measure "flatness" (variance of the profile)
    profile_variance = np.var(values)
    profile_range = np.max(values) - np.min(values)
    
    return {
        'is_bell_curve': is_bell_curve,
        'phase1_slope': slope1,
        'phase1_r': r1,
        'phase1_p': p1,
        'phase2_slope': slope2,
        'phase2_r': r2,
        'phase2_p': p2,
        'L_star': L_star,
        'max_value': values[L_star],
        'profile_variance': profile_variance,
        'profile_range': profile_range
    }

# Analyze main anisotropy metric
bell_analysis = analyze_bell_curve(intrinsic_dim_ratio, L_star)

print("="*60)
print("BELL CURVE ANALYSIS")
print("="*60)
print(f"\nDetected L* (maximum anisotropy): Layer {L_star}")
print(f"\nPhase 1 (layers 0-{L_star}, before L*):")
print(f"  Slope: {bell_analysis['phase1_slope']:.6f}")
print(f"  Direction: {'Rising' if bell_analysis['phase1_slope'] > 0 else 'Falling'}")
print(f"  R-value: {bell_analysis['phase1_r']:.4f}")
print(f"\nPhase 2 (layers {L_star}-{n_layers}, after L*):")
print(f"  Slope: {bell_analysis['phase2_slope']:.6f}")
print(f"  Direction: {'Rising' if bell_analysis['phase2_slope'] > 0 else 'Falling'}")
print(f"  R-value: {bell_analysis['phase2_r']:.4f}")
print(f"\n" + "="*60)
if bell_analysis['is_bell_curve']:
    print("Result: Bell Curve pattern detected")
else:
    print("Result: Pattern does NOT match Bell Curve")
    print(f"   Phase 1: {'Rising' if bell_analysis['phase1_slope'] > 0 else 'Falling'}")
    print(f"   Phase 2: {'Rising' if bell_analysis['phase2_slope'] > 0 else 'Falling'}")
print("="*60)

In [None]:
# Gauge Theory Test: Flatness Comparison
print("\n" + "="*60)
print("GAUGE THEORY TEST: Profile Flatness")
print("="*60)

print(f"\nPrediction (RMSNorm Gauge Fixing):")
print(f"  Gemma should have FLATTER profile than Pythia")
print(f"  Because RMSNorm constrains to hypersphere S^{{d-1}}")

print(f"\nMeasured Profile Statistics:")
print(f"  Profile Variance: {bell_analysis['profile_variance']:.6f}")
print(f"  Profile Range: {bell_analysis['profile_range']:.4f}")
print(f"  Max Anisotropy: {bell_analysis['max_value']:.4f}")

# Reference values from Pythia (hardcoded from previous experiment)
PYTHIA_REF = {
    'profile_range': 0.934,  # 0.994 - 0.060
    'max_anisotropy': 0.994,
    'L_star': 7
}

print(f"\nComparison with Pythia-6.9B (Reference):")
print(f"  Pythia Profile Range: {PYTHIA_REF['profile_range']:.4f}")
print(f"  Gemma Profile Range:  {bell_analysis['profile_range']:.4f}")

if bell_analysis['profile_range'] < PYTHIA_REF['profile_range']:
    flatness_ratio = (PYTHIA_REF['profile_range'] - bell_analysis['profile_range']) / PYTHIA_REF['profile_range'] * 100
    # A smaller range is consistent with the prediction but does not prove it:
    # the two models also differ in size and depth, not only in normalization
    print(f"\n>>> Consistent with the gauge theory prediction: Gemma's profile range is {flatness_ratio:.1f}% smaller than Pythia's")
else:
    print(f"\n>>> Unexpected: Gemma is NOT flatter than Pythia")
    print(f"    This may indicate that the RMSNorm effect is more subtle than predicted")

print("="*60)

## 8. Eigenvalue Spectrum Visualization

In [None]:
# Visualize eigenvalue spectrum at key layers
# (deduplicated, since entries coincide when L* is 0 or the last layer)
key_layers = sorted({int(l) for l in (0, L_star // 2, L_star, (L_star + n_layers) // 2, n_layers)})

fig, ax = plt.subplots(figsize=(12, 6))

colors = plt.cm.viridis(np.linspace(0, 1, len(key_layers)))

for idx, layer in enumerate(key_layers):
    eigenvalues = layer_metrics[layer]['eigenvalues']
    # Only the top 100 eigenvalues were stored, so this normalizes within the truncated spectrum
    normalized_eig = eigenvalues / eigenvalues.sum()
    ax.semilogy(range(len(normalized_eig)), normalized_eig, 
                label=f'Layer {layer}', color=colors[idx], linewidth=2)

ax.set_xlabel('Eigenvalue Index', fontsize=12)
ax.set_ylabel('Normalized Eigenvalue (log scale)', fontsize=12)
ax.set_title(f'{MODEL_NAME}: Eigenvalue Spectrum at Key Layers\n(L* = {L_star})', fontsize=14)
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('eigenvalue_spectrum_gemma.png', dpi=150, bbox_inches='tight')
plt.show()

print(">>> Figure saved as 'eigenvalue_spectrum_gemma.png'")

## 9. Norm Distribution (RMSNorm-Specific Analysis)

In [None]:
# Analyze how RMSNorm affects embedding norms across layers
mean_norms = [layer_metrics[l]['mean_norm'] for l in layers]
std_norms = [layer_metrics[l]['std_norm'] for l in layers]

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Mean Norm per Layer
ax1 = axes[0]
ax1.plot(layers, mean_norms, 'b-', linewidth=2, marker='o', markersize=4)
ax1.fill_between(layers, 
                  [m - s for m, s in zip(mean_norms, std_norms)],
                  [m + s for m, s in zip(mean_norms, std_norms)],
                  alpha=0.3)
ax1.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax1.set_xlabel('Layer', fontsize=12)
ax1.set_ylabel('Embedding Norm', fontsize=12)
ax1.set_title('Mean Embedding Norm (+/- 1 std)', fontsize=14)
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# Plot 2: Norm Coefficient of Variation
ax2 = axes[1]
ax2.plot(layers, norm_cv, 'g-', linewidth=2, marker='s', markersize=4)
ax2.axvline(x=L_star, color='red', linestyle='--', linewidth=2, label=f'L* = {L_star}')
ax2.axhline(y=0, color='gray', linestyle=':', alpha=0.5)
ax2.set_xlabel('Layer', fontsize=12)
ax2.set_ylabel('CV = std/mean', fontsize=12)
ax2.set_title('Norm Coefficient of Variation\n(low = more uniform norms, RMSNorm effect)', fontsize=14)
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.suptitle(f'{MODEL_NAME}: RMSNorm Effect on Embedding Norms', fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('norm_analysis_gemma.png', dpi=150, bbox_inches='tight')
plt.show()

print(">>> Figure saved as 'norm_analysis_gemma.png'")

## 10. Summary and Export

In [None]:
import json

# Prepare summary - ensure all numpy types are converted to Python types
summary = {
    'model': MODEL_NAME,
    'n_layers': int(n_layers),
    'hidden_dim': int(hidden_dim),
    'n_prompts': len(TEST_PROMPTS),
    'normalization': 'RMSNorm',
    'L_star_anisotropy': int(L_star),
    'is_bell_curve': bool(bell_analysis['is_bell_curve']),
    'phase1_slope': float(bell_analysis['phase1_slope']),
    'phase2_slope': float(bell_analysis['phase2_slope']),
    'phase1_r': float(bell_analysis['phase1_r']),
    'phase2_r': float(bell_analysis['phase2_r']),
    'profile_variance': float(bell_analysis['profile_variance']),
    'profile_range': float(bell_analysis['profile_range']),
    'max_anisotropy': float(bell_analysis['max_value']),
    'intrinsic_dim_ratio': [float(x) for x in intrinsic_dim_ratio],
    'effective_rank': [float(x) for x in effective_rank],
    'avg_cos_sim': [float(x) for x in avg_cos_sim],
    'norm_cv': [float(x) for x in norm_cv],
    'gauge_theory_test': {
        'pythia_profile_range': PYTHIA_REF['profile_range'],
        'gemma_profile_range': float(bell_analysis['profile_range']),
        'is_flatter': bool(bell_analysis['profile_range'] < PYTHIA_REF['profile_range'])
    }
}

# Save to JSON
with open('anisotropy_results_gemma.json', 'w') as f:
    json.dump(summary, f, indent=2)

print("\n" + "="*60)
print("SUMMARY")
print("="*60)
print(f"\nModel: {MODEL_NAME}")
print(f"Layers: {n_layers}")
print(f"Hidden dim: {hidden_dim}")
print(f"Normalization: RMSNorm")
print(f"\nResults:")
print(f"  L* (anisotropy max): Layer {L_star}")
print(f"  Bell Curve: {'Confirmed' if bell_analysis['is_bell_curve'] else 'Not confirmed'}")
print(f"  Profile Range: {bell_analysis['profile_range']:.4f}")
print(f"  Max Anisotropy: {bell_analysis['max_value']:.4f}")
print(f"\nGauge Theory Test:")
print(f"  Flatter than Pythia: {'YES' if summary['gauge_theory_test']['is_flatter'] else 'NO'}")
print(f"\nFiles saved:")
print(f"  - anisotropy_profile_gemma.png")
print(f"  - eigenvalue_spectrum_gemma.png")
print(f"  - norm_analysis_gemma.png")
print(f"  - anisotropy_results_gemma.json")
print("="*60)

In [None]:
# Create ZIP archive with all results
import zipfile
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
zip_filename = f"anisotropy_results_gemma_{timestamp}.zip"

with zipfile.ZipFile(zip_filename, 'w') as zipf:
    zipf.write('anisotropy_profile_gemma.png')
    zipf.write('eigenvalue_spectrum_gemma.png')
    zipf.write('norm_analysis_gemma.png')
    zipf.write('anisotropy_results_gemma.json')

print(f">>> Created: {zip_filename}")
print(f"  Contents: 3 PNG figures + 1 JSON data file")

## 11. Interpretation

### Gauge Theory Prediction

From Gemini Deep Research:

> "Normalization layers act as **gauge fixing** operations."
>
> | Architecture | Norm | Effect |
> |-------------|------|--------|
> | Pythia | LayerNorm | Radial freedom preserved -> clear inversion |
> | Gemma | RMSNorm | Spherical geometry enforced -> subtle/no inversion |

### Why RMSNorm Flattens the Profile

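1. **RMSNorm Formula (learned gain and $\epsilon$ omitted):** $\text{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d}\sum_i x_i^2}}$

2. **Effect:** Every output has norm $\sqrt{d}$, since $\|\text{RMSNorm}(x)\|_2 = \|x\|_2 \big/ \sqrt{\tfrac{1}{d}\|x\|_2^2} = \sqrt{d}$, so all normalized embeddings are projected onto a hypersphere $S^{d-1}$ of radius $\sqrt{d}$

3. **Consequence:** Only angular diffusion is possible (Connection Laplacian $\Delta_\nabla$)

4. **Result:** The compression/expansion dynamics are constrained, leading to a flatter anisotropy profile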

### Multi-Model Comparison Summary

| Model | Normalization | Profile Shape | L* | Interpretation |
|-------|---------------|---------------|----|-----------------|
| Pythia-6.9B | LayerNorm | Strong Bell Curve | 7 | Clear phase transitions |
| Gemma-2B | RMSNorm | Flatter | TBD | Gauge-fixed dynamics |

### Implications for Paper #3

If Gemma shows a flatter profile, this would:
1. Support the Gauge Theory interpretation
2. Help explain why Paper #2 saw "no clear inversion" in Gemma
3. Suggest that the choice of normalization affects sheaf topology

## 12. Download Results

In [None]:
# Download all results (google.colab is only importable inside Colab)
try:
    from google.colab import files
except ImportError:
    files = None

individual_files = [
    'anisotropy_profile_gemma.png',
    'eigenvalue_spectrum_gemma.png',
    'norm_analysis_gemma.png',
    'anisotropy_results_gemma.json',
]

if files is None:
    print("Not running in Colab; the result files are in the current working directory:")
    for name in [zip_filename] + individual_files:
        print(f"   - {name}")
else:
    print("Downloading result files...")
    print()

    # Download ZIP (easiest - single file with everything)
    print(f"1. ZIP Archive: {zip_filename}")
    files.download(zip_filename)

    # Also offer individual files
    print("\n2. Individual files:")
    for name in individual_files:
        print(f"   - {name}")
        files.download(name)

    print("\n>>> All files downloaded!")
    print(f"\nTIP: The ZIP file ({zip_filename}) contains all results in one download.")