# E11-T-Indra-LLaMA2-V3: State-Dependency on MHA (Standard-Compliant)

**Paper 4: Behavioral Sink Dynamics**

## Purpose: A2 Claim → A+ Tier (2nd Architecture)

A2 (Indra State-Dependency) currently has evidence from one architecture (GQA, LLaMA-3.1).
This notebook tests a second architecture (MHA, LLaMA-2) to upgrade A2 from B-Tier to A+ Tier.

## Methodology (E11-v3 Standard)

| Standard | Implementation |
|----------|----------------|
| Seeds | 42, 123, 456 (3-seed aggregation) |
| Noise Injection | **PRE-ATTENTION** (affects attention weights) |
| SI Measurement | **GLOBAL + LOCAL** (region-isolated) |
| Attention Mask | **YES** (excludes padding from entropy) |
| Chat Template | **YES** for Instruct model |
| dtype | **bfloat16** (stable on A100) |
| Prompts | Standard-10 v3 |
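
The 3-seed aggregation in the table reduces to a mean and a spread over per-seed effects. Below is a minimal sketch using only the standard library; the per-seed values are hypothetical placeholders, not results, and the helper name `aggregate_over_seeds` is ours. One detail worth knowing: `np.std`, which the notebook uses later, defaults to the population standard deviation (ddof=0), which `statistics.pstdev` reproduces here.

```python
import statistics

def aggregate_over_seeds(values):
    """Return the mean plus population and sample std of per-seed effects."""
    return {
        "mean": statistics.fmean(values),
        "std_population": statistics.pstdev(values),  # same as np.std default (ddof=0)
        "std_sample": statistics.stdev(values),       # ddof=1, larger for small n
    }

# Hypothetical SI changes (%) for seeds 42, 123, 456. Illustrative only.
per_seed_change_pct = [10.0, 12.0, 14.0]
summary = aggregate_over_seeds(per_seed_change_pct)
print(summary)
```

With only three seeds the two standard deviations differ noticeably, so it helps to state which one the error bars show.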

## The LLaMA-2 Paradox (from E11-X)

| Metric | LLaMA-3.1 (GQA) | LLaMA-2 (MHA) |
|--------|-----------------|---------------|
| Base SI | 0.7134 (HIGH) | 0.2149 (LOW) |
| Instruct SI | 0.3115 (LOW) | 0.2642 (HIGH) |
| RLHF Effect | COLLAPSES (-56%) | HEALS (+23%) |

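**LLaMA-2 is "born collapsed" (Base SI = 0.21), but RLHF+SFT heals it (+23% SI), the opposite of LLaMA-3.1.**

HIGH/LOW labels are relative to the other variant of the same model, not absolute: LLaMA-2 Instruct (0.2642) is still below LLaMA-3.1 Instruct (0.3115).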
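
The RLHF Effect row is the relative change from Base SI to Instruct SI. A quick check of that arithmetic, using only the values in the table above (the helper name `relative_change_pct` is ours, for illustration):

```python
def relative_change_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# SI values copied from the table above.
llama31 = relative_change_pct(0.7134, 0.3115)  # GQA: Base -> Instruct
llama2 = relative_change_pct(0.2149, 0.2642)   # MHA: Base -> Instruct

print(f"LLaMA-3.1 (GQA): {llama31:+.1f}%")
print(f"LLaMA-2 (MHA):   {llama2:+.1f}%")
```

These come out at about -56.3% and +22.9%, which round to the -56% and +23% shown in the table.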

## State-Dependency Hypothesis

| Model | Initial State | Expected Indra Effect |
|-------|---------------|----------------------|
| LLaMA-2 BASE | COLLAPSED (SI=0.21) | HEAL (+SI) |
| LLaMA-2 INSTRUCT | HEALTHY (SI=0.26) | DAMAGE (-SI) |

---

In [None]:
# Cell 1: Setup + Seeds (E11-v3 STANDARD)
!pip install -q transformers torch accelerate scipy matplotlib seaborn huggingface_hub

import torch
import numpy as np
import random
import math
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.stats import entropy as scipy_entropy
import json
import warnings
warnings.filterwarnings('ignore')

import os
from pathlib import Path
from datetime import datetime

# === REPRODUCIBILITY SEEDS (E11-v3 STANDARD) ===
SEEDS = [42, 123, 456]
PRIMARY_SEED = 42

def set_seed(seed):
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

set_seed(PRIMARY_SEED)

TIMESTAMP = datetime.now().strftime('%Y%m%d_%H%M%S')
Path('../results').mkdir(parents=True, exist_ok=True)
Path('../figures').mkdir(parents=True, exist_ok=True)

print(f"E11-T-Indra-LLaMA2-V3 (Standard-Compliant)")
print(f"Timestamp: {TIMESTAMP}")
print(f"Seeds: {SEEDS}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# HF Login for gated LLaMA models
try:
    from google.colab import userdata
    from huggingface_hub import login
    hf_token = userdata.get('HF_TOKEN')
    if hf_token:
        login(token=hf_token)
        print("HF Login: SUCCESS")
    else:
        print("WARNING: No HF_TOKEN - LLaMA requires auth!")
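except ImportError:
    print("Not in Colab - ensure HF_TOKEN set")
except Exception as e:
    # e.g. the Colab secret is missing or notebook access was not granted
    print(f"HF login failed ({type(e).__name__}: {e}) - ensure HF_TOKEN set")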

In [None]:
# Cell 2: Configuration (E11-v3 STANDARD)

# Model Configuration
MODEL_CONFIGS = {
    'base': {
        'name': 'meta-llama/Llama-2-7b-hf',
        'display': 'LLaMA-2-7B-Base',
        'state': 'COLLAPSED',
        'expected_effect': 'HEAL',
        'use_chat_template': False
    },
    'instruct': {
        'name': 'meta-llama/Llama-2-7b-chat-hf',
        'display': 'LLaMA-2-7B-Chat',
        'state': 'HEALTHY',
        'expected_effect': 'DAMAGE',
        'use_chat_template': True  # CRITICAL: use chat template!
    }
}

# E11-X Reference Values
E11X_REFERENCE = {
    'base_si': 0.2149,
    'instruct_si': 0.2642,
    'base_corr': 0.7851,
    'instruct_corr': 0.7358,
    'delta_si': 0.0493,
    'alignment': 'RLHF+SFT'
}

# E11-T-Indra Reference (GQA) for comparison
E11T_GQA_REFERENCE = {
    'model': 'LLaMA-3.1-8B',
    'collapsed_heal': 28.6,  # % SI increase
    'healthy_damage': -30.5,  # % SI decrease
    'gap_pp': 59.1
}

# Noise levels
NOISE_LEVELS = [0.0, 0.01, 0.02, 0.05, 0.1, 0.2]

# Tokenization
MAX_LENGTH = 128

# Standard-10 v3 Prompts (MD5: 715065bab181f46bf12ed471951141e2)
STANDARD_PROMPTS = [
    "What is the capital of France and what is its population?",
    "If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly? Explain step by step.",
    "Calculate 47 multiplied by 23 and show your work.",
    "Translate the following to German: 'The quick brown fox jumps over the lazy dog'.",
    "Write a Python function that checks if a number is prime.",
    "Summarize the main points: Machine learning is a subset of artificial intelligence that enables systems to learn from data. It uses algorithms to identify patterns and make decisions with minimal human intervention.",
    "Statement A: 'All birds can fly.' Statement B: 'Penguins are birds that cannot fly.' Are these statements contradictory? Explain.",
    "What are the safety considerations when using a kitchen knife?",
    "Write a haiku about artificial intelligence.",
    "Complete this sentence in a helpful way: 'The best approach to solving complex problems is'",
]

print(f"\nConfiguration:")
print(f"  Seeds: {SEEDS}")
print(f"  Noise levels: {NOISE_LEVELS}")
print(f"  MAX_LENGTH: {MAX_LENGTH}")
print(f"  Prompts: Standard-10 v3")
print(f"\nModels:")
for key, cfg in MODEL_CONFIGS.items():
    print(f"  {key}: {cfg['display']}")
    print(f"         State: {cfg['state']}, Expected: {cfg['expected_effect']}")
    print(f"         Chat Template: {cfg['use_chat_template']}")

In [None]:
# Cell 3: Specialization Metrics (E11-v3 STANDARD - WITH MASK)

def extract_head_activations(model, tokenizer, prompts, max_length=128, use_chat_template=False):
    """
    Extract per-head attention patterns WITH attention masks.
    
    Args:
        use_chat_template: If True, apply chat template (for Instruct models)
    """
    all_attention_patterns = []
    all_attention_masks = []
    
    for prompt in prompts:
        # === CHAT TEMPLATE (E11-v3 STANDARD) ===
        if use_chat_template and hasattr(tokenizer, 'apply_chat_template'):
            try:
                messages = [{"role": "user", "content": prompt}]
                formatted = tokenizer.apply_chat_template(
                    messages, 
                    tokenize=False, 
                    add_generation_prompt=True
                )
            except Exception:
                # Fallback: minimal LLaMA-2 [INST] format if the tokenizer has no usable chat template
                formatted = f"[INST] {prompt} [/INST]"
        else:
            formatted = prompt
        
        inputs = tokenizer(
            formatted, 
            return_tensors='pt',
            max_length=max_length,
            truncation=True,
            padding='max_length'
        ).to(model.device)
        
        with torch.no_grad():
            outputs = model(**inputs, output_attentions=True)
        
        attn_stack = torch.stack([a.squeeze(0) for a in outputs.attentions], dim=0)
        all_attention_patterns.append(attn_stack.cpu())
        all_attention_masks.append(inputs['attention_mask'].squeeze(0).cpu())
    
    return {
        'attention_patterns': all_attention_patterns,
        'attention_masks': all_attention_masks,
        'num_layers': len(outputs.attentions),
        'num_heads': outputs.attentions[0].shape[1]
    }


def compute_head_entropy_profiles(attention_patterns, attention_masks):
    """
    Compute normalized entropy WITH attention mask (E11-v3 STANDARD).
    Excludes padding tokens from entropy calculation.
    """
    num_prompts = len(attention_patterns)
    num_layers = attention_patterns[0].shape[0]
    num_heads = attention_patterns[0].shape[1]
    
    all_entropies = np.zeros((num_prompts, num_layers, num_heads))
    
    for p_idx, attn in enumerate(attention_patterns):
        # Boolean mask of real (non-padding) tokens. Indexing by the mask instead of
        # slicing [:valid_len] stays correct for left- or right-padded inputs.
        valid = attention_masks[p_idx].numpy().astype(bool)
        valid_len = int(valid.sum())
        
        for layer in range(num_layers):
            for head in range(num_heads):
                attn_weights = attn[layer, head].float().cpu().numpy()
                
                # === MASK APPLICATION (E11-v3 STANDARD) ===
                if valid_len > 0:
                    # Only use valid (non-padded) query rows and key columns
                    attn_weights = attn_weights[np.ix_(valid, valid)]
                    attn_weights = attn_weights.mean(axis=0)
                else:
                    attn_weights = attn_weights.mean(axis=0)
                
                attn_weights = attn_weights / (attn_weights.sum() + 1e-10)
                attn_weights = attn_weights[attn_weights > 0]
                
                if len(attn_weights) > 1:
                    h = scipy_entropy(attn_weights, base=2)
                    h_max = np.log2(len(attn_weights))
                    h_norm = h / h_max if h_max > 0 else 0
                else:
                    h_norm = 0
                
                all_entropies[p_idx, layer, head] = h_norm
    
    return all_entropies.mean(axis=0)  # (num_layers, num_heads)


def compute_si_global(head_entropies):
    """Compute GLOBAL SI (all layers)."""
    num_layers, num_heads = head_entropies.shape
    
    head_profiles = head_entropies.T  # (num_heads, num_layers)
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))
    
    return {
        'specialization_index': 1.0 - mean_head_correlation,
        'mean_head_correlation': mean_head_correlation,
        'method': 'GLOBAL'
    }


def compute_si_local(head_entropies, layer_start, layer_end):
    """Compute REGION-LOCAL SI (target layers only)."""
    local_entropies = head_entropies[layer_start:layer_end, :]
    local_layers, num_heads = local_entropies.shape
    
    if local_layers < 2:
        return {
            'specialization_index': 0.0,
            'mean_head_correlation': 1.0,
            'method': 'LOCAL',
            'layer_range': [layer_start, layer_end]
        }
    
    head_profiles = local_entropies.T
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))
    
    return {
        'specialization_index': 1.0 - mean_head_correlation,
        'mean_head_correlation': mean_head_correlation,
        'method': 'LOCAL',
        'layer_range': [layer_start, layer_end]
    }

print("Specialization metrics loaded (E11-v3 Standard).")
print("  - Attention mask: YES")
print("  - Chat template: Configurable")
print("  - SI methods: GLOBAL + LOCAL")
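
To make the metric concrete, here is a dependency-free sketch of the two steps in this cell on toy data: normalized entropy of one attention distribution, then SI = 1 - mean pairwise Pearson correlation of per-head entropy profiles across layers. The toy profiles are invented for illustration, and the function names are not part of the notebook.

```python
import math
from itertools import combinations

def normalized_entropy(weights):
    """Shannon entropy (base 2) divided by log2 of the number of nonzero entries."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(probs))

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def specialization_index(head_profiles):
    """1 - mean pairwise correlation between per-head entropy profiles."""
    pairs = list(combinations(head_profiles, 2))
    mean_corr = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_corr

uniform = normalized_entropy([0.25, 0.25, 0.25, 0.25])  # fully diffuse attention
peaked = normalized_entropy([1.0, 0.0, 0.0, 0.0])       # all mass on one token

# Toy entropy profiles: one list per head, one value per layer.
redundant_heads = [[0.2, 0.4, 0.6, 0.8], [0.3, 0.5, 0.7, 0.9]]  # same shape
diverse_heads = [[0.2, 0.4, 0.6, 0.8], [0.8, 0.6, 0.4, 0.2]]    # opposite shape

si_redundant = specialization_index(redundant_heads)
si_diverse = specialization_index(diverse_heads)
print(uniform, peaked)
print(si_redundant, si_diverse)
```

Because correlation lies in [-1, 1], SI lies in [0, 2]. Heads whose entropy profiles move together give SI near 0 (the "collapsed" end), while heads with opposing profiles push SI above 1.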

In [None]:
# Cell 4: PRE-ATTENTION Noise Injector (E11-v3 STANDARD)

class PreAttentionNoiseInjector:
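    """
    E11-v3 STANDARD: Inject noise BEFORE attention computation.
    
    Gaussian noise is added to the hidden states entering each targeted
    decoder layer, so it changes that layer's queries and keys and therefore
    its attention weights (the SI measurement target).
    Noise added after attention does not change that layer's attention weights.
    """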
    
    def __init__(self, model, target_range, noise_std=0.0):
        self.model = model
        self.target_start, self.target_end = target_range
        self.noise_std = noise_std
        self.hooks = []
    
    def _make_pre_hook(self, layer_idx):
        """Create forward PRE-hook (before attention)."""
        def hook(module, args):
            if self.noise_std > 0 and self.target_start <= layer_idx < self.target_end:
                hidden_states = args[0]
                noise = torch.randn_like(hidden_states) * self.noise_std
                noisy_hidden_states = hidden_states + noise
                return (noisy_hidden_states,) + args[1:]
            return args
        return hook
    
    def attach(self):
        """Attach PRE-hooks to transformer layers."""
        for idx, layer in enumerate(self.model.model.layers):
            hook = layer.register_forward_pre_hook(self._make_pre_hook(idx))
            self.hooks.append(hook)
    
    def detach(self):
        """Remove all hooks."""
        for hook in self.hooks:
            hook.remove()
        self.hooks = []

print("PRE-Attention noise injector loaded (E11-v3 Standard).")
print("  - Injection point: BEFORE attention (affects weights)")
print("  - NOT post-attention (which doesn't affect SI)")
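
Why the injection point matters can be seen in a toy single-head attention written in plain Python. This is a sketch under simplifying assumptions (identity projections, one layer, hand-picked token vectors), not the model's actual attention; all names are illustrative.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(tokens):
    """Toy single-head self-attention with identity projections.

    Returns (weights, outputs), where weights[i][j] is how much
    token i attends to token j.
    """
    dim = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[d] for w, v in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return weights, outputs

def add_noise(vectors, std, rng):
    return [[x + rng.gauss(0.0, std) for x in vec] for vec in vectors]

rng = random.Random(42)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

clean_weights, clean_out = attention(tokens)

# PRE-attention: perturb the inputs, then compute attention.
pre_weights, _ = attention(add_noise(tokens, 0.1, rng))

# POST-attention: compute attention first, then perturb only the outputs.
post_weights, post_out = attention(tokens)
post_out = add_noise(post_out, 0.1, rng)

print("pre-noise changes weights: ", pre_weights != clean_weights)
print("post-noise changes weights:", post_weights != clean_weights)
```

The toy only shows the same-layer effect. In a stacked model, post-attention noise would still reach later layers' attention through the residual stream, but it leaves the injected layer's own attention weights untouched.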

In [None]:
# Cell 5: Single Model Test Function

def run_indra_test_single_model(model_key, model_config, layer_ranges, noise_levels, prompts, seeds):
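    """
    Run full Indra test on a single model with multi-seed aggregation.

    Note: `layer_ranges` is currently unused; the early/middle/late/all
    ranges are derived from the model's layer count inside this function.
    """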
    print(f"\n{'='*70}")
    print(f"TESTING: {model_config['display']}")
    print(f"State: {model_config['state']}, Expected: {model_config['expected_effect']}")
    print(f"Chat Template: {model_config['use_chat_template']}")
    print(f"{'='*70}")
    
    # Load model with bfloat16 (E11-v3 STANDARD)
    print(f"\nLoading: {model_config['name']}")
    tokenizer = AutoTokenizer.from_pretrained(model_config['name'])
    model = AutoModelForCausalLM.from_pretrained(
        model_config['name'],
        torch_dtype=torch.bfloat16,  # E11-v3 STANDARD: bf16 not fp16
        device_map='auto',
        trust_remote_code=True,
        attn_implementation="eager"  # CRITICAL: SDPA doesn't return attentions!
    )
    model.eval()
    
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    # Architecture info
    config = model.config
    num_layers = config.num_hidden_layers
    num_heads = config.num_attention_heads
    num_kv_heads = getattr(config, 'num_key_value_heads', num_heads)
    hidden_size = config.hidden_size
    d_head = hidden_size // num_heads
    
    # Dual-rho (E11-v3 STANDARD)
    rho_head = num_heads / math.sqrt(hidden_size)
    rho_kv = num_kv_heads / num_layers
    
    if num_kv_heads == num_heads:
        arch_type = "MHA"
    elif num_kv_heads == 1:
        arch_type = "MQA"
    else:
        arch_type = f"GQA ({num_heads}:{num_kv_heads})"
    
    print(f"  Architecture: {arch_type}")
    print(f"  Layers: {num_layers}, Heads: {num_heads}, d_head: {d_head}")
    print(f"  rho_head: {rho_head:.4f}, rho_kv: {rho_kv:.4f}")
    
    # Layer ranges
    third = num_layers // 3
    local_layer_ranges = {
        'early': (0, third),
        'middle': (third, 2*third),
        'late': (2*third, num_layers),
        'all': (0, num_layers)
    }
    
    # Baseline measurement
    print(f"\n  Measuring baseline (no noise)...")
    set_seed(PRIMARY_SEED)
    baseline_act = extract_head_activations(
        model, tokenizer, prompts, MAX_LENGTH, 
        use_chat_template=model_config['use_chat_template']
    )
    baseline_ent = compute_head_entropy_profiles(
        baseline_act['attention_patterns'],
        baseline_act['attention_masks']
    )
    baseline_global = compute_si_global(baseline_ent)
    
    # Baseline local SI for each region
    baseline_local = {}
    for region, (start, end) in local_layer_ranges.items():
        baseline_local[region] = compute_si_local(baseline_ent, start, end)
    
    print(f"  Baseline Global SI: {baseline_global['specialization_index']:.4f}")
    print(f"  Baseline Correlation: {baseline_global['mean_head_correlation']:.4f}")
    
    # Multi-seed treatment loop
    all_seed_results = {}
    
    for seed in seeds:
        print(f"\n  Seed {seed}:")
        seed_results = {'global': [], 'local': []}
        
        for region_name, (start, end) in local_layer_ranges.items():
            region_global = {'region': region_name, 'tests': []}
            region_local = {'region': region_name, 'tests': []}
            
            for noise_std in noise_levels:
                set_seed(seed)
                
                injector = PreAttentionNoiseInjector(model, (start, end), noise_std)
                injector.attach()
                
                treated_act = extract_head_activations(
                    model, tokenizer, prompts, MAX_LENGTH,
                    use_chat_template=model_config['use_chat_template']
                )
                treated_ent = compute_head_entropy_profiles(
                    treated_act['attention_patterns'],
                    treated_act['attention_masks']
                )
                
                injector.detach()
                
                # Global SI
                treated_global = compute_si_global(treated_ent)
                si_before = baseline_global['specialization_index']
                si_after = treated_global['specialization_index']
                change_pct = ((si_after - si_before) / si_before * 100) if si_before > 0 else 0
                
                region_global['tests'].append({
                    'noise': noise_std,
                    'si': si_after,
                    'change_pct': change_pct
                })
                
                # Local SI
                treated_local = compute_si_local(treated_ent, start, end)
                si_before_local = baseline_local[region_name]['specialization_index']
                si_after_local = treated_local['specialization_index']
                change_pct_local = ((si_after_local - si_before_local) / si_before_local * 100) if si_before_local > 0 else 0
                
                region_local['tests'].append({
                    'noise': noise_std,
                    'si': si_after_local,
                    'change_pct': change_pct_local
                })
            
            # Best effect for this region
            if model_config['expected_effect'] == 'HEAL':
                region_global['best'] = max(region_global['tests'], key=lambda x: x['change_pct'])
                region_local['best'] = max(region_local['tests'], key=lambda x: x['change_pct'])
            else:
                region_global['best'] = min(region_global['tests'], key=lambda x: x['change_pct'])
                region_local['best'] = min(region_local['tests'], key=lambda x: x['change_pct'])
            
            seed_results['global'].append(region_global)
            seed_results['local'].append(region_local)
            
            if seed == PRIMARY_SEED:
                print(f"    {region_name}: Global={region_global['best']['change_pct']:+.1f}%, Local={region_local['best']['change_pct']:+.1f}%")
        
        all_seed_results[seed] = seed_results
    
    # Aggregate across seeds
    aggregated = {'global': {}, 'local': {}}
    for si_type in ['global', 'local']:
        for region_name in local_layer_ranges.keys():
            values = []
            for seed in seeds:
                region_data = next(r for r in all_seed_results[seed][si_type] if r['region'] == region_name)
                values.append(region_data['best']['change_pct'])
            aggregated[si_type][region_name] = {
                'mean': float(np.mean(values)),
                'std': float(np.std(values)),
                'values': values
            }
    
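    # Cleanup: drop references and run the garbage collector before clearing
    # the CUDA cache, so memory is released before the next model loads
    import gc
    del model, tokenizer
    gc.collect()
    torch.cuda.empty_cache()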
    
    return {
        'model_key': model_key,
        'model_name': model_config['name'],
        'state': model_config['state'],
        'expected_effect': model_config['expected_effect'],
        'architecture': arch_type,
        'rho_head': rho_head,
        'rho_kv': rho_kv,
        'num_layers': num_layers,
        'baseline_global': baseline_global,
        'baseline_local': {k: v for k, v in baseline_local.items()},
        'all_seed_results': all_seed_results,
        'aggregated': aggregated,
        'layer_ranges': local_layer_ranges
    }

print("Test function loaded.")
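
For reference, the region split and the two rho values computed in this cell can be worked out by hand. The sketch below assumes the published LLaMA-2-7B shape (32 layers, 32 attention heads, 32 KV heads, hidden size 4096); at run time the notebook reads these from `model.config`. The helper name `layer_thirds` is illustrative.

```python
import math

def layer_thirds(num_layers):
    """Half-open (start, end) layer ranges, mirroring the notebook's split."""
    third = num_layers // 3
    return {
        "early": (0, third),
        "middle": (third, 2 * third),
        "late": (2 * third, num_layers),  # absorbs the remainder
        "all": (0, num_layers),
    }

# Assumed LLaMA-2-7B configuration values.
num_layers, num_heads, num_kv_heads, hidden_size = 32, 32, 32, 4096

ranges = layer_thirds(num_layers)
rho_head = num_heads / math.sqrt(hidden_size)
rho_kv = num_kv_heads / num_layers

print(ranges)
print(rho_head, rho_kv)
```

With 32 layers the late region spans 12 layers versus 10 each for early and middle, so per-region effects are not measured over equal depth.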

In [None]:
# Cell 6: Run Tests on Both Models

print(f"\n{'#'*70}")
print(f"# E11-T-Indra-LLaMA2-V3: DUAL MODEL TEST")
print(f"# MHA State-Dependency (E11-v3 Standard)")
print(f"{'#'*70}")

all_results = {}

for model_key, model_config in MODEL_CONFIGS.items():
    results = run_indra_test_single_model(
        model_key=model_key,
        model_config=model_config,
        layer_ranges=None,  # Computed inside
        noise_levels=NOISE_LEVELS,
        prompts=STANDARD_PROMPTS,
        seeds=SEEDS
    )
    all_results[model_key] = results

print(f"\n{'='*70}")
print("BOTH MODELS TESTED!")
print(f"{'='*70}")

In [None]:
# Cell 7: State-Dependency Verdict

print(f"\n{'='*70}")
print(f"STATE-DEPENDENCY ANALYSIS (MHA)")
print(f"{'='*70}")

# Thresholds
HEAL_THRESHOLD = 5.0
DAMAGE_THRESHOLD = -5.0

# Analyze BASE (collapsed, expected HEAL)
base = all_results['base']
base_best_region, base_best_global = max(
    base['aggregated']['global'].items(), key=lambda kv: kv[1]['mean']
)
base_effect = base_best_global['mean']

print(f"\n[1] LLaMA-2-BASE (COLLAPSED)")
print(f"    Baseline SI: {base['baseline_global']['specialization_index']:.4f}")
print(f"    Expected: HEAL (+SI)")
print(f"    Best Effect: {base_effect:+.1f}% at {base_best_region}")

if base_effect > HEAL_THRESHOLD:
    base_verdict = "HEALED"
elif base_effect < DAMAGE_THRESHOLD:
    base_verdict = "UNEXPECTED_DAMAGE"
else:
    base_verdict = "NO_EFFECT"
print(f"    Verdict: {base_verdict}")

# Analyze INSTRUCT (healthy, expected DAMAGE)
inst = all_results['instruct']
inst_worst_region, inst_worst_global = min(
    inst['aggregated']['global'].items(), key=lambda kv: kv[1]['mean']
)
inst_effect = inst_worst_global['mean']

print(f"\n[2] LLaMA-2-INSTRUCT (HEALTHY)")
print(f"    Baseline SI: {inst['baseline_global']['specialization_index']:.4f}")
print(f"    Expected: DAMAGE (-SI)")
print(f"    Worst Effect: {inst_effect:+.1f}% at {inst_worst_region}")

if inst_effect < DAMAGE_THRESHOLD:
    inst_verdict = "DAMAGED"
elif inst_effect > HEAL_THRESHOLD:
    inst_verdict = "UNEXPECTED_HEAL"
else:
    inst_verdict = "NO_EFFECT"
print(f"    Verdict: {inst_verdict}")

# Gap calculation
gap = base_effect - inst_effect

print(f"\n{'='*70}")
print("CROSS-ARCHITECTURE COMPARISON")
print(f"{'='*70}")
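print(f"\n{'Architecture':<15} {'Collapsed':<15} {'Healthy':<15} {'Gap':<10}")
print("-"*58)
# GQA values come from the reference dict in Cell 2 instead of being retyped here
gqa = E11T_GQA_REFERENCE
rows = [
    ('GQA (3.1)', gqa['collapsed_heal'], gqa['healthy_damage'], gqa['gap_pp']),
    ('MHA (2)', base_effect, inst_effect, gap),
]
for arch, collapsed_pct, healthy_pct, gap_pp in rows:
    # Format each value first, then pad, so columns line up with the header
    print(f"{arch:<15} {f'{collapsed_pct:+.1f}%':<15} {f'{healthy_pct:+.1f}%':<15} {f'{gap_pp:.1f}pp':<10}")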

# Final A2 verdict
print(f"\n{'='*70}")
print("A2 STATE-DEPENDENCY VERDICT")
print(f"{'='*70}")

if base_verdict == "HEALED" and inst_verdict == "DAMAGED":
    a2_verdict = "A_CONFIRMED"
    print(f"\n  VERDICT: {a2_verdict}")
    print(f"  State-dependency CONFIRMED on MHA!")
    print(f"  A2: GQA + MHA = 2 Architectures → A+ Tier")
elif base_verdict == "HEALED" or inst_verdict == "DAMAGED":
    a2_verdict = "B_PARTIAL"
    print(f"\n  VERDICT: {a2_verdict}")
    print(f"  Partial state-dependency on MHA.")
else:
    a2_verdict = "C_REFUTED"
    print(f"\n  VERDICT: {a2_verdict}")
    print(f"  State-dependency NOT confirmed on MHA.")

# Store verdict
verdict = {
    'base_verdict': base_verdict,
    'base_effect': base_effect,
    'base_region': base_best_region,
    'inst_verdict': inst_verdict,
    'inst_effect': inst_effect,
    'inst_region': inst_worst_region,
    'gap_pp': gap,
    'a2_verdict': a2_verdict,
    'comparison': {
        'gqa_collapsed': 28.6,
        'gqa_healthy': -30.5,
        'gqa_gap': 59.1,
        'mha_collapsed': base_effect,
        'mha_healthy': inst_effect,
        'mha_gap': gap
    }
}

In [None]:
# Cell 8: Visualization

fig, axes = plt.subplots(2, 2, figsize=(16, 14))

colors = {'early': '#3498db', 'middle': '#2ecc71', 'late': '#e74c3c', 'all': '#9b59b6'}

# Plot 1: BASE (Collapsed) - Expected HEAL
ax1 = axes[0, 0]
regions = list(base['aggregated']['global'].keys())
base_means = [base['aggregated']['global'][r]['mean'] for r in regions]
base_stds = [base['aggregated']['global'][r]['std'] for r in regions]

bars = ax1.bar(regions, base_means, yerr=base_stds, color=[colors[r] for r in regions], alpha=0.8, capsize=5)
ax1.axhline(y=0, color='black', linestyle='-')
ax1.axhline(y=5, color='green', linestyle=':', alpha=0.7, label='+5% threshold')
ax1.axhline(y=-5, color='red', linestyle=':', alpha=0.7, label='-5% threshold')
ax1.set_ylabel('SI Change %')
ax1.set_title(f'LLaMA-2 BASE (COLLAPSED)\nExpected: HEAL | Verdict: {base_verdict}')
ax1.legend()

for bar, val in zip(bars, base_means):
    ax1.annotate(f'{val:+.1f}%', xy=(bar.get_x() + bar.get_width()/2, val),
                 xytext=(0, 5 if val > 0 else -12), textcoords='offset points',
                 ha='center', fontweight='bold')

# Plot 2: INSTRUCT (Healthy) - Expected DAMAGE
ax2 = axes[0, 1]
inst_means = [inst['aggregated']['global'][r]['mean'] for r in regions]
inst_stds = [inst['aggregated']['global'][r]['std'] for r in regions]

bars = ax2.bar(regions, inst_means, yerr=inst_stds, color=[colors[r] for r in regions], alpha=0.8, capsize=5)
ax2.axhline(y=0, color='black', linestyle='-')
ax2.axhline(y=5, color='green', linestyle=':', alpha=0.7)
ax2.axhline(y=-5, color='red', linestyle=':', alpha=0.7)
ax2.set_ylabel('SI Change %')
ax2.set_title(f'LLaMA-2 INSTRUCT (HEALTHY)\nExpected: DAMAGE | Verdict: {inst_verdict}')

for bar, val in zip(bars, inst_means):
    ax2.annotate(f'{val:+.1f}%', xy=(bar.get_x() + bar.get_width()/2, val),
                 xytext=(0, 5 if val > 0 else -12), textcoords='offset points',
                 ha='center', fontweight='bold')

# Plot 3: State-Dependency Comparison
ax3 = axes[1, 0]
models = ['BASE\n(Collapsed)', 'INSTRUCT\n(Healthy)']
effects = [base_effect, inst_effect]
bar_colors = ['#2ecc71' if base_effect > 0 else '#e74c3c',
              '#e74c3c' if inst_effect < 0 else '#2ecc71']

bars = ax3.bar(models, effects, color=bar_colors, alpha=0.8, edgecolor='black', linewidth=2)
ax3.axhline(y=0, color='black', linestyle='-', linewidth=2)
ax3.set_ylabel('Best/Worst SI Change %')
ax3.set_title(f'State-Dependency: Gap = {gap:.1f}pp\nA2 Verdict: {a2_verdict}')

for bar, eff in zip(bars, effects):
    ax3.annotate(f'{eff:+.1f}%', xy=(bar.get_x() + bar.get_width()/2, eff),
                 xytext=(0, 10 if eff > 0 else -20), textcoords='offset points',
                 ha='center', fontsize=14, fontweight='bold')

# Plot 4: Cross-Architecture Comparison
ax4 = axes[1, 1]
archs = ['GQA\n(LLaMA-3.1)', 'MHA\n(LLaMA-2)']
collapsed = [E11T_GQA_REFERENCE['collapsed_heal'], base_effect]
healthy = [E11T_GQA_REFERENCE['healthy_damage'], inst_effect]

x = np.arange(len(archs))
width = 0.35

bars1 = ax4.bar(x - width/2, collapsed, width, label='Collapsed→Heal', color='#2ecc71', alpha=0.8)
bars2 = ax4.bar(x + width/2, healthy, width, label='Healthy→Damage', color='#e74c3c', alpha=0.8)

ax4.axhline(y=0, color='black', linestyle='-', linewidth=2)
ax4.set_ylabel('SI Change %')
ax4.set_title('Cross-Architecture: GQA vs MHA')
ax4.set_xticks(x)
ax4.set_xticklabels(archs)
ax4.legend()

for bars in [bars1, bars2]:
    for bar in bars:
        h = bar.get_height()
        ax4.annotate(f'{h:+.1f}%', xy=(bar.get_x() + bar.get_width()/2, h),
                     xytext=(0, 5 if h > 0 else -15), textcoords='offset points',
                     ha='center', fontweight='bold')

plt.suptitle(f'E11-T-Indra-LLaMA2-V3: MHA State-Dependency\nSeeds: {SEEDS}', fontsize=14, fontweight='bold')
plt.tight_layout()

fig_path = f'../figures/E11T_indra_llama2_v3_{TIMESTAMP}.png'
plt.savefig(fig_path, dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved: {fig_path}")

In [None]:
# Cell 9: Save Results

def convert_to_native(obj):
    if isinstance(obj, dict):
        return {k: convert_to_native(v) for k, v in obj.items()}
    elif isinstance(obj, (list, tuple)):
        return [convert_to_native(v) for v in obj]
    elif isinstance(obj, np.bool_):
        return bool(obj)  # keep booleans as true/false in JSON, not 0/1
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    else:
        return obj

filename = f'../results/E11T_indra_llama2_v3_{TIMESTAMP}.json'

output = {
    'experiment': 'E11-T-Indra-LLaMA2-V3',
    'purpose': 'A2 State-Dependency on MHA (E11-v3 Standard)',
    'timestamp': TIMESTAMP,
    'methodology': {
        'standard': 'E11-v3',
        'seeds': SEEDS,
        'noise_injection': 'PRE-ATTENTION',
        'si_measurement': 'GLOBAL + LOCAL',
        'attention_mask': True,
        'chat_template': 'Yes for Instruct',
        'dtype': 'bfloat16',
        'prompts': 'Standard-10 v3'
    },
    'e11x_reference': E11X_REFERENCE,
    'e11t_gqa_reference': E11T_GQA_REFERENCE,
    'noise_levels': NOISE_LEVELS,
    'num_prompts': len(STANDARD_PROMPTS),
    'results': {
        'base': convert_to_native(all_results['base']),
        'instruct': convert_to_native(all_results['instruct'])
    },
    'verdict': convert_to_native(verdict)
}

with open(filename, 'w') as f:
    json.dump(output, f, indent=2)

print(f"Results saved: {filename}")

try:
    from google.colab import files
    files.download(filename)
    files.download(fig_path)
except ImportError:
    pass  # Not in Colab; files stay in ../results and ../figures

---

## Summary: E11-T-Indra-LLaMA2-V3

### Methodology (E11-v3 Standard)

| Standard | Implementation |
|----------|----------------|
| Seeds | 42, 123, 456 |
| Noise Injection | **PRE-ATTENTION** |
| SI Measurement | **GLOBAL + LOCAL** |
| Attention Mask | **YES** |
| Chat Template | **YES** (Instruct) |
| dtype | **bfloat16** |

### Key Difference from V1

| Aspect | V1 (Wrong) | V3 (Correct) |
|--------|------------|---------------|
| Noise | POST-attention | PRE-attention |
| Mask | No | Yes |
| Template | No | Yes (Instruct) |
| Seeds | 1 | 3 |
| dtype | fp16 | bf16 |
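
The Mask row means padding positions are dropped before entropy is computed. Here is a small sketch of that selection, written so it does not depend on whether the tokenizer pads on the left or the right. The 4x4 matrices, the sentinel value, and the helper names are illustrative.

```python
def select_valid(attn, mask):
    """Keep only rows (queries) and columns (keys) whose mask entry is 1."""
    keep = [i for i, m in enumerate(mask) if m == 1]
    return [[attn[i][j] for j in keep] for i in keep]

def mean_over_queries(attn):
    """Average attention received by each key across all query rows."""
    n_rows = len(attn)
    return [sum(row[j] for row in attn) / n_rows for j in range(len(attn[0]))]

# Toy 4x4 attention matrices with identical content, padded on different sides.
# Padding rows and columns are filled with a sentinel value of 9.0.
right_padded = [
    [1.0, 0.0, 9.0, 9.0],
    [0.4, 0.6, 9.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
]
left_padded = [
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 1.0, 0.0],
    [9.0, 9.0, 0.4, 0.6],
]

valid_right = select_valid(right_padded, [1, 1, 0, 0])
valid_left = select_valid(left_padded, [0, 0, 1, 1])
received = mean_over_queries(valid_right)

print(valid_right == valid_left)
print(received)
```

Selecting by mask recovers the same 2x2 block in both cases, whereas a plain `[:valid_len, :valid_len]` slice would return only padding for the left-padded input.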

### Expected Outcomes

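| Model | State | Expected | If Confirmed |
|-------|-------|----------|---------------|
| BASE only | Collapsed | HEAL (SI change > +5%) | A2 partial (`B_PARTIAL`) |
| INSTRUCT only | Healthy | DAMAGE (SI change < -5%) | A2 partial (`B_PARTIAL`) |
| Both | - | HEAL and DAMAGE, gap > 20pp | A2 complete (`A_CONFIRMED`) → A+ Tier |

Cell 7 assigns the verdict from the ±5% thresholds only; the gap is computed and reported, but the 20pp criterion is not enforced in code.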
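
The decision rule implied by this table can be sketched as a small function. The ±5% thresholds match Cell 7; treating the 20pp gap as an additional requirement for the top verdict is an assumption here, since Cell 7 reports the gap without testing it. The function name and return shape are illustrative.

```python
def classify_a2(base_effect, inst_effect, heal=5.0, damage=-5.0, min_gap=20.0):
    """Classify state-dependency from mean SI changes (in percent)."""
    healed = base_effect > heal
    damaged = inst_effect < damage
    gap = base_effect - inst_effect  # percentage points

    if healed and damaged and gap > min_gap:  # gap check is an assumption
        verdict = "A_CONFIRMED"
    elif healed or damaged:
        verdict = "B_PARTIAL"
    else:
        verdict = "C_REFUTED"
    return verdict, gap

# GQA reference values from Cell 2 (LLaMA-3.1-8B).
verdict, gap = classify_a2(28.6, -30.5)
print(verdict, f"{gap:.1f}pp")
```

Applied to the GQA reference values, this yields the confirmed verdict with a gap of about 59.1pp, matching `E11T_GQA_REFERENCE['gap_pp']`.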

---

*Paper 4: Behavioral Sink Dynamics*  
*E11-T-Indra-LLaMA2-V3: MHA State-Dependency (E11-v3 Standard)*

In [None]:
# Cell 10: Auto-Download

import glob
import shutil

def auto_download():
    try:
        from google.colab import files
    except ImportError:
        print('Not in Colab')
        return
    
    print('AUTO-DOWNLOADING...')
    
    all_files = glob.glob('../results/E11T_indra_llama2*.json') + glob.glob('../figures/E11T_indra_llama2*.png')
    if not all_files:
        print('No files found')
        return
    
    import os
    os.makedirs('download', exist_ok=True)
    for f in all_files:
        shutil.copy(f, 'download/')
    
    shutil.make_archive(f'E11T_llama2_v3_{TIMESTAMP}', 'zip', 'download')
    files.download(f'E11T_llama2_v3_{TIMESTAMP}.zip')
    print('DONE!')

auto_download()