# E11-T-Phi3-Indra: State-Dependency on Microsoft Heritage (E11-v3 Standard)

**Paper 4: Behavioral Sink Dynamics**

## Purpose: A2 Claim Validation (3rd Heritage)

A2 (Indra State-Dependency) has evidence from:
- GQA (LLaMA-3.1): Gap 59pp
- MHA (LLaMA-2): Gap 138pp

This notebook tests **Microsoft Phi-3** (3rd Heritage) for A2 generalization.

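The gap figures above appear to be the spread, in percentage points, between the SI change of a collapsed model (heal) and that of a healthy model (damage). The sketch below assumes that definition, which is inferred from the reference values configured in Cell 2 rather than stated explicitly; `gap_pp` is a hypothetical helper name.

```python
def gap_pp(collapsed_heal: float, healthy_damage: float) -> float:
    """Spread in percentage points between the heal (+) and damage (-) effects."""
    return collapsed_heal - healthy_damage

# Reference values copied from E11T_REFERENCES in Cell 2.
llama31_gap = gap_pp(28.6, -30.5)
llama2_gap = gap_pp(114.05, -24.02)

print(f"LLaMA-3.1 gap: {llama31_gap:.2f} pp")
print(f"LLaMA-2 gap:   {llama2_gap:.2f} pp")
```

Under this assumed definition the LLaMA-2 value lands 0.01 pp below the stored 138.08, which would be consistent with the stored figure having been computed from unrounded inputs.
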
## Methodology (E11-v3 Standard)

| Standard | Implementation |
|----------|----------------|
| Seeds | 42, 123, 456 (3-seed aggregation) |
| Noise Injection | **PRE-ATTENTION** (affects attention weights) |
| SI Measurement | **GLOBAL + LOCAL** (region-isolated) |
| Attention Mask | **Replaced by `valid_lengths`** (Phi-3 fix: inputs are unpadded, so no padding enters the entropy) |
| Chat Template | **YES** for Instruct model |
| dtype | **bfloat16** (stable on A100) |
| Prompts | Standard-10 v3 (MD5: 715065ba) |

## Phi-3 Architecture

| Property | Phi-3-mini | Phi-3-small | Phi-3-medium |
|----------|------------|-------------|---------------|
| Params | 3.8B | 7B | 14B |
| Attention | MHA (32 heads, 32 KV heads) | GQA | GQA |
| Heritage | Microsoft (synthetic data) | Microsoft | Microsoft |
| Training | Heavy textbook + synthetic | Same | Same |

## Hypothesis

**If A2 is universal:** Phi-3 Instruct, as a healthy model, should show DAMAGE under noise: a drop in SI of more than 5% relative to baseline.

---

In [None]:
# Cell 1: Setup + Seeds (E11-v3 STANDARD)
!pip install -q transformers torch accelerate scipy matplotlib seaborn

import torch
import numpy as np
import random
import math
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.stats import entropy as scipy_entropy
import json
import warnings
warnings.filterwarnings('ignore')

import os
from pathlib import Path
from datetime import datetime

# === REPRODUCIBILITY SEEDS (E11-v3 STANDARD) ===
SEEDS = [42, 123, 456]
PRIMARY_SEED = 42

def set_seed(seed):
    """Seed the Python, NumPy, and PyTorch RNGs and force deterministic cuDNN kernels."""
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

set_seed(PRIMARY_SEED)

TIMESTAMP = datetime.now().strftime('%Y%m%d_%H%M%S')
Path('../results').mkdir(parents=True, exist_ok=True)
Path('../figures').mkdir(parents=True, exist_ok=True)

print(f"E11-T-Phi3-Indra (E11-v3 Standard)")
print(f"Timestamp: {TIMESTAMP}")
print(f"Seeds: {SEEDS}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

In [None]:
# Cell 2: Configuration (E11-v3 STANDARD)

# === MODEL CONFIGURATION ===
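# Phi-3-small and Phi-3-medium use GQA; the Phi-3-mini config sets
# num_key_value_heads == num_attention_heads (32), which run_indra_test() reports as MHA.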
# Note: Base models may not be publicly available - testing Instruct only

MODEL_CONFIGS = {
    'instruct': {
        'name': 'microsoft/Phi-3-mini-4k-instruct',
        'display': 'Phi-3-Mini-4K-Instruct',
        'state': 'HEALTHY',  # Instruct models are considered HEALTHY
        'expected_effect': 'DAMAGE',  # Healthy models should be DAMAGED by noise
        'use_chat_template': True,
        'params': '3.8B'
    }
}

# Reference values from other architectures
E11T_REFERENCES = {
    'gqa_llama31': {
        'model': 'LLaMA-3.1-8B',
        'collapsed_heal': 28.6,
        'healthy_damage': -30.5,
        'gap_pp': 59.1
    },
    'mha_llama2': {
        'model': 'LLaMA-2-7B',
        'collapsed_heal': 114.05,
        'healthy_damage': -24.02,
        'gap_pp': 138.08
    }
}

# === E11-v3 STANDARD PARAMETERS ===
NOISE_LEVELS = [0.0, 0.01, 0.02, 0.05, 0.1, 0.2]
MAX_LENGTH = 128

# Standard-10 v3 Prompts (MD5: 715065bab181f46bf12ed471951141e2)
STANDARD_PROMPTS = [
    "What is the capital of France and what is its population?",
    "If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly? Explain step by step.",
    "Calculate 47 multiplied by 23 and show your work.",
    "Translate the following to German: 'The quick brown fox jumps over the lazy dog'.",
    "Write a Python function that checks if a number is prime.",
    "Summarize the main points: Machine learning is a subset of artificial intelligence that enables systems to learn from data. It uses algorithms to identify patterns and make decisions with minimal human intervention.",
    "Statement A: 'All birds can fly.' Statement B: 'Penguins are birds that cannot fly.' Are these statements contradictory? Explain.",
    "What are the safety considerations when using a kitchen knife?",
    "Write a haiku about artificial intelligence.",
    "Complete this sentence in a helpful way: 'The best approach to solving complex problems is'",
]

print(f"\nConfiguration (E11-v3 Standard):")
print(f"  Seeds: {SEEDS}")
print(f"  Noise levels: {NOISE_LEVELS}")
print(f"  MAX_LENGTH: {MAX_LENGTH}")
print(f"  Prompts: Standard-10 v3")
print(f"\nModel:")
for key, cfg in MODEL_CONFIGS.items():
    print(f"  {key}: {cfg['display']} ({cfg['params']})")
    print(f"         State: {cfg['state']}, Expected: {cfg['expected_effect']}")

In [None]:
# Cell 3: Specialization Metrics (E11-v3 STANDARD - PHI-3 FIX)
# =============================================================================
# FIX: padding=False instead of padding='max_length'
# Phi-3 returns degenerate attention when padded heavily
# =============================================================================

def extract_head_activations(model, tokenizer, prompts, max_length=128, use_chat_template=False):
    """
    Extract per-head attention patterns WITH attention masks.
    E11-v3 STANDARD: Attention mask excludes padding from entropy.
    
    PHI-3 FIX: Use padding=False to avoid degenerate attention patterns!
    """
    all_attention_patterns = []
    all_valid_lengths = []
    
    for prompt in prompts:
        # === CHAT TEMPLATE (E11-v3 STANDARD) ===
        if use_chat_template and hasattr(tokenizer, 'apply_chat_template'):
            try:
                messages = [{"role": "user", "content": prompt}]
                formatted = tokenizer.apply_chat_template(
                    messages, 
                    tokenize=False, 
                    add_generation_prompt=True
                )
            except Exception as e:
                print(f"Chat template failed: {e}, using raw prompt")
                formatted = prompt
        else:
            formatted = prompt
        
        # PHI-3 FIX: NO PADDING - use actual sequence length
        inputs = tokenizer(
            formatted, 
            return_tensors='pt',
            max_length=max_length,
            truncation=True,
            padding=False  # CRITICAL FIX: No padding!
        ).to(model.device)
        
        valid_len = inputs['input_ids'].shape[1]
        
        with torch.no_grad():
            # CRITICAL: use_cache=False for Phi-3 (DynamicCache bug)
            outputs = model(**inputs, output_attentions=True, use_cache=False)
        
        # Stack attention layers: [layers, heads, seq, seq]
        attn_stack = torch.stack([a.squeeze(0) for a in outputs.attentions], dim=0)
        all_attention_patterns.append(attn_stack.cpu())
        all_valid_lengths.append(valid_len)
    
    return {
        'attention_patterns': all_attention_patterns,
        'valid_lengths': all_valid_lengths,
        'num_layers': len(outputs.attentions),
        'num_heads': outputs.attentions[0].shape[1]
    }


def compute_head_entropy_profiles(attention_patterns, valid_lengths):
    """
    Compute normalized entropy (E11-v3 STANDARD).
    PHI-3 FIX: Use valid_lengths instead of attention_mask.
    """
    num_prompts = len(attention_patterns)
    num_layers = attention_patterns[0].shape[0]
    num_heads = attention_patterns[0].shape[1]
    
    all_entropies = np.zeros((num_prompts, num_layers, num_heads))
    
    for p_idx, attn in enumerate(attention_patterns):
        valid_len = valid_lengths[p_idx]
        
        for layer in range(num_layers):
            for head in range(num_heads):
                attn_weights = attn[layer, head].float().cpu().numpy()
                
                # Already correctly sized (no padding), but slice just in case
                attn_weights = attn_weights[:valid_len, :valid_len]
                
                # Average across query positions
                attn_weights = attn_weights.mean(axis=0)
                
                # Normalize
                attn_weights = attn_weights / (attn_weights.sum() + 1e-10)
                attn_weights = attn_weights[attn_weights > 0]
                
                if len(attn_weights) > 1:
                    h = scipy_entropy(attn_weights, base=2)
                    h_max = np.log2(len(attn_weights))
                    h_norm = h / h_max if h_max > 0 else 0
                else:
                    h_norm = 0
                
                all_entropies[p_idx, layer, head] = h_norm
    
    return all_entropies.mean(axis=0)


def compute_si_global(head_entropies):
    """Compute GLOBAL SI (all layers)."""
    num_layers, num_heads = head_entropies.shape
    head_profiles = head_entropies.T
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))
    
    return {
        'specialization_index': 1.0 - mean_head_correlation,
        'mean_head_correlation': mean_head_correlation,
        'method': 'GLOBAL'
    }


def compute_si_local(head_entropies, layer_start, layer_end):
    """Compute REGION-LOCAL SI."""
    local_entropies = head_entropies[layer_start:layer_end, :]
    local_layers, num_heads = local_entropies.shape
    
    if local_layers < 2:
        return {
            'specialization_index': 0.0,
            'mean_head_correlation': 1.0,
            'method': 'LOCAL',
            'layer_range': [layer_start, layer_end]
        }
    
    head_profiles = local_entropies.T
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))
    
    return {
        'specialization_index': 1.0 - mean_head_correlation,
        'mean_head_correlation': mean_head_correlation,
        'method': 'LOCAL',
        'layer_range': [layer_start, layer_end]
    }

print("Specialization metrics loaded (E11-v3 Standard + PHI-3 FIX).")
print("  - Attention mask: NO (using valid_lengths)")
print("  - Padding: FALSE (critical Phi-3 fix!)")
print("  - Chat template: Configurable")
print("  - SI methods: GLOBAL + LOCAL")
print("  - use_cache: FALSE (Phi-3 DynamicCache fix)")

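To make the two metrics in Cell 3 concrete, here is a minimal NumPy-only sketch on toy inputs. It re-implements the normalized entropy and the correlation-based SI independently of the notebook functions; `normalized_entropy` is a hypothetical helper name, and the entropy profiles are made-up numbers, not measurements.

```python
import numpy as np

def normalized_entropy(p: np.ndarray) -> float:
    """Shannon entropy of p in bits, divided by log2 of the support size."""
    p = p / (p.sum() + 1e-10)
    p = p[p > 0]
    if len(p) <= 1:
        return 0.0
    h = float(-(p * np.log2(p)).sum())
    return h / float(np.log2(len(p)))

n = 4
uniform = np.full(n, 1.0 / n)             # maximally diffuse attention
one_hot = np.array([1.0, 0.0, 0.0, 0.0])  # attention on a single token

# Causal attention that is uniform over the visible prefix of each query row.
causal = np.tril(np.ones((n, n)))
causal = causal / causal.sum(axis=1, keepdims=True)
causal_profile = causal.mean(axis=0)      # average over query positions

h_uniform = normalized_entropy(uniform)
h_one_hot = normalized_entropy(one_hot)
h_causal = normalized_entropy(causal_profile)
print(f"uniform={h_uniform:.3f} one_hot={h_one_hot:.3f} causal={h_causal:.3f}")

# Toy per-layer entropy profiles, shape [layers, heads] (illustrative values).
head_entropies = np.array([
    [0.20, 0.21, 0.90],
    [0.40, 0.39, 0.70],
    [0.60, 0.62, 0.50],
    [0.80, 0.79, 0.30],
])
corr = np.corrcoef(head_entropies.T)               # head-by-head correlation
upper = corr[np.triu_indices(corr.shape[0], k=1)]  # unique head pairs
si = 1.0 - float(np.nanmean(upper))
print(f"SI = {si:.3f}")
```

Two properties show up in this sketch. Averaging a causal attention map over query positions weights early tokens more heavily, so even a head that is uniform over each visible prefix scores below 1. And because SI is one minus a mean correlation, it ranges over [0, 2] rather than [0, 1]: anti-correlated heads push it above 1.
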
In [None]:
# Cell 3b: SANITY CHECK (PHI-3 FIXED VERSION)
# =============================================================================
# Uses padding=False fix - should now pass!
# =============================================================================

def run_sanity_check(model_name, use_chat_template=True):
    """
    Sanity check with PHI-3 FIX: No padding.
    """
    print(f"\n{'='*70}")
    print(f"SANITY CHECK (PHI-3 FIX): {model_name}")
    print(f"{'='*70}")
    
    # Load model
    print("\n1. Loading model...")
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        device_map='auto',
        trust_remote_code=True,
        attn_implementation="eager"
    )
    model.eval()
    
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    # Single prompt test
    test_prompt = "What is the capital of France and what is its population?"
    print(f"\n2. Test prompt: '{test_prompt[:50]}...'")
    
    # Format with chat template
    if use_chat_template and hasattr(tokenizer, 'apply_chat_template'):
        messages = [{"role": "user", "content": test_prompt}]
        formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        print(f"   Chat template applied: {len(formatted)} chars")
    else:
        formatted = test_prompt
        print(f"   Raw prompt (no chat template)")
    
    # PHI-3 FIX: NO PADDING
    inputs = tokenizer(
        formatted, 
        return_tensors='pt',
        max_length=MAX_LENGTH,
        truncation=True,
        padding=False  # CRITICAL FIX!
    ).to(model.device)
    
    valid_len = inputs['input_ids'].shape[1]
    print(f"\n3. Tokenization (PHI-3 FIX):")
    print(f"   Sequence length: {valid_len}")
    print(f"   Padding: FALSE")
    
    assert valid_len > 5, f"ABORT: valid_len={valid_len} too small"
    print(f"   ✅ valid_len > 5: PASS")
    
    # Forward pass
    print(f"\n4. Forward pass...")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True, use_cache=False)
    
    assert outputs.attentions is not None, "ABORT: outputs.attentions is None"
    print(f"   ✅ outputs.attentions exists: PASS")
    
    # Attention diagnostics
    attn_layer0 = outputs.attentions[0].squeeze(0)
    num_layers = len(outputs.attentions)
    num_heads = attn_layer0.shape[0]
    
    print(f"\n5. Attention diagnostics:")
    print(f"   Num layers: {num_layers}")
    print(f"   Num heads: {num_heads}")
    print(f"   Layer 0 shape: {attn_layer0.shape}")
    
    attn_abs_mean = attn_layer0.abs().mean().item()
    attn_std = attn_layer0.std().item()
    
    print(f"   attn.abs().mean() = {attn_abs_mean:.6f}")
    print(f"   attn.std() = {attn_std:.6f}")
    
    assert attn_abs_mean > 0, "ABORT: attn.abs().mean() = 0"
    print(f"   ✅ attn.abs().mean() > 0: PASS")
    
    assert torch.isfinite(attn_layer0).all(), "ABORT: attention contains NaN/Inf"
    print(f"   ✅ torch.isfinite(attn): PASS")
    
    # Head diversity check
    head0 = attn_layer0[0]
    head1 = attn_layer0[1]
    heads_identical = torch.allclose(head0, head1, atol=1e-4)
    
    print(f"\n6. Head diversity check:")
    print(f"   Head 0 vs Head 1 identical? {heads_identical}")
    if heads_identical:
        print(f"   ⚠️ WARNING: Heads identical")
    else:
        print(f"   ✅ Heads are different: PASS")
    
    # Compute SI using fixed functions
    print(f"\n7. Computing baseline SI (PHI-3 FIX)...")
    act = extract_head_activations(
        model, tokenizer, [test_prompt], MAX_LENGTH,
        use_chat_template=use_chat_template
    )
    ent = compute_head_entropy_profiles(act['attention_patterns'], act['valid_lengths'])
    si_result = compute_si_global(ent)
    baseline_si = si_result['specialization_index']
    mean_corr = si_result['mean_head_correlation']
    
    entropy_min = ent.min()
    entropy_max = ent.max()
    
    print(f"   Entropy range: [{entropy_min:.4f}, {entropy_max:.4f}]")
    print(f"\n8. BASELINE SI:")
    print(f"   Mean head correlation: {mean_corr:.4f}")
    print(f"   Specialization Index: {baseline_si:.4f}")
    
    SI_THRESHOLD = 0.05
    if baseline_si < SI_THRESHOLD:
        print(f"\n   ❌ SANITY CHECK FAILED!")
        print(f"   baseline_si = {baseline_si:.4f} < {SI_THRESHOLD}")
        del model
        torch.cuda.empty_cache()
        raise AssertionError(f"ABORT: baseline_si={baseline_si:.4f} < {SI_THRESHOLD}")
    
    print(f"\n   ✅ baseline_si > {SI_THRESHOLD}: PASS")
    
    del model
    torch.cuda.empty_cache()
    
    print(f"\n{'='*70}")
    print(f"SANITY CHECK PASSED! ✅")
    print(f"{'='*70}")
    
    return {
        'valid_len': valid_len,
        'attn_abs_mean': attn_abs_mean,
        'attn_std': attn_std,
        'entropy_range': [entropy_min, entropy_max],
        'baseline_si': baseline_si,
        'heads_identical': heads_identical
    }

# RUN SANITY CHECK
print("Running sanity check (PHI-3 FIX: padding=False)...")
sanity_result = run_sanity_check(
    MODEL_CONFIGS['instruct']['name'],
    use_chat_template=MODEL_CONFIGS['instruct']['use_chat_template']
)
print(f"\nSanity check result: {sanity_result}")

In [None]:
# Cell 4: PRE-ATTENTION Noise Injector (E11-v3 STANDARD)

class PreAttentionNoiseInjector:
    """
    E11-v3 STANDARD: Inject noise BEFORE attention computation.
    This affects the attention weights (SI measurement target).
    
    CRITICAL: Phi-3 uses different layer structure!
    We need to hook into the correct module.
    """
    
    def __init__(self, model, target_range, noise_std=0.0):
        self.model = model
        self.target_start, self.target_end = target_range
        self.noise_std = noise_std
        self.hooks = []
        
        # Detect model architecture
        self.layer_path = self._detect_layer_path()
    
    def _detect_layer_path(self):
        """Detect the correct layer path for different model architectures."""
        # Try common paths
        if hasattr(self.model, 'model') and hasattr(self.model.model, 'layers'):
            return 'model.layers'  # LLaMA, Mistral, Phi-3
        elif hasattr(self.model, 'transformer') and hasattr(self.model.transformer, 'h'):
            return 'transformer.h'  # GPT-2, Falcon
        elif hasattr(self.model, 'gpt_neox') and hasattr(self.model.gpt_neox, 'layers'):
            return 'gpt_neox.layers'  # Pythia
        else:
            raise ValueError(f"Unknown model architecture: {type(self.model)}")
    
    def _get_layers(self):
        """Get the transformer layers."""
        if self.layer_path == 'model.layers':
            return self.model.model.layers
        elif self.layer_path == 'transformer.h':
            return self.model.transformer.h
        elif self.layer_path == 'gpt_neox.layers':
            return self.model.gpt_neox.layers
    
    def _make_pre_hook(self, layer_idx):
        """Create forward PRE-hook (before attention)."""
        def hook(module, args):
            if self.noise_std > 0 and self.target_start <= layer_idx < self.target_end:
                hidden_states = args[0]
                noise = torch.randn_like(hidden_states) * self.noise_std
                noisy_hidden_states = hidden_states + noise
                return (noisy_hidden_states,) + args[1:]
            return args
        return hook
    
    def attach(self):
        """Attach PRE-hooks to transformer layers."""
        layers = self._get_layers()
        for idx, layer in enumerate(layers):
            hook = layer.register_forward_pre_hook(self._make_pre_hook(idx))
            self.hooks.append(hook)
    
    def detach(self):
        """Remove all hooks."""
        for hook in self.hooks:
            hook.remove()
        self.hooks = []

print("PRE-Attention noise injector loaded (E11-v3 Standard).")
print("  - Injection point: BEFORE attention")
print("  - Architecture detection: Auto")

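Why injecting noise before attention changes the SI measurement target can be seen in a small NumPy-only sketch. It assumes a single head with random projection matrices and omits the causal mask and layer norm, so it illustrates the mechanism only and is not the Phi-3 forward pass; all names here are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_weights(hidden: np.ndarray, w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention weights for one head (no mask)."""
    q = hidden @ w_q
    k = hidden @ w_k
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    return softmax(scores)

rng = np.random.default_rng(42)
seq_len, d_model, d_head = 6, 16, 8
hidden = rng.standard_normal((seq_len, d_model))
w_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
w_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)

clean = attention_weights(hidden, w_q, w_k)

# Perturb the hidden states that feed Q and K, as the pre-hook does.
shifts = {}
for noise_std in [0.0, 0.01, 0.1]:
    noisy_hidden = hidden + rng.standard_normal(hidden.shape) * noise_std
    noisy = attention_weights(noisy_hidden, w_q, w_k)
    shifts[noise_std] = float(np.abs(noisy - clean).max())
    print(f"noise_std={noise_std}: max |delta attention| = {shifts[noise_std]:.6f}")
```

Because the noise enters before the query and key projections, it propagates through the softmax and alters the attention weights themselves, while each row still sums to 1.
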
In [None]:
# Cell 5: Run Indra Test on Phi-3 (PHI-3 FIX: valid_lengths)

def run_indra_test(model_config, noise_levels, prompts, seeds):
    """
    Run full Indra test with multi-seed aggregation (E11-v3 Standard).
    PHI-3 FIX: Uses valid_lengths instead of attention_masks.
    """
    print(f"\n{'='*70}")
    print(f"TESTING: {model_config['display']}")
    print(f"State: {model_config['state']}, Expected: {model_config['expected_effect']}")
    print(f"{'='*70}")
    
    # Load model with bfloat16 (E11-v3 STANDARD)
    print(f"\nLoading: {model_config['name']}")
    tokenizer = AutoTokenizer.from_pretrained(
        model_config['name'],
        trust_remote_code=True
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_config['name'],
        torch_dtype=torch.bfloat16,
        device_map='auto',
        trust_remote_code=True,
        attn_implementation="eager"
    )
    model.eval()
    
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    # Architecture info
    config = model.config
    num_layers = config.num_hidden_layers
    num_heads = config.num_attention_heads
    num_kv_heads = getattr(config, 'num_key_value_heads', num_heads)
    hidden_size = config.hidden_size
    d_head = hidden_size // num_heads
    
    rho_head = num_heads / math.sqrt(hidden_size)
    rho_kv = num_kv_heads / num_layers
    
    if num_kv_heads == num_heads:
        arch_type = "MHA"
    elif num_kv_heads == 1:
        arch_type = "MQA"
    else:
        arch_type = f"GQA ({num_heads}:{num_kv_heads})"
    
    print(f"  Architecture: {arch_type}")
    print(f"  Layers: {num_layers}, Heads: {num_heads}, KV Heads: {num_kv_heads}")
    print(f"  d_head: {d_head}, hidden_size: {hidden_size}")
    print(f"  rho_head: {rho_head:.4f}, rho_kv: {rho_kv:.4f}")
    
    # Layer ranges (thirds)
    third = num_layers // 3
    layer_ranges = {
        'early': (0, third),
        'middle': (third, 2*third),
        'late': (2*third, num_layers),
        'all': (0, num_layers)
    }
    
    # Baseline measurement (PHI-3 FIX: valid_lengths)
    print(f"\n  Measuring baseline (no noise)...")
    set_seed(PRIMARY_SEED)
    baseline_act = extract_head_activations(
        model, tokenizer, prompts, MAX_LENGTH,
        use_chat_template=model_config['use_chat_template']
    )
    baseline_ent = compute_head_entropy_profiles(
        baseline_act['attention_patterns'],
        baseline_act['valid_lengths']  # PHI-3 FIX
    )
    baseline_global = compute_si_global(baseline_ent)
    
    baseline_local = {}
    for region, (start, end) in layer_ranges.items():
        baseline_local[region] = compute_si_local(baseline_ent, start, end)
    
    print(f"  Baseline Global SI: {baseline_global['specialization_index']:.4f}")
    print(f"  Baseline Correlation: {baseline_global['mean_head_correlation']:.4f}")
    
    # CRITICAL: Assert baseline SI > 0.05
    if baseline_global['specialization_index'] < 0.05:
        raise RuntimeError(f"Baseline SI too low: {baseline_global['specialization_index']:.4f} - measurement failure!")
    
    # Multi-seed treatment loop
    all_seed_results = {}
    
    for seed in seeds:
        print(f"\n  Seed {seed}:")
        seed_results = {'global': [], 'local': []}
        
        for region_name, (start, end) in layer_ranges.items():
            region_global = {'region': region_name, 'tests': []}
            region_local = {'region': region_name, 'tests': []}
            
            for noise_std in noise_levels:
                set_seed(seed)
                
                injector = PreAttentionNoiseInjector(model, (start, end), noise_std)
                injector.attach()
                
                treated_act = extract_head_activations(
                    model, tokenizer, prompts, MAX_LENGTH,
                    use_chat_template=model_config['use_chat_template']
                )
                treated_ent = compute_head_entropy_profiles(
                    treated_act['attention_patterns'],
                    treated_act['valid_lengths']  # PHI-3 FIX
                )
                
                injector.detach()
                
                # Global SI
                treated_global = compute_si_global(treated_ent)
                si_before = baseline_global['specialization_index']
                si_after = treated_global['specialization_index']
                change_pct = ((si_after - si_before) / si_before * 100) if si_before > 0 else 0
                
                region_global['tests'].append({
                    'noise': noise_std,
                    'si': si_after,
                    'change_pct': change_pct
                })
                
                # Local SI
                treated_local = compute_si_local(treated_ent, start, end)
                si_before_local = baseline_local[region_name]['specialization_index']
                si_after_local = treated_local['specialization_index']
                change_pct_local = ((si_after_local - si_before_local) / si_before_local * 100) if si_before_local > 0 else 0
                
                region_local['tests'].append({
                    'noise': noise_std,
                    'si': si_after_local,
                    'change_pct': change_pct_local
                })
            
            # Best/Worst effect based on expected outcome
            if model_config['expected_effect'] == 'HEAL':
                region_global['best'] = max(region_global['tests'], key=lambda x: x['change_pct'])
                region_local['best'] = max(region_local['tests'], key=lambda x: x['change_pct'])
            else:  # DAMAGE expected
                region_global['best'] = min(region_global['tests'], key=lambda x: x['change_pct'])
                region_local['best'] = min(region_local['tests'], key=lambda x: x['change_pct'])
            
            seed_results['global'].append(region_global)
            seed_results['local'].append(region_local)
            
            if seed == PRIMARY_SEED:
                print(f"    {region_name}: Global={region_global['best']['change_pct']:+.1f}%, Local={region_local['best']['change_pct']:+.1f}%")
        
        all_seed_results[str(seed)] = seed_results
    
    # Aggregate across seeds
    aggregated = {'global': {}, 'local': {}}
    for si_type in ['global', 'local']:
        for region_name in layer_ranges.keys():
            values = []
            for seed in seeds:
                region_data = next(r for r in all_seed_results[str(seed)][si_type] if r['region'] == region_name)
                values.append(region_data['best']['change_pct'])
            aggregated[si_type][region_name] = {
                'mean': float(np.mean(values)),
                'std': float(np.std(values)),
                'values': values
            }
    
    # Cleanup
    del model
    torch.cuda.empty_cache()
    
    return {
        'model_name': model_config['name'],
        'display_name': model_config['display'],
        'state': model_config['state'],
        'expected_effect': model_config['expected_effect'],
        'architecture': arch_type,
        'rho_head': rho_head,
        'rho_kv': rho_kv,
        'num_layers': num_layers,
        'num_heads': num_heads,
        'num_kv_heads': num_kv_heads,
        'd_head': d_head,
        'baseline_global': baseline_global,
        'baseline_local': {k: v for k, v in baseline_local.items()},
        'all_seed_results': all_seed_results,
        'aggregated': aggregated,
        'layer_ranges': layer_ranges
    }

print("Test function loaded (E11-v3 Standard + PHI-3 FIX).")

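The region split, the percent change, and the seed aggregation used in Cell 5 can be checked on small numbers. This is a standalone sketch with hypothetical helper names and made-up per-seed values; the 32-layer count is assumed for illustration.

```python
import numpy as np

def layer_thirds(num_layers: int) -> dict:
    """Split layers into thirds; the last region absorbs any remainder."""
    third = num_layers // 3
    return {
        'early': (0, third),
        'middle': (third, 2 * third),
        'late': (2 * third, num_layers),
    }

def change_pct(si_before: float, si_after: float) -> float:
    """Relative SI change in percent; 0 when the baseline is not positive."""
    if si_before <= 0:
        return 0.0
    return (si_after - si_before) / si_before * 100

ranges = layer_thirds(32)  # assumed layer count
print(ranges)

# Illustrative best-case change per seed (not measured values).
per_seed = [-12.0, -15.0, -9.0]
mean = float(np.mean(per_seed))
std_population = float(np.std(per_seed))      # ddof=0, as in Cell 5
std_sample = float(np.std(per_seed, ddof=1))  # unbiased alternative
print(f"mean={mean:.2f} std(ddof=0)={std_population:.2f} std(ddof=1)={std_sample:.2f}")
```

With 32 layers the integer division leaves the late region two layers longer than the other two. `np.std` defaults to the population form (`ddof=0`), which with only three seeds is noticeably smaller than the sample form, so error bars derived from it are on the optimistic side.
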
In [None]:
# Cell 6: Run Test

print(f"\n{'#'*70}")
print(f"# E11-T-Phi3-Indra: Microsoft Heritage Test")
print(f"# State-Dependency on 3rd GQA Family (E11-v3 Standard)")
print(f"{'#'*70}")

results = run_indra_test(
    model_config=MODEL_CONFIGS['instruct'],
    noise_levels=NOISE_LEVELS,
    prompts=STANDARD_PROMPTS,
    seeds=SEEDS
)

print(f"\n{'='*70}")
print("TEST COMPLETE!")
print(f"{'='*70}")

In [None]:
# Cell 7: Verdict Analysis

print(f"\n{'='*70}")
print(f"PHI-3 STATE-DEPENDENCY VERDICT")
print(f"{'='*70}")

# Thresholds
DAMAGE_THRESHOLD = -5.0

# Get worst effect (most negative = most damage)
worst_region, worst_global = min(results['aggregated']['global'].items(), key=lambda kv: kv[1]['mean'])
effect = worst_global['mean']

print(f"\n[INSTRUCT] Phi-3-Mini ({results['state']})")
print(f"  Baseline SI: {results['baseline_global']['specialization_index']:.4f}")
print(f"  Architecture: {results['architecture']}")
print(f"  rho_kv: {results['rho_kv']:.4f}")
print(f"  Expected: DAMAGE (-SI)")
print(f"  Actual Effect: {effect:+.2f}% at {worst_region}")
print(f"  Seeds: {worst_global['values']}")

if effect < DAMAGE_THRESHOLD:
    verdict = "DAMAGED"
    a2_impact = "A2 CONFIRMED on 3rd Heritage!"
elif effect > 5.0:
    verdict = "UNEXPECTED_HEAL"
    a2_impact = "A2 PARTIAL - Phi-3 behaves like COLLAPSED!"
else:
    verdict = "NO_EFFECT"
    a2_impact = "A2 NEUTRAL - Effect too small"

print(f"\n  VERDICT: {verdict}")
print(f"  IMPACT: {a2_impact}")

# Cross-architecture comparison
print(f"\n{'='*70}")
print("CROSS-ARCHITECTURE COMPARISON (Healthy Models)")
print(f"{'='*70}")
print(f"\n{'Model':<25} {'Heritage':<15} {'Effect':<15} {'Region':<10}")
print("-"*65)
print(f"{'LLaMA-3.1-8B-Instruct':<25} {'Meta':<15} {'-30.5%':<15} {'middle':<10}")
print(f"{'LLaMA-2-7B-Chat':<25} {'Meta':<15} {'-24.0%':<15} {'middle':<10}")
phi3_effect = f"{effect:+.1f}%"  # format first so the column pads to a fixed width
print(f"{'Phi-3-Mini-Instruct':<25} {'Microsoft':<15} {phi3_effect:<15} {worst_region:<10}")

# Store verdict
verdict_data = {
    'verdict': verdict,
    'effect': effect,
    'region': worst_region,
    'a2_impact': a2_impact,
    'baseline_si': results['baseline_global']['specialization_index'],
    'comparison': {
        'llama31_instruct': -30.5,
        'llama2_chat': -24.0,
        'phi3_mini': effect
    }
}

In [None]:
# Cell 8: Visualization

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

colors = {'early': '#3498db', 'middle': '#2ecc71', 'late': '#e74c3c', 'all': '#9b59b6'}

# Plot 1: Phi-3 Region Effects
ax1 = axes[0]
regions = list(results['aggregated']['global'].keys())
means = [results['aggregated']['global'][r]['mean'] for r in regions]
stds = [results['aggregated']['global'][r]['std'] for r in regions]

bars = ax1.bar(regions, means, yerr=stds, color=[colors[r] for r in regions], alpha=0.8, capsize=5)
ax1.axhline(y=0, color='black', linestyle='-')
ax1.axhline(y=-5, color='red', linestyle=':', alpha=0.7, label='-5% threshold')
ax1.set_ylabel('SI Change %')
ax1.set_title(f'Phi-3-Mini-Instruct (HEALTHY)\nExpected: DAMAGE | Verdict: {verdict}')
ax1.legend()

for bar, val in zip(bars, means):
    ax1.annotate(f'{val:+.1f}%', xy=(bar.get_x() + bar.get_width()/2, val),
                 xytext=(0, 5 if val > 0 else -12), textcoords='offset points',
                 ha='center', fontweight='bold')

# Plot 2: Cross-Architecture Comparison
ax2 = axes[1]
models = ['LLaMA-3.1\n(Meta)', 'LLaMA-2\n(Meta)', 'Phi-3\n(Microsoft)']
effects = [-30.5, -24.0, effect]
bar_colors = ['#e74c3c', '#e74c3c', '#e74c3c' if effect < 0 else '#2ecc71']

bars = ax2.bar(models, effects, color=bar_colors, alpha=0.8, edgecolor='black', linewidth=2)
ax2.axhline(y=0, color='black', linestyle='-', linewidth=2)
ax2.axhline(y=-5, color='red', linestyle=':', alpha=0.7)
ax2.set_ylabel('SI Change % (Healthy Models)')
ax2.set_title('Cross-Heritage: Healthy Model Damage')

for bar, eff in zip(bars, effects):
    ax2.annotate(f'{eff:+.1f}%', xy=(bar.get_x() + bar.get_width()/2, eff),
                 xytext=(0, -20 if eff < 0 else 10), textcoords='offset points',
                 ha='center', fontsize=12, fontweight='bold')

plt.suptitle(f'E11-T-Phi3-Indra: Microsoft Heritage Test\nSeeds: {SEEDS}', fontsize=14, fontweight='bold')
plt.tight_layout()

fig_path = f'../figures/E11T_phi3_indra_{TIMESTAMP}.png'
plt.savefig(fig_path, dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved: {fig_path}")

In [None]:
# Cell 9: Save Results

def convert_to_native(obj):
    """Recursively convert NumPy scalars and arrays to JSON-serializable Python types."""
    if isinstance(obj, dict):
        return {k: convert_to_native(v) for k, v in obj.items()}
    elif isinstance(obj, (list, tuple)):
        return [convert_to_native(v) for v in obj]
    elif isinstance(obj, np.bool_):
        return bool(obj)  # keep booleans as true/false rather than 1/0
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    else:
        return obj

filename = f'../results/E11T_phi3_indra_{TIMESTAMP}.json'

output = {
    'experiment': 'E11-T-Phi3-Indra',
    'purpose': 'A2 State-Dependency on Microsoft Heritage (3rd Family)',
    'timestamp': TIMESTAMP,
    'methodology': {
        'standard': 'E11-v3',
        'seeds': SEEDS,
        'noise_injection': 'PRE-ATTENTION',
        'si_measurement': 'GLOBAL + LOCAL',
        'attention_mask': False,  # Phi-3 fix: unpadded inputs, valid_lengths used instead
        'chat_template': True,
        'dtype': 'bfloat16',
        'prompts': 'Standard-10 v3',
        'max_length': MAX_LENGTH
    },
    'references': E11T_REFERENCES,
    'noise_levels': NOISE_LEVELS,
    'num_prompts': len(STANDARD_PROMPTS),
    'results': convert_to_native(results),
    'verdict': convert_to_native(verdict_data)
}

with open(filename, 'w') as f:
    json.dump(output, f, indent=2)

print(f"Results saved: {filename}")

# Auto-download (Colab only)
try:
    from google.colab import files
except ImportError:
    print('Not in Colab')
else:
    import shutil
    os.makedirs('download', exist_ok=True)
    shutil.copy(filename, 'download/')
    shutil.copy(fig_path, 'download/')
    shutil.make_archive(f'E11T_phi3_indra_{TIMESTAMP}', 'zip', 'download')
    files.download(f'E11T_phi3_indra_{TIMESTAMP}.zip')
    print('Downloaded!')

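The conversion step in Cell 9 exists because the standard `json` encoder rejects most NumPy scalar types. As a hedged alternative sketch, the same effect can be obtained with the encoder's `default` hook; `numpy_default` and the sample `record` are hypothetical names and values used only for illustration.

```python
import json
import numpy as np

record = {'si': np.float32(0.25), 'layers': np.int64(32), 'ok': np.bool_(True)}

# Without conversion the encoder raises TypeError on NumPy scalars.
try:
    json.dumps(record)
    raised = False
except TypeError:
    raised = True

def numpy_default(obj):
    """Fallback for json.dumps: unwrap NumPy scalars and arrays."""
    if isinstance(obj, np.generic):
        return obj.item()    # NumPy scalar -> matching Python scalar
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

encoded = json.dumps(record, default=numpy_default)
decoded = json.loads(encoded)
print(raised, decoded)
```
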
---

## Summary: E11-T-Phi3-Indra

### Methodology (E11-v3 Standard)

| Standard | Implementation |
|----------|----------------|
| Seeds | 42, 123, 456 |
| Noise | PRE-ATTENTION |
| SI | GLOBAL + LOCAL |
| Mask | `valid_lengths` (unpadded inputs) |
| Chat Template | YES |
| dtype | bfloat16 |
| Prompts | Standard-10 v3 |

### Expected Outcomes

| Model | State | Expected | If Confirmed |
|-------|-------|----------|---------------|
| Phi-3-Mini-Instruct | HEALTHY | DAMAGE (-SI) | A2 on 3rd Heritage |

### Cross-Architecture Target

| Heritage | Model | Healthy Damage |
|----------|-------|----------------|
| Meta | LLaMA-3.1 | -30.5% |
| Meta | LLaMA-2 | -24.0% |
| **Microsoft** | **Phi-3** | **???** |

---

*Paper 4: Behavioral Sink Dynamics*  
*E11-T-Phi3-Indra: Microsoft Heritage Test (E11-v3 Standard)*