# E11-Indra-Gemma27B-V3: Region-Local SI (Codex Fix)

**Paper 4: Behavioral Sink Dynamics**

## Purpose: Address Codex's Region-Local SI Critique

### The Problem (V2)

V2 fixed the injection point (noise is now added before attention), but SI is still computed **globally**:
```
SI = 1 - mean_correlation(ALL heads across ALL layers)

Problem: Late-only noise affects 16 layers, but SI averages 46 layers
         → Effect "diluted" by 30 unaffected layers
         → Late = 0% could be dilution artifact!
```
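
The layer counts quoted above follow from splitting the model into thirds. The sketch below reproduces that arithmetic, assuming 46 decoder layers (the figure used in this notebook) and the integer-thirds split from the configuration cell; `split_into_thirds` is a hypothetical helper name used only for illustration.

```python
def split_into_thirds(num_layers):
    """Split layer indices into early/middle/late ranges (end index exclusive)."""
    third = num_layers // 3
    return {
        "early": (0, third),
        "middle": (third, 2 * third),
        "late": (2 * third, num_layers),  # absorbs the remainder layers
    }

NUM_LAYERS = 46  # assumption: layer count quoted in this notebook
ranges = split_into_thirds(NUM_LAYERS)

late_start, late_end = ranges["late"]
perturbed_layers = late_end - late_start            # layers hit by Late-only noise
unaffected_layers = NUM_LAYERS - perturbed_layers   # layers that still enter global SI

print(ranges)
print(f"perturbed={perturbed_layers}, unaffected={unaffected_layers}")
```

Because 46 is not divisible by 3, the Late range absorbs the remainder and is one layer longer than the other two.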

### The Fix (V3)

Compute SI **only from heads in the target layer range**:
```
Early noise → SI_early  = 1 - mean_corr(Early heads only)
Late noise  → SI_late   = 1 - mean_corr(Late heads only)

This isolates the effect to the perturbed region!
```
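
To make the global-versus-local contrast concrete, here is a minimal sketch on synthetic data. It does not use model outputs: the entropy matrix, the shared layer trend, and the noise scales are all invented for illustration, and `si_from_entropies` is a hypothetical stand-in for the SI definition above (1 minus the mean pairwise correlation between head profiles).

```python
import numpy as np

def si_from_entropies(head_entropies):
    """SI = 1 - mean pairwise correlation of head profiles.

    head_entropies has shape (num_layers, num_heads); each column is one head's
    profile across the layers included in the computation.
    """
    corr = np.corrcoef(head_entropies.T)               # (num_heads, num_heads)
    upper = corr[np.triu_indices(corr.shape[0], k=1)]  # each head pair once
    return 1.0 - float(np.nanmean(upper))

rng = np.random.default_rng(0)
num_layers, num_heads = 46, 32   # assumed shapes, for illustration only
late = slice(30, 46)

# Synthetic baseline: a trend shared by all heads plus small head-specific jitter.
trend = np.linspace(0.2, 0.8, num_layers)[:, None]
baseline = trend + rng.normal(0.0, 0.01, size=(num_layers, num_heads))

# "Late-only" perturbation: extra head-specific noise in the late layers only.
perturbed = baseline.copy()
perturbed[late] += rng.normal(0.0, 0.05, size=(late.stop - late.start, num_heads))

delta_global = si_from_entropies(perturbed) - si_from_entropies(baseline)
delta_local = si_from_entropies(perturbed[late]) - si_from_entropies(baseline[late])

print(f"global SI shift: {delta_global:+.4f}")
print(f"late-local SI shift: {delta_local:+.4f}")
```

In this toy setup the absolute SI shift is much larger when measured over the late layers alone, because the 30 untouched layers keep the full-length head profiles highly correlated. Whether the real model behaves the same way is exactly what V3 tests.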

### Codex's Critique (Translated from German)

> "Late stays at 0 → still unexplained. This remains a methodological red flag:
>  - SI is computed globally across all heads
>  - Late noise affects only a few layers → the effect vanishes
>  - it cannot be proven that Late is immune"

### Expected Outcomes

| Outcome | Meaning | Implication |
|---------|---------|-------------|
| Late-Local ≠ 0% | Codex RIGHT | Global SI masked real effect |
| Late-Local = 0% | Codex WRONG | Late layers truly immune |
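
The two outcomes in the table can be written as a small decision rule. This is a sketch that assumes "≠ 0%" is operationalized as an absolute Late-local SI change above 1%, the threshold used in this notebook's verdict cell; `classify_late_result` is a hypothetical helper name.

```python
def classify_late_result(late_local_change_pct, threshold_pct=1.0):
    """Map the Late-local SI change (in percent) to one of the two table outcomes."""
    if abs(late_local_change_pct) > threshold_pct:
        return "DILUTION_CONFIRMED"   # Codex right: global SI masked a real effect
    return "IMMUNITY_CONFIRMED"       # Codex wrong: Late stays near 0% even locally

# Illustrative inputs only, not measured results.
print(classify_late_result(-3.2))
print(classify_late_result(0.4))
```

The rule is symmetric in sign, so an increase in local SI counts as a non-zero effect just like a decrease.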

---

In [None]:
# Cell 1: Setup + Seeds (E11-v3 STANDARD)
!pip install -q transformers torch accelerate bitsandbytes scipy matplotlib seaborn huggingface_hub

import torch
import numpy as np
import random
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from scipy.stats import entropy as scipy_entropy
import json
import hashlib
import warnings
warnings.filterwarnings('ignore')

import os
from pathlib import Path
from datetime import datetime

# ============ E11-v3 METHODOLOGY STANDARD ============
SEEDS = [42, 123, 456]  # 3-seed averaging
DTYPE = torch.bfloat16  # Standardized precision (Note: 8-bit quantization required for 27B)
EXPECTED_MD5 = "715065bab181f46bf12ed471951141e2"  # Standard-10 v3

def verify_prompts(prompts):
    """Verify Standard-10 prompts via MD5."""
    combined = '|||'.join(prompts)  # Canonical delimiter for MD5
    actual_md5 = hashlib.md5(combined.encode()).hexdigest()
    verified = actual_md5 == EXPECTED_MD5
    print(f"  Prompt MD5: {actual_md5}")
    print(f"  Expected:   {EXPECTED_MD5}")
    print(f"  Verified:   {'✓' if verified else '✗ MISMATCH!'}")
    return verified, actual_md5

# Set initial seed
SEED = SEEDS[0]
os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

TIMESTAMP = datetime.now().strftime('%Y%m%d_%H%M%S')
Path('results').mkdir(parents=True, exist_ok=True)
Path('figures').mkdir(parents=True, exist_ok=True)
print(f"Timestamp: {TIMESTAMP}")
print(f"E11-v3 Standard: Seeds={SEEDS}, dtype={DTYPE}")
print(f"⚠️ Note: 8-bit quantization required for 27B model (VRAM constraint)")

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU: {gpu_name}")
    print(f"VRAM: {vram_gb:.1f} GB")

# HF Login
try:
    from google.colab import userdata
    from huggingface_hub import login
    hf_token = userdata.get('HF_TOKEN')
    if hf_token:
        login(token=hf_token)
        print("HF Login: SUCCESS")
except Exception:  # not running in Colab, or HF_TOKEN missing/invalid
    print("Not in Colab or no HF_TOKEN")

In [None]:
# Cell 2: Configuration (V3 - Region-Local SI) - E11-v3 UPDATED

MODEL_NAME = 'google/gemma-2-27b-it'

# Reference Values (from E08b-G)
REFERENCE = {
    'base_si': 0.3490640065124305,
    'instruct_si': 0.34176792550380786,
    'state': 'SICK'
}

RHO_CRIT = 0.267
NOISE_LEVELS = [0.0, 0.01, 0.02, 0.05, 0.1, 0.2]
MAX_LENGTH = 128  # E11-v3 Standard
PRIMARY_SEED = 42

# V2 Reference Results (for comparison)
V2_RESULTS = {
    'early': -10.14,
    'middle': -0.01,
    'late': 0.0,
    'all': -9.73
}

# ============ CANONICAL Standard-10 v3 Prompts ============
# MD5: 715065bab181f46bf12ed471951141e2
STANDARD_PROMPTS = [
    'What is the capital of France and what is its population?',
    'If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly? Explain step by step.',
    'Calculate 47 multiplied by 23 and show your work.',
    "Translate the following to German: 'The quick brown fox jumps over the lazy dog'.",
    'Write a Python function that checks if a number is prime.',
    'Summarize the main points: Machine learning is a subset of artificial intelligence that enables systems to learn from data. It uses algorithms to identify patterns and make decisions with minimal human intervention.',
    "Statement A: 'All birds can fly.' Statement B: 'Penguins are birds that cannot fly.' Are these statements contradictory? Explain.",
    'What are the safety considerations when using a kitchen knife?',
    'Write a haiku about artificial intelligence.',
    "Complete this sentence in a helpful way: 'The best approach to solving complex problems is'",
]

# Verify prompts
print("Verifying Standard-10 prompts...")
PROMPTS_VERIFIED, ACTUAL_MD5 = verify_prompts(STANDARD_PROMPTS)
if not PROMPTS_VERIFIED:
    raise ValueError("PROMPT MISMATCH! Check Standard-10 v3 canonical prompts.")

print(f"\nE11-Indra-Gemma27B-V3: REGION-LOCAL SI (Codex Fix #2)")
print(f"\n{'='*60}")
print(f"KEY CHANGE: SI computed ONLY from target layer range")
print(f"            Late-noise → SI_late (not SI_global)")
print(f"            This isolates effect to perturbed region!")
print(f"{'='*60}")
print(f"\nE11-v3 Config: MAX_LENGTH={MAX_LENGTH}, Seeds={SEEDS}")
print(f"⚠️ 8-bit quantization required (27B model)")

In [None]:
# Cell 3: REGION-LOCAL Specialization Metrics (V3 KEY CHANGE!)

def extract_head_activations(model, tokenizer, prompts, max_length=128):
    """
    Extract per-head activation patterns WITH attention masks.
    Returns full attention tensor for later region-local analysis.
    """
    all_attention_patterns = []
    all_attention_masks = []
    
    for prompt in prompts:
        messages = [{"role": "user", "content": prompt}]
        formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
        
        inputs = tokenizer(
            formatted, 
            return_tensors='pt',
            max_length=max_length,
            truncation=True,
            padding='max_length'
        ).to(model.device)
        
        with torch.no_grad():
            outputs = model(**inputs, output_attentions=True)
        
        attn_stack = torch.stack([a.squeeze(0) for a in outputs.attentions], dim=0)
        all_attention_patterns.append(attn_stack.cpu())
        all_attention_masks.append(inputs['attention_mask'].squeeze(0).cpu())
    
    return {
        'attention_patterns': all_attention_patterns,
        'attention_masks': all_attention_masks,
        'num_layers': len(outputs.attentions),
        'num_heads': outputs.attentions[0].shape[1]
    }


def compute_head_entropy_profiles(attention_patterns, attention_masks=None):
    """
    Compute normalized entropy for each head across prompts.
    Returns FULL matrix (num_layers, num_heads) for region-local analysis.
    """
    num_prompts = len(attention_patterns)
    num_layers = attention_patterns[0].shape[0]
    num_heads = attention_patterns[0].shape[1]
    
    all_entropies = np.zeros((num_prompts, num_layers, num_heads))
    
    for p_idx, attn in enumerate(attention_patterns):
        mask = attention_masks[p_idx] if attention_masks is not None else None
        
        for layer in range(num_layers):
            for head in range(num_heads):
                attn_weights = attn[layer, head].float().cpu().numpy()
                
                if mask is not None and mask.sum() > 0:
                    # Keep only the key positions of real (non-padding) tokens.
                    # A boolean mask is correct for left- and right-padded inputs;
                    # slicing [:valid_len] silently assumed right padding.
                    valid = mask.numpy().astype(bool)
                    attn_weights = attn_weights[:, valid]
                # Average over query positions to get one distribution per head
                attn_weights = attn_weights.mean(axis=0)
                
                attn_weights = attn_weights / (attn_weights.sum() + 1e-10)
                attn_weights = attn_weights[attn_weights > 0]
                
                if len(attn_weights) > 1:
                    h = scipy_entropy(attn_weights, base=2)
                    h_max = np.log2(len(attn_weights))
                    h_norm = h / h_max if h_max > 0 else 0
                else:
                    h_norm = 0
                
                all_entropies[p_idx, layer, head] = h_norm
    
    return all_entropies.mean(axis=0)  # (num_layers, num_heads)


def compute_specialization_metrics_global(head_entropies):
    """Compute GLOBAL SI (V1/V2 method) - for comparison."""
    num_layers, num_heads = head_entropies.shape
    
    layer_variances = np.var(head_entropies, axis=1)
    mean_variance = float(np.mean(layer_variances))
    
    head_profiles = head_entropies.T  # (num_heads, num_layers)
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))
    
    specialization_index = 1.0 - mean_head_correlation
    
    return {
        'mean_head_variance': mean_variance,
        'mean_head_correlation': mean_head_correlation,
        'specialization_index': specialization_index,
        'num_layers': num_layers,
        'num_heads': num_heads,
        'method': 'GLOBAL'
    }


def compute_specialization_metrics_local(head_entropies, layer_start, layer_end):
    """
    V3 KEY FUNCTION: Compute REGION-LOCAL SI.
    
    SI is computed ONLY from heads in layers [layer_start, layer_end).
    This isolates the effect to the perturbed region!
    
    Args:
        head_entropies: Full entropy matrix (num_layers, num_heads)
        layer_start: Start of target region (inclusive)
        layer_end: End of target region (exclusive)
    
    Returns:
        SI computed only from target region heads
    """
    # Extract only the target layer range
    local_entropies = head_entropies[layer_start:layer_end, :]  # (local_layers, num_heads)
    
    local_layers, num_heads = local_entropies.shape
    
    if local_layers == 0:
        return {
            'mean_head_variance': 0.0,
            'mean_head_correlation': 0.0,
            'specialization_index': 0.0,
            'num_layers': 0,
            'num_heads': num_heads,
            'method': 'LOCAL',
            'layer_range': [layer_start, layer_end]
        }
    
    # Compute variance per layer (local)
    layer_variances = np.var(local_entropies, axis=1)
    mean_variance = float(np.mean(layer_variances))
    
    # Compute head correlation (local)
    # Each head's profile is now only over the local layers
    head_profiles = local_entropies.T  # (num_heads, local_layers)
    
    # Need at least 2 data points per head for correlation
    if local_layers < 2:
        # Can't compute meaningful correlation with 1 layer
        # Fall back to variance-based estimate
        mean_head_correlation = 1.0 - mean_variance  # rough estimate
    else:
        head_corr_matrix = np.corrcoef(head_profiles)
        upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
        mean_head_correlation = float(np.nanmean(upper_tri))
    
    specialization_index = 1.0 - mean_head_correlation
    
    return {
        'mean_head_variance': mean_variance,
        'mean_head_correlation': mean_head_correlation,
        'specialization_index': specialization_index,
        'num_layers': local_layers,
        'num_heads': num_heads,
        'method': 'LOCAL',
        'layer_range': [layer_start, layer_end]
    }


print("Region-Local SI functions defined (V3).")
print("")
print("KEY CHANGE FROM V2:")
print("  - compute_specialization_metrics_global(): V1/V2 method (all layers)")
print("  - compute_specialization_metrics_local(): V3 method (target layers only)")
print("")
print("Example:")
print("  Late-only noise (layers 30-45) → compute_specialization_metrics_local(entropies, 30, 46)")
print("  SI_late reflects ONLY Late layer heads → no dilution!")

In [None]:
# Cell 4: PRE-ATTENTION Noise Injector (same as V2)

class PreAttentionNoiseInjector:
    """
    V2/V3: Inject Gaussian noise BEFORE attention computation.
    (Same as V2 - the change is in SI measurement, not injection)
    """
    
    def __init__(self, model, target_range, noise_std=0.0):
        self.model = model
        self.target_start, self.target_end = target_range
        self.noise_std = noise_std
        self.hooks = []
    
    def _make_pre_hook(self, layer_idx):
        def hook(module, args):
            if self.noise_std > 0 and self.target_start <= layer_idx < self.target_end:
                hidden_states = args[0]
                noise = torch.randn_like(hidden_states) * self.noise_std
                noisy_hidden_states = hidden_states + noise
                return (noisy_hidden_states,) + args[1:]
            return args
        return hook
    
    def attach(self):
        for idx, layer in enumerate(self.model.model.layers):
            hook = layer.register_forward_pre_hook(self._make_pre_hook(idx))
            self.hooks.append(hook)
    
    def detach(self):
        for hook in self.hooks:
            hook.remove()
        self.hooks = []
    
    def set_noise(self, std):
        self.noise_std = std

print("PRE-Attention noise injector class defined.")

In [None]:
# Cell 5: Load Model + Architecture Detection (DUAL-RHO PATCHED)

print(f"\n{'='*60}")
print(f"PHASE 1: LOAD MODEL (8-BIT QUANTIZATION)")
print(f"{'='*60}")

print(f"\nLoading: {MODEL_NAME}")

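# LLM.int8() quantization. BitsAndBytesConfig has no 8-bit compute-dtype option
# (only bnb_4bit_compute_dtype exists), so none is passed here.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True
)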

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
QUANTIZATION = "8bit"

print(f"\nLoading with 8-bit quantization...")
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map='auto',
    trust_remote_code=True,
    attn_implementation="eager"
)
model.eval()

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Architecture detection
config = model.config
num_layers = config.num_hidden_layers
num_query_heads = config.num_attention_heads
num_kv_heads = getattr(config, 'num_key_value_heads', num_query_heads)
hidden_size = config.hidden_size
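# Gemma-2 sets head_dim explicitly in its config; it need not equal hidden_size // num_heads
d_head = getattr(config, 'head_dim', None) or hidden_size // num_query_heads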

# === DUAL-RHO CALCULATION (Codex Fix) ===
# rho_kv: Original definition (kv_heads / num_layers) - for backward compatibility
# rho_head: Paper 3 definition (num_heads / sqrt(d_model)) - theoretical basis
import math
rho_kv = num_kv_heads / num_layers
rho_head = num_query_heads / math.sqrt(hidden_size)

# Use rho_head as primary (Paper 3 consistent), keep rho_kv for reference
rho = rho_head  # PRIMARY

if num_kv_heads == num_query_heads:
    attn_type = "MHA"
elif num_kv_heads == 1:
    attn_type = "MQA"
else:
    attn_type = f"GQA ({num_query_heads}:{num_kv_heads})"

has_swa = hasattr(config, 'sliding_window') and config.sliding_window is not None
architecture = f"{attn_type}+SWA" if has_swa else attn_type

MODEL_CONFIG = {
    'name': MODEL_NAME,
    'num_layers': num_layers,
    'num_query_heads': num_query_heads,
    'num_kv_heads': num_kv_heads,
    'd_head': d_head,
    'hidden_size': hidden_size,
    'architecture': architecture,
    # DUAL-RHO (Codex Fix)
    'rho': rho,           # Primary (rho_head)
    'rho_head': rho_head, # num_heads / sqrt(d_model) - Paper 3
    'rho_kv': rho_kv,     # kv_heads / num_layers - legacy
    'rho_crit': RHO_CRIT
}

third = num_layers // 3
LAYER_RANGES = {
    'early': (0, third),
    'middle': (third, 2*third),
    'late': (2*third, num_layers),
    'all': (0, num_layers)
}

print(f"\nArchitecture: {architecture}")
print(f"Layers: {num_layers}, Heads: {num_query_heads}, d_head: {d_head}")
print(f"\n=== DUAL-RHO (Codex Fix) ===")
print(f"  rho_head = {rho_head:.4f} (num_heads / sqrt(d_model)) ← PRIMARY")
print(f"  rho_kv   = {rho_kv:.4f} (kv_heads / num_layers) ← legacy")
print(f"  rho_crit = {RHO_CRIT}")
print(f"  Status: {'ABOVE rho_crit (POISON expected)' if rho > RHO_CRIT else 'BELOW rho_crit'}")
print(f"\nLayer Ranges:")
for region, (start, end) in LAYER_RANGES.items():
    print(f"  {region}: layers {start}-{end-1} ({end-start} layers)")

In [None]:
# Cell 6: Baseline Measurement (BOTH Global and Local)

print(f"\n{'='*60}")
print(f"PHASE 2: BASELINE (Global + Region-Local SI)")
print(f"{'='*60}")

baseline_activations = extract_head_activations(model, tokenizer, STANDARD_PROMPTS, max_length=MAX_LENGTH)
baseline_entropies = compute_head_entropy_profiles(
    baseline_activations['attention_patterns'],
    baseline_activations['attention_masks']
)

# Global SI (V1/V2 method)
baseline_global = compute_specialization_metrics_global(baseline_entropies)

# Region-Local SI for each region (V3 method)
baseline_local = {}
for region_name, (start, end) in LAYER_RANGES.items():
    baseline_local[region_name] = compute_specialization_metrics_local(baseline_entropies, start, end)

print(f"\nBaseline Results:")
print(f"\n  GLOBAL SI (V1/V2):")
print(f"    SI = {baseline_global['specialization_index']:.4f}")
print(f"    Corr = {baseline_global['mean_head_correlation']:.4f}")

print(f"\n  REGION-LOCAL SI (V3):")
for region_name, metrics in baseline_local.items():
    print(f"    {region_name}: SI_local = {metrics['specialization_index']:.4f} (layers {metrics['layer_range']})")

# Store baselines
results = {
    'baseline_global': baseline_global,
    'baseline_local': baseline_local,
    'treatments_global': [],
    'treatments_local': [],
    'quantization': QUANTIZATION,
    'injection_method': 'PRE-ATTENTION',
    'si_method': 'GLOBAL + LOCAL (V3)'
}

In [None]:
# Cell 7: V3 Treatment Loop - BOTH Global and Local SI

print(f"\n{'='*60}")
print(f"PHASE 3: INDRA V3 - REGION-LOCAL SI MEASUREMENT")
print(f"{'='*60}")
print(f"\nKEY: For each region, we compute BOTH:")
print(f"     - SI_global (V2 method) - for comparison")
print(f"     - SI_local (V3 method) - isolates perturbed region")
print(f"\nRunning {len(SEEDS)} seeds: {SEEDS}")

all_seed_results = {seed: {'global': [], 'local': []} for seed in SEEDS}

for seed_idx, current_seed in enumerate(SEEDS):
    print(f"\n{'#'*60}")
    print(f"SEED {seed_idx+1}/{len(SEEDS)}: {current_seed}")
    print(f"{'#'*60}")
    
    for region_name, (start, end) in LAYER_RANGES.items():
        print(f"\n  TREATING: {region_name.upper()} (Layers {start}-{end-1})")
        
        region_global = {
            'region': region_name,
            'layer_range': [start, end],
            'seed': current_seed,
            'si_method': 'GLOBAL',
            'noise_tests': []
        }
        
        region_local = {
            'region': region_name,
            'layer_range': [start, end],
            'seed': current_seed,
            'si_method': 'LOCAL',
            'noise_tests': []
        }
        
        for noise_std in NOISE_LEVELS:
            torch.manual_seed(current_seed)
            np.random.seed(current_seed)
            random.seed(current_seed)
            
            injector = PreAttentionNoiseInjector(model, (start, end), noise_std=noise_std)
            injector.attach()
            
            treated_activations = extract_head_activations(
                model, tokenizer, STANDARD_PROMPTS, max_length=MAX_LENGTH
            )
            treated_entropies = compute_head_entropy_profiles(
                treated_activations['attention_patterns'],
                treated_activations['attention_masks']
            )
            
            injector.detach()
            
            # ============ GLOBAL SI (V2 method) ============
            treated_global = compute_specialization_metrics_global(treated_entropies)
            
            si_before_global = baseline_global['specialization_index']
            si_after_global = treated_global['specialization_index']
            si_delta_global = si_after_global - si_before_global
            change_pct_global = (si_delta_global / si_before_global) * 100 if si_before_global > 0 else 0
            
            region_global['noise_tests'].append({
                'noise_std': float(noise_std),
                'si': treated_global['specialization_index'],
                'si_delta': float(si_delta_global),
                'change_pct': float(change_pct_global)
            })
            
            # ============ LOCAL SI (V3 method) ============
            treated_local = compute_specialization_metrics_local(treated_entropies, start, end)
            baseline_local_region = baseline_local[region_name]
            
            si_before_local = baseline_local_region['specialization_index']
            si_after_local = treated_local['specialization_index']
            si_delta_local = si_after_local - si_before_local
            change_pct_local = (si_delta_local / si_before_local) * 100 if si_before_local > 0 else 0
            
            region_local['noise_tests'].append({
                'noise_std': float(noise_std),
                'si': treated_local['specialization_index'],
                'si_delta': float(si_delta_local),
                'change_pct': float(change_pct_local)
            })
            
            # Print comparison (primary seed only)
            if current_seed == PRIMARY_SEED:
                print(f"    sigma={noise_std:.2f}: Global={change_pct_global:+.2f}%, Local={change_pct_local:+.2f}%")
        
        # Store min/max changes
        region_global['min_change_pct'] = min(t['change_pct'] for t in region_global['noise_tests'])
        region_global['max_change_pct'] = max(t['change_pct'] for t in region_global['noise_tests'])
        region_local['min_change_pct'] = min(t['change_pct'] for t in region_local['noise_tests'])
        region_local['max_change_pct'] = max(t['change_pct'] for t in region_local['noise_tests'])
        
        all_seed_results[current_seed]['global'].append(region_global)
        all_seed_results[current_seed]['local'].append(region_local)

# Aggregate across seeds
print(f"\n{'='*60}")
print(f"AGGREGATING ACROSS {len(SEEDS)} SEEDS")
print(f"{'='*60}")

aggregated = {'global': {}, 'local': {}}

for si_method in ['global', 'local']:
    for region_name in LAYER_RANGES.keys():
        region_changes = []
        for seed in SEEDS:
            seed_data = next(t for t in all_seed_results[seed][si_method] if t['region'] == region_name)
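            # Use the largest-magnitude change so that positive SI shifts also count.
            # min_change_pct alone can never exceed ~0, because sigma=0.0 is in NOISE_LEVELS.
            region_changes.append(max((t['change_pct'] for t in seed_data['noise_tests']), key=abs))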
        
        aggregated[si_method][region_name] = {
            'mean': float(np.mean(region_changes)),
            'std': float(np.std(region_changes)),
            'values': region_changes
        }

# Print comparison table
print(f"\n{'Region':<10} {'Global (V2)':<20} {'Local (V3)':<20} {'Diff':<10}")
print("-"*60)
for region_name in LAYER_RANGES.keys():
    g = aggregated['global'][region_name]
    l = aggregated['local'][region_name]
    diff = l['mean'] - g['mean']
    print(f"{region_name:<10} {g['mean']:+.2f}% +/- {g['std']:.2f}%{'':>3} {l['mean']:+.2f}% +/- {l['std']:.2f}%{'':>3} {diff:+.2f}%")

results['treatments_global'] = all_seed_results[PRIMARY_SEED]['global']
results['treatments_local'] = all_seed_results[PRIMARY_SEED]['local']
results['multi_seed_results'] = all_seed_results
results['aggregated'] = aggregated

In [None]:
# Cell 8: V3 Verdict - Codex Critique Resolution

print(f"\n{'='*70}")
print(f"PHASE 4: V3 VERDICT - CODEX REGION-LOCAL CRITIQUE")
print(f"{'='*70}")

print(f"\n{'='*70}")
print("CODEX'S REGION-LOCAL CRITIQUE")
print(f"{'='*70}")
print(f"\nCodex said: 'Late = 0% not proven because SI is global.'")
print(f"            'SI computed globally dilutes Late-only noise effect.'")
print(f"            'Need region-local SI to prove Late is truly immune.'")

late_global = aggregated['global']['late']['mean']
late_local = aggregated['local']['late']['mean']
late_local_std = aggregated['local']['late']['std']

print(f"\nLATE REGION RESULTS:")
print(f"  V2 Global SI: {late_global:+.2f}%")
print(f"  V3 Local SI:  {late_local:+.2f}% +/- {late_local_std:.2f}%")

# Determine verdict
if abs(late_local) > 1.0:
    print(f"\n  VERDICT: CODEX CONFIRMED (Late Dilution)")
    print(f"  Late WAS being diluted by global SI!")
    print(f"  Local SI reveals real Late effect: {late_local:+.2f}%")
    codex_local_verdict = "DILUTION_CONFIRMED"
else:
    print(f"\n  VERDICT: CODEX REFUTED (Late Truly Immune)")
    print(f"  Even with Local SI, Late = {late_local:+.2f}%")
    print(f"  Late layers are truly noise-immune!")
    codex_local_verdict = "IMMUNITY_CONFIRMED"

# Full comparison table
print(f"\n{'='*70}")
print("FULL COMPARISON: V2 (Global) vs V3 (Local)")
print(f"{'='*70}")
print(f"\n{'Region':<10} {'V2 Global':<12} {'V3 Local':<12} {'V2 Ref':<12} {'Interpretation'}")
print("-"*70)

for region_name in ['early', 'middle', 'late', 'all']:
    g = aggregated['global'][region_name]['mean']
    l = aggregated['local'][region_name]['mean']
    v2_ref = V2_RESULTS[region_name]
    
    if abs(l - g) > 2:
        interp = "DILUTION DETECTED"
    elif abs(l) < 1 and abs(g) < 1:
        interp = "Both ~0: Immune"
    elif l < -5:
        interp = "POISON (local)"
    else:
        interp = "Similar"
    
    print(f"{region_name:<10} {g:+.2f}%{'':>5} {l:+.2f}%{'':>5} {v2_ref:+.2f}%{'':>5} {interp}")

# Store verdict
results['verdict'] = {
    'codex_local_verdict': codex_local_verdict,
    'late_global': late_global,
    'late_local': late_local,
    'late_local_std': late_local_std,
    'v2_reference': V2_RESULTS,
    'aggregated': aggregated,
    'seeds_used': SEEDS,
    'model': MODEL_CONFIG['name'],
    'rho': MODEL_CONFIG['rho'],
    'architecture': MODEL_CONFIG['architecture']
}

In [None]:
# Cell 9: Visualization - Global vs Local SI Comparison

fig, axes = plt.subplots(2, 2, figsize=(16, 14))

colors = {'early': '#3498db', 'middle': '#2ecc71', 'late': '#e74c3c', 'all': '#9b59b6'}

# Plot 1: Global vs Local Bar Chart
ax1 = axes[0, 0]
regions = ['early', 'middle', 'late', 'all']
global_vals = [aggregated['global'][r]['mean'] for r in regions]
local_vals = [aggregated['local'][r]['mean'] for r in regions]
global_stds = [aggregated['global'][r]['std'] for r in regions]
local_stds = [aggregated['local'][r]['std'] for r in regions]

x = np.arange(len(regions))
width = 0.35

bars1 = ax1.bar(x - width/2, global_vals, width, yerr=global_stds, label='Global SI (V2)', color='gray', alpha=0.7, capsize=3)
bars2 = ax1.bar(x + width/2, local_vals, width, yerr=local_stds, label='Local SI (V3)', color='blue', alpha=0.7, capsize=3)

ax1.axhline(y=0, color='black', linestyle='--', linewidth=1)
ax1.axhline(y=-5, color='red', linestyle=':', alpha=0.5)
ax1.set_ylabel('SI Change %')
ax1.set_title('V3: Global vs Local SI\n(Does Local SI reveal Late effect?)')
ax1.set_xticks(x)
ax1.set_xticklabels([r.capitalize() for r in regions])
ax1.legend()
ax1.set_ylim(-20, 5)

# Highlight Late region
ax1.annotate('Codex\nCritique', xy=(2 + width/2, local_vals[2]), 
             xytext=(2.5, local_vals[2] - 5),
             fontsize=9, color='red', fontweight='bold',
             arrowprops=dict(arrowstyle='->', color='red', lw=1))

# Plot 2: Local SI Dose-Response
ax2 = axes[0, 1]
for treatment in results['treatments_local']:
    region = treatment['region']
    noise_levels = [t['noise_std'] for t in treatment['noise_tests']]
    change_vals = [t['change_pct'] for t in treatment['noise_tests']]
    ax2.plot(noise_levels, change_vals, 'o-', color=colors[region], 
             label=f"{region.capitalize()} (local)", linewidth=2, markersize=8)

ax2.axhline(y=0, color='black', linestyle='--')
ax2.axhline(y=-5, color='red', linestyle=':', alpha=0.5)
ax2.set_xlabel('Noise Level (sigma)')
ax2.set_ylabel('Local SI Change %')
ax2.set_title('V3 Local SI Dose-Response\n(Each region measured locally)')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Global SI Dose-Response (for comparison)
ax3 = axes[1, 0]
for treatment in results['treatments_global']:
    region = treatment['region']
    noise_levels = [t['noise_std'] for t in treatment['noise_tests']]
    change_vals = [t['change_pct'] for t in treatment['noise_tests']]
    ax3.plot(noise_levels, change_vals, 'o--', color=colors[region], 
             label=f"{region.capitalize()} (global)", linewidth=2, markersize=8, alpha=0.7)

ax3.axhline(y=0, color='black', linestyle='--')
ax3.axhline(y=-5, color='red', linestyle=':', alpha=0.5)
ax3.set_xlabel('Noise Level (sigma)')
ax3.set_ylabel('Global SI Change %')
ax3.set_title('V2 Global SI Dose-Response (Reference)\n(Diluted by non-perturbed layers)')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: Verdict Summary
ax4 = axes[1, 1]
ax4.axis('off')

verdict_text = f"""E11-INDRA-GEMMA27B-V3: REGION-LOCAL SI
{'='*50}

CODEX CRITIQUE:
"Late = 0% not proven because SI is global.
 Need region-local SI to isolate effect."

V3 RESULTS:
  Late Global (V2): {late_global:+.2f}%
  Late Local (V3):  {late_local:+.2f}% +/- {late_local_std:.2f}%

VERDICT: {codex_local_verdict}
"""

if codex_local_verdict == "IMMUNITY_CONFIRMED":
    verdict_text += """
Late layers are TRULY IMMUNE to noise!
Even with region-local SI, Late = ~0%.
This is biological, not methodological.

Interpretation:
- Late layers = "frozen" output patterns
- SWA may create locality that resists perturbation
- Paper claim STRENGTHENED
"""
else:
    verdict_text += """
Global SI WAS hiding Late layer effect!
Local SI reveals true Late response.

Interpretation:
- V2 Late = 0% was dilution artifact
- V3 shows real (non-zero) Late effect
- Paper needs methodological note
"""

ax4.text(0.05, 0.95, verdict_text, transform=ax4.transAxes, fontsize=10,
         verticalalignment='top', fontfamily='monospace',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.suptitle(f'E11-Indra-Gemma27B-V3: Region-Local SI\nCodex Critique: {codex_local_verdict}', 
             fontsize=14, fontweight='bold')
plt.tight_layout()

fig_path = f'figures/E11_indra_gemma27b_v3_{TIMESTAMP}.png'
plt.savefig(fig_path, dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved: {fig_path}")

In [None]:
# Cell 10: Save Results (E11-v3 METHODOLOGY BLOCK)

def convert_to_native(obj):
    if isinstance(obj, dict):
        return {k: convert_to_native(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_to_native(v) for v in obj]
    elif isinstance(obj, tuple):
        return tuple(convert_to_native(v) for v in obj)
    elif isinstance(obj, np.bool_):
        return bool(obj)  # keep booleans as JSON true/false rather than 1/0
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    else:
        return obj

filename = f'results/E11_indra_gemma27b_v3_{TIMESTAMP}.json'

output = {
    'experiment': 'E11-Indra-Gemma27B-V3',
    'purpose': 'Region-Local SI - Address Codex Dilution Critique',
    'timestamp': TIMESTAMP,
    'model': MODEL_CONFIG['name'],
    'architecture': MODEL_CONFIG['architecture'],
    
    # === E11-v3 METHODOLOGY BLOCK ===
    'methodology': {
        'standard': 'E11-v3',
        'seeds': SEEDS,
        'max_length': MAX_LENGTH,
        'dtype': str(DTYPE),
        'prompt_md5': ACTUAL_MD5,
        'prompt_md5_verified': PROMPTS_VERIFIED,
        'num_prompts': len(STANDARD_PROMPTS),
        'prompt_set': 'Standard-10 v3',
        'quantization': '8-bit (required for 27B)',
        'quantization_note': 'Full precision requires A100-80GB',
        'si_method': 'GLOBAL + LOCAL (V3)',
        'injection_method': 'PRE-ATTENTION'
    },
    
    # === DUAL-RHO (Codex Fix) ===
    'rho': MODEL_CONFIG['rho'],           # Primary (rho_head)
    'rho_head': MODEL_CONFIG['rho_head'], # num_heads / sqrt(d_model) - Paper 3
    'rho_kv': MODEL_CONFIG['rho_kv'],     # kv_heads / num_layers - legacy
    'rho_definition': {
        'primary': 'rho_head',
        'rho_head_formula': 'num_heads / sqrt(d_model)',
        'rho_kv_formula': 'kv_heads / num_layers',
        'note': 'rho_head is Paper 3 consistent, rho_kv for backward compatibility'
    },
    'rho_crit': MODEL_CONFIG['rho_crit'],
    'd_head': MODEL_CONFIG['d_head'],
    'quantization': QUANTIZATION,
    'injection_method': 'PRE-ATTENTION',
    'si_method': 'GLOBAL + LOCAL',
    'codex_critique': {
        'original': 'Late = 0% not proven because SI is global. Late-noise diluted by 30 unaffected layers.',
        'fix': 'V3 computes SI only from heads in target layer range (region-local SI).',
        'expected': 'If Late truly immune, Local SI also = 0%. If dilution artifact, Local SI != 0%.'
    },
    'v2_reference': V2_RESULTS,
    'layer_ranges': {k: list(v) for k, v in LAYER_RANGES.items()},
    'noise_levels': NOISE_LEVELS,
    'seeds': SEEDS,
    'prompt_set': 'Standard-10 v3',
    'num_prompts': len(STANDARD_PROMPTS),
    'results': convert_to_native(results)
}

with open(filename, 'w') as f:
    json.dump(output, f, indent=2)

print(f"Results saved: {filename}")
print(f"\n📋 E11-v3 Compliance:")
print(f"   Seeds: {SEEDS} ✓")
print(f"   dtype: {DTYPE} (8-bit for 27B) ⚠️")
print(f"   MD5: {ACTUAL_MD5} {'✓' if PROMPTS_VERIFIED else '✗'}")
print(f"   MAX_LENGTH: {MAX_LENGTH} ✓")
print(f"\n=== DUAL-RHO in output ===")
print(f"  rho (primary): {output['rho']:.4f}")
print(f"  rho_head:      {output['rho_head']:.4f}")
print(f"  rho_kv:        {output['rho_kv']:.4f}")

try:
    from google.colab import files
    files.download(filename)
    files.download(fig_path)
except ImportError:
    # Not running in Colab; the files stay on local disk.
    pass

In [None]:
# Cell 11: Auto-Download

import glob
import shutil

def auto_download_results():
    try:
        from google.colab import files
    except ImportError:
        print('Not in Colab - skipping auto-download')
        return
    
    print('=' * 60)
    print('AUTO-DOWNLOADING RESULTS...')
    print('=' * 60)
    
    json_files = glob.glob('results/*.json') + glob.glob('figures/*.json')
    png_files = glob.glob('results/*.png') + glob.glob('figures/*.png')
    all_files = json_files + png_files
    
    if not all_files:
        print('WARNING: No result files found!')
        return
    
    print(f'Found {len(all_files)} files')
    
    import os
    zip_name = f'E11_indra_gemma27b_v3_results_{TIMESTAMP}'
    
    os.makedirs('download_package', exist_ok=True)
    for f in all_files:
        shutil.copy(f, 'download_package/')
    
    shutil.make_archive(zip_name, 'zip', 'download_package')
    print(f'Downloading: {zip_name}.zip')
    files.download(f'{zip_name}.zip')
    print('DOWNLOAD COMPLETE!')

auto_download_results()

---

## Summary: E11-Indra-Gemma27B-V3

### Purpose

Address Codex's region-local SI critique:
> "Late = 0% not proven because SI is global. Late-noise diluted by 30 unaffected layers."

### Key Changes from V2

| Aspect | V2 | V3 |
|--------|----|----|  
| SI Measurement | Global (all layers) | Local (target layers only) |
| Late-noise SI | Computed over 46 layers | Computed over 16 Late layers |
| Dilution | 30 unaffected layers included | Only affected layers included |
| **ρ Definition** | `kv/num_layers` only | **DUAL: rho_head + rho_kv** |
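
The difference between global and region-local SI can be illustrated with a minimal sketch on synthetic data. It assumes that SI is one minus the mean pairwise Pearson correlation between heads of the same layer, averaged over the selected layers; the notebook's actual aggregation may differ. The function names, the array shape, and the Late range `(30, 46)` are hypothetical and chosen only to mirror the 46-layer / 16-Late-layer split described above.

```python
import numpy as np

def layer_mean_corr(layer_heads):
    """Mean off-diagonal Pearson correlation between the heads of one layer."""
    corr = np.corrcoef(layer_heads)                  # (num_heads, num_heads)
    upper = corr[np.triu_indices_from(corr, k=1)]    # each head pair once
    return float(upper.mean())

def specialization_index(head_vectors, layer_range=None):
    """SI = 1 - mean head correlation over the chosen layers.

    head_vectors: array of shape (num_layers, num_heads, feature_dim)
    layer_range:  half-open (start, end); None means all layers (global SI)
    """
    hv = np.asarray(head_vectors, dtype=np.float64)
    start, end = layer_range if layer_range is not None else (0, hv.shape[0])
    per_layer = [layer_mean_corr(hv[i]) for i in range(start, end)]
    return 1.0 - float(np.mean(per_layer))

# Synthetic demo (assumed sizes): heads are near-identical everywhere,
# then the 16 "Late" layers are decorrelated to mimic a Late-only effect.
rng = np.random.default_rng(0)
num_layers, num_heads, dim = 46, 4, 64
base = rng.normal(size=(num_layers, 1, dim))
heads = base + 0.01 * rng.normal(size=(num_layers, num_heads, dim))
heads[30:46] = rng.normal(size=(16, num_heads, dim))

si_global = specialization_index(heads)             # averaged over 46 layers
si_late = specialization_index(heads, (30, 46))     # averaged over 16 layers
print(f"global SI = {si_global:.3f}, Late-local SI = {si_late:.3f}")
```

In this toy setup the Late-local SI is much larger than the global SI, because the global average includes the 30 unaffected layers. This is the dilution that the region-local measurement is meant to rule out.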

### Dual-ρ Patch (Codex Fix)

| Name | Formula | Value (Gemma-27B) | Use |
|------|---------|-------------------|-----|
| `rho_head` | `num_heads / √d_model` | ~0.534 | **PRIMARY** (Paper 3) |
| `rho_kv` | `kv_heads / num_layers` | ~0.348 | Legacy compatibility |

Both values > ρ_crit (0.267) → Poison classification unchanged.
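
The two formulas in the table can be written out as a short sketch. The helper names and the toy configuration below are hypothetical and are not Gemma-27B's actual values; only the formulas and the threshold 0.267 come from this notebook.

```python
import math

RHO_CRIT = 0.267  # critical threshold quoted in this summary

def rho_head(num_heads: int, d_model: int) -> float:
    """Primary definition: num_heads / sqrt(d_model)."""
    return num_heads / math.sqrt(d_model)

def rho_kv(kv_heads: int, num_layers: int) -> float:
    """Legacy definition: kv_heads / num_layers."""
    return kv_heads / num_layers

# Toy configuration (assumed for illustration only)
toy = {"num_heads": 16, "d_model": 1024, "kv_heads": 8, "num_layers": 24}

r_head = rho_head(toy["num_heads"], toy["d_model"])
r_kv = rho_kv(toy["kv_heads"], toy["num_layers"])
both_above = r_head > RHO_CRIT and r_kv > RHO_CRIT
print(f"rho_head={r_head:.4f}  rho_kv={r_kv:.4f}  both above rho_crit: {both_above}")
```

Reporting both values side by side makes it easy to check that a classification does not depend on which definition is used.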

### Possible Outcomes

| Result | Meaning | Implication |
|--------|---------|-------------|
| Late Local = 0% | Late truly immune | V2 result is genuine (not an artifact), paper STRONGER |
| Late Local != 0% | Dilution artifact | V2 Late = 0% was a methodological artifact |

### Codex Improvement Suggestions Addressed

1. ✅ **Region-local SI** (this notebook)
2. ✅ **3+ seeds** (V2/V3: 3 seeds)
3. ✅ **Dual-ρ definition** (PATCHED: rho_head + rho_kv)

---

*Paper 4: Behavioral Sink Dynamics*  
*E11-Indra-Gemma27B-V3: Region-Local SI (Dual-ρ Patched)*