# E11-T-Indra-B: Base Control (Artifact Check)

**Paper 4: Behavioral Sink Dynamics**

## Critical Question

> **Does noise artificially inflate Specialization Index even in HEALTHY (non-collapsed) models?**

## Why This Matters

E11-T-Indra showed a 28.6% SI "recovery" in the collapsed Instruct model. But:
- We measured SI **while noise was active**
- Noise mechanically forces different head responses
- This could be an artifact, not real specialization recovery
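
The concern above can be illustrated with a toy example. This is a minimal sketch, not the experiment itself: it assumes synthetic per-layer entropy profiles and adds noise directly to them (the notebook instead perturbs attention outputs), and `specialization_index` is a hypothetical helper that mirrors the SI definition used later (1 minus the mean pairwise head correlation).

```python
import numpy as np

def specialization_index(head_entropies):
    """SI = 1 - mean pairwise Pearson correlation between heads' per-layer entropy profiles."""
    num_heads = head_entropies.shape[1]
    corr = np.corrcoef(head_entropies.T)              # shape: (heads, heads)
    upper = corr[np.triu_indices(num_heads, k=1)]     # unique head pairs only
    return 1.0 - float(np.nanmean(upper))

rng = np.random.default_rng(0)
num_layers, num_heads = 32, 32

# Hypothetical "collapsed" profiles: every head follows one shared per-layer trend
shared = np.linspace(0.2, 0.8, num_layers)[:, None]
collapsed = shared + 0.01 * rng.standard_normal((num_layers, num_heads))

# Independent per-head noise stands in for the injected perturbation (an assumption)
noisy = collapsed + 0.2 * rng.standard_normal((num_layers, num_heads))

si_clean = specialization_index(collapsed)
si_noisy = specialization_index(noisy)
print(f"SI without noise: {si_clean:.3f}")
print(f"SI with noise:    {si_noisy:.3f}")
```

In this toy setting SI rises simply because independent noise decorrelates the heads, with no change in what the heads "do". Whether the same mechanism operates in the real model is exactly what the control below is meant to test.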

## The Control

Run the **exact same protocol** on LLaMA-3.1-8B-**BASE** (healthy, SI=0.7134):

| Expected if REAL | Expected if ARTIFACT |
|------------------|---------------------|
| Base SI stays ~same or decreases | Base SI INCREASES under noise |
| (already specialized, noise = disruption) | (noise = artificial variance) |

## Verdict Logic (Unified Thresholds)

| Base SI Change | Interpretation | E11-T-Indra Status |
|----------------|----------------|---------------------|
| **-5% to 5%** | Noise doesn't inflate healthy SI | **REAL** - Recovery is genuine |
| **5-15%** | Noise has some effect | **PARTIAL** - Some artifact |
| **> 15%** | Noise inflates SI universally | **ARTIFACT** - Recovery may be fake |
| **< -5%** | Noise disrupts specialization | **STRONGLY REAL** - Collapsed model has latent capacity |
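
The verdict table can be sketched as a small function. This is an illustrative helper, not part of the notebook: the name `classify_base_si_change` is hypothetical, and it assumes the -5% disruption cutoff that the analysis cell applies to negative changes, with boundary values falling into the milder category as in that cell's strict comparisons.

```python
def classify_base_si_change(delta_pct, artifact=15.0, partial=5.0, disruption=-5.0):
    """Map a relative Base SI change (in percent) to an E11-T-Indra status label."""
    if delta_pct > artifact:
        return "ARTIFACT"        # noise inflates SI even in a healthy model
    if delta_pct > partial:
        return "PARTIAL"         # some inflation, recovery only partly genuine
    if delta_pct < disruption:
        return "STRONGLY REAL"   # noise disrupts existing specialization
    return "REAL"                # healthy SI is essentially unchanged

# Hypothetical inputs, chosen only to exercise each branch
for pct in (20.0, 8.0, 1.5, -9.0):
    print(f"{pct:+.1f}% -> {classify_base_si_change(pct)}")
```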

## Dual-Check Approach

1. **Primary:** Compare Early@σ=0.02 (direct comparison with E11-T-Indra)
2. **Secondary:** Check if ANY region/σ exceeds artifact threshold
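
The two checks can be sketched as follows. This is a minimal illustration under stated assumptions: `dual_check` and the nested dictionary layout are hypothetical, and the numbers are made up to show the mechanics, not measured results.

```python
def dual_check(delta_pct_by_region, artifact_pct=15.0):
    """Return the primary Early@0.02 change and every (region, sigma, pct) above the artifact threshold."""
    primary = delta_pct_by_region["early"][0.02]
    flagged = [
        (region, sigma, pct)
        for region, by_sigma in delta_pct_by_region.items()
        for sigma, pct in by_sigma.items()
        if pct > artifact_pct
    ]
    return primary, flagged

# Made-up SI changes in percent, keyed by region and then by noise level
example = {
    "early": {0.02: 1.2, 0.2: 4.0},
    "late": {0.02: -0.5, 0.2: 17.3},
}
primary, flagged = dual_check(example)
print(f"Primary (Early @ 0.02): {primary:+.1f}%")
print(f"Flagged cases: {flagged}")
```

Scanning every region/σ pair matters because a clean primary comparison can coexist with inflation at a different noise level or layer range.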

---

In [None]:
# Cell 1: Setup
!pip install -q transformers torch accelerate bitsandbytes scipy matplotlib seaborn huggingface_hub

import torch
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.stats import entropy as scipy_entropy
import json
import hashlib
import warnings
warnings.filterwarnings('ignore')

import os
from pathlib import Path
from datetime import datetime

# E11-v3 STANDARD: 3-Seed Reproducibility
SEEDS = [42, 123, 456]
os.environ['PYTHONHASHSEED'] = '42'
torch.manual_seed(42)
np.random.seed(42)

TIMESTAMP = datetime.now().strftime('%Y%m%d_%H%M%S')
Path('results').mkdir(parents=True, exist_ok=True)
Path('figures').mkdir(parents=True, exist_ok=True)
print(f"Timestamp: {TIMESTAMP}")
print(f"E11-v3 Standard: Seeds {SEEDS}")

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

# HF Login for gated models (LLaMA) - REQUIRED!
from huggingface_hub import login, HfFolder

def get_hf_token():
    token = None
    try:
        from google.colab import userdata
        token = userdata.get('HF_TOKEN')
    except Exception:
        pass
    if not token:
        token = os.environ.get('HF_TOKEN') or os.environ.get('HUGGINGFACE_TOKEN') or os.environ.get('HUGGING_FACE_HUB_TOKEN')
    if not token:
        token = HfFolder.get_token()
    return token

HF_TOKEN = get_hf_token()
if HF_TOKEN:
    try:
        login(token=HF_TOKEN)
        print("HF Login: SUCCESS (required for gated models)")
    except Exception as e:
        print(f"HF Login failed: {e}")
else:
    print("WARNING: No HF_TOKEN found! LLaMA requires authentication.")
    print("Colab: Runtime -> Secrets -> Add HF_TOKEN")
    print("Local: run `huggingface-cli login` or set HF_TOKEN env var")

TOKEN_KWARGS = {'token': HF_TOKEN} if HF_TOKEN else {}



In [None]:
# Cell 2: Configuration - BASE MODEL (not Instruct!)

# ==============================================================================
# E11-v3 STANDARD PARAMETERS
# ==============================================================================
MAX_LENGTH = 128
DTYPE = torch.bfloat16  # E11-v3: bfloat16 (NOT float16!)
USE_CHAT_TEMPLATE = False  # E11-v3: no chat template for Base model
EXPECTED_MD5 = "715065bab181f46bf12ed471951141e2"

# CRITICAL: This is the BASE model (healthy, not collapsed)
MODEL_CONFIG = {
    'name': 'meta-llama/Llama-3.1-8B',  # BASE, not Instruct!
    'display': 'LLaMA-3.1-8B-Base (HEALTHY - Control)',
    'num_layers': 32,
    'num_query_heads': 32,
    'num_kv_heads': 8,
    'd_head': 128,
    'architecture': 'GQA'
}

# Reference Values from E11-T
E11T_REFERENCE = {
    'base_specialization': 0.7134,      # HEALTHY - this is what we expect
    'instruct_specialization': 0.3115,  # Collapsed
    'base_correlation': 0.2866,         # Low correlation = specialized
    'instruct_correlation': 0.6885,     # High correlation = uniform
}

# Layer Ranges (same as E11-T-Indra for fair comparison)
LAYER_RANGES = {
    'early': (0, 11),      # Layers 0-10  (Pre-Engine)
    'middle': (11, 28),    # Layers 11-27 (Engine Room per E06d-0)
    'late': (28, 32),      # Layers 28-31 (Post-Engine)
    'all': (0, 32)         # All layers
}

# Noise Levels (same as E11-T-Indra)
NOISE_LEVELS = [0.0, 0.01, 0.02, 0.05, 0.1, 0.2]

# Standard-10 Prompt Set (canonical per NOTEBOOK_GUIDE.md §9)
STANDARD_PROMPTS = [
    "What is the capital of France and what is its population?",
    "If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly? Explain step by step.",
    "Calculate 47 multiplied by 23 and show your work.",
    "Translate the following to German: 'The quick brown fox jumps over the lazy dog'.",
    "Write a Python function that checks if a number is prime.",
    "Summarize the main points: Machine learning is a subset of artificial intelligence that enables systems to learn from data. It uses algorithms to identify patterns and make decisions with minimal human intervention.",
    "Statement A: 'All birds can fly.' Statement B: 'Penguins are birds that cannot fly.' Are these statements contradictory? Explain.",
    "What are the safety considerations when using a kitchen knife?",
    "Write a haiku about artificial intelligence.",
    "Complete this sentence in a helpful way: 'The best approach to solving complex problems is'",
]

# ==============================================================================
# E11-v3 PROMPT VERIFICATION
# ==============================================================================
def verify_prompts():
    """Verify Standard-10 prompts haven't been modified."""
    prompt_string = '|||'.join(STANDARD_PROMPTS)
    actual_md5 = hashlib.md5(prompt_string.encode()).hexdigest()
    return actual_md5, actual_md5 == EXPECTED_MD5

actual_md5, prompts_ok = verify_prompts()
print(f"E11-v3 Prompt Verification:")
print(f"  Expected MD5: {EXPECTED_MD5}")
print(f"  Actual MD5:   {actual_md5}")
print(f"  Status:       {'✓ VERIFIED' if prompts_ok else '✗ MISMATCH - STOP!'}")

if not prompts_ok:
    raise ValueError(f"Prompt MD5 mismatch! Expected {EXPECTED_MD5}, got {actual_md5}")

print(f"\nE11-T-Indra-B: BASE CONTROL (Artifact Check)")
print(f"\nTarget: {MODEL_CONFIG['display']}")
print(f"Expected SI: {E11T_REFERENCE['base_specialization']:.4f} (HEALTHY)")
print(f"\nThis is a CONTROL experiment:")
print(f"  If noise increases SI here too → E11-T-Indra may be artifact")
print(f"  If noise doesn't change SI → E11-T-Indra recovery is REAL")



In [None]:
# Cell 3: Specialization Metrics (from E11-T)

def extract_head_activations_with_noise(model, tokenizer, prompts, noise_injector=None, max_length=128, use_chat_template=False):
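    """
    Extract per-head attention patterns for each prompt.

    Noise, if any, comes from forward hooks already attached to the model
    (see AttentionNoiseInjector); the noise_injector argument is currently unused.
    The attention_mask is returned alongside each pattern so that PAD positions
    can be excluded from the entropy computation.
    """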
    all_attention_patterns = []
    all_attention_masks = []

    for prompt in prompts:
        formatted = prompt
        if use_chat_template and hasattr(tokenizer, 'apply_chat_template'):
            messages = [{"role": "user", "content": prompt}]
            try:
                formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            except Exception:
                formatted = prompt

        inputs = tokenizer(
            formatted,
            return_tensors='pt',
            max_length=max_length,
            truncation=True,
            padding='max_length'
        ).to(model.device)

        attention_mask = inputs.get('attention_mask')

        with torch.no_grad():
            # Only attentions are used downstream, so hidden states are not requested
            outputs = model(**inputs, output_attentions=True)

        attn_stack = torch.stack([a.squeeze(0) for a in outputs.attentions], dim=0)
        all_attention_patterns.append(attn_stack.cpu())
        all_attention_masks.append(attention_mask.squeeze(0).cpu() if attention_mask is not None else None)

    return {
        'attention_patterns': all_attention_patterns,
        'attention_masks': all_attention_masks,
        'num_layers': len(outputs.attentions),
        'num_heads': outputs.attentions[0].shape[1]
    }


def compute_head_entropy_profiles(attention_patterns, attention_masks=None):
    """Compute normalized entropy for each head across prompts."""
    num_prompts = len(attention_patterns)
    num_layers = attention_patterns[0].shape[0]
    num_heads = attention_patterns[0].shape[1]

    all_entropies = np.zeros((num_prompts, num_layers, num_heads))

    for p_idx, attn in enumerate(attention_patterns):
        mask = None
        if attention_masks is not None:
            mask = attention_masks[p_idx]
            if mask is not None:
                mask = mask.bool()

        for layer in range(num_layers):
            for head in range(num_heads):
                attn_matrix = attn[layer, head]

                if mask is not None:
                    valid_idx = mask.nonzero(as_tuple=False).squeeze(-1)
                    if valid_idx.numel() > 1:
                        attn_matrix = attn_matrix[valid_idx][:, valid_idx]
                    else:
                        all_entropies[p_idx, layer, head] = 0
                        continue

                attn_weights = attn_matrix.mean(dim=0).float().cpu().numpy()
                denom = attn_weights.sum()
                if denom <= 0:
                    all_entropies[p_idx, layer, head] = 0
                    continue

                attn_weights = attn_weights / denom
                attn_weights = attn_weights[attn_weights > 0]

                if len(attn_weights) > 1:
                    h = scipy_entropy(attn_weights, base=2)
                    h_max = np.log2(len(attn_weights))
                    h_norm = h / h_max if h_max > 0 else 0
                else:
                    h_norm = 0

                all_entropies[p_idx, layer, head] = h_norm

    return all_entropies.mean(axis=0)


def compute_specialization_metrics(head_entropies):
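    """
    Compute specialization metrics from a (num_layers, num_heads) entropy matrix.

    Each head's profile is its entropy across layers. The Specialization Index is
    1 minus the mean pairwise Pearson correlation between head profiles, so
    highly correlated (uniform) heads give a low SI.
    """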
    num_layers, num_heads = head_entropies.shape

    layer_variances = np.var(head_entropies, axis=1)
    mean_variance = float(np.mean(layer_variances))

    head_profiles = head_entropies.T
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))

    specialization_index = 1.0 - mean_head_correlation

    head_contributions = np.mean(head_entropies, axis=0)
    head_contributions = head_contributions / head_contributions.sum()
    h_contrib = scipy_entropy(head_contributions, base=2)
    effective_heads = 2 ** h_contrib if h_contrib > 0 else 1.0
    effective_ratio = effective_heads / num_heads

    return {
        'mean_head_variance': mean_variance,
        'mean_head_correlation': mean_head_correlation,
        'specialization_index': specialization_index,
        'effective_heads': float(effective_heads),
        'effective_ratio': float(effective_ratio),
        'layer_variances': layer_variances.tolist(),
        'num_layers': num_layers,
        'num_heads': num_heads
    }

print("Specialization metrics functions loaded.")





In [None]:
# Cell 4: Layer-Targeted Noise Injector (identical to E11-T-Indra)

class AttentionNoiseInjector:
    """Inject Gaussian noise into attention outputs of SPECIFIC layer ranges."""
    
    def __init__(self, model, target_range, noise_std=0.0):
        self.model = model
        self.target_start, self.target_end = target_range
        self.noise_std = noise_std
        self.hooks = []
    
    def _make_hook(self, layer_idx):
        """Create a forward hook for a specific layer."""
        def hook(module, input, output):
            if self.noise_std > 0 and self.target_start <= layer_idx < self.target_end:
                if isinstance(output, tuple):
                    attn_output = output[0]
                    noise = torch.randn_like(attn_output) * self.noise_std
                    return (attn_output + noise,) + output[1:]
                else:
                    noise = torch.randn_like(output) * self.noise_std
                    return output + noise
            return output
        return hook
    
    def attach(self):
        """Attach hooks to attention layers."""
        for idx, layer in enumerate(self.model.model.layers):
            hook = layer.self_attn.register_forward_hook(self._make_hook(idx))
            self.hooks.append(hook)
    
    def detach(self):
        """Remove all hooks."""
        for hook in self.hooks:
            hook.remove()
        self.hooks = []
    
    def set_noise(self, std):
        """Update noise level."""
        self.noise_std = std

print("Attention noise injector ready.")

In [None]:
# Cell 5: Load BASE Model and Verify Healthy State (3-Seed)

print(f"\n{'='*60}")
print(f"PHASE 1: LOAD BASE MODEL AND VERIFY HEALTHY STATE")
print(f"{'='*60}")

print(f"\nLoading: {MODEL_CONFIG['name']}")
print(f"E11-v3 dtype: {DTYPE}")

tokenizer = AutoTokenizer.from_pretrained(MODEL_CONFIG['name'], **TOKEN_KWARGS)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_CONFIG['name'],
    **TOKEN_KWARGS,
    torch_dtype=DTYPE,  # E11-v3: bfloat16
    device_map='auto',
    trust_remote_code=True,
    attn_implementation="eager"  # CRITICAL: SDPA doesn't return attentions!
)

model.eval()

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

print(f"Loaded: {sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")
print(f"Layers: {len(model.model.layers)}")

# Measure baseline with 3-seed averaging (E11-v3 standard)
print(f"\nMeasuring baseline specialization (HEALTHY model, 3-seed average)...")

seed_results_baseline = []
for seed in SEEDS:
    torch.manual_seed(seed)
    np.random.seed(seed)
    
    baseline_activations = extract_head_activations_with_noise(
        model, tokenizer, STANDARD_PROMPTS, max_length=MAX_LENGTH, use_chat_template=USE_CHAT_TEMPLATE
    )
    baseline_entropies = compute_head_entropy_profiles(
        baseline_activations['attention_patterns'],
        baseline_activations['attention_masks']
    )
    baseline_metrics_seed = compute_specialization_metrics(baseline_entropies)
    seed_results_baseline.append(baseline_metrics_seed)
    print(f"  Seed {seed}: SI={baseline_metrics_seed['specialization_index']:.4f}")

# Average across seeds
baseline_metrics = {
    'specialization_index': np.mean([r['specialization_index'] for r in seed_results_baseline]),
    'mean_head_correlation': np.mean([r['mean_head_correlation'] for r in seed_results_baseline]),
    'mean_head_variance': np.mean([r['mean_head_variance'] for r in seed_results_baseline]),
    'si_std': np.std([r['specialization_index'] for r in seed_results_baseline])
}

print(f"\n  Baseline Specialization Index: {baseline_metrics['specialization_index']:.4f} ± {baseline_metrics['si_std']:.4f}")
print(f"  Baseline Head Correlation: {baseline_metrics['mean_head_correlation']:.4f}")
print(f"  Expected from E11-T: SI={E11T_REFERENCE['base_specialization']:.4f}")

# Verify we're in HEALTHY state
si_diff = abs(baseline_metrics['specialization_index'] - E11T_REFERENCE['base_specialization'])
if si_diff < 0.1:
    print(f"\n  VERIFIED: Model is in HEALTHY state (diff={si_diff:.4f})")
else:
    print(f"\n  WARNING: SI differs from E11-T reference by {si_diff:.4f}")

results = {
    'baseline': {
        'specialization_index': float(baseline_metrics['specialization_index']),
        'si_std': float(baseline_metrics['si_std']),
        'mean_head_correlation': float(baseline_metrics['mean_head_correlation']),
        'mean_head_variance': float(baseline_metrics['mean_head_variance']),
        'seed_results': [{'seed': s, 'si': r['specialization_index']} for s, r in zip(SEEDS, seed_results_baseline)]
    },
    'treatments': []
}



In [None]:
# Cell 6: Noise Injection on HEALTHY Model (3-Seed)

print(f"\n{'='*60}")
print(f"PHASE 2: NOISE INJECTION ON HEALTHY BASE MODEL (3-Seed Average)")
print(f"{'='*60}")

# Unified threshold (15% = artifact, aligned with verdict logic)
ARTIFACT_THRESHOLD = 0.15  # 15%
DISRUPTION_THRESHOLD = -0.05  # -5%

for region_name, (start, end) in LAYER_RANGES.items():
    print(f"\n{'='*50}")
    print(f"TESTING: {region_name.upper()} (Layers {start}-{end-1})")
    print(f"{'='*50}")
    
    region_results = {
        'region': region_name,
        'layer_range': [start, end],
        'noise_tests': []
    }
    
    for noise_std in NOISE_LEVELS:
        # E11-v3: 3-seed averaging for each noise level
        seed_si_values = []
        seed_corr_values = []
        seed_var_values = []
        
        for seed in SEEDS:
            torch.manual_seed(seed)
            np.random.seed(seed)
            
            injector = AttentionNoiseInjector(model, (start, end), noise_std=noise_std)
            injector.attach()
            try:
                treated_activations = extract_head_activations_with_noise(
                    model, tokenizer, STANDARD_PROMPTS, max_length=MAX_LENGTH, use_chat_template=USE_CHAT_TEMPLATE
                )
                treated_entropies = compute_head_entropy_profiles(
                    treated_activations['attention_patterns'],
                    treated_activations['attention_masks']
                )
                treated_metrics = compute_specialization_metrics(treated_entropies)
            finally:
                # Always remove hooks so an interrupted run cannot leak noise into later cells
                injector.detach()
            
            seed_si_values.append(treated_metrics['specialization_index'])
            seed_corr_values.append(treated_metrics['mean_head_correlation'])
            seed_var_values.append(treated_metrics['mean_head_variance'])
        
        # Average across seeds
        avg_si = np.mean(seed_si_values)
        avg_corr = np.mean(seed_corr_values)
        avg_var = np.mean(seed_var_values)
        si_std = np.std(seed_si_values)
        
        si_before = baseline_metrics['specialization_index']
        si_after = avg_si
        si_delta = si_after - si_before
        si_delta_pct = (si_delta / si_before) * 100 if si_before != 0 else 0
        
        corr_delta = avg_corr - baseline_metrics['mean_head_correlation']
        
        noise_result = {
            'noise_std': float(noise_std),
            'specialization_index': float(avg_si),
            'si_std': float(si_std),
            'mean_head_correlation': float(avg_corr),
            'mean_head_variance': float(avg_var),
            'si_delta': float(si_delta),
            'si_delta_pct': float(si_delta_pct),
            'corr_delta': float(corr_delta),
            'seed_values': {str(s): float(v) for s, v in zip(SEEDS, seed_si_values)}
        }
        region_results['noise_tests'].append(noise_result)
        
        # Status based on UNIFIED thresholds (15% artifact, -5% disruption)
        if si_delta_pct / 100 > ARTIFACT_THRESHOLD:
            status = "ARTIFACT?"
        elif si_delta_pct / 100 < DISRUPTION_THRESHOLD:
            status = "DISRUPTED"
        else:
            status = "STABLE"
        
        print(f"  σ={noise_std:.2f}: SI={avg_si:.4f}±{si_std:.4f} (Δ={si_delta:+.4f}, {si_delta_pct:+.1f}%) {status}")
    
    # Find max SI change
    max_change = max(region_results['noise_tests'], key=lambda x: abs(x['si_delta_pct']))
    region_results['max_change_noise'] = max_change['noise_std']
    region_results['max_change_pct'] = max_change['si_delta_pct']
    
    results['treatments'].append(region_results)
    
    print(f"\n  MAX CHANGE for {region_name}: σ={max_change['noise_std']:.2f} -> {max_change['si_delta_pct']:+.1f}%")


In [None]:
# Cell 7: Artifact Analysis

print(f"\n{'='*70}")
print(f"PHASE 3: ARTIFACT ANALYSIS")
print(f"{'='*70}")

# Unified thresholds
ARTIFACT_THRESHOLD_PCT = 15.0   # >15% = artifact
PARTIAL_THRESHOLD_PCT = 5.0    # 5-15% = partial artifact
DISRUPTION_THRESHOLD_PCT = -5.0  # <-5% = disruption (good for us)

print(f"\nThresholds (unified):")
print(f"  ARTIFACT:    > {ARTIFACT_THRESHOLD_PCT}%")
print(f"  PARTIAL:     {PARTIAL_THRESHOLD_PCT}% - {ARTIFACT_THRESHOLD_PCT}%")
print(f"  STABLE:      {DISRUPTION_THRESHOLD_PCT}% - {PARTIAL_THRESHOLD_PCT}%")
print(f"  DISRUPTION:  < {DISRUPTION_THRESHOLD_PCT}%")

print(f"\nBaseline (HEALTHY Base Model):")
print(f"  SI: {baseline_metrics['specialization_index']:.4f}")
print(f"  Correlation: {baseline_metrics['mean_head_correlation']:.4f}")

print(f"\n" + "-"*70)
print(f"{'Region':<12} {'Max Δ σ':<10} {'SI After':<12} {'Δ SI':<12} {'Δ %':<12} {'Status':<12}")
print("-"*70)

artifact_evidence = []
any_artifact = False
max_artifact_pct = 0

for treatment in results['treatments']:
    region = treatment['region']
    max_noise = treatment['max_change_noise']
    max_pct = treatment['max_change_pct']
    
    max_test = next(t for t in treatment['noise_tests'] if t['noise_std'] == max_noise)
    si_after = max_test['specialization_index']
    si_delta = max_test['si_delta']
    
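    # Row status: based on this region's largest-magnitude change (unified thresholds)
    if max_pct > ARTIFACT_THRESHOLD_PCT:
        status = "ARTIFACT!"
    elif max_pct > PARTIAL_THRESHOLD_PCT:
        status = "PARTIAL?"
    elif max_pct < DISRUPTION_THRESHOLD_PCT:
        status = "DISRUPTED"
    else:
        status = "STABLE"

    # Secondary check: scan EVERY noise level, not only the largest-magnitude one,
    # so a positive spike cannot hide behind a larger negative change
    for t in treatment['noise_tests']:
        if t['si_delta_pct'] > ARTIFACT_THRESHOLD_PCT:
            artifact_evidence.append((region, t['si_delta_pct'], t['noise_std']))
            any_artifact = True
        max_artifact_pct = max(max_artifact_pct, t['si_delta_pct'])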
    
    print(f"{region:<12} {max_noise:<10.2f} {si_after:<12.4f} {si_delta:<+12.4f} {max_pct:<+12.1f} {status:<12}")

print("-"*70)

# Early @ 0.02 comparison (primary comparison with E11-T-Indra)
early_treatment = next(t for t in results['treatments'] if t['region'] == 'early')
early_at_002 = next(t for t in early_treatment['noise_tests'] if t['noise_std'] == 0.02)
early_pct_change = early_at_002['si_delta_pct']

print(f"\n{'='*70}")
print("VERDICT: IS E11-T-INDRA AN ARTIFACT?")
print(f"{'='*70}")

print(f"\n[Primary Comparison: Early @ σ=0.02]")
print(f"  E11-T-Indra (Collapsed Instruct): +28.6% SI increase")
print(f"  This Control (Healthy Base):      {early_pct_change:+.1f}% SI change")
print(f"  Gap: {28.6 - early_pct_change:.1f}pp")

print(f"\n[Secondary Check: ANY region/σ > {ARTIFACT_THRESHOLD_PCT}%]")
if artifact_evidence:
    print(f"  FOUND: {len(artifact_evidence)} artifact cases")
    for region, pct, sigma in artifact_evidence:
        print(f"    - {region} @ σ={sigma}: {pct:+.1f}%")
else:
    print(f"  None found (all regions < {ARTIFACT_THRESHOLD_PCT}%)")

# Verdict logic using BOTH checks
print(f"\n{'='*70}")

# Primary verdict based on Early@0.02 (direct comparison)
if early_pct_change > ARTIFACT_THRESHOLD_PCT:
    primary_verdict = "ARTIFACT_LIKELY"
elif early_pct_change > PARTIAL_THRESHOLD_PCT:
    primary_verdict = "PARTIAL_ARTIFACT"
elif early_pct_change < DISRUPTION_THRESHOLD_PCT:
    primary_verdict = "REAL_STRONGLY_CONFIRMED"
else:
    primary_verdict = "REAL_CONFIRMED"

# Secondary verdict based on ANY artifact
secondary_concern = any_artifact

# Combined verdict
if primary_verdict == "ARTIFACT_LIKELY" or (secondary_concern and max_artifact_pct > 20):
    verdict = "ARTIFACT_LIKELY"
    print(f"\n  VERDICT: {verdict}")
    print(f"  Noise inflates SI even in healthy models.")
    print(f"  E11-T-Indra 'recovery' may be measurement artifact!")
elif primary_verdict == "PARTIAL_ARTIFACT" or secondary_concern:
    verdict = "PARTIAL_ARTIFACT"
    print(f"\n  VERDICT: {verdict}")
    print(f"  Noise has some effect on healthy SI.")
    print(f"  E11-T-Indra recovery is PARTIALLY real, partially artifact.")
elif primary_verdict == "REAL_STRONGLY_CONFIRMED":
    verdict = "REAL_STRONGLY_CONFIRMED"
    print(f"\n  VERDICT: {verdict}")
    print(f"  Noise DECREASES healthy SI (disrupts existing specialization)!")
    print(f"  E11-T-Indra recovery on collapsed model is STRONGLY CONFIRMED REAL!")
    print(f"  The collapsed model has latent specialization capacity that noise unlocks.")
else:
    verdict = "REAL_CONFIRMED"
    print(f"\n  VERDICT: {verdict}")
    print(f"  Noise does NOT significantly inflate healthy SI.")
    print(f"  E11-T-Indra recovery on collapsed model is REAL!")

results['verdict'] = {
    'code': verdict,
    'baseline_si': baseline_metrics['specialization_index'],
    'early_002_change_pct': early_pct_change,
    'max_artifact_pct': max_artifact_pct,
    'artifact_evidence': artifact_evidence,
    'any_artifact_above_threshold': any_artifact,
    'thresholds': {
        'artifact': ARTIFACT_THRESHOLD_PCT,
        'partial': PARTIAL_THRESHOLD_PCT,
        'disruption': DISRUPTION_THRESHOLD_PCT
    },
    'comparison': {
        'e11t_indra_early_002': 28.6,
        'control_early_002': early_pct_change,
        'gap_pp': 28.6 - early_pct_change
    }
}

print(f"\n{'='*70}")

In [None]:
# Cell 8: Visualization

fig, axes = plt.subplots(2, 2, figsize=(16, 14))

colors = {
    'early': '#3498db',
    'middle': '#2ecc71',
    'late': '#e74c3c',
    'all': '#9b59b6'
}

# Plot 1: SI Change % by Region
ax1 = axes[0, 0]
regions = [t['region'] for t in results['treatments']]
max_changes = [t['max_change_pct'] for t in results['treatments']]
bar_colors = [colors[r] for r in regions]

bars = ax1.bar(regions, max_changes, color=bar_colors, alpha=0.8, edgecolor='black')
ax1.axhline(y=0, color='black', linestyle='-', linewidth=1)
ax1.axhline(y=15, color='red', linestyle=':', alpha=0.7, label='Artifact threshold (15%)')
ax1.axhline(y=-5, color='green', linestyle=':', alpha=0.7, label='Disruption threshold (-5%)')
ax1.set_ylabel('Max SI Change %')
ax1.set_title('Healthy Base: Max SI Change Under Noise\n(Should be near 0 if E11-T-Indra is real)')
ax1.legend()

for bar, change in zip(bars, max_changes):
    ax1.annotate(f'{change:+.1f}%', xy=(bar.get_x() + bar.get_width()/2, change),
                 xytext=(0, 5 if change > 0 else -15), textcoords='offset points',
                 ha='center', fontsize=11, fontweight='bold')

# Plot 2: Comparison with E11-T-Indra
ax2 = axes[0, 1]
comparison_regions = ['Early (σ=0.02)']
e11t_indra_values = [28.6]  # From E11-T-Indra results
control_values = [results['verdict']['comparison']['control_early_002']]

x = np.arange(len(comparison_regions))
width = 0.35

bars1 = ax2.bar(x - width/2, e11t_indra_values, width, label='E11-T-Indra (Collapsed)', color='#e74c3c', alpha=0.8)
bars2 = ax2.bar(x + width/2, control_values, width, label='Control (Healthy)', color='#3498db', alpha=0.8)

ax2.set_ylabel('SI Change %')
ax2.set_title('Critical Comparison: Collapsed vs Healthy\n(Large gap = E11-T-Indra is REAL)')
ax2.set_xticks(x)
ax2.set_xticklabels(comparison_regions)
ax2.legend()
ax2.axhline(y=0, color='black', linestyle='-', linewidth=1)

for bar in bars1:
    ax2.annotate(f'{bar.get_height():.1f}%', xy=(bar.get_x() + bar.get_width()/2, bar.get_height()),
                 xytext=(0, 5), textcoords='offset points', ha='center', fontsize=11, fontweight='bold')
for bar in bars2:
    ax2.annotate(f'{bar.get_height():.1f}%', xy=(bar.get_x() + bar.get_width()/2, bar.get_height()),
                 xytext=(0, 5), textcoords='offset points', ha='center', fontsize=11, fontweight='bold')

# Plot 3: Dose-Response on Healthy Base
ax3 = axes[1, 0]
for treatment in results['treatments']:
    region = treatment['region']
    noise_levels = [t['noise_std'] for t in treatment['noise_tests']]
    si_values = [t['specialization_index'] for t in treatment['noise_tests']]
    ax3.plot(noise_levels, si_values, 'o-', color=colors[region], 
             label=region.capitalize(), linewidth=2, markersize=8)

ax3.axhline(y=baseline_metrics['specialization_index'], color='green', linestyle='--', 
            label=f'Baseline ({baseline_metrics["specialization_index"]:.3f})')
ax3.set_xlabel('Noise Level (σ)')
ax3.set_ylabel('Specialization Index')
ax3.set_title('Dose-Response: Healthy Base Under Noise\n(Flat = No artifact, Increasing = Artifact)')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Plot 4: SI Change Heatmap
ax4 = axes[1, 1]
region_order = ['early', 'middle', 'late', 'all']
heatmap_data = []
for region in region_order:
    treatment = next(t for t in results['treatments'] if t['region'] == region)
    row = [t['si_delta_pct'] for t in treatment['noise_tests']]
    heatmap_data.append(row)

heatmap_data = np.array(heatmap_data)

im = ax4.imshow(heatmap_data, cmap='RdYlGn_r', aspect='auto', vmin=-20, vmax=20)
ax4.set_xticks(range(len(NOISE_LEVELS)))
ax4.set_xticklabels([f'σ={n:.2f}' for n in NOISE_LEVELS])
ax4.set_yticks(range(len(region_order)))
ax4.set_yticklabels([r.capitalize() for r in region_order])
ax4.set_xlabel('Noise Level')
ax4.set_ylabel('Target Region')
ax4.set_title('SI Change % Heatmap (Healthy Base)\n(Green = Stable/Decrease, Red = Increase = Artifact)')

for i in range(len(region_order)):
    for j in range(len(NOISE_LEVELS)):
        val = heatmap_data[i, j]
        color = 'white' if abs(val) > 10 else 'black'
        ax4.text(j, i, f'{val:+.1f}%', ha='center', va='center', color=color, fontsize=9)

plt.colorbar(im, ax=ax4, label='SI Change %')

plt.tight_layout()
fig_path = f'figures/E11T_indra_B_base_control_{TIMESTAMP}.png'
plt.savefig(fig_path, dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved: {fig_path}")

In [None]:
# Cell 9: Save Results

def convert_to_native(obj):
    if isinstance(obj, dict):
        return {k: convert_to_native(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_to_native(v) for v in obj]
    elif isinstance(obj, tuple):
        return tuple(convert_to_native(v) for v in obj)
    elif isinstance(obj, np.bool_):
        return bool(obj)  # keep booleans as true/false in the JSON output
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    else:
        return obj

filename = f'results/E11T_indra_B_base_control_{TIMESTAMP}.json'

output = {
    'experiment': 'E11-T-Indra-B',
    'timestamp': TIMESTAMP,
    'model': MODEL_CONFIG['name'],
    'model_type': 'BASE (healthy, not collapsed)',
    'architecture': 'GQA',
    'purpose': 'Artifact check - does noise inflate SI in healthy models?',
    
    # E11-v3 Methodology Block (REQUIRED)
    'methodology': {
        'standard': 'E11-v3',
        'seeds': SEEDS,
        'max_length': MAX_LENGTH,
        'dtype': str(DTYPE),
        'prompt_md5': actual_md5,
        'prompt_md5_verified': prompts_ok,
        'use_chat_template': USE_CHAT_TEMPLATE,
        'attention_masked': True,
        'num_prompts': len(STANDARD_PROMPTS),
        'prompt_set': 'Standard-10'
    },
    
    'e11t_reference': E11T_REFERENCE,
    'layer_ranges': {k: list(v) for k, v in LAYER_RANGES.items()},
    'noise_levels': NOISE_LEVELS,
    'results': convert_to_native(results)
}

with open(filename, 'w') as f:
    json.dump(output, f, indent=2)

print(f"Results saved: {filename}")

try:
    from google.colab import files
    files.download(filename)
    files.download(fig_path)
except Exception:  # not running in Colab, or download unavailable
    pass


---

## Summary

### E11-T-Indra-B: Base Control (Artifact Check)

**Purpose:**
Determine if E11-T-Indra's "recovery" is real or a measurement artifact.

**Method:**
Run the identical noise-injection protocol on the HEALTHY Base model (SI=0.7134).

**Unified Thresholds:**

| Base SI Change | Status | Interpretation |
|----------------|--------|----------------|
| **-5% to 5%** | STABLE | E11-T-Indra is **REAL** |
| **5-15%** | PARTIAL | Some artifact, partially real |
| **> 15%** | ARTIFACT | E11-T-Indra may be fake |
| **< -5%** | DISRUPTED | **STRONGLY REAL** (noise hurts healthy specialization) |

**Dual-Check Approach:**
1. Primary: Early @ σ=0.02 (direct comparison)
2. Secondary: ANY region/σ > 15% (catches hidden artifacts)

**The Key Comparison:**
- E11-T-Indra (Collapsed): +28.6% SI at Early/σ=0.02
- This Control (Healthy): ??? % SI at Early/σ=0.02

If the gap is large (Collapsed >> Healthy), E11-T-Indra recovery is REAL.
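
The gap is a difference in percentage points between two relative SI changes. A minimal sketch of the arithmetic, assuming the same relative-change formula the notebook uses (`(after - before) / before * 100`); the helper names and the control SI value are hypothetical, chosen only to show the calculation.

```python
E11T_INDRA_EARLY_002_PCT = 28.6  # reported for the collapsed Instruct model in this notebook

def relative_change_pct(si_before, si_after):
    """Relative SI change in percent, guarded against a zero baseline."""
    if si_before == 0:
        return 0.0
    return (si_after - si_before) / si_before * 100

def gap_pp(control_pct, reference_pct=E11T_INDRA_EARLY_002_PCT):
    """Gap in percentage points between the collapsed-model change and the control change."""
    return reference_pct - control_pct

# Hypothetical control measurement: baseline SI 0.7134, SI under noise 0.7200
control_pct = relative_change_pct(0.7134, 0.7200)
print(f"Control change: {control_pct:+.2f}%")
print(f"Gap: {gap_pp(control_pct):.2f}pp")
```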

---

*Paper 4: Behavioral Sink Dynamics*
*E11-T-Indra-B: Base Control Experiment*

In [None]:
# ============================================================================
# AUTO-DOWNLOAD RESULTS (Colab only)
# ============================================================================
import glob
import shutil

def auto_download_results():
    try:
        from google.colab import files
    except ImportError:
        print('Not in Colab - skipping auto-download')
        return
    
    print('=' * 60)
    print('AUTO-DOWNLOADING RESULTS...')
    print('=' * 60)
    
    # Find all result files
    json_files = glob.glob('results/*.json') + glob.glob('figures/*.json')
    png_files = glob.glob('results/*.png') + glob.glob('figures/*.png')
    all_files = json_files + png_files
    
    if not all_files:
        print('WARNING: No result files found!')
        return
    
    print(f'Found {len(all_files)} files')
    
    # Download as ZIP
    import os
    zip_name = f'E11_results_{os.path.basename(os.getcwd())}'
    
    # Create combined folder
    os.makedirs('download_package', exist_ok=True)
    for f in all_files:
        shutil.copy(f, 'download_package/')
    
    shutil.make_archive(zip_name, 'zip', 'download_package')
    print(f'Downloading: {zip_name}.zip')
    files.download(f'{zip_name}.zip')
    print('DOWNLOAD COMPLETE!')

auto_download_results()