# E11: Territorial Collapse - Yi-1.5 (2nd MHA Family)

**Paper 4: Behavioral Sink Dynamics**

## Purpose

This notebook tests Territorial Collapse on **Yi-1.5-9B** (01.AI) to strengthen claim A1:

> "Territorial collapse is architecture × alignment dependent: MHA responds to alignment method (DPO/SFT protect, RLHF-only collapses), GQA shows structural collapse, MQA is pre-collapsed."
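
The three attention layouts named in the claim differ only in how many key/value heads the query heads share. As a minimal sketch, the category can be read off two head counts. The function name `classify_attention` and the example head counts are illustrative assumptions, not values taken from this notebook.

```python
def classify_attention(num_heads: int, num_kv_heads: int) -> str:
    """Label an attention layout from its query-head and key/value-head counts."""
    if num_kv_heads <= 0 or num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a positive multiple of num_kv_heads")
    if num_kv_heads == num_heads:
        return "MHA"  # every query head has its own key/value head
    if num_kv_heads == 1:
        return "MQA"  # all query heads share a single key/value head
    return "GQA"      # query heads are grouped over a few key/value heads


# Hypothetical head counts, for illustration only
for heads, kv_heads in [(32, 32), (32, 8), (32, 1)]:
    print(heads, kv_heads, classify_attention(heads, kv_heads))
```

In Hugging Face configs these counts are typically exposed as `num_attention_heads` and `num_key_value_heads`, so the same check can be run against a loaded model's config before assigning it to an architecture group.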

## Model Pair (M06)

| Role | Model | Notes |
|------|-------|-------|
| Base | 01-ai/Yi-1.5-9B | Pure MHA, Chinese vendor |
| Instruct | 01-ai/Yi-1.5-9B-Chat | RLHF aligned |

## E12-P Result: C_DELAYED

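In E12-P, Yi-1.5 **collapsed** under corporate pressure (verdict `C_DELAYED`), unlike Qwen2.
This notebook tests whether that behavioral vulnerability is accompanied by structural collapse, measured as a drop in the Specialization Index (SI) from base to instruct.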
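
Structural collapse is scored in this notebook from three base-to-instruct deltas, and the verdict depends on how many of them point toward collapse. The rule can be sketched as a small standalone function. The name `territorial_verdict` and the sample deltas are hypothetical; the thresholds mirror the analysis cell later in this notebook.

```python
def territorial_verdict(delta_si: float, delta_corr: float, delta_var: float) -> str:
    """Map three instruct-minus-base deltas to a verdict code."""
    signals = [
        delta_si < 0,    # specialization index dropped
        delta_corr > 0,  # heads became more correlated
        delta_var < 0,   # per-layer head variance shrank
    ]
    count = sum(signals)
    if count >= 2:
        return "A_CONFIRMED"
    if count == 1:
        return "B_PARTIAL"
    return "C_REFUTED"


# Hypothetical deltas, for illustration only
print(territorial_verdict(-0.05, 0.05, -0.001))
print(territorial_verdict(0.04, -0.04, 0.001))
```

Because SI is defined as one minus the mean head correlation, the first two signals move together, so the variance delta is the only independent second opinion in this rule.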

## Cross-Family Comparison

| MHA Model | Vendor | E12-P | E11 Expected |
|-----------|--------|-------|-------------|
| Mistral | Mistral AI | C_DELAYED | SI ↑ (done) |
| Pythia | EleutherAI | pending | SI ↑ (expected) |
| **Yi-1.5** | **01.AI** | **C_DELAYED** | **SI ↑ ?** |
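
The "SI" column refers to the Specialization Index, which this notebook computes as one minus the mean pairwise correlation between heads' layer-wise entropy profiles. The sketch below shows that calculation on toy data. The helper name `specialization_index` and the toy arrays are assumptions for illustration and are not measured values.

```python
import numpy as np


def specialization_index(head_entropies: np.ndarray) -> float:
    """SI = 1 - mean pairwise correlation between head profiles.

    head_entropies has shape (num_layers, num_heads).
    """
    corr = np.corrcoef(head_entropies.T)               # (num_heads, num_heads)
    upper = corr[np.triu_indices(corr.shape[0], k=1)]  # each head pair once
    return float(1.0 - np.nanmean(upper))


profile = np.linspace(0.2, 0.8, 6)                     # toy entropy profile over 6 layers
identical = np.stack([profile, profile], axis=1)       # two heads behaving the same way
opposed = np.stack([profile, profile[::-1]], axis=1)   # two heads with mirrored profiles

si_identical = specialization_index(identical)
si_opposed = specialization_index(opposed)
print(round(si_identical, 6), round(si_opposed, 6))
```

Under this definition SI ranges from 0 (all heads perfectly correlated) up to 2 (perfectly anti-correlated), so an "SI ↑" entry means heads became less alike after alignment.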

---

In [None]:
# Cell 1: Setup
!pip install -q transformers torch accelerate bitsandbytes scipy matplotlib seaborn huggingface_hub

import torch
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.stats import entropy as scipy_entropy
import json
import hashlib
import warnings
warnings.filterwarnings('ignore')

import os
from pathlib import Path
from datetime import datetime

# ============ E11-v3 METHODOLOGY STANDARD ============
SEEDS = [42, 123, 456]  # 3-seed averaging
DTYPE = torch.bfloat16  # Standardized precision
EXPECTED_MD5 = "715065bab181f46bf12ed471951141e2"  # Standard-10 v3
USE_CHAT_TEMPLATE = True  # Apply the chat template when the tokenizer has one; otherwise fall back to the raw prompt

def verify_prompts(prompts):
    """Verify Standard-10 prompts via MD5."""
    combined = '|||'.join(prompts)  # Canonical delimiter for MD5
    actual_md5 = hashlib.md5(combined.encode()).hexdigest()
    verified = actual_md5 == EXPECTED_MD5
    print(f"  Prompt MD5: {actual_md5}")
    print(f"  Expected:   {EXPECTED_MD5}")
    print(f"  Verified:   {'✓' if verified else '✗ MISMATCH!'}")
    return verified, actual_md5

os.environ['PYTHONHASHSEED'] = '42'
torch.manual_seed(42)
np.random.seed(42)

TIMESTAMP = datetime.now().strftime('%Y%m%d_%H%M%S')
Path('results').mkdir(parents=True, exist_ok=True)
Path('figures').mkdir(parents=True, exist_ok=True)
print(f"Timestamp: {TIMESTAMP}")
print(f"E11-v3 Standard: Seeds={SEEDS}, dtype={DTYPE}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")


In [None]:
# Cell 2: Configuration

PAIR = 'yi15'
PAIR_ID = 'M06'

TWIN_PAIRS = {
    'yi15': {
        'base': '01-ai/Yi-1.5-9B',
        'instruct': '01-ai/Yi-1.5-9B-Chat',
        'params': '9B',
        'heads': 32,
        'd_head': 128,
        'rho': 0.25,
        'arch': 'MHA',
        'alignment': 'RLHF',
        'vendor': '01.AI',
        'e12p_result': 'C_DELAYED',
        'note': '2nd MHA family + Chinese vendor that COLLAPSED'
    }
}

MAX_LENGTH = 128  # E11-v3 Standard

# ============ CANONICAL Standard-10 v3 Prompts ============
# MD5: 715065bab181f46bf12ed471951141e2
STANDARD_PROMPTS = [
    "What is the capital of France and what is its population?",
    "If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly? Explain step by step.",
    "Calculate 47 multiplied by 23 and show your work.",
    "Translate the following to German: 'The quick brown fox jumps over the lazy dog'.",
    "Write a Python function that checks if a number is prime.",
    "Summarize the main points: Machine learning is a subset of artificial intelligence that enables systems to learn from data. It uses algorithms to identify patterns and make decisions with minimal human intervention.",
    "Statement A: 'All birds can fly.' Statement B: 'Penguins are birds that cannot fly.' Are these statements contradictory? Explain.",
    "What are the safety considerations when using a kitchen knife?",
    "Write a haiku about artificial intelligence.",
    "Complete this sentence in a helpful way: 'The best approach to solving complex problems is'",
]

# Verify prompts
print("Verifying Standard-10 prompts...")
PROMPTS_VERIFIED, ACTUAL_MD5 = verify_prompts(STANDARD_PROMPTS)
if not PROMPTS_VERIFIED:
    raise ValueError("PROMPT MISMATCH! Check Standard-10 v3 canonical prompts.")

print(f"\nTesting: {PAIR} ({PAIR_ID})")
print(f"Arch: {TWIN_PAIRS[PAIR]['arch']}")
print(f"E12-P Result: {TWIN_PAIRS[PAIR]['e12p_result']}")
print(f"\nE11-v3 Config: MAX_LENGTH={MAX_LENGTH}, dtype={DTYPE}, seeds={SEEDS}")

In [None]:
# Cell 3: Metrics Functions (E11-v3 masked)

def extract_head_activations(model, tokenizer, prompts, max_length=128, use_chat_template=False):
    """Run each prompt through the model and collect per-layer attention maps and padding masks."""
    all_attention_patterns = []
    all_attention_masks = []
    for prompt in prompts:
        formatted = prompt
        if use_chat_template and hasattr(tokenizer, 'apply_chat_template'):
            messages = [{"role": "user", "content": prompt}]
            try:
                formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            except Exception:
                formatted = prompt

        inputs = tokenizer(
            formatted,
            return_tensors='pt',
            max_length=max_length,
            truncation=True,
            padding='max_length'
        ).to(model.device)

        attention_mask = inputs.get('attention_mask')

        with torch.no_grad():
            outputs = model(**inputs, output_attentions=True)
        attn_stack = torch.stack([a.squeeze(0) for a in outputs.attentions], dim=0)
        all_attention_patterns.append(attn_stack.cpu())
        all_attention_masks.append(attention_mask.squeeze(0).cpu() if attention_mask is not None else None)
    return {
        'attention_patterns': all_attention_patterns,
        'attention_masks': all_attention_masks,
        'num_layers': len(outputs.attentions),
        'num_heads': outputs.attentions[0].shape[1]
    }


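def compute_head_entropy_profiles(attention_patterns, attention_masks=None):
    """Return a (num_layers, num_heads) array of normalized attention entropies, averaged over prompts.

    Padding positions are dropped via the attention mask before the entropy is computed.
    """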
    num_prompts = len(attention_patterns)
    num_layers = attention_patterns[0].shape[0]
    num_heads = attention_patterns[0].shape[1]
    all_entropies = np.zeros((num_prompts, num_layers, num_heads))
    for p_idx, attn in enumerate(attention_patterns):
        mask = None
        if attention_masks is not None:
            mask = attention_masks[p_idx]
            if mask is not None:
                mask = mask.bool()
        for layer in range(num_layers):
            for head in range(num_heads):
                attn_matrix = attn[layer, head]
                if mask is not None:
                    valid_idx = mask.nonzero(as_tuple=False).squeeze(-1)
                    if valid_idx.numel() > 1:
                        attn_matrix = attn_matrix[valid_idx][:, valid_idx]
                    else:
                        all_entropies[p_idx, layer, head] = 0
                        continue
                attn_weights = attn_matrix.mean(dim=0).float().cpu().numpy()
                denom = attn_weights.sum()
                if denom <= 0:
                    all_entropies[p_idx, layer, head] = 0
                    continue
                attn_weights = attn_weights / denom
                attn_weights = attn_weights[attn_weights > 0]
                if len(attn_weights) > 1:
                    h = scipy_entropy(attn_weights, base=2)
                    h_max = np.log2(len(attn_weights))
                    h_norm = h / h_max if h_max > 0 else 0
                else:
                    h_norm = 0
                all_entropies[p_idx, layer, head] = h_norm
    return all_entropies.mean(axis=0)


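def compute_specialization_metrics(head_entropies):
    """Compute head-diversity metrics from a (num_layers, num_heads) entropy array.

    specialization_index is 1 minus the mean pairwise correlation between heads'
    layer-wise entropy profiles.
    """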
    num_layers, num_heads = head_entropies.shape
    layer_variances = np.var(head_entropies, axis=1)
    mean_variance = float(np.mean(layer_variances))
    head_profiles = head_entropies.T
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))
    specialization_index = 1.0 - mean_head_correlation
    head_contributions = np.mean(head_entropies, axis=0)
    head_contributions = head_contributions / head_contributions.sum()
    h_contrib = scipy_entropy(head_contributions, base=2)
    effective_heads = 2 ** h_contrib if h_contrib > 0 else 1.0
    effective_ratio = effective_heads / num_heads
    return {
        'mean_head_variance': mean_variance,
        'mean_head_correlation': mean_head_correlation,
        'head_correlation_matrix': head_corr_matrix.tolist(),
        'specialization_index': specialization_index,
        'effective_heads': float(effective_heads),
        'effective_ratio': float(effective_ratio),
        'layer_variances': layer_variances.tolist(),
        'num_layers': num_layers,
        'num_heads': num_heads
    }

print("Metrics functions loaded.")




In [None]:
# Cell 4: Load BASE Model - 3-Seed Averaging

pair_config = TWIN_PAIRS[PAIR]
results = {'pair': PAIR, 'pair_id': PAIR_ID, 'base': {}, 'instruct': {}, 'config': pair_config}

print(f"\n{'='*60}")
print(f"E11: {PAIR.upper()} ({PAIR_ID}) - {pair_config['arch']} - E11-v3")
print(f"{'='*60}")

print(f"\n[1/4] Loading BASE: {pair_config['base']}")
tokenizer_base = AutoTokenizer.from_pretrained(pair_config['base'], trust_remote_code=True)
model_base = AutoModelForCausalLM.from_pretrained(
    pair_config['base'],
    torch_dtype=DTYPE, device_map='auto',
    trust_remote_code=True, attn_implementation="eager"
)
model_base.eval()
if tokenizer_base.pad_token is None:
    tokenizer_base.pad_token = tokenizer_base.eos_token

print(f"[2/4] Extracting BASE activations (3-seed averaging)...")

# 3-seed averaging for E11-v3
base_seed_results = []
for seed in SEEDS:
    print(f"  Seed {seed}...")
    torch.manual_seed(seed)
    np.random.seed(seed)
    
    base_activations = extract_head_activations(model_base, tokenizer_base, STANDARD_PROMPTS, MAX_LENGTH, use_chat_template=USE_CHAT_TEMPLATE)
    base_entropies = compute_head_entropy_profiles(base_activations['attention_patterns'], base_activations['attention_masks'])
    base_metrics = compute_specialization_metrics(base_entropies)
    base_seed_results.append({
        'seed': seed,
        'si': base_metrics['specialization_index'],
        'corr': base_metrics['mean_head_correlation'],
        'var': base_metrics['mean_head_variance']
    })
    print(f"    SI={base_metrics['specialization_index']:.4f}")

# Average across seeds
avg_base_si = np.mean([r['si'] for r in base_seed_results])
std_base_si = np.std([r['si'] for r in base_seed_results])

print(f"\n  Layers: {base_activations['num_layers']}, Heads: {base_activations['num_heads']}")
print(f"  BASE SI: {avg_base_si:.4f} ± {std_base_si:.6f}")

# Use last run's full metrics but update SI with average
results['base']['specialization'] = base_metrics
results['base']['specialization']['specialization_index'] = avg_base_si
results['base']['specialization']['si_std'] = std_base_si
results['base']['seed_results'] = base_seed_results
results['base']['entropies'] = base_entropies.tolist()

del model_base
torch.cuda.empty_cache()
print("  [Memory cleared]")

In [None]:
# Cell 5: Load INSTRUCT Model - 3-Seed Averaging

print(f"\n[3/4] Loading INSTRUCT: {pair_config['instruct']}")
tokenizer_inst = AutoTokenizer.from_pretrained(pair_config['instruct'], trust_remote_code=True)
model_inst = AutoModelForCausalLM.from_pretrained(
    pair_config['instruct'],
    torch_dtype=DTYPE, device_map='auto',
    trust_remote_code=True, attn_implementation="eager"
)
model_inst.eval()
if tokenizer_inst.pad_token is None:
    tokenizer_inst.pad_token = tokenizer_inst.eos_token

print(f"[4/4] Extracting INSTRUCT activations (3-seed averaging)...")

# 3-seed averaging for E11-v3
inst_seed_results = []
for seed in SEEDS:
    print(f"  Seed {seed}...")
    torch.manual_seed(seed)
    np.random.seed(seed)
    
    inst_activations = extract_head_activations(model_inst, tokenizer_inst, STANDARD_PROMPTS, MAX_LENGTH, use_chat_template=USE_CHAT_TEMPLATE)
    inst_entropies = compute_head_entropy_profiles(inst_activations['attention_patterns'], inst_activations['attention_masks'])
    inst_metrics = compute_specialization_metrics(inst_entropies)
    inst_seed_results.append({
        'seed': seed,
        'si': inst_metrics['specialization_index'],
        'corr': inst_metrics['mean_head_correlation'],
        'var': inst_metrics['mean_head_variance']
    })
    print(f"    SI={inst_metrics['specialization_index']:.4f}")

# Average across seeds
avg_inst_si = np.mean([r['si'] for r in inst_seed_results])
std_inst_si = np.std([r['si'] for r in inst_seed_results])

print(f"\n  INSTRUCT SI: {avg_inst_si:.4f} ± {std_inst_si:.6f}")

# Use last run's full metrics but update SI with average
results['instruct']['specialization'] = inst_metrics
results['instruct']['specialization']['specialization_index'] = avg_inst_si
results['instruct']['specialization']['si_std'] = std_inst_si
results['instruct']['seed_results'] = inst_seed_results
results['instruct']['entropies'] = inst_entropies.tolist()

del model_inst
torch.cuda.empty_cache()
print("  [Memory cleared]")

In [None]:
# Cell 6: Analysis

base_spec = results['base']['specialization']
inst_spec = results['instruct']['specialization']

base_si = base_spec['specialization_index']
inst_si = inst_spec['specialization_index']
delta_si = inst_si - base_si

base_corr = base_spec['mean_head_correlation']
inst_corr = inst_spec['mean_head_correlation']
delta_corr = inst_corr - base_corr

base_var = base_spec['mean_head_variance']
inst_var = inst_spec['mean_head_variance']
delta_var = inst_var - base_var

print(f"\n{'='*70}")
print(f"E11 RESULTS: {PAIR.upper()} ({PAIR_ID}) - {pair_config['arch']}")
print(f"{'='*70}")
print(f"\n{'Metric':<30} {'BASE':>12} {'INSTRUCT':>12} {'Delta':>12}")
print("-" * 70)
print(f"{'Specialization Index':<30} {base_si:>12.4f} {inst_si:>12.4f} {delta_si:>+12.4f}")
print(f"{'Mean Head Correlation':<30} {base_corr:>12.4f} {inst_corr:>12.4f} {delta_corr:>+12.4f}")
print(f"{'Mean Head Variance':<30} {base_var:>12.6f} {inst_var:>12.6f} {delta_var:>+12.6f}")

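# Verdict: count collapse signals (SI down, correlation up, variance down).
# Note: SI = 1 - mean correlation, so the first two signals move together.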
collapse_1 = delta_si < 0
collapse_2 = delta_corr > 0
collapse_3 = delta_var < 0
collapse_count = sum([collapse_1, collapse_2, collapse_3])

if collapse_count >= 2:
    verdict = "A_CONFIRMED"
elif collapse_count == 1:
    verdict = "B_PARTIAL"
else:
    verdict = "C_REFUTED"

print(f"\n{'='*70}")
print(f"VERDICT: {verdict}")
print(f"{'='*70}")

if verdict == "C_REFUTED":
    print("MHA architecture PRESERVES specialization under RLHF!")
    print(f"\nCross-family check:")
    print(f"  Mistral (MHA): SI Δ = +0.0420 (INCREASES)")
    print(f"  Yi-1.5  (MHA): SI Δ = {delta_si:+.4f}")
    if delta_si > 0:
        print(f"\n>>> CONSISTENT! A1 claim STRENGTHENED.")
else:
    print("Unexpected: MHA showing collapse pattern!")

results['verdict'] = {
    'code': verdict,
    'delta_si': delta_si,
    'delta_corr': delta_corr,
    'delta_var': delta_var
}


In [None]:
# Cell 7: Visualization

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: SI Comparison
ax1 = axes[0]
models = ['Base', 'Instruct']
si_vals = [base_si, inst_si]
colors = ['#2ecc71', '#e74c3c']
bars = ax1.bar(models, si_vals, color=colors, alpha=0.8, edgecolor='black')
ax1.set_ylabel('Specialization Index')
ax1.set_title(f'{PAIR.upper()} ({pair_config["arch"]}): SI\nΔ = {delta_si:+.4f}')
ax1.set_ylim(0, max(1.0, max(si_vals) * 1.05))  # SI = 1 - mean correlation can exceed 1

# Plot 2: Layer-wise Variance
ax2 = axes[1]
layers = range(len(base_spec['layer_variances']))
ax2.plot(layers, base_spec['layer_variances'], 'o-', color='#2ecc71', label='Base')
ax2.plot(layers, inst_spec['layer_variances'], 's-', color='#e74c3c', label='Instruct')
ax2.set_xlabel('Layer')
ax2.set_ylabel('Head Variance')
ax2.set_title(f'{PAIR.upper()}: Layer-wise Variance')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Plot 3: Correlation Heatmap Diff
ax3 = axes[2]
base_corr_mat = np.array(base_spec['head_correlation_matrix'])
inst_corr_mat = np.array(inst_spec['head_correlation_matrix'])
diff_mat = inst_corr_mat - base_corr_mat
sns.heatmap(diff_mat, cmap='RdBu_r', center=0, ax=ax3, cbar_kws={'label': 'Δ Correlation'})
ax3.set_title(f'{PAIR.upper()}: Correlation Change\n(Instruct - Base)')

plt.tight_layout()
fig_path = f'figures/E11_{PAIR}_territorial_{TIMESTAMP}.png'
plt.savefig(fig_path, dpi=150, bbox_inches='tight')
plt.show()
print(f"Saved: {fig_path}")

In [None]:
# Cell 8: Save Results

def convert_to_native(obj):
    if isinstance(obj, dict):
        return {k: convert_to_native(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_to_native(v) for v in obj]
    elif isinstance(obj, np.bool_):
        return bool(obj)  # Keep booleans as true/false in the JSON output
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    return obj

output = {
    'experiment': 'E11_Territorial_Collapse',
    'timestamp': TIMESTAMP,
    'pair': PAIR,
    'pair_id': PAIR_ID,
    'config': pair_config,
    'purpose': '2nd MHA family for A1 claim robustness',
    'e12p_result': pair_config['e12p_result'],
    # E11-v3 Methodology Block
    'methodology': {
        'standard': 'E11-v3',
        'seeds': SEEDS,
        'max_length': MAX_LENGTH,
        'dtype': str(DTYPE),
        'prompt_md5': ACTUAL_MD5,
        'prompt_md5_verified': PROMPTS_VERIFIED,
        'use_chat_template': USE_CHAT_TEMPLATE,
        'attention_masked': True,
        'num_prompts': len(STANDARD_PROMPTS),
        'prompt_set': 'Standard-10 v3',
        'quantization': 'NONE (unquantized bfloat16)',
    },
    'results': convert_to_native(results)
}

filename = f'results/E11_{PAIR}_territorial_{TIMESTAMP}.json'
with open(filename, 'w') as f:
    json.dump(output, f, indent=2)
print(f"Saved: {filename}")

print(f"\n📋 E11-v3 Compliance:")
print(f"   Seeds: {SEEDS} ✓")
print(f"   dtype: {DTYPE} ✓")
print(f"   MD5: {ACTUAL_MD5} {'✓' if PROMPTS_VERIFIED else '✗'}")
print(f"   MAX_LENGTH: {MAX_LENGTH} ✓")

try:
    from google.colab import files
    files.download(filename)
    files.download(fig_path)
except ImportError:
    pass  # Not running in Colab; files remain in results/ and figures/


In [None]:
# ============================================================================
# AUTO-DOWNLOAD RESULTS (Colab only)
# ============================================================================
import glob
import shutil

def auto_download_results():
    try:
        from google.colab import files
    except ImportError:
        print('Not in Colab - skipping auto-download')
        return
    
    print('=' * 60)
    print('AUTO-DOWNLOADING RESULTS...')
    print('=' * 60)
    
    # Find all result files
    json_files = glob.glob('results/*.json') + glob.glob('figures/*.json')
    png_files = glob.glob('results/*.png') + glob.glob('figures/*.png')
    all_files = json_files + png_files
    
    if not all_files:
        print('WARNING: No result files found!')
        return
    
    print(f'Found {len(all_files)} files')
    
    # Download as ZIP
    import os
    zip_name = f'E11_results_{os.path.basename(os.getcwd())}'
    
    # Create combined folder
    os.makedirs('download_package', exist_ok=True)
    for f in all_files:
        shutil.copy(f, 'download_package/')
    
    shutil.make_archive(zip_name, 'zip', 'download_package')
    print(f'Downloading: {zip_name}.zip')
    files.download(f'{zip_name}.zip')
    print('DOWNLOAD COMPLETE!')

auto_download_results()