# E11: Territorial Collapse - Falcon (MQA Control)

**Paper 4: Behavioral Sink Dynamics**

## Purpose

This notebook tests Territorial Collapse on **Falcon-7B** (TII) as an **MQA Control**:

> "Does Multi-Query Attention (MQA) behave like MHA or GQA under alignment pressure?"

**IMPORTANT:** Falcon uses **MQA** (Multi-Query Attention), NOT MHA!
- 71 Query Heads, but only **1 shared K/V Head**
- This is a distinct architecture from both MHA and GQA
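
To make the head layout concrete, here is a minimal NumPy sketch of causal MQA, assuming the 71 query heads and `d_head = 64` listed in this notebook. The function name `mqa_attention`, the toy sequence length, and the random inputs are illustrative assumptions; this is not Falcon's actual implementation.

```python
import numpy as np

def mqa_attention(q, k, v):
    """Multi-Query Attention sketch: many query heads, one shared K/V head.

    q:    (num_q_heads, seq_len, d_head)
    k, v: (seq_len, d_head), a single head reused by every query head
    Returns (output, weights) with shapes
    (num_q_heads, seq_len, d_head) and (num_q_heads, seq_len, seq_len).
    """
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)            # k broadcasts across all query heads
    seq_len = scores.shape[-1]
    # Causal mask: position i may only attend to positions <= i.
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    scores = np.where(causal, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
num_q_heads, seq_len, d_head = 71, 8, 64          # seq_len is a toy value
q = rng.standard_normal((num_q_heads, seq_len, d_head))
k = rng.standard_normal((seq_len, d_head))        # one K head for all 71 query heads
v = rng.standard_normal((seq_len, d_head))        # one V head for all 71 query heads
out, weights = mqa_attention(q, k, v)
print(out.shape, weights.shape)
```

Each query head still produces its own attention map, which is why a per-head specialization metric remains meaningful even though K and V are shared.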

## Model Pair (M08)

| Role | Model | Notes |
|------|-------|-------|
| Base | tiiuae/falcon-7b | MQA (71 Q-heads, 1 KV-head) |
| Instruct | tiiuae/falcon-7b-instruct | SFT-only (no RLHF!) |

## Hypothesis

MQA shares K/V across all heads → structurally similar to extreme GQA (1 group).
- If Falcon shows SI↓ (like GQA): MQA behaves like GQA
- If Falcon shows SI↑ (like MHA): Query diversity dominates
- If Falcon shows no change: Shared K/V prevents collapse
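
SI here is the Specialization Index computed in Cell 3: one minus the mean pairwise Pearson correlation between heads' layer-wise entropy profiles. The sketch below reproduces only that formula on toy data; the array shapes and the "collapsed" and "specialized" inputs are made-up illustrations, not measurements.

```python
import numpy as np

def specialization_index(head_entropies):
    """SI = 1 - mean pairwise Pearson correlation between heads.

    head_entropies: array of shape (num_layers, num_heads); each column is
    one head's normalized-entropy profile across layers.
    """
    corr = np.corrcoef(head_entropies.T)                 # (heads, heads)
    upper = corr[np.triu_indices(corr.shape[0], k=1)]    # count each pair once
    return 1.0 - float(np.nanmean(upper))

layers = np.linspace(0.0, 1.0, 32)
# Toy "collapsed" case: every head follows the same profile up to a positive scale.
collapsed = np.stack([layers * s + 0.1 for s in (0.5, 0.6, 0.7, 0.8)], axis=1)
# Toy "specialized" case: heads follow independent random profiles.
rng = np.random.default_rng(0)
specialized = rng.uniform(size=(32, 4))

si_collapsed = specialization_index(collapsed)
si_specialized = specialization_index(specialized)
print(f"collapsed:   SI = {si_collapsed:.4f}")    # perfectly correlated heads give SI near 0
print(f"specialized: SI = {si_specialized:.4f}")  # less correlated heads give a larger SI
```

A drop in SI from base to instruct therefore means the heads' entropy profiles have become more alike.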

---

In [None]:
# Cell 1: Setup
!pip install -q transformers torch accelerate bitsandbytes scipy matplotlib seaborn

import torch
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.stats import entropy as scipy_entropy
from scipy.stats import pearsonr, spearmanr
from scipy.spatial.distance import pdist, squareform
import json
import hashlib
import warnings
warnings.filterwarnings('ignore')

import os
from pathlib import Path
from datetime import datetime

# ============ E11-v3 METHODOLOGY STANDARD ============
SEEDS = [42, 123, 456]  # 3-seed averaging
DTYPE = torch.bfloat16  # Standardized precision
EXPECTED_MD5 = "715065bab181f46bf12ed471951141e2"  # Standard-10 v3

def verify_prompts(prompts):
    """Verify Standard-10 prompts via MD5."""
    combined = '|||'.join(prompts)  # Canonical delimiter for MD5
    actual_md5 = hashlib.md5(combined.encode()).hexdigest()
    verified = actual_md5 == EXPECTED_MD5
    print(f"  Prompt MD5: {actual_md5}")
    print(f"  Expected:   {EXPECTED_MD5}")
    print(f"  Verified:   {'✓' if verified else '✗ MISMATCH!'}")
    return verified, actual_md5

# Set initial seed
os.environ['PYTHONHASHSEED'] = '42'
torch.manual_seed(42)
np.random.seed(42)

TIMESTAMP = datetime.now().strftime('%Y%m%d_%H%M%S')
Path('results').mkdir(parents=True, exist_ok=True)
Path('figures').mkdir(parents=True, exist_ok=True)
print(f"Timestamp: {TIMESTAMP}")
print(f"E11-v3 Standard: Seeds={SEEDS}, dtype={DTYPE}")

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

In [None]:
# Cell 2: Configuration

# M08 Falcon Pair - MQA CONTROL (NOT MHA!)
TWIN_PAIRS = {
    'falcon': {
        'base': 'tiiuae/falcon-7b',
        'instruct': 'tiiuae/falcon-7b-instruct',  # SFT-only, no RLHF!
        'params': '7B',
        'num_attention_heads': 71,  # Query heads
        'num_kv_heads': 1,          # KEY: Only 1 KV head = MQA!
        'd_head': 64,
        'arch': 'MQA',  # Multi-Query Attention, NOT MHA!
        'alignment': 'SFT-only',
        'note': 'MQA Control - tests if shared KV behaves like GQA or MHA'
    }
}

# Select pair to test
PAIR = 'falcon'  # M08 - MQA Control

# Tokenization settings
MAX_LENGTH = 128  # E11-v3 Standard
USE_CHAT_TEMPLATE = False  # No chat template is applied to the Falcon models

# ============ CANONICAL Standard-10 v3 Prompts ============
# MD5: 715065bab181f46bf12ed471951141e2
STANDARD_PROMPTS = [
    'What is the capital of France and what is its population?',
    'If all roses are flowers and some flowers fade quickly, can we conclude that some roses fade quickly? Explain step by step.',
    'Calculate 47 multiplied by 23 and show your work.',
    "Translate the following to German: 'The quick brown fox jumps over the lazy dog'.",
    'Write a Python function that checks if a number is prime.',
    'Summarize the main points: Machine learning is a subset of artificial intelligence that enables systems to learn from data. It uses algorithms to identify patterns and make decisions with minimal human intervention.',
    "Statement A: 'All birds can fly.' Statement B: 'Penguins are birds that cannot fly.' Are these statements contradictory? Explain.",
    'What are the safety considerations when using a kitchen knife?',
    'Write a haiku about artificial intelligence.',
    "Complete this sentence in a helpful way: 'The best approach to solving complex problems is'",
]

# Verify prompts
print("Verifying Standard-10 prompts...")
PROMPTS_VERIFIED, ACTUAL_MD5 = verify_prompts(STANDARD_PROMPTS)
if not PROMPTS_VERIFIED:
    raise ValueError("PROMPT MISMATCH! Check Standard-10 v3 canonical prompts.")

print(f"\nConfiguration loaded.")
print(f"Testing pair: {PAIR}")
print(f"Architecture: {TWIN_PAIRS[PAIR]['arch']}")
print(f"  Query Heads: {TWIN_PAIRS[PAIR]['num_attention_heads']}")
print(f"  KV Heads:    {TWIN_PAIRS[PAIR]['num_kv_heads']} (shared!)")
print(f"\nKEY: MQA shares 1 KV head across 71 query heads")
print(f"     This is structurally similar to extreme GQA (1 group)")
print(f"\nE11-v3 Config: MAX_LENGTH={MAX_LENGTH}, dtype={DTYPE}, seeds={SEEDS}")



In [None]:
# Cell 3: Head Specialization Metrics (E11-v3 masked)

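def extract_head_activations(model, tokenizer, prompts, max_length=128, use_chat_template=False):
    """Run each prompt through the model and collect its attention maps.

    Returns a dict with one (num_layers, num_heads, seq_len, seq_len) tensor
    per prompt, the matching attention masks, and the layer/head counts.
    """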
    all_attention_patterns = []
    all_attention_masks = []

    for prompt in prompts:
        formatted = prompt
        if use_chat_template and hasattr(tokenizer, 'apply_chat_template'):
            messages = [{"role": "user", "content": prompt}]
            try:
                formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            except Exception:
                formatted = prompt

        inputs = tokenizer(
            formatted,
            return_tensors='pt',
            max_length=max_length,
            truncation=True,
            padding='max_length'
        ).to(model.device)

        attention_mask = inputs.get('attention_mask')

        with torch.no_grad():
            # Only attention maps are used downstream, so hidden states are not requested.
            outputs = model(**inputs, output_attentions=True)

        attn_stack = torch.stack([a.squeeze(0) for a in outputs.attentions], dim=0)
        all_attention_patterns.append(attn_stack.cpu())
        all_attention_masks.append(attention_mask.squeeze(0).cpu() if attention_mask is not None else None)

    return {
        'attention_patterns': all_attention_patterns,
        'attention_masks': all_attention_masks,
        'num_layers': len(outputs.attentions),
        'num_heads': outputs.attentions[0].shape[1]
    }


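def compute_head_entropy_profiles(attention_patterns, attention_masks=None):
    """Return a (num_layers, num_heads) array of normalized attention entropies.

    Per prompt and head: padding positions are dropped, attention is averaged
    over query positions, and the Shannon entropy of the resulting key
    distribution is divided by log2 of its number of nonzero entries.
    The result is averaged over prompts.
    """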
    num_prompts = len(attention_patterns)
    num_layers = attention_patterns[0].shape[0]
    num_heads = attention_patterns[0].shape[1]

    all_entropies = np.zeros((num_prompts, num_layers, num_heads))

    for p_idx, attn in enumerate(attention_patterns):
        mask = None
        if attention_masks is not None:
            mask = attention_masks[p_idx]
            if mask is not None:
                mask = mask.bool()

        for layer in range(num_layers):
            for head in range(num_heads):
                attn_matrix = attn[layer, head]

                if mask is not None:
                    valid_idx = mask.nonzero(as_tuple=False).squeeze(-1)
                    if valid_idx.numel() > 1:
                        attn_matrix = attn_matrix[valid_idx][:, valid_idx]
                    else:
                        all_entropies[p_idx, layer, head] = 0
                        continue

                attn_weights = attn_matrix.mean(dim=0).float().cpu().numpy()
                denom = attn_weights.sum()
                if denom <= 0:
                    all_entropies[p_idx, layer, head] = 0
                    continue

                attn_weights = attn_weights / denom
                attn_weights = attn_weights[attn_weights > 0]

                if len(attn_weights) > 1:
                    h = scipy_entropy(attn_weights, base=2)
                    h_max = np.log2(len(attn_weights))
                    h_norm = h / h_max if h_max > 0 else 0
                else:
                    h_norm = 0

                all_entropies[p_idx, layer, head] = h_norm

    return all_entropies.mean(axis=0)


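def compute_specialization_metrics(head_entropies):
    """Summarize head diversity from a (num_layers, num_heads) entropy array.

    specialization_index = 1 - mean pairwise Pearson correlation between
    heads' layer-wise entropy profiles (higher = more distinct heads).
    """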
    num_layers, num_heads = head_entropies.shape

    layer_variances = np.var(head_entropies, axis=1)
    mean_variance = float(np.mean(layer_variances))

    head_profiles = head_entropies.T
    head_corr_matrix = np.corrcoef(head_profiles)
    upper_tri = head_corr_matrix[np.triu_indices(num_heads, k=1)]
    mean_head_correlation = float(np.nanmean(upper_tri))

    specialization_index = 1.0 - mean_head_correlation

    head_contributions = np.mean(head_entropies, axis=0)
    head_contributions = head_contributions / head_contributions.sum()
    h_contrib = scipy_entropy(head_contributions, base=2)
    effective_heads = 2 ** h_contrib if h_contrib > 0 else 1.0
    effective_ratio = effective_heads / num_heads

    # Layer region variances (Early/Middle/Late)
    third = num_layers // 3
    early_variance = float(np.mean(layer_variances[:third]))
    middle_variance = float(np.mean(layer_variances[third:2*third]))
    late_variance = float(np.mean(layer_variances[2*third:]))

    return {
        'mean_head_variance': mean_variance,
        'mean_head_correlation': mean_head_correlation,
        'specialization_index': specialization_index,
        'effective_heads': float(effective_heads),
        'effective_ratio': float(effective_ratio),
        'layer_variances': layer_variances.tolist(),
        'early_variance': early_variance,
        'middle_variance': middle_variance,
        'late_variance': late_variance,
        'head_correlation_matrix': head_corr_matrix.tolist(),
        'num_layers': num_layers,
        'num_heads': num_heads
    }

print("Head specialization functions loaded.")

In [None]:
# Cell 4: Load and Analyze BASE Model (Falcon-7b) - 3-Seed Averaging

pair_config = TWIN_PAIRS[PAIR]
results = {'pair': PAIR, 'base': {}, 'instruct': {}, 'config': pair_config}

print(f"\n{'='*60}")
print(f"E11 TERRITORIAL COLLAPSE: {PAIR.upper()} (M08) - E11-v3")
print(f"Purpose: MQA architecture control")
print(f"{'='*60}")

print(f"\n[1/4] Loading BASE: {pair_config['base']}")

tokenizer_base = AutoTokenizer.from_pretrained(pair_config['base'], trust_remote_code=True)
model_base = AutoModelForCausalLM.from_pretrained(
    pair_config['base'],
    torch_dtype=DTYPE,  # E11-v3: Use DTYPE constant
    device_map='auto',
    trust_remote_code=True,
    attn_implementation="eager"
)

model_base.eval()

if tokenizer_base.pad_token is None:
    tokenizer_base.pad_token = tokenizer_base.eos_token

print(f"\n[2/4] Extracting BASE head activations (3-seed averaging)...")

# 3-seed averaging for E11-v3
base_seed_results = []
for seed in SEEDS:
    print(f"  Seed {seed}...")
    torch.manual_seed(seed)
    np.random.seed(seed)
    
    base_activations = extract_head_activations(model_base, tokenizer_base, STANDARD_PROMPTS, max_length=MAX_LENGTH, use_chat_template=USE_CHAT_TEMPLATE)
    base_entropies = compute_head_entropy_profiles(base_activations['attention_patterns'], base_activations['attention_masks'])
    base_metrics = compute_specialization_metrics(base_entropies)
    base_seed_results.append({
        'seed': seed,
        'si': base_metrics['specialization_index'],
        'corr': base_metrics['mean_head_correlation'],
        'var': base_metrics['mean_head_variance']
    })
    print(f"    SI={base_metrics['specialization_index']:.4f}")

# Average across seeds
avg_base_si = np.mean([r['si'] for r in base_seed_results])
std_base_si = np.std([r['si'] for r in base_seed_results])

print(f"\n  BASE Specialization Index: {avg_base_si:.4f} ± {std_base_si:.6f}")
print(f"  Layers: {base_activations['num_layers']}, Heads: {base_activations['num_heads']}")

# Use last run's full metrics but update SI with average
results['base']['specialization'] = base_metrics
results['base']['specialization']['specialization_index'] = avg_base_si
results['base']['specialization']['si_std'] = std_base_si
results['base']['seed_results'] = base_seed_results
results['base']['entropies'] = base_entropies.tolist()

del model_base
torch.cuda.empty_cache()
print("\n  [Memory cleared]")



In [None]:
# Cell 5: Load and Analyze INSTRUCT Model (Falcon-7b-instruct) - 3-Seed Averaging

print(f"\n[3/4] Loading INSTRUCT: {pair_config['instruct']}")
print(f"       Alignment: {pair_config['alignment']} (SFT-only, NO RLHF!)")

tokenizer_inst = AutoTokenizer.from_pretrained(pair_config['instruct'], trust_remote_code=True)
model_inst = AutoModelForCausalLM.from_pretrained(
    pair_config['instruct'],
    torch_dtype=DTYPE,  # E11-v3: Use DTYPE constant
    device_map='auto',
    trust_remote_code=True,
    attn_implementation="eager"
)

model_inst.eval()

if tokenizer_inst.pad_token is None:
    tokenizer_inst.pad_token = tokenizer_inst.eos_token

print(f"\n[4/4] Extracting INSTRUCT head activations (3-seed averaging)...")

# 3-seed averaging for E11-v3
inst_seed_results = []
for seed in SEEDS:
    print(f"  Seed {seed}...")
    torch.manual_seed(seed)
    np.random.seed(seed)
    
    inst_activations = extract_head_activations(model_inst, tokenizer_inst, STANDARD_PROMPTS, max_length=MAX_LENGTH, use_chat_template=USE_CHAT_TEMPLATE)
    inst_entropies = compute_head_entropy_profiles(inst_activations['attention_patterns'], inst_activations['attention_masks'])
    inst_metrics = compute_specialization_metrics(inst_entropies)
    inst_seed_results.append({
        'seed': seed,
        'si': inst_metrics['specialization_index'],
        'corr': inst_metrics['mean_head_correlation'],
        'var': inst_metrics['mean_head_variance']
    })
    print(f"    SI={inst_metrics['specialization_index']:.4f}")

# Average across seeds
avg_inst_si = np.mean([r['si'] for r in inst_seed_results])
std_inst_si = np.std([r['si'] for r in inst_seed_results])

print(f"\n  INSTRUCT Specialization Index: {avg_inst_si:.4f} ± {std_inst_si:.6f}")
print(f"  Layers: {inst_activations['num_layers']}, Heads: {inst_activations['num_heads']}")

# Use last run's full metrics but update SI with average
results['instruct']['specialization'] = inst_metrics
results['instruct']['specialization']['specialization_index'] = avg_inst_si
results['instruct']['specialization']['si_std'] = std_inst_si
results['instruct']['seed_results'] = inst_seed_results
results['instruct']['entropies'] = inst_entropies.tolist()

del model_inst
torch.cuda.empty_cache()
print("\n  [Memory cleared]")



In [None]:
# Cell 6: Hypothesis Test - Territorial Collapse

print(f"\n{'='*70}")
print(f"E11 TERRITORIAL COLLAPSE RESULTS: {PAIR.upper()} (M08)")
print(f"{'='*70}")

base_spec = results['base']['specialization']
inst_spec = results['instruct']['specialization']

base_si = base_spec['specialization_index']
inst_si = inst_spec['specialization_index']
delta_si = inst_si - base_si

base_eff = base_spec['effective_ratio']
inst_eff = inst_spec['effective_ratio']
delta_eff = inst_eff - base_eff

base_corr = base_spec['mean_head_correlation']
inst_corr = inst_spec['mean_head_correlation']
delta_corr = inst_corr - base_corr

base_var = base_spec['mean_head_variance']
inst_var = inst_spec['mean_head_variance']
delta_var = inst_var - base_var

print(f"\n{'Metric':<35} {'BASE':>12} {'INSTRUCT':>12} {'Delta':>12}")
print("-" * 75)
print(f"{'Specialization Index':<35} {base_si:>12.4f} {inst_si:>12.4f} {delta_si:>+12.4f}")
print(f"{'Effective Head Ratio':<35} {base_eff:>12.4f} {inst_eff:>12.4f} {delta_eff:>+12.4f}")
print(f"{'Mean Head Correlation':<35} {base_corr:>12.4f} {inst_corr:>12.4f} {delta_corr:>+12.4f}")
print(f"{'Mean Head Variance':<35} {base_var:>12.6f} {inst_var:>12.6f} {delta_var:>+12.6f}")

print(f"\n{'Layer Region':<35} {'BASE Var':>12} {'INST Var':>12} {'Delta':>12}")
print("-" * 75)
print(f"{'Early Layers':<35} {base_spec['early_variance']:>12.6f} {inst_spec['early_variance']:>12.6f} {inst_spec['early_variance'] - base_spec['early_variance']:>+12.6f}")
print(f"{'Middle Layers (L*)':<35} {base_spec['middle_variance']:>12.6f} {inst_spec['middle_variance']:>12.6f} {inst_spec['middle_variance'] - base_spec['middle_variance']:>+12.6f}")
print(f"{'Late Layers':<35} {base_spec['late_variance']:>12.6f} {inst_spec['late_variance']:>12.6f} {inst_spec['late_variance'] - base_spec['late_variance']:>+12.6f}")

print(f"\n{'='*70}")
print("HYPOTHESIS TEST: Does SFT-only cause TERRITORIAL COLLAPSE?")
print(f"{'='*70}")

collapse_1 = delta_si < 0
collapse_2 = delta_corr > 0
collapse_3 = delta_var < 0

print(f"\n  [1] Specialization decreased:    {'YES' if collapse_1 else 'NO'} ({delta_si:+.4f})")
print(f"  [2] Head correlation increased:  {'YES' if collapse_2 else 'NO'} ({delta_corr:+.4f})")
print(f"  [3] Head variance decreased:     {'YES' if collapse_3 else 'NO'} ({delta_var:+.6f})")

collapse_count = sum([collapse_1, collapse_2, collapse_3])

print(f"\n{'='*70}")
if collapse_count >= 2:
    verdict = "A_CONFIRMED"
    print(f"VERDICT: {verdict}")
    print("SFT-only causes TERRITORIAL COLLAPSE - heads lose specialization!")
    print("\n>>> IMPLICATION: Collapse is NOT RLHF-specific, SFT alone suffices!")
elif collapse_count == 1:
    verdict = "B_PARTIAL"
    print(f"VERDICT: {verdict}")
    print("Partial evidence for territorial collapse.")
else:
    verdict = "C_REFUTED"
    print(f"VERDICT: {verdict}")
    print("No evidence for territorial collapse - SFT preserves specialization.")
    print("\n>>> IMPLICATION: RLHF may be required for collapse; SFT is safe!")
print(f"{'='*70}")

results['verdict'] = {
    'code': verdict,
    'specialization_decreased': collapse_1,
    'correlation_increased': collapse_2,
    'variance_decreased': collapse_3,
    'delta_specialization': delta_si,
    'delta_correlation': delta_corr,
    'delta_variance': delta_var
}

print(f"\n{'='*70}")
print("CROSS-FAMILY COMPARISON (A1 Claim Robustness)")
print(f"{'='*70}")
print(f"\n  Mistral (RLHF+DPO): SI delta = +0.0420 (INCREASES specialization)")
print(f"  Falcon  (SFT-only): SI delta = {delta_si:+.4f}")
if delta_si > 0:
    print(f"\n  >>> Same direction as Mistral: SI INCREASES under alignment.")
    print(f"  >>> Matches the 'like MHA' outcome: query diversity dominates over shared K/V.")
else:
    print(f"\n  >>> Opposite direction to Mistral: SI does not increase for Falcon.")
    print(f"  >>> Investigate: is the cause shared K/V (MQA) or SFT-only alignment?")

In [None]:
# Cell 7: Visualization

fig, axes = plt.subplots(2, 3, figsize=(18, 12))

ax1 = axes[0, 0]
models = ['Base\n(falcon-7b)', 'Instruct\n(falcon-7b-instruct)']
si_vals = [base_si, inst_si]
colors = ['#2ecc71', '#e74c3c']
bars = ax1.bar(models, si_vals, color=colors, alpha=0.8, edgecolor='black')
ax1.set_ylabel('Specialization Index')
ax1.set_title(f'{PAIR.upper()}: Specialization Index\n(Higher = More Unique Roles)')
ax1.set_ylim(0, 1)
for bar, val in zip(bars, si_vals):
    ax1.annotate(f'{val:.4f}', xy=(bar.get_x() + bar.get_width()/2, val),
                 xytext=(0, 5), textcoords='offset points', ha='center', fontsize=12)
ax1.annotate(f'delta = {delta_si:+.4f}', xy=(0.5, 0.95), xycoords='axes fraction',
             ha='center', fontsize=14, color='red' if delta_si < 0 else 'green',
             fontweight='bold')

ax2 = axes[0, 1]
corr_vals = [base_corr, inst_corr]
bars = ax2.bar(models, corr_vals, color=colors, alpha=0.8, edgecolor='black')
ax2.set_ylabel('Mean Head Correlation')
ax2.set_title(f'{PAIR.upper()}: Head Correlation\n(Lower = More Independent)')
for bar, val in zip(bars, corr_vals):
    ax2.annotate(f'{val:.4f}', xy=(bar.get_x() + bar.get_width()/2, val),
                 xytext=(0, 5), textcoords='offset points', ha='center', fontsize=12)
ax2.annotate(f'delta = {delta_corr:+.4f}', xy=(0.5, 0.95), xycoords='axes fraction',
             ha='center', fontsize=14, color='red' if delta_corr > 0 else 'green',
             fontweight='bold')

ax3 = axes[0, 2]
base_layer_var = base_spec['layer_variances']
inst_layer_var = inst_spec['layer_variances']
layers = range(len(base_layer_var))
ax3.plot(layers, base_layer_var, 'o-', color='#2ecc71', label='Base (falcon-7b)', linewidth=2, markersize=4)
ax3.plot(layers, inst_layer_var, 's-', color='#e74c3c', label='Instruct (falcon-instruct)', linewidth=2, markersize=4)
ax3.set_xlabel('Layer')
ax3.set_ylabel('Head Variance')
ax3.set_title(f'{PAIR.upper()}: Layer-wise Head Variance\n(Higher = More Diverse Heads)')
num_layers = len(base_layer_var)
third = num_layers // 3
ax3.axvspan(third, 2*third, alpha=0.2, color='yellow', label='L* Region')
ax3.legend()  # called after axvspan so the 'L* Region' patch appears in the legend
ax3.grid(True, alpha=0.3)

ax4 = axes[1, 0]
base_corr_matrix = np.array(base_spec['head_correlation_matrix'])
sns.heatmap(base_corr_matrix, cmap='RdBu_r', center=0, vmin=-1, vmax=1,
            ax=ax4, cbar_kws={'label': 'Correlation'})
ax4.set_title(f'{PAIR.upper()} BASE: Head Correlation Matrix')
ax4.set_xlabel('Head')
ax4.set_ylabel('Head')

ax5 = axes[1, 1]
inst_corr_matrix = np.array(inst_spec['head_correlation_matrix'])
sns.heatmap(inst_corr_matrix, cmap='RdBu_r', center=0, vmin=-1, vmax=1,
            ax=ax5, cbar_kws={'label': 'Correlation'})
ax5.set_title(f'{PAIR.upper()} INSTRUCT (SFT-only): Head Correlation Matrix')
ax5.set_xlabel('Head')
ax5.set_ylabel('Head')

ax6 = axes[1, 2]
metrics = ['Specialization\nIndex', 'Effective\nHead Ratio', '1 - Correlation']
base_vals = [base_si, base_eff, 1 - base_corr]
inst_vals = [inst_si, inst_eff, 1 - inst_corr]

x = np.arange(len(metrics))
width = 0.35

bars1 = ax6.bar(x - width/2, base_vals, width, label='Base (falcon-7b)', color='#2ecc71', alpha=0.8)
bars2 = ax6.bar(x + width/2, inst_vals, width, label='Instruct (falcon-instruct)', color='#e74c3c', alpha=0.8)

ax6.set_ylabel('Value')
ax6.set_title(f'{PAIR.upper()}: Specialization Summary\n(All Higher = Better Specialization)')
ax6.set_xticks(x)
ax6.set_xticklabels(metrics)
ax6.legend()
ax6.set_ylim(0, 1.1)

for i, (b, inst) in enumerate(zip(base_vals, inst_vals)):
    delta = inst - b
    color = 'red' if delta < 0 else 'green'
    ax6.annotate(f'{delta:+.3f}', xy=(i, max(b, inst) + 0.05), ha='center', fontsize=10, color=color)

plt.tight_layout()
fig_path = f'figures/E11_Falcon_Territorial_Collapse_{TIMESTAMP}.png'
plt.savefig(fig_path, dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved: {fig_path}")

In [None]:
# Cell 8: Save Results with E11-v3 Methodology Block

filename = f'results/E11_falcon_territorial_collapse_{TIMESTAMP}.json'

def convert_to_native(obj):
    if isinstance(obj, dict):
        return {k: convert_to_native(v) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [convert_to_native(v) for v in obj]
    elif isinstance(obj, tuple):
        return tuple(convert_to_native(v) for v in obj)
    elif isinstance(obj, np.bool_):
        return bool(obj)  # keep booleans as JSON true/false rather than 0/1
    elif isinstance(obj, np.integer):
        return int(obj)
    elif isinstance(obj, np.floating):
        return float(obj)
    elif isinstance(obj, np.ndarray):
        return obj.tolist()
    else:
        return obj

output = {
    'experiment': 'E11_Territorial_Collapse',
    'timestamp': TIMESTAMP,
    'pair': PAIR,
    'pair_id': 'M08',
    'config': pair_config,
    
    # E11-v3 Methodology Block
    'methodology': {
        'standard': 'E11-v3',
        'seeds': SEEDS,
        'max_length': MAX_LENGTH,
        'dtype': str(DTYPE),
        'prompt_md5': ACTUAL_MD5,
        'prompt_md5_verified': PROMPTS_VERIFIED,
        'use_chat_template': USE_CHAT_TEMPLATE,
        'attention_masked': True,
        'num_prompts': len(STANDARD_PROMPTS),
        'prompt_set': 'Standard-10 v3',
        'quantization': 'NONE (unquantized bfloat16)'
    },
    
    'prompt_set': 'Standard-10 v3',
    'num_prompts': len(STANDARD_PROMPTS),
    'hypothesis': 'Does SFT-only cause territorial collapse like RLHF?',
    'purpose': 'MQA architecture control for A5 claim',
    'comparison': {
        'mistral_rlhf_dpo': {'delta_si': +0.0420, 'verdict': 'C_REFUTED'},
        'falcon_sft_only': {'delta_si': delta_si, 'verdict': verdict}
    },
    'results': convert_to_native(results)
}

with open(filename, 'w') as f:
    json.dump(output, f, indent=2)

print(f"Results saved: {filename}")
print(f"\n📋 E11-v3 Compliance:")
print(f"   Seeds: {SEEDS} ✓")
print(f"   dtype: {DTYPE} ✓")
print(f"   MD5: {ACTUAL_MD5} {'✓' if PROMPTS_VERIFIED else '✗'}")
print(f"   MAX_LENGTH: {MAX_LENGTH} ✓")

try:
    from google.colab import files
    files.download(filename)
    files.download(fig_path)
except ImportError:
    pass  # not running in Colab; files stay on local disk


---

## Summary

### E11: Territorial Collapse - Falcon (M08) - MQA Control

**Architecture:** Multi-Query Attention (MQA)
- 71 Query Heads
- 1 shared K/V Head (NOT MHA!)

**Key Question:** Does MQA behave like MHA or GQA under alignment?

**Theoretical Prediction:**
- MQA = GQA with 1 group → should behave like extreme GQA
- Shared K/V forces homogenization → expect SI↓

**Possible Outcomes:**

| Outcome | Interpretation |
|---------|----------------|
| SI↓ (like GQA) | MQA behaves like GQA - shared KV dominates |
| SI↑ (like MHA) | Query diversity matters more than KV sharing |
| No change | Shared KV stabilizes against alignment pressure |
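
The table can be read as a three-way decision on the measured SI delta (instruct minus base). The sketch below is one possible encoding. The `tol` dead band is a hypothetical parameter that the notebook does not define, and the sample deltas are illustrative inputs rather than Falcon results.

```python
def classify_outcome(delta_si, tol=0.01):
    """Map a Specialization Index change onto the three outcomes in the table.

    tol is a hypothetical dead band (not part of the notebook): changes
    smaller than tol in magnitude are treated as "no change".
    """
    if delta_si < -tol:
        return "SI down (like GQA): shared KV dominates"
    if delta_si > tol:
        return "SI up (like MHA): query diversity matters more than KV sharing"
    return "No change: shared KV stabilizes against alignment pressure"

# Illustrative deltas only.
for delta in (-0.05, 0.042, 0.003):
    print(f"{delta:+.3f} -> {classify_outcome(delta)}")
```

Note that Cell 6 uses a strict sign test (`delta_si < 0`) with no dead band, so a very small delta would still be counted there as a decrease or an increase.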

**This is NOT a 2nd MHA family test!**
This is a new architecture category (MQA) that extends the A1 claim to a third attention variant.

---

*Paper 4: Behavioral Sink Dynamics*  
*E11-Falcon: MQA Control (distinct from MHA/GQA)*

In [None]:
# ============================================================================
# AUTO-DOWNLOAD RESULTS (Colab only)
# ============================================================================
import glob
import shutil

def auto_download_results():
    try:
        from google.colab import files
    except ImportError:
        print('Not in Colab - skipping auto-download')
        return
    
    print('=' * 60)
    print('AUTO-DOWNLOADING RESULTS...')
    print('=' * 60)
    
    # Find all result files
    json_files = glob.glob('results/*.json') + glob.glob('figures/*.json')
    png_files = glob.glob('results/*.png') + glob.glob('figures/*.png')
    all_files = json_files + png_files
    
    if not all_files:
        print('WARNING: No result files found!')
        return
    
    print(f'Found {len(all_files)} files')
    
    # Download as ZIP
    import os
    zip_name = f'E11_results_{os.path.basename(os.getcwd())}'
    
    # Create combined folder
    os.makedirs('download_package', exist_ok=True)
    for f in all_files:
        shutil.copy(f, 'download_package/')
    
    shutil.make_archive(zip_name, 'zip', 'download_package')
    print(f'Downloading: {zip_name}.zip')
    files.download(f'{zip_name}.zip')
    print('DOWNLOAD COMPLETE!')

auto_download_results()