# Experiment 035F: AQ Compositional Bonding

**AKIRA Project - Oscar Goldman - Shogu Research Group @ Datamutant.ai**

---

## Core Hypothesis

Action Quanta (AQ) are irreducible action discrimination primitives. When multiple AQ combine
in context, they form **bonded states** - composite patterns that encode multiple action
discriminations simultaneously.

**Key Predictions**: If AQ are compositional primitives:
1. Prompts with N AQ components should activate patterns that **decompose** into N component signatures
2. The bonded pattern should show higher similarity to its components than to unrelated AQ
3. Component AQ patterns should be **detectable** within the bonded representation
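
To make these predictions concrete, here is a toy sketch that rests on one explicit assumption: a bonded state is the sum of its component signatures plus noise. This is a linear-superposition model, and the AQ hypothesis does not itself commit to that form. Random unit vectors stand in for single-AQ signatures, so nothing here is model data. Under this assumption, the similarity of the bonded pattern to each individual component shrinks as N grows. For N orthogonal unit components with no noise it is exactly 1/√N, about 0.577 for N = 3. That is why prediction 2 is phrased relative to unrelated AQ rather than as an absolute threshold.

```python
import numpy as np

# Toy model ONLY: assumes a bonded state is the sum of its component
# directions plus small noise (linear superposition). Not data from a model.
rng = np.random.default_rng(0)
dim = 256
names = ["THREAT", "PROXIMITY", "DIRECTION", "URGENCY", "AGENT_INTENT", "RESOURCE"]

# One random unit vector per AQ, standing in for a single-AQ signature
raw = rng.normal(size=(len(names), dim))
directions = {name: row / np.linalg.norm(row) for name, row in zip(names, raw)}


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


components = ["THREAT", "PROXIMITY", "DIRECTION"]
bonded = sum(directions[n] for n in components) + 0.05 * rng.normal(size=dim)

comp_sims = [cosine(bonded, directions[n]) for n in components]
unrel_sims = [cosine(bonded, directions[n]) for n in names if n not in components]
print(f"mean similarity to components: {np.mean(comp_sims):.3f}")
print(f"mean similarity to unrelated:  {np.mean(unrel_sims):.3f}")
```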

---

## What Are True AQ?

AQ are the minimum patterns enabling discrimination between action alternatives:

| AQ | Discrimination | Action Enabled |
|:---|:---------------|:---------------|
| THREAT | threat vs no-threat | FLEE vs STAY |
| PROXIMITY | near vs far | ENGAGE vs OBSERVE |
| DIRECTION | toward vs away | INTERCEPT vs EVADE |
| URGENCY | immediate vs delayed | ACT_NOW vs PLAN |
| AGENT_INTENT | hostile vs friendly | DEFEND vs COOPERATE |
| RESOURCE | scarce vs abundant | CONSERVE vs EXPEND |

These are NOT output categories like "compute number" or "answer boolean".
They are **action primitives** - the minimum information needed to choose an action.

---

## Experimental Design

1. **Single AQ prompts**: Activate only one discrimination (e.g., only THREAT)
2. **Bonded AQ prompts**: Activate multiple discriminations (e.g., THREAT + PROXIMITY + DIRECTION)
3. **Analysis**: Test if bonded patterns decompose into detectable component signatures

---

## 1. Setup

In [None]:
!pip install transformers accelerate torch numpy scikit-learn matplotlib seaborn scipy tqdm -q

In [None]:
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass, field
from scipy import stats
from tqdm import tqdm
import warnings
import gc

warnings.filterwarnings('ignore')

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {DEVICE}")
print(f"PyTorch version: {torch.__version__}")
if DEVICE == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

## 2. AQ Component Definitions

Each AQ is defined by:
- The discrimination it enables
- Marker words/phrases that activate it
- The action choice it enables

In [None]:
# Core AQ Components - each is an irreducible action discrimination
AQ_COMPONENTS = {
    "THREAT": {
        "discrimination": "threat vs no-threat",
        "action_enabled": "FLEE vs STAY",
        "markers": ["danger", "threat", "attack", "predator", "enemy", "fire", "flood", 
                    "collapse", "explosion", "gunfire", "intruder", "poison", "tsunami"],
        "neutral": ["object", "item", "thing", "element", "entity", "matter"]
    },
    "PROXIMITY": {
        "discrimination": "near vs far",
        "action_enabled": "ENGAGE vs OBSERVE",
        "markers": ["approaching", "nearby", "close", "immediate", "at the door", 
                    "right behind", "within reach", "meters away", "closing in"],
        "neutral": ["somewhere", "located", "positioned", "exists", "present"]
    },
    "DIRECTION": {
        "discrimination": "toward vs away",
        "action_enabled": "INTERCEPT vs EVADE",
        "markers": ["coming toward", "heading your way", "approaching you", "moving closer",
                    "advancing", "bearing down", "en route to your position"],
        "neutral": ["moving", "traveling", "going", "proceeding", "in motion"]
    },
    "URGENCY": {
        "discrimination": "immediate vs delayed",
        "action_enabled": "ACT_NOW vs PLAN",
        "markers": ["now", "immediately", "right now", "this instant", "without delay",
                    "urgent", "emergency", "critical", "time-sensitive"],
        "neutral": ["eventually", "sometime", "when possible", "at some point"]
    },
    "AGENT_INTENT": {
        "discrimination": "hostile vs friendly",
        "action_enabled": "DEFEND vs COOPERATE",
        "markers": ["aggressive", "hostile", "attacking", "malicious", "threatening",
                    "violent", "armed", "hunting", "stalking"],
        "neutral": ["present", "there", "existing", "around", "nearby"]
    },
    "RESOURCE": {
        "discrimination": "scarce vs abundant",
        "action_enabled": "CONSERVE vs EXPEND",
        "markers": ["running low", "almost out", "last remaining", "dwindling",
                    "limited supply", "rationing", "scarce", "depleted"],
        "neutral": ["available", "present", "existing", "there", "some"]
    }
}

print(f"Defined {len(AQ_COMPONENTS)} AQ components:")
for aq, info in AQ_COMPONENTS.items():
    print(f"  {aq}: {info['discrimination']} -> {info['action_enabled']}")
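
Several markers share word stems across AQ. For example, PROXIMITY's "immediate" resembles URGENCY's "immediately", and PROXIMITY's "approaching" appears in DIRECTION's "approaching you". A "single AQ" prompt may therefore partially activate a second discrimination. The sketch below shows one crude way to list such overlaps. `find_marker_overlaps` is a hypothetical helper, and its token-length and prefix thresholds are arbitrary choices rather than part of the protocol.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def _tokens(markers: List[str], min_len: int = 4) -> Set[str]:
    """Lower-cased word tokens in a marker list, dropping short function words."""
    words = (w for m in markers for w in m.lower().replace("-", " ").split())
    return {w for w in words if len(w) >= min_len}


def _related(a: str, b: str, min_prefix: int = 5) -> bool:
    """Exact match, or the shorter token is a prefix of the longer (crude stemming)."""
    short, long_ = sorted((a, b), key=len)
    return a == b or (len(short) >= min_prefix and long_.startswith(short))


def find_marker_overlaps(components: Dict[str, Dict]) -> List[Tuple[str, str, str, str]]:
    """Return (aq_a, token_a, aq_b, token_b) for every related token pair across AQ."""
    overlaps = []
    for (name_a, info_a), (name_b, info_b) in combinations(components.items(), 2):
        for ta in sorted(_tokens(info_a["markers"])):
            for tb in sorted(_tokens(info_b["markers"])):
                if _related(ta, tb):
                    overlaps.append((name_a, ta, name_b, tb))
    return overlaps


# Excerpt copied from AQ_COMPONENTS so this cell runs on its own
excerpt = {
    "PROXIMITY": {"markers": ["approaching", "nearby", "close", "immediate", "at the door",
                              "right behind", "within reach", "meters away", "closing in"]},
    "DIRECTION": {"markers": ["coming toward", "heading your way", "approaching you", "moving closer",
                              "advancing", "bearing down", "en route to your position"]},
    "URGENCY": {"markers": ["now", "immediately", "right now", "this instant", "without delay",
                            "urgent", "emergency", "critical", "time-sensitive"]},
}
for overlap in find_marker_overlaps(excerpt):
    print(overlap)
```

In the notebook, `find_marker_overlaps(AQ_COMPONENTS)` would check all six AQ. On the full dictionary, the same heuristic would also pair THREAT's "attack" and "threat" with AGENT_INTENT's "attacking" and "threatening".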

## 3. Prompt Generation

Generate three types of prompts:
1. **Single AQ**: Activates exactly one discrimination
2. **Double AQ**: Activates exactly two discriminations (bonded)
3. **Triple AQ**: Activates exactly three discriminations (bonded)

Control: Each bonded prompt explicitly combines markers from its component AQ.
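
Each bonded template addresses a component by the lower-cased first letter of its name (`{t}`, `{p}`, `{d}`, `{u}`, `{a}`, `{r}`), and prompt `i` takes marker `i % len(markers)` from each component. Below is a minimal sketch of that keying scheme. It adds a collision guard that the original does not have; the six current AQ all start with different letters, so the guard never fires today. `build_format_dict` is a hypothetical helper.

```python
from typing import Dict, List


def build_format_dict(aq_names: List[str], markers: Dict[str, List[str]], i: int) -> Dict[str, str]:
    """First-letter key -> i-th marker of that AQ (cycled), as in generate_bonded_prompts."""
    keys = [name[0].lower() for name in aq_names]
    # Two AQ with the same first letter would silently overwrite each other's slot
    if len(set(keys)) != len(keys):
        raise ValueError(f"first-letter key collision in {aq_names}")
    return {key: markers[name][i % len(markers[name])] for key, name in zip(keys, aq_names)}


# Short excerpts of the THREAT and PROXIMITY marker lists (first three entries)
markers = {
    "THREAT": ["danger", "threat", "attack"],
    "PROXIMITY": ["approaching", "nearby", "close"],
}
template = "A {t} is {p}. What should you do?"
prompt = template.format(**build_format_dict(["THREAT", "PROXIMITY"], markers, i=1))
print(prompt)
```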

In [None]:
def generate_single_aq_prompts(aq_name: str, n: int = 100) -> List[str]:
    """Generate prompts that activate exactly one AQ.
    
    Args:
        aq_name: Name of the AQ component
        n: Number of prompts to generate
        
    Returns:
        List of prompts activating only this AQ
    """
    assert aq_name in AQ_COMPONENTS, f"Unknown AQ: {aq_name}"
    
    aq = AQ_COMPONENTS[aq_name]
    markers = aq["markers"]
    
    # Templates designed to activate ONLY this AQ
    templates = {
        "THREAT": [
            "There is {m} detected. What should you do?",
            "A {m} has been identified. Your response:",
            "Warning: {m} present. Action required:",
            "Alert: {m} in the area. You should",
            "Notice: {m} reported. The appropriate action is",
        ],
        "PROXIMITY": [
            "Something is {m}. Your response:",
            "The object is {m}. What do you do?",
            "Location update: {m}. Action:",
            "Status: target is {m}. You should",
            "Report: item detected {m}. Response:",
        ],
        "DIRECTION": [
            "Movement detected: {m}. Your action:",
            "The object is {m}. Response:",
            "Tracking shows {m}. You should",
            "Vector analysis: {m}. Action needed:",
            "Motion: {m}. What do you do?",
        ],
        "URGENCY": [
            "This requires action {m}. You should",
            "Response needed {m}. Action:",
            "Timeline: {m}. Your response:",
            "Priority: act {m}. You must",
            "Timing: {m}. What do you do?",
        ],
        "AGENT_INTENT": [
            "The entity appears {m}. Your response:",
            "Behavior analysis: {m}. Action:",
            "Intent detected: {m}. You should",
            "The agent is {m}. What do you do?",
            "Assessment: entity is {m}. Response:",
        ],
        "RESOURCE": [
            "Supplies are {m}. Your action:",
            "Resource status: {m}. Response:",
            "Inventory shows {m}. You should",
            "Current stock is {m}. Action needed:",
            "Materials are {m}. What do you do?",
        ],
    }
    
    prompts = []
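    # Generation is deterministic (templates and markers are cycled by index), so the
    # global NumPy RNG is deliberately not reseeded here: hash() of a str is salted per
    # process, and reseeding would override ExperimentConfig.random_seed for later sampling.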
    
    for i in range(n):
        template = templates[aq_name][i % len(templates[aq_name])]
        marker = markers[i % len(markers)]
        prompts.append(template.format(m=marker))
    
    return prompts


def generate_bonded_prompts(aq_names: List[str], n: int = 100) -> List[str]:
    """Generate prompts that activate multiple AQ simultaneously (bonded state).
    
    Args:
        aq_names: List of AQ components to combine
        n: Number of prompts to generate
        
    Returns:
        List of bonded prompts
    """
    for name in aq_names:
        assert name in AQ_COMPONENTS, f"Unknown AQ: {name}"
    
    # Bonded templates that combine multiple AQ
    if set(aq_names) == {"THREAT", "PROXIMITY"}:
        templates = [
            "A {t} is {p}. What should you do?",
            "Warning: {t} detected {p}. Your response:",
            "Alert: {t} {p}. Action required:",
            "{t} confirmed {p}. You should",
            "Danger: {t} is {p}. Response:",
        ]
    elif set(aq_names) == {"THREAT", "DIRECTION"}:
        templates = [
            "A {t} is {d}. What should you do?",
            "Warning: {t} {d}. Your response:",
            "Alert: {t} detected {d}. Action:",
            "{t} is {d}. You should",
            "Danger: {t} {d}. Response:",
        ]
    elif set(aq_names) == {"THREAT", "URGENCY"}:
        templates = [
            "A {t} requires response {u}. What do you do?",
            "{t} detected - act {u}. Your response:",
            "Warning: {t}, respond {u}. Action:",
            "{t} present, action needed {u}. You should",
            "Alert: {t}, {u}. Response:",
        ]
    elif set(aq_names) == {"PROXIMITY", "DIRECTION"}:
        templates = [
            "Object is {p} and {d}. What should you do?",
            "Target {p}, {d}. Your response:",
            "Status: {p} and {d}. Action:",
            "Tracking: {p}, {d}. You should",
            "Location: {p}, movement {d}. Response:",
        ]
    elif set(aq_names) == {"THREAT", "PROXIMITY", "DIRECTION"}:
        templates = [
            "A {t} is {p} and {d}. What should you do?",
            "Warning: {t} {p}, {d}. Your response:",
            "Alert: {t} detected {p}, {d}. Action:",
            "{t} confirmed {p} and {d}. You should",
            "Danger: {t} {p}, {d}. Response:",
        ]
    elif set(aq_names) == {"THREAT", "PROXIMITY", "URGENCY"}:
        templates = [
            "A {t} is {p}, respond {u}. What do you do?",
            "Warning: {t} {p} - act {u}. Response:",
            "{t} detected {p}, action needed {u}. You should",
            "Alert: {t} {p}, {u}. Action:",
            "Danger: {t} {p}, respond {u}. Your response:",
        ]
    elif set(aq_names) == {"THREAT", "AGENT_INTENT", "PROXIMITY"}:
        templates = [
            "A {a} {t} is {p}. What should you do?",
            "Warning: {a} {t} detected {p}. Response:",
            "Alert: {a} {t} {p}. Action:",
            "{a} {t} confirmed {p}. You should",
            "Danger: {a} {t} is {p}. Your response:",
        ]
    else:
        # Generic bonded template
        templates = [
            "Situation: " + ", ".join([f"{{{n[0].lower()}}}" for n in aq_names]) + ". What should you do?",
        ]
    
    prompts = []
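    # Deterministic cycling of templates and markers; the global NumPy RNG is
    # deliberately not reseeded (that would override ExperimentConfig.random_seed).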
    
    for i in range(n):
        template = templates[i % len(templates)]
        
        # Get markers for each component
        format_dict = {}
        for name in aq_names:
            key = name[0].lower()  # First letter as key
            markers = AQ_COMPONENTS[name]["markers"]
            format_dict[key] = markers[i % len(markers)]
        
        try:
            prompts.append(template.format(**format_dict))
        except KeyError:
            # Fallback for complex combinations
            parts = [f"{AQ_COMPONENTS[n]['markers'][i % len(AQ_COMPONENTS[n]['markers'])]}" for n in aq_names]
            prompts.append(f"Situation: {', '.join(parts)}. What should you do?")
    
    return prompts


# Test prompt generation
print("=== Single AQ Examples ===")
for aq in ["THREAT", "PROXIMITY", "DIRECTION"]:
    prompts = generate_single_aq_prompts(aq, n=3)
    print(f"\n{aq}:")
    for p in prompts:
        print(f"  - {p}")

print("\n=== Bonded AQ Examples ===")
bonded = generate_bonded_prompts(["THREAT", "PROXIMITY"], n=3)
print("\nTHREAT + PROXIMITY:")
for p in bonded:
    print(f"  - {p}")

bonded = generate_bonded_prompts(["THREAT", "PROXIMITY", "DIRECTION"], n=3)
print("\nTHREAT + PROXIMITY + DIRECTION:")
for p in bonded:
    print(f"  - {p}")

## 4. Model Loading and Activation Extraction

In [None]:
@dataclass
class ExperimentConfig:
    """Configuration for compositional bonding experiment."""
    
    models: Dict[str, str] = field(default_factory=lambda: {
        "gpt2-medium": "gpt2-medium",
        "pythia-1.4b": "EleutherAI/pythia-1.4b",
    })
    
    # Prompts per condition
    prompts_per_condition: int = 100
    
    # Statistical parameters
    n_bootstrap: int = 1000
    random_seed: int = 42
    
    def __post_init__(self):
        np.random.seed(self.random_seed)
        torch.manual_seed(self.random_seed)


config = ExperimentConfig()
print(f"Models: {list(config.models.keys())}")
print(f"Prompts per condition: {config.prompts_per_condition}")

In [None]:
def load_model(model_name: str) -> Tuple[AutoModelForCausalLM, AutoTokenizer, int]:
    """Load model and tokenizer.
    
    Args:
        model_name: HuggingFace model identifier
        
    Returns:
        Tuple of (model, tokenizer, num_layers)
    """
    print(f"Loading {model_name}...")
    
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if DEVICE == "cuda" else torch.float32,
        device_map="auto" if DEVICE == "cuda" else None,
        output_hidden_states=True
    )
    model.eval()
    
    # Get number of layers
    if hasattr(model.config, 'n_layer'):
        n_layers = model.config.n_layer
    elif hasattr(model.config, 'num_hidden_layers'):
        n_layers = model.config.num_hidden_layers
    else:
        n_layers = 24  # Default
    
    print(f"  Layers: {n_layers}")
    print(f"  Hidden size: {model.config.hidden_size}")
    
    return model, tokenizer, n_layers


def extract_activations(model: AutoModelForCausalLM, 
                        tokenizer: AutoTokenizer,
                        prompts: List[str],
                        layers: List[int]) -> Dict[int, np.ndarray]:
    """Extract activations from specified layers.
    
    Args:
        model: The language model
        tokenizer: The tokenizer
        prompts: List of prompts
        layers: Which layers to extract from
        
    Returns:
        Dict mapping layer index to activation matrix (n_prompts x hidden_size)
    """
    activations = {layer: [] for layer in layers}
    
    with torch.no_grad():
        for prompt in tqdm(prompts, desc="Extracting", leave=False):
            inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128)
            inputs = {k: v.to(model.device) for k, v in inputs.items()}
            
            outputs = model(**inputs, output_hidden_states=True)
            hidden_states = outputs.hidden_states
            
            for layer in layers:
                # Use last token representation
                layer_act = hidden_states[layer][0, -1, :].cpu().numpy().astype(np.float32)
                activations[layer].append(layer_act)
    
    # Convert to arrays
    for layer in layers:
        activations[layer] = np.array(activations[layer])
    
    return activations
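
With `output_hidden_states=True`, Hugging Face causal LMs such as GPT-2 and Pythia (GPT-NeoX) return `n_layers + 1` hidden-state tensors. Index 0 is the embedding output, and index k is the output after block k. Two consequences for the layer choice used below:

- At index 0, the last-token vector depends only on the final token of the prompt (plus its position in GPT-2), which here is mostly a template ending.
- Index `n_layers - 1` is the penultimate block's output; the final block's output is index `n_layers`.

The sketch below makes the mapping explicit. `probed_layers` and `describe_hidden_state` are hypothetical helpers.

```python
from typing import List


def probed_layers(n_layers: int) -> List[int]:
    """Same selection rule as run_experiment_for_model."""
    return [0, n_layers // 4, n_layers // 2, 3 * n_layers // 4, n_layers - 1]


def describe_hidden_state(idx: int, n_layers: int) -> str:
    """Label an index into outputs.hidden_states (a tuple of n_layers + 1 tensors)."""
    if not 0 <= idx <= n_layers:
        raise IndexError(f"hidden_states index {idx} out of range for {n_layers} blocks")
    return "embedding output" if idx == 0 else f"output of block {idx} of {n_layers}"


# gpt2-medium and pythia-1.4b both have 24 transformer blocks
for idx in probed_layers(24):
    print(idx, "->", describe_hidden_state(idx, 24))
```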

## 5. Decomposition Analysis

Test whether bonded states decompose into their component signatures using:

1. **Component Similarity**: Do bonded patterns show higher similarity to their
   component AQ than to unrelated AQ?

2. **Linear Probe Detection**: Can we detect the presence of each component AQ
   within the bonded representation using a linear classifier?

3. **Residual Analysis**: After subtracting component directions, what remains? (Not computed by `run_experiment_for_model` below.)
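
The residual analysis in item 3 can be sketched as a least-squares fit of the mean bonded activation onto the span of its component centroids. The fitted weights indicate how much of each component is present, and the unexplained fraction measures how much of the bonded pattern lies outside that span. This is one assumed way to do the analysis, not the authors' method, and `residual_analysis` is a hypothetical helper. On real activations it would probably be necessary to subtract the grand mean of all single-AQ activations first. Otherwise, a large component shared by every prompt would be absorbed into the fit.

```python
import numpy as np
from typing import Dict, List


def residual_analysis(bonded_mean: np.ndarray, component_centroids: List[np.ndarray]) -> Dict[str, object]:
    """Least-squares projection of a bonded mean onto the span of its component centroids."""
    A = np.stack(component_centroids, axis=1)              # (hidden_size, k)
    weights, *_ = np.linalg.lstsq(A, bonded_mean, rcond=None)
    residual = bonded_mean - A @ weights                    # part not explained by components
    unexplained = float(residual @ residual / (bonded_mean @ bonded_mean))
    return {"weights": weights, "residual": residual, "unexplained_fraction": unexplained}


# Synthetic check: a vector built exactly from two "centroids" leaves ~no residual
rng = np.random.default_rng(0)
c1, c2 = rng.normal(size=64), rng.normal(size=64)
out = residual_analysis(0.7 * c1 + 0.5 * c2, [c1, c2])
print(out["weights"], out["unexplained_fraction"])
```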

In [None]:
def compute_aq_centroids(activations_dict: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Compute centroid (mean) for each AQ category.
    
    Args:
        activations_dict: Dict mapping AQ name to activation matrix
        
    Returns:
        Dict mapping AQ name to centroid vector
    """
    centroids = {}
    for name, acts in activations_dict.items():
        centroids[name] = np.mean(acts, axis=0)
    return centroids


def component_similarity_test(bonded_acts: np.ndarray,
                              component_centroids: List[np.ndarray],
                              unrelated_centroids: List[np.ndarray]) -> Dict[str, float]:
    """Test if bonded patterns are more similar to components than unrelated AQ.
    
    Args:
        bonded_acts: Activations from bonded prompts (n_prompts x hidden_size)
        component_centroids: Centroids of the component AQ
        unrelated_centroids: Centroids of unrelated AQ (controls)
        
    Returns:
        Dict with similarity metrics
    """
    results = {}
    
    # Mean bonded pattern
    bonded_mean = np.mean(bonded_acts, axis=0).reshape(1, -1)
    
    # Similarity to components
    component_sims = []
    for centroid in component_centroids:
        sim = cosine_similarity(bonded_mean, centroid.reshape(1, -1))[0, 0]
        component_sims.append(sim)
    
    # Similarity to unrelated
    unrelated_sims = []
    for centroid in unrelated_centroids:
        sim = cosine_similarity(bonded_mean, centroid.reshape(1, -1))[0, 0]
        unrelated_sims.append(sim)
    
    results["mean_component_sim"] = np.mean(component_sims)
    results["mean_unrelated_sim"] = np.mean(unrelated_sims)
    results["sim_difference"] = results["mean_component_sim"] - results["mean_unrelated_sim"]
    
    # Statistical test
    if len(component_sims) > 1 and len(unrelated_sims) > 1:
        t_stat, p_val = stats.ttest_ind(component_sims, unrelated_sims)
        results["t_statistic"] = t_stat
        results["p_value"] = p_val
    
    return results


def linear_probe_detection(single_aq_acts: Dict[str, np.ndarray],
                           bonded_acts: np.ndarray,
                           component_names: List[str]) -> Dict[str, Dict]:
    """Train linear probes to detect each AQ, test on bonded patterns.
    
    If AQ are compositional, a probe trained to detect AQ_X should also
    detect AQ_X within bonded patterns containing AQ_X.
    
    Args:
        single_aq_acts: Dict mapping AQ name to activations from single-AQ prompts
        bonded_acts: Activations from bonded prompts
        component_names: Names of AQ that are present in the bonded prompts
        
    Returns:
        Dict with probe results for each AQ
    """
    results = {}
    
    all_aq_names = list(single_aq_acts.keys())
    
    for target_aq in all_aq_names:
        # Build binary classification: target_aq vs all others
        positive_acts = single_aq_acts[target_aq]
        negative_acts = np.vstack([single_aq_acts[aq] for aq in all_aq_names if aq != target_aq])
        
        # Subsample negative to balance
        n_pos = len(positive_acts)
        if len(negative_acts) > n_pos:
            idx = np.random.choice(len(negative_acts), n_pos, replace=False)
            negative_acts = negative_acts[idx]
        
        X = np.vstack([positive_acts, negative_acts])
        y = np.array([1] * len(positive_acts) + [0] * len(negative_acts))
        
        # Train probe
        probe = LogisticRegression(max_iter=1000, random_state=42)
        
        # Cross-validation accuracy
        cv_scores = cross_val_score(probe, X, y, cv=5)
        
        # Fit on all data
        probe.fit(X, y)
        
        # Predict on bonded
        bonded_probs = probe.predict_proba(bonded_acts)[:, 1]
        
        is_component = target_aq in component_names
        
        results[target_aq] = {
            "cv_accuracy": np.mean(cv_scores),
            "cv_std": np.std(cv_scores),
            "bonded_detection_prob": np.mean(bonded_probs),
            "bonded_detection_std": np.std(bonded_probs),
            "is_component": is_component,
            "expected_detection": "HIGH" if is_component else "LOW"
        }
    
    return results
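
Both tests above operate on raw hidden states. Transformer hidden states are often anisotropic, meaning they are dominated by a direction shared by all inputs. This can push every raw cosine similarity toward 1 and compress `sim_difference`. One possible mitigation, not part of the original protocol, is to subtract the grand mean of the single-AQ activations before comparing. The sketch uses synthetic vectors with a deliberately large shared offset; `centered_similarity` is a hypothetical helper.

```python
import numpy as np
from typing import Dict, List, Tuple


def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def centered_similarity(bonded_acts: np.ndarray,
                        single_acts: Dict[str, np.ndarray],
                        components: List[str]) -> Tuple[float, float]:
    """Mean cosine to component vs unrelated centroids after removing the grand mean."""
    grand_mean = np.vstack(list(single_acts.values())).mean(axis=0)
    centroids = {k: v.mean(axis=0) - grand_mean for k, v in single_acts.items()}
    b = bonded_acts.mean(axis=0) - grand_mean
    comp = [cos(b, centroids[k]) for k in components]
    unrel = [cos(b, centroids[k]) for k in centroids if k not in components]
    return float(np.mean(comp)), float(np.mean(unrel))


# Synthetic activations: a large shared offset plus small AQ-specific directions
rng = np.random.default_rng(0)
dim, n = 64, 200
shared = 10 * rng.normal(size=dim)
dirs = {k: v / np.linalg.norm(v) for k, v in zip("ABC", rng.normal(size=(3, dim)))}
single = {k: shared + 3 * d + 0.2 * rng.normal(size=(n, dim)) for k, d in dirs.items()}
bonded = shared + 3 * dirs["A"] + 3 * dirs["B"] + 0.2 * rng.normal(size=(n, dim))

raw = {k: cos(bonded.mean(axis=0), v.mean(axis=0)) for k, v in single.items()}
print("raw cosine:", {k: round(s, 3) for k, s in raw.items()})
print("centered (component, unrelated):", centered_similarity(bonded, single, ["A", "B"]))
```

Centering changes the reference point, so centered similarities are not directly comparable to the raw values printed by `run_experiment_for_model`.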

## 6. Run Experiment

In [None]:
# Define experimental conditions
SINGLE_AQ = ["THREAT", "PROXIMITY", "DIRECTION", "URGENCY", "AGENT_INTENT", "RESOURCE"]

BONDED_CONDITIONS = [
    ["THREAT", "PROXIMITY"],
    ["THREAT", "DIRECTION"],
    ["THREAT", "URGENCY"],
    ["PROXIMITY", "DIRECTION"],
    ["THREAT", "PROXIMITY", "DIRECTION"],
    ["THREAT", "PROXIMITY", "URGENCY"],
    ["THREAT", "AGENT_INTENT", "PROXIMITY"],
]

print(f"Single AQ conditions: {len(SINGLE_AQ)}")
print(f"Bonded conditions: {len(BONDED_CONDITIONS)}")
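
A quick coverage check on these conditions:

- They test 4 of the 15 possible AQ pairs and 3 of the 20 possible triples.
- THREAT is a component in 6 of the 7 conditions.
- RESOURCE is never a component, so it only ever serves as an unrelated control.

The sketch recomputes those counts and the number of forward passes per model at the default `prompts_per_condition = 100`.

```python
from collections import Counter
from itertools import combinations

# Copied from the cell above so this sketch runs on its own
SINGLE_AQ = ["THREAT", "PROXIMITY", "DIRECTION", "URGENCY", "AGENT_INTENT", "RESOURCE"]
BONDED_CONDITIONS = [
    ["THREAT", "PROXIMITY"],
    ["THREAT", "DIRECTION"],
    ["THREAT", "URGENCY"],
    ["PROXIMITY", "DIRECTION"],
    ["THREAT", "PROXIMITY", "DIRECTION"],
    ["THREAT", "PROXIMITY", "URGENCY"],
    ["THREAT", "AGENT_INTENT", "PROXIMITY"],
]

tested = {frozenset(c) for c in BONDED_CONDITIONS}
pairs = [frozenset(c) for c in combinations(SINGLE_AQ, 2)]
triples = [frozenset(c) for c in combinations(SINGLE_AQ, 3)]
pairs_tested = sum(c in tested for c in pairs)
triples_tested = sum(c in tested for c in triples)

# How often each AQ serves as a component across bonded conditions
component_counts = Counter(aq for cond in BONDED_CONDITIONS for aq in cond)

prompts_per_condition = 100  # config.prompts_per_condition
total_prompts = prompts_per_condition * (len(SINGLE_AQ) + len(BONDED_CONDITIONS))

print(f"pairs tested:   {pairs_tested}/{len(pairs)}")
print(f"triples tested: {triples_tested}/{len(triples)}")
print("component counts:", {aq: component_counts[aq] for aq in SINGLE_AQ})
print(f"forward passes per model: {total_prompts}")
```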

In [None]:
def run_experiment_for_model(model_name: str, model_path: str) -> Dict:
    """Run full experiment for one model.
    
    Args:
        model_name: Display name
        model_path: HuggingFace path
        
    Returns:
        Dict with all results
    """
    print(f"\n{'='*60}")
    print(f"Running experiment for: {model_name}")
    print(f"{'='*60}")
    
    # Load model
    model, tokenizer, n_layers = load_model(model_path)
    
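    # Select hidden-state indices to probe (early, middle, late). hidden_states holds
    # n_layers + 1 tensors: index 0 is the embedding output, index k follows block k.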
    layers = [0, n_layers // 4, n_layers // 2, 3 * n_layers // 4, n_layers - 1]
    print(f"Probing layers: {layers}")
    
    results = {
        "model": model_name,
        "layers": layers,
        "layer_results": {}
    }
    
    # Generate prompts
    print("\nGenerating prompts...")
    single_prompts = {aq: generate_single_aq_prompts(aq, config.prompts_per_condition) 
                     for aq in SINGLE_AQ}
    bonded_prompts = {tuple(cond): generate_bonded_prompts(cond, config.prompts_per_condition)
                     for cond in BONDED_CONDITIONS}
    
    # Extract activations for single AQ
    print("\nExtracting single AQ activations...")
    single_activations = {}  # {layer: {aq_name: activations}}
    for layer in layers:
        single_activations[layer] = {}
    
    for aq_name, prompts in single_prompts.items():
        print(f"  {aq_name}...")
        acts = extract_activations(model, tokenizer, prompts, layers)
        for layer in layers:
            single_activations[layer][aq_name] = acts[layer]
    
    # Extract activations for bonded conditions
    print("\nExtracting bonded activations...")
    bonded_activations = {}  # {layer: {condition: activations}}
    for layer in layers:
        bonded_activations[layer] = {}
    
    for condition, prompts in bonded_prompts.items():
        print(f"  {' + '.join(condition)}...")
        acts = extract_activations(model, tokenizer, prompts, layers)
        for layer in layers:
            bonded_activations[layer][condition] = acts[layer]
    
    # Analyze each layer
    print("\nAnalyzing decomposition...")
    for layer in layers:
        print(f"\nLayer {layer}:")
        layer_results = {
            "similarity_tests": {},
            "probe_tests": {}
        }
        
        # Compute centroids for single AQ
        centroids = compute_aq_centroids(single_activations[layer])
        
        # Test each bonded condition
        for condition, acts in bonded_activations[layer].items():
            condition_name = " + ".join(condition)
            
            # Component vs unrelated similarity
            component_centroids = [centroids[aq] for aq in condition]
            unrelated_aq = [aq for aq in SINGLE_AQ if aq not in condition]
            unrelated_centroids = [centroids[aq] for aq in unrelated_aq]
            
            sim_results = component_similarity_test(acts, component_centroids, unrelated_centroids)
            layer_results["similarity_tests"][condition_name] = sim_results
            
            print(f"  {condition_name}: comp_sim={sim_results['mean_component_sim']:.3f}, "
                  f"unrel_sim={sim_results['mean_unrelated_sim']:.3f}, "
                  f"diff={sim_results['sim_difference']:.4f}")
            
            # Linear probe detection
            probe_results = linear_probe_detection(
                single_activations[layer], acts, list(condition)
            )
            layer_results["probe_tests"][condition_name] = probe_results
        
        results["layer_results"][layer] = layer_results
    
    # Cleanup
    del model
    gc.collect()
    if DEVICE == "cuda":
        torch.cuda.empty_cache()
    
    return results


# Run for all models
all_results = {}
for model_name, model_path in config.models.items():
    try:
        all_results[model_name] = run_experiment_for_model(model_name, model_path)
    except Exception as e:
        print(f"Error with {model_name}: {e}")
        continue
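
`linear_probe_detection` subsamples negatives with `np.random.choice`, which draws from the global NumPy RNG. The selected negatives therefore depend on everything that touched that state earlier in the session. Below is a minimal sketch of a seeded alternative using a local `np.random.default_rng`. `balanced_negative_subsample` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np


def balanced_negative_subsample(negative_acts: np.ndarray, n_pos: int, seed: int = 42) -> np.ndarray:
    """Return at most n_pos rows of negative_acts, chosen without replacement.

    A local Generator keeps the draw independent of global NumPy RNG state.
    """
    if len(negative_acts) <= n_pos:
        return negative_acts
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(negative_acts), size=n_pos, replace=False)
    return negative_acts[idx]


neg = np.arange(50).reshape(25, 2)  # 25 toy "activations" of width 2
sample = balanced_negative_subsample(neg, 10, seed=0)
print(sample.shape)
```

Inside `linear_probe_detection`, it could replace the subsampling block, e.g. `negative_acts = balanced_negative_subsample(negative_acts, len(positive_acts), seed=config.random_seed)`.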

## 7. Results Visualization

In [None]:
def plot_similarity_analysis(results: Dict) -> None:
    """Plot component vs unrelated similarity across layers."""
    
    n_models = len(results)
    fig, axes = plt.subplots(1, n_models, figsize=(7 * n_models, 5))
    if n_models == 1:
        axes = [axes]
    
    for ax, (model_name, model_results) in zip(axes, results.items()):
        layers = model_results["layers"]
        
        # Collect similarity differences per layer
        layer_diffs = {layer: [] for layer in layers}
        
        for layer in layers:
            for cond_name, sim_res in model_results["layer_results"][layer]["similarity_tests"].items():
                layer_diffs[layer].append(sim_res["sim_difference"])
        
        # Plot
        means = [np.mean(layer_diffs[l]) for l in layers]
        stds = [np.std(layer_diffs[l]) for l in layers]
        
        ax.errorbar(layers, means, yerr=stds, marker='o', capsize=5, linewidth=2, markersize=8)
        ax.axhline(y=0, color='r', linestyle='--', alpha=0.5, label='No difference')
        ax.set_xlabel('Layer', fontsize=12)
        ax.set_ylabel('Component - Unrelated Similarity', fontsize=12)
        ax.set_title(f'{model_name}\nComponent Similarity Advantage', fontsize=14)
        ax.legend()
        ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('035F_similarity_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()


if all_results:
    plot_similarity_analysis(all_results)

In [None]:
def plot_probe_detection(results: Dict) -> None:
    """Plot linear probe detection rates."""
    
    for model_name, model_results in results.items():
        layers = model_results["layers"]
        
        # Use the deepest probed index (n_layers - 1: output of the penultimate block)
        final_layer = layers[-1]
        probe_data = model_results["layer_results"][final_layer]["probe_tests"]
        
        # Collect detection rates
        fig, axes = plt.subplots(2, 4, figsize=(16, 8))
        axes = axes.flatten()
        
        for idx, (condition_name, aq_results) in enumerate(probe_data.items()):
            if idx >= len(axes):
                break
            
            ax = axes[idx]
            
            aq_names = list(aq_results.keys())
            detection_probs = [aq_results[aq]["bonded_detection_prob"] for aq in aq_names]
            is_component = [aq_results[aq]["is_component"] for aq in aq_names]
            
            colors = ['green' if ic else 'gray' for ic in is_component]
            
            bars = ax.bar(range(len(aq_names)), detection_probs, color=colors)
            ax.set_xticks(range(len(aq_names)))
            ax.set_xticklabels([aq[:4] for aq in aq_names], rotation=45, ha='right')
            ax.set_ylabel('Detection Probability')
            ax.set_title(f'{condition_name}')
            ax.set_ylim(0, 1)
            ax.axhline(y=0.5, color='r', linestyle='--', alpha=0.5)
        
        # Remove empty subplots
        for idx in range(len(probe_data), len(axes)):
            axes[idx].axis('off')
        
        plt.suptitle(f'{model_name} - Layer {final_layer}\nAQ Detection in Bonded States\n(Green = component AQ, Gray = unrelated AQ)', 
                     fontsize=14)
        plt.tight_layout()
        plt.savefig(f'035F_probe_detection_{model_name}.png', dpi=150, bbox_inches='tight')
        plt.show()


if all_results:
    plot_probe_detection(all_results)

In [None]:
def plot_detection_summary(results: Dict) -> None:
    """Summary plot: component vs non-component detection rates."""
    
    fig, axes = plt.subplots(1, len(results), figsize=(6 * len(results), 5))
    if len(results) == 1:
        axes = [axes]
    
    for ax, (model_name, model_results) in zip(axes, results.items()):
        layers = model_results["layers"]
        
        component_rates = []
        noncomponent_rates = []
        
        for layer in layers:
            layer_comp = []
            layer_noncomp = []
            
            for condition_name, aq_results in model_results["layer_results"][layer]["probe_tests"].items():
                for aq_name, res in aq_results.items():
                    if res["is_component"]:
                        layer_comp.append(res["bonded_detection_prob"])
                    else:
                        layer_noncomp.append(res["bonded_detection_prob"])
            
            component_rates.append(np.mean(layer_comp))
            noncomponent_rates.append(np.mean(layer_noncomp))
        
        ax.plot(layers, component_rates, 'g-o', label='Component AQ', linewidth=2, markersize=8)
        ax.plot(layers, noncomponent_rates, 'gray', marker='s', linestyle='--', 
                label='Non-component AQ', linewidth=2, markersize=8)
        ax.axhline(y=0.5, color='r', linestyle=':', alpha=0.5, label='Chance')
        
        ax.set_xlabel('Layer', fontsize=12)
        ax.set_ylabel('Mean Detection Probability', fontsize=12)
        ax.set_title(f'{model_name}\nAQ Decomposition Evidence', fontsize=14)
        ax.legend()
        ax.set_ylim(0, 1)
        ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.savefig('035F_detection_summary.png', dpi=150, bbox_inches='tight')
    plt.show()


if all_results:
    plot_detection_summary(all_results)
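
The summary plot shows mean detection probabilities without uncertainty, and `ExperimentConfig.n_bootstrap` is defined but never used. One hedged option is a percentile bootstrap for the difference between component and non-component detection. The per-(condition, AQ) probabilities are not independent, because every condition's probes are trained on the same single-AQ activations. Such an interval is therefore likely optimistic. The numbers in the example are made up for illustration, and `bootstrap_diff_ci` is a hypothetical helper.

```python
import numpy as np
from typing import Sequence, Tuple


def bootstrap_diff_ci(a: Sequence[float], b: Sequence[float], n_bootstrap: int = 1000,
                      alpha: float = 0.05, seed: int = 42) -> Tuple[float, float, float]:
    """Percentile bootstrap CI for mean(a) - mean(b); each group resampled independently."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        # Resample each group with replacement, keeping its original size
        diffs[i] = rng.choice(a, size=a.size).mean() - rng.choice(b, size=b.size).mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(a.mean() - b.mean()), float(lo), float(hi)


# Made-up detection probabilities, for illustration only
comp = [0.9, 0.8, 0.85, 0.95]
noncomp = [0.2, 0.3, 0.25, 0.1]
point, lo, hi = bootstrap_diff_ci(comp, noncomp, n_bootstrap=1000, seed=0)
print(f"difference = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

In the notebook, `a` and `b` would be the `layer_comp` and `layer_noncomp` lists for one layer, with `n_bootstrap=config.n_bootstrap`.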

## 8. Statistical Summary

In [None]:
def compute_effect_sizes(results: Dict) -> None:
    """Compute and report effect sizes for compositional bonding."""
    
    print("\n" + "="*70)
    print("STATISTICAL SUMMARY: AQ COMPOSITIONAL BONDING")
    print("="*70)
    
    for model_name, model_results in results.items():
        print(f"\n### {model_name} ###")
        
        final_layer = model_results["layers"][-1]
        
        # Similarity analysis
        print(f"\nSimilarity Analysis (Layer {final_layer}):")
        sim_tests = model_results["layer_results"][final_layer]["similarity_tests"]
        
        all_diffs = []
        for cond_name, res in sim_tests.items():
            diff = res["sim_difference"]
            all_diffs.append(diff)
            p_val = res.get("p_value", np.nan)
            sig = "*" if p_val < 0.05 else ""
            print(f"  {cond_name}: diff = {diff:.4f} {sig}")
        
        mean_diff = np.mean(all_diffs)
        print(f"  Mean difference: {mean_diff:.4f}")
        
        # Probe detection analysis
        print(f"\nProbe Detection Analysis (Layer {final_layer}):")
        probe_tests = model_results["layer_results"][final_layer]["probe_tests"]
        
        all_comp_rates = []
        all_noncomp_rates = []
        
        for cond_name, aq_results in probe_tests.items():
            for aq_name, res in aq_results.items():
                if res["is_component"]:
                    all_comp_rates.append(res["bonded_detection_prob"])
                else:
                    all_noncomp_rates.append(res["bonded_detection_prob"])
        
        mean_comp = np.mean(all_comp_rates)
        mean_noncomp = np.mean(all_noncomp_rates)
        
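        # Cohen's d using the pooled sample standard deviation (groups differ in size)
        n_comp, n_noncomp = len(all_comp_rates), len(all_noncomp_rates)
        pooled_std = np.sqrt(((n_comp - 1) * np.var(all_comp_rates, ddof=1) +
                              (n_noncomp - 1) * np.var(all_noncomp_rates, ddof=1)) /
                             (n_comp + n_noncomp - 2))
        if pooled_std > 0:
            cohens_d = (mean_comp - mean_noncomp) / pooled_std
        else:
            cohens_d = 0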
        
        # t-test
        t_stat, p_val = stats.ttest_ind(all_comp_rates, all_noncomp_rates)
        
        print(f"  Component AQ detection: {mean_comp:.3f} +/- {np.std(all_comp_rates):.3f}")
        print(f"  Non-component AQ detection: {mean_noncomp:.3f} +/- {np.std(all_noncomp_rates):.3f}")
        print(f"  Difference: {mean_comp - mean_noncomp:.3f}")
        print(f"  Cohen's d: {cohens_d:.3f}")
        print(f"  t-statistic: {t_stat:.3f}, p-value: {p_val:.6f}")
        
        # Interpretation
        print(f"\nInterpretation:")
        if mean_diff > 0.01 and cohens_d > 0.5 and p_val < 0.05:
            print(f"  SUPPORTS compositional bonding hypothesis.")
            print(f"  Bonded states show detectable component signatures.")
        elif mean_diff > 0 and cohens_d > 0.2:
            print(f"  WEAK evidence for compositional bonding.")
            print(f"  Some component detection, but effect is small.")
        else:
            print(f"  DOES NOT support compositional bonding hypothesis.")
            print(f"  No significant difference in component vs non-component detection.")


if all_results:
    compute_effect_sizes(all_results)
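
For reference, below is the conventional pooled-standard-deviation form of Cohen's d for the component (c) vs non-component (n) detection probabilities. At each layer the probe results contain $n_c = 17$ component and $n_n = 25$ non-component entries (4 pairs with 2 components and 3 triples with 3, out of $7 \times 6 = 42$). A formula that simply averages the two variances is therefore only an approximation here. All entries come from probes trained on the same single-AQ activations, so they are not independent samples, and the accompanying t-test p-value is best read descriptively.

$$
d = \frac{\bar{x}_c - \bar{x}_n}{s_p}, \qquad
s_p = \sqrt{\frac{(n_c - 1)\, s_c^2 + (n_n - 1)\, s_n^2}{n_c + n_n - 2}}
$$

$$
n_c = 4 \cdot 2 + 3 \cdot 3 = 17, \qquad n_n = 42 - 17 = 25, \qquad
s_p = \sqrt{\frac{16\, s_c^2 + 24\, s_n^2}{40}}
$$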

## 9. Conclusions

This experiment tests whether AQ are **compositional primitives** by examining
whether bonded states (multiple AQ in one prompt) decompose into detectable
component signatures.

**Key Questions:**

1. **Similarity Analysis**: Do bonded patterns show higher similarity to their
   component AQ centroids than to unrelated AQ?
   - Positive difference = evidence for compositional structure
   
2. **Linear Probe Detection**: Can we detect the presence of each component AQ
   within the bonded representation?
   - Component detection > non-component detection = evidence for decomposition

**Implications for AKIRA Theory:**

If compositional bonding is confirmed:
- AQ function as **irreducible primitives** that combine to form complex action
  representations
- The belief field processes AQ compositionally, maintaining component information
- This supports the "superposition to crystallization" model where multiple AQ
  can coexist and be independently detected

If compositional bonding is NOT confirmed:
- AQ may be **emergent patterns** rather than compositional primitives
- Bonding may create genuinely new representations rather than preserving components
- Alternative models would be needed to explain how multiple action discriminations combine