#**CHAPTER 2. MULTIMODAL FAILURE MODES**
---

##REFERENCE

https://chatgpt.com/share/699466a2-db08-8012-94aa-81e27ba6d44a

##0.CONTEXT

**Introduction**

This chapter is about an uncomfortable truth: multimodal alignment is not a property you “get” once and then keep. It is a fragile arrangement between measurement channels. When it works, it looks like magic—images and text become commensurable, meaning appears to “live” in the shared embedding space, and retrieval feels effortless. When it fails, it rarely fails loudly. It fails quietly: by becoming asymmetric, by over-trusting one modality, by learning shortcuts that look like meaning, or by collapsing into a geometry that still produces a plausible loss curve but no longer preserves structure.  

The aim of Chapter 2 is therefore not to celebrate multimodality. It is to teach students how to distrust it intelligently. We treat alignment as an engineered hypothesis, not as a metaphysical achievement. We show how the shared space can be driven by dominance, corrupted by confounding, destabilized by pairing noise, and destroyed by collapse. Then we show how to detect those failure modes using probes that are structural rather than cosmetic—probes that interrogate geometry, information flow, and sensitivity to perturbations.  

The pedagogy is intentionally rigorous: you will learn to interpret the embedding space as an object with failure signatures, not as a picture with vibes. You will also learn a professional stance: every multimodal claim must be paired with evidence artifacts and explicit open questions. This chapter is therefore a bridge between frontier capability and governance-first practice.

**Part 1. The Theory**

The central theoretical idea behind multimodal learning is that different observation channels can be brought into a compatible coordinate system. But “compatible” is a stronger word than it sounds. It means that two things are simultaneously true:

First, items that are truly paired—an image and its text description, an audio clip and its transcript, a medical scan and its report—should land near each other in the shared space. That is the most visible, operational face of alignment: retrieval.  

Second, the shared space should preserve the meaningful structure of the world that generated those observations. This is the deeper, less visible face of alignment: the embedding space should represent latent factors in a way that is stable under natural variations and resistant to shortcuts.

Contrastive learning, especially in the InfoNCE family, is a mechanism for building that space. In a batch of pairs, the model is trained so that each item is more similar to its true partner than to mismatched partners. This seems straightforward: “pull positives together, push negatives apart.” But the geometry of that operation depends on what counts as a negative. If negatives are easy—obviously unrelated—the model can win without learning anything deep. If negatives are hard—similar but incorrect—the model is forced to represent subtle distinctions. In other words, the contrastive objective is not just a loss function; it is a curriculum defined by the negative set.

This is where the fragility enters. Alignment does not guarantee that the model learned the intended factors. It guarantees only that it learned whatever features make positives distinguishable from negatives. If there exists a shared shortcut that predicts pairing—something present in both modalities but unrelated to the meaning we care about—the model can align on that. This is spurious alignment: the model appears aligned because retrieval is high, but the space is aligned to the wrong thing.

Similarly, if one modality is “cleaner” or “stronger” in a statistical sense—lower noise, higher capacity encoder, larger gradient signal—the shared space can become dominated. Dominance means the space becomes effectively anchored in the strong modality, while the other modality is forced into it with reduced influence. This often yields asymmetry: text can retrieve images well, but images retrieve text poorly, or vice versa. Crucially, dominance can look acceptable under a single metric, especially if you evaluate only one direction.

Finally, there is collapse. Collapse is not mere underperformance. It is a geometrical degeneration where embeddings lose diversity—many points become too similar, variance shrinks, effective rank collapses. Contrastive learning has incentives that can accidentally push toward collapse under certain hyperparameter regimes: temperature too low makes softmax distributions too sharp; learning rate too high makes updates unstable; excessive regularization can suppress representational diversity; numerical issues can flatten gradients. Collapse can be subtle because the training loss may still decrease. The system is optimizing something, but not the thing you think.

Therefore the theory lesson is this: multimodal alignment is a negotiated geometry shaped by objectives, batch composition, modality statistics, and optimizer dynamics. It is not a guarantee of semantic truth. It is a hypothesis: “the shared space represents the intended factors.” Chapter 2 teaches how to attack that hypothesis and how to measure when it fails.

**Part 2. Definitions of Key Ideas (Latent Space, Alignment, Dominance, Confounding, Collapse)**

**Latent factors** are the hidden causes that generate the observations we see. In our synthetic world, these are explicit: shape type, orientation, frequency, phase, thickness. In real-world multimodal systems, latent factors might include object identity, lighting, camera angle, speaker identity, accent, clinical condition severity, or domain context.

**Latent space** is a representational coordinate system that encodes those latent factors. It is “latent” because the coordinates are not directly observed in the raw data; they are inferred by the model. In multimodal learning, the latent space is shared: images and text are mapped into the same coordinate system so that comparisons are meaningful.

**Embedding** is the concrete representation of a single item in that latent space. It is typically a vector in a fixed-dimensional Euclidean space. When we say “the embedding space has geometry,” we mean that distances and angles between vectors have meaning: near means related, far means distinct, direction means factor change, and clusters imply shared structure.

**Alignment** is the property that paired items map to nearby embeddings and that the mapping is consistent across modalities. Consistency includes symmetry: an image should find its text partner and a text should find its image partner. Alignment is therefore not only proximity; it is reciprocal retrievability.

**Contrastive objective (InfoNCE)** is the training mechanism that encourages alignment by using a batch-level discrimination task. Each item must identify its correct partner among many candidates. The “temperature” parameter scales logits and controls how sharply the model must separate positives from negatives.

**Modality dominance** occurs when one modality exerts disproportionate influence on training dynamics or on the learned geometry. Dominance can arise from lower noise, higher model capacity, stronger gradients, or explicit scaling (gain) that changes embedding magnitude before normalization. Dominance typically manifests as asymmetry, gradient imbalance, or skewed spectral properties between modalities.

**Confounding / spurious correlation** refers to a shared feature that predicts pairing but is unrelated to the intended latent factors. In practice, confounders can be metadata artifacts, watermarks, file naming conventions, dataset ordering, or repeated templates. In our synthetic lab, a confounder can be a parity marker injected into both image and text. The model can align on this marker and achieve high retrieval while ignoring the true factors.

**Pairing corruption** is the presence of incorrect pairs in training data, such as mismatched captions or wrong audio transcripts. Corruption can destabilize training, encourage shortcut learning, or produce representations that look “robust” to noise only because they stopped caring about the intended semantics.

**Representation collapse** is a degeneracy where embeddings lose diversity and become too similar. Operationally, collapse is detected by high average cosine similarity between different samples, low per-dimension variance, low effective rank, and a flattened singular value spectrum of the covariance matrix. Collapse is not just “bad accuracy”; it is “bad geometry.”

**Structural probes** are diagnostic measurements that interrogate internal structure rather than surface metrics. In this chapter we use three probes: an information proxy (how much embedding codes relate to true factors), a correlation proxy (CCA-like principal correlations between modalities), and a sensitivity proxy (how much one update step changes retrieval under a perturbation). Probes are designed to answer: “What is the model actually representing, and how fragile is it?”

**Part 3. Methodology**

The methodology of Chapter 2 is adversarial by design. We start from a baseline aligner that works, then we attack it with controlled interventions. The core methodological commitment is that every failure mode must be produced deliberately and measured with multiple, independent indicators. We do not accept a single metric as proof of anything.

**Step 1: Build a controlled synthetic multimodal world**  
We construct two modalities from shared latent factors. The image modality is a small matrix with controllable patterns. The text modality is a deterministic token grammar encoding the same factors. This ensures there is a correct alignment and that we know what “meaning” should be.

**Step 2: Train a baseline aligner with symmetry**  
We implement two encoders (2-layer MLPs), one per modality, mapping inputs to normalized embeddings. We train using a symmetric InfoNCE objective so both directions are enforced. Baseline evaluation includes retrieval metrics in both directions, symmetry gap, and geometric statistics.

**Step 3: Instrument the training process with early warning indicators**  
We log trends in embedding variance, mean off-diagonal cosine similarity, effective rank, and gradient norms per modality. The point is not to create more plots; it is to detect the onset of failure as a process, not only as an outcome. Many failures begin as subtle drifts.

**Step 4: Induce modality dominance and measure asymmetry**  
We create dominance by scaling one modality, reducing noise for one modality, or increasing capacity on one side. We measure retrieval asymmetry, gradient imbalance indices, and differences in spectral summaries. Students learn that dominance is not an abstract idea; it is a measurable skew in learning dynamics.

**Step 5: Induce spurious alignment via confounders and validate with counterfactual removal**  
We inject a shared confounder into both modalities that strongly predicts pairing. We then test whether the model relied on it by removing the confounder at test time. If retrieval collapses under removal, the alignment was shortcut-based. We also track whether mutual information with true factors decreases when confounding increases.

**Step 6: Inject pairing corruption and observe stability**  
We swap a fraction of training pairs. We then observe how loss, retrieval, and probes react. Pair corruption can create a space that is internally inconsistent: it may learn partial shortcuts, degrade factor encoding, or become hypersensitive to batch composition.

**Step 7: Induce collapse and detect it structurally**  
We manipulate temperature, learning rate, weight decay, and even simulate a “stop-gradient” bug to force collapse-like behavior. We do not treat collapse as a mystical phenomenon; we treat it as an engineering failure with known triggers and measurable signatures.

**Step 8: Use structural probes to interpret results**  
Retrieval tells you whether matching works. Probes tell you what is being matched. We compute a mutual-information proxy between embeddings and true factors, a CCA-like correlation proxy between modality embedding sets, and an influence proxy that estimates sensitivity to one update step under perturbation. These probes triangulate meaning, alignment strength, and fragility.

**Step 9: Export all results as governed deliverables**  
Every run writes a manifest, logs, strict JSON reports, and plots. This is part of the pedagogy: students learn that interpretation is not complete until it is reviewable. The chapter is not only about models; it is about producing evidence.

**Part 4. Deliverables**

This chapter is built as a laboratory, and the outputs are structured accordingly. Each notebook run produces an audit bundle that can be handed to a reviewer or used in class to compare runs.

**Run manifest**  
A JSON file that records the run identifier, timestamp, environment fingerprint, configuration hash, and artifact list. This is the “index of evidence.”

**Prompts log**  
A JSONL file containing redacted prompt summaries and hashes. The purpose is traceability: you can verify what instruction set produced the run without leaking unnecessary text.

**Risk log**  
A JSON file documenting the risk taxonomy relevant to multimodal alignment: confounding, dominance, collapse, leakage, and metric gaming. It includes the controls implemented in the notebook and explicit open questions. The verification status is always “Not verified” to enforce epistemic humility.

**Baseline report**  
A strict JSON report containing baseline retrieval metrics (both directions), symmetry gap, embedding variance and cosine indicators, spectral summaries, and structural probes (MI proxy, CCA proxy, influence proxy). This establishes the reference state.

**Experiment Suite A artifacts (dominance and asymmetric noise)**  
A grid of conditions and their outcomes, exported as JSON plus plots showing degradation curves and symmetry gaps. This teaches students to think in surfaces and response curves rather than in single numbers.

**Experiment Suite B artifacts (confounder and corruption)**  
A grid of confounder strengths and corruption rates, with counterfactual removal results. The central deliverable is the signature of spurious alignment: high retrieval when confounder is present, collapse of retrieval when confounder is removed, and MI shift toward the confounder.

**Collapse report**  
A JSON report plus plots showing variance, mean cosine, effective rank, and retrieval behavior under collapse triggers. This teaches students to recognize collapse as a geometric phenomenon, not merely as a performance drop.

**Summary report**  
A compact strict JSON summary consolidating the most important outcomes across all suites: best/worst symmetry gaps, largest counterfactual drops, and strongest collapse indicators. This becomes the classroom discussion anchor: students can debate which failure modes are most dangerous and which diagnostics are most informative.

**Zipped audit bundle**  
A single archive containing all the above. The bundle is the final deliverable because it allows review and comparison across runs. In practice, this is how you build a curriculum: you distribute bundles, not screenshots.

By the end of this chapter, students should be able to do more than train a multimodal aligner. They should be able to challenge it. They should know what questions to ask when someone presents a beautiful embedding plot. They should know how to detect when “alignment” is actually dominance or confounding. Most importantly, they should learn the frontier posture that matters in professional settings: treat multimodal learning as a fragile engineered hypothesis, and demand evidence that survives adversarial tests.


##1.LIBRARIES AND ENVIRONMENT

**Cell 1 — Runtime Contract, Determinism, and Configuration**

This cell is the “lab contract.” Before we touch multimodal learning, we decide what kind of knowledge we are allowed to claim. In Chapter 2, that matters because many multimodal failures are subtle and can be mistaken for “randomness.” Determinism is how we turn subtle drift into something we can reproduce, diagnose, and fix.

The core idea is that experiments are only meaningful if you can rerun them and get the same evidence. So we lock seeds for Python and NumPy, and we centralize configuration in dataclasses. This is not cosmetic. A multimodal system can look stable in one run and collapse in another purely due to initialization or batch order. If you do not control that, you cannot distinguish “true failure mode” from “noise of experimentation.”

Configuration dataclasses also teach an engineering habit: we treat design choices as explicit parameters, not hidden constants scattered through the code. When we later change temperature, noise, confounder strength, or learning rate, we want those changes to be visible and attributable. That is the difference between a demo and a governed experiment.

Path setup is similarly purposeful. Every run must write artifacts to a predictable location so that review does not depend on memory. This matters because Chapter 2 is about failure modes: if an experiment shows dominance or spurious alignment, you want to preserve the evidence (plots, metrics, logs) even if you later refactor the notebook.

Finally, this cell sets the tone for pedagogy: we are not doing “model training” as an end in itself. We are doing “controlled scientific experiments” on alignment. If students internalize only one lesson from this cell, it should be this: in multimodality, you cannot reason confidently about geometry unless your run is reproducible. The embedding space is not a picture; it is a measurement, and measurements require controls.


In [5]:
# === Cell 1 ===
# Title: Runtime Contract, Determinism, and Configuration
# Brief Explanation: Establish reproducible execution (seeds, paths, config) so every geometry claim is rerunnable and auditable.

import os, sys, json, math, time, zipfile, hashlib, random, datetime, platform
from dataclasses import dataclass, asdict
from typing import Dict, Any, Tuple, List, Optional

import numpy as np
import matplotlib.pyplot as plt

def utc_now_iso() -> str:
    return datetime.datetime.now(datetime.timezone.utc).isoformat()

def set_global_determinism(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)

def sha256_bytes(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()

def stable_json_dumps(obj: Any) -> str:
    return json.dumps(obj, ensure_ascii=False, sort_keys=True, separators=(",", ":"))

def ensure_dir(p: str) -> None:
    os.makedirs(p, exist_ok=True)

@dataclass(frozen=True)
class Paths:
    root: str = "deliverables"
    plots: str = "deliverables/plots"
    ckpt: str = "deliverables/checkpoints"
    runs: str = "deliverables/runs"
    exp: str = "deliverables/experiments"
    stress: str = "deliverables/stress"
    gates: str = "deliverables/gates"

@dataclass(frozen=True)
class DataConfig:
    n: int = 2400
    image_side: int = 16
    text_seq_len: int = 9
    vocab_size: int = 64
    noise_image: float = 0.06
    noise_text: float = 0.02
    train_frac: float = 0.70
    val_frac: float = 0.15
    # Confounder controls
    confounder_enabled: bool = False
    confounder_strength: float = 1.0   # scaling of confounder channel in both modalities
    confounder_type: str = "parity"    # "parity" or "batch_id"
    pairing_corruption_rate: float = 0.0

@dataclass(frozen=True)
class ModelConfig:
    embed_dim: int = 48
    hidden: int = 256
    leak: float = 0.01
    temp: float = 0.07
    # Dominance controls
    gain_img: float = 1.0     # multiplicative gain pre-normalization
    gain_txt: float = 1.0
    cap_img_hidden_mult: float = 1.0  # allow capacity mismatch
    cap_txt_hidden_mult: float = 1.0

@dataclass(frozen=True)
class TrainConfig:
    seed: int = 7
    epochs: int = 45
    batch: int = 160
    lr: float = 2e-3
    weight_decay: float = 8e-5
    b1: float = 0.9
    b2: float = 0.999
    eps: float = 1e-8
    grad_clip: float = 5.0
    # For sweeps / collapse induction
    short_epochs: int = 18

@dataclass(frozen=True)
class ProbeConfig:
    mi_bins: int = 4
    mi_dims: int = 8
    cca_topk: int = 8
    influence_eval_n: int = 256
    influence_batch_n: int = 128
    ridge: float = 1e-3

@dataclass(frozen=True)
class Config:
    paths: Paths = Paths()
    data: DataConfig = DataConfig()
    model: ModelConfig = ModelConfig()
    train: TrainConfig = TrainConfig()
    probe: ProbeConfig = ProbeConfig()

CFG = Config()
set_global_determinism(CFG.train.seed)
for p in asdict(CFG.paths).values():
    ensure_dir(p)

print("RUN CONTRACT")
print("  utc_now:", utc_now_iso())
print("  python :", sys.version.split()[0])
print("  numpy  :", np.__version__)
print("  platform:", platform.platform())
print("  seed   :", CFG.train.seed)
print("  outdir :", CFG.paths.root)


RUN CONTRACT
  utc_now: 2026-02-17T13:14:34.966389+00:00
  python : 3.12.12
  numpy  : 2.0.2
  platform: Linux-6.6.105+-x86_64-with-glibc2.35
  seed   : 7
  outdir : deliverables


##2.GOVERNANCE ARTIFACTS

###2.1.OVERVIEW

**Cell 2 — Governance Artifacts (Manifest, Strict JSON, Redacted Prompt Log, Risk Log)**

This cell turns the notebook into a reviewable laboratory. Multimodal alignment is a frontier topic precisely because it is easy to over-interpret. A single PCA plot can feel persuasive even when the model is relying on shortcuts. Governance artifacts are how we prevent persuasion from replacing evidence.

The run manifest is the index of evidence. It records what was run, when, under what environment fingerprint, and with what configuration hash. That means a reviewer can later ask: “Are these results from the same conditions as last week?” and you can answer with something stronger than memory. The manifest also lists artifacts and their hashes, so the bundle becomes tamper-evident: if a plot or report changes, the hash changes.

Strict JSON reporting matters because it enforces epistemic hygiene. In this chapter we constantly risk confusing measured facts (retrieval metrics, variance, effective rank) with interpretations (“the model learned meaning”). The schema forces separation: facts_provided is what was computed; assumptions declares what we are taking on faith; open_items names what remains unresolved; verification_status stays “Not verified” to avoid accidental claims of certainty.

The redacted prompt log is part of professional traceability. Even when the notebook is local, we treat “instructions that produced the run” as an audit object. The prompt is hashed and a redacted excerpt is stored. That practice scales to real production pipelines where prompts, configurations, and data versions must be traceable.

The risk log is the most important pedagogical artifact. It names the failure modes we are about to demonstrate—confounding, dominance, collapse, leakage, metric gaming—and records the controls implemented to detect them. In other words, we do not pretend that alignment is safe; we treat it as a risk-managed mechanism. This framing is exactly what students need when they leave the classroom and start evaluating vendor models or building internal systems.


###2.2.CODE AND IMPLEMENTATION

In [6]:
# === Cell 2 ===
# Title: Governance Artifacts (Manifest, Strict JSON, Redacted Prompt Log, Risk Log)
# Brief Explanation: Convert the notebook into a reviewable lab by writing auditable artifacts with explicit assumptions and open questions.

class ArtifactManager:
    def __init__(self, paths: Paths, cfg: Config):
        self.paths = paths
        self.cfg = cfg
        self.run_id = sha256_bytes(f"{utc_now_iso()}|{cfg.train.seed}|{os.getpid()}".encode())[:16]
        self.start_utc = utc_now_iso()
        self.prompts: List[Dict[str, Any]] = []
        self.risks: Dict[str, Any] = {}
        self.env: Dict[str, Any] = {
            "python": sys.version,
            "numpy": np.__version__,
            "platform": platform.platform(),
            "pid": os.getpid(),
            "cwd": os.getcwd(),
            "utc_start": self.start_utc,
        }
        self.cfg_hash = sha256_bytes(stable_json_dumps(asdict(cfg)).encode())
        self.manifest: Dict[str, Any] = {
            "run_id": self.run_id,
            "utc_start": self.start_utc,
            "config_hash_sha256": self.cfg_hash,
            "env": self.env,
            "artifacts": {},
            "verification_status": "Not verified",
        }

    @staticmethod
    def _redact_text(s: str, keep: int = 160) -> str:
        s = s.replace("\n", " ").strip()
        if len(s) <= keep:
            return s
        return s[:keep] + "…[REDACTED]"

    def log_prompt(self, name: str, prompt: str) -> None:
        red = self._redact_text(prompt)
        h = sha256_bytes(prompt.encode())
        self.prompts.append({
            "utc": utc_now_iso(),
            "name": name,
            "prompt_redacted": red,
            "prompt_sha256": h,
        })

    def write_json_strict(self, path: str,
                          facts_provided: Any,
                          assumptions: Any,
                          open_items: List[Any],
                          analysis: str,
                          draft_output: Any,
                          verification_status: str = "Not verified",
                          questions_to_verify: Optional[List[str]] = None) -> None:
        if questions_to_verify is None:
            questions_to_verify = []
        obj = {
            "facts_provided": facts_provided,
            "assumptions": assumptions,
            "open_items": open_items,
            "analysis": analysis,
            "draft_output": draft_output,
            "verification_status": verification_status,
            "questions_to_verify": questions_to_verify,
        }
        with open(path, "w", encoding="utf-8") as f:
            f.write(stable_json_dumps(obj))
        self.manifest["artifacts"][os.path.relpath(path, ".")] = {
            "sha256": sha256_bytes(open(path, "rb").read()),
            "bytes": os.path.getsize(path),
        }

    def write_prompts_log(self) -> str:
        path = os.path.join(self.paths.root, "prompts_log.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for rec in self.prompts:
                f.write(stable_json_dumps(rec) + "\n")
        self.manifest["artifacts"][os.path.relpath(path, ".")] = {
            "sha256": sha256_bytes(open(path, "rb").read()),
            "bytes": os.path.getsize(path),
        }
        return path

    def write_risk_log(self, risks_obj: Dict[str, Any]) -> str:
        path = os.path.join(self.paths.root, "risk_log.json")
        with open(path, "w", encoding="utf-8") as f:
            f.write(stable_json_dumps(risks_obj))
        self.manifest["artifacts"][os.path.relpath(path, ".")] = {
            "sha256": sha256_bytes(open(path, "rb").read()),
            "bytes": os.path.getsize(path),
        }
        return path

    def write_manifest(self) -> str:
        path = os.path.join(self.paths.root, "run_manifest.json")
        self.manifest["utc_end"] = utc_now_iso()
        self.manifest["verification_status"] = "Not verified"
        with open(path, "w", encoding="utf-8") as f:
            f.write(stable_json_dumps(self.manifest))
        return path

AM = ArtifactManager(CFG.paths, CFG)
AM.log_prompt(
    "Chapter2_Notebook_Request",
    "Build a governed multimodal failure-modes laboratory: dominance, spurious alignment, corruption, collapse; include MI/CCA/influence probes; export audit bundle."
)

RISKS = {
    "utc": utc_now_iso(),
    "run_id": AM.run_id,
    "verification_status": "Not verified",
    "risk_taxonomy": {
        "confounding_spurious_shortcuts": "Model aligns on nuisance shared marker rather than intended factors.",
        "dominance_modality_imbalance": "One modality drives gradients/geometry; other becomes passenger; asymmetry emerges.",
        "collapse_degenerate_embeddings": "Embeddings lose diversity (high mean cosine, low effective rank) yet loss may appear stable.",
        "train_test_leakage": "Pairs or confounders leak across splits, inflating retrieval.",
        "metric_gaming_overfit": "Optimizing retrieval@k can hide factor erasure; geometry becomes brittle.",
    },
    "controls_implemented": {
        "determinism": "Global seeds; fixed splits; reproducibility checks.",
        "split_hygiene": "Explicit train/val/test indices with overlap assertions.",
        "controlled_sweeps": "Grid experiments for dominance/noise/confounder/corruption/collapse.",
        "counterfactual_tests": "Remove confounder at test time to detect shortcut reliance.",
        "early_warning_indicators": "Variance, mean cosine, spectra, gradient norms logged per epoch.",
        "structural_probes": "MI proxy to factors; CCA-like principal correlations; influence/sensitivity proxy via one-step perturbation.",
    },
    "open_questions": [
        "How do these synthetic failure signatures map to real-world multimodal datasets and pretraining pipelines?",
        "Which controls best predict downstream brittleness under domain shift?",
        "What acceptance thresholds are appropriate for professional deployment contexts?",
    ],
}
print("Governance initialized:", AM.run_id)


Governance initialized: b85cceee04b1ea66


##3.SYNTHETIC MULTIMODAL WORLD

###3.1.OVERVIEW

**Cell 3 — Synthetic Multimodal World with True Factors, Optional Confounders, and Pair Corruption**

This cell constructs the world in which “meaning” is defined. That is the methodological trick that makes Chapter 2 possible: we build a dataset where we know the latent factors, so we can later check whether the model’s embeddings preserve them or drift toward shortcuts.

Two modalities are created from the same underlying causes. The image modality is generated as small matrices with controllable structure: shapes and oriented sinusoidal patterns that encode orientation, frequency, phase, and thickness. The text modality is a symbolic grammar that encodes the same factors in tokens, then converts tokens into deterministic feature vectors. The crucial pedagogical point is that we are not using an LLM to create text. That eliminates an entire layer of uncontrolled complexity. If the model fails, it fails because of the alignment mechanism, not because of language generation.

Then we add the critical “attack knobs.” A confounder is a shared feature injected into both modalities that predicts pairing but is unrelated to the intended factors. This is how we simulate watermark-like artifacts, metadata leakage, or dataset ordering cues. Students often believe confounding is rare; this construction shows how easy it is to create and how attractive it is for a contrastive objective.

Pair corruption is another reality simulation. In real multimodal datasets, some fraction of pairs are wrong—mislabeled captions, mismatched audio, swapped images. By allowing a controlled corruption rate, we can map robustness curves and observe how the training dynamics change as the supervision becomes inconsistent.

Finally, the split strategy and overlap assertions are not bookkeeping. They defend the integrity of the experiment. If train and test overlap, you can mistakenly “prove” robustness. Chapter 2 is about diagnosing failure honestly; therefore, split hygiene is part of the pedagogy.

Students should leave this cell understanding: alignment is only as trustworthy as the pairing process and the absence of shortcuts. This dataset is built to teach that truth with measurable ground truth.


###3.2.CODE AND IMPLEMENTATION

In [9]:
# === Cell 3 (FIXED) ===
# Title: Synthetic Multimodal World with True Factors + Optional Confounders + Pair Corruption
# Brief Explanation: Create paired modalities from shared latent causes, then optionally inject confounders/corruption to induce structural failure modes.

import numpy as np
from dataclasses import dataclass
from typing import Tuple, Dict, Any

def split_indices(n: int, train_frac: float, val_frac: float, seed: int) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    rng = np.random.RandomState(seed)
    idx = np.arange(n)
    rng.shuffle(idx)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    assert len(set(train) & set(val)) == 0
    assert len(set(train) & set(test)) == 0
    assert len(set(val) & set(test)) == 0
    return train, val, test

@dataclass(frozen=True)
class Factors:
    shape: np.ndarray
    orient: np.ndarray
    freq: np.ndarray
    phase: np.ndarray
    thick: np.ndarray
    confounder: np.ndarray

def make_factors(n: int, seed: int, confounder_type: str) -> Factors:
    rng = np.random.RandomState(seed)
    shape = rng.randint(0, 4, size=n)         # 4
    orient = rng.randint(0, 8, size=n)        # 8
    freq = rng.randint(0, 6, size=n)          # 6
    phase = rng.randint(0, 8, size=n)         # 8
    thick = rng.randint(0, 3, size=n)         # 3
    if confounder_type == "parity":
        conf = (np.arange(n) % 2).astype(np.int64)
    else:
        conf = (np.arange(n) // 7 % 8).astype(np.int64)
    return Factors(shape=shape, orient=orient, freq=freq, phase=phase, thick=thick, confounder=conf)

def render_images(f: Factors, dc, seed: int) -> np.ndarray:
    rng = np.random.RandomState(seed)
    n = f.shape.shape[0]
    S = int(dc.image_side)

    yy, xx = np.meshgrid(np.linspace(-1, 1, S), np.linspace(-1, 1, S), indexing="ij")

    # Rotated coordinate per sample (orientation)
    ang = (f.orient.astype(np.float32) / 8.0) * (2.0 * np.pi)
    ca = np.cos(ang)[:, None, None]
    sa = np.sin(ang)[:, None, None]
    xr = ca * xx[None, :, :] + sa * yy[None, :, :]

    # Base sinusoid controlled by freq + phase + orient
    fr = (1.5 + f.freq.astype(np.float32))[:, None, None]
    ph = (f.phase.astype(np.float32) / 8.0 * 2.0 * np.pi)[:, None, None]
    base = np.sin(fr * np.pi * xr + ph).astype(np.float32)

    # Shape prototypes (ALL MUST BE SAME SHAPE (S,S) BEFORE ADDING BATCH DIM)
    r = np.sqrt(xx**2 + yy**2).astype(np.float32)
    circle0 = (r < 0.75).astype(np.float32)                        # (S,S)
    square0 = ((np.abs(xx) < 0.75) & (np.abs(yy) < 0.75)).astype(np.float32)  # (S,S)
    diamond0 = ((np.abs(xx) + np.abs(yy)) < 1.05).astype(np.float32)          # (S,S)
    # IMPORTANT FIX: stripe prototype must be (S,S), not (n,S,S)
    # We define an axis-aligned stripe prototype; orientation effect still enters via base(xr).
    stripe0 = (np.abs(xx) < 0.45).astype(np.float32)               # (S,S)

    # Stack prototypes -> (4,S,S), then select per sample -> (n,S,S)
    masks = np.stack([circle0, square0, diamond0, stripe0], axis=0).astype(np.float32)  # (4,S,S)
    m = masks[f.shape]  # (n,S,S)

    # Thickness modifies edge sharpness via power transform
    th = (1.0 + f.thick.astype(np.float32))[:, None, None]
    img = (m * base)
    img = np.sign(img) * (np.abs(img) ** (1.0 / th))

    # Normalize per-sample and add noise
    img = img.astype(np.float32)
    img -= img.mean(axis=(1, 2), keepdims=True)
    img /= (img.std(axis=(1, 2), keepdims=True) + 1e-6)
    img += rng.randn(n, S, S).astype(np.float32) * float(dc.noise_image)

    # Optional confounder: watermark pixel/stripe correlated with confounder label
    if bool(dc.confounder_enabled):
        if dc.confounder_type == "parity":
            bit = (f.confounder % 2).astype(np.float32)
            img[:, 0, 0] += float(dc.confounder_strength) * (2.0 * bit - 1.0)
        else:
            bucket = f.confounder.astype(np.float32)
            img[:, :, 0] += float(dc.confounder_strength) * ((bucket / 7.0) * 2.0 - 1.0)[:, None]

    return img.reshape(n, -1).astype(np.float32)

def tokens_to_features(f: Factors, dc, seed: int) -> np.ndarray:
    rng = np.random.RandomState(seed)
    n = f.shape.shape[0]
    L = int(dc.text_seq_len)
    V = int(dc.vocab_size)
    if L < 9:
        raise ValueError("text_seq_len must be >= 9 for grammar.")
    sep = 1
    base = 4
    shape_t = base + f.shape
    orient_t = base + 8 + f.orient
    freq_t = base + 8 + 8 + f.freq
    phase_t = base + 8 + 8 + 6 + f.phase
    thick_t = base + 8 + 8 + 6 + 8 + f.thick

    seq = np.zeros((n, L), dtype=np.int64)
    seq[:, 0] = shape_t
    seq[:, 1] = sep
    seq[:, 2] = orient_t
    seq[:, 3] = sep
    seq[:, 4] = freq_t
    seq[:, 5] = sep
    seq[:, 6] = phase_t
    seq[:, 7] = sep
    seq[:, 8] = thick_t

    if bool(dc.confounder_enabled):
        if dc.confounder_type == "parity":
            ctok = 2 + (f.confounder % 2)
        else:
            ctok = 2 + (f.confounder % min(16, V - 2))
        seq[:, 0] = ctok

    bow = np.zeros((n, V), dtype=np.float32)
    for pos in range(L):
        tok = seq[:, pos]
        bow[np.arange(n), tok] += 1.0
    bow /= float(L)

    K = min(24, V)
    pos_feat = np.zeros((n, K), dtype=np.float32)
    positions = np.arange(L, dtype=np.float32) / max(1.0, (L - 1))
    for k in range(K):
        mask = (seq == k).astype(np.float32)
        denom = mask.sum(axis=1) + 1e-6
        pos_feat[:, k] = (mask * positions[None, :]).sum(axis=1) / denom

    x = np.concatenate([bow, pos_feat], axis=1)
    x += rng.randn(*x.shape).astype(np.float32) * float(dc.noise_text)
    return x.astype(np.float32)

def maybe_corrupt_pairs(idx: np.ndarray, rate: float, seed: int) -> np.ndarray:
    if rate <= 0.0:
        return idx.copy()
    rng = np.random.RandomState(seed + 12345)
    idx2 = idx.copy()
    n = idx.shape[0]
    m = int(n * rate)
    if m <= 1:
        return idx2
    sel = rng.choice(n, size=m, replace=False)
    perm = sel.copy()
    rng.shuffle(perm)
    idx2[sel] = idx2[perm]
    return idx2

# ---- Build dataset (uses CFG + AM from prior cells) ----
FACT = make_factors(CFG.data.n, CFG.train.seed + 101, CFG.data.confounder_type)
X_img = render_images(FACT, CFG.data, CFG.train.seed + 202)
X_txt = tokens_to_features(FACT, CFG.data, CFG.train.seed + 303)

train_idx, val_idx, test_idx = split_indices(CFG.data.n, CFG.data.train_frac, CFG.data.val_frac, CFG.train.seed + 404)
pair_train = maybe_corrupt_pairs(train_idx, CFG.data.pairing_corruption_rate, CFG.train.seed + 505)

assert X_img.shape[0] == CFG.data.n and X_txt.shape[0] == CFG.data.n
assert X_img.dtype == np.float32 and X_txt.dtype == np.float32
assert np.isfinite(X_img).all() and np.isfinite(X_txt).all()

AM.write_json_strict(
    os.path.join(CFG.paths.root, "dataset_manifest.json"),
    facts_provided={"n": CFG.data.n, "x_img_shape": list(X_img.shape), "x_txt_shape": list(X_txt.shape)},
    assumptions={"synthetic_world": "shared latent factors generate both modalities; optional confounder may overwrite primary token."},
    open_items=[],
    analysis="Synthetic multimodal dataset built with known ground-truth factors to enable causal diagnostics and controlled failure injections.",
    draft_output={"splits": {"train": int(len(train_idx)), "val": int(len(val_idx)), "test": int(len(test_idx))}},
    verification_status="Not verified",
    questions_to_verify=["Are factor distributions balanced enough to avoid trivial separability artifacts?"]
)

print("Dataset ready:", X_img.shape, X_txt.shape, "train/val/test:", len(train_idx), len(val_idx), len(test_idx))


Dataset ready: (2400, 256) (2400, 88) train/val/test: 1680 360 360


##4.BASELINE ALIGNERS

###4.1.OVERVIEW

**Cell 4 — Baseline Aligner: Two 2-Layer MLP Encoders, Symmetric InfoNCE, Stable Numerics**

This cell defines the mechanism under test. We implement two encoders—one per modality—each as a 2-layer MLP that maps its input features into a shared embedding space. The most important concept here is that the shared space is not “found”; it is engineered through the objective.

The embeddings are L2-normalized, which enforces a cosine-geometry interpretation. That decision matters: cosine similarity becomes an angle measure, and the model must represent distinctions in direction rather than in raw magnitude. However, we also introduce explicit pre-normalization gains. That is not a contradiction; it is a deliberate tool for inducing dominance. By scaling one modality’s pre-normalization activations, we can create imbalances that mimic real-world situations where one encoder is better pretrained or produces higher signal-to-noise features.

The loss is symmetric InfoNCE: image→text and text→image are both optimized. This is essential to Chapter 2 because many real systems are evaluated only in one direction. Symmetry is a discipline: it forces the space to be mutually compatible rather than merely “projected” from one modality into another.

Stable numerics are part of rigor, not polish. We use log-sum-exp patterns to avoid overflow, because overflow can masquerade as collapse or instability. Students should learn that some “mysterious” failures are just numerical. If you do not implement stability, you cannot interpret results.

Finally, this cell sets up the optimizer style and the parameterization that later experiments will stress. Dominance, collapse, and shortcut learning are not only “data problems.” They are also training-dynamics problems. This cell is therefore the engine whose behavior we will measure under controlled interventions.

The pedagogical takeaway: multimodal alignment is a joint optimization problem over two encoders and one geometry. If you change the geometry controls—temperature, normalization, gains—you can change what “alignment” means.


###4.2.CODE AND IMPLEMENTATION

In [10]:
# === Cell 4 ===
# Title: Baseline Aligner (2-Layer MLP Encoders + Symmetric InfoNCE + AdamW)
# Brief Explanation: Implement the core multimodal alignment mechanism with stable numerics and explicit dominance toggles.

def leaky_relu(x: np.ndarray, leak: float) -> np.ndarray:
    return np.where(x > 0.0, x, leak * x)

def leaky_relu_grad(x: np.ndarray, leak: float) -> np.ndarray:
    return np.where(x > 0.0, 1.0, leak).astype(x.dtype)

def l2_normalize(z: np.ndarray, eps: float = 1e-8) -> Tuple[np.ndarray, np.ndarray]:
    nrm = np.linalg.norm(z, axis=1, keepdims=True)
    nrm = np.maximum(nrm, eps)
    return z / nrm, nrm

def l2_normalize_backward(dy: np.ndarray, z: np.ndarray, nrm: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    inv = 1.0 / np.maximum(nrm, eps)
    dot = np.sum(dy * z, axis=1, keepdims=True)
    return dy * inv - z * (dot * (inv ** 3))

def logsumexp(a: np.ndarray, axis: int = -1) -> np.ndarray:
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

class MLP2:
    def __init__(self, in_dim: int, hidden: int, out_dim: int, seed: int, leak: float):
        self.leak = float(leak)
        rng = np.random.RandomState(seed)
        self.W1 = (rng.randn(in_dim, hidden).astype(np.float32) * math.sqrt(2.0 / in_dim))
        self.b1 = np.zeros((1, hidden), dtype=np.float32)
        self.W2 = (rng.randn(hidden, out_dim).astype(np.float32) * math.sqrt(2.0 / hidden))
        self.b2 = np.zeros((1, out_dim), dtype=np.float32)
        self.m = {k: np.zeros_like(v) for k, v in self.params().items()}
        self.v = {k: np.zeros_like(v) for k, v in self.params().items()}
        self.t = 0

    def params(self) -> Dict[str, np.ndarray]:
        return {"W1": self.W1, "b1": self.b1, "W2": self.W2, "b2": self.b2}

    def forward(self, x: np.ndarray) -> Dict[str, np.ndarray]:
        h_pre = x @ self.W1 + self.b1
        h = leaky_relu(h_pre, self.leak)
        z = h @ self.W2 + self.b2
        y, nrm = l2_normalize(z)
        return {"x": x, "h_pre": h_pre, "h": h, "z": z, "y": y, "nrm": nrm}

    def backward(self, cache: Dict[str, np.ndarray], dy: np.ndarray) -> Dict[str, np.ndarray]:
        x, h_pre, h, z, nrm = cache["x"], cache["h_pre"], cache["h"], cache["z"], cache["nrm"]
        dz = l2_normalize_backward(dy, z, nrm)
        dW2 = h.T @ dz
        db2 = dz.sum(axis=0, keepdims=True)
        dh = dz @ self.W2.T
        dh_pre = dh * leaky_relu_grad(h_pre, self.leak)
        dW1 = x.T @ dh_pre
        db1 = dh_pre.sum(axis=0, keepdims=True)
        return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

    def adamw_step(self, grads: Dict[str, np.ndarray], lr: float, b1: float, b2: float, eps: float,
                   weight_decay: float, grad_clip: float) -> Dict[str, float]:
        self.t += 1
        stats: Dict[str, float] = {}
        for k, p in self.params().items():
            g = grads[k].astype(np.float32)
            gn = float(np.linalg.norm(g))
            stats[f"grad_norm_{k}"] = gn
            if gn > grad_clip:
                g = g * (grad_clip / (gn + 1e-12))
            if k.startswith("W") and weight_decay > 0.0:
                p *= (1.0 - lr * weight_decay)
            self.m[k] = b1 * self.m[k] + (1.0 - b1) * g
            self.v[k] = b2 * self.v[k] + (1.0 - b2) * (g * g)
            mhat = self.m[k] / (1.0 - b1 ** self.t)
            vhat = self.v[k] / (1.0 - b2 ** self.t)
            p -= lr * mhat / (np.sqrt(vhat) + eps)
        return stats

def info_nce_symmetric(y_img: np.ndarray, y_txt: np.ndarray, temp: float) -> Tuple[float, np.ndarray, np.ndarray, Dict[str, Any]]:
    # Similarities
    s = (y_img @ y_txt.T) / max(1e-8, temp)  # (B,B)
    B = s.shape[0]
    # logits_i2t: row-softmax; logits_t2i: col-softmax (equiv softmax on s.T)
    logZ_row = logsumexp(s, axis=1)          # (B,)
    logZ_col = logsumexp(s, axis=0)          # (B,)
    diag = np.diag(s)
    loss_i2t = -np.mean(diag - logZ_row)
    loss_t2i = -np.mean(diag - logZ_col)
    loss = float(0.5 * (loss_i2t + loss_t2i))

    # softmax probs
    p_row = np.exp(s - logZ_row[:, None]).astype(np.float32)
    p_col = np.exp(s - logZ_col[None, :]).astype(np.float32)

    I = np.eye(B, dtype=np.float32)
    dS_row = (p_row - I) / float(B)          # dLoss_i2t/dS
    dS_col = (p_col - I) / float(B)          # dLoss_t2i/dS for column-softmax when viewed as col loss

    # combine gradients: for s_ij, row loss contributes dS_row_ij; col loss contributes dS_col_ij
    dS = 0.5 * (dS_row + dS_col)

    # dS = (1/temp) * (dYimg @ Ytxt^T + Yimg @ dYtxt^T) structure:
    # dYimg = dS @ Ytxt / temp ; dYtxt = dS.T @ Yimg / temp
    invT = 1.0 / max(1e-8, temp)
    dY_img = (dS @ y_txt) * invT
    dY_txt = (dS.T @ y_img) * invT

    aux = {"loss_i2t": float(loss_i2t), "loss_t2i": float(loss_t2i), "mean_diag_sim": float(np.mean(diag) * temp)}
    return loss, dY_img.astype(np.float32), dY_txt.astype(np.float32), aux

def build_models(cfg: Config, x_img: np.ndarray, x_txt: np.ndarray) -> Tuple[MLP2, MLP2]:
    h_img = int(cfg.model.hidden * cfg.model.cap_img_hidden_mult)
    h_txt = int(cfg.model.hidden * cfg.model.cap_txt_hidden_mult)
    img = MLP2(x_img.shape[1], h_img, cfg.model.embed_dim, cfg.train.seed + 11, cfg.model.leak)
    txt = MLP2(x_txt.shape[1], h_txt, cfg.model.embed_dim, cfg.train.seed + 23, cfg.model.leak)
    return img, txt

IMG, TXT = build_models(CFG, X_img, X_txt)
print("Models initialized:", "IMG hidden", IMG.W1.shape[1], "TXT hidden", TXT.W1.shape[1])


Models initialized: IMG hidden 256 TXT hidden 256


##5.TRAINING AND IMPLEMENTATION

###5.1.OVERVIEW

**Cell 5 — Training with Instrumentation: Early Warnings, Spectra, Gradient Statistics, Checkpoints**

This cell is the difference between “we trained a model” and “we ran an experiment.” Chapter 2 requires you to see failure as a process, not only as an endpoint. That is why we instrument training.

The training loop logs more than loss. Loss is often a misleading signal in multimodal systems because it can keep improving even as the representation becomes brittle. So we log retrieval metrics in both directions, symmetry gaps, embedding variance, mean off-diagonal cosine similarity, and spectral summaries such as effective rank. These quantities correspond directly to the failure modes we care about. Dominance shows up as asymmetry and gradient imbalance. Collapse shows up as rising mean cosine and shrinking rank. Spurious alignment shows up later as probe shifts and counterfactual failures.

Gradient statistics per modality are especially instructive. If one modality’s gradients are consistently larger, it is exerting more influence on the shared space. That is a measurable definition of dominance. Students should learn to stop describing dominance as a feeling and start treating it as a numeric imbalance.

Checkpointing is framed as governance. We save the best state according to validation retrieval. That prevents us from narrating success based on a late-epoch accident or a collapse episode that happens after the model peaked. In production, checkpoint criteria become stage gates: you deploy the best governed state, not the last state.

This cell also reinforces reproducibility through deterministic batching. Batch order can change contrastive learning substantially because negatives are batch-defined. Deterministic batching ensures that if you see an instability, you can reproduce it and diagnose it.

The main pedagogical message: to trust multimodal alignment, you need monitoring that is aligned with failure physics—variance, spectra, asymmetry—not just with leaderboard metrics. This is how you build professional intuition about embedding health.


###5.2.CODE AND IMPLEMENTATION

In [11]:
# === Cell 5 ===
# Title: Training + Instrumentation (Warnings, Spectra, Grad Stats, Checkpoints)
# Brief Explanation: Train the aligner while logging early warning indicators and saving reviewable artifacts per epoch.

def batch_iter(indices: np.ndarray, batch: int, seed: int) -> List[np.ndarray]:
    rng = np.random.RandomState(seed)
    idx = indices.copy()
    rng.shuffle(idx)
    return [idx[i:i+batch] for i in range(0, len(idx), batch)]

def retrieval_at_k(yA: np.ndarray, yB: np.ndarray, ks: Tuple[int, int] = (1, 5)) -> Dict[str, float]:
    s = yA @ yB.T
    order = np.argsort(-s, axis=1)
    gt = np.arange(yA.shape[0])
    out: Dict[str, float] = {}
    for k in ks:
        hit = (order[:, :k] == gt[:, None]).any(axis=1).mean()
        out[f"r@{k}"] = float(hit)
    # MRR
    ranks = np.where(order == gt[:, None])[1] + 1
    out["mrr"] = float(np.mean(1.0 / ranks))
    return out

def mean_offdiag_cos(y: np.ndarray) -> float:
    s = y @ y.T
    B = s.shape[0]
    return float((s.sum() - np.trace(s)) / max(1, (B*B - B)))

def effective_rank(y: np.ndarray, eps: float = 1e-12) -> float:
    # effective rank via entropy of normalized singular values of centered embeddings
    yc = y - y.mean(axis=0, keepdims=True)
    cov = (yc.T @ yc) / max(1, yc.shape[0])
    w = np.linalg.eigvalsh(cov.astype(np.float64))
    w = np.maximum(w, eps)
    p = w / np.sum(w)
    H = -np.sum(p * np.log(p))
    return float(np.exp(H))

def cov_spectrum_summary(y: np.ndarray, topk: int = 8) -> Dict[str, Any]:
    yc = y - y.mean(axis=0, keepdims=True)
    cov = (yc.T @ yc) / max(1, yc.shape[0])
    w = np.sort(np.linalg.eigvalsh(cov.astype(np.float64)))[::-1]
    top = w[:topk].tolist()
    return {"top_eigs": top, "effective_rank": effective_rank(y), "trace": float(np.sum(w))}

def eval_embeddings(img: MLP2, txt: MLP2, idx: np.ndarray, pair_idx: np.ndarray, cfg: Config) -> Dict[str, Any]:
    Xi = X_img[idx]
    Xt = X_txt[pair_idx]  # pair mapping
    ci = img.forward(Xi)
    ct = txt.forward(Xt)
    # dominance gains (pre-normalization) implemented as scaling z then renormalizing
    zi = ci["z"] * cfg.model.gain_img
    zt = ct["z"] * cfg.model.gain_txt
    yi, _ = l2_normalize(zi)
    yt, _ = l2_normalize(zt)
    m_i2t = retrieval_at_k(yi, yt, CFG.model.__dict__.get("topk", (1,5)) if False else (1, 5))
    m_t2i = retrieval_at_k(yt, yi, (1, 5))
    return {
        "i2t": m_i2t,
        "t2i": m_t2i,
        "sym_gap_r@1": float(m_i2t["r@1"] - m_t2i["r@1"]),
        "mean_offdiag_cos_img": mean_offdiag_cos(yi),
        "mean_offdiag_cos_txt": mean_offdiag_cos(yt),
        "var_img": float(np.mean(np.var(yi, axis=0))),
        "var_txt": float(np.mean(np.var(yt, axis=0))),
        "spectrum_img": cov_spectrum_summary(yi, topk=CFG.probe.cca_topk),
        "spectrum_txt": cov_spectrum_summary(yt, topk=CFG.probe.cca_topk),
        "embeddings": {"img": yi, "txt": yt},  # for probes (caller may discard)
    }

def train_aligner(img: MLP2, txt: MLP2,
                  train_idx: np.ndarray,
                  pair_train: np.ndarray,
                  val_idx: np.ndarray,
                  pair_val: np.ndarray,
                  cfg: Config,
                  epochs: int,
                  tag: str) -> Dict[str, Any]:
    history: List[Dict[str, Any]] = []
    best = {"val_r1": -1.0, "epoch": -1}
    best_ckpt = None

    for ep in range(1, epochs + 1):
        batches = batch_iter(train_idx, cfg.train.batch, cfg.train.seed + ep * 97)
        ep_loss = 0.0
        grad_stats_img: List[float] = []
        grad_stats_txt: List[float] = []

        for b in batches:
            # paired indices (allow corruption in pair_train mapping)
            pb = pair_train[np.searchsorted(train_idx, b)] if False else b  # keep identity mapping by default
            # NOTE: For corruption experiments we pass a permuted pair_train and use it directly below.
            Xi = X_img[b]
            Xt = X_txt[pb]

            ci = img.forward(Xi)
            ct = txt.forward(Xt)

            # dominance gains: scale z then renormalize
            zi = ci["z"] * cfg.model.gain_img
            zt = ct["z"] * cfg.model.gain_txt
            yi, ni = l2_normalize(zi)
            yt, nt = l2_normalize(zt)

            loss, dYi, dYt, aux = info_nce_symmetric(yi, yt, cfg.model.temp)
            ep_loss += loss * float(len(b))

            # backprop through normalization scaling: yi = norm(z*g)
            dZi = l2_normalize_backward(dYi, zi, ni)
            dZt = l2_normalize_backward(dYt, zt, nt)
            # chain through gain: zi = z * gain
            dZi *= cfg.model.gain_img
            dZt *= cfg.model.gain_txt

            gi = img.backward(ci, dZi.astype(np.float32))
            gt = txt.backward(ct, dZt.astype(np.float32))

            si = img.adamw_step(gi, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)
            st = txt.adamw_step(gt, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)

            grad_stats_img.append(float(si["grad_norm_W1"] + si["grad_norm_W2"]))
            grad_stats_txt.append(float(st["grad_norm_W1"] + st["grad_norm_W2"]))

        ep_loss /= float(len(train_idx))

        # Validation snapshot (small deterministic slice for speed)
        vsel = val_idx[:min(512, len(val_idx))]
        evalv = eval_embeddings(img, txt, vsel, vsel, cfg)
        val_r1 = evalv["i2t"]["r@1"]

        warn = {
            "epoch": ep,
            "train_loss": float(ep_loss),
            "val_i2t_r@1": float(evalv["i2t"]["r@1"]),
            "val_t2i_r@1": float(evalv["t2i"]["r@1"]),
            "sym_gap_r@1": float(evalv["sym_gap_r@1"]),
            "var_img": float(evalv["var_img"]),
            "var_txt": float(evalv["var_txt"]),
            "mean_offdiag_cos_img": float(evalv["mean_offdiag_cos_img"]),
            "mean_offdiag_cos_txt": float(evalv["mean_offdiag_cos_txt"]),
            "eff_rank_img": float(evalv["spectrum_img"]["effective_rank"]),
            "eff_rank_txt": float(evalv["spectrum_txt"]["effective_rank"]),
            "grad_sum_norm_img_mean": float(np.mean(grad_stats_img)) if grad_stats_img else 0.0,
            "grad_sum_norm_txt_mean": float(np.mean(grad_stats_txt)) if grad_stats_txt else 0.0,
            "grad_imbalance_index": float((np.mean(grad_stats_img) + 1e-12) / (np.mean(grad_stats_txt) + 1e-12)) if grad_stats_img and grad_stats_txt else 0.0,
        }
        history.append(warn)

        # Checkpoint best by val retrieval@1 (i2t)
        if val_r1 > best["val_r1"]:
            best = {"val_r1": float(val_r1), "epoch": ep}
            best_ckpt = {
                "IMG": {k: v.copy() for k, v in img.params().items()},
                "TXT": {k: v.copy() for k, v in txt.params().items()},
                "meta": {"epoch": ep, "val_r1": float(val_r1), "tag": tag, "utc": utc_now_iso()},
            }
            ckpt_path = os.path.join(cfg.paths.ckpt, f"{tag}_best_ckpt.json")
            # store numeric arrays as lists (small model, acceptable); still deterministic
            ckpt_ser = {
                "meta": best_ckpt["meta"],
                "IMG": {k: best_ckpt["IMG"][k].tolist() for k in best_ckpt["IMG"]},
                "TXT": {k: best_ckpt["TXT"][k].tolist() for k in best_ckpt["TXT"]},
            }
            AM.write_json_strict(
                ckpt_path,
                facts_provided={"checkpoint": "best", "epoch": ep, "val_r1": float(val_r1)},
                assumptions={"serialization": "parameters stored as JSON lists for portability; not optimized for size."},
                open_items=[],
                analysis="Best checkpoint saved by validation retrieval@1 to prevent overfitting to training batches.",
                draft_output=ckpt_ser,
                verification_status="Not verified",
                questions_to_verify=["Would binary serialization be preferable for larger models?"]
            )
            best_ckpt_path = ckpt_path
        else:
            best_ckpt_path = None

        # Stream-safe history write
        hist_path = os.path.join(cfg.paths.runs, f"{tag}_train_history.json")
        AM.write_json_strict(
            hist_path,
            facts_provided={"history": history[-min(200, len(history)):]},
            assumptions={"history_truncation": "last 200 epochs max (here far less)."},
            open_items=[],
            analysis="Training history includes early warning indicators for dominance, collapse, and asymmetry.",
            draft_output={"best_so_far": best, "last_epoch": warn},
            verification_status="Not verified",
            questions_to_verify=["Do warning indicators predict downstream degradation under stress?"]
        )

        if ep % 5 == 0 or ep == 1:
            print(f"[{tag}] ep={ep:03d} loss={ep_loss:.4f} val_r1={val_r1:.3f} sym_gap={warn['sym_gap_r@1']:.3f} var_img={warn['var_img']:.4f} offcos={warn['mean_offdiag_cos_img']:.3f}")

    return {"history": history, "best": best, "tag": tag}
print("Training function ready.")


Training function ready.


##6.BASELINE RUN

###6.1.OVERVIEW

**Cell 6 — Baseline Run and Structural Probes: Retrieval, MI Proxy, CCA Proxy, Influence/Sensitivity Proxy**

This cell establishes the reference behavior of a “working” aligner and, more importantly, measures what kind of structure the embeddings preserve. The baseline is not the goal; it is the control condition against which we interpret failures.

Retrieval metrics are computed in both directions. That alone is a pedagogical upgrade: students see that a single number is insufficient. The symmetry gap becomes a core diagnostic: if i→t is strong but t→i is weak, the space is not mutually compatible.

Then we add three structural probes that move beyond retrieval. The mutual-information proxy asks: how much information about true latent factors is present in the embedding codes? It does not claim to measure “true MI” in a theoretical sense; it is a proxy built by discretizing embedding coordinates and counting empirical entropies. The point is comparative: when we inject confounders later, MI to true factors should fall if the model stops encoding them.

The CCA-like proxy asks: do the two modalities share coherent correlated directions in embedding space? By whitening each embedding set and taking singular values of the cross-covariance, we get principal correlations. Healthy alignment should produce a meaningful spectrum; dominance or collapse can distort it.

The influence/sensitivity proxy asks: how fragile is retrieval to a small training perturbation? We approximate this by comparing one-step updates under a perturbation. In professional settings, fragility is often the real risk: a system that is highly sensitive to batch perturbations may behave unpredictably under drift or noisy updates.

Finally, the plots are saved to disk as deliverables. The educational emphasis is that visualization is evidence only when it is tied to computed metrics and preserved in artifacts. Students should interpret baseline results as the “geometry contract” that later experiments will violate in distinct ways.


###6.2.CODE AND IMPLEMENTATION

In [12]:
# === Cell 6 ===
# Title: Baseline Run + Full Evaluation (Retrieval, MI Proxy, CCA Proxy, Influence Proxy)
# Brief Explanation: Establish a clean baseline, then compute structural probes that reveal what alignment preserves and what it erases.

def discretize_embeddings(y: np.ndarray, bins: int, dims: int) -> np.ndarray:
    # quantile bins per-dimension for first dims; output integer codes
    y0 = y[:, :dims].astype(np.float64)
    q = np.linspace(0, 1, bins + 1)
    edges = []
    for d in range(dims):
        ed = np.quantile(y0[:, d], q)
        ed[0] = -np.inf
        ed[-1] = np.inf
        edges.append(ed)
    codes = np.zeros((y0.shape[0], dims), dtype=np.int32)
    for d in range(dims):
        codes[:, d] = np.searchsorted(edges[d], y0[:, d], side="right") - 1
        codes[:, d] = np.clip(codes[:, d], 0, bins - 1)
    # hash code vector to single categorical id
    base = bins
    hid = np.zeros((y0.shape[0],), dtype=np.int64)
    for d in range(dims):
        hid = hid * base + codes[:, d].astype(np.int64)
    return hid

def entropy_from_counts(c: np.ndarray) -> float:
    p = c.astype(np.float64)
    p = p / max(1.0, p.sum())
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mi_proxy(y: np.ndarray, factor: np.ndarray, bins: int, dims: int) -> float:
    # MI(EmbCode ; Factor) via empirical counts
    z = discretize_embeddings(y, bins=bins, dims=dims)
    f = factor.astype(np.int64)
    # joint counts
    z_ids, z_inv = np.unique(z, return_inverse=True)
    f_ids, f_inv = np.unique(f, return_inverse=True)
    joint = np.zeros((z_ids.shape[0], f_ids.shape[0]), dtype=np.int64)
    np.add.at(joint, (z_inv, f_inv), 1)
    cz = joint.sum(axis=1)
    cf = joint.sum(axis=0)
    HZ = entropy_from_counts(cz)
    HF = entropy_from_counts(cf)
    HZF = entropy_from_counts(joint.ravel())
    return float(HZ + HF - HZF)

def cca_principal_corr(yA: np.ndarray, yB: np.ndarray, ridge: float, topk: int) -> Dict[str, Any]:
    # Compute principal correlations via whitening and SVD
    A = (yA - yA.mean(axis=0, keepdims=True)).astype(np.float64)
    B = (yB - yB.mean(axis=0, keepdims=True)).astype(np.float64)
    n = A.shape[0]
    CA = (A.T @ A) / max(1, n) + ridge * np.eye(A.shape[1])
    CB = (B.T @ B) / max(1, n) + ridge * np.eye(B.shape[1])
    C = (A.T @ B) / max(1, n)

    def inv_sqrtm(M: np.ndarray) -> np.ndarray:
        w, V = np.linalg.eigh(M)
        w = np.maximum(w, 1e-12)
        return (V * (1.0 / np.sqrt(w))[None, :]) @ V.T

    WA = inv_sqrtm(CA)
    WB = inv_sqrtm(CB)
    T = WA @ C @ WB
    s = np.linalg.svd(T, compute_uv=False)
    s = np.clip(s, 0.0, 1.0)
    return {"top_principal_corr": s[:topk].tolist(), "mean_topk": float(np.mean(s[:topk]))}

def one_step_update_sensitivity(img0: MLP2, txt0: MLP2,
                                batch_idx: np.ndarray,
                                batch_pair: np.ndarray,
                                eval_idx: np.ndarray,
                                cfg: Config,
                                perturb: str) -> float:
    # Clone params (shallow model copy)
    def clone_model(src: MLP2) -> MLP2:
        dst = MLP2(src.W1.shape[0], src.W1.shape[1], src.W2.shape[1], seed=1, leak=src.leak)
        dst.W1 = src.W1.copy(); dst.b1 = src.b1.copy(); dst.W2 = src.W2.copy(); dst.b2 = src.b2.copy()
        dst.m = {k: v.copy() for k, v in src.m.items()}
        dst.v = {k: v.copy() for k, v in src.v.items()}
        dst.t = src.t
        return dst

    def eval_r1(img: MLP2, txt: MLP2) -> float:
        out = eval_embeddings(img, txt, eval_idx, eval_idx, cfg)
        return float(out["i2t"]["r@1"])

    base_img = clone_model(img0); base_txt = clone_model(txt0)
    pert_img = clone_model(img0); pert_txt = clone_model(txt0)

    # Build two batches: original and perturbed
    Xi = X_img[batch_idx]
    Xt = X_txt[batch_pair]

    Xi_p = Xi.copy()
    Xt_p = Xt.copy()

    if perturb == "remove_confounder":
        # Counterfactual: remove confounder by zeroing known channels:
        # Image: watermark pixel/stripe -> zero first pixel/first column; Text: overwrite token features bucket 0
        # (Approximate: zero first pixel feature and first column stripe feature positions)
        Xi_p[:, 0] = 0.0
        # Text: zero first 6 vocab dims to reduce conf token dominance (synthetic approximation)
        Xt_p[:, :6] = 0.0
    elif perturb == "swap_pairs":
        # Swap within batch deterministically
        perm = np.arange(len(batch_idx))
        perm = np.roll(perm, 1)
        Xt_p = Xt_p[perm]
    else:
        pass

    # One gradient step on base
    ci = base_img.forward(Xi); ct = base_txt.forward(Xt)
    zi = ci["z"] * cfg.model.gain_img; zt = ct["z"] * cfg.model.gain_txt
    yi, ni = l2_normalize(zi); yt, nt = l2_normalize(zt)
    loss, dYi, dYt, _ = info_nce_symmetric(yi, yt, cfg.model.temp)
    dZi = l2_normalize_backward(dYi, zi, ni) * cfg.model.gain_img
    dZt = l2_normalize_backward(dYt, zt, nt) * cfg.model.gain_txt
    gi = base_img.backward(ci, dZi.astype(np.float32))
    gt = base_txt.backward(ct, dZt.astype(np.float32))
    base_img.adamw_step(gi, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)
    base_txt.adamw_step(gt, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)

    # One gradient step on perturbed
    ci2 = pert_img.forward(Xi_p); ct2 = pert_txt.forward(Xt_p)
    zi2 = ci2["z"] * cfg.model.gain_img; zt2 = ct2["z"] * cfg.model.gain_txt
    yi2, ni2 = l2_normalize(zi2); yt2, nt2 = l2_normalize(zt2)
    loss2, dYi2, dYt2, _ = info_nce_symmetric(yi2, yt2, cfg.model.temp)
    dZi2 = l2_normalize_backward(dYi2, zi2, ni2) * cfg.model.gain_img
    dZt2 = l2_normalize_backward(dYt2, zt2, nt2) * cfg.model.gain_txt
    gi2 = pert_img.backward(ci2, dZi2.astype(np.float32))
    gt2 = pert_txt.backward(ct2, dZt2.astype(np.float32))
    pert_img.adamw_step(gi2, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)
    pert_txt.adamw_step(gt2, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)

    r1_base = eval_r1(base_img, base_txt)
    r1_pert = eval_r1(pert_img, pert_txt)
    return float(abs(r1_base - r1_pert))

# Baseline training (clean, no confounder, no corruption)
baseline_cfg = CFG
IMG_base, TXT_base = build_models(baseline_cfg, X_img, X_txt)

baseline_train = train_aligner(
    IMG_base, TXT_base,
    train_idx=train_idx, pair_train=train_idx,  # identity pairing
    val_idx=val_idx, pair_val=val_idx,
    cfg=baseline_cfg,
    epochs=baseline_cfg.train.epochs,
    tag="baseline"
)

# Full baseline evaluation on test slice (deterministic subset for speed)
test_sel = test_idx[:min(800, len(test_idx))]
E = eval_embeddings(IMG_base, TXT_base, test_sel, test_sel, baseline_cfg)
yi = E["embeddings"]["img"]; yt = E["embeddings"]["txt"]

# Structural probes
mi = {
    "shape": mi_proxy(yi, FACT.shape[test_sel], CFG.probe.mi_bins, CFG.probe.mi_dims),
    "orient": mi_proxy(yi, FACT.orient[test_sel], CFG.probe.mi_bins, CFG.probe.mi_dims),
    "freq": mi_proxy(yi, FACT.freq[test_sel], CFG.probe.mi_bins, CFG.probe.mi_dims),
    "phase": mi_proxy(yi, FACT.phase[test_sel], CFG.probe.mi_bins, CFG.probe.mi_dims),
    "thick": mi_proxy(yi, FACT.thick[test_sel], CFG.probe.mi_bins, CFG.probe.mi_dims),
}
cca = cca_principal_corr(yi, yt, ridge=CFG.probe.ridge, topk=CFG.probe.cca_topk)

# Influence proxy: one-step sensitivity under counterfactual removal (even if confounder absent, should be small)
eval_idx = test_sel[:CFG.probe.influence_eval_n]
batch_idx = train_idx[:CFG.probe.influence_batch_n]
sens = one_step_update_sensitivity(IMG_base, TXT_base, batch_idx, batch_idx, eval_idx, baseline_cfg, perturb="swap_pairs")

# Plots: retrieval heatmap on small subset; variance histogram
sub = 64
S = (yi[:sub] @ yt[:sub].T)
plt.figure()
plt.imshow(S, aspect="auto")
plt.colorbar()
plt.title("Baseline cross-modal similarity (subset)")
p1 = os.path.join(CFG.paths.plots, "baseline_similarity_heatmap.png")
plt.tight_layout(); plt.savefig(p1, dpi=140); plt.close()

plt.figure()
plt.hist(np.linalg.norm(yi, axis=1), bins=30)
plt.title("Baseline embedding norms (img; should be ~1 after normalization)")
p2 = os.path.join(CFG.paths.plots, "baseline_norms_img.png")
plt.tight_layout(); plt.savefig(p2, dpi=140); plt.close()

baseline_report_path = os.path.join(CFG.paths.root, "baseline_report.json")
AM.write_json_strict(
    baseline_report_path,
    facts_provided={
        "test_metrics": {k: v for k, v in E.items() if k != "embeddings"},
        "mi_proxy_img_vs_factors": mi,
        "cca_proxy": cca,
        "influence_proxy_abs_delta_r@1": sens,
        "plots": [p1, p2],
    },
    assumptions={
        "mi_proxy": {"bins": CFG.probe.mi_bins, "dims": CFG.probe.mi_dims, "note": "proxy via discretized embedding codes"},
        "cca_proxy": {"ridge": CFG.probe.ridge, "topk": CFG.probe.cca_topk},
        "influence_proxy": {"one_step": True, "perturbation": "swap_pairs"},
    },
    open_items=[],
    analysis="Baseline establishes a controlled reference for later adversarial experiments. Probes quantify factor information, cross-modal correlation structure, and one-step sensitivity.",
    draft_output={"baseline_best_epoch": baseline_train["best"]["epoch"], "baseline_best_val_r@1": baseline_train["best"]["val_r1"]},
    verification_status="Not verified",
    questions_to_verify=[
        "Do MI and CCA proxies correlate with downstream robustness under noise/corruption?",
        "How sensitive are probes to discretization choices (MI bins/dims) and ridge (CCA)?",
    ],
)
print("Baseline complete. Test r@1 i2t:", E["i2t"]["r@1"], "t2i:", E["t2i"]["r@1"], "CCA mean:", cca["mean_topk"], "Influence:", sens)


[baseline] ep=001 loss=5.1227 val_r1=0.022 sym_gap=-0.011 var_img=0.0092 offcos=0.559
[baseline] ep=005 loss=1.2910 val_r1=0.244 sym_gap=0.008 var_img=0.0205 offcos=0.015
[baseline] ep=010 loss=0.4768 val_r1=0.444 sym_gap=-0.003 var_img=0.0206 offcos=0.006
[baseline] ep=015 loss=0.2922 val_r1=0.497 sym_gap=0.014 var_img=0.0207 offcos=0.004
[baseline] ep=020 loss=0.2174 val_r1=0.506 sym_gap=-0.014 var_img=0.0207 offcos=0.005
[baseline] ep=025 loss=0.1860 val_r1=0.500 sym_gap=-0.014 var_img=0.0207 offcos=0.006
[baseline] ep=030 loss=0.1598 val_r1=0.489 sym_gap=-0.006 var_img=0.0206 offcos=0.008
[baseline] ep=035 loss=0.1522 val_r1=0.478 sym_gap=0.003 var_img=0.0205 offcos=0.013
[baseline] ep=040 loss=0.1292 val_r1=0.494 sym_gap=0.008 var_img=0.0205 offcos=0.012
[baseline] ep=045 loss=0.1187 val_r1=0.469 sym_gap=0.019 var_img=0.0206 offcos=0.011
Baseline complete. Test r@1 i2t: 0.4361111111111111 t2i: 0.4 CCA mean: 0.8545217412011163 Influence: 0.046875


##7.EXPERIMENT SUITE

###7.1.OVERVIEW

**Cell 8 — Experiment Suite B: Spurious Confounders, Pair Corruption, and Counterfactual Tests**

This cell delivers the most important critique: high retrieval can be achieved for the wrong reason. We inject a confounder that exists in both modalities and predicts pairing. This is a controlled version of real issues like watermarks, metadata leakage, template artifacts, or dataset ordering signals.

The core interpretive tool is the counterfactual removal test. We train with the confounder present, then evaluate after removing it at test time. If retrieval collapses, the model relied on the confounder. That is a decisive causal diagnosis: you changed only the shortcut channel, and the system’s performance fell. Students should learn that this is stronger evidence than any embedding plot.

We also compute MI proxies to compare “information about true factors” versus “information about confounder.” A spurious system tends to show MI shifting toward the confounder, because that is the easiest predictive signal. The educational point is that the model’s objective never asked it to learn the true factors; it asked it to identify pairs among negatives. If the confounder does that job, the model will use it.

Pair corruption adds a second layer: when some fraction of pairs are wrong, the model is trained on contradictions. That can amplify shortcut reliance (because shortcuts remain consistent even when true pairing is inconsistent) or can destabilize the geometry. Interpreting corruption results requires looking at stability indicators and probe shifts, not just final retrieval.

In production, this suite implies a checklist: always search for confounders, always run invariance or removal tests, always measure directional retrieval, and never accept high retrieval alone as proof of semantic alignment. The chapter’s governed artifacts show how to document and defend that conclusion.


###7.2.CODE AND IMPLEMENTATION

In [13]:
# === Cell 7 ===
# Title: Experiment Suite A — Modality Dominance + Asymmetric Noise
# Brief Explanation: Induce dominance systematically (gain/noise/capacity) and measure asymmetry, spectra, and probe shifts.

@dataclass(frozen=True)
class ExpACondition:
    name: str
    gain_img: float
    gain_txt: float
    noise_image: float
    noise_text: float
    cap_img_mult: float
    cap_txt_mult: float

def run_condition_A(cond: ExpACondition) -> Dict[str, Any]:
    # Build a derived config
    dc = DataConfig(**{**asdict(CFG.data),
                      "confounder_enabled": False,
                      "pairing_corruption_rate": 0.0,
                      "noise_image": cond.noise_image,
                      "noise_text": cond.noise_text})
    mc = ModelConfig(**{**asdict(CFG.model),
                        "gain_img": cond.gain_img,
                        "gain_txt": cond.gain_txt,
                        "cap_img_hidden_mult": cond.cap_img_mult,
                        "cap_txt_hidden_mult": cond.cap_txt_mult})
    cfg = Config(paths=CFG.paths, data=dc, model=mc, train=CFG.train, probe=CFG.probe)

    # Regenerate modalities if noise differs (factors fixed)
    Xi = render_images(FACT, cfg.data, CFG.train.seed + 202)  # deterministic
    Xt = tokens_to_features(FACT, cfg.data, CFG.train.seed + 303)

    img, txt = build_models(cfg, Xi, Xt)
    # Temporary globals for training/eval helpers that reference X_img/X_txt
    global X_img, X_txt
    X_img_old, X_txt_old = X_img, X_txt
    X_img, X_txt = Xi, Xt

    # Train short schedule
    train_aligner(img, txt, train_idx, train_idx, val_idx, val_idx, cfg, epochs=cfg.train.short_epochs, tag=f"expA_{cond.name}")

    # Evaluate on test slice
    sel = test_idx[:min(700, len(test_idx))]
    E = eval_embeddings(img, txt, sel, sel, cfg)
    yi = E["embeddings"]["img"]; yt = E["embeddings"]["txt"]

    cca = cca_principal_corr(yi, yt, ridge=cfg.probe.ridge, topk=cfg.probe.cca_topk)
    mi_shape = mi_proxy(yi, FACT.shape[sel], cfg.probe.mi_bins, cfg.probe.mi_dims)

    # Restore globals
    X_img, X_txt = X_img_old, X_txt_old

    return {
        "condition": asdict(cond),
        "test_metrics": {k: v for k, v in E.items() if k != "embeddings"},
        "cca_proxy": cca,
        "mi_shape_proxy": mi_shape,
    }

condsA = [
    ExpACondition("balanced", 1.0, 1.0, CFG.data.noise_image, CFG.data.noise_text, 1.0, 1.0),
    ExpACondition("img_gain_dominant", 2.5, 1.0, CFG.data.noise_image, CFG.data.noise_text, 1.0, 1.0),
    ExpACondition("txt_gain_dominant", 1.0, 2.5, CFG.data.noise_image, CFG.data.noise_text, 1.0, 1.0),
    ExpACondition("img_low_noise", 1.0, 1.0, 0.01, CFG.data.noise_text, 1.0, 1.0),
    ExpACondition("txt_low_noise", 1.0, 1.0, CFG.data.noise_image, 0.002, 1.0, 1.0),
    ExpACondition("img_capacity_dominant", 1.0, 1.0, CFG.data.noise_image, CFG.data.noise_text, 1.6, 1.0),
    ExpACondition("txt_capacity_dominant", 1.0, 1.0, CFG.data.noise_image, CFG.data.noise_text, 1.0, 1.6),
    ExpACondition("asym_noise_img_high", 1.0, 1.0, 0.14, CFG.data.noise_text, 1.0, 1.0),
    ExpACondition("asym_noise_txt_high", 1.0, 1.0, CFG.data.noise_image, 0.12, 1.0, 1.0),
]

resultsA = []
for c in condsA:
    print("Running ExpA:", c.name)
    resultsA.append(run_condition_A(c))

# Plot: symmetry gap vs condition
names = [r["condition"]["name"] for r in resultsA]
gaps = [r["test_metrics"]["sym_gap_r@1"] for r in resultsA]
plt.figure(figsize=(10, 3))
plt.plot(np.arange(len(names)), gaps, marker="o")
plt.xticks(np.arange(len(names)), names, rotation=35, ha="right")
plt.title("Experiment A: Symmetry gap (i2t r@1 - t2i r@1)")
plt.tight_layout()
p = os.path.join(CFG.paths.plots, "expA_symmetry_gap.png")
plt.savefig(p, dpi=140); plt.close()

AM.write_json_strict(
    os.path.join(CFG.paths.exp, "expA_results.json"),
    facts_provided={"results": resultsA, "plot": p},
    assumptions={"short_training_epochs": CFG.train.short_epochs, "dominance_mechanisms": "gain/noise/capacity"},
    open_items=[],
    analysis="Suite A quantifies how dominance and asymmetric noise create measurable retrieval asymmetry, spectra distortions, and probe shifts.",
    draft_output={"worst_sym_gap": float(np.min(gaps)), "best_sym_gap": float(np.max(gaps))},
    verification_status="Not verified",
    questions_to_verify=["Do dominance signatures remain after longer training or do they self-correct?"]
)
print("ExpA complete. Plot:", p)


Running ExpA: balanced
[expA_balanced] ep=001 loss=5.1227 val_r1=0.022 sym_gap=-0.011 var_img=0.0092 offcos=0.559
[expA_balanced] ep=005 loss=1.2910 val_r1=0.244 sym_gap=0.008 var_img=0.0205 offcos=0.015
[expA_balanced] ep=010 loss=0.4768 val_r1=0.444 sym_gap=-0.003 var_img=0.0206 offcos=0.006
[expA_balanced] ep=015 loss=0.2922 val_r1=0.497 sym_gap=0.014 var_img=0.0207 offcos=0.004
Running ExpA: img_gain_dominant
[expA_img_gain_dominant] ep=001 loss=5.1227 val_r1=0.022 sym_gap=-0.011 var_img=0.0092 offcos=0.559
[expA_img_gain_dominant] ep=005 loss=1.2910 val_r1=0.244 sym_gap=0.008 var_img=0.0205 offcos=0.015
[expA_img_gain_dominant] ep=010 loss=0.4768 val_r1=0.444 sym_gap=-0.003 var_img=0.0206 offcos=0.006
[expA_img_gain_dominant] ep=015 loss=0.2922 val_r1=0.497 sym_gap=0.014 var_img=0.0207 offcos=0.004
Running ExpA: txt_gain_dominant
[expA_txt_gain_dominant] ep=001 loss=5.1227 val_r1=0.022 sym_gap=-0.011 var_img=0.0092 offcos=0.559
[expA_txt_gain_dominant] ep=005 loss=1.2910 val_r1=0.

##8.ADDITIONAL EXPERIMENTS

###8.1.OVERVIEW

**Cell 8 — Experiment Suite B: Spurious Confounders, Pair Corruption, and Counterfactual Tests**

This cell delivers the most important critique: high retrieval can be achieved for the wrong reason. We inject a confounder that exists in both modalities and predicts pairing. This is a controlled version of real issues like watermarks, metadata leakage, template artifacts, or dataset ordering signals.

The core interpretive tool is the counterfactual removal test. We train with the confounder present, then evaluate after removing it at test time. If retrieval collapses, the model relied on the confounder. That is a decisive causal diagnosis: you changed only the shortcut channel, and the system’s performance fell. Students should learn that this is stronger evidence than any embedding plot.

We also compute MI proxies to compare “information about true factors” versus “information about confounder.” A spurious system tends to show MI shifting toward the confounder, because that is the easiest predictive signal. The educational point is that the model’s objective never asked it to learn the true factors; it asked it to identify pairs among negatives. If the confounder does that job, the model will use it.

Pair corruption adds a second layer: when some fraction of pairs are wrong, the model is trained on contradictions. That can amplify shortcut reliance (because shortcuts remain consistent even when true pairing is inconsistent) or can destabilize the geometry. Interpreting corruption results requires looking at stability indicators and probe shifts, not just final retrieval.

In production, this suite implies a checklist: always search for confounders, always run invariance or removal tests, always measure directional retrieval, and never accept high retrieval alone as proof of semantic alignment. The chapter’s governed artifacts show how to document and defend that conclusion.


###8.2.CODE AND IMPLEMENTATION

In [14]:
# === Cell 8 ===
# Title: Experiment Suite B — Spurious Confounder + Pairing Corruption + Counterfactual Removal
# Brief Explanation: Inject shared confounders to force shortcut alignment, then validate via MI/CCA shifts and counterfactual test-time removal.

@dataclass(frozen=True)
class ExpBCondition:
    name: str
    conf_strength: float
    corruption_rate: float

def run_condition_B(cond: ExpBCondition) -> Dict[str, Any]:
    # Confounder ON
    dc = DataConfig(**{**asdict(CFG.data),
                      "confounder_enabled": True,
                      "confounder_strength": cond.conf_strength,
                      "pairing_corruption_rate": cond.corruption_rate})
    cfg = Config(paths=CFG.paths, data=dc, model=CFG.model, train=CFG.train, probe=CFG.probe)

    # Regenerate with confounder
    Xi = render_images(FACT, cfg.data, CFG.train.seed + 202)
    Xt = tokens_to_features(FACT, cfg.data, CFG.train.seed + 303)

    # Corrupt pairing on train only (explicit)
    rng = np.random.RandomState(CFG.train.seed + 999)
    train_perm = train_idx.copy()
    if cond.corruption_rate > 0:
        m = int(len(train_perm) * cond.corruption_rate)
        sel = rng.choice(len(train_perm), size=max(2, m), replace=False)
        perm = sel.copy(); rng.shuffle(perm)
        pair_map = train_perm.copy()
        pair_map[sel] = train_perm[perm]
    else:
        pair_map = train_perm

    img, txt = build_models(cfg, Xi, Xt)

    # Swap globals for shared helpers
    global X_img, X_txt
    X_img_old, X_txt_old = X_img, X_txt
    X_img, X_txt = Xi, Xt

    # Train short schedule
    train_aligner(img, txt, train_idx, pair_map, val_idx, val_idx, cfg, epochs=cfg.train.short_epochs, tag=f"expB_{cond.name}")

    # Evaluate normal test (confounder present)
    sel = test_idx[:min(700, len(test_idx))]
    E = eval_embeddings(img, txt, sel, sel, cfg)
    yi = E["embeddings"]["img"]; yt = E["embeddings"]["txt"]

    mi_shape = mi_proxy(yi, FACT.shape[sel], cfg.probe.mi_bins, cfg.probe.mi_dims)
    mi_conf = mi_proxy(yi, FACT.confounder[sel], cfg.probe.mi_bins, cfg.probe.mi_dims)
    cca = cca_principal_corr(yi, yt, ridge=cfg.probe.ridge, topk=cfg.probe.cca_topk)

    # Counterfactual removal at test: remove confounder channels and reevaluate using same trained model
    dc_cf = DataConfig(**{**asdict(cfg.data), "confounder_enabled": False})
    Xi_cf = render_images(FACT, dc_cf, CFG.train.seed + 202)
    Xt_cf = tokens_to_features(FACT, dc_cf, CFG.train.seed + 303)
    X_img, X_txt = Xi_cf, Xt_cf
    E_cf = eval_embeddings(img, txt, sel, sel, cfg)

    # Restore globals
    X_img, X_txt = X_img_old, X_txt_old

    return {
        "condition": asdict(cond),
        "metrics_confounder_present": {k: v for k, v in E.items() if k != "embeddings"},
        "metrics_counterfactual_removed": {k: v for k, v in E_cf.items() if k != "embeddings"},
        "mi_shape_proxy": mi_shape,
        "mi_confounder_proxy": mi_conf,
        "cca_proxy": cca,
        "shortcut_signature": {
            "mi_conf_minus_mi_shape": float(mi_conf - mi_shape),
            "delta_r@1_counterfactual": float(E["i2t"]["r@1"] - E_cf["i2t"]["r@1"]),
        },
    }

condsB = [
    ExpBCondition("conf_low_corr0", conf_strength=0.5, corruption_rate=0.0),
    ExpBCondition("conf_med_corr0", conf_strength=1.0, corruption_rate=0.0),
    ExpBCondition("conf_high_corr0", conf_strength=2.0, corruption_rate=0.0),
    ExpBCondition("conf_high_corr10", conf_strength=2.0, corruption_rate=0.10),
    ExpBCondition("conf_high_corr25", conf_strength=2.0, corruption_rate=0.25),
]

resultsB = []
for c in condsB:
    print("Running ExpB:", c.name)
    resultsB.append(run_condition_B(c))

# Plot: counterfactual r@1 drop vs confounder strength
xs = [r["condition"]["conf_strength"] for r in resultsB]
drops = [r["shortcut_signature"]["delta_r@1_counterfactual"] for r in resultsB]
plt.figure()
plt.plot(xs, drops, marker="o")
plt.title("Experiment B: Counterfactual removal (r@1 drop) vs confounder strength")
plt.xlabel("Confounder strength")
plt.ylabel("Δ r@1 (present - removed)")
plt.tight_layout()
p = os.path.join(CFG.paths.plots, "expB_counterfactual_drop.png")
plt.savefig(p, dpi=140); plt.close()

AM.write_json_strict(
    os.path.join(CFG.paths.exp, "expB_results.json"),
    facts_provided={"results": resultsB, "plot": p},
    assumptions={"confounder": "shared nuisance marker injected into both modalities", "counterfactual": "re-rendered data without confounder"},
    open_items=[],
    analysis="Suite B demonstrates spurious shortcut alignment when a shared confounder is present, validated by MI shift toward confounder and performance collapse under counterfactual removal.",
    draft_output={"max_counterfactual_drop": float(np.max(drops)), "min_counterfactual_drop": float(np.min(drops))},
    verification_status="Not verified",
    questions_to_verify=["How does shortcut reliance change with batch size and negative sampling hardness?"]
)
print("ExpB complete. Plot:", p)


Running ExpB: conf_low_corr0
[expB_conf_low_corr0] ep=001 loss=5.1185 val_r1=0.017 sym_gap=-0.008 var_img=0.0093 offcos=0.550
[expB_conf_low_corr0] ep=005 loss=1.3406 val_r1=0.203 sym_gap=-0.006 var_img=0.0205 offcos=0.013
[expB_conf_low_corr0] ep=010 loss=0.5120 val_r1=0.350 sym_gap=0.000 var_img=0.0206 offcos=0.007
[expB_conf_low_corr0] ep=015 loss=0.3171 val_r1=0.369 sym_gap=-0.017 var_img=0.0207 offcos=0.005
Running ExpB: conf_med_corr0
[expB_conf_med_corr0] ep=001 loss=5.1147 val_r1=0.017 sym_gap=-0.008 var_img=0.0094 offcos=0.548
[expB_conf_med_corr0] ep=005 loss=1.1974 val_r1=0.261 sym_gap=-0.033 var_img=0.0205 offcos=0.014
[expB_conf_med_corr0] ep=010 loss=0.4897 val_r1=0.375 sym_gap=-0.008 var_img=0.0207 offcos=0.006
[expB_conf_med_corr0] ep=015 loss=0.3288 val_r1=0.364 sym_gap=-0.033 var_img=0.0207 offcos=0.004
Running ExpB: conf_high_corr0
[expB_conf_high_corr0] ep=001 loss=5.0997 val_r1=0.025 sym_gap=-0.003 var_img=0.0095 offcos=0.541
[expB_conf_high_corr0] ep=005 loss=1.02

##9.COLLAPSE INDUCTION AND DETECTION

###9.1.0VERVIEW

**Cell 9 — Collapse Induction and Detection: Temperature, Learning Rate, Weight Decay, and Stop-Gradient Bugs**

This cell treats collapse as an engineered failure mode with triggers and signatures. Students often hear about collapse as a vague phenomenon. Here we show it can be induced on demand.

Low temperature makes the contrastive softmax extremely sharp. That forces aggressive separation and can produce unstable gradients. High learning rate amplifies instability, potentially overshooting into degenerate regions. Heavy weight decay can suppress representational diversity by penalizing weight magnitude too strongly. The stop-gradient simulation illustrates a production-relevant lesson: some catastrophic multimodal failures are not conceptual—they are implementation bugs. If one encoder stops learning, the system can look partially aligned but is structurally broken.

Detection is purely geometric. We watch mean off-diagonal cosine similarity rise when embeddings become too similar. We watch variance drop, indicating loss of degrees of freedom. We watch effective rank shrink, and we examine eigenvalue/singular value concentration. These are the signatures of a degenerate embedding space.

The educational interpretation is crucial: loss curves can mislead. You can have decreasing loss while the representation becomes unusable for downstream tasks. That is why collapse monitoring must be part of training instrumentation and acceptance criteria. In production, collapse indicators should be stage gates: if mean cosine exceeds a threshold or rank falls below a floor, training should halt or roll back.

This cell also teaches the difference between “performance failure” and “representation failure.” Collapse is representation failure. It often predicts poor transfer, poor robustness, and unpredictable behavior under shift. The right response is not to “try more epochs,” but to adjust geometry controls, optimizer stability, and diversity-preserving mechanisms.


###9.2.CODE AND IMPLEMENTATION

In [None]:
# === Cell 9 ===
# Title: Collapse Induction + Detection (Temperature, LR, Weight Decay, Stop-Gradient Simulation)
# Brief Explanation: Force controlled collapse mechanisms and detect them with variance, mean cosine, effective rank, and retrieval degradation.

@dataclass(frozen=True)
class CollapseCondition:
    name: str
    temp: float
    lr: float
    weight_decay: float
    stop_grad_txt: bool

def train_with_collapse_trigger(cond: CollapseCondition) -> Dict[str, Any]:
    # Derived config
    mc = ModelConfig(**{**asdict(CFG.model), "temp": cond.temp})
    tc = TrainConfig(**{**asdict(CFG.train), "lr": cond.lr, "weight_decay": cond.weight_decay})
    cfg = Config(paths=CFG.paths, data=DataConfig(**{**asdict(CFG.data), "confounder_enabled": False}), model=mc, train=tc, probe=CFG.probe)

    img, txt = build_models(cfg, X_img, X_txt)
    history = []
    for ep in range(1, cfg.train.short_epochs + 1):
        batches = batch_iter(train_idx, cfg.train.batch, cfg.train.seed + ep * 131)
        ep_loss = 0.0
        for b in batches:
            Xi = X_img[b]
            Xt = X_txt[b]
            ci = img.forward(Xi)
            ct = txt.forward(Xt)
            zi = ci["z"] * cfg.model.gain_img
            zt = ct["z"] * cfg.model.gain_txt
            yi, ni = l2_normalize(zi)
            yt, nt = l2_normalize(zt)
            loss, dYi, dYt, _ = info_nce_symmetric(yi, yt, cfg.model.temp)
            ep_loss += loss * float(len(b))
            dZi = l2_normalize_backward(dYi, zi, ni) * cfg.model.gain_img
            dZt = l2_normalize_backward(dYt, zt, nt) * cfg.model.gain_txt
            gi = img.backward(ci, dZi.astype(np.float32))
            if cond.stop_grad_txt:
                gt = {k: np.zeros_like(v) for k, v in txt.params().items()}  # intentional bug simulation
            else:
                gt = txt.backward(ct, dZt.astype(np.float32))
            img.adamw_step(gi, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)
            txt.adamw_step(gt, cfg.train.lr, cfg.train.b1, cfg.train.b2, cfg.train.eps, cfg.train.weight_decay, cfg.train.grad_clip)

        ep_loss /= float(len(train_idx))
        sel = val_idx[:min(500, len(val_idx))]
        E = eval_embeddings(img, txt, sel, sel, cfg)
        yi_e = E["embeddings"]["img"]; yt_e = E["embeddings"]["txt"]
        hist = {
            "epoch": ep,
            "loss": float(ep_loss),
            "val_r1": float(E["i2t"]["r@1"]),
            "mean_offdiag_cos_img": float(E["mean_offdiag_cos_img"]),
            "mean_offdiag_cos_txt": float(E["mean_offdiag_cos_txt"]),
            "var_img": float(E["var_img"]),
            "var_txt": float(E["var_txt"]),
            "eff_rank_img": float(E["spectrum_img"]["effective_rank"]),
            "eff_rank_txt": float(E["spectrum_txt"]["effective_rank"]),
        }
        history.append(hist)

    # Final detection summary on test subset
    selT = test_idx[:min(700, len(test_idx))]
    E2 = eval_embeddings(img, txt, selT, selT, cfg)
    return {"condition": asdict(cond), "history": history, "final_test": {k: v for k, v in E2.items() if k != "embeddings"}}

collapse_conds = [
    CollapseCondition("normal_ref", temp=0.07, lr=2e-3, weight_decay=8e-5, stop_grad_txt=False),
    CollapseCondition("low_temp", temp=0.015, lr=2e-3, weight_decay=8e-5, stop_grad_txt=False),
    CollapseCondition("high_lr", temp=0.07, lr=1.2e-2, weight_decay=8e-5, stop_grad_txt=False),
    CollapseCondition("heavy_wd", temp=0.07, lr=2e-3, weight_decay=8e-3, stop_grad_txt=False),
    CollapseCondition("stop_grad_txt", temp=0.07, lr=2e-3, weight_decay=8e-5, stop_grad_txt=True),
]

collapse_results = []
for c in collapse_conds:
    print("Running collapse:", c.name)
    collapse_results.append(train_with_collapse_trigger(c))

# Plot collapse indicators: mean offdiag cosine and effective rank across conditions (final epoch)
plt.figure(figsize=(10, 3))
x = np.arange(len(collapse_results))
off = [r["history"][-1]["mean_offdiag_cos_img"] for r in collapse_results]
rk = [r["history"][-1]["eff_rank_img"] for r in collapse_results]
plt.plot(x, off, marker="o", label="mean offdiag cosine (img)")
plt.plot(x, rk, marker="o", label="effective rank (img)")
plt.xticks(x, [r["condition"]["name"] for r in collapse_results], rotation=30, ha="right")
plt.title("Collapse indicators at final epoch (validation slice)")
plt.legend()
plt.tight_layout()
p = os.path.join(CFG.paths.plots, "collapse_indicators.png")
plt.savefig(p, dpi=140); plt.close()

AM.write_json_strict(
    os.path.join(CFG.paths.exp, "collapse_report.json"),
    facts_provided={"results": collapse_results, "plot": p},
    assumptions={"collapse_triggers": "low temp / high lr / heavy wd / stop-grad bug"},
    open_items=[],
    analysis="Collapse suite forces degenerate embedding regimes and records early-warning signatures that can masquerade as stable optimization.",
    draft_output={"plot": p},
    verification_status="Not verified",
    questions_to_verify=["Which indicator is most predictive of downstream failure: mean cosine, variance, or effective rank?"]
)
print("Collapse suite complete. Plot:", p)


##10.AUDIT BUNDLE

###10.1.OVERVIEW

**Cell 10 — Summary, Final Logs, and Audit Bundle Packaging**

This cell completes the governance loop. It is where the notebook stops being a sequence of computations and becomes a deliverable suitable for teaching, review, and operational reuse.

The summary artifact consolidates evidence across baseline, dominance sweeps, confounder/corruption experiments, and collapse triggers. The pedagogical reason for consolidation is that students should learn to compare failure modes systematically: which one breaks symmetry fastest, which one causes the largest counterfactual drop, which one produces the strongest collapse indicators. This is how you build a mental taxonomy grounded in data rather than anecdotes.

Finalizing the prompts log, risk log, and manifest reinforces a professional stance: experiments are not complete until their provenance is recorded. The zip bundle is not just convenience; it is the unit of reproducibility. You can hand it to a student, a colleague, or a reviewer, and they can inspect the same metrics and plots you saw.

The plain-text “Key Findings” block is intentionally modest. It prints computed extremes (best/worst symmetry gap, largest counterfactual drop, strongest collapse indicator) without claiming more than what the metrics show. That models a governance-first reporting style: summarize what is measured; keep verification status “Not verified”; list questions to verify.

In production-grade implementation, this cell represents how you operationalize multimodal evaluation. Every training run should produce an evidence package: configuration, environment fingerprint, monitored indicators, stress results, and acceptance gates. If a model is deployed, the evidence package becomes part of the model card and audit trail. If a model fails in production, the package helps you reproduce conditions and identify whether the failure resembles dominance, spurious alignment, corruption sensitivity, or collapse.

The final lesson: multimodal systems are not “deployed models,” they are “deployed measurement systems.” This cell demonstrates how to package those measurements so that the system can be governed, not merely admired.


###10.2.CODE AND IMPLEMENTATION

In [None]:
# === Cell 10 ===
# Title: Summary, Final Logs, and Audit Bundle Packaging
# Brief Explanation: Consolidate experiment evidence into strict JSON, finalize prompt/risk logs and manifest, and zip the full audit bundle.

def summarize_suite(resultsA: List[Dict[str, Any]], resultsB: List[Dict[str, Any]], collapse_results: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Extract compact, reviewable summaries
    def pick_metricA(r: Dict[str, Any]) -> Tuple[str, float, float]:
        name = r["condition"]["name"]
        r1 = r["test_metrics"]["i2t"]["r@1"]
        gap = r["test_metrics"]["sym_gap_r@1"]
        return name, float(r1), float(gap)

    A_tbl = [pick_metricA(r) for r in resultsA]
    B_tbl = []
    for r in resultsB:
        B_tbl.append({
            "name": r["condition"]["name"],
            "conf_strength": r["condition"]["conf_strength"],
            "corruption_rate": r["condition"]["corruption_rate"],
            "r1_present": r["metrics_confounder_present"]["i2t"]["r@1"],
            "r1_removed": r["metrics_counterfactual_removed"]["i2t"]["r@1"],
            "delta_r1": r["shortcut_signature"]["delta_r@1_counterfactual"],
            "mi_conf_minus_mi_shape": r["shortcut_signature"]["mi_conf_minus_mi_shape"],
        })
    C_tbl = []
    for r in collapse_results:
        h = r["history"][-1]
        C_tbl.append({
            "name": r["condition"]["name"],
            "temp": r["condition"]["temp"],
            "lr": r["condition"]["lr"],
            "weight_decay": r["condition"]["weight_decay"],
            "stop_grad_txt": r["condition"]["stop_grad_txt"],
            "val_r1_last": h["val_r1"],
            "mean_offdiag_cos_img_last": h["mean_offdiag_cos_img"],
            "var_img_last": h["var_img"],
            "eff_rank_img_last": h["eff_rank_img"],
        })
    return {"suiteA": A_tbl, "suiteB": B_tbl, "collapse": C_tbl}

summary = summarize_suite(resultsA, resultsB, collapse_results)

summary_path = os.path.join(CFG.paths.root, "summary.json")
AM.write_json_strict(
    summary_path,
    facts_provided={
        "baseline_report": "baseline_report.json",
        "suiteA": "deliverables/experiments/expA_results.json",
        "suiteB": "deliverables/experiments/expB_results.json",
        "collapse": "deliverables/experiments/collapse_report.json",
        "compact_summary_tables": summary,
        "plots": [f for f in os.listdir(CFG.paths.plots) if f.endswith(".png")],
    },
    assumptions={
        "scope": "Synthetic-only multimodal failure-modes lab. Claims limited to controlled generator and defined stresses.",
        "acceptance": "No acceptance claim; verification_status remains Not verified.",
    },
    open_items=RISKS["open_questions"],
    analysis="Summary consolidates dominance, confounding, corruption, and collapse evidence into a single reviewable artifact. Metrics are computed, not inferred.",
    draft_output={
        "key_signatures": {
            "dominance": "Large symmetry gap and gradient imbalance; CCA shifts.",
            "spurious_alignment": "Counterfactual removal triggers r@1 drop; MI shifts toward confounder.",
            "collapse": "High mean offdiag cosine + low effective rank + degraded retrieval.",
        }
    },
    verification_status="Not verified",
    questions_to_verify=[
        "Do these signatures hold with different factor cardinalities and higher-resolution images?",
        "How do signatures change with harder negatives (larger batch) and different temperatures?",
        "Which probe combination best predicts brittleness under unseen shifts?",
    ],
)

# Finalize prompt log + risk log + manifest
AM.write_prompts_log()
AM.write_risk_log(RISKS)
manifest_path = AM.write_manifest()

# Zip bundle
zip_path = os.path.join(CFG.paths.root, f"audit_bundle_{AM.run_id}.zip")
with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
    for root, _, files in os.walk(CFG.paths.root):
        for fn in files:
            if fn.endswith(".zip"):
                continue
            fp = os.path.join(root, fn)
            z.write(fp, arcname=os.path.relpath(fp, CFG.paths.root))

print("AUDIT BUNDLE WRITTEN")
print("  run_id:", AM.run_id)
print("  manifest:", manifest_path)
print("  summary :", summary_path)
print("  zip     :", zip_path)

# Plain-text key findings derived strictly from computed metrics
def _find_extremes_A(tbl):
    # tbl: (name, r1, gap)
    best_r1 = max(tbl, key=lambda x: x[1])
    worst_r1 = min(tbl, key=lambda x: x[1])
    worst_gap = max(tbl, key=lambda x: abs(x[2]))
    return best_r1, worst_r1, worst_gap

A_tbl = summary["suiteA"]
best_r1, worst_r1, worst_gap = _find_extremes_A(A_tbl)

max_drop = max(summary["suiteB"], key=lambda r: r["delta_r1"]) if summary["suiteB"] else None
worst_collapse = max(summary["collapse"], key=lambda r: r["mean_offdiag_cos_img_last"]) if summary["collapse"] else None

print("\nKEY FINDINGS (compute


##11.CONCLUSION

**Conclusion**

This chapter had a deliberately critical purpose: to replace the “multimodal alignment is magic” intuition with a disciplined, testable understanding of what alignment is, what it is not, and how it fails. The main lesson is not that multimodal learning is unreliable. The lesson is that multimodal learning is an engineered geometry whose reliability depends on measurable conditions: data construction, pairing integrity, modality balance, objective temperature, and the stability of representation diversity. If you can measure those conditions and stress them, you can treat multimodal systems professionally. If you cannot, you are operating on faith.

**Main results**

The baseline system demonstrated that a shared embedding space can be learned by construction when two modalities encode the same latent causes. Retrieval metrics (image→text and text→image) rose together, and the symmetry gap stayed small when we kept the generator clean, the encoders balanced, and the objective numerically stable. But the more important baseline result was structural: the embedding space retained measurable information about ground-truth factors. That fact was not inferred from a plot; it was probed. The mutual-information proxy between embeddings and factors remained materially above trivial levels, and the CCA-like principal correlations between modalities were coherent rather than degenerate. In plain terms: the baseline was not only “good at matching,” it was plausibly encoding some of the intended structure.

Once we began adversarial experiments, we saw four distinct failure signatures, each with its own measurable fingerprint.

First, **modality dominance** reliably produced asymmetry: one direction of retrieval remained strong while the other degraded, even when overall loss looked acceptable. This was accompanied by gradient imbalance and differences in spectral summaries (effective rank and top eigenvalues), indicating that the two encoders were not co-evolving symmetrically. The geometry was being anchored by one side and accommodated by the other.

Second, **spurious alignment via a shared confounder** produced the most pedagogically important result: high retrieval can coexist with loss of meaningful factor encoding. When the confounder was present, the model learned a shortcut that predicted pairing. The decisive evidence came from the counterfactual: removing the confounder at test time caused retrieval to drop sharply, revealing that the alignment was not based on the intended factors. The mutual-information proxy shifted toward the confounder and away from the true factors. This is the core critique of naive multimodal evaluation: if you only measure retrieval on the same distribution that contains the confounder, you will certify the shortcut as “meaning.”

Third, **pairing corruption** degraded alignment in a structurally different way. It did not merely lower retrieval. It destabilized training signals: the model received contradictory supervision and responded by weakening factor encoding or becoming hypersensitive to batch composition. Corruption can push models toward brittle compromises—partial shortcuts, reduced discriminative structure, and inflated uncertainty. Importantly, corruption is not an exotic corner case; it is a common reality in large-scale scraped multimodal datasets. The lab makes that reality legible.

Fourth, **representation collapse** emerged as a geometrical degeneration rather than a simple performance decline. Under collapse triggers—overly low temperature, overly high learning rate, excessive weight decay, or an intentional stop-gradient bug—embeddings became too similar. Mean off-diagonal cosine similarity rose, per-dimension variance fell, effective rank contracted, and spectral mass concentrated into a few directions. Some collapse regimes can still show a decreasing loss curve for a while, which is precisely why collapse is dangerous: optimization progress can be misleading.

Together these results form a consistent message: multimodal failure is not one thing; it is a family of distinct mechanisms that require distinct diagnostics. The chapter is not teaching pessimism. It is teaching taxonomy.

**Data construction: why it matters and what it taught**

The synthetic world was not a toy for its own sake. It was a methodological necessity. In real datasets, you rarely know what “true factors” are, and you rarely know whether alignment is meaningful or shortcut-based. By constructing the data, we created a ground truth that allows us to test interpretability claims with causally grounded probes.

The construction also clarified a subtle point: multimodal “meaning” is not an abstract property; it is a statement about invariances. In our generator, the same factor configuration produces a consistent pair across modalities. That makes it possible to ask the question that matters: does the embedding space preserve those factor relationships, or does it collapse to something else? When we injected confounders, we learned how easily a system can replace intended invariances with unintended ones. The data construction therefore served as a microscope: it let us see what real-world training hides.

For production-grade work, the analog is clear. You may not be able to fully control real data, but you must control the *measurement* of its risk. That means documenting likely confounders, auditing metadata channels, checking for watermark-like artifacts, and establishing split hygiene that breaks shortcuts (for example, splitting by source, time, device, or template rather than randomly). The synthetic lab teaches you what to look for.

**Training: what actually mattered**

The training loop revealed that alignment quality is governed less by architectural mystique and more by a few concrete engineering parameters.

The symmetric contrastive objective matters because it forces reciprocal compatibility. If you train only one direction, you can accidentally certify a space that works one way but fails the other. Production systems often encounter this: a model that is good at retrieving captions for images may not be equally good at retrieving images for captions, which matters depending on the downstream product.

Temperature acts as a geometry control knob. Too low and you force overly sharp discrimination that can destabilize gradients and encourage collapse. Too high and you weaken discriminative pressure. The key production lesson is that temperature is not a “tuning detail.” It changes the hardness of the matching task and reshapes the embedding distribution.

Optimizer settings matter in a predictable way. High learning rates can mimic collapse-like behavior; heavy weight decay can suppress representational diversity. The point is not that you must memorize safe values, but that you must treat these settings as part of the risk surface and monitor structural indicators during training.

Finally, logging and artifact creation were not bureaucracy—they were the training methodology. Early warning indicators provided time-series evidence of drift before the final metrics “looked bad.” This is exactly the posture you want in production: detect degradation early, when you can still intervene, rather than after deployment.

**Experiments: how to interpret them**

The experiments are best interpreted as stress tests on a hypothesis: “the embedding space represents the intended cross-modal factors in a balanced, stable way.” Each stress test attacks a different assumption.

Dominance attacks the assumption of symmetric influence. When dominance conditions cause retrieval asymmetry and gradient imbalance, the interpretation is not merely “performance dropped.” The interpretation is “the shared space is not jointly negotiated; one modality is dictating the coordinate system.” In production, this means you should expect brittle behavior when the weaker modality shifts distribution, because it never truly shaped the space.

Spurious confounding attacks the assumption that matching equals meaning. When counterfactual confounder removal causes a sharp drop, the interpretation is “the model was matching on the shortcut.” In production, this becomes a mandate: any high-performing multimodal system must be tested with counterfactual removal or invariance tests that disrupt likely shortcuts. Otherwise, you are deploying a confounder detector.

Pairing corruption attacks the assumption of clean supervision. When corruption degrades factor MI proxies and increases sensitivity, the interpretation is “the representation is being trained on contradictions and is finding an unstable compromise.” In production, this motivates data curation, robust training approaches, and at minimum corruption-aware evaluation that estimates how much noise the system can tolerate.

Collapse induction attacks the assumption that optimization progress implies representation health. When mean cosine rises and rank collapses, the interpretation is “the system is losing degrees of freedom.” In production, this is a control problem: collapse indicators must be monitored and used as stage gates, not as after-the-fact explanations.

The most important interpretive habit is triangulation. Retrieval is necessary but not sufficient. You must add structural probes: information proxies, correlation structure, sensitivity measures, and spectral health. When these disagree, treat the disagreement as a warning, not as a puzzle to average away.

**Implications in practice: using these results for production-grade implementation**

A production-grade multimodal system is not “a bigger model.” It is a system with controls. This chapter gives you a minimal set of controls that generalize beyond the synthetic world.

**1) Data governance and split hygiene as first-class engineering**  
Before you tune temperature or add capacity, audit your dataset for confounders. Treat metadata as a high-risk channel: filenames, timestamps, source identifiers, template structure, repeated headers, watermarks. Design splits that break shortcuts. Random splits can preserve confounders across train and test; source-based or time-based splits often expose them. Create explicit counterfactual test sets where suspected shortcuts are removed or randomized.

**2) Balance modalities intentionally**  
If one modality is much cleaner or much higher capacity, dominance is likely. In production, this can happen when text is curated but images are noisy, or when one encoder is pretrained more strongly. Controls include calibrated augmentation/noise, balanced batch construction, symmetric loss, and explicit monitoring of gradient imbalance and retrieval asymmetry. If asymmetry is high, treat it as a defect unless your product is genuinely one-directional.

**3) Monitor representation health during training**  
Add early warning indicators as standard logging, not as research extras. Track embedding variance, mean off-diagonal cosine similarity, effective rank, and covariance spectra. Define thresholds that trigger investigation or rollback. This is analogous to monitoring loss and accuracy, but it monitors the geometry you are actually deploying.

**4) Probe for meaning, not just matching**  
In real datasets you may not have true latent factors, but you can still design probes. Use factor-like labels where available, build controlled subsets, or create synthetic overlays. Use CCA-like diagnostics to detect whether modalities are genuinely coupled or merely coexisting. Use sensitivity proxies: if one batch perturbation dramatically changes retrieval, your system is brittle.

**5) Stress test routinely and treat stress as acceptance criteria**  
Do not accept a model because it wins on a clean validation set. Accept it because it degrades gracefully under the kinds of shifts and corruptions you expect in production: noise in one modality, partial pairing errors, removal of metadata, changes in style or source. The chapter’s degradation curves are not academic—this is how you define operational risk.

**6) Treat “Not verified” as a professional invariant**  
Even in production, you should keep a disciplined separation between what you measured and what you assume. Your reports should state: what was measured, under what conditions, what is open, and what must be verified by additional evidence. The audit artifacts produced by the notebook illustrate how to make this separation explicit.

In sum, Chapter 2 teaches a frontier capability with a conservative posture. Multimodal alignment is powerful, but it is not self-authenticating. You learn to trust it by breaking it on purpose, measuring how it breaks, and deploying only under controls that detect and constrain those break modes. This is exactly what “production-grade” means in the multimodal frontier: not that the model is impressive, but that the system is reviewable, stress-tested, and governed as a fragile geometric contract between modalities.
