PLOT (Progressive Localization via Optimal Transport) submissions to the MIB Causal Variable Localization Track. PLOT picks (layer, token-position) sites via two-stage Sinkhorn OT, then trains DAS rotations only at the picked sites — targeting baseline-DAS-comparable accuracy at ≤10× fewer rotations trained.
What this repo ships is the PLOT-DAS variant from the source paper (Stage A + Stage B + DAS at picked sites). Other PLOT variants in the paper — PLOT (localization only), PLOT-native / PLOT-PCA (Stage B handles in native or PCA coords), Full DAS — aren't implemented here. Source repo for the paper: https://github.com/jchang153/causal-abstractions-ot.
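The site selection rests on entropy-regularized optimal transport. As a minimal, illustrative sketch (not this repo's `transport.py`), the classic Sinkhorn iteration computes a transport plan whose column mass is the kind of score a stage can rank candidate sites by:

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iter=200):
    """Entropy-regularized OT plan between marginals a and b.
    Illustrative only -- not the solver in transport.py."""
    K = np.exp(-cost / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # scale to match column marginal
        u = a / (K @ v)                  # scale to match row marginal
    return u[:, None] * K * v[None, :]   # transport plan

# Toy shape: 3 causal-variable rows vs 4 candidate (layer, position) sites.
rng = np.random.default_rng(0)
cost = rng.random((3, 4))
a = np.full(3, 1 / 3)                    # uniform row marginal
b = np.full(4, 1 / 4)                    # uniform site marginal
plan = sinkhorn(cost, a, b)
```

A plan column that concentrates mass marks a site as a strong candidate; the paper's two-stage pipeline applies this idea first over layers (Stage A), then over positions within the picked layers (Stage B).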
Submissions ship for 12 of 26 cells (46.2%): 11 at full quality, 1 (arithmetic) at smoke quality.
Of the 12 shipped:
| status | cells | mechanism |
|---|---|---|
| 🏆 win/tie vs DAS leaderboard | 1 (Qwen pointer), 7 (ARC pointer), 8 (ARC answer)*, 22 (RAVEL Continent) | PLOT picks well |
| 📏 small structural gap (~5–7%) | 3 (Gemma pointer), 4 (Gemma answer) | confirmed outside seed band |
| ❌ documented structural gap | 2 (Qwen answer), 13/14 (IOI), 21/23 (RAVEL Country/Language) | each diagnosed in PLOT_SHORTCOMINGS.md |
| ⚠ smoke quality | 11 (arithmetic) | scale-up regressed; reverted to smoke |
* Cell 8's 0.999 score includes a non-obvious mechanism — see PLOT_SHORTCOMINGS.md §15.
The other 14 of 26 cells require ≥16 GB GPU (Qwen/Gemma IOI + 10 Llama cells); deferred to cloud.
The PLOT algorithm is from https://github.com/jchang153/causal-abstractions-ot. This repo ships the PLOT-DAS variant of the paper (Stage A localizes layers, Stage B localizes positions within those layers, and DAS rotations are trained at the picked sites). In algorithm shape we match the paper's binary-addition pipeline in `experiments/binary_addition/` (formerly the `codex/binary-addition-two-stage-plot` branch). What is not a direct port of paper code:
| benchmark | paper status | what we did |
|---|---|---|
| Binary addition | in paper | not in scope here (not a MIB cell) |
| MCQA (cells 1–4) | in paper | re-implemented PLOT-DAS against MIB's MCQA dataset using the binary-addition framework; not a port of experiments/mcqa/ scripts |
| Two-digit addition (cell 11) | in paper | re-implemented against MIB's arithmetic task; not a port of experiments/two_digit_addition/ |
| ARC (cells 7, 8) | not in paper | new application — V=4 OT rows over symbol0..3 letter swaps |
| RAVEL (cells 21, 22, 23) | not in paper | new application — V=3 attribute rows + per-row dataset filter + causal-model-derived alphabet |
| IOI (cells 13, 14) | not in paper | new application — attention-head featurizers, PatchAttentionHeads joint DAS, IOI-specific linear-params bootstrap |
Things in this repo that aren't in the source paper:
- Per-row dataset filter for RAVEL (each OT row's signature is collected only on bases where `queried_attribute == row_variable`)
- Attention-head dispatch for IOI: 3-tuple site keys `(layer, head, token_pos)`, joint DAS across picked heads, MSE-on-logit-diff loss
- Eval-driver patches (`scripts/eval_cell.py`): per-task `max_new_tokens` override, `LMPipeline.load` `position_ids` fallback for transformers 5.x, Qwen2 `head_dim` injection — these are MIB-harness-specific
- `--seed` flag + seed-variance sweep methodology
- ARC `stage_b_top_k_grid=(1,)` tweak (PLOT_SHORTCOMINGS §8) and the resulting DAS-vs-identity finding (§15)
Scoring difference: we report MIB's IIA aggregation (per-split → per-layer max → highest-view across layers). Numbers in mib_submission/results/RESULTS.md are directly comparable to MIB's DAS leaderboard but not directly comparable to paper tables, which use task-specific metrics.
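The aggregation described above reduces to a short helper. This is an illustrative sketch, not the harness code; the split names and the use of a mean over splits are assumptions for the example:

```python
def aggregate_iia(per_layer_scores):
    """Illustrative MIB-style aggregation: combine a layer's per-split
    IIA scores (mean here -- an assumption), then report the best layer
    ('highest view across layers').
    per_layer_scores: {layer: {split_name: iia}}."""
    def layer_score(splits):
        return sum(splits.values()) / len(splits)
    best = max(per_layer_scores, key=lambda L: layer_score(per_layer_scores[L]))
    return best, layer_score(per_layer_scores[best])

# Toy example with placeholder split names.
best_layer, best_iia = aggregate_iia({
    3: {"public": 0.61, "private": 0.55},
    9: {"public": 0.90, "private": 0.84},
})
```

The per-layer max over views is what makes these numbers comparable to MIB's DAS leaderboard but not to the paper's task-specific tables.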
Things the paper has that this repo doesn't implement: PLOT (localization-only, no DAS), PLOT-native and PLOT-PCA (Stage B handles in native or PCA basis), PLOT-native-DAS and PLOT-PCA-DAS (DAS guided by Stage B support), Full DAS (DAS over all sites). We only ship PLOT-DAS.
- `CLAUDE.md` — project context, status table, leaderboard comparison, rollout plan. Dense; engineer-oriented.
- `WALKTHROUGHS.md` — per-task prose walkthroughs of the PLOT-DAS pipeline (MCQA, ARC, arithmetic, RAVEL, IOI). Reads top-to-bottom on GitHub.
- `notebooks/residual_walkthrough.ipynb` — end-to-end Jupyter walkthrough of the residual-stream pipeline on RAVEL × Continent (cell 22), with the Stage A Sinkhorn-plan heatmap.
- `notebooks/ioi_walkthrough.ipynb` — end-to-end Jupyter walkthrough of the IOI attention-head pipeline (cell 14), including the E-I-2 diagnostic that surfaced the PLOT-picks-wrong-heads finding.
- `PLOT_SHORTCOMINGS.md` — 15-section catalog of diagnosed limitations. Read this for a calibrated view of where PLOT works vs doesn't, and why.
- `mib_submission/results/RESULTS.md` — auto-generated per-cell IIA tables.
- `mib_submission/results/CELLS.md` — per-cell status tracker.
- `JOURNAL.md` — methodological narrative, append-only by date. The full engineering record.
- `HYPOTHESES.md` — experimental hypotheses and outcomes from the diagnostic sessions.
PLOT trains DAS rotations at 2–7 picked sites per cell vs the baseline's 72 sites (every layer × token position). On cells where PLOT's site selection is well-matched to the task, scores are competitive at 10–25× fewer trained rotations. On cells where PLOT's signature design picks the wrong sites, the gap to baseline DAS is structural and documented.
A surprise finding from the diagnostic sessions: PLOT's value-add is concentrated in layer selection (Stage A). Stage B (position selection) and DAS training can be subtractive on some cells — see PLOT_SHORTCOMINGS.md §15. A leaner "Stage A only" PLOT remains an open follow-up.
The MIB/ submodule, submissions/, logs/, models/, and .venv-mib/ are gitignored — they need to be created locally.
git clone https://github.com/bojro/plot-mib-submissions.git
cd plot-mib-submissions
# Pull the MIB harness (gitignored). On WSL clone to ~ and symlink because
# /mnt/c can't chmod the git lockfiles in MIB hooks.
git clone https://github.com/aaronmueller/MIB.git ~/MIB
cd ~/MIB && git checkout b69dabe9899251d4a8fe90789afa4d655afc84c7
git submodule update --init --recursive MIB-causal-variable-track
cd -
ln -sf ~/MIB MIB
# Python 3.12 venv (sae_lens dep requires it). Install uv if needed:
# curl -LsSf https://astral.sh/uv/install.sh | sh
~/.local/bin/uv venv --python 3.12 ~/.venv-mib
ln -sf ~/.venv-mib .venv-mib
~/.local/bin/uv pip install --python ~/.venv-mib/bin/python \
-r MIB/MIB-causal-variable-track/requirements.txt
# HuggingFace token (Gemma is gated — accept the license at huggingface.co/google/gemma-2-2b first)
mkdir -p ~/.cache/huggingface
echo -n 'hf_<your_token>' > ~/.cache/huggingface/token

Sanity check the install:
.venv-mib/bin/python -c "import torch; print('CUDA:', torch.cuda.is_available())"
.venv-mib/bin/python -c "from huggingface_hub import whoami; print(whoami()['name'])"
.venv-mib/bin/python -m pytest tests/ -q  # should report 126 passed

# Verify the 10 shipped triplets pass the harness's submission format check
.venv-mib/bin/python MIB/MIB-causal-variable-track/verify_submission.py submissions/plot
# Look at per-cell scores
cat mib_submission/results/RESULTS.md
cat mib_submission/results/CELLS.md

# Run PLOT on one cell. Defaults are loaded from mib_submission/plot/configs.py
# (per-task PlotConfig + RunConfig presets). Common overrides shown.
.venv-mib/bin/python -m mib_submission.plot.run \
--task <TASK> \
--model <HF_MODEL_NAME> \
--variable <VARIABLE_NAME> \
[--n-features N] \
[--epochs N] \
[--dataset-size N] \
[--train-batch-size 16] # use for ARC/RAVEL on 8 GB to avoid OOM
[--seed N] # for variance sweeps
[--bypass-sites "L:tok,L:tok"] # skip Stage A/B with manual picks
> logs/<cell>.log 2>&1

`<TASK>` is one of `4_answer_MCQA`, `ARC_easy`, `arithmetic`, `ravel_task`, `ioi_task`. The cell folder is created at `submissions/plot/<task>_<modelClass>_<variable>/`.
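The `--bypass-sites "L:tok,L:tok"` spec can be parsed along these lines. This is a hypothetical helper, not the repo's actual CLI parsing:

```python
def parse_bypass_sites(spec: str) -> list[tuple[int, int]]:
    """Parse a --bypass-sites spec like '10:17,12:17' into
    (layer, token_pos) pairs. Hypothetical helper -- the repo's
    real parser may differ."""
    sites = []
    for item in spec.split(","):
        layer, tok = item.split(":")
        sites.append((int(layer), int(tok)))
    return sites

print(parse_bypass_sites("10:17,12:17"))  # [(10, 17), (12, 17)]
```

Manual picks of this shape skip Stage A/B entirely and hand the sites straight to DAS training.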
The MIB harness's `evaluate_submission.py` has two quirks (no `--no-private_data` flag, and `max_new_tokens=1` hardcoded for all tasks) that break the arithmetic, RAVEL, and IOI evals. We ship a patched driver:
.venv-mib/bin/python scripts/eval_cell.py \
--cell <cell_folder_name> \
> logs/<cell>_eval.log 2>&1

The driver auto-dispatches to `evaluate_submission_task` for residual-stream cells and `evaluate_ioi_submission_task` for IOI cells, applies the right `max_new_tokens` per task, and patches the `LMPipeline.load` `position_ids` fallback for transformers 5.x.
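The driver's dispatch reduces to a pattern like the following. The evaluator names come from this README; the `max_new_tokens` values below are placeholders, not the repo's actual settings:

```python
# Placeholder per-task budgets -- the real values live in scripts/eval_cell.py.
TASK_MAX_NEW_TOKENS = {
    "4_answer_MCQA": 1,   # single answer letter
    "ARC_easy": 1,
    "arithmetic": 2,      # placeholder: multi-token numeric answer
    "ravel_task": 4,      # placeholder: multi-token attribute string
    "ioi_task": 1,
}

def pick_eval(task: str) -> tuple[str, int]:
    """Route IOI cells to the IOI evaluator, everything else to the
    residual-stream evaluator (names as described in this README)."""
    fn = ("evaluate_ioi_submission_task" if task == "ioi_task"
          else "evaluate_submission_task")
    return fn, TASK_MAX_NEW_TOKENS.get(task, 1)
```

The key point is that `max_new_tokens` is looked up per task rather than hardcoded to 1, which is what the stock harness does.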
# Copy the cell's results JSON into the curated results folder
cp submissions/plot/<cell>/*results.json mib_submission/results/<cell>.json
# Bump the cell row in CELLS.md (☐ → ☑) by hand
# Regenerate RESULTS.md from the curated JSONs
.venv-mib/bin/python -m mib_submission.results._aggregate \
--write mib_submission/results/RESULTS.md
# Append a session entry to JOURNAL.md if the run revealed something
# methodologically interesting

plot-mib-submissions/
├── README.md # this file
├── CLAUDE.md # engineer-oriented project context
├── PLOT_SHORTCOMINGS.md # 15-section catalog of method limits
├── HYPOTHESES.md # experimental hypotheses + outcomes
├── JOURNAL.md # methodological narrative, append-only
├── mib_submission/
│ ├── pipeline.py # ExperimentBundle + setup_residual / attention head
│ ├── serialize.py, site_keys.py # MIB Featurizer triplet I/O
│ ├── plot/
│ │ ├── pipeline.py # select_sites_via_plot (Stage A + B + calibration)
│ │ ├── transport.py # Sinkhorn solvers
│ │ ├── features.py # signatures + abstract table
│ │ ├── _alphabets.py # LabelAlphabet (letter / multi-string / causal-model)
│ │ ├── configs.py # per-task PlotConfig + RunConfig presets
│ │ ├── run.py # CLI driver
│ │ └── bucketed.py # parked variant (see §1)
│ ├── ioi/
│ │ ├── bootstrap.py # IOI linear-params bootstrap
│ │ ├── submission.py # cell_dir, write_ioi_submission
│ │ └── _patches.py, _runner.py # transformers/pyvene compat patches
│ └── results/
│ ├── _aggregate.py # generates RESULTS.md from JSONs
│ ├── RESULTS.md # AUTO-GENERATED, don't edit
│ ├── CELLS.md # 26-cell status tracker
│ └── *.json # archived eval outputs (one per cell)
├── scripts/
│ ├── eval_cell.py # patched MIB eval driver
│ └── overnight*.sh # overnight launcher patterns
├── tests/ # 126 tests, pytest
└── reference/source_plot/ # read-only snapshot of paper code
# (binary-addition Stage A+B branch)
The MIB/, submissions/, logs/, models/, and .venv-mib/ directories are gitignored — created on first run. submissions/plot/ is where verified submissions live; submissions/_plot_backups/ (also gitignored) preserves pre-modification baselines for cells we re-ran during diagnostic sessions.
- New cell of an existing task — no code changes needed. Run `mib_submission.plot.run` with the new `--task --model --variable` triple. Defaults come from the per-task `PlotConfig` preset in `mib_submission/plot/configs.py`.
- New task — add a per-task preset function in `configs.py` (a `PlotConfig` returning the OT row schema, alphabet, and signature dataset) plus a branch in `default_config()`. See `_mcqa_v4_choices`, `_arc_v4_symbols`, `_ravel_v3_attributes`, `_arithmetic_v2_carry_children`, `_ioi_v3_splits` for working patterns. Don't edit `run.py` constants — use the CLI.
- New cell type (e.g. attention-head instead of residual-stream) — needs a `setup_<type>_experiment` in `mib_submission/pipeline.py` and a `main_<type>` branch in `run.py`. See `mib_submission/ioi/` for the attention-head precedent.
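A new-task preset follows this shape. The sketch below is a stand-in: `PlotConfig`'s field names and the preset/dispatch names are illustrative assumptions, not the actual dataclass in `configs.py`:

```python
from dataclasses import dataclass

# Illustrative stand-in for the repo's PlotConfig; field names are
# assumptions -- check mib_submission/plot/configs.py for the real schema.
@dataclass
class PlotConfig:
    row_variables: list[str]   # one OT row per causal variable
    alphabet: list[str]        # label alphabet for signatures
    dataset_size: int = 256

def _mytask_v2_example() -> PlotConfig:
    """Hypothetical preset in the style of _arc_v4_symbols etc."""
    return PlotConfig(
        row_variables=["var0", "var1"],
        alphabet=["A", "B", "C", "D"],
    )

def default_config(task: str) -> PlotConfig:
    """Sketch of the per-task dispatch branch."""
    if task == "mytask":
        return _mytask_v2_example()
    raise ValueError(f"no preset for task {task!r}")
```

The real presets additionally wire up the signature dataset; the point here is only the preset-function-plus-dispatch-branch pattern.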
Developed on an 8 GB RTX 4060 Laptop. 12 of 26 cells fit at this scale. The other 14 (4 Qwen/Gemma IOI cells via pyvene's IntervenableModel + 10 Llama-8B cells) need ≥16 GB VRAM — cloud GPU work, deferred.
- Cell 8's 0.999 leaderboard-relative win comes from an interaction with the eval harness's identity fallback at unselected positions, not from PLOT-trained DAS rotations. Methodologically valid per the harness's scoring rules. Full mechanism documented in `PLOT_SHORTCOMINGS.md` §15.
- 5 of 12 reachable cells have real structural gaps to the DAS baseline. Each is diagnosed in `PLOT_SHORTCOMINGS.md` (§2 cell 2, §13 cells 13/14, §14 cells 21/23). Closing them is out of scope for this submission.
- Cell 11's arithmetic ds=1024 scale-up regressed. Reverted to the smoke result; the failed scale-up's submission is preserved at `submissions/_plot_backups/arithmetic_*_pre_c6_*` for reference.