RFC-0001: Mechanistic Fact Editing Commands (crown, edit, memit) #2

Merged
mikeumus merged 1 commit into main from feat/mechanistic-edit-rfc on Apr 17, 2026

Conversation

@mikeumus

Summary

Proposes three new LarQL subcommands that turn LarQL into the first mechanistic-interpretability-native fact-editing CLI:

  • larql crown — per-edit crown-layer discovery via module ablation (Chapter 17 Phase 125c)
  • larql edit — single-fact rank-1 edit with auto-scale calibration (Chapter 20 Phase 140 + Chapter 18 Phase 130)
  • larql memit — batch fact editing via joint least-squares, grouped by crown (Chapter 21 Phase 141c + Chapter 23 Phase 143b)

Plus a new patch file format (~55KB per Gemma 4 4B single edit) and a non-destructive larql apply-patch command.

Why

Nine chapters of experiments on Gemma 4 4B and 26B in April 2026 established the mechanism and proved editing works:

  • Entity zone at L20-29 (~67-97% depth) — Chapter 15
  • L27 MLP is the load-bearing country→capital writer on Gemma 4 4B — Chapter 17 (ablating it breaks "France → Paris" to "France → France")
  • Single rank-1 edit: 11/11 specificity at 0.9% relative weight perturbation — Chapter 20
  • MEMIT handles 2-3 concurrent edits; multi-layer + per-edit crown extends to 3/5 — Chapters 21-23
  • No single-neuron "Paris" (polysemantic superposition) forces the rank-1 approach — Chapter 19

Building this in Rust on top of the existing larql-inference forward-pass + capture infrastructure beats shipping it as a Python library because:

  • GGUF/ollama compatibility — edit quantized models, not just HF safetensors
  • Static + dynamic triangulation — LarQL's existing vindex analysis can predict editability cheaply
  • Self-calibrating — auto-scale and per-edit crown discovery don't require ML expertise to operate

What's in this PR

Just the design doc (docs/rfcs/0001-mechanistic-fact-editing.md). Phased implementation plan:

  • Phase A: larql crown — smallest new code, builds on existing capture hooks
  • Phase B: larql edit + patch format + apply-patch
  • Phase C: larql memit joint multi-fact
  • Phase D: larql-python binding extensions

Test plan

  • Design review + merge this RFC
  • Implementation PRs follow phased plan

References

🤖 Generated with Claude Code

Proposes extending LarQL from weight-analysis into analysis+editing via
three new subcommands that implement ROME/MEMIT-family algorithms on top
of the existing larql-inference forward pass and capture hooks.

Based on 9 chapters of experimentation on Gemma 4 (4B and 26B) documented
in Divinci-AI/server notebooks/CHAPTER_15 through CHAPTER_23:

- larql crown: per-edit crown-layer discovery via module ablation
- larql edit: single-fact rank-1 edit with auto-scale calibration
- larql memit: batch fact editing via joint least-squares, grouped by crown

Also defines a patch file format (~55KB per Gemma 4 4B single edit) and
a non-destructive larql apply-patch command. Phased 4-step rollout plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikeumus added a commit that referenced this pull request Apr 17, 2026
Implements Phase A of RFC-0001 (#2): per-layer MLP ablation scan to find
the layer whose last-position MLP output is load-bearing for a given
(prompt, expected-token) pair.

Changes:
- crates/larql-inference/src/ffn/ablating.rs — new LastPositionAblatingFfn
  that wraps any FfnBackend and zeroes its output at the last-token row for
  one target layer. Thin wrapper, no math changes.
- crates/larql-cli/src/commands/extraction/crown_cmd.rs — new `larql crown`
  subcommand. Tokenises the prompt, runs a baseline forward pass, then
  iterates layers in [start..=end] running predict_with_ffn against the
  ablating backend, reports per-layer Δ in expected-token probability and
  picks the layer whose ablation causes the top prediction to flip with the
  largest suppression magnitude.
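The selection rule in the scan above can be sketched in Python (a hypothetical distillation of the logic, not the Rust implementation in crown_cmd.rs; names and the toy numbers are illustrative):

```python
def pick_crown_layer(baseline_prob, baseline_top, per_layer):
    """Among layers whose ablation flips the top-1 prediction, pick the
    one with the largest suppression of the expected-token probability."""
    best_layer, best_drop = None, 0.0
    for layer, (ablated_prob, ablated_top) in per_layer.items():
        drop = baseline_prob - ablated_prob
        if ablated_top != baseline_top and drop > best_drop:
            best_layer, best_drop = layer, drop
    return best_layer

# Toy scan: only ablating layer 27 flips " Paris" away, with the biggest drop
scan = {26: (0.80, " Paris"), 27: (0.05, "France"), 28: (0.60, " Paris")}
print(pick_crown_layer(0.85, " Paris", scan))  # 27
```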

Methodology matches Phase 125c of Divinci-AI/server
notebooks/CHAPTER_17_CORONATION.md — on Gemma 4 4B, ablating L27 MLP on
"Capital of France? A:" makes the top prediction flip from " Paris" to
"France" (the country token). The command outputs JSON (optional --json)
so downstream commands (edit, memit) can consume the crown_layer field.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikeumus
Author

RFC follow-up roadmap

Phase A (larql crown) has been opened as #3. Tracking Phases B/C/D here since the repo has issues disabled:

Phase B — larql edit + patch format + apply-patch

Blocked by: #3

  • CLI: larql edit <model> --src-prompt "..." --old " Paris" --new " Tokyo" --auto-scale --output france_to_tokyo.patch
  • Algorithm: Chapter 20 Phase 140 (rank-1 outer product) + Chapter 18 Phase 130 (binary-search auto-scale)
  • Patch file format: <name>.meta.json + <name>.d.bin + <name>.k.bin (~55KB Gemma 4 4B)
  • Validated result: France→Tokyo flip with 11/11 other-capitals preserved at 0.9% relative weight perturbation
  • Acceptance: round-trip edit → apply-patch → predict yields the new token; 5-capital specificity spot-check
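The 0.9% figure is a relative weight perturbation, and for a rank-1 update ΔW = scale · d ⊗ k the Frobenius norm factors exactly as ‖ΔW‖_F = |scale| · ‖d‖₂ · ‖k‖₂, so the magnitude can be checked without materializing ΔW. A minimal sketch with toy numbers (the vectors and norms are made up for illustration):

```python
import math

def rel_perturbation(scale, d, k_norm, w_frobenius):
    # ||scale * (d outer k)||_F == |scale| * ||d||_2 * ||k||_2, so the
    # relative perturbation needs no materialized delta-W matrix
    delta_f = abs(scale) * math.hypot(*d) * math.hypot(*k_norm)
    return delta_f / w_frobenius

# toy numbers chosen so the ratio lands near the reported 0.9%
print(round(rel_perturbation(1.0, [0.03, 0.04], [0.6, 0.8], 5.56), 3))  # 0.009
```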

Phase C — larql memit + specificity validation

Blocked by: Phase B (reuses patch format)

  • CLI: larql memit <model> --edits edits.json --output patches/ --validate-specificity 50
  • Algorithm: existing run_memit at larql-inference/forward/memit.rs + Chapter 23 Phase 143b per-edit crown grouping
  • Specificity validation: probe N held-out facts, report preserve-rate
  • Known ceiling: ~3/5 target flips at 2× scale with correlated keys (Chapter 22)
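The specificity validation step reduces to a preserve-rate over held-out facts; a hypothetical sketch (the `predict_top1` callable stands in for a real forward pass):

```python
def preserve_rate(predict_top1, holdout):
    # holdout: (prompt, expected_token) pairs the batch edit must not touch
    kept = sum(1 for prompt, tok in holdout if predict_top1(prompt) == tok)
    return kept / len(holdout)

facts = [("Capital of Spain? A:", " Madrid"), ("Capital of Italy? A:", " Rome")]
print(preserve_rate(lambda p: dict(facts)[p], facts))  # 1.0
```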

Phase D — larql-python bindings for Colab-style scripting

Independent (depends on A/B/C only conceptually)

  • Expose crown, edit, memit as Python functions via PyO3
  • Python experiments from Divinci-AI/server Chapters 15-23 become one-liner Rust invocations
  • Unblocks publication-grade reproducibility — researchers can use LarQL from Jupyter without Rust toolchain

Each of B/C/D will be its own focused PR, keeping scope reviewable.

mikeumus added a commit that referenced this pull request Apr 17, 2026
… RFC-0001)

Implements Phase B of RFC-0001 (#2): single-fact rank-1 editor with
portable patch file format. Builds on Phase A's LastPositionAblatingFfn
(#3) and adds the symmetric LastPositionInjectingFfn for scale search.

### New library module: `larql-inference/src/edit.rs`
- `EditPatch` struct (serializable via serde)
- `compute_rank1(k, d, scale, layer, provenance) -> EditPatch`
- `write_patch(path, &patch)` / `read_patch(path) -> EditPatch` with a
  simple binary format: LQPATCH magic + JSON meta + little-endian f32
  vectors for d and k_norm. ~55 KB for Gemma 4 4B.
- `apply_patch(&mut ModelWeights, &EditPatch)`: installs the rank-1
  outer product into `down_proj.weight` in place, handling both
  `[hidden, intermediate]` and `[intermediate, hidden]` layouts.
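The container can be pictured with a short Python sketch. The commit only fixes three parts (LQPATCH magic, JSON meta, little-endian f32 vectors); the length-prefix framing below is an assumption for illustration, not the actual on-disk layout defined in edit.rs:

```python
import io, json, struct

def write_patch(buf, meta, d, k_norm):
    # LQPATCH magic, length-prefixed JSON meta, then LE f32 vectors.
    # NOTE: the u32 length prefixes are a guess, not the real format.
    blob = json.dumps(meta).encode("utf-8")
    buf.write(b"LQPATCH")
    buf.write(struct.pack("<I", len(blob)))
    buf.write(blob)
    for vec in (d, k_norm):
        buf.write(struct.pack("<I", len(vec)))
        buf.write(struct.pack(f"<{len(vec)}f", *vec))

def read_patch(buf):
    assert buf.read(7) == b"LQPATCH", "bad magic"
    (meta_len,) = struct.unpack("<I", buf.read(4))
    meta = json.loads(buf.read(meta_len))
    vecs = []
    for _ in range(2):
        (n,) = struct.unpack("<I", buf.read(4))
        vecs.append(list(struct.unpack(f"<{n}f", buf.read(4 * n))))
    return meta, vecs[0], vecs[1]

buf = io.BytesIO()
write_patch(buf, {"layer": 27, "scale": 1.5}, [0.5, -0.25], [1.0, 0.0])
buf.seek(0)
meta, d, k = read_patch(buf)
print(meta["layer"], d, k)  # 27 [0.5, -0.25] [1.0, 0.0]
```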

### New FFN wrapper: `larql-inference/src/ffn/injecting.rs`
- `LastPositionInjectingFfn` — adds a fixed delta vector to the inner
  backend's last-row output at one target layer. Symmetric to the
  ablating wrapper from PR #3. Used for auto-scale search.
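The injecting wrapper's behaviour, reduced to its essence (a sketch of the described semantics, not the Rust wrapper's API):

```python
def inject_last_position(ffn_out, delta, layer, target_layer):
    # ffn_out: [seq_len][hidden] MLP output at one layer; add the delta
    # vector to the last-token row only at the target layer (the mirror
    # image of zeroing that row for ablation)
    if layer == target_layer:
        ffn_out[-1] = [x + dx for x, dx in zip(ffn_out[-1], delta)]
    return ffn_out

out = inject_last_position([[0.0, 0.0], [1.0, 1.0]], [0.5, -0.5], 27, 27)
print(out)  # [[0.0, 0.0], [1.5, 0.5]]
```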

### New CLI commands
- `larql edit <model> --src "..." --tgt "..." --new-token " Tokyo" --output f2t.lqpatch`
  Runs Phase A crown discovery (or accepts `--layer`), captures k at the
  crown layer for both prompts, computes d = W_down @ (k_tgt - k_src),
  linearly searches [0.5, 1, 1.5, 2, 2.5, 3, 4] for the minimum scale
  that flips the source's top-1 to --new-token, emits the patch.
- `larql apply-patch <model> --patch f2t.lqpatch --prompt "..."`
  Non-destructively installs one or more patches into the loaded
  weights, optionally runs a test prediction. Supports `--reverse`
  to subtract a patch (verifies reversibility).
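The linear scale search above amounts to walking a fixed ladder and returning the first scale that flips the top-1 token; a hypothetical sketch where `top1_at` stands in for an injected forward pass:

```python
SCALES = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]

def auto_scale(top1_at, new_token):
    # top1_at(s): top-1 token when the delta is injected at scale s;
    # return the smallest tested scale that flips the prediction
    for s in SCALES:
        if top1_at(s) == new_token:
            return s
    return None  # no scale in the ladder flips it

# toy stand-in: the prediction flips once the injection reaches 1.5x
print(auto_scale(lambda s: " Tokyo" if s >= 1.5 else " Paris", " Tokyo"))  # 1.5
```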

### Supporting change
- Added `InferenceModel::weights_mut()` accessor so apply-patch can
  mutate the in-memory weight map without reloading.

Methodology validated in Python across Divinci-AI/server
notebooks/CHAPTER_20_HONEY.md (Phase 140c: France→Tokyo with 11/11
specificity at 0.9% weight perturbation) and CHAPTER_18_THE_EDIT.md
(Phase 130 scale search). The Rust port preserves the same math.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikeumus added a commit that referenced this pull request Apr 17, 2026
Wraps the existing covariance-MEMIT solver (larql_inference::forward::memit::
run_memit) with a CLI, an edits.json file format, and automatic crown-layer
discovery for each edit. Groups edits by crown layer, invokes the joint
least-squares solve, emits one dense `.lqpatch` per affected layer plus a
manifest.json. Phase C of RFC-0001 (#2), stacked on Phase B (#4).

### Extended patch file format (still backward compatible)
- Bumped patch version 1 → 2 with a `kind` field (defaults to "rank_one")
- New `kind = "dense"` variant carries a flat row-major ΔW matrix, needed
  because MEMIT's covariance-projected solve isn't natively a rank-1 outer
  product. Larger on disk (~72 MB per Gemma 4 4B layer) but semantically
  exact — no SVD approximation step.
- `write_patch`, `read_patch`, `apply_patch` all dispatch on kind. Phase B
  rank-1 patches continue to round-trip unchanged.
- New `compute_dense()` helper builds a Dense patch from an Array2<f32>.

### New CLI: `larql memit`
- Reads edits.json (list of {label, src, new_token, layer?} records).
- For each edit: tokenises src, resolves target_token_id, resolves crown
  layer (explicit or auto-scan).
- Calls `run_memit` with Vec<MemitFact>, receives one `MemitResult` per
  affected layer.
- Serialises each layer's ΔW as a Dense patch into the output directory,
  writes a manifest.json enumerating them.
- Prints the apply-patch command to install the batch.
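The grouping step can be sketched as follows (illustrative Python, not the CLI's code; `crown_of` stands in for the per-edit ablation scan when no explicit `layer` override is present):

```python
from collections import defaultdict

def group_by_crown(edits, crown_of):
    # edits: parsed edits.json records ({label, src, new_token, layer?});
    # each group gets one joint least-squares MEMIT solve
    groups = defaultdict(list)
    for e in edits:
        layer = e["layer"] if e.get("layer") is not None else crown_of(e["src"])
        groups[layer].append(e["label"])
    return dict(groups)

edits = [
    {"label": "france-to-tokyo", "src": "Capital of France? A:", "layer": 27},
    {"label": "germany-to-rome", "src": "Capital of Germany? A:"},
]
print(group_by_crown(edits, lambda src: 27))
```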

### Usage

    cat > edits.json <<EOF
    [
      {"label":"france-to-tokyo","src":"Capital of France? A:",
       "new_token":" Tokyo","layer":27},
      {"label":"germany-to-rome","src":"Capital of Germany? A:",
       "new_token":" Rome","layer":27}
    ]
    EOF

    larql memit /path/to/gemma4 --edits edits.json --output patches/
    larql apply-patch /path/to/gemma4 \
        -p patches/memit_L27.lqpatch \
        --prompt "Capital of France? A:"

### Known ceiling
Chapter 22 established that single-layer MEMIT with correlated keys (~60%
cosine) lands ~3/5 concurrent targets. For 5+ correlated edits, users can
now distribute across multiple crown layers via `layer` overrides in
edits.json — MEMIT runs once per layer group.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikeumus mikeumus merged commit 074d512 into main Apr 17, 2026
@mikeumus mikeumus deleted the feat/mechanistic-edit-rfc branch April 17, 2026 23:59