RFC-0001: Mechanistic Fact Editing Commands (crown, edit, memit) #2
Merged
Conversation
Proposes extending LarQL from weight-analysis into analysis+editing via three new subcommands that implement ROME/MEMIT-family algorithms on top of the existing larql-inference forward pass and capture hooks. Based on nine chapters of experimentation on Gemma 4 (4B and 26B) documented in Divinci-AI/server notebooks/CHAPTER_15 through CHAPTER_23:

- `larql crown`: per-edit crown-layer discovery via module ablation
- `larql edit`: single-fact rank-1 edit with auto-scale calibration
- `larql memit`: batch fact editing via joint least-squares, grouped by crown

Also defines a patch file format (~55 KB per Gemma 4 4B single edit) and a non-destructive `larql apply-patch` command. Phased 4-step rollout plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikeumus added a commit that referenced this pull request on Apr 17, 2026
Implements Phase A of RFC-0001 (#2): a per-layer MLP ablation scan to find the layer whose last-position MLP output is load-bearing for a given (prompt, expected-token) pair.

Changes:

- crates/larql-inference/src/ffn/ablating.rs — new LastPositionAblatingFfn that wraps any FfnBackend and zeroes its output at the last-token row for one target layer. Thin wrapper, no math changes.
- crates/larql-cli/src/commands/extraction/crown_cmd.rs — new `larql crown` subcommand. Tokenises the prompt, runs a baseline forward pass, then iterates layers in [start..=end] running predict_with_ffn against the ablating backend, reports the per-layer Δ in expected-token probability, and picks the layer whose ablation causes the top prediction to flip with the largest suppression magnitude.

Methodology matches Phase 125c of Divinci-AI/server notebooks/CHAPTER_17_CORONATION.md — on Gemma 4 4B, ablating the L27 MLP on "Capital of France? A:" flips the top prediction from " Paris" to "France" (the country token).

The command outputs JSON (optional --json) so downstream commands (edit, memit) can consume the crown_layer field.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
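To make the scan concrete, here is a minimal sketch of the ablating-wrapper idea, assuming a simplified `FfnBackend` trait and `ndarray` types; the actual trait in `larql-inference` and its method signature may differ.

```rust
use ndarray::Array2;

// Assumed stand-in for the larql-inference FFN backend trait (illustrative only).
trait FfnBackend {
    /// MLP output for one layer, shape [seq_len, hidden].
    fn forward(&self, layer: usize, hidden_states: &Array2<f32>) -> Array2<f32>;
}

/// Wraps any backend and zeroes the last-token row at a single target layer,
/// so the crown scan can measure how much that layer's MLP output carries the fact.
struct LastPositionAblatingFfn<B: FfnBackend> {
    inner: B,
    target_layer: usize,
}

impl<B: FfnBackend> FfnBackend for LastPositionAblatingFfn<B> {
    fn forward(&self, layer: usize, hidden_states: &Array2<f32>) -> Array2<f32> {
        let mut out = self.inner.forward(layer, hidden_states);
        if layer == self.target_layer {
            let last = out.nrows() - 1;
            out.row_mut(last).fill(0.0); // ablate only the last position
        }
        out
    }
}
```

The crown scan then repeats the forward pass with `target_layer` swept over the layer range and records the drop in the expected token's probability at each step.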
Author
RFC follow-up roadmap: Phase A (…), Phase B — …
mikeumus added a commit that referenced this pull request on Apr 17, 2026
… RFC-0001)

Implements Phase B of RFC-0001 (#2): single-fact rank-1 editor with a portable patch file format. Builds on Phase A's LastPositionAblatingFfn (#3) and adds the symmetric LastPositionInjectingFfn for scale search.

### New library module: `larql-inference/src/edit.rs`

- `EditPatch` struct (serializable via serde)
- `compute_rank1(k, d, scale, layer, provenance) -> EditPatch`
- `write_patch(path, &patch)` / `read_patch(path) -> EditPatch` with a simple binary format: LQPATCH magic + JSON meta + little-endian f32 vectors for d and k_norm. ~55 KB for Gemma 4 4B.
- `apply_patch(&mut ModelWeights, &EditPatch)`: installs the rank-1 outer product into `down_proj.weight` in place, handling both `[hidden, intermediate]` and `[intermediate, hidden]` layouts.

### New FFN wrapper: `larql-inference/src/ffn/injecting.rs`

- `LastPositionInjectingFfn` — adds a fixed delta vector to the inner backend's last-row output at one target layer. Symmetric to the ablating wrapper from PR #3. Used for auto-scale search.

### New CLI commands

- `larql edit <model> --src "..." --tgt "..." --new-token " Tokyo" --output f2t.lqpatch`
  Runs Phase A crown discovery (or accepts `--layer`), captures k at the crown layer for both prompts, computes d = W_down @ (k_tgt - k_src), linearly searches [0.5, 1, 1.5, 2, 2.5, 3, 4] for the minimum scale that flips the source's top-1 to --new-token, and emits the patch.
- `larql apply-patch <model> --patch f2t.lqpatch --prompt "..."`
  Non-destructively installs one or more patches into the loaded weights and optionally runs a test prediction. Supports `--reverse` to subtract a patch (verifies reversibility).

### Supporting change

- Added `InferenceModel::weights_mut()` accessor so apply-patch can mutate the in-memory weight map without reloading.

Methodology validated in Python across Divinci-AI/server notebooks/CHAPTER_20_HONEY.md (Phase 140c: France→Tokyo with 11/11 specificity at 0.9% weight perturbation) and CHAPTER_18_THE_EDIT.md (Phase 130 scale search). The Rust port preserves the same math.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
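As a sanity check on the math, here is a hedged sketch of the rank-1 update described above (ΔW = scale · d ⊗ k̂ added to `down_proj`), assuming a `[hidden, intermediate]` layout and `ndarray` types; the real `compute_rank1`/`apply_patch` live in `larql-inference/src/edit.rs` and also handle the transposed layout.

```rust
use ndarray::{Array1, Array2};

/// Minimal sketch: add scale * d ⊗ k̂ to a down-projection stored as
/// [hidden, intermediate] (so W_down maps the MLP key k to the hidden stream).
fn apply_rank1_sketch(
    down_proj: &mut Array2<f32>,
    d: &Array1<f32>, // desired output shift, length = hidden
    k: &Array1<f32>, // captured key at the crown layer, length = intermediate
    scale: f32,      // found by the linear scale search in `larql edit`
) {
    let k_norm = k / k.dot(k).sqrt(); // unit key direction
    for (i, di) in d.iter().enumerate() {
        for (j, kj) in k_norm.iter().enumerate() {
            down_proj[[i, j]] += scale * *di * *kj; // rank-1 outer product
        }
    }
}
```

The scale factor is whatever the linear search in `larql edit` settles on: the smallest value in [0.5, 1, 1.5, 2, 2.5, 3, 4] that flips the source prompt's top-1 token.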
mikeumus added a commit that referenced this pull request on Apr 17, 2026
Wraps the existing covariance-MEMIT solver (`larql_inference::forward::memit::run_memit`) with a CLI, an edits.json file format, and automatic crown-layer discovery for each edit. Groups edits by crown layer, invokes the joint least-squares solve, and emits one dense `.lqpatch` per affected layer plus a manifest.json. Phase C of RFC-0001 (#2), stacked on Phase B (#4).

### Extended patch file format (still backward compatible)

- Bumped patch version 1 → 2 with a `kind` field (defaults to "rank_one").
- New `kind = "dense"` variant carries a flat row-major ΔW matrix, needed because MEMIT's covariance-projected solve isn't natively a rank-1 outer product. Larger on disk (~72 MB per Gemma 4 4B layer) but semantically exact — no SVD approximation step.
- `write_patch`, `read_patch`, `apply_patch` all dispatch on kind. Phase B rank-1 patches continue to round-trip unchanged.
- New `compute_dense()` helper builds a Dense patch from an Array2<f32>.

### New CLI: `larql memit`

- Reads edits.json (a list of {label, src, new_token, layer?} records).
- For each edit: tokenises src, resolves target_token_id, resolves the crown layer (explicit or auto-scan).
- Calls `run_memit` with a Vec<MemitFact>, receives one `MemitResult` per affected layer.
- Serialises each layer's ΔW as a Dense patch into the output directory and writes a manifest.json enumerating them.
- Prints the apply-patch command to install the batch.

### Usage

```
cat > edits.json <<EOF
[
  {"label":"france-to-tokyo","src":"Capital of France? A:", "new_token":" Tokyo","layer":27},
  {"label":"germany-to-rome","src":"Capital of Germany? A:", "new_token":" Rome","layer":27}
]
EOF

larql memit /path/to/gemma4 --edits edits.json --output patches/

larql apply-patch /path/to/gemma4 \
  -p patches/memit_L27.lqpatch \
  --prompt "Capital of France? A:"
```

### Known ceiling

Chapter 22 established that single-layer MEMIT with correlated keys (~60% cosine) lands ~3/5 concurrent targets. For 5+ correlated edits, users can now distribute across multiple crown layers via `layer` overrides in edits.json — MEMIT runs once per layer group.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
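A hedged sketch of the group-by-crown-layer step described above, with a placeholder `Edit` record and a crown-scan callback standing in for the real `larql memit` types:

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for one edits.json record.
struct Edit {
    label: String,
    src: String,
    new_token: String,
    layer: Option<usize>, // explicit crown-layer override, if any
}

/// Group edits by crown layer: an explicit `layer` wins, otherwise run the
/// Phase A crown scan (passed in here as a closure). MEMIT's joint
/// least-squares solve then runs once per layer group.
fn group_by_crown_layer(
    edits: Vec<Edit>,
    crown_scan: impl Fn(&Edit) -> usize,
) -> BTreeMap<usize, Vec<Edit>> {
    let mut groups: BTreeMap<usize, Vec<Edit>> = BTreeMap::new();
    for edit in edits {
        let layer = edit.layer.unwrap_or_else(|| crown_scan(&edit));
        groups.entry(layer).or_default().push(edit);
    }
    groups
}
```

Explicit `layer` overrides in edits.json land in the same map, which is how the multi-layer workaround in the known-ceiling note is expressed.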
mikeumus added a commit that referenced this pull request on Apr 18, 2026
mikeumus added a commit that referenced this pull request on Apr 18, 2026
mikeumus added a commit that referenced this pull request on Apr 18, 2026
mikeumus added a commit that referenced this pull request on Apr 18, 2026
mikeumus added a commit that referenced this pull request on Apr 18, 2026
Summary
Proposes three new LarQL subcommands that turn LarQL into the first mechanistic-interpretability-native fact-editing CLI:
- `larql crown` — per-edit crown-layer discovery via module ablation (Chapter 17 Phase 125c)
- `larql edit` — single-fact rank-1 edit with auto-scale calibration (Chapter 20 Phase 140 + Chapter 18 Phase 130)
- `larql memit` — batch fact editing via joint least-squares, grouped by crown (Chapter 21 Phase 141c + Chapter 23 Phase 143b)

Plus a new patch file format (~55 KB per Gemma 4 4B single edit) and a non-destructive `larql apply-patch` command.
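For orientation, a minimal sketch of the `.lqpatch` container as described in the Phase B commit earlier in this thread (LQPATCH magic, JSON metadata, little-endian f32 vectors); the u32 length prefix and the exact field order are assumptions, not the documented format.

```rust
use std::io::{self, Write};

/// Illustrative writer for the rank-1 patch container: "LQPATCH" magic,
/// a JSON metadata block, then little-endian f32 vectors for d and k_norm.
/// The u32 length prefix is an assumption, not part of the documented format.
fn write_patch_sketch<W: Write>(
    mut out: W,
    meta_json: &str, // layer, scale, provenance, etc.
    d: &[f32],
    k_norm: &[f32],
) -> io::Result<()> {
    out.write_all(b"LQPATCH")?;
    out.write_all(&(meta_json.len() as u32).to_le_bytes())?;
    out.write_all(meta_json.as_bytes())?;
    for v in d.iter().chain(k_norm.iter()) {
        out.write_all(&v.to_le_bytes())?;
    }
    Ok(())
}
```

Two f32 vectors plus a small JSON header is what keeps a single Gemma 4 4B edit around the ~55 KB figure quoted above.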
Why

Nine chapters of experiments on Gemma 4 4B and 26B in April 2026 established the mechanism and proved editing works:
Building this in Rust on top of the existing `larql-inference` forward-pass + capture infrastructure beats shipping it as a Python library because:

- `auto-scale` + per-edit crown discovery don't require ML expertise to operate

What's in this PR
Just the design doc (`docs/rfcs/0001-mechanistic-fact-editing.md`). Phased implementation plan:

1. `larql crown` — smallest new code, builds on existing capture hooks
2. `larql edit` + patch format + `apply-patch`
3. `larql memit` joint multi-fact
4. `larql-python` binding extensions

Test plan
References
- `google/gemma-4-e2b-it` (matches our published LarQL vindex at `Divinci-AI/gemma-4-4b-e2b-vindex`)

🤖 Generated with Claude Code