
feat(cli): larql crown — crown-layer discovery (Phase A of RFC-0001)#3

Merged
mikeumus merged 1 commit into main from feat/crown-command
Apr 18, 2026
Conversation

@mikeumus

Summary

Phase A of RFC-0001 (#2): implements larql crown, the first subcommand in the mechanistic fact-editing pipeline. Given a prompt and an expected next-token, it scans per-layer last-position MLP ablations and reports the layer that most suppresses the expected token — the "crown" writer for that fact.

What this PR adds

crates/larql-inference/src/ffn/ablating.rs

New LastPositionAblatingFfn — a thin FfnBackend wrapper that delegates to any inner backend at every layer except a single configured target layer, where it zeroes the last-row output. Zero math changes; it just masks one position at one layer.
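The wrapper shape can be sketched as follows. This is a hypothetical stand-in, not the PR's code: the real `FfnBackend` trait in `larql-inference` almost certainly has a different signature, and the plain `Vec<Vec<f32>>` activation type here is purely for illustration.

```rust
/// Stand-in for larql-inference's FfnBackend trait (signature assumed).
trait FfnBackend {
    /// Apply this layer's MLP to `hidden`, a [seq_len][d_model] activation.
    fn forward(&self, layer: usize, hidden: &mut Vec<Vec<f32>>);
}

/// Delegates to the inner backend everywhere, but zeroes the
/// last-position row after the configured target layer's MLP.
struct LastPositionAblatingFfn<B: FfnBackend> {
    inner: B,
    target_layer: usize,
}

impl<B: FfnBackend> LastPositionAblatingFfn<B> {
    fn new(inner: B, target_layer: usize) -> Self {
        Self { inner, target_layer }
    }
}

impl<B: FfnBackend> FfnBackend for LastPositionAblatingFfn<B> {
    fn forward(&self, layer: usize, hidden: &mut Vec<Vec<f32>>) {
        self.inner.forward(layer, hidden);
        if layer == self.target_layer {
            // Mask exactly one position at one layer; no other math changes.
            if let Some(last_row) = hidden.last_mut() {
                last_row.iter_mut().for_each(|x| *x = 0.0);
            }
        }
    }
}
```

Because the wrapper only intercepts one (layer, position) pair, a baseline pass and an ablated pass differ by exactly one masked MLP output.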

crates/larql-cli/src/commands/extraction/crown_cmd.rs

New larql crown subcommand. Given --model, --prompt, --expect:

  1. Runs a baseline forward pass, captures the top-k predictions (default 100).
  2. For each layer L in [start_layer..=end_layer] (defaults to 60%..N-2 of depth), runs predict_with_ffn with LastPositionAblatingFfn::new(&weight_ffn, L) and records the per-layer Δ in the expected token's probability plus whether top-1 flipped.
  3. Selects the crown: prefers flipped-top layers, tie-breaks by most-negative Δ probability.
  4. Optionally emits JSON (--json) so downstream commands (edit, memit) can consume crown_layer.
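Step 3's selection rule is effectively a two-key minimum: flipped layers beat non-flipped ones, and ties go to the most negative probability delta. A minimal sketch (the struct and function names are illustrative, not the actual crown_cmd.rs API):

```rust
/// Per-layer ablation result (illustrative shape, not the real struct).
struct LayerResult {
    layer: usize,
    delta_prob: f32, // change in expected-token probability under ablation
    flipped: bool,   // did the top-1 prediction change?
}

/// Pick the crown: prefer layers whose ablation flips top-1,
/// tie-breaking by the most negative probability delta.
fn select_crown(results: &[LayerResult]) -> Option<usize> {
    results
        .iter()
        .min_by(|a, b| {
            b.flipped
                .cmp(&a.flipped) // flipped layers sort first
                .then(a.delta_prob.total_cmp(&b.delta_prob)) // then most negative Δ
        })
        .map(|r| r.layer)
}
```

Note that a large Δ alone is not enough: a non-flipped layer with a bigger raw drop still loses to any flipped layer.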

Usage

larql crown /path/to/gemma4 \
    --prompt "Capital of France? A:" \
    --expect " Paris" \
    --json

Expected behavior on Gemma 4 4B (validated in Python in Chapter 17 Phase 125c): crown_layer = 27, top-after-ablation = "France".

Methodology

Direct reimplementation in Rust of the Phase 125c ablation scan from CHAPTER_17_CORONATION.md in the Divinci-AI/server research repo. On Gemma 4 4B, that scan found L27 MLP as the load-bearing country→capital writer (ablation flips top token to "France").

Dependencies

  • Reuses the existing FfnBackend trait, WeightFfn, predict_with_ffn — no changes to core inference.
  • Adds serde derives on a small result struct; both serde crates are already workspace dependencies.

Testing

  • cargo check --package larql-inference
  • cargo check --package larql-cli
  • Live testing against a real Gemma 4 4B model is pending (would require ~10GB model download + Rust GGUF loading of the Gemma 4 tokenizer, out of scope for this PR — Phase A intent is CLI + trait scaffolding).

Follow-up PRs

Per RFC-0001 phased rollout:

  • Phase B: larql edit — rank-1 fact edit + patch file format + apply-patch
  • Phase C: larql memit — batch fact editing via the existing run_memit in larql-inference/forward/memit.rs + new CLI wrapper + specificity validation
  • Phase D: larql-python bindings exposing crown/edit/memit to Python

GitHub issues for B/C/D will be opened after this PR merges.

Linked RFC

🤖 Generated with Claude Code

Implements Phase A of RFC-0001 (#2): per-layer MLP ablation scan to find
the layer whose last-position MLP output is load-bearing for a given
(prompt, expected-token) pair.

Changes:
- crates/larql-inference/src/ffn/ablating.rs — new LastPositionAblatingFfn
  that wraps any FfnBackend and zeroes its output at the last-token row for
  one target layer. Thin wrapper, no math changes.
- crates/larql-cli/src/commands/extraction/crown_cmd.rs — new `larql crown`
  subcommand. Tokenises the prompt, runs a baseline forward pass, then
  iterates layers in [start..=end] running predict_with_ffn against the
  ablating backend, reports per-layer Δ in expected-token probability and
  picks the layer whose ablation causes the top prediction to flip with the
  largest suppression magnitude.

Methodology matches Phase 125c of Divinci-AI/server
notebooks/CHAPTER_17_CORONATION.md — on Gemma 4 4B, ablating L27 MLP on
"Capital of France? A:" makes the top prediction flip from " Paris" to
"France" (the country token). The command outputs JSON (optional --json)
so downstream commands (edit, memit) can consume the crown_layer field.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mikeumus added a commit that referenced this pull request Apr 17, 2026
… RFC-0001)

Implements Phase B of RFC-0001 (#2): single-fact rank-1 editor with
portable patch file format. Builds on Phase A's LastPositionAblatingFfn
(#3) and adds the symmetric LastPositionInjectingFfn for scale search.

### New library module: `larql-inference/src/edit.rs`
- `EditPatch` struct (serializable via serde)
- `compute_rank1(k, d, scale, layer, provenance) -> EditPatch`
- `write_patch(path, &patch)` / `read_patch(path) -> EditPatch` with a
  simple binary format: LQPATCH magic + JSON meta + little-endian f32
  vectors for d and k_norm. ~55 KB for Gemma 4 4B.
- `apply_patch(&mut ModelWeights, &EditPatch)`: installs the rank-1
  outer product into `down_proj.weight` in place, handling both
  `[hidden, intermediate]` and `[intermediate, hidden]` layouts.
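The vector I/O such a format implies can be sketched like this. Hedged heavily: the exact byte layout (magic handling, length prefixes, meta encoding) is assumed from the description above, not taken from edit.rs — only "LQPATCH magic + little-endian f32 vectors" comes from the source.

```rust
use std::io::{self, Read, Write};

/// Write one f32 vector: u32 little-endian length prefix (assumed),
/// then each element as little-endian f32.
fn write_vec(w: &mut impl Write, v: &[f32]) -> io::Result<()> {
    w.write_all(&(v.len() as u32).to_le_bytes())?;
    for x in v {
        w.write_all(&x.to_le_bytes())?;
    }
    Ok(())
}

/// Read back a vector written by `write_vec`.
fn read_vec(r: &mut impl Read) -> io::Result<Vec<f32>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let n = u32::from_le_bytes(len) as usize;
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        let mut b = [0u8; 4];
        r.read_exact(&mut b)?;
        out.push(f32::from_le_bytes(b));
    }
    Ok(out)
}
```

Two such vectors (d and k_norm) at Gemma 4 4B's dimensions, plus a small JSON header, is consistent with the quoted ~55 KB patch size.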

### New FFN wrapper: `larql-inference/src/ffn/injecting.rs`
- `LastPositionInjectingFfn` — adds a fixed delta vector to the inner
  backend's last-row output at one target layer. Symmetric to the
  ablating wrapper from PR #3. Used for auto-scale search.

### New CLI commands
- `larql edit <model> --src "..." --tgt "..." --new-token " Tokyo" --output f2t.lqpatch`
  Runs Phase A crown discovery (or accepts `--layer`), captures k at the
  crown layer for both prompts, computes d = W_down @ (k_tgt - k_src),
  linearly searches [0.5, 1, 1.5, 2, 2.5, 3, 4] for the minimum scale
  that flips the source's top-1 to --new-token, emits the patch.
- `larql apply-patch <model> --patch f2t.lqpatch --prompt "..."`
  Non-destructively installs one or more patches into the loaded
  weights, optionally runs a test prediction. Supports `--reverse`
  to subtract a patch (verifies reversibility).
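The in-place install that apply-patch performs reduces to the rank-1 update W += scale · d ⊗ k_norm. A sketch for the `[hidden, intermediate]` layout (the function name and dense `Vec` representation are illustrative; the transposed layout would swap the roles of d and k_norm):

```rust
/// Rank-1 outer-product update, in place:
/// w[i][j] += scale * d[i] * k_norm[j]
/// Rows are indexed by the hidden dim (d), columns by the
/// intermediate dim (k_norm), matching a [hidden, intermediate] layout.
fn apply_rank1(w: &mut [Vec<f32>], d: &[f32], k_norm: &[f32], scale: f32) {
    for (row, &di) in w.iter_mut().zip(d) {
        for (wij, &kj) in row.iter_mut().zip(k_norm) {
            *wij += scale * di * kj;
        }
    }
}
```

Applying the same patch with a negated scale subtracts the identical update, which is the property a `--reverse` flag can rely on.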

### Supporting change
- Added `InferenceModel::weights_mut()` accessor so apply-patch can
  mutate the in-memory weight map without reloading.

Methodology validated in Python across Divinci-AI/server
notebooks/CHAPTER_20_HONEY.md (Phase 140c: France→Tokyo with 11/11
specificity at 0.9% weight perturbation) and CHAPTER_18_THE_EDIT.md
(Phase 130 scale search). The Rust port preserves the same math.

Compile-checked with `cargo check --package larql-cli`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikeumus mikeumus merged commit 2324af4 into main Apr 18, 2026
@mikeumus mikeumus deleted the feat/crown-command branch April 18, 2026 00:00
mikeumus added a commit that referenced this pull request Apr 18, 2026
… RFC-0001)

mikeumus added a commit that referenced this pull request Apr 18, 2026
… RFC-0001) (#7)

