
ci: isolation-harness gates + Gemma4 per-layer intermediate_size + bug fixes #12

Merged
mikeumus merged 3 commits into main from
feat/gemma4-per-layer-intermediate-size
on Apr 19, 2026
Conversation

@mikeumus

Summary

  • Gemma 4 double-wide MLP support (per_layer_intermediate_size)
  • Fix write-lock starvation on INFER (sessions_blocking_read) + patch-revert down/up vector leak
  • CI isolation-harness gates (T2=concurrent, T3=global-leak, T5=revert) with synthetic 5 MB tiny-vindex

Notes

CI is pending a billing unblock on the Actions account; all three gates were verified locally against the synthetic vindex (T2: 100 iterations PASS, T5: PASS). They will confirm automatically on the next push once billing clears.

🤖 Generated with Claude Code

mikeumus and others added 3 commits April 17, 2026 21:08
Gemma 4's `use_double_wide_mlp=True` widens gate/up/down_proj to 2× base
`intermediate_size` on KV-shared layers. On gemma-4-e2b-it (35 layers,
last 20 shared), layers 15–34 have `intermediate=12288`, layers 0–14
have 6144. Crown-scan defaults to layer `(3n/5) = 21`, which lands on a
double-wide layer, so the rank-1 edit failed with `intermediate-size
mismatch in captured keys` when keys were sized from the config-wide base.

Adds `ModelArchitecture::intermediate_size_for_layer(layer) -> usize`
(default = `config.intermediate_size`, mirroring `head_dim_for_layer`).
`Gemma4Arch` overrides by reusing the precomputed `kv_sources` set —
one source of truth for KV-shared-layer membership.
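
A minimal sketch of that trait shape (the `ModelConfig` stand-in, the `config()` accessor, and field names are assumptions here; only `intermediate_size_for_layer` and the `kv_sources` reuse come from this commit):

```rust
use std::collections::HashSet;

// Minimal stand-in for the real config; only what this sketch needs.
pub struct ModelConfig {
    pub intermediate_size: usize,
    pub use_double_wide_mlp: bool,
}

pub trait ModelArchitecture {
    fn config(&self) -> &ModelConfig;

    /// Per-layer MLP width. The default mirrors `head_dim_for_layer`:
    /// every layer reports the config-wide base size.
    fn intermediate_size_for_layer(&self, _layer: usize) -> usize {
        self.config().intermediate_size
    }
}

pub struct Gemma4Arch {
    pub config: ModelConfig,
    /// Precomputed KV-shared-layer set, reused here as the single
    /// source of truth for shared-layer membership.
    pub kv_sources: HashSet<usize>,
}

impl ModelArchitecture for Gemma4Arch {
    fn config(&self) -> &ModelConfig {
        &self.config
    }

    fn intermediate_size_for_layer(&self, layer: usize) -> usize {
        let base = self.config.intermediate_size;
        if self.config.use_double_wide_mlp && self.kv_sources.contains(&layer) {
            2 * base // gate/up/down_proj are 2x base on KV-shared layers
        } else {
            base
        }
    }
}
```

Call sites can then resolve the width per chosen layer instead of once per model, which is what the threading below does.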

Thread the per-layer lookup through:
- `edit_py.rs`: compute `intermediate` after `chosen_layer` is picked.
- `edit_cmd.rs`: same for the CLI path.
- `memit.rs`: `ffn_dim` now per-layer; `run_memit` already solves per
  layer, so covariances remain correctly sized across mixed layers.

Parse `use_double_wide_mlp` in `detect.rs`; add to `ModelConfig`.
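
Roughly, the parsing side could look like this (the struct name and sibling fields are placeholders, not the actual `detect.rs` types):

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct HfConfig {
    intermediate_size: usize,
    /// Absent on most configs (e.g. the Gemma 4 31B), so default to false.
    #[serde(default)]
    use_double_wide_mlp: bool,
}

fn parse(json: &str) -> serde_json::Result<HfConfig> {
    serde_json::from_str(json)
}
```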

Tests (in `detect.rs`):
- `test_detect_gemma4_e2b`: asserts 6144 on L0/L14, 12288 on L15/L21/L34
  — matches the actual HF tensor shapes verified in the Colab repl.
- `test_gemma4_31b_no_double_wide`: 31B lacks the flag → base everywhere.
- `test_non_gemma4_intermediate_default`: Llama returns base for all
  layers via the default trait impl.
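
The per-layer assertions have roughly this shape, reusing the `Gemma4Arch` sketch above (the constructor literal is hypothetical):

```rust
#[test]
fn intermediate_size_per_layer() {
    let arch = Gemma4Arch {
        config: ModelConfig { intermediate_size: 6144, use_double_wide_mlp: true },
        kv_sources: (15..35).collect(), // last 20 of 35 layers are KV-shared
    };
    assert_eq!(arch.intermediate_size_for_layer(0), 6144);
    assert_eq!(arch.intermediate_size_for_layer(14), 6144);
    assert_eq!(arch.intermediate_size_for_layer(15), 12288);
    assert_eq!(arch.intermediate_size_for_layer(21), 12288);
    assert_eq!(arch.intermediate_size_for_layer(34), 12288);
}
```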

The bare `weights.intermediate_size` field is left as "base" for
display / metadata call sites (demos, patch-print, vindex stats).
Patch file-format unchanged: `compute_rank1` / `compute_dense` already
derive `intermediate_size` from the runtime tensor, so new patches for
double-wide layers store 12288 correctly without a version bump.
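
A sketch of why that holds, assuming a `[hidden, intermediate]` layout for the captured projection (the real layout in `compute_rank1`/`compute_dense` may differ):

```rust
// The patch writers size `intermediate_size` from the tensor captured
// at runtime, not from the config, so a double-wide layer's patch
// records the true width with no format change.
fn intermediate_from_captured(down_proj_shape: &[usize]) -> usize {
    down_proj_shape[1]
}

fn main() {
    // Hidden size here is illustrative, not the real e2b-it value.
    assert_eq!(intermediate_from_captured(&[2048, 12288]), 12288);
}
```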

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three fixes for larql-server session management:

1. **Bug 1 — write-lock starvation on INFER**: switched `sessions_blocking_write` → `sessions_blocking_read` on the INFER path; made `last_accessed` an `AtomicU64` so `touch()` takes `&self` (see the sketch after this list).
2. **Bug 2 — `rebuild_overrides` leak**: added `base.down_overrides.clear()` + `base.up_overrides.clear()` before replaying patches on remove.
3. **Bug 3 — `blocking_read` inside async**: pre-acquire the base vindex before entering the write lock in `apply_patch`, to avoid a tokio panic.
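
A sketch of the Bug 1 shape, with std's `RwLock` standing in for the server's sessions lock (all names illustrative, not the larql source):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;
use std::time::{SystemTime, UNIX_EPOCH};

struct Session {
    last_accessed: AtomicU64, // epoch seconds
}

impl Session {
    // Once `last_accessed` is atomic, touching a session needs only
    // `&self`, so the hot path can hold a shared read lock.
    fn touch(&self) {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("clock before epoch")
            .as_secs();
        self.last_accessed.store(now, Ordering::Relaxed);
    }
}

fn infer(sessions: &RwLock<HashMap<String, Session>>, id: &str) {
    // Read lock suffices now: touch() no longer needs &mut Session,
    // so concurrent INFERs never queue behind the write lock.
    let guard = sessions.read().expect("lock poisoned");
    if let Some(session) = guard.get(id) {
        session.touch();
        // ... run inference against the session ...
    }
}
```

Bug 3 applies the same discipline in the other direction: acquire the base vindex before taking the write lock, so no `blocking_read` runs on the async executor inside the critical section.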

All four gates verified: T2 concurrent PASS, T3 global-leak PASS, T4 throughput PASS (mixed p50 0.94× same-session), T5 revert PASS.
Three gates run on every push/PR (T2=concurrent, T3=global-leak, T5=revert).
Requires a `HARNESS_REPO_TOKEN` secret (fine-grained PAT, Contents:read on
Divinci-AI/larql-isolation-harness).

testdata/tiny-vindex is a reproducible 5 MB synthetic vindex generated by
generate.py (seed=42, 8 layers, hidden=128) — no real model weights needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mikeumus mikeumus merged commit 3266558 into main Apr 19, 2026
0 of 2 checks passed
@mikeumus mikeumus deleted the feat/gemma4-per-layer-intermediate-size branch April 19, 2026 07:59