
Sync with upstream Architecture B (chrishayuk/larql#30)#13

Merged
mikeumus merged 61 commits into main from sync-upstream-architecture-b
Apr 22, 2026

Conversation

@mikeumus

Summary

Merges 31 upstream commits from chrishayuk/larql PR chrishayuk#30 "Architecture B" into our Divinci-AI fork.

What Architecture B brings

  • GPU-graph + shader-based layer execution (distributed grid support)
  • Gemma 4 MoE architecture
  • Binary vindex format + improved publish pull
  • LM head / prefill / decode decoupling
  • Residual-stream capture + benchmarks
  • Remote FFN walk backend (ffn/remote.rs)
  • patch/core.rs → patch/overlay.rs rename, with apply logic split into new patch/overlay_apply.rs
  • CLI restructure: primary verbs (run/chat/bench/pull/link/list/show/slice/publish/rm/extract) promoted to top-level; research tools stay under larql dev *
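The two-level verb split above can be pictured as a two-stage dispatch. A minimal sketch with plain enums — `Commands`, `DevCommand`, and `run_dev` mirror the names in this PR, while the variants shown, return values, and `dispatch` helper are illustrative stand-ins for the real clap-based CLI:

```rust
// Sketch of the restructured CLI dispatch: primary verbs are top-level
// Commands variants; research tools hang off a single Dev subcommand.
enum Commands {
    Run,              // primary verbs promoted to top level...
    Chat,
    Dev(DevCommand),  // ...research tools stay under `larql dev *`
}

enum DevCommand {
    Crown,            // RFC-0001 verbs are DevCommand variants,
    Edit,             // not Commands, so they dispatch via run_dev
    ApplyPatch,
    Memit,
}

fn dispatch(cmd: Commands) -> &'static str {
    match cmd {
        Commands::Run => "run",
        Commands::Chat => "chat",
        Commands::Dev(d) => run_dev(d), // dev verbs route through run_dev
    }
}

fn run_dev(cmd: DevCommand) -> &'static str {
    match cmd {
        DevCommand::Crown => "dev crown",
        DevCommand::Edit => "dev edit",
        DevCommand::ApplyPatch => "dev apply-patch",
        DevCommand::Memit => "dev memit",
    }
}

fn main() {
    assert_eq!(dispatch(Commands::Dev(DevCommand::Memit)), "dev memit");
    println!("dispatch ok");
}
```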

Conflict resolutions

  • crates/larql-cli/src/main.rs: Our RFC-0001 Crown/Edit/ApplyPatch/Memit are DevCommand variants, not Commands. Moved their dispatch into run_dev.
  • crates/larql-vindex/src/patch/overlay.rs: Upstream moved apply_patch/rebuild_overrides to the new overlay_apply.rs, which already has our InsertKnn/DeleteKnn handlers line-for-line. Deleted the duplicate HEAD block.
  • crates/larql-vindex/src/patch/overlay_apply.rs: (Not a conflict; manual patch.) Preserved our base.down_overrides / base.up_overrides clear in rebuild_overrides so the Phase-1 unlearning revert doesn't leak gate/down vectors across patches.
  • crates/larql-inference/src/ffn/mod.rs: Combined our ablating/injecting modules with upstream's remote. Dropped the stale pub mod experimental (the file never existed on our main — a pre-existing broken reference).
  • crates/larql-inference/src/lib.rs: Re-exported both sides: our HighwayFfn/LastPositionAblatingFfn/LastPositionInjectingFfn and upstream's RemoteFfn* types.
  • crates/larql-models/src/detect.rs: Combined our use_double_wide_mlp field with upstream's enable_moe_block/top_k_experts/moe_intermediate_size.
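The overlay_apply.rs guard above is easiest to see in miniature. This is a hypothetical sketch of why the clear matters — only the `down_overrides`/`up_overrides` names and `rebuild_overrides` come from the PR; the `Overlay`/`PatchOp` types, fields, and semantics are stand-ins:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the overlay state.
#[derive(Default)]
struct Overlay {
    down_overrides: HashMap<usize, Vec<f32>>,
    up_overrides: HashMap<usize, Vec<f32>>,
}

struct PatchOp {
    row: usize,
    down: Vec<f32>,
}

// Rebuild overrides from the currently active patches only. Without
// the two clear() calls, a reverted patch's gate/down vectors would
// survive into the next rebuild — the leak this merge preserves the
// fix for.
fn rebuild_overrides(base: &mut Overlay, active: &[PatchOp]) {
    base.down_overrides.clear();
    base.up_overrides.clear();
    for op in active {
        base.down_overrides.insert(op.row, op.down.clone());
    }
}

fn main() {
    let mut base = Overlay::default();
    let p1 = PatchOp { row: 7, down: vec![1.0] };
    rebuild_overrides(&mut base, &[p1]); // apply a patch
    rebuild_overrides(&mut base, &[]);   // revert it
    assert!(base.down_overrides.is_empty()); // nothing leaks across patches
    println!("no leak");
}
```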

Validation

  • cargo check --workspace → clean (warnings only, all pre-existing)
  • 166 unit tests pass: 102 in larql-vindex + larql-models, 64 in larql-inference
  • Smoke-test: larql --help shows primary verbs + dev subcommand; larql dev --help shows crown/edit/apply-patch/memit all registered

Test plan

  • Spot-check larql dev crown --help / dev edit --help / dev apply-patch --help / dev memit --help args still render correctly
  • Run a Gate-3 suppression test against a real vindex (Paris→capital DELETE, verify describe rank changes) — confirms Architecture B's new apply/revert path still honors our bug fix
  • Build the larql-service Docker image + deploy to staging — verify no compile regressions under release mode
  • Re-run the isolation harness (larql-isolation-harness) to confirm the session-scoped patch behavior survived the merge

🤖 Generated with Claude Code

chrishayuk and others added 30 commits April 15, 2026 00:56
- Implement Q4 scalar fallback for non-ARM targets:
  - Move decode_f16() before #if aarch64 (shared by both paths)
  - Replace empty stub functions with correct scalar implementations
  - q4_0_matvec_c and q4_0_vecmat_c now produce correct results on x86_64
  Affects: larql-compute/csrc/q4_dot.c

Tested on Ubuntu 24 (WSL2, x86_64): cargo build --release and
cargo test --workspace pass with 0 failures.
macOS path untested — preserves accelerate via cfg(target_os)
and requires validation on Apple hardware.
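The scalar fallback's arithmetic can be sketched in Rust (the actual fallback lives in C in larql-compute/csrc/q4_dot.c). This assumes a GGML-style Q4_0 layout — 32 weights per block, one scale, 16 bytes of packed nibbles where the low nibble holds element i and the high nibble element i + 16, each biased by 8; the function name and signature here are illustrative, not the C API:

```rust
// Scalar (non-SIMD) Q4_0 dot product for one block, assuming
// GGML-style Q4_0 packing: weight = scale * (nibble - 8).
fn q4_0_dot_scalar(scale: f32, nibbles: &[u8; 16], x: &[f32; 32]) -> f32 {
    let mut acc = 0.0f32;
    for (i, &byte) in nibbles.iter().enumerate() {
        let lo = (byte & 0x0f) as i32 - 8; // element i
        let hi = (byte >> 4) as i32 - 8;   // element i + 16
        acc += lo as f32 * x[i] + hi as f32 * x[i + 16];
    }
    scale * acc
}

fn main() {
    // Nibbles of 0x9 decode to +1 everywhere, so against all-ones
    // input the block dot is scale * 32.
    let x = [1.0f32; 32];
    let d = q4_0_dot_scalar(0.5, &[0x99u8; 16], &x);
    assert_eq!(d, 16.0);
    println!("{d}");
}
```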
feat(gemma4): Add Gemma 4 GGUF support + fix column-major loading and Q4_K dequantization
fix: non-ARM support — Q4 scalar fallback
Brings in Gemma 4 GGUF support, column-major fix, Q4_K dequant fix
(chrishayuk#24), non-ARM Q4 scalar fallback (chrishayuk#21), plus cherry-picked regression
tests for both.

Conflict in crates/larql-vindex/src/extract/build.rs resolved: kept
arch-b's self.down_top_k refactor while adopting main's NaN-safe
.unwrap_or(Ordering::Equal) in the score comparators.
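The NaN-safe comparator pattern kept from main looks roughly like this. A minimal sketch — `rank_desc` and the `(index, score)` shape are hypothetical; only the `.unwrap_or(Ordering::Equal)` idiom comes from the resolution above:

```rust
use std::cmp::Ordering;

// Descending sort of (index, score) candidates. partial_cmp returns
// None when either score is NaN; mapping that to Equal keeps sort_by
// from panicking, where a bare .unwrap() would.
fn rank_desc(mut scores: Vec<(usize, f32)>) -> Vec<(usize, f32)> {
    scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scores
}

fn main() {
    // Finite scores sort descending.
    let ranked = rank_desc(vec![(0, 0.2), (1, 0.9), (2, 0.5)]);
    assert_eq!(ranked[0].0, 1);
    // A NaN score no longer panics; treated as equal to its
    // neighbors, its final position is simply unspecified.
    let with_nan = rank_desc(vec![(0, f32::NAN), (1, 1.0)]);
    assert_eq!(with_nan.len(), 2);
    println!("top = {}", ranked[0].0);
}
```

Note the trade-off: `Ordering::Equal` makes the comparator non-total in the presence of NaN, so NaN entries land in unspecified positions, but the sort itself stays panic-free.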
chrishayuk and others added 28 commits April 18, 2026 17:46
Brings in 31 upstream commits landing Architecture B:
  - GPU-graph + shader-based layer execution (working shaders → distributed grid)
  - Gemma 4 MoE architecture support
  - Binary vindex format + improved publish pull
  - LM head / prefill / decode decoupling improvements
  - Residual-stream capture + benchmarks
  - Remote FFN walk backend (ffn/remote.rs)
  - Renamed patch/core.rs → patch/overlay.rs with apply logic split into
    patch/overlay_apply.rs
  - CLI restructure: primary verbs (run/chat/bench/pull/link/list/show/
    slice/publish/rm/extract) moved to top level; research tools (dev *)
    kept as DevCommand subcommand

Conflict resolutions:
  - crates/larql-cli/src/main.rs
      Main dispatch now only handles top-level Commands variants.
      Our RFC-0001 Crown/Edit/ApplyPatch/Memit are DevCommand variants and
      dispatch through run_dev. Re-added them there.
  - crates/larql-vindex/src/patch/overlay.rs
      Upstream moved apply_patch + rebuild_overrides into
      patch/overlay_apply.rs, which already carries our InsertKnn/DeleteKnn
      handlers line-for-line. Deleted the duplicated HEAD block.
  - crates/larql-vindex/src/patch/overlay_apply.rs (not a conflict, but
    manually patched) Preserved our base.down_overrides/up_overrides clear
    in rebuild_overrides so Phase-1 unlearning revert doesn't leak.
  - crates/larql-inference/src/ffn/mod.rs
      Combined our ablating/injecting additions with upstream's remote
      module. Dropped stale `pub mod experimental` (file never existed on
      our main — pre-existing broken reference).
  - crates/larql-inference/src/lib.rs
      Re-exported both our HighwayFfn/LastPositionAblatingFfn/
      LastPositionInjectingFfn and upstream's RemoteFfn* types.
  - crates/larql-models/src/detect.rs
      Combined our use_double_wide_mlp field with upstream's
      enable_moe_block/top_k_experts/moe_intermediate_size.
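The detect.rs merge above is a pure field union. A hypothetical sketch — the four field names come from the PR, but the struct name, types, and defaults are illustrative assumptions:

```rust
// Sketch of the combined model-detection config: our fork's flag
// plus upstream's Gemma 4 MoE fields, side by side.
#[derive(Debug, Default)]
struct ArchConfig {
    // ours (Divinci-AI fork)
    use_double_wide_mlp: bool,
    // upstream (Architecture B / Gemma 4 MoE)
    enable_moe_block: bool,
    top_k_experts: usize,
    moe_intermediate_size: usize,
}

fn main() {
    // A Gemma-4-style MoE detection would set the upstream fields
    // while leaving our double-wide-MLP flag independent.
    let cfg = ArchConfig {
        enable_moe_block: true,
        top_k_experts: 2,
        moe_intermediate_size: 4096,
        ..Default::default()
    };
    assert!(cfg.enable_moe_block && !cfg.use_double_wide_mlp);
    println!("{cfg:?}");
}
```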

Validation: `cargo check --workspace` clean; 166 unit tests pass across
larql-vindex, larql-models, larql-inference; CLI --help shows primary
verbs + dev subcommands including crown/edit/apply-patch/memit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikeumus mikeumus merged commit bdf7e88 into main Apr 22, 2026
0 of 2 checks passed
@mikeumus mikeumus deleted the sync-upstream-architecture-b branch April 22, 2026 22:27
