
Sync with upstream Architecture B (chrishayuk/larql#30)#13

Merged
mikeumus merged 61 commits into main from sync-upstream-architecture-b
Apr 22, 2026

Conversation

@mikeumus

Summary

Merges 31 upstream commits from chrishayuk/larql PR chrishayuk#30 "Architecture B" into our Divinci-AI fork.

What Architecture B brings

  • GPU-graph + shader-based layer execution (distributed grid support)
  • Gemma 4 MoE architecture
  • Binary vindex format + improved publish pull
  • LM head / prefill / decode decoupling
  • Residual-stream capture + benchmarks
  • Remote FFN walk backend (ffn/remote.rs)
  • patch/core.rs → patch/overlay.rs rename, with apply logic split into new patch/overlay_apply.rs
  • CLI restructure: primary verbs (run/chat/bench/pull/link/list/show/slice/publish/rm/extract) promoted to top-level; research tools stay under larql dev *
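The two-level verb split above can be pictured as a two-stage dispatch. A minimal sketch with plain enums — `Commands`, `DevCommand`, and `run_dev` mirror the names in this PR, while the variants shown, return values, and `dispatch` helper are illustrative stand-ins for the real clap-based CLI:

```rust
// Sketch of the restructured CLI dispatch: primary verbs are top-level
// Commands variants; research tools hang off a single Dev subcommand.
enum Commands {
    Run,              // primary verbs promoted to top level...
    Chat,
    Dev(DevCommand),  // ...research tools stay under `larql dev *`
}

enum DevCommand {
    Crown,            // RFC-0001 verbs are DevCommand variants,
    Edit,             // not Commands, so they dispatch via run_dev
    ApplyPatch,
    Memit,
}

fn dispatch(cmd: Commands) -> &'static str {
    match cmd {
        Commands::Run => "run",
        Commands::Chat => "chat",
        Commands::Dev(d) => run_dev(d), // dev verbs route through run_dev
    }
}

fn run_dev(cmd: DevCommand) -> &'static str {
    match cmd {
        DevCommand::Crown => "dev crown",
        DevCommand::Edit => "dev edit",
        DevCommand::ApplyPatch => "dev apply-patch",
        DevCommand::Memit => "dev memit",
    }
}

fn main() {
    assert_eq!(dispatch(Commands::Dev(DevCommand::Memit)), "dev memit");
    println!("dispatch ok");
}
```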

Conflict resolutions

  • crates/larql-cli/src/main.rs: Our RFC-0001 Crown/Edit/ApplyPatch/Memit are DevCommand variants, not Commands. Moved their dispatch into run_dev.
  • crates/larql-vindex/src/patch/overlay.rs: Upstream moved apply_patch/rebuild_overrides to the new overlay_apply.rs, which already has our InsertKnn/DeleteKnn handlers line-for-line. Deleted the duplicate HEAD block.
  • crates/larql-vindex/src/patch/overlay_apply.rs: (Not a conflict; manual patch.) Preserved our base.down_overrides / base.up_overrides clear in rebuild_overrides so the Phase-1 unlearning revert doesn't leak gate/down vectors across patches.
  • crates/larql-inference/src/ffn/mod.rs: Combined our ablating/injecting modules with upstream's remote. Dropped the stale pub mod experimental (the file never existed on our main — a pre-existing broken reference).
  • crates/larql-inference/src/lib.rs: Re-exported both sides: our HighwayFfn/LastPositionAblatingFfn/LastPositionInjectingFfn and upstream's RemoteFfn* types.
  • crates/larql-models/src/detect.rs: Combined our use_double_wide_mlp field with upstream's enable_moe_block/top_k_experts/moe_intermediate_size.
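The overlay_apply.rs guard above is easiest to see in miniature. This is a hypothetical sketch of why the clear matters — only the `down_overrides`/`up_overrides` names and `rebuild_overrides` come from the PR; the `Overlay`/`PatchOp` types, fields, and semantics are stand-ins:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the overlay state.
#[derive(Default)]
struct Overlay {
    down_overrides: HashMap<usize, Vec<f32>>,
    up_overrides: HashMap<usize, Vec<f32>>,
}

struct PatchOp {
    row: usize,
    down: Vec<f32>,
}

// Rebuild overrides from the currently active patches only. Without
// the two clear() calls, a reverted patch's gate/down vectors would
// survive into the next rebuild — the leak this merge preserves the
// fix for.
fn rebuild_overrides(base: &mut Overlay, active: &[PatchOp]) {
    base.down_overrides.clear();
    base.up_overrides.clear();
    for op in active {
        base.down_overrides.insert(op.row, op.down.clone());
    }
}

fn main() {
    let mut base = Overlay::default();
    let p1 = PatchOp { row: 7, down: vec![1.0] };
    rebuild_overrides(&mut base, &[p1]); // apply a patch
    rebuild_overrides(&mut base, &[]);   // revert it
    assert!(base.down_overrides.is_empty()); // nothing leaks across patches
    println!("no leak");
}
```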

Validation

  • cargo check --workspace → clean (warnings only, all pre-existing)
  • 166 unit tests pass: 102 in larql-vindex + larql-models, 64 in larql-inference
  • Smoke-test: larql --help shows primary verbs + dev subcommand; larql dev --help shows crown/edit/apply-patch/memit all registered

Test plan

  • Spot-check larql dev crown --help / dev edit --help / dev apply-patch --help / dev memit --help args still render correctly
  • Run a Gate-3 suppression test against a real vindex (Paris→capital DELETE, verify describe rank changes) — confirms Architecture B's new apply/revert path still honors our bug fix
  • Build the larql-service Docker image + deploy to staging — verify no compile regressions under release mode
  • Re-run the isolation harness (larql-isolation-harness) to confirm the session-scoped patch behavior survived the merge

🤖 Generated with Claude Code

chrishayuk and others added 30 commits April 15, 2026 00:56
- Implement Q4 scalar fallback for non-ARM targets:
  - Move decode_f16() before #if aarch64 (shared by both paths)
  - Replace empty stub functions with correct scalar implementations
  - q4_0_matvec_c and q4_0_vecmat_c now produce correct results on x86_64
  Affects: larql-compute/csrc/q4_dot.c

Tested on Ubuntu 24 (WSL2, x86_64): cargo build --release and
cargo test --workspace pass with 0 failures.
macOS path untested — preserves accelerate via cfg(target_os)
and requires validation on Apple hardware.
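The scalar fallback's arithmetic can be sketched in Rust (the actual fallback lives in C in larql-compute/csrc/q4_dot.c). This assumes a GGML-style Q4_0 layout — 32 weights per block, one scale, 16 bytes of packed nibbles where the low nibble holds element i and the high nibble element i + 16, each biased by 8; the function name and signature here are illustrative, not the C API:

```rust
// Scalar (non-SIMD) Q4_0 dot product for one block, assuming
// GGML-style Q4_0 packing: weight = scale * (nibble - 8).
fn q4_0_dot_scalar(scale: f32, nibbles: &[u8; 16], x: &[f32; 32]) -> f32 {
    let mut acc = 0.0f32;
    for (i, &byte) in nibbles.iter().enumerate() {
        let lo = (byte & 0x0f) as i32 - 8; // element i
        let hi = (byte >> 4) as i32 - 8;   // element i + 16
        acc += lo as f32 * x[i] + hi as f32 * x[i + 16];
    }
    scale * acc
}

fn main() {
    // Nibbles of 0x9 decode to +1 everywhere, so against all-ones
    // input the block dot is scale * 32.
    let x = [1.0f32; 32];
    let d = q4_0_dot_scalar(0.5, &[0x99u8; 16], &x);
    assert_eq!(d, 16.0);
    println!("{d}");
}
```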
feat(gemma4): Add Gemma 4 GGUF support + fix column-major loading and Q4_K dequantization
fix: non-ARM support — Q4 scalar fallback
Brings in Gemma 4 GGUF support, column-major fix, Q4_K dequant fix
(chrishayuk#24), non-ARM Q4 scalar fallback (chrishayuk#21), plus cherry-picked regression
tests for both.

Conflict in crates/larql-vindex/src/extract/build.rs resolved: kept
arch-b's self.down_top_k refactor while adopting main's NaN-safe
.unwrap_or(Ordering::Equal) in the score comparators.
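The NaN-safe comparator pattern kept from main looks roughly like this. A minimal sketch — `rank_desc` and the `(index, score)` shape are hypothetical; only the `.unwrap_or(Ordering::Equal)` idiom comes from the resolution above:

```rust
use std::cmp::Ordering;

// Descending sort of (index, score) candidates. partial_cmp returns
// None when either score is NaN; mapping that to Equal keeps sort_by
// from panicking, where a bare .unwrap() would.
fn rank_desc(mut scores: Vec<(usize, f32)>) -> Vec<(usize, f32)> {
    scores.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scores
}

fn main() {
    // Finite scores sort descending.
    let ranked = rank_desc(vec![(0, 0.2), (1, 0.9), (2, 0.5)]);
    assert_eq!(ranked[0].0, 1);
    // A NaN score no longer panics; treated as equal to its
    // neighbors, its final position is simply unspecified.
    let with_nan = rank_desc(vec![(0, f32::NAN), (1, 1.0)]);
    assert_eq!(with_nan.len(), 2);
    println!("top = {}", ranked[0].0);
}
```

Note the trade-off: `Ordering::Equal` makes the comparator non-total in the presence of NaN, so NaN entries land in unspecified positions, but the sort itself stays panic-free.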
chrishayuk and others added 28 commits April 18, 2026 17:46
Brings in 31 upstream commits landing Architecture B:
  - GPU-graph + shader-based layer execution (working shaders → distributed grid)
  - Gemma 4 MoE architecture support
  - Binary vindex format + improved publish pull
  - LM head / prefill / decode decoupling improvements
  - Residual-stream capture + benchmarks
  - Remote FFN walk backend (ffn/remote.rs)
  - Renamed patch/core.rs → patch/overlay.rs with apply logic split into
    patch/overlay_apply.rs
  - CLI restructure: primary verbs (run/chat/bench/pull/link/list/show/
    slice/publish/rm/extract) moved to top level; research tools (dev *)
    kept as DevCommand subcommand

Conflict resolutions:
  - crates/larql-cli/src/main.rs
      Main dispatch now only handles top-level Commands variants.
      Our RFC-0001 Crown/Edit/ApplyPatch/Memit are DevCommand variants and
      dispatch through run_dev. Re-added them there.
  - crates/larql-vindex/src/patch/overlay.rs
      Upstream moved apply_patch + rebuild_overrides into
      patch/overlay_apply.rs, which already carries our InsertKnn/DeleteKnn
      handlers line-for-line. Deleted the duplicated HEAD block.
  - crates/larql-vindex/src/patch/overlay_apply.rs (not a conflict, but
    manually patched) Preserved our base.down_overrides/up_overrides clear
    in rebuild_overrides so Phase-1 unlearning revert doesn't leak.
  - crates/larql-inference/src/ffn/mod.rs
      Combined our ablating/injecting additions with upstream's remote
      module. Dropped stale `pub mod experimental` (file never existed on
      our main — pre-existing broken reference).
  - crates/larql-inference/src/lib.rs
      Re-exported both our HighwayFfn/LastPositionAblatingFfn/
      LastPositionInjectingFfn and upstream's RemoteFfn* types.
  - crates/larql-models/src/detect.rs
      Combined our use_double_wide_mlp field with upstream's
      enable_moe_block/top_k_experts/moe_intermediate_size.
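The detect.rs merge above is a pure field union. A hypothetical sketch — the four field names come from the PR, but the struct name, types, and defaults are illustrative assumptions:

```rust
// Sketch of the combined model-detection config: our fork's flag
// plus upstream's Gemma 4 MoE fields, side by side.
#[derive(Debug, Default)]
struct ArchConfig {
    // ours (Divinci-AI fork)
    use_double_wide_mlp: bool,
    // upstream (Architecture B / Gemma 4 MoE)
    enable_moe_block: bool,
    top_k_experts: usize,
    moe_intermediate_size: usize,
}

fn main() {
    // A Gemma-4-style MoE detection would set the upstream fields
    // while leaving our double-wide-MLP flag independent.
    let cfg = ArchConfig {
        enable_moe_block: true,
        top_k_experts: 2,
        moe_intermediate_size: 4096,
        ..Default::default()
    };
    assert!(cfg.enable_moe_block && !cfg.use_double_wide_mlp);
    println!("{cfg:?}");
}
```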

Validation: `cargo check --workspace` clean; 166 unit tests pass across
larql-vindex, larql-models, larql-inference; CLI --help shows primary
verbs + dev subcommands including crown/edit/apply-patch/memit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mikeumus mikeumus merged commit bdf7e88 into main Apr 22, 2026
0 of 2 checks passed
@mikeumus mikeumus deleted the sync-upstream-architecture-b branch April 22, 2026 22:27
