Sync with upstream Architecture B (chrishayuk/larql#30) #13
Merged
Implement Q4 scalar fallback for non-ARM targets:
- Move decode_f16() before #if aarch64 (shared by both paths)
- Replace empty stub functions with correct scalar implementations
- q4_0_matvec_c and q4_0_vecmat_c now produce correct results on x86_64

Affects: larql-compute/csrc/q4_dot.c

Tested on Ubuntu 24 (WSL2, x86_64): `cargo build --release` and `cargo test --workspace` pass with 0 failures. macOS path untested; preserves Accelerate via cfg(target_os) and requires validation on Apple hardware.
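For readers without the C source at hand, the scalar path described above can be sketched in Rust. This is an illustrative rendition of the GGML-style Q4_0 block layout (one f16 scale plus 32 packed 4-bit quants per block), not the actual code in q4_dot.c; `decode_f16` here is a from-scratch binary16-to-f32 conversion, and the struct and function names are hypothetical.

```rust
/// Convert raw IEEE 754 binary16 bits to f32 (handles zero, subnormals,
/// normals, and inf/NaN) without relying on a platform half type.
fn decode_f16(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let f32_bits = if exp == 0 {
        if frac == 0 {
            sign << 31 // signed zero
        } else {
            // subnormal half: renormalize the fraction into an f32 exponent
            let mut e: i32 = 127 - 15 + 1;
            let mut f = frac;
            while f & 0x400 == 0 {
                f <<= 1;
                e -= 1;
            }
            f &= 0x3ff;
            (sign << 31) | ((e as u32) << 23) | (f << 13)
        }
    } else if exp == 0x1f {
        (sign << 31) | (0xff << 23) | (frac << 13) // inf / NaN
    } else {
        (sign << 31) | ((exp + 127 - 15) << 23) | (frac << 13)
    };
    f32::from_bits(f32_bits)
}

const QK4_0: usize = 32;

/// One Q4_0 block: an f16 scale followed by 16 bytes holding 32 4-bit
/// quants (low nibble = weight j, high nibble = weight j + 16).
struct BlockQ4_0 {
    d: u16,        // scale as raw f16 bits
    qs: [u8; 16],  // packed 4-bit quants
}

/// Scalar dot product of one quantized row against a dense vector:
/// each quant is dequantized as (q - 8) * d before multiplying.
fn q4_0_dot_scalar(blocks: &[BlockQ4_0], x: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    for (bi, b) in blocks.iter().enumerate() {
        let d = decode_f16(b.d);
        for j in 0..16 {
            let lo = (b.qs[j] & 0x0f) as i32 - 8;
            let hi = (b.qs[j] >> 4) as i32 - 8;
            sum += d * lo as f32 * x[bi * QK4_0 + j];
            sum += d * hi as f32 * x[bi * QK4_0 + j + 16];
        }
    }
    sum
}
```

A full matvec would simply call this per output row; the NEON path behind `#if aarch64` would compute the same quantity with vector intrinsics.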
feat(gemma4): Add Gemma 4 GGUF support + fix column-major loading and Q4_K dequantization
fix: non-ARM support — Q4 scalar fallback
Brings in Gemma 4 GGUF support, column-major fix, Q4_K dequant fix (chrishayuk#24), non-ARM Q4 scalar fallback (chrishayuk#21), plus cherry-picked regression tests for both. Conflict in crates/larql-vindex/src/extract/build.rs resolved: kept arch-b's self.down_top_k refactor while adopting main's NaN-safe .unwrap_or(Ordering::Equal) in the score comparators.
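The NaN-safe comparator idiom adopted in that conflict resolution can be shown in isolation. The function name and score shape below are hypothetical; only the `.unwrap_or(Ordering::Equal)` pattern is from the merge.

```rust
use std::cmp::Ordering;

/// Return the indices of the k largest scores, descending.
/// f32::partial_cmp returns None when either side is NaN;
/// unwrap_or(Ordering::Equal) keeps the sort from panicking on NaN.
fn top_k_indices(scores: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| {
        scores[b]
            .partial_cmp(&scores[a])
            .unwrap_or(Ordering::Equal)
    });
    idx.truncate(k);
    idx
}
```

Note the trade-off: NaN entries are treated as equal to everything rather than rejected, so they may land anywhere in the order, but the sort never panics.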
Architecture B
Brings in 31 upstream commits landing Architecture B:
- GPU-graph + shader-based layer execution (working shaders → distributed grid)
- Gemma 4 MoE architecture support
- Binary vindex format + improved publish pull
- LM head / prefill / decode decoupling improvements
- Residual-stream capture + benchmarks
- Remote FFN walk backend (ffn/remote.rs)
- Renamed patch/core.rs → patch/overlay.rs with apply logic split into
patch/overlay_apply.rs
- CLI restructure: primary verbs (run/chat/bench/pull/link/list/show/
slice/publish/rm/extract) moved to top level; research tools (dev *)
kept as DevCommand subcommand
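One way to picture the remote FFN walk backend listed above is a shared trait boundary between local and remote execution. This is a sketch under assumed names; the real interface in ffn/remote.rs is not reproduced here, and both bodies are placeholders.

```rust
/// Hypothetical backend boundary: local and remote FFN walks behind one
/// trait, so the inference loop stays agnostic to where the FFN runs.
trait FfnBackend {
    fn walk(&self, layer: usize, hidden: &[f32]) -> Vec<f32>;
}

struct LocalFfn;

impl FfnBackend for LocalFfn {
    fn walk(&self, _layer: usize, hidden: &[f32]) -> Vec<f32> {
        // placeholder: a real backend would run the layer's FFN here
        hidden.iter().map(|v| v * 2.0).collect()
    }
}

struct RemoteFfn {
    endpoint: String,
}

impl FfnBackend for RemoteFfn {
    fn walk(&self, _layer: usize, hidden: &[f32]) -> Vec<f32> {
        // placeholder: a real backend would ship `hidden` to `endpoint`
        // and return the response; here we just echo the input
        let _ = &self.endpoint;
        hidden.to_vec()
    }
}
```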
Conflict resolutions:
- crates/larql-cli/src/main.rs
Main dispatch now only handles top-level Commands variants.
Our RFC-0001 Crown/Edit/ApplyPatch/Memit are DevCommand variants and
dispatch through run_dev. Re-added them there.
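The resulting dispatch shape can be sketched without the CLI framework. Variant and function names below mirror the notes above; everything else (the string returns, the omitted verbs) is illustrative.

```rust
/// Top-level verbs only; research tools live under one Dev variant.
enum Commands {
    Run,
    Chat,
    // ... other top-level verbs elided ...
    Dev(DevCommand),
}

/// RFC-0001 research tools, reachable as `dev <verb>`.
enum DevCommand {
    Crown,
    Edit,
    ApplyPatch,
    Memit,
}

/// Main dispatch handles only Commands variants; Dev is forwarded whole.
fn dispatch(cmd: Commands) -> &'static str {
    match cmd {
        Commands::Run => "run",
        Commands::Chat => "chat",
        Commands::Dev(dev) => run_dev(dev),
    }
}

/// Secondary dispatch for the research tools.
fn run_dev(cmd: DevCommand) -> &'static str {
    match cmd {
        DevCommand::Crown => "dev crown",
        DevCommand::Edit => "dev edit",
        DevCommand::ApplyPatch => "dev apply-patch",
        DevCommand::Memit => "dev memit",
    }
}
```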
- crates/larql-vindex/src/patch/overlay.rs
Upstream moved apply_patch + rebuild_overrides into
patch/overlay_apply.rs, which already carries our InsertKnn/DeleteKnn
handlers line-for-line. Deleted the duplicated HEAD block.
- crates/larql-vindex/src/patch/overlay_apply.rs (not a conflict, but
manually patched) Preserved our base.down_overrides/up_overrides clear
in rebuild_overrides so Phase-1 unlearning revert doesn't leak.
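The invariant being preserved here can be illustrated with a deliberately simplified model of the overlay state. Struct shape and vector contents are hypothetical; only the clear-before-rebuild pattern is the point.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified overlay state: per-row override vectors.
#[derive(Default)]
struct OverlayBase {
    down_overrides: HashMap<usize, Vec<f32>>,
    up_overrides: HashMap<usize, Vec<f32>>,
}

/// Rebuild overrides from the live patch list. Rebuilding must start
/// from a clean slate: without the two clears, entries installed by a
/// since-reverted patch would survive and leak into later patches.
fn rebuild_overrides(base: &mut OverlayBase, patches: &[(usize, Vec<f32>)]) {
    base.down_overrides.clear();
    base.up_overrides.clear();
    for (row, vec) in patches {
        base.down_overrides.insert(*row, vec.clone());
    }
}
```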
- crates/larql-inference/src/ffn/mod.rs
Combined our ablating/injecting additions with upstream's remote
module. Dropped stale `pub mod experimental` (file never existed on
our main — pre-existing broken reference).
- crates/larql-inference/src/lib.rs
Re-exported both our HighwayFfn/LastPositionAblatingFfn/
LastPositionInjectingFfn and upstream's RemoteFfn* types.
- crates/larql-models/src/detect.rs
Combined our use_double_wide_mlp field with upstream's
enable_moe_block/top_k_experts/moe_intermediate_size.
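The merged detection struct can be pictured roughly as follows. The field names come from the merge notes above; the struct name, derives, and defaults are illustrative only.

```rust
/// Hypothetical shape of the merged detection config: our field
/// alongside upstream's Gemma 4 MoE fields.
#[derive(Debug, Default)]
struct DetectedArch {
    use_double_wide_mlp: bool,    // ours
    enable_moe_block: bool,       // upstream (Gemma 4 MoE)
    top_k_experts: usize,         // upstream
    moe_intermediate_size: usize, // upstream
}
```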
Validation: `cargo check --workspace` clean; 166 unit tests pass across
larql-vindex, larql-models, larql-inference; CLI --help shows primary
verbs + dev subcommands including crown/edit/apply-patch/memit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Merges 31 upstream commits from chrishayuk/larql PR chrishayuk#30 "Architecture B" into our Divinci-AI fork.
What Architecture B brings
- GPU-graph + shader-based layer execution (working shaders → distributed grid)
- Gemma 4 MoE architecture support
- Binary vindex format + improved publish pull
- LM head / prefill / decode decoupling improvements
- Residual-stream capture + benchmarks
- Remote FFN walk backend (`ffn/remote.rs`)
- `patch/core.rs` → `patch/overlay.rs` rename, apply logic split into new `patch/overlay_apply.rs`
- CLI verbs (`run/chat/bench/pull/link/list/show/slice/publish/rm/extract`) promoted to top level; research tools stay under `larql dev *`

Conflict resolutions
- `crates/larql-cli/src/main.rs`: `Crown/Edit/ApplyPatch/Memit` are `DevCommand` variants, not `Commands`. Moved their dispatch into `run_dev`.
- `crates/larql-vindex/src/patch/overlay.rs`: upstream moved `apply_patch`/`rebuild_overrides` to new `overlay_apply.rs`, which already has our `InsertKnn/DeleteKnn` handlers line-for-line. Deleted the duplicate HEAD block.
- `crates/larql-vindex/src/patch/overlay_apply.rs`: preserved our `base.down_overrides`/`base.up_overrides` clear in `rebuild_overrides` so Phase-1 unlearning revert doesn't leak gate/down vectors across patches.
- `crates/larql-inference/src/ffn/mod.rs`: combined our `ablating`/`injecting` modules with upstream's `remote`. Dropped stale `pub mod experimental` (file never existed in our main; pre-existing broken reference).
- `crates/larql-inference/src/lib.rs`: re-exported both our `HighwayFfn/LastPositionAblatingFfn/LastPositionInjectingFfn` and upstream's `RemoteFfn*` types.
- `crates/larql-models/src/detect.rs`: combined our `use_double_wide_mlp` field with upstream's `enable_moe_block/top_k_experts/moe_intermediate_size`.

Validation
- `cargo check --workspace` → clean (warnings only, all pre-existing)
- `larql --help` shows primary verbs + `dev` subcommand; `larql dev --help` shows `crown/edit/apply-patch/memit` all registered

Test plan
- `larql dev crown --help` / `dev edit --help` / `dev apply-patch --help` / `dev memit --help` args still render correctly
- Build the `larql-service` Docker image + deploy to staging; verify no compile regressions under release mode
- Run the isolation harness (`larql-isolation-harness`) to confirm the session-scoped patch behavior survived the merge

🤖 Generated with Claude Code