Fix/gguf shape #20
Closed
codeninja wants to merge 7 commits into chrishayuk:main from
Conversation
The non-aarch64 branch of csrc/q4_dot.c previously defined q4_0_matvec_c and q4_0_vecmat_c as empty stubs, silently leaving output buffers zero-initialized. Three larql-compute tests that asserted nonzero output failed on every non-ARM host.

Port the ARM NEON logic to straight scalar C for the x86 branch, hoist decode_f16 to file scope so both branches share it, and keep the ARM/NEON path untouched. Add a regression test that compares kernel output against a Rust dequantize-and-matmul reference (max relative error ~1.2e-7, cosine similarity 1.0).

Scalar only — AVX2 can come later, and the new test will catch numerical drift during that port.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
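The dequantize-and-matmul reference mentioned above can be sketched in Rust. This is a minimal illustration, not the PR's actual test code: it assumes ggml's standard Q4_0 block layout (a half-precision scale followed by 16 bytes packing 32 4-bit quants, low nibbles first, value = (q - 8) * d), and the function names `decode_f16`, `dequantize_q4_0`, and `matvec` are chosen here for clarity.

```rust
/// Decode an IEEE 754 half-precision value stored as a little-endian u16.
/// (Minimal software decode; real code might use hardware f16 support.)
fn decode_f16(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let mant = (bits & 0x3ff) as f32;
    match exp {
        0 => sign * mant * 2f32.powi(-24), // subnormal: mant * 2^-10 * 2^-14
        0x1f => {
            if mant == 0.0 { sign * f32::INFINITY } else { f32::NAN }
        }
        _ => sign * (1.0 + mant / 1024.0) * 2f32.powi(exp - 15),
    }
}

const QK4_0: usize = 32; // values per Q4_0 block

/// Dequantize `n` Q4_0 values from raw block bytes into f32.
/// Each 18-byte block: 2-byte f16 scale `d`, then 16 bytes of packed quants;
/// the 16 low nibbles map to values 0..16, the high nibbles to 16..32.
fn dequantize_q4_0(blocks: &[u8], n: usize) -> Vec<f32> {
    let block_size = 2 + QK4_0 / 2;
    let mut out = vec![0.0f32; n];
    for (b, chunk) in blocks.chunks_exact(block_size).enumerate() {
        let d = decode_f16(u16::from_le_bytes([chunk[0], chunk[1]]));
        for j in 0..QK4_0 / 2 {
            let q = chunk[2 + j];
            out[b * QK4_0 + j] = ((q & 0x0f) as f32 - 8.0) * d;
            out[b * QK4_0 + j + QK4_0 / 2] = ((q >> 4) as f32 - 8.0) * d;
        }
    }
    out
}

/// Reference matvec: y = W x, with W stored row-major as rows x cols.
fn matvec(w: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    (0..rows)
        .map(|r| (0..cols).map(|c| w[r * cols + c] * x[c]).sum())
        .collect()
}

fn main() {
    // One block with d = 1.0 (f16 0x3C00): every byte 0x89 gives
    // low nibble 9 -> +1.0 and high nibble 8 -> 0.0 after dequant.
    let mut block = vec![0x00u8, 0x3C];
    block.extend(std::iter::repeat(0x89u8).take(16));
    let w = dequantize_q4_0(&block, 32);
    let y = matvec(&w, &vec![1.0f32; 32], 1, 32);
    assert_eq!(y[0], 16.0); // sixteen +1 entries, sixteen zeros
}
```

A scalar C port and a reference like this should agree to within f32 rounding, which is why a max relative error around 1e-7 is the expected ballpark rather than exact equality.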
The `cc` crate only emits `cargo:rerun-if-changed` directives for source files explicitly passed to `.file()`. That's a narrow fingerprint:

1. A new `.c` or `.h` file added under `csrc/` won't trigger a rebuild until someone remembers to wire it into `build.rs`.
2. Stale `.o` artifacts from a prior branch can silently mask source changes in the same `target/` directory after a merge or `git pull` — the build script isn't re-run because cargo sees no tracked input change.

Widen the fingerprint to the `csrc/` directory and `build.rs` itself so any change in those paths re-invokes the build script and the `cc` crate re-checks its inputs. Belt-and-suspenders over the per-`.file()` tracking — cheap, safe, closes the most common hole.

Fixes chrishayuk#14

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: chrishayuk <chrishayuk@users.noreply.github.com>
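The widened fingerprint boils down to two extra directives in `build.rs`. Cargo's `rerun-if-changed` accepts a directory path, in which case it tracks the directory's contents recursively. A minimal sketch (the surrounding `cc::Build` setup is omitted; only the directive emission is shown, with the paths from the description above):

```rust
// Hedged sketch: emit the widened rebuild fingerprint described in the PR.
// `emit_rerun_directives` is a hypothetical helper name, used here so the
// directive list is easy to inspect; a real build.rs would just println! them.

fn emit_rerun_directives() -> Vec<String> {
    vec![
        // A directory path makes cargo watch everything under csrc/
        // recursively, so new .c/.h files trigger a rebuild even before
        // they are wired into cc::Build::file().
        "cargo:rerun-if-changed=csrc".to_string(),
        // Changes to the build script itself also re-run it.
        "cargo:rerun-if-changed=build.rs".to_string(),
    ]
}

fn main() {
    for directive in emit_rerun_directives() {
        println!("{directive}");
    }
}
```

Note that emitting any explicit `rerun-if-changed` directive replaces cargo's default "rerun on any package file change" behavior, which is why covering `build.rs` explicitly matters.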
chrishayuk (Owner): sweeet. this is super useful....
chrishayuk added a commit that referenced this pull request on Apr 17, 2026
mikeumus added a commit to Divinci-AI/larql that referenced this pull request on Apr 22, 2026
* working on arch b, unified insert
* working on memit with vindex, and templates
* memit style
* working on latest memit
* working on wasm
* working on wasm
* cleaned up vindex and larql
* fix: Linux support — conditional BLAS and Q4 scalar fallback
  - Implement Q4 scalar fallback for non-ARM targets:
    - Move decode_f16() before #if aarch64 (shared by both paths)
    - Replace empty stub functions with correct scalar implementations
    - q4_0_matvec_c and q4_0_vecmat_c now produce correct results on x86_64
  - Affects: larql-compute/csrc/q4_dot.c
  - Tested on Ubuntu 24 (WSL2, x86_64): cargo build --release and cargo test --workspace pass with 0 failures.
  - macOS path untested — preserves accelerate via cfg(target_os) and requires validation on Apple hardware.
* working on bounded compute script
* refactored lql
* improved refactor
* updated executor
* gemma 4
* working on compute
* improved for gemma 4
* test: cherry-pick GGUF shape + Q4 correctness tests from chrishayuk#20
* updated examples
* working through python parity
* working on q4k tidyup
* improving testing and quantization
* improving testing
* gemma 4 support
* improved clu
* autoregressive generation
* kv cache works
* working on shader pipeline
* working shaders
* working on shaders and graph
* moved to full graph
* working through ffn walk performance
* working version
* modularized shaders
* working on decoupling decode
* working on performance
* more performance improvements
* improving performance
* more performance improvements
* working on performance
* working on distributed grid
* working on grid
* improving docs and moe
* working on moe
* improved publish pull
* binary format
* working binary format and performance
* updated vindex server specs for binary
* improved lm_head
* improved prefill
* improved lm head
* gemma 4 vindex
* working on gemma 4 moe
* working on cleanup for merge
* fixed issue with select
* residual stream
* working on benchmarks

---------

Co-authored-by: chrishayuk <chrishayuk@googlemail.com>
Co-authored-by: Remi <remipetiot@hotmail.com>
Co-authored-by: chrishayuk <chrishayuk@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes #16.
Summary
`n0 = cols`, `n1 = rows`

Root cause

The Gemma 3 safetensors layout is correct. The panic comes from the GGUF loader transposing non-square tensors on load.

`build_vindex()` expects `down_proj` to be `[hidden, intermediate]` so it can score vocab logits with `(vocab, hidden) @ (hidden, chunk)`. The GGUF loader was building 2D arrays as `(dims[0], dims[1])`, but ggml stores them as `n0 = cols`, `n1 = rows`. That turns `down_proj` into `[intermediate, hidden]`, which matches the incompatible shapes seen in the panic.

Validation

`cargo test -p larql-models`

Note

`load_model_dir()` will auto-load a `.gguf` file if one is present in the input directory, so `safetensors-to-vindex` can hit this GGUF path even though the CLI message says "Loading safetensors".
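The `n0 = cols, n1 = rows` convention can be sketched as a tiny dim-mapping helper. This is an illustration of the ordering fix, not the actual larql loader code; `gguf_shape_2d` is a hypothetical name:

```rust
// Hedged sketch: ggml stores tensor dims fastest-varying first, so for a
// 2D tensor dims[0] is the column count (n0) and dims[1] the row count (n1).
// Building a row-major array as (dims[0], dims[1]) therefore transposes it.
// `gguf_shape_2d` is a hypothetical helper, not the real loader function.

/// Map ggml's (n0, n1) dims to a row-major (rows, cols) shape.
fn gguf_shape_2d(dims: &[u64]) -> (usize, usize) {
    let cols = dims[0] as usize; // n0: fastest-varying (column) dimension
    let rows = dims[1] as usize; // n1: row count
    (rows, cols)
}

fn main() {
    // A down_proj stored with n0 = intermediate, n1 = hidden:
    // the buggy (dims[0], dims[1]) read it as [intermediate, hidden];
    // mapping n1 to rows yields the expected [hidden, intermediate].
    let dims = [4096u64, 1024]; // hypothetical sizes for illustration
    assert_eq!(gguf_shape_2d(&dims), (1024, 4096));
}
```

For square tensors the two orderings coincide, which is why only non-square tensors (like `down_proj`) surfaced the transposition as a shape panic.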