
Fix/gguf shape#20

Closed
codeninja wants to merge 7 commits into chrishayuk:main from codeninja:fix/gguf-shape

Conversation

Contributor

@codeninja codeninja commented Apr 15, 2026

Fixes #16.

Summary

  • fix GGUF 2D tensor loading to interpret ggml dimensions as n0 = cols, n1 = rows
  • keep the embedding transpose only as a defensive fallback after normalizing GGUF tensors
  • add a regression test that writes a tiny GGUF file and verifies the loaded tensor shape

Root cause

The Gemma 3 safetensors layout is correct. The panic comes from the GGUF loader transposing non-square tensors on load.

build_vindex() expects down_proj to be [hidden, intermediate] so it can score vocab logits with (vocab, hidden) @ (hidden, chunk). The GGUF loader was building 2D arrays as (dims[0], dims[1]), but ggml stores them as n0 = cols, n1 = rows. That turns down_proj into [intermediate, hidden], which matches the incompatible shapes seen in the panic.
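The core of the fix can be sketched in a few lines (the helper name `shape_from_ggml_dims` is hypothetical; the real loader lives in larql-models): ggml orders a 2D tensor's dims fastest-varying first, so `dims[0]` is the column count and `dims[1]` the row count.

```rust
/// Hypothetical helper illustrating the fix: build the logical
/// (rows, cols) shape from a ggml 2D dims array, where
/// dims[0] = n0 = cols (fastest-varying) and dims[1] = n1 = rows.
fn shape_from_ggml_dims(dims: &[u64; 2]) -> (usize, usize) {
    (dims[1] as usize, dims[0] as usize) // (rows, cols) = (n1, n0)
}

fn main() {
    // A down_proj recorded by ggml as [n0 = intermediate, n1 = hidden]
    // must come out as [hidden, intermediate], not the transpose.
    let (hidden, intermediate) = (2048u64, 16384u64);
    assert_eq!(
        shape_from_ggml_dims(&[intermediate, hidden]),
        (hidden as usize, intermediate as usize)
    );
}
```

The buggy loader was effectively returning `(dims[0], dims[1])`, which is the transpose of this and produces the incompatible-shape panic.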

Validation

  • cargo test -p larql-models

Note

load_model_dir() will auto-load a .gguf file if one is present in the input directory, so safetensors-to-vindex can hit this GGUF path even though the CLI message says "Loading safetensors".
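A sketch of the dispatch the note describes (the probe body is an illustrative assumption, not the real `load_model_dir()`):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical probe mirroring the note: if the input directory
/// contains any .gguf file, the GGUF loader takes over, even though
/// the CLI message still says "Loading safetensors".
fn find_gguf(dir: &Path) -> Option<PathBuf> {
    fs::read_dir(dir)
        .ok()?
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .find(|p| p.extension().map_or(false, |ext| ext == "gguf"))
}

fn main() {
    let dir = std::env::temp_dir().join("gguf-probe-demo");
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("model.gguf"), b"GGUF").unwrap();
    assert!(find_gguf(&dir).is_some()); // the GGUF path wins
}
```

This is why safetensors-to-vindex can exercise the GGUF shape bug: the presence of a stray `.gguf` in the snapshot directory silently changes which loader runs.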

codeninja and others added 7 commits April 15, 2026 15:41
The non-aarch64 branch of csrc/q4_dot.c previously defined
q4_0_matvec_c and q4_0_vecmat_c as empty stubs, silently leaving
output buffers zero-initialized. Three larql-compute tests that
asserted nonzero output failed on every non-ARM host.

Port the ARM NEON logic to straight scalar C for the x86 branch,
hoist decode_f16 to file scope so both branches share it, and
keep the ARM/NEON path untouched. Add a regression test that
compares kernel output against a Rust dequantize-and-matmul
reference (max relative error ~1.2e-7, cosine similarity 1.0).

Scalar only — AVX2 can come later, and the new test will catch
numerical drift during that port.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
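The Rust dequantize-and-matmul reference mentioned above can be sketched roughly like this (an illustrative reconstruction, not the actual test code; real ggml Q4_0 stores the scale as f16, simplified here to f32 to stay stdlib-only):

```rust
/// One ggml Q4_0 block: 32 weights as a per-block scale plus 16 bytes
/// of 4-bit nibbles. Low nibble of byte j is element j, high nibble is
/// element j + 16, and both carry a bias of 8.
fn q4_0_dequant_block(d: f32, qs: &[u8; 16]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for j in 0..16 {
        out[j] = d * ((qs[j] & 0x0F) as i32 - 8) as f32;
        out[j + 16] = d * ((qs[j] >> 4) as i32 - 8) as f32;
    }
    out
}

/// Plain dot product used to compare a dequantized row against an
/// activation vector, matching what a scalar matvec kernel computes.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // Every byte 0x19: low nibble 9 -> +1.0, high nibble 1 -> -7.0.
    let weights = q4_0_dequant_block(1.0, &[0x19; 16]);
    let ones = [1.0f32; 32];
    assert_eq!(dot(&weights, &ones), -96.0); // 16*1 + 16*(-7)
}
```

A regression test then compares the C kernel's output against this kind of reference row by row, which is what catches a silently-zeroing stub.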
The `cc` crate only emits `cargo:rerun-if-changed` directives for source
files explicitly passed to `.file()`. That's a narrow fingerprint:

1. A new `.c` or `.h` file added under `csrc/` won't trigger a rebuild
   until someone remembers to wire it into `build.rs`.
2. Stale `.o` artifacts from a prior branch can silently mask source
   changes in the same `target/` directory after a merge or `git pull`
   — the build script isn't re-run because cargo sees no tracked
   input change.

Widen the fingerprint to the `csrc/` directory and `build.rs` itself
so any change in those paths re-invokes the build script and the `cc`
crate re-checks its inputs. Belt-and-suspenders over the per-`.file()`
tracking — cheap, safe, closes the most common hole.

Fixes chrishayuk#14

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: chrishayuk <chrishayuk@users.noreply.github.com>
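The widened fingerprint from the commit above amounts to two extra directives at the top of `build.rs` (a sketch; the actual `cc` invocation is elided since it depends on the crate's real file list):

```rust
// build.rs sketch. `cargo:rerun-if-changed` on a directory makes cargo
// re-run the script when anything under that path changes, covering new
// .c/.h files and stale-artifact cases the per-.file() tracking misses.
fn main() {
    println!("cargo:rerun-if-changed=csrc");
    println!("cargo:rerun-if-changed=build.rs");

    // ...then compile as before, e.g. with the `cc` crate:
    // cc::Build::new().file("csrc/q4_dot.c").compile("q4_dot");
}
```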
Contributor Author

codeninja commented Apr 16, 2026

Personal Run

68) [v22.20.0][~/larql](fix/gguf-shape)$ cargo run --release -p larql-cli convert safetensors-to-vindex   ~/.cache/huggingface/hub/models--google--gemma-4-E4B-it/snapshots/83df0a889143b1dbfc61b591bbc639540fd9ce4c/   -o vindexes/gemma-e4b-it-from-safetensor.vindex --level inference --f16
    Finished `release` profile [optimized] target(s) in 0.15s
     Running `target/release/larql convert safetensors-to-vindex /home/codeninja/.cache/huggingface/hub/models--google--gemma-4-E4B-it/snapshots/83df0a889143b1dbfc61b591bbc639540fd9ce4c/ -o vindexes/gemma-e4b-it-from-safetensor.vindex --level inference --f16`
Loading safetensors: /home/codeninja/.cache/huggingface/hub/models--google--gemma-4-E4B-it/snapshots/83df0a889143b1dbfc61b591bbc639540fd9ce4c/
Extracting to vindexes/gemma-e4b-it-from-safetensor.vindex
    Whole-word vocab: 135881 tokens (of 262144)
  Wikidata output matching: 0/512 clusters labeled
Done: vindexes/gemma-e4b-it-from-safetensor.vindex
69) [v22.20.0][~/larql](fix/gguf-shape)$ 

@chrishayuk
Owner

sweet. this is super useful....
i've just got gemma 4 working on one of my branches already...
i will merge with what i've got and see how it converges
tonight will be a bit of a merge fest

@chrishayuk
Owner

Fixes superseded by #24 (GGUF) and #21 (Q4 scalar). Tests cherry-picked in follow-up.

@chrishayuk chrishayuk closed this Apr 17, 2026
mikeumus added a commit to Divinci-AI/larql that referenced this pull request Apr 22, 2026
* working on arch b, unified insert

* working on memit with vindex, and templates

* memit style

* working on latest memit

* working on wasm

* working on wasm

* cleaned up vindex and larql

* fix: Linux support — conditional BLAS and Q4 scalar fallback

- Implement Q4 scalar fallback for non-ARM targets:
  - Move decode_f16() before #if aarch64 (shared by both paths)
  - Replace empty stub functions with correct scalar implementations
  - q4_0_matvec_c and q4_0_vecmat_c now produce correct results on x86_64
  Affects: larql-compute/csrc/q4_dot.c

Tested on Ubuntu 24 (WSL2, x86_64): cargo build --release and
cargo test --workspace pass with 0 failures.
macOS path untested — preserves accelerate via cfg(target_os)
and requires validation on Apple hardware.

* working on bounded compute script

* refactored lql

* improved refactor

* updated executor

* gemma 4

* working on compute

* improved for gemma 4

* test: cherry-pick GGUF shape + Q4 correctness tests from chrishayuk#20

* updated examples

* working through python parity

* working on q4k tidyup

* improving testing and quantization

* improving testing

* gemma 4 support

* improved clu

* autoregressive generation

* kv cache works

* working on shader pipeline

* working shaders

* working on shaders and graph

* moved to full graph

* working through ffn walk performance

* working version

* modularized shaders

* working on decoupling decode

* working on performance

* more performance improvements

* improving performance

* more performance improvements

* working on performance

* working on distributed grid

* working on grid

* improving docs and moe

* working on moe

* improved publish pull

* binary format

* working binary format and performance

* updated vindex server specs for binary

* improved lm_head

* improved prefill

* improved lm head

* gemma 4 vindex

* working on gemma 4 moe

* working on cleanup for merge

* fixed issue with select

* residual stream

* working on benchmarks

---------

Co-authored-by: chrishayuk <chrishayuk@googlemail.com>
Co-authored-by: Remi <remipetiot@hotmail.com>
Co-authored-by: chrishayuk <chrishayuk@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Error with converting safetensors / gguf to vindex
