Architecture b #30

Merged
chrishayuk merged 55 commits into main from architecture-b on Apr 21, 2026
@chrishayuk
Owner

No description provided.

@chrishayuk chrishayuk merged commit 5b283a9 into main Apr 21, 2026
mikeumus added a commit to Divinci-AI/larql that referenced this pull request Apr 22, 2026
* working on arch b, unified insert

* working on memit with vindex, and templates

* memit style

* working on latest memit

* working on wasm

* working on wasm

* cleaned up vindex and larql

* fix: Linux support — conditional BLAS and Q4 scalar fallback

- Implement Q4 scalar fallback for non-ARM targets:
  - Move decode_f16() before #if aarch64 (shared by both paths)
  - Replace empty stub functions with correct scalar implementations
  - q4_0_matvec_c and q4_0_vecmat_c now produce correct results on x86_64
  Affects: larql-compute/csrc/q4_dot.c

Tested on Ubuntu 24 (WSL2, x86_64): cargo build --release and
cargo test --workspace pass with 0 failures.
macOS path untested — preserves accelerate via cfg(target_os)
and requires validation on Apple hardware.
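
For readers unfamiliar with what a Q4 scalar fallback involves, the following is a minimal sketch, not the code in larql-compute/csrc/q4_dot.c. It assumes the GGML-style Q4_0 layout: blocks of 32 weights, each stored as a little-endian f16 scale followed by 16 bytes of packed 4-bit values, where the low nibbles hold elements 0..15, the high nibbles hold elements 16..31, and each weight decodes as (nibble - 8) * scale. The names decode_f16_sketch and q4_0_matvec_scalar_sketch, and their signatures, are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define Q4_BLOCK 32        /* weights per block (assumed GGML Q4_0 layout) */
#define Q4_BLOCK_BYTES 18  /* 2-byte f16 scale + 16 bytes of packed nibbles */

/* Convert IEEE 754 binary16 bits to float without hardware f16 support. */
float decode_f16_sketch(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (man == 0) {
            bits = sign;                       /* signed zero */
        } else {
            int e = -1;                        /* subnormal: renormalize */
            do { man <<= 1; e++; } while ((man & 0x400u) == 0);
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | (man << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (man << 13); /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13); /* rebias 15 -> 127 */
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* y[r] = sum over c of W[r][c] * x[c], with W stored row-major as Q4_0 blocks. */
void q4_0_matvec_scalar_sketch(const uint8_t *w, const float *x, float *y,
                               size_t rows, size_t cols) {
    assert(cols % Q4_BLOCK == 0);
    size_t blocks = cols / Q4_BLOCK;
    for (size_t r = 0; r < rows; r++) {
        const uint8_t *row = w + r * blocks * Q4_BLOCK_BYTES;
        float acc = 0.0f;
        for (size_t b = 0; b < blocks; b++) {
            const uint8_t *blk = row + b * Q4_BLOCK_BYTES;
            float d = decode_f16_sketch((uint16_t)(blk[0] | (blk[1] << 8)));
            const uint8_t *qs = blk + 2;
            const float *xb = x + b * Q4_BLOCK;
            float sum = 0.0f;
            for (int j = 0; j < 16; j++) {
                int lo = (qs[j] & 0x0F) - 8;   /* elements 0..15 */
                int hi = (qs[j] >> 4) - 8;     /* elements 16..31 */
                sum += (float)lo * xb[j] + (float)hi * xb[j + 16];
            }
            acc += d * sum;                    /* apply the block scale once */
        }
        y[r] = acc;
    }
}
```

Applying the scale once per block rather than once per weight keeps the inner loop to integer unpacking plus a multiply-add, which is the usual shape of a portable reference path that SIMD variants are checked against.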

* working on bounded compute script

* refactored lql

* improved refactor

* updated executor

* gemma 4

* working on compute

* improved for gemma 4

* test: cherry-pick GGUF shape + Q4 correctness tests from chrishayuk#20

* updated examples

* working through python parity

* working on q4k tidyup

* improving testing and quantization

* improving testing

* gemma 4 support

* improved clu

* autoregressive generation

* kv cache works

* working on shader pipeline

* working shaders

* working on shaders and graph

* moved to full graph

* working through ffn walk performance

* working version

* modularized shaders

* working on decoupling decode

* working on performance

* more performance improvements

* improving performance

* more performance improvements

* working on performance

* working on distributed grid

* working on grid

* improving docs and moe

* working on moe

* improved publish pull

* binary format

* working binary format and performance

* updated vindex server specs for binary

* improved lm_head

* improved prefill

* improved lm head

* gemma 4 vindex

* working on gemma 4 moe

* working on cleanup for merge

* fixed issue with select

* residual stream

* working on benchmarks

---------

Co-authored-by: chrishayuk <chrishayuk@googlemail.com>
Co-authored-by: Remi <remipetiot@hotmail.com>
Co-authored-by: chrishayuk <chrishayuk@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chrishayuk chrishayuk deleted the architecture-b branch April 24, 2026 16:28