Architecture b by chrishayuk · Pull Request #30 · chrishayuk/larql

chrishayuk · 2026-04-21T22:00:13Z

No description provided.

Brings in Gemma 4 GGUF support, column-major fix, Q4_K dequant fix (#24), non-ARM Q4 scalar fallback (#21), plus cherry-picked regression tests for both. Conflict in crates/larql-vindex/src/extract/build.rs resolved: kept arch-b's self.down_top_k refactor while adopting main's NaN-safe .unwrap_or(Ordering::Equal) in the score comparators.
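For readers unfamiliar with the comparator pattern mentioned in the conflict note, here is a minimal sketch of a NaN-safe top-k selection. The function name `top_k_indices` and its signature are hypothetical and chosen for illustration; this is not the code in crates/larql-vindex/src/extract/build.rs, only an example of the `.unwrap_or(Ordering::Equal)` idiom applied to `f32` scores.

```rust
use std::cmp::Ordering;

/// Return the indices of the `k` highest scores, best first.
/// Hypothetical helper for illustration; not the function in build.rs.
fn top_k_indices(scores: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // `partial_cmp` returns None when either operand is NaN. Falling back to
    // `Ordering::Equal` avoids the panic that a bare `.unwrap()` would cause.
    idx.sort_by(|&a, &b| {
        scores[b]
            .partial_cmp(&scores[a]) // reversed operands: descending order
            .unwrap_or(Ordering::Equal)
    });
    idx.truncate(k);
    idx
}
```

Treating NaN as equal to everything removes the `unwrap` panic but does not give NaN a defined position in the result, so callers that care about where NaN scores end up may prefer to filter them first or compare with `f32::total_cmp`.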

* working on arch b, unified insert
* working on memit with vindex, and templates
* memit style
* workig on latest memit
* working on wasm
* working on wasm
* cleaned up vindex and larql
* fix: Linux support - conditional BLAS and Q4 scalar fallback
  - Implement Q4 scalar fallback for non-ARM targets:
    - Move decode_f16() before #if aarch64 (shared by both paths)
    - Replace empty stub functions with correct scalar implementations
    - q4_0_matvec_c and q4_0_vecmat_c now produce correct results on x86_64

  Affects: larql-compute/csrc/q4_dot.c

  Tested on Ubuntu 24 (WSL2, x86_64): cargo build --release and cargo test --workspace pass with 0 failures. macOS path untested; preserves accelerate via cfg(target_os) and requires validation on Apple hardware.
* working on bounded compute script
* refactored lql
* improved refacxtor
* updated executor
* gemma 4
* working on compute
* improved for gemma 4
* test: cherry-pick GGUF shape + Q4 correctness tests from chrishayuk#20
* updated examples
* working through python parity
* working on q4k tidyup
* improving testing and quantization
* improving testing
* gemma 4 support
* improved clu
* autoregressive generation
* kv cache works
* working on shader pipeline
* working shaders
* working on shaders and graph
* moved to full graph
* workin through ffn walk performance
* working version
* modulrized shaders
* working on decoupling decode
* working on performance
* more performance improvements
* improving performance
* more performance improvments
* working on performance
* working on distributed grid
* working on grid
* improving docs and moe
* working on moe
* improved publish pull
* binary format
* working binary format and performance
* updated vindex server specs for binary
* improved lm_head
* improved prefill
* improved lm head
* gemma 4 vindex
* working on gemma 4 moe
* working on cleanup for merge
* fixed issue with select
* residual stream
* working on benchmarks

---------

Co-authored-by: chrishayuk <chrishayuk@googlemail.com>
Co-authored-by: Remi <remipetiot@hotmail.com>
Co-authored-by: chrishayuk <chrishayuk@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
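To illustrate what a scalar Q4 fallback of the kind described in the Linux fix computes, here is a rough Rust sketch. It is not a port of larql-compute/csrc/q4_dot.c. It assumes the GGML-style Q4_0 block layout (32 weights per 18-byte block: a little-endian f16 scale followed by 16 bytes of packed nibbles, low nibble for element j and high nibble for element j + 16, each offset by 8), which this PR does not confirm for larql. The name `q4_0_matvec_sketch` and its signature are hypothetical.

```rust
const BLOCK_ELEMS: usize = 32; // weights per block (assumed GGML Q4_0 layout)
const BLOCK_BYTES: usize = 18; // 2-byte f16 scale + 16 bytes of packed nibbles

/// Decode an IEEE 754 binary16 value, given as raw bits, into f32.
fn decode_f16(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x03ff) as f32;
    let magnitude = match exp {
        0 => frac * 2f32.powi(-24), // zero or subnormal
        31 => {
            if frac == 0.0 { f32::INFINITY } else { f32::NAN }
        }
        _ => (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    };
    sign * magnitude
}

/// Dot product of one quantized row with a dense f32 vector.
fn q4_0_dot_row(row: &[u8], x: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (block, xs) in row.chunks_exact(BLOCK_BYTES).zip(x.chunks_exact(BLOCK_ELEMS)) {
        let scale = decode_f16(u16::from_le_bytes([block[0], block[1]]));
        let mut sum = 0.0f32;
        for (j, &byte) in block[2..].iter().enumerate() {
            // Each byte holds two 4-bit weights stored with an offset of 8.
            let lo = (byte & 0x0f) as i32 - 8;
            let hi = (byte >> 4) as i32 - 8;
            sum += lo as f32 * xs[j] + hi as f32 * xs[j + 16];
        }
        acc += scale * sum;
    }
    acc
}

/// y = W * x for a row-major Q4_0 matrix with `rows` rows (hypothetical API).
fn q4_0_matvec_sketch(weights: &[u8], x: &[f32], rows: usize) -> Vec<f32> {
    assert!(!x.is_empty() && x.len() % BLOCK_ELEMS == 0);
    let row_bytes = x.len() / BLOCK_ELEMS * BLOCK_BYTES;
    assert_eq!(weights.len(), rows * row_bytes);
    weights
        .chunks_exact(row_bytes)
        .map(|row| q4_0_dot_row(row, x))
        .collect()
}
```

A scalar path like this is typically the reference against which a SIMD path is checked, which is presumably why the PR pairs the fallback with Q4 correctness tests.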

chrishayuk added 30 commits April 15, 2026 00:56

working on arch b, unified insert

dc4e447

working on memit with vindex, and templates

b909da8

memit style

9a9869b

workig on latest memit

169fe23

working on wasm

2ef80f7

working on wasm

1f5d610

Merge remote-tracking branch 'origin/main' into architecture-b

98a7247

cleaned up vindex and larql

2373a66

working on bounded compute script

6719b04

refactored lql

1c6b671

improved refacxtor

3d6cf73

updated executor

d8499e3

gemma 4

5f1ff10

working on compute

e004cf1

improved for gemma 4

c52413b

updated examples

59d4024

working through python parity

f0bcc7f

working on q4k tidyup

ce2ec99

improving testing and quantization

9cf1742

improving testing

a5b8392

gemma 4 support

48ef0d3

improved clu

e289d5b

autoregressive generation

7633154

kv cache works

05ecf34

working on shader pipeline

0132585

working shaders

532db60

working on shaders and graph

398b160

moved to full graph

002cca1

workin through ffn walk performance

7087fe3

chrishayuk added 25 commits April 18, 2026 20:08

working version

b81fac8

modulrized shaders

9d340b6

working on decoupling decode

604e685

working on performance

ad23882

more performance improvements

b4fd32a

improving performance

42157f1

more performance improvments

668f664

working on performance

23a5122

working on distributed grid

14fd729

working on grid

617c68f

improving docs and moe

4aa7e02

working on moe

3d7c113

improved publish pull

68aeb1f

binary format

8193f02

working binary format and performance

6fb231d

updated vindex server specs for binary

28717b6

improved lm_head

1f75d52

improved prefill

d2ee0d2

improved lm head

251fb08

gemma 4 vindex

833c124

working on gemma 4 moe

a7d6cc3

working on cleanup for merge

9adcb7c

fixed issue with select

0c2e680

residual stream

7de5208

working on benchmarks

8896013

chrishayuk merged commit 5b283a9 into main Apr 21, 2026

mikeumus mentioned this pull request Apr 22, 2026

Sync with upstream Architecture B (chrishayuk/larql#30) Divinci-AI/larql#13

Merged

4 tasks

chrishayuk deleted the architecture-b branch April 24, 2026 16:28

chrishayuk merged 55 commits into main from architecture-b
