kakeyaturbo: monomorphic Rust implementation of the KakeyaTurbo RDO codec with ≥99.76% UT coverage #4
Merged
cursor[bot] merged 2 commits into main on Apr 18, 2026
First three modules of the Rust monomorphic kernel for KakeyaTurbo:
- distortion: trait + zero-sized MSE / InnerProduct / LInf types. All methods are #[inline(always)] so R::d is inlined to raw arithmetic at each call site, eliminating any runtime dispatch. NormMode is a compile-time constant per metric.
- wht: Walsh-Hadamard transform + deterministic sign-flip pattern derived from a u32 seed. Rotates input to Gaussianise residuals; inverse is a single WHT with the same seed.
- quantize: Lloyd-Max codebooks for N(0,1) at 1..=4 bits, generic nearest-centroid quantiser parametrised by Distortion, and a bit-packing layer (LSB-first) for the 1..=8 bit range.
All three modules ship with dense unit tests: correctness of basic math, boundary / panic cases, round-trips, and monomorphisation contract (zero-size types, no dyn).
59 tests, all passing.
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
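To make the LSB-first bit-packing layer concrete, here is a minimal sketch. It is not the crate's code: `pack_bits` and `unpack_bits` are hypothetical names and signatures, and only the 1..=8 bit range and the LSB-first order are taken from the commit message above.

```rust
/// Pack `codes` (each strictly below 2^bits) into bytes, least-significant bit first.
/// Hypothetical helper for illustration only.
fn pack_bits(codes: &[u8], bits: u32) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    let total_bits = codes.len() * bits as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut pos = 0usize; // absolute bit position in the output stream
    for &c in codes {
        assert!((c as u32) < (1u32 << bits), "code does not fit in `bits` bits");
        for b in 0..bits {
            if (c >> b) & 1 == 1 {
                out[pos / 8] |= 1u8 << (pos % 8);
            }
            pos += 1;
        }
    }
    out
}

/// Inverse of `pack_bits`: read `count` codes of `bits` bits each.
fn unpack_bits(bytes: &[u8], bits: u32, count: usize) -> Vec<u8> {
    assert!((1..=8).contains(&bits), "bits must be in 1..=8");
    assert!(bytes.len() * 8 >= count * bits as usize, "not enough input bytes");
    let mut out = Vec::with_capacity(count);
    let mut pos = 0usize;
    for _ in 0..count {
        let mut c = 0u8;
        for b in 0..bits {
            let bit = (bytes[pos / 8] >> (pos % 8)) & 1;
            c |= bit << b;
            pos += 1;
        }
        out.push(c);
    }
    out
}
```

With this layout the 2-bit codes `[1, 2, 3, 0]` pack into the single byte `0x39`: the first code occupies the two lowest bits, the next code the two bits above it, and so on.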
Full monomorphic-kernel implementation of KakeyaTurbo following the
RDO unification design from the previous conversation rounds.
New modules:
- pca.rs: weighted PCA truncated by variance_ratio (nalgebra SVD)
- kmeans.rs: weighted spherical K-means with farthest-first init,
sign-aware update (supports anti-aligned rows)
- skeleton.rs: block-level metadata container
- codec.rs: the single encode_block / decode_block kernel,
parametrised by <R: Distortion> and runtime CodecParams
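The sign-aware handling of anti-aligned rows mentioned for kmeans.rs can be sketched as follows. This is an illustration under assumptions, not the crate's implementation: the signature of `assign_and_project` is hypothetical, centres are assumed to be unit-norm, and assignment is assumed to maximise the absolute inner product so that a row pointing opposite to a centre still maps to it with a negative coefficient.

```rust
/// Dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Assign `row` to the centre with the largest |<row, centre>|.
/// Returns (centre index, signed projection coefficient, residual).
/// Hypothetical signature; centres are assumed to have unit norm.
fn assign_and_project(row: &[f32], centres: &[Vec<f32>]) -> (usize, f32, Vec<f32>) {
    assert!(!centres.is_empty(), "need at least one centre");
    let mut best = 0usize;
    let mut best_t = dot(row, &centres[0]);
    for (k, c) in centres.iter().enumerate().skip(1) {
        let t = dot(row, c);
        // Compare magnitudes so anti-aligned rows are not penalised.
        if t.abs() > best_t.abs() {
            best = k;
            best_t = t;
        }
    }
    // Residual after removing the signed projection onto the chosen centre.
    let residual: Vec<f32> = row
        .iter()
        .zip(&centres[best])
        .map(|(x, c)| x - best_t * c)
        .collect();
    (best, best_t, residual)
}
```

For centres `[1, 0]` and `[0, 1]`, the row `[-2.0, 0.5]` is assigned to centre 0 with coefficient `-2.0`, leaving the residual `[0.0, 0.5]`.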
Tests:
- 125 unit tests covering every public function, panic path, and
numerical invariant
- 5 integration tests end-to-end (realistic synthetic blocks,
weight effects, inner-product preservation, multi-shape robustness)
- 6 property-based tests via proptest (shape invariants, determinism,
finiteness, uniform-scale invariance)
Test & coverage totals:
- 136 tests passing
- cargo llvm-cov: 99.76% line coverage (100% of production code;
the 2 uncovered lines are assertion-failure messages inside tests
that never fire), 100% function coverage
- cargo clippy: 0 errors, remaining warnings are cosmetic style
choices in numerical loops
- grep verifies no 'dyn', no 'Box<dyn>', no 'unsafe' in src/
Also removes 765 tracked build artefacts from target/ that were
accidentally committed in the previous stub commit.
Contract verified as grep-able: see src/lib.rs design-contract
section and the distortion_trait_is_object_unsafe_as_intended test.
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
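The monomorphisation contract described in these commit messages can be illustrated with a small stand-in. Everything below is an assumption made for illustration: the trait here has a single associated function `d`, the per-element definitions of `MSE` and `LInf` are guesses, `nearest_centroid` is a hypothetical signature, and the codebook holds approximate 2-bit Lloyd-Max levels for N(0,1). The crate's actual trait has more surface (for example `NormMode`).

```rust
use std::mem::size_of;

/// Stand-in for the crate's `Distortion` trait. Because `d` has no `self`
/// receiver, the trait cannot be turned into a `dyn` object.
pub trait Distortion {
    /// Per-element distortion between a value `x` and a candidate `y`.
    fn d(x: f32, y: f32) -> f32;
}

/// Zero-sized marker types: they carry behaviour only, no data.
pub struct MSE;
pub struct LInf;

impl Distortion for MSE {
    #[inline(always)]
    fn d(x: f32, y: f32) -> f32 {
        let e = x - y;
        e * e
    }
}

impl Distortion for LInf {
    #[inline(always)]
    fn d(x: f32, y: f32) -> f32 {
        (x - y).abs()
    }
}

/// Generic nearest-centroid search; each `R` gets its own compiled copy.
pub fn nearest_centroid<R: Distortion>(x: f32, codebook: &[f32]) -> usize {
    assert!(!codebook.is_empty(), "codebook must not be empty");
    let mut best = 0usize;
    let mut best_d = R::d(x, codebook[0]);
    for (i, &c) in codebook.iter().enumerate().skip(1) {
        let d = R::d(x, c);
        if d < best_d {
            best = i;
            best_d = d;
        }
    }
    best
}

/// Approximate 2-bit Lloyd-Max reconstruction levels for N(0,1).
pub const CODEBOOK_2BIT: [f32; 4] = [-1.510, -0.4528, 0.4528, 1.510];

/// True when both marker types occupy zero bytes.
pub fn zst_contract_holds() -> bool {
    size_of::<MSE>() == 0 && size_of::<LInf>() == 0
}
```

Calling `nearest_centroid::<MSE>(0.3, &CODEBOOK_2BIT)` selects index 2. The choice of metric is fixed at the call site through the type parameter, so no vtable lookup is involved.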
cursor Bot pushed a commit that referenced this pull request on Apr 22, 2026
Buckets on the HF (+7.82%) vs vLLM (+35.33%) 27-pp gap:
#1 Engine baseline shift ~10 pp (clean-model PPL disagreement; 0.145 KL; 18% top-1 disagreement)
#2 Codec residual magnitude ~0 (codec is engine-agnostic; mse ratio 1.01)
#3 Noise-sensitivity curve HF MORE sensitive per σ in linear regime; not the cause
#4 Boundary layers already skipped +69 pp saved by SPRINT_CLOSEOUT boundary policy
#5 Cross-layer non-linear compound +39 pp (joint-cell - Σ singletons over 22 quiet layers)
Localised root cause: vLLM's single-forward bf16 residual-stream accumulation through Flash-Attention compounds per-layer codec residuals ~39 pp above their sum, while HF eager's f32-accumulate + teacher-force over DynamicCache compounds them less aggressively. Each per-layer residual is small on both engines (Phase 4 matched); what differs is the accumulation path.
Deployment recommendations:
1. Extend vLLM boundary skip to {2, 6, 11} on top of the existing {0,1,7,14,26,27}; cuts ~10-15 pp off the joint Delta-ppl.
2. Adaptive per-layer bit-width: K b=4 on the hot layers, b=3 elsewhere; preserves 19/28 of the ratio benefit.
Phase 3 ran only on vLLM (reused production harness); the HF per-layer curve is left as a follow-up if someone wants to confirm that HF's cross-layer interaction is the ~+10 pp we infer here.
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
Summary
Rust crate at `kakeyaturbo/` implementing the KakeyaTurbo codec designed in the earlier conversation rounds: a single monomorphic `encode_block` / `decode_block` kernel parametrised by
- `R: Distortion` (the loss ρ)
- `&[f32]` weights array `w`
- `CodecParams` struct (variance_ratio, K, bit_width, rotation_seed)
No plugins, no `dyn`, no extension points. Every `(R, n, d, d_eff, K, B)` combination compiles to its own specialised machine-code function.
Modules
- `src/lib.rs`
- `src/distortion.rs`: `Distortion` trait + `MSE`, `InnerProduct`, `LInf` ZSTs
- `src/wht.rs`
- `src/quantize.rs`
- `src/pca.rs`: `d_eff`
- `src/kmeans.rs`
- `src/skeleton.rs`
- `src/codec.rs`: `encode_block` / `decode_block` kernel
Quality gates
- `cargo build`
- `cargo test`
- `cargo clippy --all-targets`
- `cargo llvm-cov` line coverage
- `cargo llvm-cov` function coverage
- `#![forbid(unsafe_code)]`
- `grep -rn "dyn " src/`
- `grep -rn "Box<" src/`
Per-module coverage (`cargo llvm-cov --summary-only`)
The only uncovered source lines are 2 `assert!(cond, "...")` format-string messages inside test code (by definition unreachable while tests pass):
- `src/pca.rs:338`: `"captured variance not monotone: prev={prev} new={}"`
- `src/quantize.rs:228`: `"MSE must decrease with more bits: prev={prev} new={mse}..."`
Together with continuation lines, these account for the 4 uncovered line slots. Every line of production code is covered.
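The `src/pca.rs` assertion quoted above checks that captured variance grows monotonically as components are added, which is the property that truncation by `variance_ratio` relies on. A minimal sketch of such a truncation rule follows. It is an assumption about how `d_eff` could be chosen, not the crate's code: `effective_rank` is a hypothetical name, and the eigenvalues are assumed to be non-negative and sorted in descending order.

```rust
/// Smallest k such that the k largest eigenvalues capture at least
/// `variance_ratio` of the total variance. Hypothetical helper.
fn effective_rank(eigenvalues: &[f32], variance_ratio: f32) -> usize {
    assert!(!eigenvalues.is_empty(), "need at least one eigenvalue");
    assert!(
        variance_ratio > 0.0 && variance_ratio <= 1.0,
        "variance_ratio must be in (0, 1]"
    );
    assert!(
        eigenvalues.windows(2).all(|p| p[0] >= p[1]),
        "eigenvalues must be sorted in descending order"
    );
    assert!(
        eigenvalues.iter().all(|&v| v >= 0.0),
        "eigenvalues must be non-negative"
    );
    let total: f32 = eigenvalues.iter().sum();
    if total == 0.0 {
        return 1; // degenerate block: keep a single component
    }
    // The running sum never decreases because every eigenvalue is >= 0.
    let mut captured = 0.0f32;
    for (k, &v) in eigenvalues.iter().enumerate() {
        captured += v;
        if captured >= variance_ratio * total {
            return k + 1;
        }
    }
    eigenvalues.len()
}
```

For eigenvalues `[4, 3, 2, 1]`, a ratio of 0.5 keeps 2 components, 0.8 keeps 3, and 1.0 keeps all 4.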
Test inventory
Unit tests (125)
- `NORM_MODE` correctness, zero-size verification.
- `H₂`/`H₄` comparison, `WHT²=N·I`, power-of-2 panic paths, deterministic sign patterns, seed-determinism, zero-input, linearity, rotate/inverse round-trip, seed mismatch detection.
- `k = n` identity, unit-norm centres, seed determinism, two-cluster recovery, zero-norm row skip, zero-weight-row skip, misshaped input panics, assign_and_project correctness (including anti-aligned input), residual subtraction, centre view helpers.
- `next_pow2` / `pad_zero` / `l2_norm` helpers.
Integration tests (5)
Real end-to-end blocks: 64×32 MSE reconstruction, compression-ratio bound, weighted-row priority, inner-product preservation, multi-shape robustness.
Property tests (6, via `proptest`)
Random blocks + weights: shape invariance, determinism, finite reconstruction, all-bit-width handling, all-K handling, uniform-weight-scale invariance.
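Two of the unit-test invariants listed above, `WHT²=N·I` and the rotate/inverse round-trip, can be shown in a few lines. This is a sketch under stated assumptions, not the crate's `wht` module: the function names are hypothetical, the sign pattern uses a small xorshift generator instead of `rand`'s `SmallRng`, and the 1/√n scaling is placed in both directions purely for convenience.

```rust
/// In-place unnormalised fast Walsh-Hadamard transform.
/// Panics unless the length is a power of two.
fn wht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (x[i], x[i + h]);
                x[i] = a + b;
                x[i + h] = a - b;
            }
        }
        h *= 2;
    }
}

/// Deterministic +1/-1 pattern derived from a u32 seed (xorshift32).
/// Stand-in generator; the real crate seeds rand's SmallRng.
fn signs(seed: u32, n: usize) -> Vec<f32> {
    let mut s = seed | 1; // xorshift state must be non-zero
    (0..n)
        .map(|_| {
            s ^= s << 13;
            s ^= s >> 17;
            s ^= s << 5;
            if s & 1 == 0 { 1.0 } else { -1.0 }
        })
        .collect()
}

/// Forward rotation: y = H * D * x / sqrt(n), with D the sign diagonal.
fn rotate(x: &mut [f32], seed: u32) {
    let n = x.len();
    for (v, s) in x.iter_mut().zip(signs(seed, n)) {
        *v *= s;
    }
    wht(x);
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}

/// Inverse rotation: x = D * H * y / sqrt(n); one WHT with the same seed.
fn inverse_rotate(y: &mut [f32], seed: u32) {
    let n = y.len();
    wht(y);
    let scale = 1.0 / (n as f32).sqrt();
    for (v, s) in y.iter_mut().zip(signs(seed, n)) {
        *v *= scale * s;
    }
}
```

Applying `wht` twice to `[1, 0, 0, 0]` returns `[4, 0, 0, 0]`, which is the `WHT²=N·I` identity for N = 4. Since H·H = N·I and D·D = I, `inverse_rotate` undoes `rotate` up to floating-point rounding when the same seed is used.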
Design contract verification
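The philosophy "one kernel, one code path, no dispatch" is grep-verifiable: `grep -rn "dyn " src/` and `grep -rn "Box<" src/` are expected to return no matches.
The design-contract block in `src/lib.rs` makes this explicit, and `#![forbid(unsafe_code)]` at the crate root turns any `unsafe` block into a compile error.
Example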
Repro
Environment
Rust 1.83.0 stable. Dependencies: `nalgebra 0.33` (for weighted SVD), `half 2.4` (fp16 scalars in `Codes`), `rand 0.8` (SmallRng seeded sign patterns). Dev deps: `approx 0.5`, `proptest 1.5.0`.