v2.8.21

Pantelis23 released this 29 Apr 03:09

Three IR codegen improvements compound to a 5.2% smaller bootstrap binary (1.24 MB → 1.18 MB) and a 30% drop in sort runtime (153 → 108 ms). krc now beats gcc -O2 on bubble sort by 2.5×.

What changed

  1. 6th register colour (rbp). The graph-colouring register allocator gains one more callee-saved register, lowering the spill rate compiler-wide. rbp had historically been left out. The lz4 / fat-archive paths surfaced an off-by-one in stack-arg overflow loads, fixed by replacing the hardcoded `ir_frame_size + 48` ("5 pushes + ret addr") with `ir_frame_size + ir_callee_save_bytes + 8`.

  2. Per-function used-callee-save prologue. Each function now pushes only the colours that regalloc actually assigned to it. fib's prologue dropped from 5 pushes to 3; leaf-ish helpers often drop to 0-1. Because the push count now varies, the parity of push_count decides whether frame_size is padded by 8, which keeps SP 16-aligned at every CALL.

  3. Cross-register spill-reload peephole. The sequence `store rax,V; load rcx,V` (the reload targets a different register) now emits `mov rcx, rax` instead of a stack round trip. This catches matmul-style flows where an intermediate vreg passes through different scratch registers.
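
The frame arithmetic behind items 1 and 2 can be sketched as follows. This is an illustrative model, not krc's source: the function `frame_layout` and the parameter `locals_bytes` are hypothetical names, the rounding of locals to 16 bytes is an assumption, and the model assumes the x86-64 convention that SP is 8 bytes off a 16-byte boundary on function entry (the return address was just pushed).

```python
WORD = 8  # bytes per push and per return address on x86-64

def frame_layout(push_count, locals_bytes):
    """Return (frame_size, callee_save_bytes, stack_arg_offset).

    Hypothetical model: push_count callee-saved registers are pushed,
    then frame_size bytes are reserved for locals and spills.
    """
    callee_save_bytes = push_count * WORD
    # Assumption: locals are first rounded up to a multiple of 16.
    frame_size = (locals_bytes + 15) // 16 * 16
    # Return address + pushes + frame must total a multiple of 16 so that
    # SP is 16-aligned at every CALL. With frame_size already a multiple
    # of 16, this reduces to the parity of push_count: even needs +8.
    if (WORD + callee_save_bytes + frame_size) % 16 != 0:
        frame_size += WORD
    # First stack-passed argument, measured from SP after the prologue:
    # skip the frame, the saved registers, and the return address.
    stack_arg_offset = frame_size + callee_save_bytes + WORD
    return frame_size, callee_save_bytes, stack_arg_offset

# With 5 pushes the general formula agrees with the old hardcoded "+ 48".
fs5, cs5, off5 = frame_layout(5, 32)
# With 6 pushes (rbp in use) the old constant would be one word short.
fs6, cs6, off6 = frame_layout(6, 32)
```

In this model the old constant 48 is exactly `5 * 8 + 8`, so it is only correct while every function pushes five registers. Once the push count varies per function, the offset has to be derived from the actual callee-save byte count.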

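The spill-reload peephole from item 3 can be illustrated with a toy pass over a list of instruction tuples. This is a sketch under stated assumptions, not krc's implementation: the tuple encoding and the function `peephole` are hypothetical, the pass only matches a load that directly follows the store, and it keeps the store in place because a later instruction may still read the slot.

```python
def peephole(insns):
    """Rewrite `store r1,V; load r2,V` so the reload becomes `mov r2, r1`.

    Hypothetical encoding: ("store", reg, slot), ("load", reg, slot),
    ("mov", dst, src). Other instructions pass through unchanged.
    """
    out = []
    for ins in insns:
        prev = out[-1] if out else None
        is_reload = (
            ins[0] == "load"
            and prev is not None
            and prev[0] == "store"
            and prev[2] == ins[2]  # same stack slot
        )
        if is_reload:
            src, dst = prev[1], ins[1]
            if src != dst:
                # Cross-register case: copy register to register.
                out.append(("mov", dst, src))
            # Same-register case: the value is already in place, so the
            # load is simply dropped (assumed to be the pre-existing rule).
            continue
        out.append(ins)
    return out

before = [("store", "rax", "V"), ("load", "rcx", "V")]
after = peephole(before)
```

Here `after` holds the original store followed by `("mov", "rcx", "rax")`, so the value never has to be read back from the stack.
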
Runtime delta (Ryzen 9 7900X)

| bench | v2.8.20 | v2.8.21 | gcc -O2 | krc Δ |
|---|---|---|---|---|
| fib | 442 ms | 427 ms | 78 ms | -3% |
| sort | 153 ms | 108 ms | 270 ms | -30%, 2.5× ahead of gcc -O2 |
| sieve | 3 ms | 3 ms | 2 ms | tied |
| matmul | 34 ms | 33 ms | 4 ms | -3% |

Verified: bootstrap fixed point at 1,176,168 bytes; 439/439 tests pass.

Next on the optimization roadmap

  • matmul is still 8× behind gcc -O2. It needs loop strength reduction to remove per-iteration address recomputation: ~10 of ~16 inner-loop instructions are `(i*N+j)*8` calculations that the compiler should hoist into a running pointer.
  • fib is still 5× behind. gcc -O2 inlines fib 4-5 levels deep before materialising the leaves as real `call` instructions. The planned remedy is recursive inlining at the IR level.
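
The strength reduction mentioned in the matmul item can be pictured by comparing the two ways of producing the same address sequence. This is only an illustration of the transformation, not planned compiler code: the function names, `base`, and the 8-byte element size taken from the `*8` in the expression are assumptions for the sketch.

```python
ELEM = 8  # bytes per element, matching the `*8` in `(i*N+j)*8`

def addresses_indexed(base, n, i):
    """Recompute the full `(i*N+j)*8` expression on every iteration."""
    return [base + (i * n + j) * ELEM for j in range(n)]

def addresses_running(base, n, i):
    """Hoist the row start once, then advance a running pointer."""
    p = base + i * n * ELEM  # computed once, outside the inner loop
    out = []
    for _ in range(n):
        out.append(p)
        p += ELEM  # one add per iteration replaces a multiply-add-shift
    return out

row_indexed = addresses_indexed(0x1000, 4, 2)
row_running = addresses_running(0x1000, 4, 2)
```

Both functions yield identical addresses; the second does a single addition per iteration. A column walk would work the same way with a stride of `n * ELEM` instead of `ELEM`.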

Full Changelog: v2.8.20...v2.8.21
