… nbody correctness
Benchmark Results (Linux x86-64)
CLI Tool Benchmarks
Correction on the performance numbers

I should have been clearer in the PR body: the 7.7× matmul speedup table was measured on Apple Silicon (M-series). The correctness fix is universal and lands on both architectures; the performance improvement, however, is architecture-dependent.

The CI's auto-posted "Benchmark Results (Linux x86-64)" comment on this PR, showing matmul ~0.992s, is therefore correct for what it's measuring (generic x86-64). The 7.7× number in my PR body was Apple-Silicon-specific and I should have labeled it as such. Follow-up: to get the speedup on Linux x86-64 as well, the target CPU in […] would need to change. Sorry for the misleading framing.
Summary
Numeric literals written with a decimal point or exponent (`0.0`, `1.5`, `3e10`) were being classified as integer literals by the narrowing analyzer. The check used `value % 1 === 0`, which returns true for mathematically-integer values regardless of source form — so `0.0` (value `0`) was indistinguishable from `0`.

For a function-scoped `let sum = 0.0`, this caused `sum` to be allocated as `i64` instead of `double`. The inner loop of a numeric accumulator then emitted a fptosi/sitofp round-trip every iteration, which simultaneously (a) silently truncated the fractional part of every partial result and (b) blocked loop vectorization of the reduction (a loop-carried dependency through the fptosi).
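A value-based check cannot recover the source form; only the raw literal text can. A quick illustration in plain TypeScript (demonstration code, not the compiler's actual source):

```typescript
// The buggy classification: a value-based test cannot tell 0.0 from 0,
// because both parse to the same numeric value.
const looksInteger = (value: number): boolean => value % 1 === 0;

console.log(looksInteger(Number("0.0"))); // true: misclassified as integer
console.log(looksInteger(Number("0")));   // true: indistinguishable from above

// Only the raw source text distinguishes them, which is the approach the fix
// takes: look for '.', 'e', or 'E' in the literal as written.
const isFloatLiteral = (raw: string): boolean =>
  raw.indexOf(".") !== -1 || raw.indexOf("e") !== -1 || raw.indexOf("E") !== -1;

console.log(isFloatLiteral("0.0"));  // true
console.log(isFloatLiteral("0"));    // false
console.log(isFloatLiteral("3e10")); // true
```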
Minimal repro
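The repro snippet itself is elided from this copy of the PR. Below is a hypothetical reconstruction consistent with the before/after outputs and with the `sumFloats` loop named in the IR check; the exact original code is an assumption:

```typescript
// Hypothetical reconstruction of the elided repro: accumulate 0.1 ten times
// into a float-initialized, function-scoped accumulator.
function sumFloats(): number {
  let sum = 0.0; // pre-fix: narrowed to i64; post-fix: double
  for (let i = 0; i < 10; i++) {
    sum += 0.1;
  }
  return sum;
}

// Emulating the pre-fix codegen: every fadd result went through fptosi
// (truncation toward zero) before being stored back into the i64 slot.
function sumFloatsBroken(): number {
  let sum = 0;
  for (let i = 0; i < 10; i++) {
    sum = Math.trunc(sum + 0.1); // each partial sum is < 1, so it truncates to 0
  }
  return sum;
}

console.log(sumFloatsBroken()); // 0 (the "Before" output)
console.log(sumFloats());       // 0.9999999999999999 (the "After" output)
```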
Before: `0`
After: `1` (well, `0.9999999999999999`, as expected from the binary float representation of 0.1)

Explicit `let s: number = 0.0` was also incorrectly narrowed — the annotation was ignored.

Fix
Added an optional `isFloat?: boolean` field to `NumberNode`, set at parse time based on the presence of `.`/`e`/`E` in the literal's raw source text. The two parsers (TS-based and tree-sitter-based native) populate the flag; four narrowing/codegen sites gate integer classification on `isFloat !== true`.

Files:
- `src/ast/types.ts` — `NumberNode.isFloat?: boolean`
- `src/parser-ts/handlers/expressions.ts` — `transformNumericLiteral` sets `isFloat` from `node.getText()`
- `src/parser-native/transformer.ts` — new `transformNumberNode` helper, stays self-hosting-compatible (no Set, no for...of, no regex — plain `indexOf` on `.`, `e`, `E`)
- `src/codegen/infrastructure/integer-analysis.ts` — `isIntegerLiteral` gates on `isFloat !== true`
- `src/codegen/llvm-generator.ts` — i64 literal emission path gates on `isFloat !== true`
- `src/codegen/expressions/operators/binary.ts` — `isKnownInteger` gates on `isFloat !== true`
- `src/codegen/expressions/literals.ts` — `generateNumber` now takes an `isFloat` param
- `src/codegen/expressions/expression-dispatch.ts` — threads `isFloat` through to `generateNumber`

Correctness fixes
Both of these were silent bugs that produced compiling, running, wrong output:
matmul (`benchmarks/matmul/chadscript.ts`)

Before: `Check: 44634215`
After: `Check: 44634424.32`

The 512×512 matmul inner loop's `sum += a[row*N+k] * b[k*N+col]` was losing the `.32` fractional tail 512 times per cell; accumulated error ~209.

nbody (`benchmarks/nbody/chadscript.ts`)

Before: `Energy: 0` (both before and after the 25M-step simulation)
After: `Energy: -0.169065129117806` (initial) / `Energy: -0.169046651628287` (after 25M steps)

The `energy()` function starts with `let e = 0.0` and accumulates kinetic + potential terms. Before this fix, `e` was typed `i64` and every fadd result was truncated to `0` through fptosi. The benchmark was emitting zero for both the initial and final energy, effectively measuring nothing. Energy drift after the fix is ~0.00002, consistent with a 25M-step double-precision leapfrog integrator.

Measurements
Apple Silicon (M-series), macOS ARM64, best of 5 runs; both sides built from the same worktree with identical linked libraries (`vendor/bdwgc/libgc.a`, `c_bridges/*.o`) — only the compiler source differs.
matmul is the only substantial speedup — the fptosi/sitofp round-trip removal unlocks LLVM's NEON vectorizer on the inner reduction. The other benchmarks are unchanged because their hot loops don't involve a function-scoped float accumulator.
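As a quick arithmetic sanity check on the matmul correctness numbers quoted above, the gap between the broken and fixed checksums matches the stated accumulated error:

```typescript
// Cross-check: the broken checksum (fractional tails truncated away) trails
// the fixed checksum by roughly the quoted accumulated error of ~209.
const brokenCheck = 44634215;   // pre-fix output
const fixedCheck = 44634424.32; // post-fix output
const accumulatedError = fixedCheck - brokenCheck;
console.log(accumulatedError.toFixed(2)); // "209.32"
```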
Tests
`npm test` — 774/774 pass, 0 failures, 37 suites. […] (`.`/`e`/`E`), and only literals written with `.` or an exponent are newly classified as float.

IR check
Pre-fix IR for the `sumFloats` loop:

Post-fix:

No more round-trip through i64. The loop body is now a pure-double reduction, and LLVM's LoopVectorizer can reorder the accumulator (the fast-math `reassoc` flag is already present).

Risk
Low. The change is strictly additive at the classification level (the new `isFloat === true` short-circuits the existing "is integer?" check, and never widens integer to float). Parser changes stay self-hosting-compatible. 774/774 tests pass with no test updates.
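The short-circuit described above can be sketched as follows. The `NumberNode` and `isIntegerLiteral` names come from the file list earlier in the PR; the bodies here are assumptions for illustration, not the actual source:

```typescript
// Sketch of the gating pattern at the narrowing/codegen sites (assumed shapes;
// the real NumberNode and isIntegerLiteral live in src/ast/types.ts and
// src/codegen/infrastructure/integer-analysis.ts).
interface NumberNode {
  value: number;
  isFloat?: boolean; // set at parse time from '.', 'e', or 'E' in the raw text
}

function isIntegerLiteral(node: NumberNode): boolean {
  if (node.isFloat === true) return false; // new: float-written literals stay float
  return node.value % 1 === 0;             // existing value-based check, unchanged
}

console.log(isIntegerLiteral({ value: 0 }));                  // true
console.log(isIntegerLiteral({ value: 0, isFloat: true }));   // false (this is 0.0)
console.log(isIntegerLiteral({ value: 1.5, isFloat: true })); // false
```

The flag only ever removes values from the integer set, which is why the change cannot widen an existing integer literal to float.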