
fix float-literal typing: 0.0 as double not integer, fixes matmul and nbody correctness#499

Merged
cs01 merged 1 commit into main from fix/float-literal-narrowing
Apr 13, 2026

Conversation

@cs01 (Owner) commented Apr 13, 2026

Summary

Numeric literals written with a decimal point or exponent (0.0, 1.5, 3e10) were being classified as integer literals by the narrowing analyzer. The check used value % 1 === 0, which returns true for mathematically-integer values regardless of source form — so 0.0 (value 0) was indistinguishable from 0.
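
The distinction the old check loses is easy to show in isolation. A minimal sketch of the value-based test (the function name is illustrative, not the analyzer's actual API):

```typescript
// Value-based check in the spirit of the old narrowing analyzer:
// it sees only the parsed number, so the source form is already gone.
function looksLikeInteger(value: number): boolean {
  return value % 1 === 0;
}

looksLikeInteger(0);   // true
looksLikeInteger(0.0); // true — 0.0 parses to the same value as 0
looksLikeInteger(1.5); // false
```

Once `0.0` and `0` reach this check, they are the identical double value `0`, so no value-based predicate can tell them apart; only the raw source text can.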

For a function-scoped let sum = 0.0, this caused sum to be allocated as i64 instead of double. The inner loop of a numeric accumulator then emitted:

%loaded  = load i64, i64* %sum
%as_d    = sitofp i64 %loaded to double
%updated = fadd double %as_d, %product
%back    = fptosi double %updated to i64  ; TRUNCATES fractional part
store i64 %back, i64* %sum

every iteration, which simultaneously (a) silently truncated the fractional part of every partial result and (b) blocked loop vectorization of the reduction (loop-carried dep through the fptosi).

Minimal repro

function sumFloats(): number {
  let s = 0.0;
  let i = 0;
  while (i < 10) { s = s + 0.1; i = i + 1; }
  return s;
}
console.log(sumFloats());

Before: 0
After: 1 (well, 0.9999999999999999, as expected from binary float representation of 0.1)
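
The post-fix value follows directly from IEEE-754 double arithmetic; the same accumulation in plain TypeScript, outside the compiler entirely, produces it:

```typescript
// Accumulate 0.1 ten times in IEEE-754 double precision.
// 0.1 has no exact binary representation, so the sum drifts slightly below 1.
function sumTenTenths(): number {
  let s = 0.0;
  for (let i = 0; i < 10; i++) {
    s += 0.1;
  }
  return s;
}

console.log(sumTenTenths()); // 0.9999999999999999
```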

Explicit let s: number = 0.0 was also incorrectly narrowed — the annotation was ignored.

Fix

Added an optional isFloat?: boolean field to NumberNode, set at parse time based on the presence of . / e / E in the literal's raw source text. The two parsers (TS-based and tree-sitter-based native) populate the flag; four narrowing/codegen sites gate integer classification on isFloat !== true.
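
A sketch of the parse-time detection, assuming the flag is derived from the literal's raw text exactly as described above (the helper name is illustrative):

```typescript
// Self-hosting-compatible detection: plain indexOf only —
// no regex, no Set, no for...of.
function rawLiteralIsFloat(raw: string): boolean {
  return raw.indexOf(".") !== -1 ||
         raw.indexOf("e") !== -1 ||
         raw.indexOf("E") !== -1;
}

rawLiteralIsFloat("0");    // false — still classified integer
rawLiteralIsFloat("0.0");  // true
rawLiteralIsFloat("3e10"); // true
```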

Files:

  • src/ast/types.ts — NumberNode.isFloat?: boolean
  • src/parser-ts/handlers/expressions.ts — transformNumericLiteral sets isFloat from node.getText()
  • src/parser-native/transformer.ts — new transformNumberNode helper, stays self-hosting-compatible (no Set, no for...of, no regex — plain indexOf on ., e, E)
  • src/codegen/infrastructure/integer-analysis.ts — isIntegerLiteral gates on isFloat !== true
  • src/codegen/llvm-generator.ts — i64 literal emission path gates on isFloat !== true
  • src/codegen/expressions/operators/binary.ts — isKnownInteger gates on isFloat !== true
  • src/codegen/expressions/literals.ts — generateNumber now takes an isFloat param
  • src/codegen/expressions/expression-dispatch.ts — threads isFloat through to generateNumber
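
Each gating site then short-circuits on the flag before falling back to the value check — a hypothetical sketch of the shape, not the repo's exact code:

```typescript
interface NumberNode {
  kind: "number";
  value: number;
  isFloat?: boolean; // set at parse time from the raw literal text
}

function isIntegerLiteral(node: NumberNode): boolean {
  // New gate: a literal written with '.' or an exponent is never an
  // integer, regardless of its mathematical value.
  if (node.isFloat === true) return false;
  // Pre-existing value-based check, unchanged for older call sites.
  return node.value % 1 === 0;
}

isIntegerLiteral({ kind: "number", value: 0, isFloat: true }); // false — 0.0
isIntegerLiteral({ kind: "number", value: 0 });                // true  — 0
```

Because the flag only ever flips the answer from "integer" to "not integer", nodes without it (including everything produced before the change) classify exactly as before.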

Correctness fixes

Both of these were silent bugs: the code compiled and ran, but produced wrong output:

matmul (benchmarks/matmul/chadscript.ts)

  • Before: Check: 44634215
  • After: Check: 44634424.32

The 512×512 matmul inner loop's sum += a[row*N+k] * b[k*N+col] was losing the .32 fractional tail 512 times per cell, accumulated error ~209.

nbody (benchmarks/nbody/chadscript.ts)

  • Before: Energy: 0 (both before and after the 25M-step simulation)
  • After: Energy: -0.169065129117806 (initial) / Energy: -0.169046651628287 (after 25M steps)

The energy() function starts with let e = 0.0 and accumulates kinetic + potential terms. Before this fix, e was typed i64 and every fadd result was truncated to 0 through fptosi. The benchmark was emitting zero for both the initial and final energy, effectively measuring nothing. The energy drift after the fix is ~0.00002, consistent with a 25M-step double-precision leapfrog integrator.

Measurements

Apple Silicon M-series, macOS ARM64, best of 5 runs, both sides built from the same worktree with identical linked libraries (vendor/bdwgc/libgc.a, c_bridges/*.o) — only the compiler source differs.

bench baseline fix delta
matmul 834 ms 108 ms -87%, 7.7× faster
fibonacci 792 ms 797 ms ~tie (+0.6%)
sieve 11 ms 12 ms ~tie
sorting 136 ms 135 ms ~tie
montecarlo 266 ms 266 ms tie
nbody 817 ms 819 ms ~tie
binarytrees 615 ms 614 ms tie

matmul is the only substantial speedup — the fptosi/sitofp round-trip removal unlocks LLVM's NEON vectorizer on the inner reduction. The other benchmarks are unchanged because their hot loops don't involve a function-scoped float accumulator.

Tests

  • npm test — 774/774 pass, 0 failures, 37 suites.
  • No test had to be updated. The narrowing change is strict-additive: literals that used to be classified integer are still classified integer (because they have no . / e / E), and only literals written with . or exponent are newly classified as float.

IR check

Pre-fix IR for the sumFloats loop:

%s.addr.0 = alloca i64
store i64 0, i64* %s.addr.0
...
%loaded   = load i64, i64* %s.addr.0
%as_d     = sitofp i64 %loaded to double
%updated  = fadd nnan ninf nsz arcp contract reassoc afn double %as_d, 0x3FB999999999999A
%back     = fptosi double %updated to i64
store i64 %back, i64* %s.addr.0

Post-fix:

%s.addr.0 = alloca double
store double 0.0, double* %s.addr.0
...
%loaded   = load double, double* %s.addr.0
%updated  = fadd nnan ninf nsz arcp contract reassoc afn double %loaded, 0x3FB999999999999A
store double %updated, double* %s.addr.0

No more round-trip through i64. The loop body is now a pure-double reduction, and LLVM's LoopVectorizer can reorder the accumulator (the fast-math reassoc flag is already present).

Risk

Low. The change is strict-additive at the classification level (new isFloat === true short-circuits the existing "is integer?" check, never widens integer to float). Parser changes stay self-hosting-compatible. 774/774 tests pass with no updates.

@github-actions (Contributor)

Benchmark Results (Linux x86-64)

Benchmark C ChadScript Go Node Bun Place
Binary Trees 1.357s 1.210s 2.742s 1.167s 0.981s 🥉
Cold Start 0.9ms 0.9ms 1.2ms 28.1ms 10.0ms 🥇
Fibonacci 0.814s 1.468s 1.563s 3.164s 1.992s 🥈
File I/O 0.118s 0.089s 0.086s 0.199s 0.172s 🥈
JSON Parse/Stringify 0.004s 0.005s 0.017s 0.016s 0.007s 🥈
Matrix Multiply 0.442s 0.992s 0.611s 0.356s 0.325s #5
Monte Carlo Pi 0.389s 0.410s 0.406s 2.250s 6.052s 🥉
N-Body Simulation 1.676s 2.122s 2.204s 2.404s 3.259s 🥈
Quicksort 0.215s 0.244s 0.213s 0.261s 0.226s #4
SQLite 0.349s 0.402s 0.449s 0.400s 🥉
Sieve of Eratosthenes 0.015s 0.028s 0.020s 0.040s 0.036s 🥉
String Manipulation 0.008s 0.046s 0.016s 0.037s 0.028s #5

CLI Tool Benchmarks

Benchmark ChadScript grep node xxd Place
Hex Dump 0.425s 0.917s 0.130s 🥈
Recursive Grep 0.018s 0.009s 0.095s 🥈

@cs01 cs01 merged commit 0404da0 into main Apr 13, 2026
13 checks passed
@cs01 cs01 deleted the fix/float-literal-narrowing branch April 13, 2026 15:24
@cs01 (Owner, Author) commented Apr 13, 2026

Correction on the performance numbers

I should have been clearer in the PR body: the 7.7× matmul speedup table was measured on Apple Silicon (M-series) with --target-cpu=native, not on the Linux x86-64 CI that drives the published dashboard.

The correctness fix is universal and lands on both architectures:

  • matmul Check: 44634215 → Check: 44634424.32 ✅ (Linux x86-64 CI confirms)
  • nbody Energy: 0, 0 → Energy: -0.169065..., -0.169046...

The performance improvement, however, is architecture-dependent:

  • Apple Silicon / --target-cpu=native: ~834ms → ~108ms (~7.7×). LLVM's NEON loop vectorizer splits the inner reduction into parallel accumulator lanes once the sitofp/fptosi round-trip is gone.
  • Linux x86-64 / --target-cpu=x86-64 (what CI uses): roughly unchanged at ~0.99s. The generic x86-64 target is baseline SSE2 only; LLVM's cost model at that level doesn't get the same vectorization lift, so the fix is "just" removing the per-iteration i64↔double round-trip — measurable but not the dramatic gain.

The CI's auto-posted "Benchmark Results (Linux x86-64)" comment on this PR showing matmul ~0.992s is therefore correct for what it's measuring (generic x86-64). The 7.7× number in my PR body was Apple-Silicon-specific and I should have labeled it as such.

Follow-up: to get the speedup on Linux x86-64 as well, the target CPU in .github/workflows/ci.yml:62 (--target-cpu=x86-64) probably wants to be x86-64-v3 (enables AVX2) or the Linux runner's actual CPU. That's a separate PR — wanted to land the correctness fix first.

Sorry for the misleading framing.
