
fix float-literal typing: 0.0 as double not integer, fixes matmul and nbody correctness#499

Merged
cs01 merged 1 commit into main from fix/float-literal-narrowing
Apr 13, 2026

Conversation

@cs01 (Owner) commented Apr 13, 2026

Summary

Numeric literals written with a decimal point or exponent (0.0, 1.5, 3e10) were being classified as integer literals by the narrowing analyzer. The check used value % 1 === 0, which returns true for mathematically-integer values regardless of source form — so 0.0 (value 0) was indistinguishable from 0.
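
The distinction the old check loses is easy to show in isolation. A minimal sketch of the value-based test (the function name is illustrative, not the analyzer's actual API):

```typescript
// Value-based check in the spirit of the old narrowing analyzer:
// it sees only the parsed number, so the source form is already gone.
function looksLikeInteger(value: number): boolean {
  return value % 1 === 0;
}

looksLikeInteger(0);   // true
looksLikeInteger(0.0); // true — 0.0 parses to the same value as 0
looksLikeInteger(1.5); // false
```

Once `0.0` and `0` reach this check, they are the identical double value `0`, so no value-based predicate can tell them apart; only the raw source text can.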

For a function-scoped let sum = 0.0, this caused sum to be allocated as i64 instead of double. The inner loop of a numeric accumulator then emitted:

%loaded  = load i64, i64* %sum
%as_d    = sitofp i64 %loaded to double
%updated = fadd double %as_d, %product
%back    = fptosi double %updated to i64  ; TRUNCATES fractional part
store i64 %back, i64* %sum

every iteration, which simultaneously (a) silently truncated the fractional part of every partial result and (b) blocked loop vectorization of the reduction (loop-carried dep through the fptosi).

Minimal repro

function sumFloats(): number {
  let s = 0.0;
  let i = 0;
  while (i < 10) { s = s + 0.1; i = i + 1; }
  return s;
}
console.log(sumFloats());

Before: 0
After: 1 (well, 0.9999999999999999, as expected from binary float representation of 0.1)
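
The post-fix value follows directly from IEEE-754 double arithmetic; the same accumulation in plain TypeScript, outside the compiler entirely, produces it:

```typescript
// Accumulate 0.1 ten times in IEEE-754 double precision.
// 0.1 has no exact binary representation, so the sum drifts slightly below 1.
function sumTenTenths(): number {
  let s = 0.0;
  for (let i = 0; i < 10; i++) {
    s += 0.1;
  }
  return s;
}

console.log(sumTenTenths()); // 0.9999999999999999
```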

Explicit let s: number = 0.0 was also incorrectly narrowed — the annotation was ignored.

Fix

Added an optional isFloat?: boolean field to NumberNode, set at parse time based on the presence of . / e / E in the literal's raw source text. The two parsers (TS-based and tree-sitter-based native) populate the flag; four narrowing/codegen sites gate integer classification on isFloat !== true.
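
A sketch of the parse-time detection, assuming the flag is derived from the literal's raw text exactly as described above (the helper name is illustrative):

```typescript
// Self-hosting-compatible detection: plain indexOf only —
// no regex, no Set, no for...of.
function rawLiteralIsFloat(raw: string): boolean {
  return raw.indexOf(".") !== -1 ||
         raw.indexOf("e") !== -1 ||
         raw.indexOf("E") !== -1;
}

rawLiteralIsFloat("0");    // false — still classified integer
rawLiteralIsFloat("0.0");  // true
rawLiteralIsFloat("3e10"); // true
```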

Files:

  • src/ast/types.ts — NumberNode.isFloat?: boolean
  • src/parser-ts/handlers/expressions.ts — transformNumericLiteral sets isFloat from node.getText()
  • src/parser-native/transformer.ts — new transformNumberNode helper, stays self-hosting-compatible (no Set, no for...of, no regex — plain indexOf on ., e, E)
  • src/codegen/infrastructure/integer-analysis.ts — isIntegerLiteral gates on isFloat !== true
  • src/codegen/llvm-generator.ts — i64 literal emission path gates on isFloat !== true
  • src/codegen/expressions/operators/binary.ts — isKnownInteger gates on isFloat !== true
  • src/codegen/expressions/literals.ts — generateNumber now takes an isFloat param
  • src/codegen/expressions/expression-dispatch.ts — threads isFloat through to generateNumber
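
Each gating site then short-circuits on the flag before falling back to the value check — a hypothetical sketch of the shape, not the repo's exact code:

```typescript
interface NumberNode {
  kind: "number";
  value: number;
  isFloat?: boolean; // set at parse time from the raw literal text
}

function isIntegerLiteral(node: NumberNode): boolean {
  // New gate: a literal written with '.' or an exponent is never an
  // integer, regardless of its mathematical value.
  if (node.isFloat === true) return false;
  // Pre-existing value-based check, unchanged for older call sites.
  return node.value % 1 === 0;
}

isIntegerLiteral({ kind: "number", value: 0, isFloat: true }); // false — 0.0
isIntegerLiteral({ kind: "number", value: 0 });                // true  — 0
```

Because the flag only ever flips the answer from "integer" to "not integer", nodes without it (including everything produced before the change) classify exactly as before.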

Correctness fixes

Both of these were silent bugs: the code compiled and ran, but produced wrong output:

matmul (benchmarks/matmul/chadscript.ts)

  • Before: Check: 44634215
  • After: Check: 44634424.32

The 512×512 matmul inner loop's sum += a[row*N+k] * b[k*N+col] was losing the .32 fractional tail 512 times per cell, accumulated error ~209.

nbody (benchmarks/nbody/chadscript.ts)

  • Before: Energy: 0 (both before and after the 25M-step simulation)
  • After: Energy: -0.169065129117806 (initial) / Energy: -0.169046651628287 (after 25M steps)

The energy() function starts with let e = 0.0 and accumulates kinetic + potential terms. Before this fix, e was typed i64 and every fadd result was truncated to 0 through fptosi. The benchmark was emitting zero for both the initial and final energy, effectively measuring nothing. The energy drift after the fix is ~0.00002, consistent with a 25M-step double-precision leapfrog integrator.

Measurements

Apple Silicon M-series, macOS ARM64, best of 5 runs, both sides built from the same worktree with identical linked libraries (vendor/bdwgc/libgc.a, c_bridges/*.o) — only the compiler source differs.

bench baseline fix delta
matmul 834 ms 108 ms -87%, 7.7× faster
fibonacci 792 ms 797 ms ~tie (+0.6%)
sieve 11 ms 12 ms ~tie
sorting 136 ms 135 ms ~tie
montecarlo 266 ms 266 ms tie
nbody 817 ms 819 ms ~tie
binarytrees 615 ms 614 ms tie

matmul is the only substantial speedup — the fptosi/sitofp round-trip removal unlocks LLVM's NEON vectorizer on the inner reduction. The other benchmarks are unchanged because their hot loops don't involve a function-scoped float accumulator.

Tests

  • npm test — 774/774 pass, 0 failures, 37 suites.
  • No test had to be updated. The narrowing change is strict-additive: literals that used to be classified integer are still classified integer (because they have no . / e / E), and only literals written with . or exponent are newly classified as float.

IR check

Pre-fix IR for the sumFloats loop:

%s.addr.0 = alloca i64
store i64 0, i64* %s.addr.0
...
%loaded   = load i64, i64* %s.addr.0
%as_d     = sitofp i64 %loaded to double
%updated  = fadd nnan ninf nsz arcp contract reassoc afn double %as_d, 0x3FB999999999999A
%back     = fptosi double %updated to i64
store i64 %back, i64* %s.addr.0

Post-fix:

%s.addr.0 = alloca double
store double 0.0, double* %s.addr.0
...
%loaded   = load double, double* %s.addr.0
%updated  = fadd nnan ninf nsz arcp contract reassoc afn double %loaded, 0x3FB999999999999A
store double %updated, double* %s.addr.0

No more round-trip through i64. The loop body is now a pure-double reduction, and LLVM's LoopVectorizer can reorder the accumulator (the fast-math reassoc flag is already present).

Risk

Low. The change is strict-additive at the classification level (new isFloat === true short-circuits the existing "is integer?" check, never widens integer to float). Parser changes stay self-hosting-compatible. 774/774 tests pass with no updates.

@github-actions (Contributor)

Benchmark Results (Linux x86-64)

Benchmark C ChadScript Go Node Bun Place
Binary Trees 1.357s 1.210s 2.742s 1.167s 0.981s 🥉
Cold Start 0.9ms 0.9ms 1.2ms 28.1ms 10.0ms 🥇
Fibonacci 0.814s 1.468s 1.563s 3.164s 1.992s 🥈
File I/O 0.118s 0.089s 0.086s 0.199s 0.172s 🥈
JSON Parse/Stringify 0.004s 0.005s 0.017s 0.016s 0.007s 🥈
Matrix Multiply 0.442s 0.992s 0.611s 0.356s 0.325s #5
Monte Carlo Pi 0.389s 0.410s 0.406s 2.250s 6.052s 🥉
N-Body Simulation 1.676s 2.122s 2.204s 2.404s 3.259s 🥈
Quicksort 0.215s 0.244s 0.213s 0.261s 0.226s #4
SQLite 0.349s 0.402s 0.449s 0.400s 🥉
Sieve of Eratosthenes 0.015s 0.028s 0.020s 0.040s 0.036s 🥉
String Manipulation 0.008s 0.046s 0.016s 0.037s 0.028s #5

CLI Tool Benchmarks

Benchmark ChadScript grep node xxd Place
Hex Dump 0.425s 0.917s 0.130s 🥈
Recursive Grep 0.018s 0.009s 0.095s 🥈

@cs01 cs01 merged commit 0404da0 into main Apr 13, 2026
13 checks passed
@cs01 cs01 deleted the fix/float-literal-narrowing branch April 13, 2026 15:24
@cs01 (Owner, Author) commented Apr 13, 2026

Correction on the performance numbers

I should have been clearer in the PR body: the 7.7× matmul speedup table was measured on Apple Silicon (M-series) with --target-cpu=native, not on the Linux x86-64 CI that drives the published dashboard.

The correctness fix is universal and lands on both architectures:

  • matmul Check: 44634215 → Check: 44634424.32 ✅ (Linux x86-64 CI confirms)
  • nbody Energy: 0, 0 → Energy: -0.169065..., -0.169046...

The performance improvement, however, is architecture-dependent:

  • Apple Silicon / --target-cpu=native: ~834ms → ~108ms (~7.7×). LLVM's NEON loop vectorizer splits the inner reduction into parallel accumulator lanes once the sitofp/fptosi round-trip is gone.
  • Linux x86-64 / --target-cpu=x86-64 (what CI uses): roughly unchanged at ~0.99s. The generic x86-64 target is baseline SSE2 only; LLVM's cost model at that level doesn't get the same vectorization lift, so the fix is "just" removing the per-iteration i64↔double round-trip — measurable but not the dramatic gain.

The CI's auto-posted "Benchmark Results (Linux x86-64)" comment on this PR showing matmul ~0.992s is therefore correct for what it's measuring (generic x86-64). The 7.7× number in my PR body was Apple-Silicon-specific and I should have labeled it as such.

Follow-up: to get the speedup on Linux x86-64 as well, the target CPU in .github/workflows/ci.yml:62 (--target-cpu=x86-64) probably wants to be x86-64-v3 (enables AVX2) or the Linux runner's actual CPU. That's a separate PR — wanted to land the correctness fix first.

Sorry for the misleading framing.
