perf: i64 ABI specialization for integer-pure numeric functions by cs01 · Pull Request #501 · cs01/ChadScript

cs01 · 2026-04-13T16:22:08Z

Summary

Adds a whole-module analysis pass that detects integer-pure numeric functions — functions where every parameter is integer-valued, every return value is integer-valued, every intermediate expression is integer-preserving, and the function is never used as a first-class value — and specializes them to an i64 ABI instead of the default double ABI.

The canonical motivating case is naive-recursive fib:

function fib(n: number): number {
  if (n <= 1) return n;
  return fib(n - 1) + fib(n - 2);
}

Before this PR, chad correctly narrowed the body operations to sub i64 / icmp sle i64 (the existing integer-analysis pass), but the function signature stayed define double @_cs_fib(double %arg0). Every recursive call paid fptosi double→i64 on entry + sitofp i64→double on arg prep, and the combine was fadd double %a, %b (3-cycle latency) instead of add i64 %a, %b (1-cycle). The whole recursion chain serialized on the fadd latency wall.

After this PR, fib compiles to:

define i64 @_cs_fib(i64 %arg0) {
entry:
  %0 = alloca i64
  store i64 %arg0, i64* %0
  %2 = load i64, i64* %0
  %4 = icmp sle i64 %2, 1
  ...
  %9 = sub i64 %7, 1
  %11 = call i64 @_cs_fib(i64 %9)
  %14 = sub i64 %12, 2
  %16 = call i64 @_cs_fib(i64 %14)
  %17 = add i64 %11, %16
  ret i64 %17
}

No float ops, no round-trip conversions, 1-cycle integer arithmetic the whole way down.

How it works

New file src/codegen/infrastructure/int-specialization-detector.ts (~540 lines) runs markIntSpecializedFunctions(ast) before codegen. A function is eligible iff:

All parameters are declared number (or untyped — chad defaults numeric params to number).
Return type is number or unspecified.
No optional params, no defaults, no async, no declare.
Body has no try / throw / await / for…of / switch (keeps the analysis tractable and avoids edge cases where value type could change inside a handler).
Body calls no other function except itself recursively. No method calls. No calls to stdlib. This is the conservative "leaf or self-recursive" restriction — a later PR can widen it to "calls other intSpecialized functions too."
Every param proves integer-valued via the existing findI64EligibleVariables analysis (started from an integer literal, mutated only through ++/-- or integer-preserving binary ops).
Every return statement produces an integer-shaped expression — integer literal, integer-eligible local read, or a binary op of integer-shaped operands where the op is +/-/*/%/bitwise. Division (/) is explicitly excluded because TS number division can produce non-integers.
The function name does not appear as a first-class value anywhere in the program (see "Escape analysis" below).

When all 8 conditions hold, func.intSpecialized = true is set, and the function-generator at src/codegen/infrastructure/function-generator.ts picks that up to emit the i64 ABI: every double param becomes i64, the return type becomes i64, and the entry-block fptosi %arg → i64 on numeric params is skipped (since the arg already arrives as i64).
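The return-shape rule (integer literal, integer-eligible local, or an integer-preserving binary op) can be sketched roughly as below. This is a minimal illustration, not the detector's code: the `Expr` node shapes, `INT_PRESERVING_OPS`, and `isIntegerShaped` are hypothetical names, and only the three shapes named in the list above are modeled.

```typescript
// Hypothetical, simplified AST. The real chad node shapes are not shown in this PR.
type Expr =
  | { kind: "number"; value: number }
  | { kind: "variable"; name: string }
  | { kind: "binary"; op: string; left: Expr; right: Expr };

// Operators that keep integer operands integer-valued. "/" is deliberately absent,
// because number division can produce a non-integer (7 / 2 is 3.5).
const INT_PRESERVING_OPS = new Set(["+", "-", "*", "%", "&", "|", "^", "<<", ">>"]);

// intLocals stands in for the result of an integer-eligibility analysis (assumed input).
function isIntegerShaped(expr: Expr, intLocals: Set<string>): boolean {
  switch (expr.kind) {
    case "number":
      return Number.isInteger(expr.value);
    case "variable":
      return intLocals.has(expr.name);
    case "binary":
      return (
        INT_PRESERVING_OPS.has(expr.op) &&
        isIntegerShaped(expr.left, intLocals) &&
        isIntegerShaped(expr.right, intLocals)
      );
  }
}

const n: Expr = { kind: "variable", name: "n" };
const one: Expr = { kind: "number", value: 1 };
const sub: Expr = { kind: "binary", op: "-", left: n, right: one };
const div: Expr = { kind: "binary", op: "/", left: n, right: one };
const locals = new Set(["n"]);

console.log(isIntegerShaped(sub, locals)); // true
console.log(isIntegerShaped(div, locals)); // false
```

Rejecting `/` outright, even when both operands are integers, trades some coverage for a check that never has to reason about divisibility.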

Call-site lowering was updated at src/codegen/expressions/calls.ts so that when the callee is intSpecialized, numeric args are passed as i64 directly rather than being sitofp'd to double at the call site. This closes the loop: calls to intSpecialized functions use the i64 ABI end-to-end.
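A rough sketch of the call-site decision, emitting LLVM IR as text. The `Callee` and `Value` shapes and the `lowerCall` helper are hypothetical and do not reflect the actual code in calls.ts. The sketch assumes an argument is converted only when its type differs from the callee's ABI type.

```typescript
// Hypothetical shapes, for illustration only.
interface Callee { name: string; intSpecialized: boolean }
interface Value { reg: string; type: "i64" | "double" }

// Returns the IR lines for one call: any needed conversions, then the call itself.
function lowerCall(callee: Callee, args: Value[], result: string): string[] {
  const abi = callee.intSpecialized ? "i64" : "double";
  const lines: string[] = [];
  const operands = args.map((arg, i) => {
    if (arg.type === abi) return `${abi} ${arg.reg}`; // already in ABI type, pass directly
    const tmp = `${result}.arg${i}`;
    const conv = abi === "double" ? "sitofp" : "fptosi";
    lines.push(`${tmp} = ${conv} ${arg.type} ${arg.reg} to ${abi}`);
    return `${abi} ${tmp}`;
  });
  lines.push(`${result} = call ${abi} @${callee.name}(${operands.join(", ")})`);
  return lines;
}

const arg: Value = { reg: "%9", type: "i64" };
const specialized = lowerCall({ name: "_cs_fib", intSpecialized: true }, [arg], "%11");
const canonical = lowerCall({ name: "_cs_fib", intSpecialized: false }, [arg], "%11");

console.log(specialized.join("\n"));
// %11 = call i64 @_cs_fib(i64 %9)
console.log(canonical.join("\n"));
// %11.arg0 = sitofp i64 %9 to double
// %11 = call double @_cs_fib(double %11.arg0)
```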

Escape analysis (the critical correctness guard)

The first draft of this pass (without escape analysis) silently miscompiled programs like:

function add(acc: number, x: number): number { return acc + x; }
const sum = [1, 2, 3, 4, 5].reduce(add, 0);  // → returned garbage

add looks eligible by every local criterion (pure integer, no foreign calls, integer-shaped return), so the detector marked it specialized. But reduce(add, 0) passes add as a callback — the runtime reduce stored it as a function pointer with the canonical double(double, double) signature and called it with double arguments. Those double bits got reinterpreted as i64 when the specialized add body read them, producing garbage.
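To see why the reinterpretation yields garbage, it helps to look at the raw bits of a double. The snippet below is an illustration in plain TypeScript, independent of chad; `doubleBitsHex` is a helper written for this example.

```typescript
// Return the 64 raw bits of a double as a 16-digit hex string.
function doubleBitsHex(x: number): string {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x, false); // big-endian, so the words read high-to-low
  const hi = view.getUint32(0, false);
  const lo = view.getUint32(4, false);
  const pad = (w: number): string => ("00000000" + w.toString(16)).slice(-8);
  return pad(hi) + pad(lo);
}

console.log(doubleBitsHex(1)); // 3ff0000000000000
console.log(doubleBitsHex(2)); // 4000000000000000
```

Read as an i64, the bit pattern of 1.0 is 4607182418800017408 rather than 1, so an i64 body that receives double arguments adds numbers unrelated to the intended operands.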

The fix: a whole-AST walker (collectEscapedFunctionNames) that scans every expression position in every function body, every class method body, every top-level statement, and every top-level expression, collecting all VariableNode names. Any top-level function whose name appears in that set is "escaped" and gets excluded from specialization. The walker also treats MethodCallNode.method as an escape reference — chad's method-call lowering falls back to calling a top-level function when the receiver has no matching class method, so obj.add(5, 7) (where obj is an object literal and add is a top-level function) goes through the canonical double ABI and must not be specialized.

The walker handles every expression type in chad's AST: variable, call, method_call, new, binary, unary, member_access, index_access, array, object, map, set, template_literal, conditional, await, member_access_assignment, index_access_assignment, type_assertion, spread_element, arrow_function (both expression-body and block-body shapes). Statement walker handles variable_declaration, assignment, return, if, while, do_while, for, for_of, throw, try, switch, block, and expression-as-statement.

The analysis is conservative — a local variable named the same as a top-level function triggers a false positive (the function won't be specialized even though the local shadows it). That's fine — specialization is an optimization, losing it on a corner case is strictly safer than miscompiling.
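The walker's core idea can be sketched on a toy AST. The node shapes and the `collectEscapes` name below are hypothetical, and the sketch assumes the callee of a direct call is stored as a plain name rather than as a variable node, which is why a direct call like fib(n - 1) does not count as an escape.

```typescript
// Hypothetical node shapes, covering only a handful of expression kinds.
type Expr =
  | { kind: "variable"; name: string }
  | { kind: "number"; value: number }
  | { kind: "call"; callee: string; args: Expr[] }
  | { kind: "method_call"; object: Expr; method: string; args: Expr[] }
  | { kind: "binary"; left: Expr; right: Expr };

function collectEscapes(expr: Expr, out: Set<string>): void {
  switch (expr.kind) {
    case "variable":
      out.add(expr.name); // any bare name counts, even a shadowing local
      break;
    case "number":
      break;
    case "call":
      // The callee in call position is not an escape, but its arguments may be.
      expr.args.forEach((a) => collectEscapes(a, out));
      break;
    case "method_call":
      out.add(expr.method); // dispatch may fall back to a top-level function
      collectEscapes(expr.object, out);
      expr.args.forEach((a) => collectEscapes(a, out));
      break;
    case "binary":
      collectEscapes(expr.left, out);
      collectEscapes(expr.right, out);
      break;
  }
}

// nums.reduce(add, 0)
const reduceCall: Expr = {
  kind: "method_call",
  object: { kind: "variable", name: "nums" },
  method: "reduce",
  args: [{ kind: "variable", name: "add" }, { kind: "number", value: 0 }],
};
// fib(n - 1), with the operator omitted for brevity
const fibCall: Expr = {
  kind: "call",
  callee: "fib",
  args: [{ kind: "binary", left: { kind: "variable", name: "n" }, right: { kind: "number", value: 1 } }],
};

const escaped = new Set<string>();
collectEscapes(reduceCall, escaped);
collectEscapes(fibCall, escaped);

console.log(escaped.has("add")); // true
console.log(escaped.has("fib")); // false
```

Because the set holds bare names with no scope information, a local named `add` would also mark the top-level `add` as escaped, which is the false positive described above.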

Correctness verification

Repro for the original callback bug (would miscompile pre-escape-analysis):

function add(acc: number, x: number): number { return acc + x; }
const nums: number[] = [1, 2, 3, 4, 5];
const sum = nums.reduce(add, 0);
console.log(sum);  // Must print 15

Confirmed: prints 15 after this PR (ran the existing tests/fixtures/arrays/array-reduce.ts fixture which exercises this exact shape through TEST_PASSED).

Repro for the method-call dispatch bug (fix/object-method in commit):

function add(a, b) { return a + b; }
function testMethod() {
  const obj = { add: 0 };
  return obj.add(5, 7);  // chad falls back to top-level `add`
}
process.exit(testMethod());  // Must exit 12

Confirmed: exits 12 after this PR.

Generated IR for fib:

define i64 @_cs_fib(i64 %arg0) { ... }   ; was: define double @_cs_fib(double %arg0)

Entry no longer has an fptosi on the param. Combine is add i64, not fadd double.

Generated IR for add when used as callback:

define double @_cs_add(double %arg0, double %arg1) { ... }   ; unchanged, escape caught

Measurements

Apple Silicon M-series, macOS ARM64, best of 3, chad built from this branch (rebased onto current origin/main which includes #499).

bench	baseline	this PR	delta
fibonacci	792 ms	509 ms	-36%, 1.56× faster
matmul	112 ms	111 ms	tie (within noise)
sieve	12 ms	12 ms	tie
sorting	140 ms	140 ms	tie
montecarlo	266 ms	265 ms	tie
nbody	827 ms	827 ms	tie
binarytrees	620 ms	620 ms	tie

Only fibonacci moves because it's the only benchmark with an integer-pure recursive numeric function that meets all 8 eligibility criteria. All other benchmarks are unchanged, including ones with integer-heavy code (sieve, sorting, montecarlo) — their hot-path functions either escape (passed as callbacks), call stdlib methods, or have non-integer intermediate expressions, so the detector correctly leaves them on the double ABI.
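For reference, the delta and speedup columns follow from the two timings as in the small helper below (`compare` is a name made up for this example).

```typescript
// Percentage change in run time and speedup factor, rounded as in the table.
function compare(baselineMs: number, newMs: number): { deltaPct: number; speedup: number } {
  return {
    deltaPct: Math.round(((newMs - baselineMs) / baselineMs) * 100),
    speedup: Math.round((baselineMs / newMs) * 100) / 100,
  };
}

const fibResult = compare(792, 509);
console.log(fibResult.deltaPct); // -36
console.log(fibResult.speedup); // 1.56
```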

Architecture-independence: the fix replaces fadd double (2-3 cycle latency on both arm64 and x86-64) with add i64 (1 cycle on both), plus removes the per-call fptosi/sitofp conversions. There's no NEON/AVX2 dependence, so the same ~35-50% improvement should land on Linux x86-64 CI as well — I'll be watching the auto-posted benchmark comment to confirm.

Comparison against C and Go on Apple Silicon

After both this PR and #499 (float-literal narrowing), chad's numbers against C (clang -O2 -march=native) and Go (1.26, native):

bench	C (ms)	chad (ms)	Go (ms)	chad vs C	chad vs Go
binarytrees	848	620	814	0.73	24% faster
fibonacci	433	509	579	1.17	12% faster
montecarlo	266	265	256	1.00	tied
nbody	777	827	788	1.06	tied (+5%)
matmul	102	111	103	1.09	tied (+8%)
sorting	122	140	125	1.15	tied (+12%)
sieve	8	12	11	1.51	tied

Chad now beats Go on fibonacci and binarytrees, ties C on montecarlo, and is within 10% of C on matmul and nbody. The fib win over Go is what this PR unlocks.

Tests

npm test — 774/774 pass, 0 failures, 37 suites, 90.9s duration.
Updated one assertion in tests/compiler.test.ts:152 that was checking for define double @_cs_add to also accept define i64 @_cs_add — the test was written pre-specialization and the fixture simple-add.js (function add(a, b) { return a + b; } + process.exit(add(5, 7))) is exactly the shape that gets specialized. The test author had already defensively handled the i64 case for the add instruction assertion later in the same test, just missed this one.

No other test changes. Every fixture that produces TEST_PASSED or a specific exit code still does.

Scope / follow-ups

Not in this PR:

Specialization across modules (requires cross-module signature propagation).
Functions that call other intSpecialized functions (currently rejected as "foreign call"). This would widen coverage meaningfully once added.
Class methods (only top-level functions are detected).
Functions with for…of / switch / try (rejected by the body-shape gate). Most of these are real limitations that can be relaxed later with more careful analysis.
Mixed-mode specialization: a function that sometimes returns an integer and sometimes a float (currently rejected).

These are all strict widenings of the current pass that won't affect already-specialized functions, and can be done in follow-up PRs driven by specific use cases.

Risk: the primary risk is undetected escapes. The walker handles all 21 expression types and 14 statement types in the AST; if a new AST node is added and the walker isn't updated, the new node's children won't be scanned for escapes and a function could be wrongly specialized. Mitigation: the walker's fallback for expression-as-statement dispatches to collectEscapedVarRefsExpr, so new expression types will at least get the full expression walker. New statement types need to be added to collectEscapedVarRefsStmts explicitly.
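One common TypeScript technique for catching this kind of drift at compile time is an exhaustiveness check with `never`. The PR does not say whether the walker uses it; the sketch below, with hypothetical `Stmt` shapes and helper names, only shows how such a guard could look.

```typescript
// Hypothetical statement shapes. Names are stored as plain strings for brevity.
type Stmt =
  | { kind: "return"; value: string }
  | { kind: "block"; body: Stmt[] }
  | { kind: "expression"; value: string };

// Only reachable if a node kind slips past the switch below.
function assertNever(node: never): never {
  throw new Error("unhandled node: " + JSON.stringify(node));
}

function collectNames(stmt: Stmt, out: string[]): void {
  switch (stmt.kind) {
    case "return":
    case "expression":
      out.push(stmt.value);
      break;
    case "block":
      stmt.body.forEach((s) => collectNames(s, out));
      break;
    default:
      // If a new Stmt kind is added without a case above, `stmt` is no longer
      // `never` here and the compiler rejects this call.
      assertNever(stmt);
  }
}

const names: string[] = [];
collectNames(
  { kind: "block", body: [{ kind: "expression", value: "add" }, { kind: "return", value: "x" }] },
  names,
);
console.log(names.join(",")); // add,x
```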


github-actions · 2026-04-13T16:25:35Z

Benchmark Results (Linux x86-64)

Benchmark	C	ChadScript	Go	Node	Bun	Place
Binary Trees	1.575s	1.267s	2.763s	1.189s	0.969s	🥉
Cold Start	1.0ms	0.8ms	1.2ms	28.7ms	9.8ms	🥇
Fibonacci	0.815s	0.815s	1.562s	3.203s	2.018s	🥇
File I/O	0.118s	0.092s	0.084s	0.199s	0.181s	🥈
JSON Parse/Stringify	0.004s	0.005s	0.018s	0.015s	0.007s	🥈
Matrix Multiply	0.449s	0.999s	0.638s	0.379s	0.335s	#5
Monte Carlo Pi	0.389s	0.410s	0.405s	2.248s	6.068s	🥉
N-Body Simulation	1.668s	2.126s	2.206s	2.391s	3.265s	🥈
Quicksort	0.215s	0.245s	0.213s	0.262s	0.228s	#4
SQLite	0.348s	0.400s	—	0.444s	0.400s	🥈
Sieve of Eratosthenes	0.016s	0.029s	0.018s	0.040s	0.037s	🥉
String Manipulation	0.008s	0.046s	0.016s	0.035s	0.028s	#5

CLI Tool Benchmarks

Benchmark	ChadScript	grep	node	xxd	Place
Hex Dump	0.437s	—	0.995s	0.134s	🥈
Recursive Grep	0.019s	0.010s	0.097s	—	🥈

perf: i64 abi specialization for integer-pure numeric functions, 1.5x…

2d41577

… fib

cs01 merged commit eb1b25f into main Apr 13, 2026
13 checks passed

cs01 deleted the feat/int-specialization branch April 13, 2026 16:48

cs01 mentioned this pull request Apr 13, 2026

perf: benchmark infrastructure — N=10 sampling, 95% bootstrap CI, Apple Silicon baseline #503

Merged
