perf: i64 ABI specialization for integer-pure numeric functions #501
Merged
Summary
Adds a whole-module analysis pass that detects integer-pure numeric functions — functions where every parameter is integer-valued, every return value is integer-valued, every intermediate expression is integer-preserving, and the function is never used as a first-class value — and specializes them to an i64 ABI instead of the default double ABI.
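The local half of that test (integer-typed params and return, integer-preserving operators, no `/`) can be sketched over a hypothetical toy AST. The node shapes, `isLocallyIntPure`, and the rule that accepts calls only when they are direct self-recursion are illustrative assumptions, not chad's actual detector. Escape analysis is left out entirely.

```typescript
// Hypothetical toy AST; chad's real node types and field names are not reproduced here.
type Expr =
  | { kind: "number"; value: number }
  | { kind: "variable"; name: string }
  | { kind: "binary"; op: string; left: Expr; right: Expr }
  | { kind: "call"; callee: string; args: Expr[] };

interface FnDecl {
  name: string;
  params: { name: string; type?: string }[];
  returnType?: string;
  isAsync: boolean;
  returns: Expr[]; // every `return` expression in the body, pre-collected
}

// Integer-preserving operators. "/" is deliberately absent: 7 / 2 === 3.5.
const INT_OPS = new Set(["+", "-", "*", "%", "&", "|", "^", "<<", ">>"]);

function isIntExpr(e: Expr, intVars: Set<string>, self: string): boolean {
  switch (e.kind) {
    case "number":
      return Number.isInteger(e.value);
    case "variable":
      return intVars.has(e.name);
    case "binary":
      return INT_OPS.has(e.op) && isIntExpr(e.left, intVars, self) && isIntExpr(e.right, intVars, self);
    default:
      // "call": simplification, accept only direct self-recursion with integer args.
      return e.callee === self && e.args.every((a) => isIntExpr(a, intVars, self));
  }
}

function isLocallyIntPure(fn: FnDecl): boolean {
  if (fn.isAsync) return false;
  // Params must be `number` or untyped (untyped numeric params default to `number`).
  if (!fn.params.every((p) => p.type === undefined || p.type === "number")) return false;
  if (fn.returnType !== undefined && fn.returnType !== "number") return false;
  const intVars = new Set(fn.params.map((p) => p.name));
  return fn.returns.every((r) => isIntExpr(r, intVars, fn.name));
}

const n: Expr = { kind: "variable", name: "n" };
const one: Expr = { kind: "number", value: 1 };
const two: Expr = { kind: "number", value: 2 };

// fib(n) { if (n <= 1) return n; return fib(n - 1) + fib(n - 2); }
const fib: FnDecl = {
  name: "fib",
  params: [{ name: "n", type: "number" }],
  returnType: "number",
  isAsync: false,
  returns: [
    n,
    {
      kind: "binary",
      op: "+",
      left: { kind: "call", callee: "fib", args: [{ kind: "binary", op: "-", left: n, right: one }] },
      right: { kind: "call", callee: "fib", args: [{ kind: "binary", op: "-", left: n, right: two }] },
    },
  ],
};

// half(n) { return n / 2; }  (rejected because of "/")
const half: FnDecl = {
  name: "half",
  params: [{ name: "n" }],
  isAsync: false,
  returns: [{ kind: "binary", op: "/", left: n, right: two }],
};

console.log(isLocallyIntPure(fib)); // true
console.log(isLocallyIntPure(half)); // false
```

`fib` passes because every returned expression is built from `+`, `-`, integer literals, params, and self-calls. `half` fails on the operator alone.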
The canonical motivating case is naive-recursive fib. Before this PR, chad correctly narrowed the body operations to `sub i64`/`icmp sle i64` (the existing integer-analysis pass), but the function signature stayed `define double @_cs_fib(double %arg0)`. Every recursive call paid `fptosi double→i64` on entry plus `sitofp i64→double` on argument prep, and the combine was `fadd double %a, %b` (3-cycle latency) instead of `add i64 %a, %b` (1-cycle). The whole recursion chain serialized on the `fadd` latency wall.

After this PR, fib compiles with no float ops, no round-trip conversions, and 1-cycle integer arithmetic the whole way down.
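For reference, the benchmark shape in question is the textbook doubly recursive function. The sketch below is an assumed equivalent, not chad's actual benchmark source. Presumably `n <= 1` is what lowers to `icmp sle` and `n - 1`/`n - 2` to `sub`, leaving the `+` of the two recursive results as the combine step:

```typescript
// Naive doubly recursive Fibonacci. All values stay integral, which is what
// makes the function a candidate for the i64 ABI in the first place.
function fib(n: number): number {
  if (n <= 1) return n; // base case: the comparison step
  return fib(n - 1) + fib(n - 2); // two subtractions, then the combining add
}

console.log(fib(30)); // 832040
```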
How it works
New file `src/codegen/infrastructure/int-specialization-detector.ts` (~540 lines) runs `markIntSpecializedFunctions(ast)` before codegen. A function is eligible iff:

- Every parameter is `number` (or untyped; chad defaults numeric params to `number`).
- The return type is `number` or unspecified.
- The function is not `async` and not a `declare`.
- The body contains no `try`/`throw`/`await`/`for…of`/`switch` (this keeps the analysis tractable and avoids edge cases where a value's type could change inside a handler).
- Every local variable passes the existing `findI64EligibleVariables` analysis (started from an integer literal, mutated only through `++`/`--` or integer-preserving binary ops).
- Every arithmetic operator is `+`/`-`/`*`/`%`/bitwise. Division (`/`) is explicitly excluded because TS `number` division can produce non-integers.

When all 8 conditions hold, `func.intSpecialized = true` is set, and the function generator at `src/codegen/infrastructure/function-generator.ts` picks that up to emit the i64 ABI: every `double` param becomes `i64`, the return type becomes `i64`, and the entry-block `fptosi %arg → i64` on numeric params is skipped (since the arg already arrives as i64).

Call-site lowering was updated at `src/codegen/expressions/calls.ts` so that when the callee is intSpecialized, numeric args are passed as i64 directly rather than fptosi'd at the call site. This closes the loop: calls to intSpecialized functions use the i64 ABI end-to-end.

Escape analysis (the critical correctness guard)
The first draft of this pass (without escape analysis) silently miscompiled programs that pass an otherwise-eligible function as a callback, such as `reduce(add, 0)`.
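A program of that shape, as a sketch (the array contents are illustrative, not taken from the PR's fixture). Under plain JavaScript semantics it prints 15:

```typescript
// Passes every local check: number params, number return, only `+`.
function add(a: number, b: number): number {
  return a + b;
}

// But here `add` escapes: `reduce` receives it as a first-class value rather
// than a direct call, so (per the PR description) chad's runtime invokes it
// through the canonical double(double, double) signature.
const total = [1, 2, 3, 4, 5].reduce(add, 0);
console.log(total); // 15
```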
`add` looks eligible by every local criterion (pure integer, no foreign calls, integer-shaped return), so the detector marked it specialized. But `reduce(add, 0)` passes `add` as a callback: the runtime `reduce` stored it as a function pointer with the canonical `double(double, double)` signature and called it with double arguments. Those double bits got reinterpreted as i64 when the specialized `add` body read them, producing garbage.

The fix: a whole-AST walker (`collectEscapedFunctionNames`) that scans every expression position in every function body, every class method body, every top-level statement, and every top-level expression, collecting all `VariableNode` names. Any top-level function whose name appears in that set is "escaped" and is excluded from specialization. The walker also treats `MethodCallNode.method` as an escape reference: chad's method-call lowering falls back to calling a top-level function when the receiver has no matching class method, so `obj.add(5, 7)` (where `obj` is an object literal and `add` is a top-level function) goes through the canonical double ABI and must not be specialized.

The walker handles every expression type in chad's AST: `variable`, `call`, `method_call`, `new`, `binary`, `unary`, `member_access`, `index_access`, `array`, `object`, `map`, `set`, `template_literal`, `conditional`, `await`, `member_access_assignment`, `index_access_assignment`, `type_assertion`, `spread_element`, and `arrow_function` (both expression-body and block-body shapes). The statement walker handles `variable_declaration`, `assignment`, `return`, `if`, `while`, `do_while`, `for`, `for_of`, `throw`, `try`, `switch`, `block`, and expression-as-statement.

The analysis is conservative: a local variable with the same name as a top-level function triggers a false positive (the function won't be specialized even though the local shadows it). That's fine, since specialization is an optimization, and losing it on a corner case is strictly safer than miscompiling.
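The escape rule can be sketched as follows. Everything here is a hypothetical reduction: the node shapes and the helpers `collectExpr`, `collectStmts`, and `escapedFunctionNames` only loosely mirror `collectEscapedVarRefsExpr`, `collectEscapedVarRefsStmts`, and `collectEscapedFunctionNames`, and they cover a handful of node kinds instead of the full set. One assumption matters for correctness: a direct call stores its callee as a plain name rather than a `variable` node, so `fib(10)` does not count as an escape, while a bare `add` passed as an argument does.

```typescript
// Hypothetical, heavily reduced AST. A direct call keeps its callee as a plain
// string, so calling a function by name is not itself an escape.
type Expr =
  | { kind: "number"; value: number }
  | { kind: "variable"; name: string }
  | { kind: "call"; callee: string; args: Expr[] }
  | { kind: "method_call"; object: Expr; method: string; args: Expr[] }
  | { kind: "binary"; op: string; left: Expr; right: Expr }
  | { kind: "arrow_function"; body: Expr | Stmt[] };

type Stmt =
  | { kind: "expression"; expr: Expr }
  | { kind: "return"; value?: Expr }
  | { kind: "if"; cond: Expr; then: Stmt[]; otherwise?: Stmt[] };

function collectExpr(e: Expr, out: Set<string>): void {
  switch (e.kind) {
    case "number":
      return;
    case "variable":
      // A name used as a value (e.g. passed as a callback) may escape.
      out.add(e.name);
      return;
    case "call":
      // Direct call by name: only the arguments are scanned.
      e.args.forEach((a) => collectExpr(a, out));
      return;
    case "method_call":
      // obj.m(...) may fall back to a top-level function `m`, so `m` escapes too.
      out.add(e.method);
      collectExpr(e.object, out);
      e.args.forEach((a) => collectExpr(a, out));
      return;
    case "binary":
      collectExpr(e.left, out);
      collectExpr(e.right, out);
      return;
    case "arrow_function":
      // Block-bodied and expression-bodied arrows are both walked.
      if (Array.isArray(e.body)) collectStmts(e.body, out);
      else collectExpr(e.body, out);
      return;
  }
}

function collectStmts(stmts: Stmt[], out: Set<string>): void {
  for (const s of stmts) {
    switch (s.kind) {
      case "expression":
        collectExpr(s.expr, out);
        break;
      case "return":
        if (s.value) collectExpr(s.value, out);
        break;
      case "if":
        collectExpr(s.cond, out);
        collectStmts(s.then, out);
        if (s.otherwise) collectStmts(s.otherwise, out);
        break;
    }
  }
}

// A top-level function is escaped if its name shows up in any scanned body.
// Conservative: a local variable that merely shares the name also matches.
function escapedFunctionNames(topLevel: string[], bodies: Stmt[][]): Set<string> {
  const refs = new Set<string>();
  bodies.forEach((body) => collectStmts(body, refs));
  return new Set(topLevel.filter((name) => refs.has(name)));
}

const num = (value: number): Expr => ({ kind: "number", value });
const ref = (name: string): Expr => ({ kind: "variable", name });

const program: Stmt[] = [
  // arr.reduce(add, 0): `add` appears as a value, so it escapes
  { kind: "expression", expr: { kind: "method_call", object: ref("arr"), method: "reduce", args: [ref("add"), num(0)] } },
  // fib(10): direct call, no escape
  { kind: "expression", expr: { kind: "call", callee: "fib", args: [num(10)] } },
  // obj.mul(5, 7): the method name may resolve to top-level `mul`, so it escapes
  { kind: "expression", expr: { kind: "method_call", object: ref("obj"), method: "mul", args: [num(5), num(7)] } },
];

const escaped = escapedFunctionNames(["add", "fib", "mul"], [program]);
console.log(Array.from(escaped).sort()); // [ 'add', 'mul' ]
```

With those rules, `fib` stays eligible, `add` escapes through the `reduce` argument, and `mul` escapes through the method-call fallback.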
Correctness verification
Repro for the original callback bug (would miscompile pre-escape-analysis): an otherwise-eligible `add` passed as a callback, as in `reduce(add, 0)`.
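Confirmed: prints 15 after this PR (ran the existing `tests/fixtures/arrays/array-reduce.ts` fixture, which exercises this exact shape through `TEST_PASSED`).

Repro for the method-call dispatch bug (fix/object-method in commit): `obj.add(5, 7)`, where `obj` is an object literal with no `add` method and `add` is a top-level function.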
Confirmed: exits 12 after this PR.
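Generated IR for fib: the entry block no longer has an `fptosi` on the param, and the combine is `add i64`, not `fadd double`.

Generated IR for `add` when used as a callback: it stays on the canonical `double` ABI, because escape analysis excludes it from specialization.

Measurements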
Apple Silicon M-series, macOS ARM64, best of 3, chad built from this branch (rebased onto current `origin/main`, which includes #499).

Only fibonacci moves, because it's the only benchmark with an integer-pure recursive numeric function that meets all 8 eligibility criteria. All other benchmarks are unchanged, including ones with integer-heavy code (sieve, sorting, montecarlo): their hot-path functions either escape (passed as callbacks), call stdlib methods, or have non-integer intermediate expressions, so the detector correctly leaves them on the double ABI.
Architecture independence: the fix replaces `fadd double` (2-3 cycle latency on both arm64 and x86-64) with `add i64` (1 cycle on both) and removes the per-call `fptosi`/`sitofp` conversions. There's no NEON/AVX2 dependence, so the same ~35-50% improvement should land on Linux x86-64 CI as well; I'll be watching the auto-posted benchmark comment to confirm.

Comparison against C and Go on Apple Silicon
After both this PR and #499 (float-literal narrowing), chad was compared against C (`clang -O2 -march=native`) and Go (1.26, native).

Chad now beats Go on fibonacci and binarytrees, ties C on montecarlo, and is within 10% of C on matmul and nbody. The fib win over Go is what this PR unlocks.
Tests
- `npm test`: 774/774 pass, 0 failures, 37 suites, 90.9s duration.
- Updated one assertion at `tests/compiler.test.ts:152` that was checking for `define double @_cs_add` so that it also accepts `define i64 @_cs_add`. The test was written pre-specialization, and the fixture `simple-add.js` (`function add(a, b) { return a + b; }` + `process.exit(add(5, 7))`) is exactly the shape that gets specialized. The test author had already defensively handled the i64 case for the `add` instruction assertion later in the same test and just missed this one.

No other test changes. Every fixture that produces `TEST_PASSED` or a specific exit code still does.

Scope / follow-ups
Not in this PR:
- Functions whose bodies use `for…of`/`switch`/`try` (rejected by the body-shape gate). Most of these are real limitations that can be relaxed later with more careful analysis.

These are all strict widenings of the current pass that won't affect already-specialized functions, and they can be done in follow-up PRs driven by specific use cases.
Risk: the primary risk is undetected escapes. The walker handles all 21 expression types and 14 statement types in the AST; if a new AST node is added and the walker isn't updated, the new node's children won't be scanned for escapes and a function could be wrongly specialized. Mitigation: the walker's fallback for expression-as-statement dispatches to `collectEscapedVarRefsExpr`, so new expression types will at least get the full expression walker. New statement types need to be added to `collectEscapedVarRefsStmts` explicitly.