
infer: hoist per-module visibility + aliasCache (~19% faster compile)#2849

Merged
borisbat merged 3 commits into master from bbatkin/infer-overload-hoist-aliascache
May 24, 2026

@borisbat
Collaborator

Summary

Two independent infer-phase perf wins, identified via PerfView CPU sampling on a slow compile case (dasImgui imgui_demo/main.das, 13621 functions, ~30s compile-only).

Commit 1 — hoist per-module visibility out of overload resolution inner loops

In findFuncAddr / findMatchingFunctions / findMatchingFunctionsAndGenerics, the per-candidate isVisibleFunc(inWhichModule, getFunctionVisModule(pFn)) check was called once per function in mod->functionsByName[h]. Since pFn->module == mod by construction (Module::addFunction sets it), for non-generic candidates the visibility module is constant across the inner loop and the check is loop-invariant.

Hoist modVis = isVisibleFunc(inWhichModule, mod) and modVisFromThis = thisModule->isVisibleDirectly(mod) to the top of each non-empty bucket; per-candidate check now only fires for fromGeneric candidates (whose visibility module is pFn->getOrigin()->module).

5 call sites patched in ast_infer_type_function.cpp (findFuncAddr, two findMatchingFunctions overloads, two findMatchingFunctionsAndGenerics overloads).
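
The hoist described above can be illustrated with a small stand-alone sketch. It is not the code from ast_infer_type_function.cpp: `Module`, `Function`, `isVisibleSketch`, `findMatchingSketch` and the `exported` flag are hypothetical, simplified stand-ins, and the counter only exists to make the saved calls observable.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Module;

// Hypothetical stand-in for a function record.
struct Function {
    std::string name;
    Module* module = nullptr;        // owning module, set by addFunction
    Module* originModule = nullptr;  // non-null only for generic instances
    bool fromGeneric() const { return originModule != nullptr; }
};

// Hypothetical stand-in for a module with a by-name function table.
struct Module {
    bool exported = true;  // placeholder for the real visibility rules
    std::unordered_map<std::string, std::vector<Function*>> functionsByName;
    void addFunction(Function* fn) {
        fn->module = this;  // every function in a bucket shares this module
        functionsByName[fn->name].push_back(fn);
    }
};

static int g_visibilityChecks = 0;  // counts calls to the "expensive" check

bool isVisibleSketch(const Module& /*inWhichModule*/, const Module& target) {
    ++g_visibilityChecks;
    return target.exported;
}

std::vector<Function*> findMatchingSketch(const Module& inWhichModule,
                                          Module& mod,
                                          const std::string& name) {
    std::vector<Function*> result;
    auto it = mod.functionsByName.find(name);
    if (it == mod.functionsByName.end() || it->second.empty()) return result;
    // Hoisted: computed once per non-empty bucket instead of once per candidate.
    const bool modVis = isVisibleSketch(inWhichModule, mod);
    for (Function* fn : it->second) {
        // Only generic instances need their own check (different module).
        const bool visible = fn->fromGeneric()
            ? isVisibleSketch(inWhichModule, *fn->originModule)
            : modVis;
        if (visible) result.push_back(fn);
    }
    return result;
}

// Three plain candidates plus one generic instance from a hidden module.
int hoistDemo() {
    g_visibilityChecks = 0;
    Module caller, lib, hiddenOrigin;
    hiddenOrigin.exported = false;
    std::vector<Function> fns(4);
    for (auto& f : fns) f.name = "foo";
    fns[3].originModule = &hiddenOrigin;
    for (auto& f : fns) lib.addFunction(&f);
    auto found = findMatchingSketch(caller, lib, "foo");
    assert(found.size() == 3);  // the generic instance is filtered out
    return g_visibilityChecks;  // one hoisted check + one generic check
}
```

In this toy setup the un-hoisted loop would perform four visibility checks for the bucket; the hoisted form performs two, and the gap grows with the number of non-generic candidates per bucket.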

Commit 2 — lazy 2-flag subtree cache for TypeDecl::findAlias

findAlias walks firstType / secondType / argTypes checking each node's alias field. For TypeDecl subtrees that contain no aliases anywhere (the common case for non-generic / non-auto types), the walk yields nothing but still pays the recursion cost on every call.

Added a 2-bit cache in spare slots of the existing flags union:

  • aliasCacheValid — set once computeAliasCache() has run on this node
  • aliasCacheHasAlias — meaningful only when valid; true iff subtree contains any alias (name-independent, allowAuto-independent)

computeAliasCache() does one eager full walk and populates the flags on every visited node. findAlias() then bails in O(1) at the root when the cache says "no aliases anywhere". For subtrees with aliases, behavior is unchanged.

Cache bits are NOT cloned — both the copy ctor and the in-place TypeDecl::clone reset them so clones start fresh. Stored in spare bitfield slots (uint32_t flags had room) rather than a separate byte, so sizeof(TypeDecl) is unchanged and the shared_module ABI is preserved.
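
A minimal sketch of the two-flag subtree cache, under stated assumptions: `TypeNode` is a hypothetical stand-in for TypeDecl, aliases are modelled as a plain string, `allowAuto` and the special handling of alias-typed nodes are omitted, and the sketch assumes the tree is not mutated after the cache is computed. The counter is only there to show when the slow path runs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

static int g_slowPathVisits = 0;  // nodes visited by the recursive search

struct TypeNode {
    std::string alias;  // empty when this node carries no alias
    std::unique_ptr<TypeNode> firstType;
    std::unique_ptr<TypeNode> secondType;
    std::vector<std::unique_ptr<TypeNode>> argTypes;
    bool aliasCacheValid : 1;     // cache has been computed for this node
    bool aliasCacheHasAlias : 1;  // meaningful only when aliasCacheValid

    TypeNode() : aliasCacheValid(false), aliasCacheHasAlias(false) {}

    // One eager walk that fills the flags on every node of the subtree.
    bool computeAliasCache() {
        bool has = !alias.empty();
        if (firstType)  has = firstType->computeAliasCache() || has;
        if (secondType) has = secondType->computeAliasCache() || has;
        for (auto& a : argTypes) has = a->computeAliasCache() || has;
        aliasCacheValid = true;
        aliasCacheHasAlias = has;
        return has;
    }

    TypeNode* findAlias(const std::string& name) {
        if (!aliasCacheValid) computeAliasCache();
        if (!aliasCacheHasAlias) return nullptr;  // O(1) bail-out
        ++g_slowPathVisits;                       // unchanged recursive walk
        if (alias == name) return this;
        if (firstType)
            if (TypeNode* r = firstType->findAlias(name)) return r;
        if (secondType)
            if (TypeNode* r = secondType->findAlias(name)) return r;
        for (auto& a : argTypes)
            if (TypeNode* r = a->findAlias(name)) return r;
        return nullptr;
    }

    // Deep copy; the fresh nodes start with both cache bits cleared.
    std::unique_ptr<TypeNode> clone() const {
        auto c = std::make_unique<TypeNode>();
        c->alias = alias;
        if (firstType)  c->firstType = firstType->clone();
        if (secondType) c->secondType = secondType->clone();
        for (auto& a : argTypes) c->argTypes.push_back(a->clone());
        return c;
    }
};
```

For an alias-free tree, the first `findAlias` call pays for one walk in `computeAliasCache()` and every later call returns at the root without recursing. Resetting the bits in `clone()` keeps a copy from inheriting a cache that may no longer describe it once the copy is edited.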

Measurements

dasImgui imgui_demo/main.das compile-only, 13621 functions, Release build:

| | Total | Infer |
|---|---|---|
| Baseline | 29.08s | 24.43s |
| + hoist | 24.87s | 20.19s |
| + aliasCache | ~23.5s | ~18.8s |
| Cumulative | -19% | -22% |

PerfView CPU sampling (post-hoist): requireModule.find exclusive samples dropped from 9.23% → 1.38% (-85%); InferTypes::isVisibleFunc dropped out of the top 20 hotspots.

Test plan

  • dastest -- --test tests: 9316 / 9316 passed (51s)
  • JIT smoke (tests/jit_tests/array.das, tests/decs/test_bulk_create.das): no verifier errors
  • test_aot.exe -use-aot dastest/dastest.das -- --use-aot --test tests: 8660 / 8660 passed (45s)
  • Runtime smoke: imgui_demo --headless-frames=1 runs clean
  • No .das changes — lint / format / detect-dupe / docs N/A

🤖 Generated with Claude Code

borisbat and others added 2 commits May 23, 2026 21:55
…oops

In findFuncAddr / findMatchingFunctions / findMatchingFunctionsAndGenerics,
the per-candidate isVisibleFunc(inWhichModule, getFunctionVisModule(pFn))
check was called once per function in mod->functionsByName[h]. Since
pFn->module == mod by construction (Module::addFunction sets it), for
non-generic candidates the visibility module is constant across the inner
loop and the check is loop-invariant.

Hoist modVis = isVisibleFunc(inWhichModule, mod) and
modVisFromThis = thisModule->isVisibleDirectly(mod) to the top of each
non-empty bucket; per-candidate check now only fires for fromGeneric
candidates (whose visibility module is pFn->getOrigin()->module).

Measured on dasImgui imgui_demo (full main.das, 13621 functions):
  compile-only total: 29.08s -> 24.87s (-14.5%)
  infer:              24.43s -> 20.19s (-17.3%)

In PerfView CPU sampling, requireModule.find dropped from 9.23% -> 1.38%
exclusive samples (-85% relative); InferTypes::isVisibleFunc dropped out
of the top 20 hotspots.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
findAlias walks the TypeDecl tree (firstType / secondType / argTypes)
checking each node's alias field. For TypeDecl subtrees that contain
no aliases anywhere (the common case for non-generic / non-auto types),
the walk yields nothing but still pays the recursion cost on every call.

Add a 2-bit cache stored in two spare bits of the existing flags union:
  aliasCacheValid       set once computeAliasCache() has run on this node
  aliasCacheHasAlias    meaningful only when valid; true iff the subtree
                        contains any alias at all (name-independent,
                        allowAuto-independent)

computeAliasCache() does one eager full walk and populates the flags on
every visited node. findAlias() then bails in O(1) at the root when the
cached state says "no aliases anywhere". For subtrees with aliases the
behavior is unchanged - original recursive walk runs.

The cache bits are NOT cloned - both the copy ctor and the in-place
TypeDecl::clone reset them so clones start fresh and recompute lazily.
Stored in spare bitfield slots (uint32_t flags already had room) rather
than a separate byte, so sizeof(TypeDecl) is unchanged and the
shared_module ABI is preserved.

Measured on dasImgui imgui_demo (full main.das, 13621 functions,
on top of the prior infer-visibility hoist):
  compile-only total:  24.87s -> ~23.5s (-5% to -7%)
  infer:               20.19s -> ~18.8s (-7%)

Cumulative vs original baseline (29.08s / 24.43s infer):
  total: -19%
  infer: -22%

Full dastest suite: 9302 tests, 9302 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 24, 2026 05:04
Contributor

Copilot AI left a comment


Pull request overview

Performance-focused update to the infer/type-resolution phase, aiming to reduce compile-time cost in large modules by removing loop-invariant visibility checks during overload resolution and adding a lazy subtree cache to avoid repeated alias-tree walks in TypeDecl::findAlias.

Changes:

  • Hoist per-module visibility checks out of inner overload-resolution loops in findFuncAddr / findMatchingFunctions*.
  • Add a 2-bit TypeDecl::findAlias subtree cache (aliasCacheValid / aliasCacheHasAlias) plus a computeAliasCache() traversal.
  • Ensure alias cache bits are reset on TypeDecl copy/clone so clones recompute lazily.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| src/ast/ast_typedecl.cpp | Adds computeAliasCache() and uses it to short-circuit findAlias() when no aliases exist in the subtree; resets cache bits on copy/clone. |
| include/daScript/ast/ast_typedecl.h | Declares cache bits in TypeDecl flags and exposes computeAliasCache(). |
| src/ast/ast_infer_type_function.cpp | Hoists module-level visibility checks out of per-candidate loops during function/generic matching. |


Comment thread src/ast/ast_typedecl.cpp
Comment on lines +951 to +958
bool TypeDecl::computeAliasCache() {
    // Eager full walk independent of name/allowAuto. Sets aliasCache* on every visited node.
    // alias-type subtrees are dead ends for findAlias, so treated as NoAlias.
    if (baseType == Type::alias) {
        aliasCacheValid = true;
        aliasCacheHasAlias = false;
        return false;
    }

Mirrors findAlias's own line 970 pattern so a parent's eager walk doesn't
re-recurse into already-computed children (when findAlias was called on
an intermediate node before the root). Premise as Copilot stated it
(shared subtrees) doesn't apply to AST nodes per gc_node unique-ownership;
but the call-on-child-before-parent case is real, and the one-liner is
self-consistent with the file's pattern.

Perf-neutral on imgui_demo.das compile (8-run avg 20.30s vs 3-run pre
20.51s, within ~1s noise spread). Land for stylistic consistency, not
a measured speedup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
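
The call-on-child-before-parent case from the commit message above can be sketched as follows. The exact one-liner is not shown in this thread, so the early return below is only one plausible form of it; `Node`, its fields, and the counter are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

static int g_nodesWalked = 0;  // nodes that actually did cache work

struct Node {
    bool hasOwnAlias = false;
    bool cacheValid = false;
    bool cacheHasAlias = false;
    std::vector<std::unique_ptr<Node>> children;

    bool computeAliasCache() {
        // Guard: a subtree that was already computed is not walked again.
        if (cacheValid) return cacheHasAlias;
        ++g_nodesWalked;
        bool has = hasOwnAlias;
        for (auto& c : children) has = c->computeAliasCache() || has;
        cacheValid = true;
        cacheHasAlias = has;
        return has;
    }
};

// root -> mid -> {leaf, leaf}; the cache is requested on mid before root.
int guardedWalkDemo() {
    g_nodesWalked = 0;
    Node root;
    root.children.push_back(std::make_unique<Node>());
    Node* mid = root.children[0].get();
    mid->children.push_back(std::make_unique<Node>());
    mid->children.push_back(std::make_unique<Node>());
    mid->computeAliasCache();   // walks mid and both leaves
    root.computeAliasCache();   // walks root only; mid is already valid
    return g_nodesWalked;
}
```

With the guard, each of the four nodes is walked once. Without it, the second call would re-walk the three nodes under `mid`, which matches the commit's framing of the change as a consistency fix and not a measured speedup.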
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines +288 to 291
bool aliasCacheValid : 1; // findAlias subtree cache validity flag
bool aliasCacheHasAlias : 1; // findAlias subtree cache result (only meaningful when aliasCacheValid)
};
uint32_t flags = 0;
Comment thread src/ast/ast_typedecl.cpp
Comment on lines 970 to 973
TypeDecl * TypeDecl::findAlias ( const string & name, bool allowAuto ) {
    if (!aliasCacheValid) computeAliasCache();
    if (!aliasCacheHasAlias) return nullptr; // proven no aliases anywhere
    if (baseType == Type::alias) {
@borisbat borisbat merged commit cc6ad7e into master May 24, 2026
30 checks passed
pull Bot pushed a commit to forksnd/daScript that referenced this pull request May 24, 2026
PR GaijinEntertainment#2849 added aliasCacheValid / aliasCacheHasAlias bits to the
TypeDecl::flags union for the lazy findAlias subtree cache. They are
runtime scratch state set on first findAlias call, not part of the
type's semantic identity. They were leaking into getLookupHash,
getSemanticHash, and getOwnSemanticHash via hashmix(flags) /
hb.update(flags).

Within a single compile this is deterministic (same input -> same
cache lifecycle -> same bit state). Across two semantically equivalent
TypeDecls whose findAlias-call lifecycles differ (one cached, one
not), the hashes diverge -- risky for type interning and AOT semantic
hashing across module boundaries.

Add flagsWithoutAliasCache(td) -- save the two bits via const_cast,
clear them, read flags, restore. Use it at the 3 hash sites.

No ABI change. dastest 9341 pass. AOT test_aot 8685 pass. JIT smoke
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
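
The hash leak described in the follow-up commit above can be shown with a small sketch. The commit saves, clears, reads, and restores the two bits on the bitfield union; this sketch swaps in explicit bit masks instead, which is simpler to show portably. The bit positions, the `TypeInfo` struct, and this `hashmix` body are assumptions for illustration, not the project's definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions for the two cache flags.
constexpr uint32_t kAliasCacheValid    = 1u << 30;
constexpr uint32_t kAliasCacheHasAlias = 1u << 31;
constexpr uint32_t kAliasCacheMask     = kAliasCacheValid | kAliasCacheHasAlias;

struct TypeInfo {
    uint32_t baseType = 0;
    uint32_t flags = 0;  // semantic flags plus the two scratch cache bits
};

// Flags as they should be seen by hashing: cache bits forced to zero.
inline uint32_t flagsWithoutAliasCache(const TypeInfo& td) {
    return td.flags & ~kAliasCacheMask;
}

// Illustrative mixer only; not the project's hashmix.
inline uint64_t hashmix(uint64_t seed, uint64_t v) {
    return seed ^ (v + 0x9e3779b97f4a7c15ull + (seed << 6) + (seed >> 2));
}

// Leaky variant: scratch bits influence the hash.
uint64_t leakyHash(const TypeInfo& td) {
    return hashmix(hashmix(0, td.baseType), td.flags);
}

// Fixed variant: only semantic bits influence the hash.
uint64_t semanticHash(const TypeInfo& td) {
    return hashmix(hashmix(0, td.baseType), flagsWithoutAliasCache(td));
}
```

Two descriptors that differ only in whether `findAlias` has already run on them hash differently under `leakyHash` and identically under `semanticHash`, which is the property the fix restores at the three hash sites.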
@borisbat borisbat deleted the bbatkin/infer-overload-hoist-aliascache branch May 30, 2026 15:20