infer: hoist per-module visibility + aliasCache (~19% faster compile) #2849
Merged
…oops

In findFuncAddr / findMatchingFunctions / findMatchingFunctionsAndGenerics, the per-candidate isVisibleFunc(inWhichModule, getFunctionVisModule(pFn)) check was called once per function in mod->functionsByName[h]. Since pFn->module == mod by construction (Module::addFunction sets it), for non-generic candidates the visibility module is constant across the inner loop and the check is loop-invariant.

Hoist modVis = isVisibleFunc(inWhichModule, mod) and modVisFromThis = thisModule->isVisibleDirectly(mod) to the top of each non-empty bucket; the per-candidate check now only fires for fromGeneric candidates (whose visibility module is pFn->getOrigin()->module).

Measured on dasImgui imgui_demo (full main.das, 13621 functions):
compile-only total: 29.08s -> 24.87s (-14.5%)
infer: 24.43s -> 20.19s (-17.3%)

In PerfView CPU sampling, requireModule.find dropped from 9.23% -> 1.38% exclusive samples (-85% relative); InferTypes::isVisibleFunc dropped out of the top 20 hotspots.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
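The hoist described in this commit can be illustrated with a minimal sketch. Everything here (Module, Function, isVisible, the counting global) is a hypothetical stand-in rather than the daScript API; the point is only that the visibility query for the bucket's own module is loop-invariant, so it is evaluated once, while generic instances still get a per-candidate check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the compiler's types; not the daScript API.
struct Module { int id; };

struct Function {
    const Module* module;        // owning module (identical for the whole bucket)
    const Module* originModule;  // only meaningful for instances of generics
    bool fromGeneric;
};

static int g_visibilityChecks = 0;  // counts the "expensive" queries

// Stand-in for an expensive visibility query (arbitrary rule for the sketch).
bool isVisible(const Module* from, const Module* target) {
    ++g_visibilityChecks;
    return from->id >= target->id;
}

// Before: one visibility query per candidate.
int countVisibleNaive(const Module* from, const std::vector<Function>& bucket) {
    int n = 0;
    for (const Function& fn : bucket) {
        const Module* vis = fn.fromGeneric ? fn.originModule : fn.module;
        if (isVisible(from, vis)) ++n;
    }
    return n;
}

// After: the query for the bucket's module is hoisted out of the loop; only
// generic instances, whose visibility module may differ, are checked per candidate.
int countVisibleHoisted(const Module* from, const Module* mod,
                        const std::vector<Function>& bucket) {
    if (bucket.empty()) return 0;
    const bool modVis = isVisible(from, mod);  // loop-invariant
    int n = 0;
    for (const Function& fn : bucket) {
        const bool vis = fn.fromGeneric ? isVisible(from, fn.originModule) : modVis;
        if (vis) ++n;
    }
    return n;
}
```

Both functions return the same count; the hoisted variant issues one query per bucket plus one per generic instance, so the saving grows with the share of non-generic candidates.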
findAlias walks the TypeDecl tree (firstType / secondType / argTypes)
checking each node's alias field. For TypeDecl subtrees that contain
no aliases anywhere (the common case for non-generic / non-auto types),
the walk yields nothing but still pays the recursion cost on every call.
Add a 2-bit cache stored in two spare bits of the existing flags union:
  aliasCacheValid     set once computeAliasCache() has run on this node
  aliasCacheHasAlias  meaningful only when valid; true iff the subtree
                      contains any alias at all (name-independent,
                      allowAuto-independent)
computeAliasCache() does one eager full walk and populates the flags on
every visited node. findAlias() then bails in O(1) at the root when the
cached state says "no aliases anywhere". For subtrees with aliases the
behavior is unchanged - original recursive walk runs.
The cache bits are NOT cloned - both the copy ctor and the in-place
TypeDecl::clone reset them so clones start fresh and recompute lazily.
Stored in spare bitfield slots (uint32_t flags already had room) rather
than a separate byte, so sizeof(TypeDecl) is unchanged and the
shared_module ABI is preserved.
Measured on dasImgui imgui_demo (full main.das, 13621 functions,
on top of the prior infer-visibility hoist):
compile-only total: 24.87s -> ~23.5s (-5% to -7%)
infer: 20.19s -> ~18.8s (-7%)
Cumulative vs original baseline (29.08s / 24.43s infer):
total: -19%
infer: -22%
Full dastest suite: 9302 tests, 9302 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
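A minimal sketch of the cache described in this commit, under simplifying assumptions: TypeNode is a hypothetical reduced node (a single children vector stands in for firstType / secondType / argTypes), the two cache bits are plain bitfields rather than members of the flags union, and the special handling of alias-typed nodes in the real code is left out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified node; the real TypeDecl has many more fields.
struct TypeNode {
    std::string alias;                                // empty means "no alias here"
    std::vector<std::unique_ptr<TypeNode>> children;  // stands in for the sub-type links
    bool aliasCacheValid : 1;
    bool aliasCacheHasAlias : 1;

    TypeNode() : aliasCacheValid(false), aliasCacheHasAlias(false) {}

    // Copies deliberately start with an invalid cache and recompute lazily.
    TypeNode(const TypeNode& other)
        : alias(other.alias), aliasCacheValid(false), aliasCacheHasAlias(false) {
        for (const auto& c : other.children)
            children.push_back(std::make_unique<TypeNode>(*c));
    }

    // Eager walk: records on every visited node whether its subtree holds any alias.
    bool computeAliasCache() {
        bool has = !alias.empty();
        for (auto& c : children)
            has = c->computeAliasCache() || has;  // call first so every child is visited
        aliasCacheValid = true;
        aliasCacheHasAlias = has;
        return has;
    }

    TypeNode* findAlias(const std::string& name) {
        if (!aliasCacheValid) computeAliasCache();
        if (!aliasCacheHasAlias) return nullptr;  // O(1) exit: no aliases anywhere below
        if (alias == name) return this;
        for (auto& c : children)
            if (TypeNode* hit = c->findAlias(name)) return hit;
        return nullptr;
    }
};
```

The cache answers only "is there any alias in this subtree", not "is this particular name present", which is why a hit on the cache still falls through to the ordinary recursive search. The sketch does not invalidate the cache when a node is mutated after the first lookup; it assumes the tree is stable by then.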
Pull request overview
Performance-focused update to the infer/type-resolution phase, aiming to reduce compile-time cost in large modules by removing loop-invariant visibility checks during overload resolution and adding a lazy subtree cache to avoid repeated alias-tree walks in TypeDecl::findAlias.
Changes:
- Hoist per-module visibility checks out of inner overload-resolution loops in findFuncAddr/findMatchingFunctions*.
- Add a 2-bit TypeDecl::findAlias subtree cache (aliasCacheValid/aliasCacheHasAlias) plus a computeAliasCache() traversal.
- Ensure alias cache bits are reset on TypeDecl copy/clone so clones recompute lazily.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/ast/ast_typedecl.cpp | Adds computeAliasCache() and uses it to short-circuit findAlias() when no aliases exist in the subtree; resets cache bits on copy/clone. |
| include/daScript/ast/ast_typedecl.h | Declares cache bits in TypeDecl flags and exposes computeAliasCache(). |
| src/ast/ast_infer_type_function.cpp | Hoists module-level visibility checks out of per-candidate loops during function/generic matching. |
Comment on lines +951 to +958
    bool TypeDecl::computeAliasCache() {
        // Eager full walk independent of name/allowAuto. Sets aliasCache* on every visited node.
        // alias-type subtrees are dead ends for findAlias, so treated as NoAlias.
        if (baseType == Type::alias) {
            aliasCacheValid = true;
            aliasCacheHasAlias = false;
            return false;
        }
Mirrors findAlias's own line 970 pattern so a parent's eager walk doesn't re-recurse into already-computed children (when findAlias was called on an intermediate node before the root).

Premise as Copilot stated it (shared subtrees) doesn't apply to AST nodes per gc_node unique-ownership; but the call-on-child-before-parent case is real, and the one-liner is self-consistent with the file's pattern.

Perf-neutral on imgui_demo.das compile (8-run avg 20.30s vs 3-run pre 20.51s, within ~1s noise spread). Land for stylistic consistency, not a measured speedup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
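The call-on-child-before-parent case from this commit can be sketched as follows. The exact one-liner is not quoted in the thread, so the guard below is an assumption about its shape, and Node, computeCache, and the visit counter are hypothetical names used only to make the effect observable.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical node type for illustration only.
struct Node {
    bool hasAliasHere = false;
    bool cacheValid = false;
    bool cacheHasAlias = false;
    std::vector<std::unique_ptr<Node>> kids;
};

static int g_visits = 0;  // counts nodes actually walked

// reuseValid toggles the assumed guard so both behaviors can be compared.
bool computeCache(Node& n, bool reuseValid) {
    if (reuseValid && n.cacheValid) return n.cacheHasAlias;  // skip computed subtrees
    ++g_visits;
    bool has = n.hasAliasHere;
    for (auto& k : n.kids) has = computeCache(*k, reuseValid) || has;
    n.cacheValid = true;
    n.cacheHasAlias = has;
    return has;
}

// Builds root -> child -> leaf, computes the cache on the child first, then on
// the root, and returns how many nodes the second (root) walk visited.
int visitsForRootAfterChild(bool reuseValid) {
    Node root;
    root.kids.push_back(std::make_unique<Node>());
    Node& child = *root.kids[0];
    child.kids.push_back(std::make_unique<Node>());
    computeCache(child, reuseValid);
    g_visits = 0;
    computeCache(root, reuseValid);
    return g_visits;
}
```

With the guard, the root walk stops at the already-computed child; without it, the whole chain is walked again. On a three-node chain the difference is small, which is consistent with the commit's note that the change was perf-neutral in practice.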
Comment on lines +288 to 291
        bool aliasCacheValid : 1;    // findAlias subtree cache validity flag
        bool aliasCacheHasAlias : 1; // findAlias subtree cache result (only meaningful when aliasCacheValid)
    };
    uint32_t flags = 0;
Comment on lines 970 to 973
    TypeDecl * TypeDecl::findAlias ( const string & name, bool allowAuto ) {
        if (!aliasCacheValid) computeAliasCache();
        if (!aliasCacheHasAlias) return nullptr; // proven no aliases anywhere
        if (baseType == Type::alias) {
pull Bot pushed a commit to forksnd/daScript that referenced this pull request on May 24, 2026

PR GaijinEntertainment#2849 added aliasCacheValid / aliasCacheHasAlias bits to the TypeDecl::flags union for the lazy findAlias subtree cache. They are runtime scratch state set on first findAlias call, not part of the type's semantic identity.

They were leaking into getLookupHash, getSemanticHash, and getOwnSemanticHash via hashmix(flags) / hb.update(flags). Within a single compile this is deterministic (same input -> same cache lifecycle -> same bit state). Across two semantically equivalent TypeDecls whose findAlias-call lifecycles differ (one cached, one not), the hashes diverge, which is risky for type interning and AOT semantic hashing across module boundaries.

Add flagsWithoutAliasCache(td): save the two bits via const_cast, clear them, read flags, restore. Use it at the 3 hash sites. No ABI change.

dastest 9341 pass. AOT test_aot 8685 pass. JIT smoke clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
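The idea behind this follow-up fix, keeping scratch cache bits out of anything that hashes flags, can be sketched with a plain bit mask. This is a swapped-in technique: the commit describes a save / clear / read / restore sequence on the bitfield via const_cast, whereas the sketch below masks a raw integer. The bit positions, the mix function, and semanticHash are hypothetical and do not reflect the real TypeDecl layout or hashmix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit positions; the real layout of TypeDecl::flags is not shown in this thread.
constexpr uint32_t kAliasCacheValid    = 1u << 30;
constexpr uint32_t kAliasCacheHasAlias = 1u << 31;
constexpr uint32_t kAliasCacheMask     = kAliasCacheValid | kAliasCacheHasAlias;

// Semantic flags only: the scratch cache bits are cleared before hashing.
constexpr uint32_t flagsWithoutAliasCache(uint32_t flags) {
    return flags & ~kAliasCacheMask;
}

// Toy mixer (one FNV-1a style step), standing in for the compiler's hash mixing.
constexpr uint64_t mix(uint64_t h, uint32_t v) {
    return (h ^ v) * 1099511628211ull;
}

// Hash that depends only on the semantic part of the flags word.
uint64_t semanticHash(uint32_t flags) {
    return mix(14695981039346656037ull, flagsWithoutAliasCache(flags));
}
```

Two flag words that differ only in the cache bits hash identically, while a difference in any other bit still changes the hash, which is the property needed for interning and cross-module semantic hashes.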
Summary
Two independent infer-phase perf wins, identified via PerfView CPU sampling on a slow compile case (dasImgui imgui_demo/main.das, 13621 functions, ~30s compile-only).

Commit 1: hoist per-module visibility out of overload resolution inner loops
In findFuncAddr / findMatchingFunctions / findMatchingFunctionsAndGenerics, the per-candidate isVisibleFunc(inWhichModule, getFunctionVisModule(pFn)) check was called once per function in mod->functionsByName[h]. Since pFn->module == mod by construction (Module::addFunction sets it), for non-generic candidates the visibility module is constant across the inner loop and the check is loop-invariant.
Hoist modVis = isVisibleFunc(inWhichModule, mod) and modVisFromThis = thisModule->isVisibleDirectly(mod) to the top of each non-empty bucket; per-candidate check now only fires for fromGeneric candidates (whose visibility module is pFn->getOrigin()->module).
5 call sites patched in ast_infer_type_function.cpp (findFuncAddr, two findMatchingFunctions overloads, two findMatchingFunctionsAndGenerics overloads).

Commit 2: lazy 2-flag subtree cache for TypeDecl::findAlias
findAlias walks firstType / secondType / argTypes checking each node's alias field. For TypeDecl subtrees that contain no aliases anywhere (the common case for non-generic / non-auto types), the walk yields nothing but still pays the recursion cost on every call.
Added a 2-bit cache in spare slots of the existing flags union:
- aliasCacheValid: set once computeAliasCache() has run on this node
- aliasCacheHasAlias: meaningful only when valid; true iff subtree contains any alias (name-independent, allowAuto-independent)
computeAliasCache() does one eager full walk and populates the flags on every visited node. findAlias() then bails in O(1) at the root when the cache says "no aliases anywhere". For subtrees with aliases, behavior is unchanged.
Cache bits are NOT cloned: both the copy ctor and the in-place TypeDecl::clone reset them so clones start fresh. Stored in spare bitfield slots (uint32_t flags had room) rather than a separate byte, so sizeof(TypeDecl) is unchanged and the shared_module ABI is preserved.

Measurements
dasImgui imgui_demo/main.das compile-only, 13621 functions, Release build:

PerfView CPU sampling (post-hoist): requireModule.find exclusive samples dropped from 9.23% → 1.38% (-85%); InferTypes::isVisibleFunc dropped out of the top 20 hotspots.

Test plan
- dastest -- --test tests: 9316 / 9316 passed (51s)
- … tests/jit_tests/array.das, tests/decs/test_bulk_create.das): no verifier errors
- test_aot.exe -use-aot dastest/dastest.das -- --use-aot --test tests: 8660 / 8660 passed (45s)
- … --headless-frames=1 runs clean
- … .das changes: lint / format / detect-dupe / docs N/A

🤖 Generated with Claude Code