fix(extraction): recognize common third-party C++ inline macros, not just UE #1101
Merged
…just UE

Extend blankCppInlineMacros beyond Unreal Engine's FORCEINLINE family to the inline/linkage macros that vendored third-party libraries define and that mangle function names the same way:

- pugixml: PUGI__FN / PUGI__FN_NO_INLINE (before the return type) and PUGIXML_FUNCTION (linkage macro, between return type and name; the blank mechanism handles both positions).
- Godot: _FORCE_INLINE_ / _ALWAYS_INLINE_.
- Boost: BOOST_FORCEINLINE / BOOST_NOINLINE.
- Generic cross-ecosystem hints: ALWAYS_INLINE / FORCE_INLINE / NOINLINE.

The list now drives a single generated alternation (longest-token-first), so adding a codebase's macro is a one-line change. Still curated exact tokens in specifier position only: a real all-caps return type like `HRESULT DoIt()` is never touched (verified by controls).

Validated on CARLA (large UE project, 1131 C++/h files): function-name mangles 440 -> 16 (428 fixed). The 16 residual and 7 clean->mangled shifts are all in third-party vendored files, chiefly pugixml.cpp, a 12k-line macro amalgamation where error recovery is non-local, so blanking one of several *stacked* macros (PUGI__FN + PUGI__UNSIGNED_OVERFLOW …) shifts an already-imperfect extraction. Normal C++/UE code (ActionRoguelike, ALS) sees zero regressions; blanking a macro there only helps. Chasing pugixml's internal attribute macros is left out of scope.

Seven regression tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
colbymchenry added a commit that referenced this pull request on Jul 1, 2026
…names (#1102)

* feat(extraction): universal recovery of macro-mangled C/C++ function names

The curated inline-macro blank list (#1100/#1101) can't enumerate every library's macro. Add a universal post-parse net so a function is findable by name regardless of which macro decorates it, plus a batch of common libraries to the curated list for full name+return-type recovery.

- recoverMangledCppName: after extraction, recover the real identifier from a name still mangled by an un-blanked macro (`MACRO Ret name(…)` misparses to "Ret name"). It's a new `recoverMangledName` extractor hook wired only onto C/C++, applied to every name they produce. Safe by construction: it only touches an already-mangled name (an internal space that isn't a legit `operator …`/destructor), so a clean name is returned unchanged; guarded against the `Ret (name)` parenthesized-name idiom and bare primitives. Scoped to C/C++ so Kotlin/Scala backtick identifiers (which legitimately contain spaces) are never touched.
- Curated list extended past UE/pugixml/Godot/Boost to Qt (Q_INVOKABLE, …), Folly, Abseil, LLVM, V8, Eigen, and rapidjson.

Validated on CARLA (large UE project, 1131 C++/h files) vs the pre-fix baseline: function-name mangles 440 -> 6, 431 fixed, and, critically, 0 regressions (the salvage also recovers names that the pre-parse's own non-local error-recovery shifts would otherwise re-mangle, erasing the 7 shifts seen in #1101). The 6 residual are all the moodycamel `Ret (name)` idiom, correctly left alone. On a made-up macro with no list entry (`WEBKIT_EXPORT WTFString compute()`), the name `compute` is still recovered.

Full suite green; eleven regression/safety tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note universal C++ macro-mangled name recovery (#1102)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
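The post-parse recovery described in the commit above can be sketched roughly as follows. This is a minimal illustration of the stated guards, not the project's actual `recoverMangledCppName`: the function name `recoverMangledNameSketch`, the `PRIMITIVES` list, and the exact regexes are assumptions made for the example, and the real hook may treat pointer/reference prefixes and destructors differently.

```typescript
// Hypothetical list of bare primitive keywords that must never be returned as a "name".
const PRIMITIVES: string[] = [
  "void", "bool", "char", "short", "int", "long",
  "float", "double", "signed", "unsigned",
];

// Sketch: given an extracted name, return the trailing identifier if (and only if)
// the name looks mangled by an un-blanked macro, e.g. "Ret name" -> "name".
function recoverMangledNameSketch(name: string): string {
  const trimmed = name.trim();

  // A name without internal whitespace is already clean: return it unchanged.
  if (!/\s/.test(trimmed)) return name;

  // Legitimate names that contain spaces: `operator …` and (assumed) spaced destructors.
  if (/\boperator\b/.test(trimmed) || trimmed.indexOf("~") !== -1) return name;

  // The `Ret (name)` parenthesized-name idiom is deliberately left alone.
  if (/[()]/.test(trimmed)) return name;

  const parts = trimmed.split(/\s+/);
  const last = parts[parts.length - 1];

  // Only accept a plain (optionally ::-qualified) identifier as the recovered name.
  const identifier = /^[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*$/;
  if (!identifier.test(last)) return name;

  // Never "recover" a bare primitive such as the `int` in "unsigned int".
  if (PRIMITIVES.indexOf(last) !== -1) return name;

  return last;
}
```

Under these assumptions, `recoverMangledNameSketch("WTFString compute")` yields `compute`, while `compute`, `operator new`, and `Ret (name)` come back unchanged. Because every guard returns the input as-is, a clean name can never be altered.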
Context
Follow-up to #1100 (UE `FORCEINLINE` recovery), extending it to the inline/linkage macros that vendored third-party C++ libraries define. Surfaced while validating the C++ name-extraction arc on a large UE project (carla-simulator/carla), whose residual name-mangles were third-party library macros (pugixml `PUGI__FN`, etc.).

Change

`blankCppInlineMacros` now covers, in addition to UE's `FORCEINLINE` / `FORCENOINLINE` / `FORCEINLINE_DEBUGGABLE`:

- pugixml: `PUGI__FN`, `PUGI__FN_NO_INLINE` (before the return type) and `PUGIXML_FUNCTION` (linkage macro, between return type and name; the blank mechanism handles both positions).
- Godot: `_FORCE_INLINE_`, `_ALWAYS_INLINE_`.
- Boost: `BOOST_FORCEINLINE`, `BOOST_NOINLINE`.
- Generic cross-ecosystem hints: `ALWAYS_INLINE`, `FORCE_INLINE`, `NOINLINE`.

The list now drives a single generated regex (longest-token-first), so adding a codebase's macro is a one-line change. Still curated exact tokens in specifier position only: a real all-caps return type like `HRESULT DoIt()` is never blanked (control-tested).

Validation on CARLA (1131 C++/h files)
Re-indexed with the pre-fix baseline vs. this change (kind-agnostic, by source position): function-name mangles went 440 -> 16 (428 fixed), with 16 residual and 7 clean->mangled shifts.
All 16 residual and all 7 shifts are in third-party vendored files (9 pugixml, 6 moodycamel, 1 Qt), with zero in CARLA's own LibCarla/Unreal code. The 7 shifts are inside `pugixml.cpp`, a 12k-line macro amalgamation where tree-sitter's error recovery is non-local: blanking one of several stacked macros (`PUGI__FN PUGI__UNSIGNED_OVERFLOW char_t* integer_to_string(…)`) shifts an already-imperfect extraction to a differently-imperfect one. Of the 7: 1 is a genuine utility regression (a template helper behind a second, unlisted pugixml-internal attribute macro), 5 are lateral (conversion operators that were never correctly named), 1 is a Qt function-pointer typedef. Chasing pugixml's internal attribute macros is deliberately out of scope.

On normal C++/UE code, blanking a known macro only ever helps: the earlier ActionRoguelike/ALS validations show 0 regressions.
Tests
Full suite green (1876 passed). Seven regression cases added covering pugixml/Godot/Boost/generic macros plus the `HRESULT`/offset/word-boundary safety controls.

To cover another codebase's inline macro, add its exact token to the list.
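As a rough illustration of the list-driven approach described under Change, the sketch below builds one longest-token-first alternation from a curated array and blanks matches with equal-length whitespace. It is not the project's `blankCppInlineMacros`: the names `INLINE_MACROS` and `blankInlineMacrosSketch` are hypothetical, equal-length blanking is an assumption drawn from the "offset" safety control, and the specifier-position check is not modeled here (only exact whole-identifier matching is).

```typescript
// Hypothetical curated list; the tokens are the ones named in this PR.
const INLINE_MACROS: string[] = [
  "FORCEINLINE", "FORCENOINLINE", "FORCEINLINE_DEBUGGABLE", // Unreal Engine
  "PUGI__FN", "PUGI__FN_NO_INLINE", "PUGIXML_FUNCTION",     // pugixml
  "_FORCE_INLINE_", "_ALWAYS_INLINE_",                      // Godot
  "BOOST_FORCEINLINE", "BOOST_NOINLINE",                    // Boost
  "ALWAYS_INLINE", "FORCE_INLINE", "NOINLINE",              // generic hints
];

// Build one alternation, longest token first, so PUGI__FN_NO_INLINE is tried before PUGI__FN.
// The tokens are plain identifiers, so no regex escaping is needed in this sketch.
const alternation: string = INLINE_MACROS
  .slice()
  .sort((a, b) => b.length - a.length)
  .join("|");

// \b on both sides restricts matches to whole identifiers (MY_FORCE_INLINE is not touched).
const INLINE_MACRO_RE = new RegExp("\\b(?:" + alternation + ")\\b", "g");

// Assumed behavior: overwrite each macro with spaces of equal length so that
// source offsets of everything after it stay unchanged.
function blankInlineMacrosSketch(source: string): string {
  return source.replace(INLINE_MACRO_RE, (m) => m.replace(/./g, " "));
}
```

With this shape, `blankInlineMacrosSketch("FORCEINLINE int Foo()")` returns a string of the same length in which the macro is replaced by spaces, and `HRESULT DoIt()` passes through untouched because `HRESULT` is not in the list. Supporting another codebase means appending one string to the array.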
🤖 Generated with Claude Code