fix(extraction): recover C++ function names prefixed by an inline-specifier macro #1100
Merged
…cifier macro An unknown inline-specifier macro before a function's return type (`FORCEINLINE FString GetName(…)`) threw tree-sitter into error recovery: the macro was read as the return type and — for a non-primitive return — the return type was glued onto the name, so the function was indexed as `"FString GetName"` instead of `GetName`, unfindable by name and with no caller links. This is pervasive in Unreal Engine, where inline helpers are written `FORCEINLINE <ret> <name>(…)` (e.g. ALS's `FORCEINLINE FString GetEnumerationToString`). Add `blankCppInlineMacros`, a preParse that blanks the known UE inline macros (`FORCEINLINE`, `FORCENOINLINE`, `FORCEINLINE_DEBUGGABLE`) with equal-length spaces so byte offsets stay exact and the declaration parses as an ordinary function — recovering both the real name AND the return type. This is the same recover-don't-drop approach as blankCppExportMacros (#946/#1061), and the two are composed into the cppExtractor preParse. Matched tightly (exact known tokens, only in specifier position — followed by the identifier that starts the return type/name), so ordinary identifiers, real all-caps return types (`HRESULT DoIt()`), string literals, expression uses, and longer words (`FORCEINLINE_COUNT`) are untouched — verified by controls. C++-only; Kotlin/Scala re-index byte-for-byte identical. Five regression tests added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1100) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jul 1, 2026
colbymchenry added a commit that referenced this pull request Jul 1, 2026
…names (#1102)

* feat(extraction): universal recovery of macro-mangled C/C++ function names

The curated inline-macro blank list (#1100/#1101) can't enumerate every library's macro. Add a universal post-parse net so a function is findable by name regardless of which macro decorates it, plus a batch of common libraries to the curated list for full name+return-type recovery.

- recoverMangledCppName: after extraction, recover the real identifier from a name still mangled by an un-blanked macro (`MACRO Ret name(…)` misparses to "Ret name"). It's a new `recoverMangledName` extractor hook wired only onto C/C++, applied to every name they produce. Safe by construction: it only touches an already-mangled name (an internal space that isn't a legit `operator …`/destructor), so a clean name is returned unchanged; guarded against the `Ret (name)` parenthesized-name idiom and bare primitives. Scoped to C/C++ so Kotlin/Scala backtick identifiers (which legitimately contain spaces) are never touched.
- Curated list extended past UE/pugixml/Godot/Boost to Qt (Q_INVOKABLE, …), Folly, Abseil, LLVM, V8, Eigen, and rapidjson.

Validated on CARLA (large UE project, 1131 C++/h files) vs the pre-fix baseline: function-name mangles 440 -> 6, 431 fixed, and, critically, 0 regressions (the salvage also recovers names that the pre-parse's own non-local error-recovery shifts would otherwise re-mangle, erasing the 7 shifts seen in #1101). The 6 residual are all the moodycamel `Ret (name)` idiom, correctly left alone.

On a made-up macro with no list entry (`WEBKIT_EXPORT WTFString compute()`), the name `compute` is still recovered.

Full suite green; eleven regression/safety tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note universal C++ macro-mangled name recovery (#1102)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Context
Follow-up to the C++ name-extraction work, found while validating #1093/#1096 against real Unreal Engine repos (`tomlooman/ActionRoguelike`, `PanicPetal/ALS-Community`). This closes the last name-mangle case surfaced there.

Bug
An unknown inline-specifier macro before the return type (`FORCEINLINE FString GetName(…)`) throws tree-sitter into error recovery: it reads the macro as the return type and, for a non-primitive return, glues the return type onto the name, so the function is indexed as `"FString GetName"` instead of `GetName`. It is then unfindable by name and its callers don't link. This is pervasive in Unreal Engine, where inline helpers are written `FORCEINLINE <ret> <name>(…)`, e.g. ALS's `FORCEINLINE FString GetEnumerationToString(...)`.

Fix
`blankCppInlineMacros` is a preParse that blanks the standard UE inline macros (`FORCEINLINE`, `FORCENOINLINE`, `FORCEINLINE_DEBUGGABLE`) with equal-length spaces, so byte offsets (line/column) stay exact and the declaration parses as an ordinary function. This recovers both the real name and the return type. It is the same recover-don't-drop approach as `blankCppExportMacros` (#946/#1061), and the two are composed into the cppExtractor preParse.

Scope / safety
Matched tightly: only the exact known tokens, only in specifier position (immediately followed by the identifier that starts the return type/name). Controls confirm it does not touch:
- ordinary identifiers (`FString GetName()` unchanged),
- real all-caps return types (`HRESULT DoIt()` keeps `HRESULT`; the curated list never blanks an arbitrary uppercase type),
- string literals (`"FORCEINLINE"`), expression uses (`x = FORCEINLINE + 1`), longer words (`FORCEINLINE_COUNT`).

C++-only: Kotlin (okhttp) and Scala (cats) re-index byte-for-byte identical (815 / 3036 type nodes unchanged).
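
The blanking rule and its controls can be illustrated with a minimal sketch. This is not the PR's actual `blankCppInlineMacros`; the function name `blankInlineMacrosSketch`, the constant names, and the regex are assumptions made for illustration. It replaces an exact known token with the same number of spaces, and only when whitespace and the start of an identifier follow it.

```typescript
// Hypothetical sketch of the blanking idea; the real preParse may differ.
// Exact tokens named in this PR.
const INLINE_MACROS: string[] = [
  "FORCEINLINE_DEBUGGABLE",
  "FORCENOINLINE",
  "FORCEINLINE",
];

// \b at the start rejects tokens embedded in a longer word (MY_FORCEINLINE).
// The lookahead requires whitespace and then the first character of an
// identifier, i.e. the token sits in specifier position.
const INLINE_MACRO_RE = new RegExp(
  `\\b(?:${INLINE_MACROS.join("|")})(?=\\s+[A-Za-z_])`,
  "g",
);

// Replace each matched macro with spaces of equal length so that every
// later character keeps its original offset (line and column).
function blankInlineMacrosSketch(source: string): string {
  return source.replace(INLINE_MACRO_RE, (token) => " ".repeat(token.length));
}

// Usage: the macro is blanked, the controls pass through unchanged.
const samples: string[] = [
  "FORCEINLINE FString GetName() const;",
  "HRESULT DoIt();",
  "int x = FORCEINLINE + 1;",
  "int FORCEINLINE_COUNT = 3;",
];
for (const line of samples) {
  console.log(JSON.stringify(blankInlineMacrosSketch(line)));
}
```

One limitation of this sketch: it does not track string literals or comments, so a literal such as `"FORCEINLINE foo"` would be blanked. The `"FORCEINLINE"` control passes only because a quote, not an identifier, follows the token. Under the same assumptions, covering another macro means adding its exact token to `INLINE_MACROS`.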
Validation
… #662 daemon test is unrelated and passes on retry). Five regression tests added.
`GetEnumerationToString` now carries the correct name + `FString` return type; the name-mangle scan across both repos drops to zero.

To cover another codebase's inline macro (e.g. `ALWAYS_INLINE`), add its exact token to the list.

🤖 Generated with Claude Code