fix(extraction): recover C++ function names prefixed by an inline-specifier macro #1100

Merged
colbymchenry merged 2 commits into main from fix/cpp-forceinline-macro-function-names
Jul 1, 2026

@colbymchenry

Owner

Context

Follow-up to the C++ name-extraction work found while validating #1093/#1096 against real Unreal Engine repos (tomlooman/ActionRoguelike, PanicPetal/ALS-Community). This closes the last name-mangle case surfaced there.

Bug

An unknown inline-specifier macro before the return type (FORCEINLINE FString GetName(…)) throws tree-sitter into error recovery: it reads the macro as the return type and, when the real return type is non-primitive, glues that return type onto the name, so the function is indexed as "FString GetName" instead of GetName. The function is then unfindable by name and its callers don't link. This is pervasive in Unreal Engine, where inline helpers are written FORCEINLINE <ret> <name>(…), e.g. ALS's FORCEINLINE FString GetEnumerationToString(...).

Fix

blankCppInlineMacros is a preParse step that overwrites the standard UE inline macros (FORCEINLINE, FORCENOINLINE, FORCEINLINE_DEBUGGABLE) with equal-length runs of spaces, so byte offsets (and therefore line/column positions) stay exact and the declaration parses as an ordinary function. This recovers both the real name and the return type. It is the same recover-don't-drop approach as blankCppExportMacros (#946/#1061); the two are composed into the cppExtractor preParse.

Scope / safety

Matching is deliberately tight: only the exact known tokens, and only in specifier position (immediately followed by the identifier that starts the return type/name). Controls confirm it does not touch:

  • ordinary functions (FString GetName() unchanged),
  • real all-caps return types (HRESULT DoIt() keeps HRESULT — the curated list never blanks an arbitrary uppercase type),
  • string literals ("FORCEINLINE"), expression uses (x = FORCEINLINE + 1), longer words (FORCEINLINE_COUNT).
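
The blanking step and the controls listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the token list comes from the description, but the regular expression, the function body, and the choice of TypeScript are assumptions. In particular, this sketch does no tokenising, so a macro followed by an identifier inside a string literal or comment would also be blanked.

```typescript
// Hypothetical sketch of an offset-preserving blanking pass. The token list is
// taken from the PR text; the regex and the function body are assumptions.
const INLINE_MACROS: string[] = [
  "FORCEINLINE_DEBUGGABLE",
  "FORCENOINLINE",
  "FORCEINLINE",
];

// Match an exact token (never part of a longer identifier), and only in
// specifier position: followed by whitespace and the start of an identifier.
const INLINE_MACRO_RE = new RegExp(
  "(?<![A-Za-z0-9_])(?:" + INLINE_MACROS.join("|") + ")(?=\\s+[A-Za-z_])",
  "g"
);

function blankCppInlineMacros(source: string): string {
  // Replace each character of the macro with a space, so the output has the
  // same length and every later offset (line/column) is unchanged.
  return source.replace(INLINE_MACRO_RE, (macro) => macro.replace(/./g, " "));
}

const input = "FORCEINLINE FString GetName() const;";
const output = blankCppInlineMacros(input);
console.log(output.length === input.length); // true
console.log(output.indexOf("GetName") === input.indexOf("GetName")); // true
```

Because the lookahead requires whitespace followed by an identifier start, inputs such as `HRESULT DoIt();`, `x = FORCEINLINE + 1;`, `"FORCEINLINE"`, and `FORCEINLINE_COUNT` pass through this sketch unchanged.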

The change is C++-only: Kotlin (okhttp) and Scala (cats) re-index byte-for-byte identically (815 and 3036 type nodes respectively, unchanged).

Validation

  • Full suite: 1875 tests pass (the lone flaky #662 daemon test is unrelated and passes on retry). Five regression tests added.
  • Re-index of both UE repos: GetEnumerationToString now carries the correct name + FString return type; the name-mangle scan across both repos drops to zero.

To cover another codebase's inline macro (e.g. ALWAYS_INLINE), add its exact token to the list.
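
As an illustration of what extending the list could look like, here is a small sketch that builds the blanker from a token list. The helper name `makeInlineMacroBlanker`, the constant `UE_INLINE_MACROS`, and the regex are hypothetical and only mirror the behaviour described in this PR; the real list and its location in the codebase are not shown in the source.

```typescript
// Hypothetical helper: builds a blanker from a list of exact macro tokens.
function makeInlineMacroBlanker(tokens: string[]): (source: string) => string {
  for (const token of tokens) {
    // Only plain identifiers are accepted, so no regex escaping is needed.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(token)) {
      throw new Error("not a plain macro identifier: " + token);
    }
  }
  if (tokens.length === 0) {
    return (source) => source;
  }
  // Exact token, in specifier position only (followed by an identifier).
  const pattern = new RegExp(
    "(?<![A-Za-z0-9_])(?:" + tokens.join("|") + ")(?=\\s+[A-Za-z_])",
    "g"
  );
  // Equal-length blanking keeps every byte offset stable.
  return (source) =>
    source.replace(pattern, (macro) => macro.replace(/./g, " "));
}

// The three UE tokens named in this PR, plus one extra token for another codebase.
const UE_INLINE_MACROS = ["FORCEINLINE", "FORCENOINLINE", "FORCEINLINE_DEBUGGABLE"];
const blankWithExtra = makeInlineMacroBlanker(UE_INLINE_MACROS.concat(["ALWAYS_INLINE"]));

const decl = "ALWAYS_INLINE int Sum(int a, int b);";
console.log(blankWithExtra(decl).length === decl.length); // true
```

Keeping the list to exact tokens means an unrelated identifier such as `ALWAYS_INLINE_COUNT` is still left alone after the new entry is added.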

🤖 Generated with Claude Code

colbymchenry and others added 2 commits July 1, 2026 08:41
…cifier macro

An unknown inline-specifier macro before a function's return type
(`FORCEINLINE FString GetName(…)`) threw tree-sitter into error recovery: the
macro was read as the return type and — for a non-primitive return — the return
type was glued onto the name, so the function was indexed as
`"FString GetName"` instead of `GetName`, unfindable by name and with no caller
links. This is pervasive in Unreal Engine, where inline helpers are written
`FORCEINLINE <ret> <name>(…)` (e.g. ALS's `FORCEINLINE FString GetEnumerationToString`).

Add `blankCppInlineMacros`, a preParse that blanks the known UE inline macros
(`FORCEINLINE`, `FORCENOINLINE`, `FORCEINLINE_DEBUGGABLE`) with equal-length
spaces so byte offsets stay exact and the declaration parses as an ordinary
function — recovering both the real name AND the return type. This is the same
recover-don't-drop approach as blankCppExportMacros (#946/#1061), and the two
are composed into the cppExtractor preParse.

Matched tightly (exact known tokens, only in specifier position — followed by
the identifier that starts the return type/name), so ordinary identifiers, real
all-caps return types (`HRESULT DoIt()`), string literals, expression uses, and
longer words (`FORCEINLINE_COUNT`) are untouched — verified by controls. C++-only;
Kotlin/Scala re-index byte-for-byte identical. Five regression tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1100)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@colbymchenry colbymchenry merged commit 9b2ce1c into main Jul 1, 2026
@colbymchenry colbymchenry deleted the fix/cpp-forceinline-macro-function-names branch July 1, 2026 13:42
colbymchenry added a commit that referenced this pull request Jul 1, 2026
…names (#1102)

* feat(extraction): universal recovery of macro-mangled C/C++ function names

The curated inline-macro blank list (#1100/#1101) can't enumerate every
library's macro. Add a universal post-parse net so a function is findable by
name regardless of which macro decorates it, plus a batch of common libraries
to the curated list for full name+return-type recovery.

- recoverMangledCppName: after extraction, recover the real identifier from a
  name still mangled by an un-blanked macro (`MACRO Ret name(…)` misparses to
  "Ret name"). It's a new `recoverMangledName` extractor hook wired only onto
  C/C++, applied to every name they produce. Safe by construction: it only
  touches an already-mangled name (an internal space that isn't a legit
  `operator …`/destructor), so a clean name is returned unchanged; guarded
  against the `Ret (name)` parenthesized-name idiom and bare primitives. Scoped
  to C/C++ so Kotlin/Scala backtick identifiers (which legitimately contain
  spaces) are never touched.
- Curated list extended past UE/pugixml/Godot/Boost to Qt (Q_INVOKABLE, …),
  Folly, Abseil, LLVM, V8, Eigen, and rapidjson.

Validated on CARLA (large UE project, 1131 C++/h files) vs the pre-fix baseline:
function-name mangles 440 -> 6, 431 fixed, and — critically — 0 regressions
(the salvage also recovers names that the pre-parse's own non-local error-recovery
shifts would otherwise re-mangle, erasing the 7 shifts seen in #1101). The 6
residual are all the moodycamel `Ret (name)` idiom, correctly left alone. On a
made-up macro with no list entry (`WEBKIT_EXPORT WTFString compute()`), the name
`compute` is still recovered. Full suite green; eleven regression/safety tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
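
The post-parse recovery described in this commit might look something like the sketch below. It is an assumption-laden illustration, not the project's `recoverMangledCppName`: the primitive keyword list, the exact guards, and the choice to take the last whitespace-separated token are all guesses based on the commit text.

```typescript
// Hypothetical sketch; the keyword list and rules are assumptions drawn from
// the commit description, not the project's actual code.
const PRIMITIVE_KEYWORDS: string[] = [
  "void", "bool", "char", "short", "int", "long",
  "float", "double", "signed", "unsigned",
];

function recoverMangledCppName(name: string): string {
  const trimmed = name.trim();
  // A clean name has no internal whitespace: return it unchanged.
  if (!/\s/.test(trimmed)) return name;
  // Legitimate names that contain a space: `operator new`, `operator bool`, ...
  if (/(^|::)operator\b/.test(trimmed)) return name;
  // Destructors and the `Ret (name)` parenthesized-name idiom are left alone.
  if (/[~()]/.test(trimmed)) return name;

  const parts = trimmed.split(/\s+/);
  const last = parts[parts.length - 1];
  // Only accept a (possibly qualified) identifier as the recovered name.
  if (!/^(?:[A-Za-z_][A-Za-z0-9_]*::)*[A-Za-z_][A-Za-z0-9_]*$/.test(last)) {
    return name;
  }
  // "unsigned int" is a bare primitive type, not "<Ret> <name>".
  if (PRIMITIVE_KEYWORDS.indexOf(last) !== -1) return name;
  return last;
}

console.log(recoverMangledCppName("WTFString compute")); // compute
console.log(recoverMangledCppName("GetName")); // GetName
```

The early return for names without whitespace is what makes a pass like this safe to apply to every extracted name: a clean name never reaches the rewriting step.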

* docs(changelog): note universal C++ macro-mangled name recovery (#1102)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>