fix(extraction): recognize common third-party C++ inline macros, not just UE #1101

Merged
colbymchenry merged 2 commits into main from fix/cpp-third-party-inline-macros on Jul 1, 2026

@colbymchenry

Owner

Context

Follow-up to #1100 (UE FORCEINLINE recovery), extending it to the inline/linkage macros that vendored third-party C++ libraries define. Surfaced while validating the C++ name-extraction arc on a large UE project (carla-simulator/carla), whose residual name-mangles were third-party library macros (pugixml PUGI__FN, etc.).

Change

blankCppInlineMacros now covers, in addition to UE's FORCEINLINE/FORCENOINLINE/FORCEINLINE_DEBUGGABLE:

  • pugixml: PUGI__FN, PUGI__FN_NO_INLINE (before the return type) and PUGIXML_FUNCTION (linkage macro, between return type and name — the blank mechanism handles both positions).
  • Godot: _FORCE_INLINE_, _ALWAYS_INLINE_.
  • Boost: BOOST_FORCEINLINE, BOOST_NOINLINE.
  • Generic: ALWAYS_INLINE, FORCE_INLINE, NOINLINE.

The list now drives a single generated regex (longest-token-first), so adding a codebase's macro is a one-line change. It still contains only curated, exact tokens, matched in specifier position only, so a real all-caps return type such as HRESULT DoIt() is never blanked (control-tested).
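The token-list-to-regex idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `INLINE_MACROS`, `buildMacroRegex`, and `blankInlineMacrosSketch` are hypothetical, the word-boundary match stands in for the real "specifier position" check (which is not modeled here), and replacing each token with same-length spaces is an assumption made so that source offsets stay stable.

```typescript
// Hypothetical token list, copied from the macros named in this PR description.
const INLINE_MACROS: readonly string[] = [
  "FORCEINLINE", "FORCENOINLINE", "FORCEINLINE_DEBUGGABLE",
  "PUGI__FN", "PUGI__FN_NO_INLINE", "PUGIXML_FUNCTION",
  "_FORCE_INLINE_", "_ALWAYS_INLINE_",
  "BOOST_FORCEINLINE", "BOOST_NOINLINE",
  "ALWAYS_INLINE", "FORCE_INLINE", "NOINLINE",
];

// Build one alternation, longest token first, so a short token such as
// PUGI__FN is never tried before PUGI__FN_NO_INLINE.
function buildMacroRegex(tokens: readonly string[]): RegExp {
  const alternation = [...tokens]
    .sort((a, b) => b.length - a.length)
    // Escape regex metacharacters in case a future token contains any.
    .map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");
  // \b on both sides: only whole identifiers match, never substrings.
  return new RegExp(`\\b(?:${alternation})\\b`, "g");
}

// Assumption: blank with spaces of equal length so byte offsets do not move.
function blankInlineMacrosSketch(source: string): string {
  return source.replace(buildMacroRegex(INLINE_MACROS), (m) => " ".repeat(m.length));
}
```

Under these assumptions, `blankInlineMacrosSketch("PUGI__FN char_t* f();")` leaves `char_t* f();` at its original column, while `HRESULT DoIt();` and an identifier such as `MY_NOINLINE_X` pass through unchanged because neither is an exact listed token.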

Validation on CARLA (1131 C++/h files)

Re-indexed with the pre-fix baseline vs this change (kind-agnostic, by source position):

| Metric | Value |
| --- | --- |
| function-name mangles | 440 → 16 |
| fixed (mangled → clean) | 428 |
| clean → mangled shifts | 7 |

All 16 residual mangles and all 7 shifts are in third-party vendored files (9 pugixml, 6 moodycamel, 1 Qt); none are in CARLA's own LibCarla/Unreal code. The 7 shifts are inside pugixml.cpp, a 12k-line macro amalgamation where tree-sitter's error recovery is non-local: blanking one of several stacked macros (PUGI__FN PUGI__UNSIGNED_OVERFLOW char_t* integer_to_string(…)) shifts an already-imperfect extraction to a differently imperfect one. Of the 7, 1 is a genuine utility regression (a template helper behind a second, unlisted pugixml-internal attribute macro), 5 are lateral (conversion operators that were never correctly named), and 1 is a Qt function-pointer typedef. Chasing pugixml's internal attribute macros is deliberately out of scope.

On normal C++/UE code, blanking a known macro only ever helps — the earlier ActionRoguelike/ALS validations show 0 regressions.

Tests

Full suite green (1876 passed). Seven regression cases added covering pugixml/Godot/Boost/generic macros plus the HRESULT/offset/word-boundary safety controls.

To cover another codebase's inline macro, add its exact token to the list.

🤖 Generated with Claude Code

colbymchenry and others added 2 commits July 1, 2026 09:09
…just UE

Extend blankCppInlineMacros beyond Unreal Engine's FORCEINLINE family to the
inline/linkage macros that vendored third-party libraries define and that
mangle function names the same way:

- pugixml: PUGI__FN / PUGI__FN_NO_INLINE (before the return type) and
  PUGIXML_FUNCTION (linkage macro, between return type and name — the blank
  mechanism handles both positions).
- Godot: _FORCE_INLINE_ / _ALWAYS_INLINE_.
- Boost: BOOST_FORCEINLINE / BOOST_NOINLINE.
- Generic cross-ecosystem hints: ALWAYS_INLINE / FORCE_INLINE / NOINLINE.

The list now drives a single generated alternation (longest-token-first), so
adding a codebase's macro is a one-line change. Still curated exact tokens in
specifier position only — a real all-caps return type like `HRESULT DoIt()` is
never touched (verified by controls).

Validated on CARLA (large UE project, 1131 C++/h files): function-name mangles
440 -> 16 (428 fixed). The 16 residual and 7 clean->mangled shifts are all in
third-party vendored files — chiefly pugixml.cpp, a 12k-line macro amalgamation
where error recovery is non-local, so blanking one of several *stacked* macros
(PUGI__FN + PUGI__UNSIGNED_OVERFLOW …) shifts an already-imperfect extraction.
Normal C++/UE code (ActionRoguelike, ALS) sees zero regressions — blanking a
macro there only helps. Chasing pugixml's internal attribute macros is left out
of scope. Seven regression tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@colbymchenry colbymchenry merged commit a164cea into main Jul 1, 2026
@colbymchenry colbymchenry deleted the fix/cpp-third-party-inline-macros branch July 1, 2026 14:10
colbymchenry added a commit that referenced this pull request Jul 1, 2026
…names (#1102)

* feat(extraction): universal recovery of macro-mangled C/C++ function names

The curated inline-macro blank list (#1100/#1101) can't enumerate every
library's macro. Add a universal post-parse net so a function is findable by
name regardless of which macro decorates it, plus a batch of common libraries
to the curated list for full name+return-type recovery.

- recoverMangledCppName: after extraction, recover the real identifier from a
  name still mangled by an un-blanked macro (`MACRO Ret name(…)` misparses to
  "Ret name"). It's a new `recoverMangledName` extractor hook wired only onto
  C/C++, applied to every name they produce. Safe by construction: it only
  touches an already-mangled name (an internal space that isn't a legit
  `operator …`/destructor), so a clean name is returned unchanged; guarded
  against the `Ret (name)` parenthesized-name idiom and bare primitives. Scoped
  to C/C++ so Kotlin/Scala backtick identifiers (which legitimately contain
  spaces) are never touched.
- Curated list extended past UE/pugixml/Godot/Boost to Qt (Q_INVOKABLE, …),
  Folly, Abseil, LLVM, V8, Eigen, and rapidjson.
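
The post-parse recovery described in the first bullet can be sketched as below. This is an illustrative approximation under stated assumptions, not the actual `recoverMangledCppName`: the function name `recoverMangledNameSketch`, the `PRIMITIVES` set, and the specific regexes are all hypothetical, and the real guards may differ in detail.

```typescript
// Assumed set of bare primitive type words that must never be returned as a name.
const PRIMITIVES = new Set([
  "void", "bool", "char", "short", "int", "long",
  "float", "double", "signed", "unsigned", "auto",
]);

function recoverMangledNameSketch(name: string): string {
  const trimmed = name.trim();
  // A clean name has no internal whitespace: return it unchanged.
  if (!/\s/.test(trimmed)) return name;
  // Legitimate spaced names: `operator bool`, `Foo::operator ==`, destructors.
  if (/(^|::)\s*operator\b/.test(trimmed)) return name;
  if (trimmed.includes("~")) return name;
  // The `Ret (name)` parenthesized-name idiom is deliberately left alone.
  if (/[()]/.test(trimmed)) return name;
  // Take the trailing (optionally qualified) identifier as the real name.
  const match = /([A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*)$/.exec(trimmed);
  if (!match) return name;
  const candidate = match[1];
  // Guard against bare primitives such as "unsigned int".
  return PRIMITIVES.has(candidate) ? name : candidate;
}
```

With this sketch, a name mangled as `WTFString compute` comes back as `compute`, while `compute`, `operator bool`, `Ret (name)`, and `unsigned int` are returned as given.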

Validated on CARLA (large UE project, 1131 C++/h files) vs the pre-fix baseline:
function-name mangles 440 -> 6, 431 fixed, and, critically, 0 regressions
(the salvage also recovers names that the non-local error-recovery shifts caused
by pre-parse blanking would otherwise re-mangle, erasing the 7 shifts seen in #1101). The 6
residual are all the moodycamel `Ret (name)` idiom, correctly left alone. On a
made-up macro with no list entry (`WEBKIT_EXPORT WTFString compute()`), the name
`compute` is still recovered. Full suite green; eleven regression/safety tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note universal C++ macro-mangled name recovery (#1102)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>