feat(extraction): universal recovery of macro-mangled C/C++ function names #1102

Merged
colbymchenry merged 2 commits into main from feat/cpp-universal-macro-name-recovery
Jul 1, 2026

@colbymchenry

Motivation

The curated inline-macro blank list (#1100/#1101) is precise but per-library: it can't enumerate every codebase's macros, so a new library's `MACRO ReturnType func()` still leaks the return type into the function's name. This PR makes name recovery effectively universal while keeping the curated list for the common libraries.

Two tiers

  1. Curated pre-parse blank (existing, extended): blanks known macros before parsing, which recovers both the name and the return type. Extended past UE/pugixml/Godot/Boost to Qt (Q_INVOKABLE, …), Folly, Abseil, LLVM, V8, Eigen, rapidjson.
  2. Universal post-parse salvage (new): `recoverMangledCppName`, a new `recoverMangledName` extractor hook wired only onto C/C++. After extraction, if a name is still mangled ("WTFString computeThing"), it recovers the identifier that precedes the parameter list (computeThing). Works for any macro, with no list. Only the name is recovered; for unlisted macros the return type stays leaked.
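
To illustrate the first tier, here is a minimal sketch of what "blanking" a curated macro before parsing could look like. It is not the project's implementation: the function name `blank_curated_macros`, the token `EXAMPLE_API`, and the choice to overwrite the macro with spaces of equal length (so source offsets stay stable) are all assumptions made for illustration. It also matches tokens anywhere in the text, which is a simplification.

```python
import re

# Hypothetical subset of a curated list. Q_INVOKABLE is named in this PR;
# EXAMPLE_API is a made-up token used only for illustration.
CURATED_MACROS = frozenset({"Q_INVOKABLE", "EXAMPLE_API"})

# Matches any C-style identifier token.
_TOKEN = re.compile(r"[A-Za-z_]\w*")


def blank_curated_macros(source: str) -> str:
    """Overwrite each curated macro token with spaces of the same length.

    Keeping the length unchanged means every later character stays at its
    original offset, so positions reported by the parser still line up with
    the original file (an assumed design goal, not confirmed by the PR).
    """

    def blank(match):
        word = match.group(0)
        # Only exact, listed tokens are blanked; everything else is kept.
        return " " * len(word) if word in CURATED_MACROS else word

    return _TOKEN.sub(blank, source)
```

With the macro blanked, a declaration such as `Q_INVOKABLE QString title() const;` reaches the parser as an ordinary `QString title() const;`, so both the name and the return type can be extracted. A macro that is not in the set is left untouched, which is the gap the second tier covers.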

Safe by construction

The salvage only ever touches an already-mangled name — one with an internal space that isn't a legit operator …/destructor — so a clean name is returned unchanged. Guarded against the two mis-pick cases: the Ret (name) parenthesized-name idiom (left as-is) and bare primitives. Scoped to C/C++, so Kotlin/Scala backtick identifiers (`decode simple certificate`, which legitimately contain spaces) are never touched.
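
The guards described above can be sketched roughly as follows. This is a hedged approximation rather than the actual `recoverMangledCppName`: the snake_case names, the primitive set, the per-language hook table, and the exact order of the checks are assumptions, and the real hook may handle more cases (qualified names, for example).

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_]\w*")
_OPERATOR = re.compile(r"\boperator\b")

# Assumed set of bare primitive keywords that must never be picked as a name.
_PRIMITIVES = frozenset({
    "void", "bool", "char", "short", "int", "long",
    "float", "double", "signed", "unsigned",
})


def recover_mangled_cpp_name(name: str) -> str:
    """Return the trailing identifier of a mangled name, else the name as-is."""
    parts = name.split()
    if len(parts) < 2:
        return name  # no internal space: already clean
    if "~" in name or _OPERATOR.search(name):
        return name  # legitimate destructor or `operator ...` name
    if "(" in name or ")" in name:
        return name  # `Ret (name)` parenthesized-name idiom: left alone
    tail = parts[-1]
    if not _IDENTIFIER.fullmatch(tail) or tail in _PRIMITIVES:
        return name  # non-identifier tail or bare primitive: do not guess
    return tail


# Hypothetical hook table: the salvage is registered only for C and C++.
RECOVER_HOOKS = {"c": recover_mangled_cpp_name, "cpp": recover_mangled_cpp_name}


def finalize_name(language: str, name: str) -> str:
    """Apply the language's recovery hook if one is registered."""
    hook = RECOVER_HOOKS.get(language)
    return hook(name) if hook else name
```

In this sketch `finalize_name("cpp", "WTFString computeThing")` returns `computeThing`, while `finalize_name("kotlin", "decode simple certificate")` returns its input unchanged because no hook is registered for Kotlin. Every guard falls back to returning the input, so a wrong guess can only leave a name as it was.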

Validation on CARLA (1131 C++/h files)

vs the pre-fix baseline, kind-agnostic by source position:

| | #1101 (curated only) | this PR (+ salvage & libs) |
| --- | --- | --- |
| function-name mangles | 440 → 16 | 440 → 6 |
| fixed | 428 | 431 |
| clean→mangled regressions | 7 | 0 |

The salvage also recovers names that the pre-parse's own non-local error-recovery shifts would otherwise re-mangle — erasing the 7 shifts from #1101. The 6 residual are all the moodycamel Ret (name) idiom, deliberately left alone. On a made-up macro with no list entry (WEBKIT_EXPORT WTFString compute()), compute is still recovered. 0 of CARLA's 10,037 clean names touched.
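
One way a "kind-agnostic by source position" comparison could be computed is sketched below. The PR does not show its measurement script, so the `(file, line)` key, the `compare_runs` function, and the caller-supplied `is_mangled` predicate are assumptions. The idea is to match symbols between two extraction runs by where they sit in the source, ignoring their kind, and classify each pair.

```python
from typing import Callable, Dict, Tuple

# Assumed position key: (file path, line number). The symbol kind is
# deliberately left out so a function reclassified as a method still matches.
Position = Tuple[str, int]


def compare_runs(
    baseline: Dict[Position, str],
    candidate: Dict[Position, str],
    is_mangled: Callable[[str], bool],
) -> Dict[str, int]:
    """Count fixed, regressed, and still-mangled names between two runs."""
    counts = {"fixed": 0, "regressed": 0, "still_mangled": 0}
    for position, old_name in baseline.items():
        new_name = candidate.get(position)
        if new_name is None:
            continue  # symbol absent from the candidate run: not classified
        was, now = is_mangled(old_name), is_mangled(new_name)
        if was and not now:
            counts["fixed"] += 1
        elif not was and now:
            counts["regressed"] += 1  # clean -> mangled
        elif was and now:
            counts["still_mangled"] += 1
    return counts
```

Keying on position rather than on name is what lets a comparison like this detect a clean name that a later change re-mangles, since the name itself differs between the two runs.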

Tests

Full suite green (1880 passed). Eleven cases added: unknown-macro recovery, the recoverMangledCppName guards (operators, destructors, paren-idiom, primitives, non-identifier tails), cross-language safety (Kotlin backtick untouched), and full recovery for each new curated library.

🤖 Generated with Claude Code

colbymchenry and others added 2 commits July 1, 2026 09:28
…names

The curated inline-macro blank list (#1100/#1101) can't enumerate every
library's macro. Add a universal post-parse net so a function is findable by
name regardless of which macro decorates it, plus a batch of common libraries
to the curated list for full name+return-type recovery.

- recoverMangledCppName: after extraction, recover the real identifier from a
  name still mangled by an un-blanked macro (`MACRO Ret name(…)` misparses to
  "Ret name"). It's a new `recoverMangledName` extractor hook wired only onto
  C/C++, applied to every name they produce. Safe by construction: it only
  touches an already-mangled name (an internal space that isn't a legit
  `operator …`/destructor), so a clean name is returned unchanged; guarded
  against the `Ret (name)` parenthesized-name idiom and bare primitives. Scoped
  to C/C++ so Kotlin/Scala backtick identifiers (which legitimately contain
  spaces) are never touched.
- Curated list extended past UE/pugixml/Godot/Boost to Qt (Q_INVOKABLE, …),
  Folly, Abseil, LLVM, V8, Eigen, and rapidjson.

Validated on CARLA (large UE project, 1131 C++/h files) vs the pre-fix baseline:
function-name mangles 440 -> 6, 431 fixed, and — critically — 0 regressions
(the salvage also recovers names that the pre-parse's own non-local error-recovery
shifts would otherwise re-mangle, erasing the 7 shifts seen in #1101). The 6
residual are all the moodycamel `Ret (name)` idiom, correctly left alone. On a
made-up macro with no list entry (`WEBKIT_EXPORT WTFString compute()`), the name
`compute` is still recovered. Full suite green; eleven regression/safety tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@colbymchenry colbymchenry merged commit cb20a3b into main Jul 1, 2026
1 check passed
@colbymchenry colbymchenry deleted the feat/cpp-universal-macro-name-recovery branch July 1, 2026 14:29
colbymchenry added a commit that referenced this pull request Jul 1, 2026
…1103)

* fix(extraction): broaden the curated C++ inline-macro library list

Since #1102 the post-parse salvage already recovers the NAME for any macro, so
adding a library now buys full return-type recovery for it. Extend the curated
list across the major C++ ecosystem: Mozilla/SpiderMonkey, Protobuf, {fmt},
Hedley + nlohmann/json, GLM, Bullet (SIMD_FORCE_INLINE), Skia, OpenCV, EASTL,
Cocos2d-x, Chromium/WebKit (NEVER_INLINE), GLib, SQLite, and the unambiguous
Windows calling conventions (WINAPI / APIENTRY / STDMETHODCALLTYPE / WINAPIV —
which sit between the return type and the name, so blanking them recovers the
return type, e.g. `HRESULT WINAPI Foo()` -> Foo : HRESULT).

Every entry is an exact, curated token matched only in specifier position, so a
real all-caps return type is never touched. Anything still missed keeps its name
via the universal salvage. CARLA control unchanged (440->6 mangles, 0
regressions — none of these libs appear there, confirming no collateral). Eleven
representative full-recovery tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note broadened C++ inline-macro library coverage (#1103)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>