feat(extraction): universal recovery of macro-mangled C/C++ function names #1102
Merged
…names

The curated inline-macro blank list (#1100/#1101) can't enumerate every library's macro. Add a universal post-parse net so a function is findable by name regardless of which macro decorates it, plus a batch of common libraries to the curated list for full name+return-type recovery.

- recoverMangledCppName: after extraction, recover the real identifier from a name still mangled by an un-blanked macro (`MACRO Ret name(…)` misparses to "Ret name"). It's a new `recoverMangledName` extractor hook wired only onto C/C++, applied to every name they produce. Safe by construction: it only touches an already-mangled name (an internal space that isn't a legit `operator …`/destructor), so a clean name is returned unchanged; guarded against the `Ret (name)` parenthesized-name idiom and bare primitives. Scoped to C/C++ so Kotlin/Scala backtick identifiers (which legitimately contain spaces) are never touched.
- Curated list extended past UE/pugixml/Godot/Boost to Qt (Q_INVOKABLE, …), Folly, Abseil, LLVM, V8, Eigen, and rapidjson.

Validated on CARLA (large UE project, 1131 C++/h files) vs the pre-fix baseline: function-name mangles 440 -> 6, 431 fixed, and, critically, 0 regressions (the salvage also recovers names that the pre-parse's own non-local error-recovery shifts would otherwise re-mangle, erasing the 7 shifts seen in #1101). The 6 residual are all the moodycamel `Ret (name)` idiom, correctly left alone. On a made-up macro with no list entry (`WEBKIT_EXPORT WTFString compute()`), the name `compute` is still recovered.

Full suite green; eleven regression/safety tests added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
colbymchenry added a commit that referenced this pull request Jul 1, 2026
…1103)

* fix(extraction): broaden the curated C++ inline-macro library list

  Since #1102 the post-parse salvage already recovers the NAME for any macro, so adding a library now buys full return-type recovery for it. Extend the curated list across the major C++ ecosystem: Mozilla/SpiderMonkey, Protobuf, {fmt}, Hedley + nlohmann/json, GLM, Bullet (SIMD_FORCE_INLINE), Skia, OpenCV, EASTL, Cocos2d-x, Chromium/WebKit (NEVER_INLINE), GLib, SQLite, and the unambiguous Windows calling conventions (WINAPI / APIENTRY / STDMETHODCALLTYPE / WINAPIV). The calling conventions sit between the return type and the name, so blanking them recovers the return type, e.g. `HRESULT WINAPI Foo()` -> Foo : HRESULT.

  Every entry is an exact, curated token matched only in specifier position, so a real all-caps return type is never touched. Anything still missed keeps its name via the universal salvage.

  CARLA control unchanged (440->6 mangles, 0 regressions; none of these libs appear there, confirming no collateral). Eleven representative full-recovery tests added.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note broadened C++ inline-macro library coverage (#1103)

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
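The specifier-position blanking described in the commit above can be pictured with a small sketch. It is illustrative only and not the project's code: the function name `blank_specifier_macros`, the `CURATED` subset, the choice to overwrite a macro with same-length spaces, and the reading of "specifier position" as "any identifier before the function name" are all assumptions made for the example.

```python
import re

# Assumed subset of the curated list; these tokens are named in the commit message.
CURATED = {"WINAPI", "APIENTRY", "STDMETHODCALLTYPE", "WINAPIV",
           "SIMD_FORCE_INLINE", "NEVER_INLINE"}
IDENT = re.compile(r"[A-Za-z_]\w*")


def blank_specifier_macros(decl: str) -> str:
    """Blank curated macros that precede the function name (hypothetical helper).

    Each macro is overwritten with spaces of the same length, so every other
    character keeps its original offset in the source text.
    """
    paren = decl.find("(")
    if paren == -1:
        return decl  # not a function-like declaration; leave it alone
    tokens = list(IDENT.finditer(decl[:paren]))
    if len(tokens) < 2:
        return decl  # only a name, nothing in specifier position
    chars = list(decl)
    # The last identifier before "(" is the function name; never blank it.
    for match in tokens[:-1]:
        if match.group() in CURATED:
            width = match.end() - match.start()
            chars[match.start():match.end()] = " " * width
    return "".join(chars)
```

Under these assumptions, `HRESULT WINAPI Foo()` keeps `HRESULT` and `Foo` at their original offsets with `WINAPI` blanked out, while an all-caps return type that is not on the list (for example `DWORD`) is left untouched because only exact curated tokens match.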
Motivation
The curated inline-macro blank list (#1100/#1101) is precise but per-library: it can't enumerate every codebase's macro, so a new library's `MACRO ReturnType func()` still leaks the return type into the function's name. This PR makes name recovery effectively universal while keeping the curated list for the common libraries.

Two tiers
- Curated list (full name + return-type recovery): extended past UE/pugixml/Godot/Boost to Qt (`Q_INVOKABLE`, …), Folly, Abseil, LLVM, V8, Eigen, rapidjson.
- Universal salvage: `recoverMangledCppName`, a new `recoverMangledName` extractor hook wired only onto C/C++. After extraction, if a name is still mangled (`"WTFString computeThing"`), recover the identifier before the params (`computeThing`). Works for any macro, with no list needed. It recovers the name only; the return type stays leaked for unlisted macros.

Safe by construction
The salvage only ever touches an already-mangled name (one with an internal space that isn't a legit `operator …`/destructor), so a clean name is returned unchanged. Guarded against the two mis-pick cases: the `Ret (name)` parenthesized-name idiom (left as-is) and bare primitives. Scoped to C/C++, so Kotlin/Scala backtick identifiers (`decode simple certificate`, which legitimately contain spaces) are never touched.

Validation on CARLA (1131 C++/h files)
Vs the pre-fix baseline, kind-agnostic by source position: function-name mangles 440 -> 6, 431 fixed, 0 regressions.

The salvage also recovers names that the pre-parse's own non-local error-recovery shifts would otherwise re-mangle, erasing the 7 shifts from #1101. The 6 residual are all the moodycamel `Ret (name)` idiom, deliberately left alone. On a made-up macro with no list entry (`WEBKIT_EXPORT WTFString compute()`), `compute` is still recovered. 0 of CARLA's 10,037 clean names touched.

Tests
Full suite green (1880 passed). Eleven cases added: unknown-macro recovery, the `recoverMangledCppName` guards (operators, destructors, paren-idiom, primitives, non-identifier tails), cross-language safety (Kotlin backtick untouched), and full recovery for each new curated library.

🤖 Generated with Claude Code
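As a reading aid for the guard cases listed under Tests, here is a minimal sketch of how a post-parse salvage of this shape could behave. It is not the PR's implementation: the Python names `recover_mangled_name`, `finalize_name`, `PRIMITIVES`, and `IDENTIFIER` are hypothetical, the primitive set is an assumed subset, and the real hook may apply its guards differently.

```python
import re

# Hypothetical names throughout; this only mirrors the behaviour described above.
PRIMITIVES = {"void", "bool", "char", "short", "int", "long",
              "float", "double", "signed", "unsigned"}
# A plain or ::-qualified identifier.
IDENTIFIER = re.compile(r"(?:[A-Za-z_]\w*::)*[A-Za-z_]\w*")


def recover_mangled_name(name: str) -> str:
    """Return the trailing identifier of a mangled name, else the name unchanged."""
    stripped = name.strip()
    # A clean name has no internal whitespace and is never touched.
    if not re.search(r"\s", stripped):
        return name
    # Legitimate spaced names: `operator ...` and destructors.
    if re.search(r"\boperator\b", stripped) or "~" in stripped:
        return name
    # The `Ret (name)` parenthesized-name idiom is deliberately left alone.
    if "(" in stripped or ")" in stripped:
        return name
    tail = stripped.split()[-1]
    # Non-identifier tails and bare primitives are not safe picks.
    if not IDENTIFIER.fullmatch(tail) or tail in PRIMITIVES:
        return name
    return tail


def finalize_name(name: str, language: str) -> str:
    """Apply the salvage only for C/C++, so other languages are never touched."""
    if language in {"c", "cpp"}:
        return recover_mangled_name(name)
    return name
```

With this sketch, `"WTFString computeThing"` yields `computeThing`, while `"operator =="`, `"Ret (name)"`, and `"unsigned int"` come back unchanged, and a Kotlin name such as `decode simple certificate` is passed through untouched because the hook is never invoked for that language.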