Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- C++ methods that return a reference, and user-defined conversion operators, are now indexed under their correct names. An inline getter like `const FGameplayTagContainer& GetActiveTags() const` — everywhere in Unreal Engine headers — was indexed as `& GetActiveTags() const` instead of `GetActiveTags`, and a conversion operator like `operator EALSMovementState() const` kept its trailing `() const` instead of reading `operator EALSMovementState`. In both cases the garbled name meant you couldn't find the symbol by name and its callers weren't linked. Both now read cleanly, matching how pointer-returning and value-returning methods already worked. (#1096)
- C++ functions written with an inline-specifier macro before the return type are now indexed correctly. In Unreal Engine, inline helpers are commonly written `FORCEINLINE FString GetEnumerationToString(...)`; the `FORCEINLINE` macro made the parser read the return type as part of the function's name (`FString GetEnumerationToString` instead of `GetEnumerationToString`) and lose the real return type, so the function couldn't be found by name and its callers weren't linked. CodeGraph now recognizes the standard Unreal inline macros (`FORCEINLINE`, `FORCENOINLINE`, `FORCEINLINE_DEBUGGABLE`), so both the name and the return type are captured. (#1100)
- The same function-name recovery now covers inline macros from common third-party C++ libraries, not just Unreal Engine — including pugixml (`PUGI__FN`, `PUGIXML_FUNCTION`), Godot (`_FORCE_INLINE_`), Boost (`BOOST_FORCEINLINE`), and generic `ALWAYS_INLINE` / `FORCE_INLINE`. Functions decorated with these are now indexed under their real names. On a large Unreal project vendoring these libraries this cleaned up the large majority of remaining function-name garbling. (#1101)
- C++ function names are now recovered even when decorated with a macro CodeGraph doesn't specifically know about. A function written `SOME_LIBRARY_MACRO ReturnType doWork(...)` previously had the macro or return type absorbed into its name whenever the macro wasn't one CodeGraph recognized; now the real name (`doWork`) is recovered regardless of the macro, so it's findable and its callers link — no per-library configuration needed. The recognized-macro list was also broadened (Qt, Folly, Abseil, LLVM, V8, Eigen, rapidjson) so those additionally capture the return type. This only ever cleans up an already-garbled name and is limited to C and C++, so ordinary names — and languages like Kotlin and Scala where identifiers can legitimately contain spaces — are unaffected. (#1102)


## [1.1.6] - 2026-06-30
57 changes: 56 additions & 1 deletion __tests__/extraction.test.ts
@@ -11,7 +11,7 @@ import * as os from 'os';
import { CodeGraph } from '../src';
import { extractFromSource, scanDirectory, buildDefaultIgnore, discoverEmbeddedRepoRoots, buildScopeIgnore } from '../src/extraction';
import { detectLanguage, isLanguageSupported, getSupportedLanguages, initGrammars, loadAllGrammars, isSourceFile } from '../src/extraction/grammars';
- import { stripCppTemplateArgs, blankCppExportMacros, blankCppInlineMacros } from '../src/extraction/languages/c-cpp';
+ import { stripCppTemplateArgs, blankCppExportMacros, blankCppInlineMacros, recoverMangledCppName } from '../src/extraction/languages/c-cpp';
import { normalizePath } from '../src/utils';

beforeAll(async () => {
@@ -2995,6 +2995,61 @@ class APXCharacter { // the one real definition
});
});

describe('C++ universal macro-mangled name recovery', () => {
  // Curated pre-parse blanking can't list every library's inline macro, so a
  // post-parse salvage recovers the real function name from ANY leftover
  // `MACRO Ret name(…)` mangle; no list needed. It only ever touches an
  // already-mangled name, so it can't corrupt a clean one.
  const namesOf = (code: string, file = 's.cpp') =>
    extractFromSource(file, code).nodes
      .filter((n) => n.kind === 'method' || n.kind === 'function')
      .map((n) => n.name);

  it('recovers the name from a completely unknown macro (no list entry)', () => {
    expect(namesOf('WEBKIT_EXPORT WTFString computeThing(int x) { return H(x); }')).toContain('computeThing');
    expect(namesOf('SOMELIB_INLINE MyResult doWork(int x) { return H(x); }')).toContain('doWork');
    expect(namesOf('MZ_FORCEINLINE char_t* to_str(double v) { return H(v); }')).toContain('to_str');
  });

  it('recoverMangledCppName only touches already-mangled names, with guards', () => {
    // Recovered:
    expect(recoverMangledCppName('WTFString computeThing')).toBe('computeThing');
    expect(recoverMangledCppName('char_t* to_str(double v)')).toBe('to_str');
    expect(recoverMangledCppName('unspecified_bool_type() const')).toBe('unspecified_bool_type');
    // Left unchanged: clean names, operators, destructors, the `Ret (name)`
    // idiom, and non-identifier tails.
    expect(recoverMangledCppName('computeThing')).toBe('computeThing');
    expect(recoverMangledCppName('operator EALSMovementState')).toBe('operator EALSMovementState');
    expect(recoverMangledCppName('~Widget')).toBe('~Widget');
    expect(recoverMangledCppName('bool (likely)')).toBe('bool (likely)');
    expect(recoverMangledCppName('void (free)')).toBe('void (free)');
    expect(recoverMangledCppName('QDockWidget *')).toBe('QDockWidget *');
  });

  it('does not disturb clean C++ names or non-C++ (Kotlin backtick) names', () => {
    expect(namesOf('int foo(int x) { return x; }')).toEqual(['foo']);
    // Kotlin backtick identifiers legitimately contain spaces; the salvage is
    // C/C++-only, so they are untouched.
    const kt = extractFromSource('T.kt', 'class T {\n fun `decode simple cert`() { }\n}').nodes
      .filter((n) => n.kind === 'method' || n.kind === 'function')
      .map((n) => n.name);
    expect(kt).toContain('`decode simple cert`');
  });

  it('curated list now also covers Qt / Folly / Abseil / LLVM / V8 / Eigen / rapidjson (full recovery)', () => {
    const info = (c: string) =>
      extractFromSource('x.cpp', c).nodes
        .filter((n) => n.kind === 'method' || n.kind === 'function')
        .map((n) => ({ name: n.name, ret: n.returnType }));
    expect(info('FOLLY_ALWAYS_INLINE Str f(int x) { return H(x); }')).toEqual([{ name: 'f', ret: 'Str' }]);
    expect(namesOf('Q_INVOKABLE void onClicked() { H(); }')).toContain('onClicked');
    expect(namesOf('ABSL_ATTRIBUTE_ALWAYS_INLINE int hash(int x) { return H(x); }')).toContain('hash');
    expect(namesOf('EIGEN_STRONG_INLINE Scalar dot(const V& v) { return H(v); }')).toContain('dot');
    expect(namesOf('V8_INLINE MaybeLocal Get(int i) { return H(i); }')).toContain('Get');
    expect(namesOf('RAPIDJSON_FORCEINLINE bool Parse(const char* s) { return H(s); }')).toContain('Parse');
  });
});

describe('C++ templated base-class inheritance (#1043)', () => {
// Inheriting from a template (`class D : public Base<int>`) recorded the base
// ref as the full instantiation `Base<int>`, which never name-matched the
47 changes: 47 additions & 0 deletions src/extraction/languages/c-cpp.ts
@@ -123,6 +123,8 @@ function extractCppReturnType(node: SyntaxNode, source: string): string | undefi
}

export const cExtractor: LanguageExtractor = {
  // Universal net: recover a real name from any macro-mangled function name.
  recoverMangledName: recoverMangledCppName,
  functionTypes: ['function_definition'],
  classTypes: [],
  methodTypes: [],
@@ -275,6 +277,15 @@ const CPP_INLINE_MACROS = [
  '_ALWAYS_INLINE_', '_FORCE_INLINE_',
  // Boost
  'BOOST_FORCEINLINE', 'BOOST_NOINLINE',
  // Qt (per-method markers + inline)
  'Q_INVOKABLE', 'Q_SCRIPTABLE', 'Q_ALWAYS_INLINE', 'Q_SLOT', 'Q_SIGNAL',
  // Folly / Abseil / LLVM / V8 / Eigen / rapidjson
  'FOLLY_ALWAYS_INLINE', 'FOLLY_NOINLINE',
  'ABSL_ATTRIBUTE_ALWAYS_INLINE', 'ABSL_ATTRIBUTE_NOINLINE',
  'LLVM_ATTRIBUTE_ALWAYS_INLINE', 'LLVM_ATTRIBUTE_NOINLINE',
  'V8_INLINE', 'V8_NOINLINE',
  'EIGEN_STRONG_INLINE', 'EIGEN_ALWAYS_INLINE', 'EIGEN_DEVICE_FUNC',
  'RAPIDJSON_FORCEINLINE',
  // Common cross-ecosystem inline/attribute hints
  'ALWAYS_INLINE', 'FORCE_INLINE', 'NOINLINE',
] as const;
@@ -288,6 +299,40 @@ export function blankCppInlineMacros(source: string): string {
  return source.replace(CPP_INLINE_MACRO_RE, (m) => ' '.repeat(m.length));
}
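The blanking step above can be sketched in isolation to show why it is offset-preserving. This is a minimal illustration under stated assumptions: `SKETCH_MACROS`, `SKETCH_MACRO_RE`, and `blankMacros` are hypothetical names, and the whole-word pattern is a guess, since the PR's actual `CPP_INLINE_MACRO_RE` is defined outside this hunk and may differ.

```typescript
// Hypothetical, shortened macro list; the real CPP_INLINE_MACROS is longer.
const SKETCH_MACROS = ['FORCEINLINE', 'V8_INLINE', 'Q_INVOKABLE'] as const;

// Assumed pattern: match any listed macro as a whole word.
const SKETCH_MACRO_RE = new RegExp(`\\b(?:${SKETCH_MACROS.join('|')})\\b`, 'g');

// Replace each macro with the same number of spaces, so every character
// after it keeps its original offset in the source.
function blankMacros(source: string): string {
  return source.replace(SKETCH_MACRO_RE, (m) => ' '.repeat(m.length));
}

const before = 'FORCEINLINE FString GetName() { return Name; }';
const after = blankMacros(before);

// Both checks compare positions before and after blanking.
console.log(after.length === before.length);
console.log(after.indexOf('GetName') === before.indexOf('GetName'));
```

Because the length never changes, positions reported by a parser run on the blanked text would still point at the right characters in the original file.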

// Bare C/C++ type/qualifier tokens that must never be taken as a recovered
// function name (guards `recoverMangledCppName` against the `Ret (name)` idiom,
// where the token before the params is the return type, not the name).
const CPP_PRIMITIVE_NAMES = new Set([
  'bool', 'void', 'int', 'char', 'short', 'long', 'float', 'double', 'unsigned',
  'signed', 'wchar_t', 'char8_t', 'char16_t', 'char32_t', 'char_t', 'size_t',
  'auto', 'const', 'struct', 'class', 'enum', 'union', 'typename',
]);

/**
 * Universal fallback (any macro, no list) for a C/C++ function name that is still
 * mangled because a macro we don't blank sat in front of the return type:
 * `MACRO Ret name(…)` / `Ret MACRO name(…)` misparse so the return type is glued
 * onto the name ("Ret name", "char_t* to_str(double v)"). Recover the real
 * identifier: the token immediately before the parameter list, or the last token
 * when there is no parameter list. This runs AFTER the curated pre-parse blank, so
 * it only sees the residue that blanking didn't fix. Blanking remains the better
 * path for listed macros because it also recovers the return type.
 *
 * Safe by construction: only touches an ALREADY-mangled name (one with internal
 * whitespace that isn't a legit `operator …` or destructor), so a well-formed
 * name is returned unchanged. Guarded against the two ways it could mis-pick:
 * the `Ret (name)` parenthesized-name idiom (left as-is, ambiguous), and a token
 * that is a bare primitive/keyword rather than a real identifier.
 */
export function recoverMangledCppName(name: string): string {
  // Clean names (no whitespace), operators, and destructors pass through untouched.
  if (!/\s/.test(name) || name.startsWith('operator') || name.startsWith('~')) return name;
  if (/^\S+\s+\([A-Za-z_]\w*\)/.test(name)) return name; // `Ret (name)` idiom: leave alone
  // Drop the parameter list (if any), then take the last whitespace-separated token.
  const beforeParams = name.includes('(') ? name.slice(0, name.indexOf('(')) : name;
  const tokens = beforeParams.trim().split(/\s+/);
  const candidate = tokens[tokens.length - 1];
  // Reject non-identifiers (e.g. a trailing `*`) and bare primitives/keywords.
  if (!candidate || !/^[A-Za-z_]\w*$/.test(candidate) || CPP_PRIMITIVE_NAMES.has(candidate)) return name;
  return candidate;
}

/** C/C++ source pre-processing before tree-sitter: recover both macro-annotated
* class definitions and macro-prefixed function definitions. Offset-preserving. */
function preParseCppSource(source: string): string {
@@ -299,6 +344,8 @@ export const cppExtractor: LanguageExtractor = {
  // #1061/#946) and macro-prefixed functions (`FORCEINLINE FString Foo()`, #1093
  // follow-up) that tree-sitter otherwise misparses.
  preParse: preParseCppSource,
  // Universal net for any macro the curated blank list misses.
  recoverMangledName: recoverMangledCppName,
  functionTypes: ['function_definition'],
  classTypes: ['class_specifier'],
  // A bodiless `class_specifier` is a forward declaration (`class Foo;`) or an
10 changes: 10 additions & 0 deletions src/extraction/tree-sitter-types.ts
@@ -133,6 +133,16 @@ export interface LanguageExtractor {
  /** Override symbol name extraction (e.g. ObjC multi-part selectors). */
  resolveName?: (node: SyntaxNode, source: string) => string | undefined;

  /**
   * Post-process an already-extracted name to recover a real identifier from a
   * name still mangled by a macro the pre-parse didn't blank (C/C++:
   * `MACRO Ret name(` misparses to the name "Ret name"). Applied to every name
   * this extractor produces, so it MUST be a no-op on a well-formed name. Only
   * the C/C++ extractors set it, because a mangled name there is unambiguous (an
   * internal space), whereas e.g. Kotlin/Scala backtick identifiers legitimately
   * contain spaces.
   */
  recoverMangledName?: (name: string) => string;

  /** Extract property name when the generic name walk fails (e.g. ObjC @property). */
  extractPropertyName?: (node: SyntaxNode, source: string) => string | null;

8 changes: 8 additions & 0 deletions src/extraction/tree-sitter.ts
@@ -63,6 +63,14 @@ const VUE_STORE_FILE_SIGNAL = /\bdefineStore\b|\bcreateStore\b|\bVuex\b|\bmutati
 * Extract the name from a node based on language
 */
function extractName(node: SyntaxNode, source: string, extractor: LanguageExtractor): string {
  const name = extractNameRaw(node, source, extractor);
  // Universal fallback: recover a real identifier from a name still mangled by a
  // macro the pre-parse didn't blank (C/C++ only; see recoverMangledName). A
  // no-op on well-formed names, so a clean name is never altered.
  return extractor.recoverMangledName ? extractor.recoverMangledName(name) : name;
}
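The wrapper above relies on an optional per-language hook, which can be sketched on its own. This is a rough illustration, not the PR's code: `SketchExtractor`, `applyRecovery`, `cLike`, and `kotlinLike` are hypothetical names, and the toy hook only keeps the last token, without the guards that `recoverMangledCppName` applies.

```typescript
// Hypothetical, trimmed-down stand-in for LanguageExtractor: only the optional hook.
interface SketchExtractor {
  recoverMangledName?: (name: string) => string;
}

// Apply the hook when the extractor defines one; otherwise pass the name through.
function applyRecovery(rawName: string, extractor: SketchExtractor): string {
  return extractor.recoverMangledName ? extractor.recoverMangledName(rawName) : rawName;
}

// Toy hook for illustration only (NOT the PR's recoverMangledCppName):
// keep the last whitespace-separated token.
const cLike: SketchExtractor = {
  recoverMangledName: (name) => name.trim().split(/\s+/).pop() ?? name,
};

// No hook defined, so names that legitimately contain spaces are left alone.
const kotlinLike: SketchExtractor = {};

const recovered = applyRecovery('WTFString computeThing', cLike);
const untouched = applyRecovery('`decode simple cert`', kotlinLike);
console.log(recovered, untouched);
```

Keeping the hook optional means languages that never set it pay no cost and cannot be affected by a C/C++-specific heuristic.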

function extractNameRaw(node: SyntaxNode, source: string, extractor: LanguageExtractor): string {
  const hookName = extractor.resolveName?.(node, source);
  if (hookName) return hookName;
