Summary
When a C# constructor's : base(...) or : this(...) initializer is placed on its own wrapped line (common in Allman-style formatting), SymbolExtractor matches that initializer line as a method declaration and emits a phantom function symbol named base or this. The C# method regex's returnType character class includes :, so a line consisting of whitespace + : + base|this + (args) matches the method pattern with returnType=":", name="base"|"this". Only the wrapped form is affected: same-line public Foo() : base(x) { } is covered by a preceding constructor match, and the initializer on that line is not matched again.
Repro
CDIDX=/root/.local/bin/cdidx
mkdir -p /tmp/dogfood/cs-ctor-chain && cat > /tmp/dogfood/cs-ctor-chain/C.cs <<'EOF'
namespace CtorChain;

public class Base
{
    public Base() { }
    public Base(int x) { }
    public Base(string s, int n) { }
}

public class Derived : Base
{
    // `: base(...)` on same line
    public Derived(int x) : base(x) { }

    // `: base(...)` on next line
    public Derived(string s)
        : base(s, 0)
    {
    }

    // `: this(...)` chain
    public Derived() : this(0) { }

    // `: this(...)` on next line
    public Derived(int a, int b)
        : this(a)
    {
    }

    // Expression-bodied constructor
    public Derived(double d) : base((int)d, "d") => System.Console.WriteLine(d);
}
EOF
"$CDIDX" index /tmp/dogfood/cs-ctor-chain
"$CDIDX" symbols --path "cs-ctor-chain/*" --kind function
Observed (phantoms mixed in with real ctors):
function base C.cs:17-19
function this C.cs:26-28
base and this are reserved C# keywords, not contextual ones, so neither can be a method name. Any function base or function this row is a false positive.
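Because those two names can never be legitimate, the symbols output can be checked mechanically. A minimal sketch, assuming each row has the whitespace-separated shape seen in the Observed block above (kind, name, file:start-end). The helper name phantom_rows is hypothetical, and the function Derived row in the sample is made up for contrast; only the base and this rows come from the actual output.

```python
# Names that can never be real C# method names, so any "function" row
# carrying one of them is a phantom.
PHANTOM_NAMES = {"base", "this"}


def phantom_rows(symbols_output):
    """Return the function rows whose symbol name is base or this."""
    rows = []
    for line in symbols_output.splitlines():
        parts = line.split()
        # Assumed row shape: <kind> <name> <file>:<start>-<end>
        if len(parts) >= 3 and parts[0] == "function" and parts[1] in PHANTOM_NAMES:
            rows.append(line.strip())
    return rows


# The first row is a hypothetical real ctor; the other two are the observed phantoms.
sample = """\
function Derived C.cs:13-13
function base C.cs:17-19
function this C.cs:26-28
"""

for row in phantom_rows(sample):
    print("phantom:", row)
```

A check like this could serve as a regression test once the regex is fixed: on the repro fixture it should return an empty list.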
Suspected root cause
src/CodeIndex/Indexer/SymbolExtractor.cs:94 is the C# method-declaration row.
The returnType character class is [\w?.<>\[\],:]+ — it explicitly includes :. On a wrapped initializer line like : base(s, 0):
The leading : is consumed as returnType.
base is consumed as name (the name capture has no exclusion for reserved keywords).
(s, 0) is consumed as paren.
The full method pattern matches.
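The match can be reproduced outside the indexer. The sketch below is a Python port of the :94 row from this issue, not the indexer's own code: the only change is that .NET (?<name>...) groups are written as Python (?P<name>...). The helper probe is hypothetical.

```python
import re

# Python port of the C# method-declaration row quoted in this issue.
# Only the named-group syntax differs from the .NET original.
CS_METHOD = re.compile(
    r"^\s*(?:(?:public|private|protected|internal|static|virtual|override"
    r"|sealed|abstract|async|extern|new|unsafe|partial)\s+)*"
    r"(?P<returnType>\([^)]+\)|(?:global::)?[\w?.<>\[\],:]+)\s+"
    r"(?P<name>\w+)\s*(?:<[^>]+>)?\s*\((?P<paren>[^)]*)\)"
)


def probe(line):
    """Return (returnType, name, paren) if the line matches, else None."""
    m = CS_METHOD.match(line)
    return m.group("returnType", "name", "paren") if m else None


# Wrapped initializer line, as on C.cs line 17 of the repro fixture.
wrapped = probe("        : base(s, 0)")
print(wrapped)  # (':', 'base', 's, 0')

# Same-line form: the pattern is anchored at line start, so the
# initializer in the middle of the line cannot become the name.
same_line = probe("    public Derived(int x) : base(x) { }")
print(same_line[1])  # Derived
```

The wrapped line yields returnType=":" and name="base", which is the phantom. The same-line form never yields base as the name because the pattern is anchored with ^.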
The reason : ever made it into the char class appears to be to support edge cases like global:: prefixes — but global:: is already covered by the explicit (?:global::)? prefix, so the stray : in the char class is redundant and harmful.
Suggested direction
Two approaches, either sufficient:
(A) Drop : from the returnType char class. Change :94 from [\w?.<>\[\],:]+ to [\w?.<>\[\],]+. The global:: prefix is already handled by the explicit (?:global::)? alternative. A quick audit of the other C# rows shows no other pattern depends on : inside returnType. Simplest, lowest-risk fix.
(B) Skip lines whose first non-whitespace character is :. Add a guard at the extraction loop (:441-452) that short-circuits lines matching ^\s*:. Also catches any future regex that becomes permissive in the same direction. Slightly safer but adds a per-line check.
Preferred: (A). It removes the root cause from the regex where it originated.
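Both directions can be sketched against the same Python port of the regex (named groups rewritten as (?P<...>), otherwise unchanged). The names FIXED, name_with_a and skipped_by_b are hypothetical, and the guard shows only the idea of (B), not the actual loop at :441-452.

```python
import re

MODS = (
    r"(?:(?:public|private|protected|internal|static|virtual|override"
    r"|sealed|abstract|async|extern|new|unsafe|partial)\s+)*"
)
TAIL = r"\s+(?P<name>\w+)\s*(?:<[^>]+>)?\s*\((?P<paren>[^)]*)\)"

# (A) Same row, with ':' dropped from the returnType character class.
FIXED = re.compile(
    r"^\s*" + MODS
    + r"(?P<returnType>\([^)]+\)|(?:global::)?[\w?.<>\[\],]+)"
    + TAIL
)

# (B) Per-line guard: first non-whitespace character is ':'.
INITIALIZER_LINE = re.compile(r"^\s*:")


def name_with_a(line):
    """Method name captured by the (A) regex, or None if the line does not match."""
    m = FIXED.match(line)
    return m.group("name") if m else None


def skipped_by_b(line):
    """True if the (B) guard would short-circuit this line."""
    return INITIALIZER_LINE.match(line) is not None


print(name_with_a("        : base(s, 0)"))                            # None
print(name_with_a("    public global::System.Int32 Parse(string s)")) # Parse
print(name_with_a("    public List<int> Items(int n)"))               # Items
print(skipped_by_b("        : this(a)"))                              # True
print(skipped_by_b("    public void M()"))                            # False
```

With (A), the wrapped initializer no longer matches, while a global::-qualified return type still does because of the explicit prefix. (B) rejects the line before any regex runs.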
An auxiliary hardening worth considering regardless of which approach is picked: exclude base and this (and new, which appears in C# as an operator and modifier but never as a bare method name here) from the name capture via a negative lookahead with a word boundary, (?<name>(?!(?:base|this|new)\b)\w+), so that a future accidentally permissive change can't reintroduce this class of phantom.
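To show that the lookahead is sufficient on its own, the sketch below deliberately keeps the permissive returnType class (with :) and changes only the name capture. It is a Python port with (?P<...>) groups. It excludes new as well, as the parenthetical above suggests, and HARDENED and hardened_name are hypothetical names.

```python
import re

# Original permissive returnType class (':' still present); only the
# name capture is hardened with a keyword-excluding negative lookahead.
HARDENED = re.compile(
    r"^\s*(?:(?:public|private|protected|internal|static|virtual|override"
    r"|sealed|abstract|async|extern|new|unsafe|partial)\s+)*"
    r"(?P<returnType>\([^)]+\)|(?:global::)?[\w?.<>\[\],:]+)\s+"
    r"(?P<name>(?!(?:base|this|new)\b)\w+)"
    r"\s*(?:<[^>]+>)?\s*\((?P<paren>[^)]*)\)"
)


def hardened_name(line):
    """Captured method name, or None when the line does not match."""
    m = HARDENED.match(line)
    return m.group("name") if m else None


print(hardened_name("        : base(s, 0)"))           # None
print(hardened_name("        : this(a)"))              # None
print(hardened_name("    public void Rebase(int x)"))  # Rebase
print(hardened_name("    int baseline()"))             # baseline
```

The \b inside the lookahead matters here. Without it, identifiers that merely start with a keyword, such as baseline, would be rejected too.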
Cross-language note
Only C# has a method regex with : in the returnType char class; I spot-checked the Java, TypeScript, and Rust rows in SymbolExtractor.cs and none permit : there. Languages whose method syntax genuinely uses : (TypeScript return-type annotations, Python) use dedicated separate regexes, not a returnType char class on the declarator.
Scope
Not affected: same-line : base(...) / : this(...), because the preceding method regex already consumed the constructor on that line. Note that this is a happy accident, not a designed interaction.
Expression-bodied constructors with same-line initializer (public Ctor(d) : base(x) => ...) are also not affected — same "already consumed" accident.
Ctors and dtors themselves are captured correctly; this is purely a phantom-emission bug.
Affected: src/CodeIndex/Indexer/SymbolExtractor.cs:94 (C# method regex) and its downstream consumers (symbols, definition, references, callers, callees, inspect). The row at :94 in full:
@"^\s*(?:(?:public|private|protected|internal|static|virtual|override|sealed|abstract|async|extern|new|unsafe|partial)\s+)*(?<returnType>\([^)]+\)|(?:global::)?[\w?.<>\[\],:]+)\s+(?<name>\w+)\s*(?:<[^>]+>)?\s*\((?<paren>[^)]*)\)"
Related
… [A, B(args)] leak the 2nd+ attribute names as phantom function symbols #330: C# multi-section attributes [A, B(args)] leak phantom functions. Same regex (:94), same root cause family (returnType char class too permissive for , / [ / ]).
… +, implicit, explicit, this) #213: conversion operators / indexer naming. Different symptoms, same file.
… Task<Result<A, B>>, Dictionary<K, V>) are silently dropped - idiomatic .NET formatting is effectively unindexed #222, C#: methods with pointer / function-pointer return types are dropped from the symbol index (int*, void**, delegate*<...>, int*[]) #234, C#: [assembly: Attr(args)] / [module: Attr(args)] lines falsely indexed as method definitions #219: other SymbolExtractor C# permissiveness issues.
Environment
cdidx binary: /root/.local/bin/cdidx
Fixture: /tmp/dogfood/cs-ctor-chain/C.cs