C# methods / properties with contextual-keyword modifier (partial, required, readonly) + tuple-suffix return type emit phantom function partial / function required / function readonly rows via ctor-regex fallback #349

@Widthdom

Summary

When a C# method or property has the shape visibility<space>KEYWORD<space>(TupleOrTupleSuffix) Name and the primary method / property regex fails to match (because its returnType slot cannot accept a tuple with a trailing ? / [] suffix; see #328 / #338), the lower-priority constructor regex at SymbolExtractor.cs:97 claims the line, capturing the visibility keyword as visibility and the modifier keyword itself (partial, required, readonly) as the constructor name. The result is a phantom function partial / function required / function readonly row whose name is the modifier keyword, not any real identifier.

// line 6 → phantom  function partial    (real P1 dropped)
public partial (int, int)? P1();

// line 14 → phantom  function required   (real R1 dropped)
public required (int, int) R1 { get; init; }

// line 35 → phantom  function readonly   (real M dropped)
public readonly (int, int)? M() => null;

This is a new phantom family distinct from the existing method-regex-backtrack phantoms (function static #342, function readonly #336, function delegate #340, function const #346) because the mechanism is different: those come from the method regex visibility-group backtracking inside :94. The phantoms here come from the ctor regex at :97 matching public\s+\w+\s*\( on a line where the method/property regex failed.

Repro

CDIDX=/root/.local/bin/cdidx
mkdir -p /tmp/dogfood/cs-modifier-phantom
cat > /tmp/dogfood/cs-modifier-phantom/M.cs <<'EOF'
namespace ModifierPhantom;

// `partial` + tuple-suffix — method regex fails → ctor regex grabs `partial` as name
public partial class A
{
    public partial (int, int)? P1();
    public partial (int, int)[] P2();
    public partial System.Collections.Generic.List<(int, int)> P3();  // no phantom; see note
}

// `required` + tuple property — property regex fails → ctor regex grabs `required`
public class B
{
    public required (int, int) R1 { get; init; }
    public required (int, int)? R2 { get; init; }
}

// Plain tuple-suffix methods without an extra modifier — silent drop, NO phantom
public class C
{
    public (int, int)? M1() => null;
    public (int, int)[] M2() => null;
    public (int, int) M3() => (0, 0);       // baseline tuple — CAPTURED
}

// `readonly` on method inside `readonly struct` + tuple-suffix — ctor regex grabs `readonly`
public class D
{
    public readonly struct E
    {
        public readonly (int, int)? M() => null;
    }
}
EOF
"$CDIDX" index /tmp/dogfood/cs-modifier-phantom --rebuild
"$CDIDX" symbols --db /tmp/dogfood/cs-modifier-phantom/.cdidx/codeindex.db

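Observed (the A and B rows match the fixture as listed above; the line numbers from the end of C onward are 3 to 4 higher than in that listing, so the run presumably used a slightly longer fixture):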

class      A                                        M.cs:4-9
class      B                                        M.cs:12-16
class      C                                        M.cs:19-27
class      D                                        M.cs:30-37
struct     E                                        M.cs:32-36
function   M3                                       M.cs:26                ← baseline tuple captured
function   partial                                  M.cs:6                 ← phantom (real P1)
function   partial                                  M.cs:7                 ← phantom (real P2)
function   readonly                                 M.cs:35                ← phantom (real M)
function   required                                 M.cs:14                ← phantom (real R1)
function   required                                 M.cs:15                ← phantom (real R2)
namespace  ModifierPhantom                          M.cs:1

Note that P3 (public partial System.Collections.Generic.List<(int, int)> P3();) does not produce a phantom because the ctor regex public\s+\w+\s*\( fails: after partial the next token is System (a letter), not (. Phantoms are emitted only when the first non-whitespace character after the modifier keyword is (.

Similarly, plain tuple-suffix methods (M1, M2) drop silently without a phantom because their shape is public (...) with no \w+ between public and ( — the ctor regex needs a name token between visibility and the open paren.
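
The three outcomes described above (phantom, no phantom for P3, silent drop for plain tuple-suffix methods) can be checked against the ctor pattern on its own. This is a minimal sketch that assumes only the pattern text quoted from :97; the CtorName helper and the sample list are illustrative and not part of cdidx.

```csharp
using System;
using System.Text.RegularExpressions;

// Constructor pattern as quoted in this issue (SymbolExtractor.cs:97).
var ctor = new Regex(
    @"^\s*(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+(?<name>\w+)\s*\(");

// Illustrative helper: the captured "name" group, or "(no match)".
string CtorName(string line)
{
    var m = ctor.Match(line);
    return m.Success ? m.Groups["name"].Value : "(no match)";
}

string[] samples =
{
    "    public partial (int, int)? P1();",                                      // partial
    "    public partial System.Collections.Generic.List<(int, int)> P3();",     // (no match)
    "    public required (int, int) R1 { get; init; }",                          // required
    "    public (int, int)? M1() => null;",                                      // (no match)
    "        public readonly (int, int)? M() => null;",                          // readonly
    "    public Foo() { }",                                                      // Foo (a real ctor)
};

foreach (var line in samples)
    Console.WriteLine($"{CtorName(line)} <- {line.Trim()}");
```

Only the lines where ( is the first non-whitespace character after a single word token following the visibility keyword are claimed, which is why a real ctor and the keyword phantoms look identical to this pattern.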

Suspected root cause

src/CodeIndex/Indexer/SymbolExtractor.cs:97 (instance constructor regex):

new("function",  new Regex(
    @"^\s*(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+(?<name>\w+)\s*\(",
    RegexOptions.Compiled),
    BodyStyle.Brace, "visibility"),

This regex greedily matches any line starting with a visibility keyword + a single identifier token + (. It assumes that at this point in the PatternCache, the earlier, more specific rows (method :94, property :100/:103, indexer :118, explicit interface :116) have been tried and failed.

The problem is that none of those earlier rows' returnType slots accept a tuple with a trailing ? / [] / * suffix. When the tuple-suffix method/property regex fails:

  1. public partial (int, int)? P1(); — method regex :94 tries returnType=(int, int) tuple-alt, then needs \s+name, but the next char is ?. Fails.
  2. Ctor regex :97 tries visibility=public, name=partial, then \s*\( — the space after partial and the ( of the tuple satisfy it. Match wins.
  3. A symbol is emitted with name="partial", visibility="public", BodyStyle.Brace.

The ctor regex has no guard against:

  • Modifier keywords, reserved or contextual, that are not plausible constructor names (partial, required, readonly, async, sealed, virtual, override, abstract, new, file, unsafe, extern, static).
  • Cases where the \s*\( is the start of a tuple type rather than a parameter list (no way to tell from the ctor regex's limited shape).

For the real constructor case this regex is intended to handle (public Foo() { }), the name Foo would be the class name. So an extra guard that rejects contextual keyword names would not break legitimate ctor matches.

Suggested direction

(A) Add a keyword-name negative lookahead to the ctor regex at :97:

new("function",  new Regex(
    @"^\s*(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+"
  + @"(?<name>(?!(?:partial|required|readonly|async|sealed|virtual|override|abstract|new|file|unsafe|extern|static|ref|out|in|const|volatile|event|delegate|record|class|struct|interface|enum|namespace|using|return|throw|yield|var|typeof|sizeof|nameof|default|if|for|foreach|while|switch|catch|lock|this|base)\b)\w+)"
  + @"\s*\(",
    RegexOptions.Compiled),
    BodyStyle.Brace, "visibility"),

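The negative lookahead rejects modifier keywords and other C# keywords as ctor names. Real ctors use the class identifier, which is never a reserved keyword. A few contextual keywords (async, var) are still legal type names, so a type named after one would lose its ctor row; this is rare in practice.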
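
The effect of the guard can be sketched in isolation. The snippet below assumes the pattern shape from (A) with the keyword list shortened for readability; the Keywords constant and the GuardedName helper are illustrative names, not existing cdidx code.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened keyword list for the sketch; option (A) proposes a longer one.
const string Keywords =
    "partial|required|readonly|async|sealed|virtual|override|abstract|new|file|unsafe|extern|static";

// Ctor pattern with a negative lookahead in front of the name token.
// The trailing \b keeps identifiers that merely start with a keyword (partialView) valid.
var guardedCtor = new Regex(
    @"^\s*(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+"
  + @"(?<name>(?!(?:" + Keywords + @")\b)\w+)"
  + @"\s*\(");

// Illustrative helper: the captured "name" group, or "(no match)".
string GuardedName(string line)
{
    var m = guardedCtor.Match(line);
    return m.Success ? m.Groups["name"].Value : "(no match)";
}

Console.WriteLine(GuardedName("public partial (int, int)? P1();"));             // (no match)
Console.WriteLine(GuardedName("public required (int, int) R1 { get; init; }")); // (no match)
Console.WriteLine(GuardedName("public Foo() { }"));                             // Foo
Console.WriteLine(GuardedName("public partialView() { }"));                     // partialView
```

The guard only suppresses the phantom row. P1 and R1 still go unindexed until the method / property regex accepts their return types.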

(B) Fix the upstream tuple-suffix support in :94 / :100 / :103. #328 / #338 already track this. Once the method/property regex accepts (TupleAlt)(?:\?|\[\])* as returnType, the failing-fallback path to ctor regex goes away for tuple-suffix cases. This is the structurally correct fix but spans multiple regex rows.
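
For (B), the missing returnType alternative can be sketched on its own. The pattern below is a hypothetical, heavily simplified stand-in for the method row at :94, since the real regex is not quoted in this issue. TupleWithSuffix, the modifier list, and MethodName are illustrative names. It handles flat tuples only (no nesting) and covers methods, not properties.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical fragment: a flat parenthesised tuple followed by any number of ? / [] suffixes.
const string TupleWithSuffix = @"\([^()]*\)(?:\?|\[\])*";

// Hypothetical simplified method pattern; NOT the real regex at :94.
var method = new Regex(
    @"^\s*(?:public|private|protected|internal)\s+"
  + @"(?:(?:partial|readonly|static)\s+)*"            // optional modifiers (abbreviated list)
  + @"(?<returnType>" + TupleWithSuffix + @")\s+"
  + @"(?<name>\w+)\s*\(");

// Illustrative helper: the captured method name, or "(no match)".
string MethodName(string line)
{
    var m = method.Match(line);
    return m.Success ? m.Groups["name"].Value : "(no match)";
}

Console.WriteLine(MethodName("public partial (int, int)? P1();"));          // P1
Console.WriteLine(MethodName("public partial (int, int)[] P2();"));         // P2
Console.WriteLine(MethodName("public readonly (int, int)? M() => null;"));  // M
Console.WriteLine(MethodName("public (int, int) M3() => (0, 0);"));         // M3
Console.WriteLine(MethodName("public Foo() { }"));                          // (no match)
```

Once a higher-priority row claims these lines with the real name, the ctor row never sees them. A genuine ctor such as public Foo() still falls through to it.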

Preferred: (A) in addition to (B). (A) hardens the ctor regex against the class of phantom regardless of which upstream regex happens to fail. (B) fixes the immediate families in #328/#338/#347. Both should ship; (A) is a single-line change that remains useful after (B) lands because other upstream regex failures (e.g. unusual where clauses, C# 14 extension blocks, future syntax) would trigger the same ctor-regex-grabs-keyword pattern.

Why it matters

  • Phantom function partial / function required / function readonly rows pollute symbols, definition, outline, hotspots, and unused. A project with N partial method declarations containing tuple-suffix returns emits N phantoms, each at the declaration line.
  • Real symbols are lost. Alongside each phantom, the real method/property is silently dropped: on the repro above, definition P1 --exact returns "No definitions found."
  • callers / callees / impact miss the real symbols. Call graphs terminate prematurely because the real definitions aren't indexed; references still land as raw call-site matches but can't be resolved to a defined symbol.
  • Visible in any modern C# codebase that puts a tuple type directly after one of these modifiers: required members (C# 11+ DTOs), partial methods with richer return types (Razor / source generators / minimal APIs), or readonly members on readonly structs.
  • Silent. No warning is emitted; the phantom looks like a normal function row unless the user notices the name is a C# keyword.

Cross-language note

  • C# — documented here.
  • Java — Java's ctor regex shape is similar but Java has no equivalent modifier keywords (partial, required, readonly-as-member-modifier) that combine with tuple-like return syntax. Java tuples don't exist, and Record primary ctors are a class-level construct. Not affected by this specific mechanism.
  • Kotlin / Swift / Rust — different ctor/syntax, not affected.

Scope

Related

Environment

  • cdidx: v1.10.0 (/root/.local/bin/cdidx).
  • Platform: linux-x64.
  • Fixture: /tmp/dogfood/cs-modifier-phantom/M.cs.
  • Filed from a cloud Claude Code session per CLOUD_BOOTSTRAP_PROMPT.md.
