C# operators with a tuple / spaced-generic return type emit phantom function static rows (and the real operator is silently dropped) — operator regex \S+ can't cross tuple whitespace, method regex backtracks into public→returnType / static→name #342

@Widthdom

Summary

C# operators whose return type is a tuple ((int, int)), a named tuple ((int a, int b)), or a nullable tuple ((int, int)?) are silently dropped from the operator slot and leak a phantom function static row at the same line. Operators whose return type is a generic containing a tuple (List<(int, int)>) are dropped with no row at all. The phantom pollutes symbols / definition / inspect / outline / hotspots --kind function, and the real operator becomes un-navigable by its operator name (+, -, ==, etc.).

This is distinct from #336 / #340. The mechanism is the same backtracking family, but the visible phantom name is static (not delegate / readonly): on an operator line, once backtracking captures public as returnType, the next token, which is captured as name, is static rather than delegate / readonly.

Repro

CDIDX=/root/.local/bin/cdidx
mkdir -p /tmp/dogfood/cs-op-tuple
cat > /tmp/dogfood/cs-op-tuple/O.cs <<'EOF'
namespace OpTupleProbe;

public struct M
{
    // Tuple-return operator
    public static (int, int) operator +(M a, M b) => (0, 0);

    // Named-tuple return
    public static (int a, int b) operator -(M x, M y) => (0, 0);

    // Plain-return baseline — captured correctly
    public static int operator *(M a, M b) => 0;

    // Tuple-return comparison
    public static (bool, bool) operator ==(M a, M b) => (false, false);
    public static (bool, bool) operator !=(M a, M b) => (false, false);

    // Generic-over-tuple return
    public static System.Collections.Generic.List<(int, int)> operator %(M a, M b) => new();
}
EOF
"$CDIDX" index /tmp/dogfood/cs-op-tuple --rebuild
"$CDIDX" symbols --db /tmp/dogfood/cs-op-tuple/.cdidx/codeindex.db

Actual:

function   *                                        O.cs:12
struct     M                                        O.cs:3-20
namespace  OpTupleProbe                             O.cs:1
function   static                                   O.cs:6
function   static                                   O.cs:9
function   static                                   O.cs:15
function   static                                   O.cs:16

Observations:

  • operator + (O.cs:6), operator - (O.cs:9), operator == (O.cs:15), operator != (O.cs:16) → phantom function static at the same line.
  • operator * (O.cs:12) → captured correctly as function * (plain-return baseline).
  • operator % (O.cs:19) with List<(int, int)> return → silently dropped (not even a phantom; no row at all).

cdidx definition '+' returns zero hits for the tuple-return +, because it is filed under the name static, not +.

Suspected root cause (from reading the source)

src/CodeIndex/Indexer/SymbolExtractor.cs:87 — operator row:

new("function",  new Regex(
    @"^\s*(?:(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+)?"
  + @"static\s+\S+\s+operator\s+(?<name>\S+)\s*\(", ...), BodyStyle.Brace, "visibility"),

The return-type slot is \S+, a single non-whitespace run. For (int, int) there is a space after the comma, so \S+ can consume at most (int,; the following \s+ takes the space, and the literal operator then fails against int). For List<(int, int)> the List<(int, prefix stops at the same whitespace and fails the same way. The operator row therefore never matches an operator whose return type contains whitespace.
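The failure can be checked in isolation. The sketch below is not the project's code: it copies the operator row quoted above, leaves the elided regex options at their defaults (an assumption), and uses a hypothetical helper OperatorName. It is written as top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Operator row as quoted in this issue; the elided options ("...") are left at their defaults.
var operatorRow = new Regex(
    @"^\s*(?:(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+)?"
  + @"static\s+\S+\s+operator\s+(?<name>\S+)\s*\(");

// Hypothetical helper: returns the captured operator name, or "<no match>".
string OperatorName(string line)
{
    Match m = operatorRow.Match(line);
    return m.Success ? m.Groups["name"].Value : "<no match>";
}

string[] lines =
{
    "    public static int operator *(M a, M b) => 0;",
    "    public static (int, int) operator +(M a, M b) => (0, 0);",
    "    public static System.Collections.Generic.List<(int, int)> operator %(M a, M b) => new();",
};

foreach (string line in lines)
{
    Console.WriteLine($"{OperatorName(line),-10} <= {line.Trim()}");
}
```

Only the plain int return should yield a name (*); the two return types that contain a space should report no match.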

src/CodeIndex/Indexer/SymbolExtractor.cs:94 — method row runs next in PatternCache ordering and the extraction loop breaks on the first match. Trace for public static (int, int) operator +(M a, M b) => (0, 0);:

  1. Visibility group (?:(?<visibility>public|...)\s+)? greedily tries public.
  2. Modifier loop (?:(?:static|sealed|...)\s+)* eats static.
  3. returnType tries the tuple alternative \([^)]+\) on (int, int) → matches.
  4. name \w+ matches operator.
  5. \s*(?:<[^>]+>\s*)?\( needs ( but the next non-space char is + → fails.
  6. Regex engine backtracks. Visibility group is optional, so it backs off (empty).
  7. Modifier loop can't match public (not in the list), so it's empty too.
  8. returnType (?:global::)?[\w?.<>\[\],:]+ matches the plain word public.
  9. name \w+ matches static.
  10. \s*(?:<[^>]+>\s*)?\( — after static the cursor is at (int, int)...; \s* eats the space, \( matches (. Successful match with returnType=public, name=static. Phantom emitted.
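The trace above can be replayed with a reconstructed method row. This is a sketch under assumptions: the regex is pieced together from the fragments quoted in the trace, the modifier list is abbreviated, and the real row at SymbolExtractor.cs:94 may differ. Describe is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

// ASSUMPTION: reconstructed from the fragments in the trace; the modifier list is
// abbreviated and the real method row may contain more alternatives or options.
var methodRow = new Regex(
    @"^\s*(?:(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+)?"
  + @"(?:(?:static|sealed|abstract|virtual|override|async)\s+)*"
  + @"(?<returnType>\([^)]+\)|(?:global::)?[\w?.<>\[\],:]+)\s+"
  + @"(?<name>\w+)\s*(?:<[^>]+>\s*)?\(");

// Hypothetical helper: reports what each named group captured, or "<no match>".
string Describe(string line)
{
    Match m = methodRow.Match(line);
    if (!m.Success) return "<no match>";
    Group visibilityGroup = m.Groups["visibility"];
    string visibility = visibilityGroup.Success ? visibilityGroup.Value : "(none)";
    string returnType = m.Groups["returnType"].Value;
    string name = m.Groups["name"].Value;
    return $"visibility={visibility} returnType={returnType} name={name}";
}

// Tuple-return operator: the only viable match leaves visibility empty.
Console.WriteLine(Describe("    public static (int, int) operator +(M a, M b) => (0, 0);"));

// Plain-return operator: no '(' follows any \w+ candidate, so the method row fails.
Console.WriteLine(Describe("    public static int operator *(M a, M b) => 0;"));
```

Under these assumptions the first line reports returnType=public and name=static with an empty visibility group, which is the phantom; the second line does not match the method row at all.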

For the List<(int, int)> variant the method regex's tuple alternative only matches a (...) at the start of the returnType run — once the regex is inside List<...> it falls back to the char class [\w?.<>\[\],:]+ which has no \s, (, or ), so it can't cross (int, int) either. Neither path matches, and no later row rescues it (no operator-specific row tolerates tuples), so the symbol is dropped without even a phantom.
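To see where the returnType group stops, the alternation quoted in the trace can be run on the type text alone. This is a minimal sketch; anchoring the alternation with ^ and the helper name Consumed are illustrative choices, not part of the indexer.

```csharp
using System;
using System.Text.RegularExpressions;

// The returnType alternation from the trace, anchored at the start of the type text.
var returnTypePrefix = new Regex(@"^(?:\([^)]+\)|(?:global::)?[\w?.<>\[\],:]+)");

// Hypothetical helper: the prefix the group consumes before any backtracking.
string Consumed(string typeText) => returnTypePrefix.Match(typeText).Value;

string[] types =
{
    "(int, int)",
    "Dictionary<int,string>",
    "Dictionary<int, string>",
    "System.Collections.Generic.List<(int, int)>",
};

foreach (string type in types)
{
    Console.WriteLine($"{type,-45} -> \"{Consumed(type)}\"");
}
```

A bare tuple is consumed whole by the first alternative, while the char class stops at the first space or parenthesis. For the List variant that leaves the cursor on (, so the \s+ that must follow returnType cannot match.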

The function static phantom then survives into the pattern cache's break-on-first-match loop (SymbolExtractor.cs:441-452), so the operator row at :87 is never re-tried.

Suggested direction

Two independent regex changes — either alone only partially fixes it; both are needed to (a) stop the phantom and (b) capture the real operator name.

  1. Prevent visibility backtrack into returnType on the method row :94. This is the same fix proposed for #336 (C# readonly fields with tuple types, public readonly (int, int) X;, emit phantom function readonly rows) and #340 (C# delegates with tuple return types, public delegate (int, int) MakePair();, are dropped and emit phantom function delegate rows). Add a negative lookahead on the returnType group so visibility keywords cannot be captured as returnType:

    (?<returnType>\([^)]+\)|(?!(?:public|private|protected|internal)\b)(?:global::)?[\w?.<>\[\],:]+)

    After this fix, the method row at :94 no longer matches the tuple-return operator line at all — the phantom function static disappears.

  2. Let the operator row :87 accept tuple and generic-with-space return types. Replace \S+ with the same tuple-or-char-class alternation the method row uses, with \s added so spaced generics work and ( ) added so a tuple nested inside a generic (List<(int, int)>) or followed by ? ((int, int)?) can be consumed:

    static\s+(?:\([^)]+\)|[\w?.<>\[\],:\s()]+?)\s+operator\s+(?<name>\S+)\s*\(

    The non-greedy ? on the char class keeps the return-type run as short as possible, so it stops at the first operator keyword instead of running past it.

    Note :84 (conversion operator) doesn't need the same edit for this bug. Conversion operators with tuple targets (operator (int, int)(...)) are a distinct case tracked under the #213 family (C#: operator overloads, conversion operators, and indexers produce non-navigable symbol names (+, implicit, explicit, this)); here we only need + / - / == / != / % to be captured correctly.
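Both changes can be tried side by side. The sketch below makes several assumptions: the method row is reconstructed from the fragments quoted in this issue with an abbreviated modifier list, the Add method and the nullable-tuple operator / are hypothetical test lines, and Name is a hypothetical helper. It compares two spellings of the operator row's return-type class, one without and one with ( ), because a class that lacks parentheses cannot consume List<(int, int)> or (int, int)?.

```csharp
using System;
using System.Text.RegularExpressions;

const string Visibility =
    @"^\s*(?:(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+)?";

// ASSUMPTION: reconstructed method row plus the negative lookahead on returnType.
var methodRowFixed = new Regex(Visibility
  + @"(?:(?:static|sealed|abstract|virtual|override|async)\s+)*"
  + @"(?<returnType>\([^)]+\)|(?!(?:public|private|protected|internal)\b)(?:global::)?[\w?.<>\[\],:]+)\s+"
  + @"(?<name>\w+)\s*(?:<[^>]+>\s*)?\(");

// Operator row whose return-type class allows \s but no parentheses.
var operatorRowNoParens = new Regex(Visibility
  + @"static\s+(?:\([^)]+\)|[\w?.<>\[\],:\s]+?)\s+operator\s+(?<name>\S+)\s*\(");

// Operator row whose return-type class also admits ( and ).
var operatorRowParens = new Regex(Visibility
  + @"static\s+(?:\([^)]+\)|[\w?.<>\[\],:\s()]+?)\s+operator\s+(?<name>\S+)\s*\(");

// Hypothetical helper: captured name, or "<no match>".
string Name(Regex row, string line)
{
    Match m = row.Match(line);
    return m.Success ? m.Groups["name"].Value : "<no match>";
}

string[] lines =
{
    "public static (int, int) operator +(M a, M b) => (0, 0);",
    "public static (int, int)? operator /(M a, M b) => null;",
    "public static System.Collections.Generic.List<(int, int)> operator %(M a, M b) => new();",
    "public static int Add(int a, int b) => a + b;",
};

foreach (string line in lines)
{
    string method = Name(methodRowFixed, line);
    string noParens = Name(operatorRowNoParens, line);
    string withParens = Name(operatorRowParens, line);
    Console.WriteLine($"method={method,-10} noParens={noParens,-10} withParens={withParens,-10} <= {line}");
}
```

With the lookahead in place the method row should match only the ordinary Add method. The class without parentheses should capture + but miss / and %, while the class with parentheses should capture all three.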

Unit tests to cover: each shape in the repro above, plus (for regression) operator checked + (tracked as #238) and conversion operators (#213) to confirm they still route to their own rows.
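A table-driven check for the repro shapes might look like the sketch below. It does not use the project's test framework or its real rows. Both regexes are assumptions: the method row is reconstructed from this issue's fragments, and the operator row admits ( and ) in its return-type class. Extract is a hypothetical stand-in for the break-on-first-match loop, and the Add line is a hypothetical control case. The #238 and #213 regression cases are left out because their rows are not quoted here.

```csharp
using System;
using System.Text.RegularExpressions;

const string Visibility =
    @"^\s*(?:(?<visibility>public|private|protected\s+internal|private\s+protected|protected|internal)\s+)?";

// ASSUMPTION: sketched rows in the order the issue describes (operator row first, then method row).
(string Label, Regex Pattern)[] rows =
{
    ("operator", new Regex(Visibility
      + @"static\s+(?:\([^)]+\)|[\w?.<>\[\],:\s()]+?)\s+operator\s+(?<name>\S+)\s*\(")),
    ("method", new Regex(Visibility
      + @"(?:(?:static|sealed|abstract|virtual|override|async)\s+)*"
      + @"(?<returnType>\([^)]+\)|(?!(?:public|private|protected|internal)\b)(?:global::)?[\w?.<>\[\],:]+)\s+"
      + @"(?<name>\w+)\s*(?:<[^>]+>\s*)?\(")),
};

// Mirrors a break-on-first-match loop: the first row that matches decides the symbol.
string Extract(string line)
{
    foreach (var (label, pattern) in rows)
    {
        Match m = pattern.Match(line);
        if (m.Success) return label + ":" + m.Groups["name"].Value;
    }
    return "<dropped>";
}

(string Line, string Expected)[] cases =
{
    ("public static (int, int) operator +(M a, M b) => (0, 0);", "operator:+"),
    ("public static (int a, int b) operator -(M x, M y) => (0, 0);", "operator:-"),
    ("public static int operator *(M a, M b) => 0;", "operator:*"),
    ("public static (bool, bool) operator ==(M a, M b) => (false, false);", "operator:=="),
    ("public static (bool, bool) operator !=(M a, M b) => (false, false);", "operator:!="),
    ("public static System.Collections.Generic.List<(int, int)> operator %(M a, M b) => new();", "operator:%"),
    ("public static int Add(int a, int b) => a + b;", "method:Add"),
};

int failures = 0;
foreach (var (line, expected) in cases)
{
    string actual = Extract(line);
    string status = actual == expected ? "ok" : "FAIL";
    if (actual != expected) failures++;
    Console.WriteLine($"{status,-4} {actual,-12} <= {line}");
}
Console.WriteLine($"{failures} failure(s)");
```

Each repro line is expected to resolve through the operator row to its own operator name, and no line should produce a static row.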

Why it matters

  • Phantom function static rows inflate hotspots --kind function and pollute any symbols/definition listing of the type.
  • The real operator is not findable by its operator name (cdidx definition '+' misses it). An AI agent asked "where is operator + defined on Money?" will see zero hits and may conclude the operator doesn't exist.
  • Numeric / math-adjacent C# code (for example code built on ML.NET, System.Numerics, game-math libraries, or matrix libraries) can return tuples from operators for multi-component results. Any such operator is unindexed today.

Cross-language note

C#-specific. Java doesn't have operator overloading; Kotlin operator functions follow the method-row path. Rust / Swift / Scala route through their own rows. The fix is C#-scoped to :87 and :94.

Scope

Related

Environment

  • cdidx: v1.10.0 (/root/.local/bin/cdidx).
  • Platform: linux-x64.
  • Filed from a cloud Claude Code session per CLOUD_BOOTSTRAP_PROMPT.md.
