search_code: default regex=false treats | as literal pipe, not alternation (silent 0-match trap) #282

@cryptomaltese

Summary

search_code defaults to regex: false. Under that default, metacharacters like | are treated as literal string characters. A pattern like foo|bar searches for the literal substring "foo|bar" rather than "foo OR bar". The result — an empty match set — is indistinguishable from a legitimate no-match, and easily misdiagnosed as a stale index or broken project.

Because the tool sits alongside ripgrep-style muscle memory (where | means alternation by default) and shares a pattern: parameter with grep/rg, this default is a common foot-gun.

Reproduction

Indexed repo: any Python project with a test file that uses two helper names.

# tool call
mcp__codebase-memory-mcp__search_code(
    project="<project>",
    pattern="make_snapshot|RateSnapshot(",
    file_pattern="tests/dnpm_v2/test_scanner_core.py",
    mode="compact",
    limit=5,
)
# → {"results": [], "total_grep_matches": 0, "raw_match_count": 0}

The file contains 34 occurrences of make_snapshot and 3 of RateSnapshot( (confirmed via rg). But because the default regex: false interprets the pipe literally, the grep backend searches for the literal string make_snapshot|RateSnapshot(, which appears nowhere in the file, so the call returns 0 results.

Same invocation with each term alone returns the expected counts:

search_code(pattern="make_snapshot",  ...)   # → 34 grep matches (17 results after graph-dedup)
search_code(pattern="RateSnapshot(", ...)    # →  3 grep matches ( 2 results)

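Setting regex: true works for the non-pathological case (note the trailing ( is dropped here, since an unescaped ( would be an unclosed group in regex mode):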

search_code(pattern="make_snapshot|RateSnapshot", regex=true, ...)  # → 39 grep matches
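The difference between the two modes can be reproduced without the tool. The following is a minimal sketch in plain Python, not the tool's actual implementation: the sample text is a made-up stand-in for the test file, and the substring check and re.search only approximate regex: false and regex: true. It also shows one way to keep the literal ( when switching to regex mode, by escaping each term before joining with |. Whether the backend's regex dialect accepts Python-style escapes is an assumption.

```python
import re

# Made-up stand-in for the file contents (not the real test file).
text = "snap = make_snapshot()\nrate = RateSnapshot(1.0)\n"
lines = text.splitlines()

pattern = "make_snapshot|RateSnapshot("

# Approximates regex=false: the whole pattern is one literal substring.
literal_hits = [line for line in lines if pattern in line]

# Approximates regex=true: escape each literal term, then join with | for alternation.
terms = ["make_snapshot", "RateSnapshot("]
alternation = "|".join(re.escape(term) for term in terms)
regex_hits = [line for line in lines if re.search(alternation, line)]

print(literal_hits)     # []
print(alternation)      # make_snapshot|RateSnapshot\(
print(len(regex_hits))  # 2
```

The literal search finds nothing because no line contains the pipe character, which mirrors the empty result in the repro.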

Why this is a trap

  • The tool's parameter (pattern) and docs ("text patterns via grep") imply grep-style matching. Most practitioners' muscle memory is ripgrep, which defaults to regex ON.
  • The zero-result shape matches a legitimate no-match, so the default behavior is invisible — you only discover it by isolating terms or reading the schema carefully.
  • In my case I spent time re-checking whether my index was stale or the project path was wrong before realizing it was the default.
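Until the default changes, a caller can flag the suspicious case on their own side. This is a hypothetical sketch: the function name, the character set, and the idea of checking the match count are illustrative choices of mine, not part of codebase-memory-mcp. The set deliberately leaves out ( and . because both are common in literal code searches.

```python
# Characters that usually signal regex intent; this set is an illustrative choice.
REGEX_HINT_CHARS = set("|*+?[]{}^$\\")


def maybe_literal_trap(pattern: str, regex: bool, match_count: int) -> bool:
    """Return True when an empty result may come from literal matching
    of a pattern that looks like it was meant as a regex."""
    looks_like_regex = any(ch in REGEX_HINT_CHARS for ch in pattern)
    return (not regex) and match_count == 0 and looks_like_regex


# The call from the repro: regex left at its default, zero matches.
print(maybe_literal_trap("make_snapshot|RateSnapshot(", regex=False, match_count=0))
```

A True here does not prove a bug; it only says the pattern is worth retrying with regex: true before suspecting the index.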

Suggested fixes (pick one)

  1. Flip the default to regex: true. Matches ripgrep and most users' expectations. Backward-incompatible for users who were passing patterns with a literal |, ., or ( and relying on non-regex behavior, but those callers can explicitly set regex: false.
  2. Keep the default but document it loudly in the tool description, with a worked example like "use regex: true for foo|bar alternation; default treats | as a literal pipe".
  3. Accept both a pattern (regex) and a literal_pattern (always literal) param so the choice is explicit at the call site.

My preference is (1) because search_code's description explicitly invokes ripgrep semantics ("graph-augmented grep"), and a ripgrep-ish tool that silently disagrees with ripgrep on alternation is surprising.
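For comparison, option (3) could be resolved roughly as follows. This is a sketch under assumptions: resolve_pattern is a hypothetical helper, literal_pattern is the parameter name proposed above and does not exist in the tool today, and escaping with Python's re.escape assumes the backend accepts backslash-escaped punctuation.

```python
import re
from typing import Optional


def resolve_pattern(pattern: Optional[str] = None,
                    literal_pattern: Optional[str] = None) -> str:
    """Return the regex string to hand to the grep backend (hypothetical helper).

    pattern is always treated as a regex; literal_pattern is always escaped.
    """
    # Require exactly one of the two so the intent is explicit at the call site.
    if (pattern is None) == (literal_pattern is None):
        raise ValueError("pass exactly one of pattern or literal_pattern")
    if literal_pattern is not None:
        return re.escape(literal_pattern)
    return pattern


print(resolve_pattern(pattern="foo|bar"))          # foo|bar
print(resolve_pattern(literal_pattern="foo|bar"))  # foo\|bar
```

With this shape there is no default to guess at: the caller states which interpretation they want by choosing the parameter.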


Related but distinct (filed separately): under regex: true, a syntactically invalid pattern (e.g. one with an unclosed "(") also returns {results: [], total: 0} silently rather than surfacing a parse error. That compounds this trap.
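As a caller-side stopgap for the invalid-pattern case, a pattern can be compiled locally before it is sent. This is a rough sketch, with check_regex as a hypothetical helper. Python's re dialect is not the same as the backend's, so it only catches gross errors such as unbalanced parentheses, and a pattern that passes here may still be rejected (or silently ignored) by the tool.

```python
import re
from typing import Optional


def check_regex(pattern: str) -> Optional[str]:
    """Return an error message if the pattern fails to compile under
    Python's re module, else None (hypothetical pre-flight check)."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# The original pattern has an unclosed "(" and is rejected.
print(check_regex("make_snapshot|RateSnapshot(") is not None)  # True
# Dropping the "(" gives a valid alternation.
print(check_regex("make_snapshot|RateSnapshot") is None)       # True
```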

Environment

  • MCP package: codebase-memory-mcp (version as surfaced via tool list in Claude Code session, 2026-04-22)
  • Client: Claude Code
  • Repro is platform-agnostic (observed on Linux; issue is in pattern interpretation, not filesystem)
