mcp: C++ source-search tools (cpp_grep_usage / cpp_find_symbol / cpp_outline / cpp_goto_definition) + with_cpp_source redirect #2602

Merged
borisbat merged 2 commits into master from cpp-mcp-tools on May 8, 2026

borisbat (Collaborator) commented May 8, 2026

Summary

Adds four ast-grep-backed MCP tools for navigating the C++ side of the codebase (src/, include/, modules/). It also adds a one-call bridge from daslang symbols to their C++ source, via a new with_cpp_source flag on the existing find_symbol and goto_definition tools, and a behavioral rule plus a .mcp.json knob that make the MCP tools the first thing the assistant reaches for.

Closes two gaps: .das symbol lookup had four parse-aware MCP tools while the ~1,400 C++ files had only raw grep, and the assistant defaulted to Bash/Grep even for .das lookups because the deferred-schema friction made the MCP tools feel like the slower path.

New tools

| Tool | Purpose |
| --- | --- |
| cpp_grep_usage | Parse-aware identifier-leaf search across .cpp/.cc/.h/.hpp via sg scan plus a runtime-generated YAML rule covering all four C++ identifier-leaf kinds (identifier, type_identifier, namespace_identifier, field_identifier). Hits are deduplicated by (file, line). |
| cpp_find_symbol | C++ symbol declarations by name + kind (function / class / struct / enum / union / typedef / namespace / macro). Substring or =exact match. Result cap configurable via CPP_MAX_FIND_RESULTS (default 50). |
| cpp_outline | v2: function signatures (multi-line preserved, plus const/noexcept/override qualifiers), template specializations as distinct entries (Trait<int> vs Trait<float>), class/namespace nesting via a containment forest, qualified names for in-class declarations (Outer::method), DAS_API false-positive filtering, anonymous-noise filtering. Auto / tree / flat render modes. |
| cpp_goto_definition | Approximate "where is this defined": best-effort, no scope resolution; ranks candidates (same-file > same-dir > shorter path). Candidate cap configurable via CPP_MAX_GOTO_CANDIDATES (default 5). |
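To make the cpp_grep_usage row concrete, here is a minimal Python sketch of generating an identifier-leaf rule at runtime and deduplicating the resulting hits by (file, line). It is not the PR's daslang code: the helper names `make_usage_rule` and `dedup_hits`, the rule id, and the assumed shape of `sg scan --json` hits (a `"file"` key and a 0-based `range.start.line`) are all illustrative assumptions.

```python
import re

# Identifier-leaf kinds listed in the PR for cpp_grep_usage (tree-sitter-cpp names).
IDENTIFIER_KINDS = ("identifier", "type_identifier",
                    "namespace_identifier", "field_identifier")


def make_usage_rule(name: str) -> str:
    """Return an ast-grep rule (YAML text) matching identifier leaves spelled `name`.

    Hypothetical helper: the rule id and layout are illustrative, not the PR's.
    """
    # Restricting input to plain C++ identifiers means no regex escaping is needed.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"not a C++ identifier: {name!r}")
    kinds = "\n".join(f"    - kind: {k}" for k in IDENTIFIER_KINDS)
    # Keys inside one rule object are ANDed: any listed kind AND text == name.
    return (
        "id: cpp-grep-usage\n"
        "language: Cpp\n"
        "rule:\n"
        "  any:\n"
        f"{kinds}\n"
        f"  regex: ^{name}$\n"
    )


def dedup_hits(hits):
    """Keep the first hit per (file, line) from `sg scan --json` output.

    Assumes each hit carries "file" and a 0-based "range.start.line".
    """
    seen = set()
    unique = []
    for hit in hits:
        key = (hit["file"], hit["range"]["start"]["line"])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique
```

The rule text would be written to a temporary file and passed to something like `sg scan -r <rule-file> src include modules --json`, the same invocation shape as the manual smoke test in the test plan. Deduplicating on (file, line) matters because one source line can produce several leaf hits (for example a type name and a field name on the same line).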

All four tools are gated on ast_grep_available. The default search scope is src/, include/ and modules/, with build*/, _deps/ and 3rdparty/ always excluded; folders containing a .git file or directory (submodules, daspkg, FetchContent) are excluded automatically. The implementation uses a rule YAML plus sg scan, because the sg run -p pattern path is unreliable for C++ kind queries. The rule file tree-sitter-daslang/cpp_outline_rules.yml covers eight productive C++ kinds plus the field_declaration → function_declarator path for in-class method declarations.
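The scoping policy in the paragraph above (walk the search dirs, keep only C++ extensions, always drop build-style folders, and prune any folder that carries its own .git unless it is force-included) can be sketched in Python as follows. This is an illustrative approximation, not the PR's implementation: `iter_cpp_files` is a hypothetical name, and `ALWAYS_EXCLUDE_DIRS` is a simplified directory-name version of the real glob list, which the configuration table marks as longer than shown.

```python
import fnmatch
import os

# Assumed defaults, mirroring the PR's description of cpp_search_config.das.
SEARCH_DIRS = ["src", "include", "modules"]
INCLUDE_GLOBS = ["*.cpp", "*.cc", "*.h", "*.hpp"]
ALWAYS_EXCLUDE_DIRS = ["build*", "_deps", "3rdparty"]  # simplified dir-name patterns


def iter_cpp_files(root, include_overrides=()):
    """Yield C++ files under SEARCH_DIRS, pruning excluded and .git-bearing folders."""
    overrides = {os.path.normpath(os.path.join(root, p)) for p in include_overrides}
    for top in SEARCH_DIRS:
        for dirpath, dirnames, filenames in os.walk(os.path.join(root, top)):
            kept = []
            for d in dirnames:
                full = os.path.normpath(os.path.join(dirpath, d))
                if any(fnmatch.fnmatch(d, pat) for pat in ALWAYS_EXCLUDE_DIRS):
                    continue
                # A .git file (submodule) or directory (clone) marks a foreign repo.
                if os.path.exists(os.path.join(full, ".git")) and full not in overrides:
                    continue
                kept.append(d)
            dirnames[:] = kept  # prune in place so os.walk never descends into them
            for f in filenames:
                if any(fnmatch.fnmatch(f, pat) for pat in INCLUDE_GLOBS):
                    yield os.path.join(dirpath, f)
```

Pruning `dirnames` in place is what keeps the walk cheap: a vendored clone is rejected once at its root instead of being filtered file by file.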

with_cpp_source redirect

find_symbol and goto_definition accept with_cpp_source : bool = false (off by default). When true, results that have a non-empty cppName (builtin functions, addExtern-registered functions, and handled types registered via MAKE_TYPE_FACTORY) get a resolved C++ source location appended, looked up in a lazily built cppName → array<CppMatch> index. The first call costs ~2s (one full scan); subsequent calls cost ~150ms, dominated by recomputing a git-state staleness signature (rev-parse HEAD + filtered git status + per-file mtimes + cpp_search_config.das mtime). The index rebuilds automatically when relevant .cpp/.cc/.h/.hpp files change, when HEAD moves, or when the search config is edited.
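The staleness signature described above can be sketched in Python. The commit history says daslang's builtin `hash()` is FNV-64; this sketch assumes the FNV-1a variant, and `staleness_signature` is a hypothetical helper that takes the outputs of `git rev-parse HEAD` and `git status --porcelain --untracked-files=normal` as strings so it stays testable without a repository. It handles renames but not the quoted paths git emits for unusual filenames.

```python
# FNV-1a 64-bit constants (standard offset basis and prime).
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
SEARCH_PREFIXES = ("src/", "include/", "modules/")
CPP_EXTS = (".cpp", ".cc", ".h", ".hpp")


def fnv1a_64(data: bytes) -> int:
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits
    return h


def staleness_signature(head: str, porcelain: str, mtime_of, config_mtime) -> str:
    """Hash HEAD + in-scope dirty C++ paths (with mtimes) + the config mtime.

    `porcelain` is v1 format: two status characters, a space, then the path.
    """
    parts = ["head|" + head.strip()]
    for line in porcelain.splitlines():
        path = line[3:]
        if " -> " in path:  # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        # Double filter: search-scope prefix AND C++ extension.
        if path.startswith(SEARCH_PREFIXES) and path.endswith(CPP_EXTS):
            parts.append(f"{line[:2]}|{path}|{mtime_of(path)}")
    parts.append(f"config|{config_mtime}")
    return "%016x" % fnv1a_64("\n".join(parts).encode("utf-8"))
```

Folding per-file mtimes in, rather than only the porcelain status, is what makes a second edit to an already-dirty file invalidate the cache: the status line stays ` M src/a.cpp`, but the mtime changes.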

The same index now backs cpp_goto_definition: after the same amortized build cost, each goto is an O(1) lookup.
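Once the index returns candidate locations for a name, cpp_goto_definition ranks them by proximity to the cursor (same file > same directory > shorter path) and caps the list. A minimal Python sketch of that ordering follows; `rank_candidates` is a hypothetical helper, the default `cap=5` mirrors CPP_MAX_GOTO_CANDIDATES, and breaking ties by path text and then line number is an assumption of this sketch.

```python
import os


def rank_candidates(cursor_file, candidates, cap=5):
    """Order (path, line) candidates: same file, then same directory, then shorter path.

    Hypothetical helper; ties are broken by path text and then line number.
    """
    cursor = os.path.normpath(cursor_file)
    cursor_dir = os.path.dirname(cursor)

    def sort_key(candidate):
        path = os.path.normpath(candidate[0])
        if path == cursor:
            tier = 0
        elif os.path.dirname(path) == cursor_dir:
            tier = 1
        else:
            tier = 2
        return (tier, len(path), path, candidate[1])

    return sorted(candidates, key=sort_key)[:cap]
```

Because the index lookup is a dictionary hit, ranking only touches the handful of candidates stored for one name, which keeps the per-goto cost small. It is still a heuristic with no scope or overload resolution, matching the caveats below.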

When the index is unavailable (ast-grep missing, scan crash, etc.), find_symbol / goto_definition / cpp_goto_definition all surface the underlying reason via cpp_index_status() so users can distinguish "no match" from "infrastructure error".
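The difference between "no match" and "infrastructure error" can be kept explicit with a small cache wrapper. The Python sketch below is an illustration under stated assumptions, not the PR's daslang code: `CppIndexCache`, `lookup` and this `index_status` method are hypothetical names. It rebuilds only when the cheap signature changes, resets the signature to empty after a failed build so the next call retries, and returns `None` (rather than an empty list) when the index itself is unavailable.

```python
class CppIndexCache:
    """Rebuild-on-signature-change cache (hypothetical sketch).

    compute_signature() -> str is cheap; build_index() -> dict is expensive and may raise.
    """

    def __init__(self, compute_signature, build_index):
        self._compute_signature = compute_signature
        self._build_index = build_index
        self.signature = ""  # empty means "no valid index yet"
        self.index = {}
        self.error = "index not built yet"

    def ensure(self):
        sig = self._compute_signature()
        if sig and sig == self.signature:
            return True  # cache hit
        try:
            self.index = self._build_index()
        except Exception as exc:  # e.g. ast-grep missing or the scan crashed
            self.index = {}
            self.signature = ""  # stays empty, so the next call retries
            self.error = str(exc)
            return False
        self.signature = sig
        self.error = ""
        return True

    def lookup(self, cpp_name):
        """Matches for cpp_name ([] means no match), or None if the index is unavailable."""
        if not self.ensure():
            return None
        return self.index.get(cpp_name, [])

    def index_status(self):
        return "ok" if not self.error else f"unavailable: {self.error}"
```

A caller that gets `None` back can append `index_status()` to its reply, so an unavailable index never masquerades as a clean empty result.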

Configuration

Tool behavior is centralized in utils/mcp/cpp_search_config.das:

| Constant | Default | Purpose |
| --- | --- | --- |
| CPP_SEARCH_DIRS | `["src", "include", "modules"]` | Folders walked recursively |
| CPP_SEARCH_INCLUDE_GLOBS | `["*.cpp", "*.cc", "*.h", "*.hpp"]` | File extensions to scan |
| CPP_SEARCH_ALWAYS_EXCLUDE | `**/build*/**`, `**/_deps/**`, `**/3rdparty/**`, … | Always-excluded paths |
| CPP_SEARCH_INCLUDE_OVERRIDES | `[]` | Force-include folders that contain a .git file/dir |
| CPP_MAX_FIND_RESULTS | `50` | cpp_find_symbol declaration cap |
| CPP_MAX_GOTO_CANDIDATES | `5` | cpp_goto_definition ranked-candidate cap |

To pick up changes, edit the file and restart the MCP server (or touch any tracked source); the config file's mtime is part of the index staleness signature.

"MCP-first search" rule + .mcp.json defer_loading

CLAUDE.md gets a section directing the assistant to run ToolSearch and then call the matching MCP tool before reaching for Bash/Grep/Read for any symbol or usage lookup in the repo. The section's table covers all 9 search-class tools (find_symbol / cpp_find_symbol / grep_usage / cpp_grep_usage / outline / cpp_outline / find_references / goto_definition / cpp_goto_definition) and calls out the with_cpp_source flag for the daslang→C++ bridge.

.mcp.json (gitignored, per-user) gains "defer_loading": false on the daslang server entry. The flag is documented but reportedly broken upstream; it is harmless to set, and when it has no effect the client falls back to the deferred-tool path. The setting is documented in skills/mcp_tools.md, install/skills/mcp_tools.md, and the RST.
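Because .mcp.json is per-user and gitignored, each user applies the knob locally. A minimal Python sketch of doing that programmatically is below; it assumes the Claude Code project layout `{"mcpServers": {"<name>": {...}}}` with a server entry named `daslang`, and `disable_defer_loading` is a hypothetical helper, not part of the PR.

```python
import json
from pathlib import Path


def disable_defer_loading(mcp_json: Path, server: str = "daslang") -> dict:
    """Set "defer_loading": false on one server entry of a .mcp.json file.

    Assumes the layout {"mcpServers": {"<name>": {...}}}; raises KeyError
    instead of silently creating a server entry that does not exist.
    """
    data = json.loads(mcp_json.read_text(encoding="utf-8"))
    entry = data["mcpServers"][server]
    entry["defer_loading"] = False
    mcp_json.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data
```

Failing loudly on a missing entry is deliberate: adding a bare `daslang` entry without a command would leave a broken server configuration behind.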

Critical correctness prerequisite

sgconfig.yml.{linux,osx,windows} get a languageGlobs: { cpp: ["*.h", "*.hpp"] } block. Without it, ast-grep classifies .h files as C rather than C++, and 375 headers (24% of the C++ surface) silently produce zero matches. The lint pass against include/daScript/simulate/fs_file_info.h in test_tools.das serves as the regression test.

Caveats

  • Best-effort name extraction: complex templates and function-pointer typedefs may report partial names
  • Macro-expanded declarations (e.g. DAS_BIND_FN(foo)) are invisible to ast-grep
  • cpp_goto_definition is approximate — no scope resolution, no overload disambiguation. For substring/usage searches, prefer cpp_grep_usage. A clangd-backed precise mode is on the v2 roadmap.

Tests

utils/mcp/test_tools.das: 259/259 tests now pass. Coverage:

  • Per-tool happy path (addExternFunc in src/ast/ast_interop.cpp)
  • .h languageGlobs regression (FsFileSystem in include/daScript/simulate/fs_file_info.h)
  • Kind filter rejects wrong kinds
  • No-match returns clean empty result, not error
  • Error paths (missing args, bad directory, bad line/column)
  • with_cpp_source on/off behavior for both find_symbol and goto_definition
  • cpp_outline v2: function signatures, template specializations distinct, anonymous noise filtered, DAS_API misparse filtered, qualified names for in-class declarations, nesting (auto/tree/flat modes)
  • cpp_grep_usage: type-position hits + (file, line) dedup

Synthetic fixture at utils/mcp/tests/_fixture_cpp_outline.h exercises every TODO item.

Test plan

  • bin/daslang dastest/dastest.das -- --test utils/mcp/test_tools.das → 259/259 PASS
  • mcp__daslang__lint on all .das files in the diff → 0 issues
  • mcp__daslang__format_file on all .das files → already_formatted
  • Sphinx HTML build → build succeeded.
  • Manual smoke: sg scan -r tree-sitter-daslang/cpp_outline_rules.yml include/daScript/simulate/fs_file_info.h --json returns matches with language: Cpp
  • find_symbol("=get_das_root", with_cpp_source=true) → cpp: src/builtin/module_builtin_runtime.cpp:1531 in one call
  • Copilot review fixes through round 4 (commit 27d0c0d4d); threads resolved
  • CI green

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 8, 2026 00:37

Copilot AI (Contributor) left a comment

Pull request overview

Adds ast-grep/tree-sitter-cpp–backed MCP navigation tools for the repository’s C++ sources, and extends existing daslang symbol navigation tools to optionally append resolved C++ source locations.

Changes:

  • Introduces four new C++ navigation tools (cpp_grep_usage, cpp_find_symbol, cpp_outline, cpp_goto_definition) powered by sg scan/run + C++ rules.
  • Adds with_cpp_source to find_symbol and goto_definition to resolve cppName → file:line via a lazily-built index.
  • Updates MCP protocol wiring, docs, tests, and sgconfig templates (notably .h/.hpp classified as C++).

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| utils/mcp/tools/goto_definition.das | Adds cppName output for builtins and optional with_cpp_source C++ location append. |
| utils/mcp/tools/find_symbol.das | Adds with_cpp_source redirect output using the lazy C++ index. |
| utils/mcp/tools/cpp_common.das | Shared helpers for C++ scanning plus a lazy cppName → location index. |
| utils/mcp/tools/cpp_grep_usage.das | New parse-aware C++ identifier usage search via sg run -p. |
| utils/mcp/tools/cpp_find_symbol.das | New C++ declaration search by name/kind via sg scan rules output. |
| utils/mcp/tools/cpp_outline.das | New C++ outline tool to list top-level declarations by file/glob. |
| utils/mcp/tools/cpp_goto_definition.das | New approximate C++ goto-definition tool based on identifier-at-cursor + scan results. |
| utils/mcp/protocol.das | Registers new tools; wires argument extraction/dispatch; adds with_cpp_source params. |
| utils/mcp/test_tools.das | Adds coverage for new C++ tools and with_cpp_source behavior. |
| tree-sitter-daslang/cpp_outline_rules.yml | Adds ast-grep rules for C++ top-level declarations. |
| sgconfig.yml.linux | Adds languageGlobs so .h/.hpp are treated as C++ by ast-grep. |
| sgconfig.yml.osx | Adds languageGlobs so .h/.hpp are treated as C++ by ast-grep. |
| sgconfig.yml.windows | Adds languageGlobs so .h/.hpp are treated as C++ by ast-grep. |
| skills/mcp_tools.md | Documents the new C++ tools and with_cpp_source redirect behavior. |
| install/skills/mcp_tools.md | Mirrors skills doc updates for installed skill bundle. |
| doc/source/reference/utils/mcp.rst | Adds reference docs for C++ tools, with_cpp_source, and sgconfig requirements. |


Comment thread utils/mcp/tools/cpp_goto_definition.das Outdated
Comment thread utils/mcp/tools/cpp_goto_definition.das Outdated

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Comment thread tree-sitter-daslang/cpp_outline_rules.yml
Comment thread tree-sitter-daslang/cpp_outline_rules.yml
Comment thread utils/mcp/tools/cpp_common.das
…eness

Original scope (cpp_* tools + with_cpp_source redirect)
-------------------------------------------------------

Four ast-grep-backed MCP tools for C++ source navigation across
src/, include/, modules/:

- cpp_grep_usage      - parse-aware identifier search (`sg run -p`)
- cpp_find_symbol     - kind-filtered symbol search (`sg scan` + rule-yaml)
- cpp_outline         - top-level declarations in a file or glob
- cpp_goto_definition - approximate where-is-this-defined; up to 5 ranked
                        candidates (same-file > same-dir > shorter path).
                        clangd-backed precise mode on the v2 roadmap.

Plus a `with_cpp_source` opt-in flag on `find_symbol` and
`goto_definition` that resolves daslang symbols' cppName field to a C++
source location via a lazily-built index. Bridges builtins / handled
types / addExtern-registered functions to their C++ implementation in
one tool call instead of two.

Critical correctness prerequisite: `sgconfig.yml.{linux,osx,windows}`
get a `languageGlobs: cpp: ["*.h", "*.hpp"]` block. Without it ast-grep
classifies .h files as C, not C++, and 375 headers (24% of the C++
surface) silently produce zero matches.

Copilot review round 2 (function decls + `using` aliases)
---------------------------------------------------------

`cpp_outline_rules.yml` gains two rules:

- `cpp-outline-functions-decl` matches `kind: declaration` with a
  `has: stopBy: end, kind: function_declarator` constraint, surfacing
  header-only function declarations (`void foo();`, etc.). On
  `include/daScript/ast/ast.h` alone this finds 323 declarations the
  index previously missed.
- `cpp-outline-typedefs-using` matches `alias_declaration`, surfacing
  modern `using X = Y;` aliases under `kind=typedef` alongside legacy
  `typedef X Y;`. `cpp_extract_name_pair` extended for the new shape.

Search-scope configuration (cpp_search_config.das)
--------------------------------------------------

New `utils/mcp/cpp_search_config.das` exports four constants the C++
tools read:

- CPP_SEARCH_DIRS              - folders to scan recursively (default:
                                  src, include, modules)
- CPP_SEARCH_ALWAYS_EXCLUDE    - hard-coded glob excludes; adds
                                  cmake-build-*/ and CMakeFiles/
                                  alongside build*/_deps/3rdparty/.git/
- CPP_SEARCH_INCLUDE_GLOBS     - file-extension lock (default
                                  *.cpp / *.h / *.hpp; covers 99.3% of
                                  the C++ surface per repo audit)
- CPP_SEARCH_INCLUDE_OVERRIDES - re-include paths the auto-exclude
                                  policy would otherwise drop

Folders containing a `.git` file or directory at any depth are
auto-excluded. Covers `modules/.daspkg_cache/` (daspkg's package index
clone) plus future submodules / FetchContent destinations / ad-hoc
clones, with no manual list to maintain.

Git-signature staleness (replaces lazy-once)
--------------------------------------------

`var cpp_index_built : bool` becomes `var cpp_index_signature : string`.
Each `ensure_cpp_index()` call recomputes a cheap signature from
`git rev-parse HEAD` + `git status --porcelain --untracked-files=normal`
(double-filtered to .cpp/.h/.hpp files in the search scope) +
per-dirty-file mtimes + `cpp_search_config.das` mtime, hashed via
daslang's builtin FNV-64 `hash()`. Cache hit when the signature
matches; rebuild when it doesn't.

Naturally fixes Copilot review #3 (silent failure trap): on `sg scan`
failure the signature stays empty, so the next call retries. No
permanent silent fallback to "(not located)".

Per-call cost: ~70-200ms typical. The trade-off ("always fresh after C++
edits" vs. "build once, restart MCP to refresh") is an explicit choice.

Tests: 11 new cases (236/236 green) for cpp_search_config defaults,
signature stability, .git-folder auto-exclusion (verified against the
local daspkg cache), function-declaration regression, and using-alias
regression. Sphinx clean.

Docs: skills/mcp_tools.md, install/skills/mcp_tools.md,
doc/source/reference/utils/mcp.rst all updated for the config file +
staleness story.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Comment thread utils/mcp/tools/cpp_common.das Outdated
Comment thread utils/mcp/protocol.das Outdated

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 8 comments.

Comment thread utils/mcp/tools/cpp_common.das Outdated
Comment thread utils/mcp/tools/cpp_common.das
Comment thread utils/mcp/tools/cpp_goto_definition.das
Comment thread utils/mcp/tools/cpp_outline.das
Comment thread utils/mcp/protocol.das
Comment thread skills/mcp_tools.md
Comment thread doc/source/reference/utils/mcp.rst Outdated
Comment thread install/skills/mcp_tools.md
borisbat force-pushed the cpp-mcp-tools branch 2 times, most recently from 30f12e4 to e661aba, May 8, 2026 06:33
borisbat requested a review from Copilot May 8, 2026 06:34

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comment thread utils/mcp/tools/goto_definition.das Outdated
Comment thread utils/mcp/tools/cpp_common.das
Comment thread utils/mcp/tools/cpp_grep_usage.das

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.

Comment thread utils/mcp/tools/cpp_outline.das
Comment thread utils/mcp/tools/cpp_common.das
Comment thread utils/mcp/tools/cpp_common.das

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread utils/mcp/tools/goto_definition.das Outdated
Comment thread utils/mcp/protocol.das Outdated
borisbat force-pushed the cpp-mcp-tools branch 2 times, most recently from 39b5adf to 5cc1498, May 8, 2026 08:07
borisbat requested a review from Copilot May 8, 2026 08:08

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Comment thread utils/mcp/tools/cpp_common.das
Comment thread utils/mcp/tools/cpp_common.das
Adds four MCP tools for navigating C++ source via ast-grep + tree-sitter-cpp, plus a C++-source redirect on two existing tools:

  - cpp_grep_usage      — parse-aware identifier-leaf usage search across
                          multiple AST kinds (identifier, type_identifier,
                          namespace_identifier, field_identifier).
  - cpp_find_symbol     — declaration lookup by (name, kind).
  - cpp_outline         — function signatures, template specializations,
                          class/namespace nesting, qualified names. Auto/tree/
                          flat render modes.
  - cpp_goto_definition — best-effort approximate goto. Ranks candidates by
                          file proximity; surfaces index-build failures via
                          cpp_index_status().
  - find_symbol / goto_definition — opt-in `with_cpp_source` redirect that
                          resolves builtin/handled-type C++ source locations
                          via the lazily-built cpp index.

Search scope is configurable in `utils/mcp/cpp_search_config.das`
(CPP_SEARCH_DIRS, CPP_SEARCH_INCLUDE_GLOBS, CPP_SEARCH_INCLUDE_OVERRIDES,
CPP_MAX_FIND_RESULTS, CPP_MAX_GOTO_CANDIDATES). Folders containing a `.git`
file/dir (submodules, FetchContent) auto-excluded.

The cpp index is cached in-process and rebuilds when its git-state staleness
signature changes (HEAD + filtered status + per-file mtimes + config mtime).
First call ~2s, subsequent ~150ms.

In a non-git checkout (extracted tarball, manual rm of .git), the no-git
fallback walks CPP_SEARCH_DIRS recursively and folds per-file mtimes into
the signature so source edits still invalidate the cache. ~50–200ms; only
hit when git rev-parse / status fails. The .git-folder auto-exclude logic
is reused so vendored repos and the daspkg cache are pruned consistently.
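The no-git fallback described above can be sketched in Python: walk the search dirs in a deterministic order, prune folders that carry their own .git, and fold each C++ file's relative path and mtime (plus an optional config mtime) into one digest. This is an illustration, not the PR's code; `fallback_signature` is a hypothetical name, `hashlib.blake2b` stands in for the FNV-64 hash the PR uses, and the always-exclude globs are omitted for brevity.

```python
import hashlib
import os


def fallback_signature(root, search_dirs=("src", "include", "modules"),
                       exts=(".cpp", ".cc", ".h", ".hpp"), config_path=None):
    """Signature for checkouts without git: relative path + mtime of every C++ file."""
    h = hashlib.blake2b(digest_size=8)  # stand-in for the FNV-64 hash
    for top in search_dirs:
        for dirpath, dirnames, filenames in os.walk(os.path.join(root, top)):
            # Prune folders with their own .git (submodules, daspkg cache) and
            # sort the rest so the walk order, and therefore the hash, is stable.
            dirnames[:] = sorted(d for d in dirnames
                                 if not os.path.exists(os.path.join(dirpath, d, ".git")))
            for f in sorted(filenames):
                if f.endswith(exts):
                    p = os.path.join(dirpath, f)
                    rel = os.path.relpath(p, root).replace(os.sep, "/")
                    h.update(f"{rel}|{os.stat(p).st_mtime_ns}\n".encode("utf-8"))
    if config_path is not None:
        h.update(f"config|{os.stat(config_path).st_mtime_ns}\n".encode("utf-8"))
    return h.hexdigest()
```

Sorting `dirnames` and `filenames` matters here: `os.walk` order depends on the filesystem, and an unsorted walk could change the digest without any file actually changing.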

cpp_run_scan / do_cpp_grep_usage are parse-first: on Windows, sg can exit
non-zero while still emitting valid JSON (warnings on stderr get merged in
by run_and_capture). Try to parse first; only fail if parse itself fails,
and enrich the message with the rc when sg exited non-zero.
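The parse-first rule reads naturally as a small Python helper. This sketch is an assumption-laden illustration, not the actual cpp_run_scan: `parse_scan_output` and `ScanError` are hypothetical names, and it assumes the captured text is the JSON payload itself; separating JSON from any interleaved stderr warnings would be an extra step it does not attempt.

```python
import json


class ScanError(RuntimeError):
    """Raised only when sg output cannot be parsed at all."""


def parse_scan_output(output: str, rc: int):
    """Parse-first handling of `sg ... --json` output (hypothetical helper).

    A non-zero exit code alone is not treated as failure: if the captured text
    is valid JSON it is returned. The exit code only enriches the error message.
    """
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        message = f"could not parse sg output: {exc}"
        if rc != 0:
            message += f" (sg exited with rc={rc})"
        raise ScanError(message) from exc
```

Ordering the checks this way means a Windows run that exits non-zero but still prints valid results is treated as a success, while a genuine failure reports both the parse error and the exit code.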

goto_definition's `with_cpp_source` redirect now also fires for handled
types (TypeAnnotation.cppName) — previously only builtin functions
populated cppName. Type fallback in resolve_definition gets a new
isHandle branch that mirrors the path find_symbol already takes. Future
gaps (ExprVar of handled type, ExprField on handled-type parent) are
documented inline as a v2 roadmap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
borisbat merged commit a97a3f9 into master on May 8, 2026
29 checks passed
pull Bot pushed a commit to forksnd/daScript that referenced this pull request May 8, 2026
The install/CLAUDE.md and install/skills/ tree had been hand-mirrored
from the top-level skills tree, drifting on every edit (e.g. PR GaijinEntertainment#2602
had to update both copies of mcp_tools.md). This collapses the skills
side to a single source of truth in skills/ + an install/skills.list
manifest that CMake reads to copy named files into the SDK at install
time. install/CLAUDE.md stays as a separate audience-curated head
(Running Scripts, Project files, SDK Directory Layout) — the two
heads' near-identical syntax block is the only remaining duplication.

Structural changes:
- Extract ~32 lines of "Project Overview" / "What and Why" /
  "Designing with macros" prose from CLAUDE.md to a new shipped
  skills/project_overview.md; both heads keep a 1-line pointer.
- Fold skills/clargs_migration.md into clargs_usage.md as a final
  "Migrating from get_command_line_arguments()" section.
- Replace install/skills/ (19 files, ~3100L) with install/skills.list
  (21 entries) + a CMake file(STRINGS) install rule with FATAL_ERROR
  existence check. install_instructions.md rewritten to match.

Per-skill content cleanup (informed by an audit memo against the old
install/skills/ versions, kept where it removed only repo-internal
navigation noise; reverted where install/ over-trimmed substantive
content like Handle<T>/HandleRegistry, daspkg's command table, and
the .das_module C++ binding boilerplate):
- das_macros.md: drop 115L of legacy var inscope/<- AST patterns;
  preserve the substantive [call_macro] entry-guard contract section.
- detect_dupe.md: drop the entire 116L "Maintainer notes" section
  (repo-dev-only); generalize bin/Release/daslang.exe → bin/daslang.
- writing_tests.md: drop "Test index", "AOT tests registration",
  and "options no_aot" sections (all repo-dev infrastructure).
- dynamic_modules.md: drop "Resolution order in getModuleInfo()"
  internals and "Install rules for .das_module" CMake snippet.
- daspkg.md: drop "Package Index" section (index repo navigation);
  generalize bin/Release/daslang.exe → bin/daslang.
- jobque_debugging.md, memory_leak_detection.md: generalize binary
  paths from bin/Debug/daslang.exe → bin/daslang etc.
- linq.md: drop two issue GaijinEntertainment#2505 historical-nav references.
- cpp_integration.md: drop include/daScript/misc/string_writer.h
  internal path from the LogLevel collision caveat.
- filesystem.md: drop in-tree src/ and tests/ Reference pointers;
  keep daslib/, include/, tutorials/ pointers (those all ship).
- make_pr.md: drop step 5.5 (.md stop-rule).
- CLAUDE.md skill table: drop stale gc_use_after_sweep.md row
  (the file never existed); drop clargs_migration row.
- install/CLAUDE.md skill table: add project_overview.md and
  strudel_port.md rows to match the 21-entry ship list.

Verification:
- cmake -B build -S . reconfigures cleanly.
- cmake --install build --prefix /tmp/das-install-test produces
  /tmp/das-install-test/CLAUDE.md (= install/CLAUDE.md, the SDK
  version) and /tmp/das-install-test/skills/ with exactly the 21
  files listed in install/skills.list, all matching skills/ source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
borisbat deleted the cpp-mcp-tools branch May 14, 2026 16:00