mcp: C++ source-search tools (cpp_grep_usage / cpp_find_symbol / cpp_outline / cpp_goto_definition) + with_cpp_source redirect #2602
Merged
Pull request overview
Adds ast-grep/tree-sitter-cpp–backed MCP navigation tools for the repository’s C++ sources, and extends existing daslang symbol navigation tools to optionally append resolved C++ source locations.
Changes:
- Introduces four new C++ navigation tools (`cpp_grep_usage`, `cpp_find_symbol`, `cpp_outline`, `cpp_goto_definition`) powered by `sg scan`/`run` + C++ rules.
- Adds `with_cpp_source` to `find_symbol` and `goto_definition` to resolve `cppName → file:line` via a lazily-built index.
- Updates MCP protocol wiring, docs, tests, and `sgconfig` templates (notably `.h`/`.hpp` classified as C++).
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| utils/mcp/tools/goto_definition.das | Adds cppName output for builtins and optional with_cpp_source C++ location append. |
| utils/mcp/tools/find_symbol.das | Adds with_cpp_source redirect output using the lazy C++ index. |
| utils/mcp/tools/cpp_common.das | Shared helpers for C++ scanning plus a lazy cppName → location index. |
| utils/mcp/tools/cpp_grep_usage.das | New parse-aware C++ identifier usage search via sg run -p. |
| utils/mcp/tools/cpp_find_symbol.das | New C++ declaration search by name/kind via sg scan rules output. |
| utils/mcp/tools/cpp_outline.das | New C++ outline tool to list top-level declarations by file/glob. |
| utils/mcp/tools/cpp_goto_definition.das | New approximate C++ goto-definition tool based on identifier-at-cursor + scan results. |
| utils/mcp/protocol.das | Registers new tools; wires argument extraction/dispatch; adds with_cpp_source params. |
| utils/mcp/test_tools.das | Adds coverage for new C++ tools and with_cpp_source behavior. |
| tree-sitter-daslang/cpp_outline_rules.yml | Adds ast-grep rules for C++ top-level declarations. |
| sgconfig.yml.linux | Adds languageGlobs so .h/.hpp are treated as C++ by ast-grep. |
| sgconfig.yml.osx | Adds languageGlobs so .h/.hpp are treated as C++ by ast-grep. |
| sgconfig.yml.windows | Adds languageGlobs so .h/.hpp are treated as C++ by ast-grep. |
| skills/mcp_tools.md | Documents the new C++ tools and with_cpp_source redirect behavior. |
| install/skills/mcp_tools.md | Mirrors skills doc updates for installed skill bundle. |
| doc/source/reference/utils/mcp.rst | Adds reference docs for C++ tools, with_cpp_source, and sgconfig requirements. |
…eness
Original scope (cpp_* tools + with_cpp_source redirect)
-------------------------------------------------------
Four ast-grep-backed MCP tools for C++ source navigation across
src/, include/, modules/:
- cpp_grep_usage - parse-aware identifier search (`sg run -p`)
- cpp_find_symbol - kind-filtered symbol search (`sg scan` + rule-yaml)
- cpp_outline - top-level declarations in a file or glob
- cpp_goto_definition - approximate where-is-this-defined; up to 5 ranked
candidates (same-file > same-dir > shorter path).
clangd-backed precise mode on the v2 roadmap.
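
The candidate ordering for `cpp_goto_definition` (same-file > same-dir > shorter path, capped at 5) can be sketched in Python as a sort key. This is a minimal illustration only, not the tool's daslang implementation: the function name `rank_candidates`, the tie-break on the path string, and the sample paths are assumptions.

```python
import os
from typing import List


def rank_candidates(candidates: List[str], cursor_file: str, limit: int = 5) -> List[str]:
    """Order candidate files: same file first, then same directory, then shorter path."""
    cursor_norm = os.path.normpath(cursor_file)
    cursor_dir = os.path.dirname(cursor_norm)

    def sort_key(path: str):
        norm = os.path.normpath(path)
        same_file = norm == cursor_norm
        same_dir = os.path.dirname(norm) == cursor_dir
        # False sorts before True, so the preferred properties are negated.
        # The final element is an assumed tie-break to keep the order deterministic.
        return (not same_file, not same_dir, len(norm), norm)

    return sorted(candidates, key=sort_key)[:limit]
```

With a cursor in `src/ast/ast_interop.cpp`, a hit in that file would rank ahead of one in `src/ast/`, which in turn would rank ahead of hits elsewhere in the tree.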
Plus a `with_cpp_source` opt-in flag on `find_symbol` and
`goto_definition` that resolves daslang symbols' cppName field to a C++
source location via a lazily-built index. Bridges builtins / handled
types / addExtern-registered functions to their C++ implementation in
one tool call instead of two.
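
To make the `with_cpp_source` redirect concrete, here is a rough Python model of the lookup step: a symbol's `cppName` is resolved against a prebuilt index and the location is appended to the result line. The `CppMatch` shape, the function name `append_cpp_source`, the choice of the first match, and the example cppName `getDasRoot` are all assumptions for illustration; only the `→ cpp: file:line` and `(not located)` strings are taken from this conversation.

```python
from typing import Dict, List, NamedTuple


class CppMatch(NamedTuple):
    """Hypothetical index entry: where a C++ name is declared."""
    file: str
    line: int


def append_cpp_source(result_line: str, cpp_name: str, index: Dict[str, List[CppMatch]]) -> str:
    """Append a resolved C++ location to a result line when the symbol has a cppName."""
    if not cpp_name:
        # Pure-daslang symbols carry no cppName, so nothing is appended.
        return result_line
    matches = index.get(cpp_name, [])
    if not matches:
        return f"{result_line} → cpp: (not located)"
    first = matches[0]  # assumption: report the first indexed location
    return f"{result_line} → cpp: {first.file}:{first.line}"
```

The point of the design is that the index lookup happens inside the existing tool call, so the caller does not need a second C++ search to find the implementation.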
Critical correctness prerequisite: `sgconfig.yml.{linux,osx,windows}`
get a `languageGlobs: cpp: ["*.h", "*.hpp"]` block. Without it ast-grep
classifies .h files as C, not C++, and 375 headers (24% of the C++
surface) silently produce zero matches.
Copilot review round 2 (function decls + `using` aliases)
---------------------------------------------------------
`cpp_outline_rules.yml` gains two rules:
- `cpp-outline-functions-decl` matches `kind: declaration` with a
`has: stopBy: end, kind: function_declarator` constraint, surfacing
header-only function declarations (`void foo();`, etc.). On
`include/daScript/ast/ast.h` alone this finds 323 declarations the
index previously missed.
- `cpp-outline-typedefs-using` matches `alias_declaration`, surfacing
modern `using X = Y;` aliases under `kind=typedef` alongside legacy
`typedef X Y;`. `cpp_extract_name_pair` extended for the new shape.
Search-scope configuration (cpp_search_config.das)
--------------------------------------------------
New `utils/mcp/cpp_search_config.das` exports four constants the C++
tools read:
- CPP_SEARCH_DIRS - folders to scan recursively (default:
src, include, modules)
- CPP_SEARCH_ALWAYS_EXCLUDE - hard-coded glob excludes; adds
cmake-build-*/ and CMakeFiles/
alongside build*/_deps/3rdparty/.git/
- CPP_SEARCH_INCLUDE_GLOBS - file-extension lock (default
*.cpp / *.h / *.hpp; covers 99.3% of
the C++ surface per repo audit)
- CPP_SEARCH_INCLUDE_OVERRIDES - re-include paths the auto-exclude
policy would otherwise drop
Folders containing a `.git` file or directory at any depth are
auto-excluded. Covers `modules/.daspkg_cache/` (daspkg's package index
clone) plus future submodules / FetchContent destinations / ad-hoc
clones, with no manual list to maintain.
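
The `.git` auto-exclusion rule can be pictured as a pruned directory walk. The Python sketch below is an assumption-laden illustration (the real logic lives in the daslang MCP tools): `iter_cpp_files` is a hypothetical name, and the default globs simply mirror the `*.cpp / *.h / *.hpp` defaults listed above.

```python
import fnmatch
import os
from typing import Iterable, Iterator, Sequence


def iter_cpp_files(
    search_dirs: Iterable[str],
    include_globs: Sequence[str] = ("*.cpp", "*.h", "*.hpp"),
) -> Iterator[str]:
    """Yield matching files, skipping any subfolder that holds a `.git` file or directory."""
    for root in search_dirs:
        for dirpath, dirnames, filenames in os.walk(root):
            # Editing dirnames in place stops os.walk from descending into pruned
            # folders, so nested clones and submodules are skipped at any depth.
            dirnames[:] = sorted(
                d for d in dirnames
                if d != ".git" and not os.path.lexists(os.path.join(dirpath, d, ".git"))
            )
            for name in sorted(filenames):
                if any(fnmatch.fnmatch(name, glob) for glob in include_globs):
                    yield os.path.join(dirpath, name)
```

Checking for `.git` with `lexists` covers both cases named above: a regular clone has a `.git` directory, while a submodule typically has a `.git` file.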
Git-signature staleness (replaces lazy-once)
--------------------------------------------
`var cpp_index_built : bool` becomes `var cpp_index_signature : string`.
Each `ensure_cpp_index()` call recomputes a cheap signature from
`git rev-parse HEAD` + `git status --porcelain --untracked-files=normal`
(double-filtered to .cpp/.h/.hpp files in the search scope) +
per-dirty-file mtimes + `cpp_search_config.das` mtime, hashed via
daslang's builtin FNV-64 `hash()`. Cache hit when the signature
matches; rebuild when it doesn't.
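
As a rough model of the signature described here, the Python sketch below folds the four inputs (HEAD, filtered status lines, per-dirty-file mtimes, config mtime) into one 64-bit hash. It takes the git output as plain arguments so it stays self-contained. The FNV-1a variant, the field layout, and the name `index_signature` are assumptions; the source only says daslang's builtin FNV-64 `hash()` is used.

```python
from typing import Dict, Iterable

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a (assumed variant; daslang's hash() may differ in detail)."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def index_signature(
    head: str,
    status_lines: Iterable[str],
    dirty_mtimes: Dict[str, int],
    config_mtime: int,
) -> str:
    """Combine git state and mtimes into a short, order-independent signature."""
    parts = [f"head={head}"]
    parts += [f"status={line}" for line in sorted(status_lines)]
    parts += [f"mtime={path}:{mtime}" for path, mtime in sorted(dirty_mtimes.items())]
    parts.append(f"config={config_mtime}")
    return format(fnv1a_64("\n".join(parts).encode("utf-8")), "016x")
```

Any change to one of the inputs (a new commit, a newly dirty file, a re-saved dirty file, an edited config) yields a different string, which is what triggers the rebuild.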
Naturally fixes Copilot review #3 (silent failure trap): on `sg scan`
failure the signature stays empty, so the next call retries. No
permanent silent fallback to "(not located)".
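
The retry behaviour follows from storing the signature only after a successful build. A minimal Python sketch of that idea, with hypothetical names (`CppIndexCache`, `ensure`) and the signature and scan steps injected as callables:

```python
from typing import Callable, Dict, Optional


class CppIndexCache:
    """Cache an index keyed by a staleness signature; an empty signature means 'no valid index'."""

    def __init__(
        self,
        compute_signature: Callable[[], str],
        build_index: Callable[[], Optional[Dict[str, list]]],
    ) -> None:
        self._compute_signature = compute_signature
        self._build_index = build_index
        self._signature = ""
        self._index: Dict[str, list] = {}

    def ensure(self) -> Dict[str, list]:
        current = self._compute_signature()
        if self._signature and current == self._signature:
            return self._index  # cache hit: nothing relevant changed
        built = self._build_index()
        if built is None:
            # Failed scan: keep the signature empty so the next call retries
            # instead of caching the failure.
            self._signature = ""
            self._index = {}
            return self._index
        self._index = built
        self._signature = current
        return self._index
```

Because a failed build never records a signature, the comparison on the next call cannot succeed, so the scan is attempted again.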
Per-call cost: ~70-200ms typical. The trade is "always-fresh after C++
edits" vs "build once, restart MCP to refresh", an explicit choice.
Tests: 11 new cases (236/236 green) for cpp_search_config defaults,
signature stability, .git-folder auto-exclusion (verified against the
local daspkg cache), function-declaration regression, and using-alias
regression. Sphinx clean.
Docs: skills/mcp_tools.md, install/skills/mcp_tools.md,
doc/source/reference/utils/mcp.rst all updated for the config file +
staleness story.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
30f12e4 to e661aba
39b5adf to 5cc1498
Adds five MCP tools for navigating C++ source via ast-grep + tree-sitter-cpp:
- cpp_grep_usage — parse-aware identifier-leaf usage search across
multiple AST kinds (identifier, type_identifier,
namespace_identifier, field_identifier).
- cpp_find_symbol — declaration lookup by (name, kind).
- cpp_outline — function signatures, template specializations,
class/namespace nesting, qualified names. Auto/tree/
flat render modes.
- cpp_goto_definition — best-effort approximate goto. Ranks candidates by
file proximity; surfaces index-build failures via
cpp_index_status().
- find_symbol / goto_definition — opt-in `with_cpp_source` redirect that
resolves builtin/handled-type C++ source locations
via the lazily-built cpp index.
Search scope is configurable in `utils/mcp/cpp_search_config.das`
(CPP_SEARCH_DIRS, CPP_SEARCH_INCLUDE_GLOBS, CPP_SEARCH_INCLUDE_OVERRIDES,
CPP_MAX_FIND_RESULTS, CPP_MAX_GOTO_CANDIDATES). Folders containing a `.git`
file/dir (submodules, FetchContent) auto-excluded.
The cpp index is cached in-process and rebuilds when its git-state staleness
signature changes (HEAD + filtered status + per-file mtimes + config mtime).
First call ~2s, subsequent ~150ms.
In a non-git checkout (extracted tarball, manual rm of .git), the no-git
fallback walks CPP_SEARCH_DIRS recursively and folds per-file mtimes into
the signature so source edits still invalidate the cache. ~50–200ms; only
hit when git rev-parse / status fails. The .git-folder auto-exclude logic
is reused so vendored repos and the daspkg cache are pruned consistently.
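
The no-git fallback can be sketched as "ask git first, walk the tree if that fails". The Python below is an illustration under assumptions: `git_head`, `walk_signature`, and `staleness_signature` are hypothetical names, SHA-256 from the standard library stands in for daslang's FNV-64 `hash()`, and the git branch is abbreviated to HEAD only.

```python
import hashlib
import os
import subprocess
from typing import Iterable, Optional, Tuple


def git_head(repo_root: str) -> Optional[str]:
    """Return the HEAD commit, or None when git is missing or this is not a checkout."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_root, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def walk_signature(
    search_dirs: Iterable[str],
    extensions: Tuple[str, ...] = (".cpp", ".cc", ".h", ".hpp"),
) -> str:
    """Fold every source file's path and mtime into one digest, pruning `.git` folders."""
    digest = hashlib.sha256()  # stand-in for the FNV-64 hash named in the source
    for root in search_dirs:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames[:] = sorted(
                d for d in dirnames
                if not os.path.lexists(os.path.join(dirpath, d, ".git"))
            )
            for name in sorted(filenames):
                if name.endswith(extensions):
                    path = os.path.join(dirpath, name)
                    digest.update(f"{path}:{os.stat(path).st_mtime_ns}\n".encode("utf-8"))
    return digest.hexdigest()


def staleness_signature(repo_root: str, search_dirs: Iterable[str]) -> str:
    head = git_head(repo_root)
    if head is None:
        return "nogit:" + walk_signature(search_dirs)
    # The git path would also fold in filtered status and dirty-file mtimes.
    return "git:" + head
```

The fallback is more expensive than the git path because it stats every source file, which matches the note above that it is only hit when git itself fails.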
cpp_run_scan / do_cpp_grep_usage are parse-first: on Windows, sg can exit
non-zero while still emitting valid JSON (warnings on stderr get merged in
by run_and_capture). Try to parse first; only fail if parse itself fails,
and enrich the message with the rc when sg exited non-zero.
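
The parse-first policy amounts to ignoring the exit code whenever the output is valid JSON. A small Python sketch of that decision, with hypothetical names (`parse_scan_output`, `ScanError`); how `run_and_capture` merges streams is not modelled here:

```python
import json
from typing import Any


class ScanError(RuntimeError):
    """Raised only when the captured sg output cannot be parsed at all."""


def parse_scan_output(output: str, returncode: int) -> Any:
    """Try to parse first; consult the exit code only to enrich a parse failure."""
    try:
        return json.loads(output)
    except json.JSONDecodeError as exc:
        detail = f"could not parse sg output as JSON: {exc.msg}"
        if returncode != 0:
            detail += f" (sg exited with rc={returncode})"
        raise ScanError(detail) from exc
```

A non-zero exit with valid JSON is treated as success, while unparseable output reports the exit code alongside the parse error so the two failure modes can be told apart.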
goto_definition's `with_cpp_source` redirect now also fires for handled
types (TypeAnnotation.cppName) — previously only builtin functions
populated cppName. Type fallback in resolve_definition gets a new
isHandle branch that mirrors the path find_symbol already takes. Future
gaps (ExprVar of handled type, ExprField on handled-type parent) are
documented inline as a v2 roadmap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pull Bot pushed a commit to forksnd/daScript that referenced this pull request on May 8, 2026
The install/CLAUDE.md and install/skills/ tree had been hand-mirrored from the top-level skills tree, drifting on every edit (e.g. PR GaijinEntertainment#2602 had to update both copies of mcp_tools.md). This collapses the skills side to a single source of truth in skills/ + an install/skills.list manifest that CMake reads to copy named files into the SDK at install time. install/CLAUDE.md stays as a separate audience-curated head (Running Scripts, Project files, SDK Directory Layout); the two heads' near-identical syntax block is the only remaining duplication.
Structural changes:
- Extract ~32 lines of "Project Overview" / "What and Why" / "Designing with macros" prose from CLAUDE.md to a new shipped skills/project_overview.md; both heads keep a 1-line pointer.
- Fold skills/clargs_migration.md into clargs_usage.md as a final "Migrating from get_command_line_arguments()" section.
- Replace install/skills/ (19 files, ~3100L) with install/skills.list (21 entries) + a CMake file(STRINGS) install rule with FATAL_ERROR existence check. install_instructions.md rewritten to match.
Per-skill content cleanup (informed by an audit memo against the old install/skills/ versions, kept where it removed only repo-internal navigation noise; reverted where install/ over-trimmed substantive content like Handle<T>/HandleRegistry, daspkg's command table, and the .das_module C++ binding boilerplate):
- das_macros.md: drop 115L of legacy var inscope/<- AST patterns; preserve the substantive [call_macro] entry-guard contract section.
- detect_dupe.md: drop the entire 116L "Maintainer notes" section (repo-dev-only); generalize bin/Release/daslang.exe → bin/daslang.
- writing_tests.md: drop "Test index", "AOT tests registration", and "options no_aot" sections (all repo-dev infrastructure).
- dynamic_modules.md: drop "Resolution order in getModuleInfo()" internals and "Install rules for .das_module" CMake snippet.
- daspkg.md: drop "Package Index" section (index repo navigation); generalize bin/Release/daslang.exe → bin/daslang.
- jobque_debugging.md, memory_leak_detection.md: generalize binary paths from bin/Debug/daslang.exe → bin/daslang etc.
- linq.md: drop two issue GaijinEntertainment#2505 historical-nav references.
- cpp_integration.md: drop include/daScript/misc/string_writer.h internal path from the LogLevel collision caveat.
- filesystem.md: drop in-tree src/ and tests/ Reference pointers; keep daslib/, include/, tutorials/ pointers (those all ship).
- make_pr.md: drop step 5.5 (.md stop-rule).
- CLAUDE.md skill table: drop stale gc_use_after_sweep.md row (the file never existed); drop clargs_migration row.
- install/CLAUDE.md skill table: add project_overview.md and strudel_port.md rows to match the 21-entry ship list.
Verification:
- cmake -B build -S . reconfigures cleanly.
- cmake --install build --prefix /tmp/das-install-test produces /tmp/das-install-test/CLAUDE.md (= install/CLAUDE.md, the SDK version) and /tmp/das-install-test/skills/ with exactly the 21 files listed in install/skills.list, all matching skills/ source.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds four ast-grep-backed MCP tools for navigating the C++ side of the codebase (`src/`, `include/`, `modules/`), plus a one-call bridge from daslang symbols to their C++ source via a new `with_cpp_source` flag on the existing `find_symbol` and `goto_definition` tools, plus a behavioral rule + `.mcp.json` knob to make the MCP tools the first thing the assistant reaches for.
Closes the gap where `.das` symbol lookup had four parse-aware MCP tools but the ~1,400 C++ files had only raw grep, and where the assistant would default to `Bash`/`Grep` even for `.das` lookups because the deferred-schema friction made MCP feel like the slower path.
New tools
| Tool | Description |
|---|---|
| `cpp_grep_usage` | … `.cpp`/`.cc`/`.h`/`.hpp` via `sg scan` + a runtime-generated YAML rule covering all four C++ identifier-leaf kinds (`identifier`, `type_identifier`, `namespace_identifier`, `field_identifier`). Hits dedup by `(file, line)`. |
| `cpp_find_symbol` | … =exact. Result cap configurable via `CPP_MAX_FIND_RESULTS` (default 50). |
| `cpp_outline` | … `const`/`noexcept`/`override` qualifiers), template specializations as distinct entries (`Trait<int>` vs `Trait<float>`), class/namespace nesting via containment forest, qualified names for in-class declarations (`Outer::method`), DAS_API false-positive filtering, anonymous-noise filtering. Auto / `tree` / `flat` render modes. |
| `cpp_goto_definition` | … `CPP_MAX_GOTO_CANDIDATES` (default 5). |
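
The `(file, line)` dedup mentioned for `cpp_grep_usage` can be illustrated with a short Python sketch. The hit dictionaries and the name `dedup_hits` are hypothetical; the assumption is that a rule spanning several identifier kinds may report the same source line more than once.

```python
from typing import Any, Dict, Iterable, List


def dedup_hits(hits: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first hit per (file, line) pair, preserving the original order."""
    seen = set()
    unique = []
    for hit in hits:
        key = (hit["file"], hit["line"])
        if key not in seen:
            seen.add(key)
            unique.append(hit)
    return unique
```

Keeping the first occurrence and preserving order means the output stays stable from call to call, which matters when results are capped or shown to a user.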
All four are gated on `ast_grep_available`. Default search scope is `src/ include/ modules/` with `build*/`, `_deps/`, `3rdparty/` always excluded; folders containing a `.git` file/dir (submodules, daspkg, FetchContent) are auto-excluded. Implementation is rule-yaml + `sg scan` (the `sg run -p` pattern path is unreliable for C++ kind queries). Rule file `tree-sitter-daslang/cpp_outline_rules.yml` covers eight productive C++ kinds plus the `field_declaration → function_declarator` path for in-class method declarations.
`with_cpp_source` redirect
`find_symbol` and `goto_definition` accept `with_cpp_source : bool = false`. When `true`, results that have a non-empty `cppName` (builtin functions, `addExtern`-registered functions, handled types via `MAKE_TYPE_FACTORY`) get a resolved C++ source location appended via a lazily-built `cppName → array<CppMatch>` index. First call costs ~2s (one full scan); subsequent calls are ~150ms (the cost of checking a git-state staleness signature: `rev-parse HEAD` + filtered `git status` + per-file mtimes + `cpp_search_config.das` mtime). The index rebuilds automatically when relevant `.cpp`/`.cc`/`.h`/`.hpp` files change, when HEAD moves, or when the search config is edited. Default off.
The same index now backs `cpp_goto_definition`: same amortized cost, then O(1) per goto.
When the index is unavailable (ast-grep missing, scan crash, etc.), `find_symbol` / `goto_definition` / `cpp_goto_definition` all surface the underlying reason via `cpp_index_status()` so users can distinguish "no match" from "infrastructure error".
Configuration
Tool behavior is centralized in `utils/mcp/cpp_search_config.das`:
| Constant | Default | Purpose |
|---|---|---|
| `CPP_SEARCH_DIRS` | `["src", "include", "modules"]` | … |
| `CPP_SEARCH_INCLUDE_GLOBS` | `["*.cpp", "*.cc", "*.h", "*.hpp"]` | … |
| `CPP_SEARCH_ALWAYS_EXCLUDE` | `**/build*/**`, `**/_deps/**`, `**/3rdparty/**`, … | … |
| `CPP_SEARCH_INCLUDE_OVERRIDES` | `[]` | … `.git` file/dir |
| `CPP_MAX_FIND_RESULTS` | `50` | `cpp_find_symbol` declaration cap |
| `CPP_MAX_GOTO_CANDIDATES` | `5` | `cpp_goto_definition` ranked-candidate cap |
Edit and restart the MCP server (or touch any tracked source) to pick up changes; config mtime is part of the index staleness signature.
"MCP-first search" rule + `.mcp.json` `defer_loading`
`CLAUDE.md` gets a section directing the assistant: before reaching for `Bash`/`Grep`/`Read` for any symbol or usage lookup in the repo, ToolSearch and call the matching MCP tool. Table covers all 9 search-class tools (`find_symbol` / `cpp_find_symbol` / `grep_usage` / `cpp_grep_usage` / `outline` / `cpp_outline` / `find_references` / `goto_definition` / `cpp_goto_definition`) with the `with_cpp_source` flag called out for the daslang→C++ bridge.
`.mcp.json` (gitignored, per-user) gains `"defer_loading": false` on the `daslang` server entry. The flag is documented but reportedly broken upstream; harmless to set, falls back to deferred-tool path otherwise. Documented in `skills/mcp_tools.md`, `install/skills/mcp_tools.md`, and the RST.
Critical correctness prerequisite
`sgconfig.yml.{linux,osx,windows}` get a `languageGlobs: { cpp: ["*.h", "*.hpp"] }` block. Without it ast-grep classifies `.h` files as C (not C++) and 375 headers (24% of the C++ surface) silently produce zero matches. The lint pass against `include/daScript/simulate/fs_file_info.h` is the regression test in `test_tools.das`.
Caveats
- … `DAS_BIND_FN(foo)`) are invisible to ast-grep
- `cpp_goto_definition` is approximate: no scope resolution, no overload disambiguation. For substring/usage searches, prefer `cpp_grep_usage`. A clangd-backed precise mode is on the v2 roadmap.
Tests
`utils/mcp/test_tools.das`: total now 259/259 green. Coverage:
- … `addExternFunc` in `src/ast/ast_interop.cpp`)
- `.h` languageGlobs regression (`FsFileSystem` in `include/daScript/simulate/fs_file_info.h`)
- `with_cpp_source` on/off behavior for both `find_symbol` and `goto_definition`
- `cpp_outline` v2: function signatures, template specializations distinct, anonymous noise filtered, DAS_API misparse filtered, qualified names for in-class declarations, nesting (auto/tree/flat modes)
- `cpp_grep_usage`: type-position hits + `(file, line)` dedup
Synthetic fixture at `utils/mcp/tests/_fixture_cpp_outline.h` exercises every TODO item.
Test plan
- `bin/daslang dastest/dastest.das -- --test utils/mcp/test_tools.das` → 259/259 PASS
- `mcp__daslang__lint` on all `.das` files in the diff → 0 issues
- `mcp__daslang__format_file` on all `.das` files → already_formatted
- … build succeeded.
- `sg scan -r tree-sitter-daslang/cpp_outline_rules.yml include/daScript/simulate/fs_file_info.h --json` returns matches with `language: Cpp`
- `find_symbol("=get_das_root", with_cpp_source=true)` → `→ cpp: src/builtin/module_builtin_runtime.cpp:1531` in one call
- … `27d0c0d4d`); threads resolved
🤖 Generated with Claude Code