feat(repository): refactor codebase exploration, from runtime auto-injection to model-initiated tool calls #568
Merged
phantom5099 merged 15 commits into 1024XEngineer:main from May 7, 2026
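Refactors NeoCode's codebase/workspace exploration from "the runtime guesses what the user needs and injects it automatically" to "the model actively calls git_* / codebase_* tools". Also introduces a Tree-sitter cross-language symbol index so that codebase_search_symbol supports Python, Java, TypeScript, Rust, and other languages. BREAKING: the internal/context/repository package is deleted entirely; all references migrate to internal/repository.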
fix(tools): honor relative workdir in effective root
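Background
The current system has the following core problems:
Entry layer: the runtime guesses requirements on the model's behalf
The runtime currently applies regex heuristics to the user's latest message (path anchors, symbol anchors, quoted-text anchors) to guess whether the user needs codebase information. This fuzzy matching has weak recall, and the limit of at most one retrieval type per turn prevents the model from exploring flexibly in complex scenarios.
Ownership layer: codebase capabilities live in the wrong place
internal/context/repository carries complete domain capabilities (Git scanning, file retrieval, security filtering, result trimming), but none of this is prompt assembly logic. The job of context is to consume projection results that are already prepared and render prompt sections, not to host the codebase exploration implementation long term. This misplacement also reinforces the mistaken mental model that codebase retrieval is an appendage of implicit prompt injection.
Tool layer: codebase_* is entirely missing, and filesystem_* is not semantically equivalent
The generic filesystem_* tools can read files and search text, but they do not provide changed-files, structured retrieval, or workspace-level security trimming semantics. The current prompt assets still reinforce a filesystem_*-first strategy, which further weakens the codebase exploration path.
Symbol retrieval layer: Go-first implementation only, cross-language blind spots
codebase_search_symbol has only Go-first regex matching. On Python/Java/TypeScript/Rust repositories it degrades to plain text search, with a sharp drop in precision and structure. Introducing a separate external parser per language (Python ast, JavaParser, tsc API) was rejected because it would be a runtime disaster (JVM/Node dependencies), the APIs are completely inconsistent, and there is no incremental update.
closes: #535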
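Approach
This PR implements the following:
1. Domain layer extraction
Creates internal/repository as a neutral domain layer that consolidates Git facts, targeted retrieval, security filtering, Fingerprint (moved in from checkpoint), FileChangeKind/FileChangeEntry, and related capabilities. The internal/context/repository package is deleted entirely and all references switch to the new package. The per-edit snapshot version chain in checkpoint is not migrated.
2. Tool entry points
Adds 3 dedicated tools: codebase_read / codebase_search_text / codebase_search_symbol
Tool output uses a structure-first format with fixed field names. codebase_search_text does not return code bodies, and codebase_search_symbol returns only path / line_hint / kind / signature (declaration header ≤ 512 characters). These hard constraints ensure the model must call codebase_read to obtain implementation content.
3. Runtime exits automatic retrieval
Removes all regex anchor extraction, the changed-files heuristics, and the auto retrieval query construction chain. A minimal Git Summary injection (branch / dirty / ahead / behind) is kept for the migration period; the model obtains all other repository information by calling tools.
4. Tree-sitter cross-language index
Introduces the pure-Go github.com/odvcencio/gotreesitter (CGO-free, 206 grammars) to build a three-layer fallback architecture for codebase_search_symbol.
Indexer design: lazy initialization (built on the first search, so startup time does not increase), thread safety through sync.RWMutex, and incremental updates keyed on file mtime+size (only changed files are re-parsed).
5. Prompt strategy sync
tool_usage.md now states that git_* / codebase_* take priority over filesystem_* for codebase exploration, and records this as a hard-constraint rule.
Changes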
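New module (internal/repository/)
A neutral domain layer that aggregates Summary, ChangedFiles, Inspect, Retrieve, Fingerprint, and the Tree-sitter index. It is not exposed directly to the model; it serves only as the underlying dependency of the tools.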
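New tools (internal/tools/git/, internal/tools/codebase/)
Six tools, each in its own file, with schema definition, parameter validation, and structured output. They are registered in the tool registry and the compact management chain. Each tool ships with unit tests (22 cases in total).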
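Deleted module (internal/context/repository/)
The whole package is deleted: 6 files, ~4000 lines. Its capabilities have been fully migrated to internal/repository.
Core refactor (internal/runtime/repository_context.go)
About 280 lines of auto-injection logic are removed (regex anchor extraction, changed-files heuristics, retrieval query construction), leaving a function that returns only the minimal Git Summary.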
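One way the four remaining fields (branch / dirty / ahead / behind) could be derived is by parsing the output of `git status --porcelain=v2 --branch`. The sketch below works on a captured output string and is only an illustration: `GitSummary`, `ParseStatusV2`, and `Render` are hypothetical names, and both the choice of this git command and the rule that any non-header entry (including untracked files) marks the tree as dirty are assumptions, not details taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// GitSummary holds the four fields kept during the migration period.
type GitSummary struct {
	Branch string
	Dirty  bool
	Ahead  int
	Behind int
}

// ParseStatusV2 reads the text produced by `git status --porcelain=v2 --branch`.
func ParseStatusV2(out string) GitSummary {
	var s GitSummary
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimRight(line, "\r")
		switch {
		case strings.HasPrefix(line, "# branch.head "):
			s.Branch = strings.TrimPrefix(line, "# branch.head ")
		case strings.HasPrefix(line, "# branch.ab "):
			// Format: "# branch.ab +<ahead> -<behind>"
			fields := strings.Fields(strings.TrimPrefix(line, "# branch.ab "))
			if len(fields) == 2 {
				s.Ahead, _ = strconv.Atoi(strings.TrimPrefix(fields[0], "+"))
				s.Behind, _ = strconv.Atoi(strings.TrimPrefix(fields[1], "-"))
			}
		case line != "" && !strings.HasPrefix(line, "#"):
			// Any non-header entry is treated as a dirty tree (assumption).
			s.Dirty = true
		}
	}
	return s
}

// Render formats the summary as a single compact line for a prompt section.
func Render(s GitSummary) string {
	return fmt.Sprintf("branch=%s dirty=%t ahead=%d behind=%d",
		s.Branch, s.Dirty, s.Ahead, s.Behind)
}

func main() {
	out := "# branch.oid 1234abcd\n" +
		"# branch.head main\n" +
		"# branch.upstream origin/main\n" +
		"# branch.ab +2 -1\n" +
		"1 .M N... 100644 100644 100644 aaaa bbbb internal/repository/index.go\n"
	fmt.Println(Render(ParseStatusV2(out))) // prints "branch=main dirty=true ahead=2 behind=1"
}
```

Keeping the summary this small is consistent with the refactor's intent: anything beyond these four facts is left for the model to request through git_* or codebase_* tools.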
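Fingerprint migration (internal/checkpoint → internal/repository)
ScanWorkdir, DiffFingerprints, FileChangeKind, and FileChangeEntry move out of checkpoint; the PerEditSnapshotStore version chain in checkpoint is unaffected.
Prompt Assets (tool_usage.md)
Adds a repository exploration guidance section and deprecates the old hint that preferred running git operations through bash.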
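Design constraints
verify/git_diff is unchanged: it continues to serve the final verify stage and is not merged into the inference-time toolchain.
Expected benefits
codebase_* handles codebase exploration; filesystem_* handles general file operations.
Verification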