feat(checkpoint): unify Checkpoint, code rollback, and run resume #550
/code Fix the compile error:
ok neo-code/cmd/neocode-gateway 0.100s coverage: 100.0% of statements
neo-code/internal/gateway/adapters/urlscheme [neo-code/internal/gateway/adapters/urlscheme.test]
Error: internal/gateway/adapters/urlscheme/dispatcher_integration_unix_test.go:37:40: cannot use runtimeStub (variable of type *urlschemeIntegrationRuntimeStub) as gateway.RuntimePort value in argument to server.Serve: *urlschemeIntegrationRuntimeStub does not implement gateway.RuntimePort (missing method CheckpointDiff)
Done, fixed.
Actions taken:
The changes have been committed and pushed to the current PR branch.
Generated with [codeagent](https://github.com/qbox/codeagent) Co-authored-by: phantom5099 <245659304+phantom5099@users.noreply.github.com>
test(checkpoint): increase restore coverage
test(checkpoint): cover bridge and filesystem edge cases
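No problem. As long as the added tools are necessary, that is fine; if they have been tested and work, I have no further concerns.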
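Problem
NeoCode lacks a unified mechanism for code rollback, session rollback, and run resume:
- File side effects of shell commands such as sed -i, mv, and rm were previously not covered by rollback.
Related issue: feat(session): code and context rollback #521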
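Solution
Adopt a three-layer recovery model. The code snapshot backend is a Per-Edit Snapshot Store (a pure file-based, versioned incremental history) and does not depend on Git.
Three-layer model
- PerEditSnapshotStore: stores versions as pathHash@vN.bin/.meta; a checkpoint stores only the (pathHash -> version) mapping.
- SessionCheckpoint: writes HeadJSON + MessagesJSON inside a SQLite transaction.
- ResumeCheckpoint: stored in SQLite; only the latest record is kept per session.
Code snapshot key design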
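Restore algorithm: for each (pathHash, v_A) in the checkpoint, find the next version v_next in the version chain and write v_next.bin back to the workdir (delete the file when Existed=false). Files that are not in the checkpoint mapping are left unchanged.
Capture mechanism:
- … (filesystem_write_file/edit/move/copy/delete/create_dir/remove_dir): CapturePreWrite runs immediately before the call.
- The BashLikelyWritesFiles heuristic identifies write commands, extracts the source file paths, and captures them in batch. It also records a fingerprint; if the post-execution comparison finds changes that were not covered, EventBashSideEffect is emitted.
Checkpoint creation timing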
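- … (pre_write): finalizes the previous turn's pending captures. If there is a code snapshot, a full checkpoint is created; with no pending captures it degrades to a session-only checkpoint (CodeCheckpointRef is empty).
- … (end_of_turn): if the turn performed a workspace write, Finalize runs again.
- … (compact): creates a session-only checkpoint (and also finalizes the code snapshot if there are pending writes).
- … (pre_restore_guard): automatically creates a guard snapshot for undo to use.
- CheckpointReasonPlanMode and CheckpointReasonManual are reserved; the runtime does not trigger them yet.
Restore flow
- … (available + restorable + session match).
- … pre_restore_guard snapshot.
- … Restore restores the files covered by the snapshot.
- … restored.
Cross-store atomicity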
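file-history and SQLite are two independent stores, so a two-phase protocol is used: write file-history first, then write the DB, and compensate on failure. On startup, leftover records with status=creating are scanned: a record that has a session_checkpoint_ref is updated to available, and one without is deleted as an orphan.
Retention policy
- manual checkpoints and pre_restore_guard are always restorable (manual is reserved; the runtime does not trigger it yet).
- … pruned; the associated SessionCheckpoint is deleted and file-history is kept.
Why not a Git shadow repository: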
NeoCode chose the file-based Per-Edit Snapshot Store over a Git shadow repository mainly for the following technical reasons:
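To make the Per-Edit Snapshot Store more concrete, the restore rule described under the key design (for each (pathHash, v_A), write back the next version in the chain, or delete the file when Existed=false) could be sketched in memory roughly as follows. This is only an illustration under stated assumptions: the types `version` and `store`, the helper `nextVersion`, and the choice to key the workdir by pathHash are all hypothetical, and skipping a file that has no later version is an assumption, since the PR does not state that case.

```go
package main

import (
	"fmt"
	"sort"
)

// version is a hypothetical in-memory stand-in for one pathHash@vN.bin/.meta pair.
type version struct {
	N       int    // version number within the chain
	Existed bool   // whether the file existed when this version was captured
	Data    string // payload that the .bin file would hold
}

// store maps pathHash -> version chain, sorted by N in ascending order.
type store map[string][]version

// nextVersion returns the first version in the chain whose number is greater than after.
func (s store) nextVersion(pathHash string, after int) (version, bool) {
	chain := s[pathHash]
	i := sort.Search(len(chain), func(i int) bool { return chain[i].N > after })
	if i == len(chain) {
		return version{}, false
	}
	return chain[i], true
}

// restore applies the rule to workdir, a map of pathHash -> current content.
// Entries that are absent from the checkpoint mapping are never touched.
func restore(s store, checkpoint map[string]int, workdir map[string]string) {
	for pathHash, vA := range checkpoint {
		next, ok := s.nextVersion(pathHash, vA)
		if !ok {
			continue // assumption: no later capture means nothing to write back
		}
		if !next.Existed {
			delete(workdir, pathHash) // the file did not exist at capture time
			continue
		}
		workdir[pathHash] = next.Data
	}
}

func main() {
	s := store{
		"h1": {{N: 1, Existed: true, Data: "v1"}, {N: 2, Existed: true, Data: "old a"}},
		"h2": {{N: 1, Existed: true, Data: "v1"}, {N: 2, Existed: false}},
	}
	workdir := map[string]string{"h1": "new a", "h2": "created later", "h3": "untouched"}
	restore(s, map[string]int{"h1": 1, "h2": 1}, workdir)
	_, h2Present := workdir["h2"]
	fmt.Println(workdir["h1"], h2Present, workdir["h3"])
}
```

Because a checkpoint here is only a small map of version numbers, creating one does not copy any file content, which is consistent with the checkpoint storing only the (pathHash -> version) mapping.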
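Adding system tools
Adding more tool types is necessary on two levels: the completeness of the Agent's own capabilities, and the data completeness that precise Checkpoint restore requires.
read_file / write_file / edit alone cannot cover common operations in real development:
Without these dedicated tools, the Agent can only run commands such as cp / mv / mkdir / rm through bash. The problems with bash are:
- … (SourceFilesInWorkdir), which often misses files or captures the wrong ones
Dedicated tools let the Agent express its intent directly and return structured metadata such as paths / bytes / overwrite.
If tool types were not distinguished and everything were handled as a single-path pre-write capture:
It is precisely to support exact directory-tree restore that remove_dir must be recognized and given a recursive pre-capture plus CapturePostDelete.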
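Inside Execute, each dedicated tool performs the same path validation (resolvePath / tools.ResolveWorkspaceTarget) to ensure the operation does not escape the Workspace. This kind of fine-grained control is hard to apply to bash commands. The more dedicated tools are added, the less the Agent needs bash, and the higher the overall safety.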
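The kind of check that keeps a path inside the Workspace can be sketched as below. This is an assumption about the general technique, not the actual body of resolvePath or tools.ResolveWorkspaceTarget: `resolveInWorkspace` and `ErrOutsideWorkspace` are hypothetical names, and the sketch is purely lexical, so it does not resolve symlinks.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// ErrOutsideWorkspace is a hypothetical sentinel error for this sketch.
var ErrOutsideWorkspace = errors.New("path escapes workspace")

// resolveInWorkspace returns the cleaned absolute path of target, or an error
// if that path would fall outside workspace. Relative targets are resolved
// against workspace; absolute targets are checked as given.
func resolveInWorkspace(workspace, target string) (string, error) {
	root, err := filepath.Abs(workspace)
	if err != nil {
		return "", err
	}
	candidate := target
	if !filepath.IsAbs(candidate) {
		candidate = filepath.Join(root, candidate)
	}
	candidate = filepath.Clean(candidate)

	// A relative path that starts with ".." means candidate is not under root.
	rel, err := filepath.Rel(root, candidate)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", ErrOutsideWorkspace
	}
	return candidate, nil
}

func main() {
	for _, target := range []string{"src/main.go", "a/../b.txt", "../etc/passwd"} {
		resolved, err := resolveInWorkspace("/ws", target)
		fmt.Printf("%q -> %q, err=%v\n", target, resolved, err)
	}
}
```

A tool receives its target path as a structured argument, so a check like this can run once before any side effect. A bash command line would first have to be parsed to find its paths, which is where the heuristic capture comes in.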
Scope of changes
Core additions
- internal/checkpoint/per_edit_snapshot.go: PerEditSnapshotStore (Capture / Finalize / Restore / Diff / ChangedFiles)
- internal/checkpoint/fingerprint.go: workdir fingerprint scan and diff
- internal/checkpoint/bash_capture.go: heuristic detection of bash write commands and path extraction
- internal/runtime/checkpoint_restore.go: Restore / Undo / Guard / CheckpointDiff
- internal/runtime/checkpoint_gate.go: start_of_turn / end_of_turn / compact checkpoint creation
- internal/runtime/checkpoint_resume.go: updateResumeCheckpoint
- internal/runtime/file_snapshot.go: snapshots before and after tool execution, and unified diff computation
Integration changes
- internal/runtime/run.go: inserts checkpoint creation, tool diff emission, and resume checkpoint updates into the turn loop
- internal/runtime/toolexec.go: pre-capture for write tools + bash heuristic capture + fingerprint fallback + diff computation
- internal/runtime/events.go: adds EventCheckpointCreated / EventCheckpointWarning / EventCheckpointRestored / EventCheckpointUndoRestore / EventToolDiff / EventBashSideEffect
- internal/session/sqlite_store.go: schema v6 migration, adding three checkpoint tables
- internal/app/bootstrap.go: injects SQLiteCheckpointStore + PerEditSnapshotStore and runs compensation recovery on startup
- internal/gateway/: registers RPC handlers for checkpoint query / restore / undo / diff
- internal/tools/filesystem/: adds copy_file / create_dir / delete_file / move_file / remove_dir
TUI integration
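1. Observe this turn's changes in real time
Subscribe to the runtime event stream:
- EventToolDiff: after each write tool runs, shows the file changes and a unified diff.
- EventBashSideEffect: when bash produces changes that were not captured, reports the uncovered file paths.
- EventCheckpointCreated: optional; can be used to show a checkpoint timeline.
ToolDiffPayload provides both single-file compatibility fields (FilePath / Diff / WasNew) and multi-file fields (Files + Diffs); the TUI decides which case applies from the length of Files.
2. Checkpoint list and diff preview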
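Via Gateway RPC:
- ListCheckpoints(sessionID): returns the list of restorable checkpoints, to be shown grouped by Reason, with items that include a code snapshot (non-empty CodeCheckpointRef) marked.
- CheckpointDiff(sessionID, checkpointID): returns the end-to-end difference between the target checkpoint and the previous code checkpoint, as Files (classified as Added / Deleted / Modified) and Patch (unified diff). Showing it in the restore confirmation dialog is recommended, to help the user decide.
3. Restore and Undo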
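Via Gateway RPC:
- checkpoint_restore: takes session_id + checkpoint_id + force.
- checkpoint_undo_restore: takes session_id and restores to the guard point created before the most recent restore.
After a successful restore, the TUI receives EventCheckpointRestored and should refresh the message list, todo, plan, and the contents of any open files.
4. Runtime snapshot refresh (automatic)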
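After a successful restore, the runtimeSnapshots cache is deleted. The next time the TUI calls GetRuntimeSnapshot, the restored state is reloaded from the DB automatically; no extra handling is needed.
Expected benefits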
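- EventToolDiff and CheckpointDiff let users see directly what each turn changed.
- ResumeCheckpoint records the exact phase/turn, so there is no more guessing after a process restart.
Risks
- plan_mode / manual are reserved but not implemented: defined at the type level, not triggered by the runtime.
- TranscriptRevision is not populated: the schema reserves it, but the runtime does not assign it, so the resume consistency check is not yet enabled.
- pruned does not clean up .bin/.meta files; long-running use may bloat the disk, so a separate GC needs to be introduced later.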