Session resume breaks after force-interrupt during tool execution (400 tool_call_ids missing)

### What version of Kimi Code is running?

0.6.0

### Which open platform/subscription were you using?

Kimi

### Which model were you using?

kimi-for-coding

### What platform is your computer?

Linux x64 (WSL2 Ubuntu-22.04), Node.js v24.15.0, WarpTerminal

### What issue are you seeing?

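During a session, the AI was waiting for the result of a background Agent task (it had called `TaskOutput(block=true)`), and at that point I **force-interrupted the program** by closing the terminal directly.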

After re-entering the same session, every message I send (for example "继续", i.e. "continue") fails with the same error:

```
400 an assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'.
The following tool_call_ids did not have response messages: TaskOutput:79
```

Manually triggering Compaction fails too, and the error is identical after switching providers (kimi / deepseek):

```
APIStatusError: 400 an assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: TaskOutput:79
```

### What steps can reproduce the bug?

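1. Start a session and have the AI call a tool that blocks while waiting (for example `TaskOutput(block=true)` or a background Agent)
2. Force-interrupt **while the tool is executing** (for example, close the terminal window)
3. Re-enter the same session (it resumes automatically)
4. Send any new message, for example "continue"
5. Observe that the LLM request returns 400 and the session cannot continue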

Session ID: `session_62a574ff-660e-4fa5-ac1a-a9d6efed9e21`

### What is the expected behavior?

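A forced interruption by the user is a perfectly reasonable action, and the software should recover from it gracefully:

- On resume, detect the unfinished tool exchange and automatically fill in the missing `tool.result` (marked as interrupted/error) and `step.end`
- Or have `project()` filter out incomplete tool-call sequences when it builds the LLM messages
- After resuming, the conversation can continue normally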
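
To illustrate the first option, here is a minimal sketch of a resume-time repair pass. The `WireEntry` shapes, the `stepId` field, and the `repairInterruptedLog` name are hypothetical simplifications for illustration, not the actual wire schema or an existing function in the codebase.

```typescript
// Hypothetical, simplified wire-log entry shapes; the real schema may differ.
type WireEntry =
  | { type: "step.begin"; stepId: string }
  | { type: "tool.call"; stepId: string; toolCallId: string; name: string }
  | { type: "tool.result"; stepId: string; toolCallId: string; isError: boolean; output: string }
  | { type: "step.end"; stepId: string };

// Scan the replayed log, then append synthetic entries for anything left open.
function repairInterruptedLog(entries: WireEntry[]): WireEntry[] {
  const pending = new Map<string, string>(); // toolCallId -> stepId
  const openSteps: string[] = [];

  for (const entry of entries) {
    switch (entry.type) {
      case "step.begin":
        openSteps.push(entry.stepId);
        break;
      case "tool.call":
        pending.set(entry.toolCallId, entry.stepId);
        break;
      case "tool.result":
        pending.delete(entry.toolCallId);
        break;
      case "step.end": {
        const index = openSteps.indexOf(entry.stepId);
        if (index >= 0) openSteps.splice(index, 1);
        break;
      }
    }
  }

  const repaired: WireEntry[] = [...entries];

  // Every tool.call without a tool.result gets an error result marked as interrupted.
  pending.forEach((stepId, toolCallId) => {
    repaired.push({
      type: "tool.result",
      stepId,
      toolCallId,
      isError: true,
      output: "Interrupted before the tool returned a result.",
    });
  });

  // Every step without a step.end gets closed.
  for (const stepId of openSteps) {
    repaired.push({ type: "step.end", stepId });
  }

  return repaired;
}
```

This sketch appends the synthetic entries at the end of the log for simplicity. A real fix would probably need to place the synthetic `tool.result` before any deferred messages so that they are flushed in the right order.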

### Additional information

I exported the debug zip for this session and inspected the wire log inside it. The key evidence is as follows:

**1. wire.jsonl contains an unpaired tool.call**

In `agents/main/wire.jsonl`, the records for turn 4, step 32 are:

- **entry 472**: `step.begin` (uuid=`8f932678-57bd-4cbc-845d-75a9944f1bb9`)
- **entry 473-474**: `content.part` (thinking + text)
- **entry 475**: `tool.call` for `TaskOutput`, `toolCallId=tool_iq3KtmRW2h0g5fZpAxmE76r2`
- **entry 476**: notification message that the background Agent was lost (deferred)
- **entry 477-478**: user input "继续" ("continue") (deferred)

**However, no `tool.result` record follows this tool.call, and there is no matching `step.end` either.**

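**2. ContextMemory is left in a dirty state after resume**

According to the source logic (`packages/agent-core/src/agent/context/index.ts`):

- `tool.call` adds the toolCallId to `pendingToolResultIds`
- `tool.result` removes it
- `step.end` closes the corresponding entry in `openSteps`

Because of the forced interruption, neither `tool.result` nor `step.end` was written to the wire log. After the replay on resume:

- `pendingToolResultIds` stays stuck holding that id
- `openSteps` always contains one unclosed step
- `hasOpenToolExchange()` always returns `true`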
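
The bookkeeping above can be modeled with a small sketch. The class name `ContextMemorySketch`, the `ReplayEvent` shapes, and the exact condition inside `hasOpenToolExchange()` are assumptions made for illustration; the real implementation in `index.ts` may differ.

```typescript
// Hypothetical minimal model of the replay bookkeeping; not the real ContextMemory.
type ReplayEvent =
  | { type: "step.begin"; stepId: string }
  | { type: "tool.call"; toolCallId: string }
  | { type: "tool.result"; toolCallId: string }
  | { type: "step.end"; stepId: string };

class ContextMemorySketch {
  private pendingToolResultIds = new Set<string>();
  private openSteps = new Set<string>();

  apply(event: ReplayEvent): void {
    switch (event.type) {
      case "step.begin":
        this.openSteps.add(event.stepId);
        break;
      case "tool.call":
        this.pendingToolResultIds.add(event.toolCallId);
        break;
      case "tool.result":
        this.pendingToolResultIds.delete(event.toolCallId);
        break;
      case "step.end":
        this.openSteps.delete(event.stepId);
        break;
    }
  }

  // Assumed definition: an exchange is open while any result or step end is outstanding.
  hasOpenToolExchange(): boolean {
    return this.pendingToolResultIds.size > 0 || this.openSteps.size > 0;
  }
}

// Replaying a log that was cut off right after tool.call leaves the state dirty.
const memory = new ContextMemorySketch();
memory.apply({ type: "step.begin", stepId: "step-32" });
memory.apply({ type: "tool.call", toolCallId: "tool-1" });
const stuckAfterReplay = memory.hasOpenToolExchange();
```

In this model nothing except a later `tool.result` and `step.end` can clear the flag, which matches the behavior observed after the interrupted session is replayed.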

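This causes two knock-on problems:

1. **All subsequent new messages are deferred indefinitely**: `appendUserMessage()` → `appendMessage()` sees that `hasOpenToolExchange()` is true → the message is pushed into `deferredMessages` and never enters `_history`. So after "continue" is sent, the LLM never sees that message at all.

2. **`project()` sends the incomplete assistant message to the LLM API**: the filtering logic in `packages/agent-core/src/agent/context/projector.ts` only excludes "empty assistant messages"; it **does not check** whether each of the assistant's `toolCalls` is followed by a matching `tool` message. As a result, an invalid message sequence that carries `toolCalls` but lacks the `tool` results is sent straight to the API, which triggers the 400.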
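
As a rough idea of what the missing check could look like, the sketch below strips tool calls that have no matching `tool` message before the request is built. The `ChatMessage` shapes and the `dropUnansweredToolCalls` helper are hypothetical and only loosely modeled on OpenAI-style chat messages; they are not the projector's actual types.

```typescript
// Hypothetical message shapes; the projector's real types may differ.
interface ToolCall {
  id: string;
  name: string;
}

type ChatMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

function dropUnansweredToolCalls(messages: ChatMessage[]): ChatMessage[] {
  // Collect the ids of every tool call that received a response.
  const answered = new Set<string>();
  for (const message of messages) {
    if (message.role === "tool") answered.add(message.toolCallId);
  }

  const projected: ChatMessage[] = [];
  for (const message of messages) {
    if (message.role !== "assistant" || !message.toolCalls) {
      projected.push(message);
      continue;
    }
    const kept = message.toolCalls.filter((call) => answered.has(call.id));
    if (kept.length > 0) {
      projected.push({ ...message, toolCalls: kept });
    } else if (message.content.trim() !== "") {
      // Keep the assistant text but remove the dangling tool calls.
      projected.push({ role: "assistant", content: message.content });
    }
    // An assistant message with no text and no answered calls is dropped entirely.
  }
  return projected;
}
```

This sketch treats a call as answered if a `tool` message with the same id appears anywhere in the history. The API error message suggests the responses must directly follow the assistant message, so a stricter version would also check position.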



