Skip to content

fix(session): 串行化 SQLite 会话写入#631

Merged
phantom5099 merged 1 commit into
1024XEngineer:mainfrom
Yumiue:codex/fix-sqlite-session-concurrency
May 11, 2026
Merged

fix(session): 串行化 SQLite 会话写入#631
phantom5099 merged 1 commit into
1024XEngineer:mainfrom
Yumiue:codex/fix-sqlite-session-concurrency

Conversation

@Yumiue
Copy link
Copy Markdown
Collaborator

@Yumiue Yumiue commented May 11, 2026

背景

运行中同一会话可能因为多个工具结果并发回灌、Gateway 管理写入口绕过 Runtime 会话锁等路径,同时竞争 SQLite 写事务,偶发触发 database is locked 并中断当前会话。

当前修改

  • SQLiteStore 内增加单 writer 写锁,并将 SQLite 连接池收敛为单连接,避免同一工作区数据库多 writer 抢锁。
  • 保留工具执行并发,但将 tool result 的 transcript 持久化改为按原始 tool call 顺序串行回灌。
  • 增加 Runtime 会话 mutation 边界,Gateway 的删除、重命名、模型切换、provider/model 同步优先走 Runtime 会话锁。
  • 修复两个 Windows 环境敏感测试点:卷根 .git 误判、slash-rooted 路径解析。
  • 补充 SQLite 并发写、工具结果顺序回灌、Gateway mutation 边界路由相关测试。

验证

  • git diff --check
  • go test ./...
  • CGO_ENABLED=1 go test -race ./internal/session ./internal/runtime

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Copy link
Copy Markdown

@fennoai fennoai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found 2 noteworthy issues in the changed code. One is a behavior regression in runtime event delivery for concurrent tool calls, and the other is a lock-granularity regression that can stall unrelated session writes during large asset saves.

@@ -94,6 +98,14 @@ func (s *Service) executeAssistantToolCalls(
if !ok {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: workerWG.Wait() now completes before any call to persistCompletedToolCallResult, which means EventToolResult, todo/facts updates, and remembered-state updates are all held back until every tool finishes. With calls like [slow, fast], the fast tool no longer surfaces its result when it completes; it is delayed behind the slow one. The PR description says only transcript persistence should become ordered, but this change also batches user-visible runtime events and state propagation for the entire tool phase.

@@ -601,6 +623,9 @@ func (s *SQLiteStore) SaveAsset(ctx context.Context, sessionID string, r io.Read
if err := validateStorageID("session id", sessionID); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: SaveAsset takes the global writeMu before the full asset write path, including io.Copy, Sync, Close, and the rename. Because r can be slow and assets can be large, one upload now blocks every other session write in the workspace for the entire file I/O duration. The original bug was SQLite writer contention, but this lock scope serializes unrelated non-SQLite work as well and can noticeably stall transcript/title/state updates under load.

@Yumiue Yumiue force-pushed the codex/fix-sqlite-session-concurrency branch from 26ef0a3 to e2a18d4 Compare May 11, 2026 13:06
@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

@phantom5099 phantom5099 merged commit 882fad7 into 1024XEngineer:main May 11, 2026
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants