Skip to content

feat(adk): improve TurnLoop stop cleanup and add StopOption controls#925

Merged
meguminnnnnnnnn merged 3 commits intoalpha/09from
fix/turnloop_late_items
Apr 8, 2026
Merged

feat(adk): improve TurnLoop stop cleanup and add StopOption controls#925
meguminnnnnnnnn merged 3 commits intoalpha/09from
fix/turnloop_late_items

Conversation

@shentongmartin
Copy link
Copy Markdown
Contributor

@shentongmartin shentongmartin commented Mar 31, 2026

TurnLoop stop cleanup and explicit StopOption controls

Problem

TurnLoop had several cleanup and control gaps after Stop().

Cleanup blind spots (existing):

First, pushes that arrived after the loop was already stopped only returned false; the caller had no structured way to recover those items later.

Second, checkpoint persistence status was folded back into general exit handling. That made it hard to answer two different questions independently:

  • did the loop exit cleanly?
  • was a checkpoint attempted and did it succeed?

The stale-checkpoint cleanup path also relied on an undocumented fallback behavior for stores without CheckPointDeleter, which made the ownership of checkpoint lifecycle unclear.

Missing stop controls (new):

Third, callers had no way to tell the TurnLoop "do not persist a checkpoint for this stop" — useful when the business knows it will not resume in the future. Every Stop with pending state would produce a checkpoint write, even when the data would never be consumed.

Fourth, there was no way to attach a business-level reason to Stop. The existing ExitReason carries the technical error (context cancellation, *CancelError, callback error), but callers could not distinguish why the stop was requested at the business level.

Solution

Part 1: Late-stop recovery and checkpoint observability (existing commits)

TurnLoopExitState now carries:

  • Checkpointed to indicate whether cleanup attempted a checkpoint save
  • CheckpointErr to report the save result separately from ExitReason
  • TakeLateItems() to retrieve inputs that were pushed after stop and therefore never entered the normal unhandled-item path

On the write path, failed post-stop Push() calls append to a dedicated late-item buffer. On the cleanup path, checkpoint saves are gated to genuine Stop-caused exits, so concurrent callback errors do not produce resumable state from an inconsistent turn.

The checkpoint deletion contract is also made explicit: stores that want automatic stale-checkpoint cleanup must implement CheckPointDeleter; otherwise lifecycle management belongs to the store owner.

Part 2: Explicit StopOption controls (new commit)

Two new StopOption variants for TurnLoop.Stop():

WithSkipCheckpoint() — tells the TurnLoop not to persist a checkpoint on this stop. The flag is sticky: once any Stop() call sets it, subsequent escalation calls cannot undo it. When set, the shouldSaveCheckpoint condition in cleanup() evaluates to false, and any stale checkpoint from a prior run is deleted.

WithStopCause(cause string) — attaches a business-supplied reason string. The cause is surfaced in two places:

  • TurnLoopExitState.StopCause — available after Wait() returns
  • TurnContext.StopCause() — available inside OnAgentEvents after <-tc.Stopped closes

Uses first-non-empty-wins semantics: the first Stop() call with a cause sets it; subsequent escalation calls do not overwrite.

Both fields are threaded through stopSignal with proper mutex protection. StopCause on TurnContext is a func() string (closure over stopSignal.getStopCause) rather than a plain string, because TurnContext is constructed before Stop() may be called. The caller reads it after <-tc.Stopped, at which point the value is guaranteed to be set.

Decisions

  • Keep checkpoint persistence status separate from ExitReason so loop termination and checkpoint durability are observable as independent outcomes.
  • Treat late pushes as recoverable data rather than silently dropping them or overloading UnhandledItems.
  • Make CheckPointDeleter semantics explicit instead of preserving an implicit nil-write deletion convention.
  • Make skipCheckpoint sticky across escalation calls — the intent "don't persist" should not be undone by a subsequent Stop() that only escalates the cancel mode.
  • Use first-non-empty-wins for stopCause — the business sets the cause once; escalation calls are about cancel mechanics, not about changing the reason.
  • Expose StopCause on TurnContext as a function to avoid races — the context is constructed at turn start but the cause may arrive mid-turn.

Key Insight

TurnLoop has three distinct classes of leftover data at shutdown, and they need different ownership:

  • UnhandledItems: accepted by the loop but never consumed
  • CanceledItems: tied to the interrupted turn and resumable checkpoint state
  • late items: rejected after stop and therefore owned by the caller, not by the resumed loop state

Additionally, the reason for stopping is a separate dimension from the mechanism of stopping. ExitReason captures the technical mechanism (*CancelError, context cancellation), while StopCause captures the business intent ("user session timeout", "quota exceeded"). Keeping these separate makes stop/resume behavior predictable and debuggable.

Summary

Problem Solution
Post-stop pushes were only rejected, not recoverable Buffer them separately and expose TakeLateItems()
Exit state could not distinguish loop failure from checkpoint save failure Add Checkpointed and CheckpointErr alongside ExitReason
Stale-checkpoint cleanup ownership was implicit Document CheckPointDeleter as the explicit cleanup mechanism
No way to skip checkpoint on intentional permanent stop Add WithSkipCheckpoint() StopOption, sticky across escalation
No way to attach business-level stop reason Add WithStopCause() StopOption, surfaced in TurnLoopExitState.StopCause and TurnContext.StopCause()

TurnLoop 停止清理及显式 StopOption 控制

问题

TurnLoopStop() 之后有若干清理和控制方面的缺陷。

清理盲点(已有):

第一,loop 已经停止后又到来的 Push() 虽然会返回 false,但调用方没有结构化方式在之后取回这些输入。

第二,检查点持久化状态会和通用退出结果混在一起,导致两个问题难以独立判断:loop 是否正常退出?是否尝试保存检查点?保存是否成功?

另外,旧检查点清理路径此前依赖 store 在未实现 CheckPointDeleter 时的隐式兜底行为,这使检查点生命周期的责任边界不清晰。

缺少停止控制(新增):

第三,调用方无法告诉 TurnLoop "这次 Stop 不要持久化检查点"——当业务侧确定以后不会 Resume 时,这很常见。此前每次 Stop 只要有待处理状态都会产生一次检查点写入,即使数据永远不会被消费。

第四,无法在 Stop 时附带业务层面的停止原因。已有的 ExitReason 承载的是技术层面的错误(context 取消、*CancelError、callback error),但调用方无法区分为什么在业务层面要求停止。

方案

第一部分:晚到输入恢复和检查点可观测性(已有 commit)

TurnLoopExitState 现在新增:

  • Checkpointed:表示 cleanup 是否尝试保存检查点
  • CheckpointErr:单独表示检查点保存结果,而不是塞进 ExitReason
  • TakeLateItems():取回 Stop 之后才被 push、因此没有进入正常未处理路径的输入

在写入路径上,Stop 之后失败的 Push() 会进入独立的晚到输入缓冲区。在 cleanup 路径上,检查点保存仅允许发生在真正由 Stop 导致的退出场景。

检查点删除契约也被明确化:如果 store 需要自动清理过期检查点,就必须实现 CheckPointDeleter

第二部分:显式 StopOption 控制(新 commit)

TurnLoop.Stop() 新增两个 StopOption

WithSkipCheckpoint() — 告知 TurnLoop 本次 Stop 不要持久化检查点。标志位是粘性的:一旦任何 Stop() 调用设置了它,后续升级调用无法撤消。设置后 cleanup() 中的 shouldSaveCheckpoint 条件为 false,并且会清理之前加载的旧检查点。

WithStopCause(cause string) — 附带一个业务侧提供的原因字符串。原因在两个地方可获取:

  • TurnLoopExitState.StopCause — 在 Wait() 返回后可用
  • TurnContext.StopCause() — 在 OnAgentEvents 回调中,<-tc.Stopped 关闭后可用

使用"首个非空值胜出"语义:第一个带 cause 的 Stop() 设置值;后续升级调用不会覆盖。

两个字段都通过 stopSignal 传递,有正确的 mutex 保护。TurnContext 上的 StopCausefunc() string(闭包引用 stopSignal.getStopCause),因为 TurnContext 在 turn 开始时就构造了,而 Stop() 可能在 turn 中途才到来。调用方在 <-tc.Stopped 之后读取,此时值已保证就绪。

决策

  • 将检查点持久化状态与 ExitReason 分离,使 loop 退出结果和检查点耐久性成为两个独立可观测结果。
  • 将 Stop 之后的 push 视为可恢复数据,而不是静默丢弃或混入 UnhandledItems
  • 明确 CheckPointDeleter 语义,而不是继续依赖隐式的 nil 写入删除约定。
  • skipCheckpoint 在升级调用中保持粘性 —— "不要持久化" 的意图不应被后续仅升级取消模式的 Stop() 撤消。
  • stopCause 使用首个非空值胜出 —— 业务侧设置一次原因;升级调用关心的是取消机制,不是改变原因。
  • TurnContext 上以函数方式暴露 StopCause,避免竞态 —— context 在 turn 开始时构造,但 cause 可能在 turn 中途才到达。

关键洞察

TurnLoop 在停止时会留下三类语义不同的数据,必须分别归属:

  • UnhandledItems:已经被 loop 接收,但还没被消费
  • CanceledItems:属于被中断 turn 的可恢复状态
  • 晚到输入:发生在 Stop 之后,未被 loop 接收,因此归调用方处理

此外,停止的原因和停止的机制是两个独立维度。ExitReason 捕捉的是技术机制(*CancelError、context 取消),而 StopCause 捕捉的是业务意图("用户会话超时"、"配额用尽")。将二者分开,使 Stop / Resume 语义可预测且可调试。

总结

问题 方案
Stop 之后的 push 只能失败,无法恢复输入 单独缓存,并通过 TakeLateItems() 暴露
退出结果无法区分 loop 失败与检查点保存失败 ExitReason 之外新增 CheckpointedCheckpointErr
过期检查点清理责任不明确 CheckPointDeleter 文档化为显式清理机制
无法在永久停止时跳过检查点 新增 WithSkipCheckpoint() StopOption,在升级调用中保持粘性
无法附带业务层面的停止原因 新增 WithStopCause() StopOption,在 TurnLoopExitState.StopCauseTurnContext.StopCause() 中暴露

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 85.96491% with 8 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (alpha/09@92ae043). Learn more about missing BASE report.

Files with missing lines Patch % Lines
adk/turn_loop.go 85.96% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             alpha/09     #925   +/-   ##
===========================================
  Coverage            ?   81.72%           
===========================================
  Files               ?      158           
  Lines               ?    19024           
  Branches            ?        0           
===========================================
  Hits                ?    15547           
  Misses              ?     2345           
  Partials            ?     1132           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@shentongmartin shentongmartin force-pushed the fix/turnloop_late_items branch 2 times, most recently from 886a131 to 165eb23 Compare April 1, 2026 03:42
@shentongmartin shentongmartin changed the title fix(adk): keep late turn loop items feat(adk): improve TurnLoop stop cleanup and add StopOption controls Apr 7, 2026
Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b
Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6
Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
@shentongmartin shentongmartin force-pushed the fix/turnloop_late_items branch from e1b3d66 to 77b6d6d Compare April 7, 2026 09:42
Comment thread adk/turn_loop.go
Comment thread adk/turn_loop.go
@meguminnnnnnnnn meguminnnnnnnnn merged commit 23bf51b into alpha/09 Apr 8, 2026
16 checks passed
@meguminnnnnnnnn meguminnnnnnnnn deleted the fix/turnloop_late_items branch April 8, 2026 06:06
shentongmartin added a commit that referenced this pull request Apr 8, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
shentongmartin added a commit that referenced this pull request Apr 8, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
mrh997 pushed a commit that referenced this pull request Apr 10, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
shentongmartin added a commit that referenced this pull request Apr 13, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
shentongmartin added a commit that referenced this pull request Apr 14, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
N3kox pushed a commit that referenced this pull request Apr 14, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
shentongmartin added a commit that referenced this pull request Apr 14, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
shentongmartin added a commit that referenced this pull request Apr 15, 2026
…925)

* fix(adk): keep late turn loop items

Change-Id: Iabee0c25a83d5a25585d3592a41ca6a5fba35c2b

* docs(adk): clarify cancel wait semantics

Change-Id: Ia0a396b9cc2e43f15e85056d966f20b010dcd2b6

* feat(adk): add WithSkipCheckpoint and WithStopCause StopOptions

Add two new StopOption variants for TurnLoop.Stop():

- WithSkipCheckpoint: prevents checkpoint persistence on stop, for
  cases where the caller does not intend to resume in the future.
  The flag is sticky across escalation calls.

- WithStopCause: attaches a business-supplied reason string. Surfaced
  in TurnLoopExitState.StopCause and, after the Stopped channel
  closes, via TurnContext.StopCause(). Uses first-non-empty-wins
  semantics across multiple Stop() calls.

Thread both fields through stopSignal with proper mutex protection.
Update cleanup() to skip checkpoint save when skipCheckpoint is set.

Change-Id: Ifeat-stop-options-skip-checkpoint-stop-cause
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants