
refactor(adk): improve cancel propagation, encapsulate TurnLoop stop options, add UntilIdleFor #942

Merged
shentongmartin merged 27 commits into alpha/09 from refactor/nested_cancel
Apr 14, 2026

Conversation

@shentongmartin
Contributor

@shentongmartin shentongmartin commented Apr 8, 2026

Summary

| Problem | Solution |
| --- | --- |
| Cancel modes propagate to nested agents by default: inner agent safe-points fire before the root agent, surprising users | Make propagation opt-in via WithRecursive(): cancel modes only affect the root agent by default |
| TurnLoop.Stop and TurnLoop.Push directly expose AgentCancelOption and CancelMode, leaking lower-level ChatModelAgent concepts into the higher-level TurnLoop API | Replace raw options with use-case-driven APIs: WithGraceful/WithGracefulTimeout for Stop, WithPreempt/WithPreemptTimeout for Push |

Feature 1: Opt-in Recursive Cancel (WithRecursive)

Key Insight

Cancel has two orthogonal dimensions: timing (when to cancel) and scope (where to cancel).

  • Timing is the CancelMode: CancelImmediate, CancelAfterChatModel, CancelAfterToolCalls
  • Scope is the AgentCancelOption: WithRecursive()

Conflating scope with timing (e.g., making CancelRecursive a bitmask bit in CancelMode) leads to problems:

  1. CancelImmediate = 0 cannot be OR'd with anything: combined with a scope bit, it is indistinguishable from the scope bit alone
  2. Timing and scope are different categories — they belong in different types
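The bitmask problem can be shown with a minimal sketch. Only `CancelImmediate = 0` and the option names come from the description above; the other constant values, `rejectedRecursiveBit`, `cancelConfig`, and `buildCancelConfig` are hypothetical names invented for illustration and may not match the real implementation.

```go
package main

import "fmt"

// CancelMode carries timing only. CancelImmediate = 0 comes from the PR text;
// the other values are assumptions made for this sketch.
type CancelMode int

const (
	CancelImmediate      CancelMode = 0
	CancelAfterChatModel CancelMode = 1 << 0
	CancelAfterToolCalls CancelMode = 1 << 1
)

// rejectedRecursiveBit is hypothetical: it models the rejected design in
// which scope is one more bit inside CancelMode.
const rejectedRecursiveBit CancelMode = 1 << 2

// cancelConfig is a hypothetical holder that keeps timing and scope apart.
type cancelConfig struct {
	mode      CancelMode
	recursive bool
}

// AgentCancelOption is modeled here as a functional option over cancelConfig.
type AgentCancelOption func(*cancelConfig)

func WithAgentCancelMode(m CancelMode) AgentCancelOption {
	return func(c *cancelConfig) { c.mode = m }
}

func WithRecursive() AgentCancelOption {
	return func(c *cancelConfig) { c.recursive = true }
}

func buildCancelConfig(opts ...AgentCancelOption) cancelConfig {
	var c cancelConfig
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	// In the rejected design, "immediate + recursive" collapses into the
	// scope bit alone, so the timing information is lost.
	merged := CancelImmediate | rejectedRecursiveBit
	fmt.Println("timing lost in bitmask:", merged == rejectedRecursiveBit)

	// With separate types, both dimensions survive.
	c := buildCancelConfig(WithAgentCancelMode(CancelImmediate), WithRecursive())
	fmt.Println("mode is immediate:", c.mode == CancelImmediate, "recursive:", c.recursive)
}
```

Keeping scope in its own option means any timing value, including the zero value, can be paired with recursive propagation.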

Design Decisions

Why root-only by default? In ChatModelAgent → AgentTool → ChatModelAgent, users expect CancelAfterChatModel to fire after the root agent's ChatModel, not an arbitrary inner one. Deep propagation is powerful but surprising for the common case.

Monotonic escalation. Once WithRecursive() is set via any cancel call, it stays set. Escalation is monotonic, never reversed.

Mid-flight escalation. A recursiveChan (closed on first setRecursive(true)) enables mid-flight escalation: parked deriveChild goroutines wake up and propagate immediately when a later cancel call adds WithRecursive().
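A minimal sketch of the close-once channel idea behind monotonic and mid-flight escalation follows. The field names `recursive` and `recursiveChan` and the method names `setRecursive` and `isRecursive` appear in this PR; everything else, including `waitRecursive` as a stand-in for a parked deriveChild goroutine, is an assumption and not the actual implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cancelContext is a reduced, hypothetical version holding only the
// recursive flag and the channel that signals its first activation.
type cancelContext struct {
	recursive     int32
	recursiveChan chan struct{}
}

func newCancelContext() *cancelContext {
	return &cancelContext{recursiveChan: make(chan struct{})}
}

// setRecursive is monotonic: true can be set once, false is ignored.
// Only the caller that wins the compare-and-swap closes the channel,
// so repeated or concurrent calls never close it twice.
func (c *cancelContext) setRecursive(v bool) {
	if !v {
		return
	}
	if atomic.CompareAndSwapInt32(&c.recursive, 0, 1) {
		close(c.recursiveChan)
	}
}

func (c *cancelContext) isRecursive() bool {
	return atomic.LoadInt32(&c.recursive) == 1
}

// waitRecursive blocks until recursive mode is enabled (returns true)
// or done is closed (returns false).
func (c *cancelContext) waitRecursive(done <-chan struct{}) bool {
	select {
	case <-c.recursiveChan:
		return true
	case <-done:
		return false
	}
}

func main() {
	c := newCancelContext()
	woke := make(chan bool, 1)
	go func() { woke <- c.waitRecursive(make(chan struct{})) }()

	c.setRecursive(true)  // a later cancel call adds WithRecursive()
	c.setRecursive(false) // ignored: escalation is never reversed
	fmt.Println("woken:", <-woke, "recursive:", c.isRecursive())
}
```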

Grace period gating. wrapGraphInterruptWithGracePeriod now checks isRecursive() && hasActiveChildren() — without recursive mode, there are no child interrupts to wait for.

Feature 2: TurnLoop Cancel Option Encapsulation

Key Insight

TurnLoop is a higher-level orchestrator; it should not expose ChatModelAgent-specific cancel concepts (CancelMode, AgentCancelOption) in its public API. Instead, it should present use-case-driven vocabulary:

  • Stop is about shutdown: "graceful" vs "immediate" (the default).
  • Preempt is about interrupting the current turn at a "safe point", optionally with a timeout that escalates to immediate cancel.

The SafePoint bitmask (AfterToolCalls, AfterChatModelCall, AnySafePoint) maps internally to CancelMode but hides that detail from TurnLoop users.
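One way such a mapping could look is sketched below. The names `SafePoint`, `AfterToolCalls`, `AfterChatModelCall`, `AnySafePoint`, and `toCancelMode` appear in this PR; the numeric values, the assumption that CancelMode values can be combined with OR, and the panic on an empty SafePoint are guesses made for illustration.

```go
package main

import "fmt"

// CancelMode values other than CancelImmediate = 0 are assumed here.
type CancelMode int

const (
	CancelImmediate      CancelMode = 0
	CancelAfterChatModel CancelMode = 1 << 0
	CancelAfterToolCalls CancelMode = 1 << 1
)

// SafePoint is the TurnLoop-facing bitmask; the bit layout is hypothetical.
type SafePoint uint8

const (
	AfterToolCalls SafePoint = 1 << iota
	AfterChatModelCall
	AnySafePoint = AfterToolCalls | AfterChatModelCall
)

// toCancelMode translates safe-point bits into the lower-level CancelMode.
// An empty SafePoint has no meaningful translation, so this sketch panics.
func (s SafePoint) toCancelMode() CancelMode {
	var m CancelMode
	if s&AfterToolCalls != 0 {
		m |= CancelAfterToolCalls
	}
	if s&AfterChatModelCall != 0 {
		m |= CancelAfterChatModel
	}
	if m == 0 {
		panic("SafePoint: at least one safe point is required")
	}
	return m
}

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	fmt.Println(AfterToolCalls.toCancelMode() == CancelAfterToolCalls)
	fmt.Println(AnySafePoint.toCancelMode() == CancelAfterChatModel|CancelAfterToolCalls)
	fmt.Println(panics(func() { SafePoint(0).toCancelMode() }))
}
```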

API Mapping

| Old (removed) | New |
| --- | --- |
| Stop(WithAgentCancel(WithAgentCancelMode(CancelImmediate))) | Stop() |
| Stop(WithAgentCancel(WithAgentCancelMode(CancelAfterToolCalls))) | Stop(WithGraceful()) |
| Stop(WithAgentCancel(..., WithAgentCancelTimeout(d))) | Stop(WithGracefulTimeout(d)) |
| Push(x, WithPreempt[T]()) | Push(x, WithPreempt[T](AnySafePoint)) |
| Push(x, WithPreempt[T](WithAgentCancelMode(CancelAfterToolCalls))) | Push(x, WithPreempt[T](AfterToolCalls)) |
| Push(x, WithPreempt[T](WithAgentCancelMode(CancelImmediate))) | Push(x, WithPreemptTimeout[T](AnySafePoint, smallDuration)) |

Design Decisions

Two-function pattern over variadic/pointer. WithGraceful() + WithGracefulTimeout(d) instead of WithGraceful(d ...time.Duration) or WithGraceful(d *time.Duration). Variadic allows passing multiple values (semantic mismatch); 0 as a duration conflicts with WithGraphInterruptTimeout(0) meaning "immediate".
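The two-function pattern can be sketched with functional options. `WithGraceful` and `WithGracefulTimeout` are names from this PR; `stopConfig`, `StopOption`, and `buildStopConfig` are hypothetical. The positive-duration check and the automatic recursive flag follow what the PR states elsewhere, but the code is an illustration, not the real implementation.

```go
package main

import (
	"fmt"
	"time"
)

// stopConfig is a hypothetical holder for resolved Stop options.
type stopConfig struct {
	graceful    bool
	recursive   bool
	gracePeriod time.Duration // zero means "no timeout configured"
}

type StopOption func(*stopConfig)

// WithGraceful requests a safe-point shutdown with no timeout.
func WithGraceful() StopOption {
	return func(c *stopConfig) {
		c.graceful = true
		c.recursive = true
	}
}

// WithGracefulTimeout requests a safe-point shutdown that escalates after
// gracePeriod. Exactly one duration is accepted, and zero or negative
// values are rejected so that 0 never has to mean "immediate".
func WithGracefulTimeout(gracePeriod time.Duration) StopOption {
	if gracePeriod <= 0 {
		panic("WithGracefulTimeout: gracePeriod must be positive")
	}
	return func(c *stopConfig) {
		c.graceful = true
		c.recursive = true
		c.gracePeriod = gracePeriod
	}
}

func buildStopConfig(opts ...StopOption) stopConfig {
	var c stopConfig
	for _, o := range opts {
		o(&c)
	}
	return c
}

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	fmt.Println(buildStopConfig(WithGraceful()).gracePeriod)
	fmt.Println(buildStopConfig(WithGracefulTimeout(2 * time.Second)).gracePeriod)
	fmt.Println(panics(func() { WithGracefulTimeout(0) }))
}
```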

WithGraceful bundles recursive automatically. Graceful shutdown implies wanting to reach a safe point across the entire agent hierarchy, so WithRecursive() is included by default. Immediate stop (bare Stop()) does not set recursive — it just cancels the context.

Preempt does not set recursive. Preemption is a turn-level concern. The user preempts the current turn to process a higher-priority item; recursive propagation is not the default expectation.


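Summary (translated from the Chinese version)

| Problem | Solution |
| --- | --- |
| Cancel modes propagate by default to nested agents inside an AgentTool, so inner safe points fire first, which is not what users expect | Make propagation opt-in: only the root agent is affected by default; use WithRecursive() to propagate explicitly |
| TurnLoop.Stop / Push directly expose AgentCancelOption / CancelMode, leaking low-level ChatModelAgent concepts into the higher-level TurnLoop API | Replace them with use-case-driven APIs: WithGraceful/WithGracefulTimeout for Stop, WithPreempt/WithPreemptTimeout for Push |

Feature 1: Opt-in Recursive Cancel (WithRecursive)

Key Insight

Cancel has two orthogonal dimensions: timing (CancelMode) and scope (WithRecursive()). Folding scope into the bitmask means CancelImmediate = 0 cannot take part in an OR. Splitting scope out into a separate AgentCancelOption is clearer.

Design Points

  • Root agent only by default: users expect CancelAfterChatModel to fire at the root agent, not at an inner one.
  • Monotonic escalation: once WithRecursive() is set, it cannot be rolled back.
  • Mid-flight escalation via recursiveChan: when a later cancel call adds WithRecursive(), blocked deriveChild goroutines are woken immediately.
  • Conditional grace period: it applies only in recursive mode and when there are active children.

Feature 2: TurnLoop Cancel Option Encapsulation

Key Insight

TurnLoop is a higher-level orchestrator and should not expose ChatModelAgent-specific cancel concepts. It uses use-case-driven vocabulary instead:

  • Stop: graceful shutdown (WithGraceful/WithGracefulTimeout) vs immediate shutdown (the default)
  • Preempt: interrupt the current turn at a safe point (WithPreempt(SafePoint)), optionally with timeout escalation (WithPreemptTimeout)

The SafePoint bitmask (AfterToolCalls, AfterChatModelCall, AnySafePoint) maps internally to CancelMode, but this detail is hidden from TurnLoop users.

Design Points

  • Two-function pattern: WithGraceful() + WithGracefulTimeout(d) rather than a variadic or pointer parameter, to avoid semantic ambiguity.
  • WithGraceful includes recursive automatically: graceful shutdown means the entire hierarchy should reach a safe point.
  • Preempt does not set recursive: preemption is a turn-level concern and does not propagate to child agents by default.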

@codecov

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 93.98907% with 11 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (alpha/09@26b0af2). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| adk/turn_buffer.go | 93.44% | 2 Missing and 2 partials ⚠️ |
| adk/turn_loop.go | 95.40% | 2 Missing and 2 partials ⚠️ |
| adk/cancel.go | 90.90% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             alpha/09     #942   +/-   ##
===========================================
  Coverage            ?   82.14%           
===========================================
  Files               ?      162           
  Lines               ?    20228           
  Branches            ?        0           
===========================================
  Hits                ?    16616           
  Misses              ?     2438           
  Partials            ?     1174           


@shentongmartin shentongmartin force-pushed the refactor/nested_cancel branch from 5dd483a to 284d35d Compare April 8, 2026 12:48
@shentongmartin shentongmartin changed the title from "refactor(adk): make cancel propagation to nested agents opt-in via WithRecursive" to "refactor(adk): improve cancel propagation and encapsulate TurnLoop cancel options" Apr 9, 2026
@shentongmartin shentongmartin force-pushed the refactor/nested_cancel branch 3 times, most recently from 54f0d53 to 36317f0 Compare April 10, 2026 07:28
@shentongmartin shentongmartin changed the title from "refactor(adk): improve cancel propagation and encapsulate TurnLoop cancel options" to "refactor(adk): improve cancel propagation, encapsulate TurnLoop stop options, add UntilIdleFor" Apr 10, 2026
@shentongmartin shentongmartin force-pushed the refactor/nested_cancel branch from 8304d17 to 833e586 Compare April 13, 2026 06:12
…thRecursive

- CancelAfterChatModel/CancelAfterToolCalls/CancelImmediate now only affect
  the root agent by default; descendant agents inside AgentTools are not
  notified unless WithRecursive() is passed as an AgentCancelOption.
- Add recursive int32 + recursiveChan fields to cancelContext for gating
  deriveChild propagation with support for escalation from non-recursive
  to recursive mid-flight.
- Modify wrapGraphInterruptWithGracePeriod to only apply grace period when
  recursive mode is active.
- Update existing tests that relied on deep propagation to include
  WithRecursive().
- Add 17 new unit tests covering shallow/recursive behavior, escalation,
  grandchild propagation, race conditions, and edge cases.

Change-Id: Id4c63e59fb9ec40cddb71df50159134cb548ca31
- Stop: replace 'idempotent' with 'may be called multiple times' since
  subsequent calls update cancel options (not idempotent by definition).
- Wait: fix 'ever' to 'never' — Wait blocks forever when Run is never
  called, not when it is called.

Change-Id: I71b7020223e53d0f9ec52a7551b81c02245ec9ec
Change-Id: I297ed416bc2345ec87c224a790270416c0b80742
- Add SafePoint bitmask type (AfterToolCalls, AfterChatModelCall, AnySafePoint)
- Replace WithAgentCancel with WithGraceful/WithGracefulTimeout for Stop
- Replace generic WithPreempt[T](opts...) with WithPreempt[T](SafePoint)
  and WithPreemptTimeout[T](SafePoint, Duration) for Push
- WithGraceful bundles safe-point + recursive automatically
- WithPreemptTimeout adds timeout-based escalation to immediate cancel
- Update all doc comments referencing old APIs

Change-Id: Id938445bebdcfc3261177af6eec51fbc4ac95ac6
Add guidance that the callback should not propagate CancelError:
return non-nil error only for callback-internal failures that should
terminate the loop; return nil when the agent is canceled by Stop or
Preempt.

Change-Id: I71d99bd407b79d7612743ee7e422c9fb3d35ee79
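The callback guidance in the commit message above can be illustrated with a small sketch. `CancelError` here is a hypothetical stand-in type, and `onTurnDone`, `canceledByStop`, and `failingPersist` are invented names; the real callback signature is not shown in this PR page. Handling of non-cancel agent errors is left out of scope.

```go
package main

import (
	"errors"
	"fmt"
)

// CancelError is a hypothetical stand-in for the library's cancel error.
type CancelError struct{ Reason string }

func (e *CancelError) Error() string { return "agent canceled: " + e.Reason }

var errStoreUnavailable = errors.New("result store unavailable")

// onTurnDone models a turn callback. A cancellation caused by Stop or
// Preempt is not a callback failure, so it yields nil. Only a failure of
// the callback's own work (persist) is returned to terminate the loop.
func onTurnDone(agentErr error, persist func() error) error {
	var ce *CancelError
	if errors.As(agentErr, &ce) {
		return nil
	}
	if err := persist(); err != nil {
		return fmt.Errorf("persist turn result: %w", err)
	}
	return nil
}

// canceledByStop returns a wrapped CancelError, as a caller might see it.
func canceledByStop() error {
	return fmt.Errorf("turn aborted: %w", &CancelError{Reason: "stop"})
}

func okPersist() error      { return nil }
func failingPersist() error { return errStoreUnavailable }

func main() {
	fmt.Println(onTurnDone(canceledByStop(), failingPersist) == nil)
	fmt.Println(errors.Is(onTurnDone(nil, failingPersist), errStoreUnavailable))
}
```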
TestTurnLoop_ConcurrentPreemptsDuringTurn uses a mock agent that blocks
on ctx.Done(). With WithPreempt(AnySafePoint) the agent waits at a
safe point that never fires because the mock has no cancel infrastructure.
Switch to WithPreemptTimeout with 1ms so the timeout escalates to
immediate cancel and unblocks the agent reliably.

Change-Id: I1a21c6cda86a3051bc040e83d2b229aef21559aa
- Rename AfterChatModelCall to AfterChatModel to match CancelAfterChatModel
  and align with the plural pattern of AfterToolCalls.
- Add type-level doc comment to SafePoint explaining bitmask combinability
  and its relationship to CancelMode.
- Document Stop() root-only default: immediate cancel only affects the root
  agent; point users to WithGraceful() for recursive propagation.
- Add validation panic for gracePeriod <= 0 in WithGracefulTimeout,
  consistent with WithPreempt's zero-check.
- Add internal comments: deriveChild two-phase double-select pattern,
  setRecursive monotonic semantics, wrapGraphInterruptWithGracePeriod
  shallow-mode guard, buildCancelFunc recursive escalation asymmetry.

Change-Id: I82e54c0005b8cedc72f0d4b90a58727278c035cc
- Delete duplicate TestTurnLoop_StopOptionsArePassed (identical to StopWithMode).
- Add 7 panic coverage tests: WithGracefulTimeout(<=0), WithPreempt(0),
  WithPreemptTimeout(0), SafePoint.toCancelMode(), NewTurnLoop nil fields,
  and deriveChild(nil).
- Rewrite cancel_recursive_test.go: add assertNotClosedWithin channel
  helper replacing 6 time.Sleep(200ms) negative assertions; add
  setupParentChild helper; restructure 17 flat tests into t.Run groups
  (Shallow/Recursive/Escalation + Race).
- Tighten assertions: GreaterOrEqual to require.GreaterOrEqual for
  fail-fast, pushCount to Equal(2) where deterministic.
- Replace 16 assert.True(t, len(x) > 0) with assert.NotEmpty across
  cancel_test.go and turn_loop_test.go.
- Extract 3 cancel error helpers (assertHasCancelError,
  drainAndAssertCancelError, drainEventsAndAssertCancelError) applied
  to 6 sites in cancel_test.go.
- Extract newPreemptTestLoop helper applied to 4 preempt tests,
  removing ~84 lines of scaffold in turn_loop_test.go.

Change-Id: Ic7e6a6b3c9efc7c892af7079fb90db625c0369e5
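A channel-based negative assertion like the assertNotClosedWithin helper named in the commit message above could be sketched as follows. The real helper presumably takes a `*testing.T`; this version, `notClosedWithin`, is a hypothetical reduction that returns a bool so it can run outside a test binary.

```go
package main

import (
	"fmt"
	"time"
)

// notClosedWithin reports whether ch stays open (and silent) for the whole
// duration d. Unlike a fixed time.Sleep followed by a check, it returns
// false as soon as the channel fires, so a violation is detected early.
func notClosedWithin(ch <-chan struct{}, d time.Duration) bool {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ch:
		return false
	case <-timer.C:
		return true
	}
}

func main() {
	open := make(chan struct{})
	closed := make(chan struct{})
	close(closed)

	fmt.Println(notClosedWithin(open, 10*time.Millisecond))
	fmt.Println(notClosedWithin(closed, 10*time.Millisecond))
}
```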
- WithImmediate(): explicitly request immediate cancel (previously the
  bare Stop() default). Bare Stop() now means turn-boundary exit.
- UntilIdleFor(d): deferred stop that fires after the loop has been
  continuously idle for d. Timer resets on item arrival. Escalatable
  by subsequent Stop calls with cancel options.
- Refactor Stop into commitStop to support deferred-commit semantics.
- signal() now guards agentCancelOpts with nil-check for monotonic
  escalation (bare Stop never overwrites previously-set cancel opts).
- Extract tryCancel closure in watchStopSignal to reduce duplication.

Change-Id: Ic5c5f23a16ecec74eda9c37480be799f52c1a8ef
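The idle-timer rule described for UntilIdleFor in the commit message above (fire after continuous idleness of d, reset on item arrival) can be modeled without real timers. The type `idleStop` and its methods are hypothetical; timestamps are expressed as offsets from loop start so the logic is deterministic. The actual implementation may use timers and different state.

```go
package main

import (
	"fmt"
	"time"
)

// idleStop tracks when the loop last saw an item and decides whether a
// deferred stop is due. All times are offsets from loop start.
type idleStop struct {
	idleFor    time.Duration
	lastActive time.Duration
}

func newIdleStop(idleFor time.Duration) *idleStop {
	return &idleStop{idleFor: idleFor}
}

// onItem records an item arrival, which restarts the idle window.
func (s *idleStop) onItem(now time.Duration) { s.lastActive = now }

// due reports whether the loop has been continuously idle for idleFor.
func (s *idleStop) due(now time.Duration) bool {
	return now-s.lastActive >= s.idleFor
}

func main() {
	s := newIdleStop(100 * time.Millisecond)
	s.onItem(30 * time.Millisecond)

	// 90ms of idleness: not yet due. 100ms of idleness: due.
	fmt.Println(s.due(120 * time.Millisecond))
	fmt.Println(s.due(130 * time.Millisecond))
}
```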
Change-Id: Ie5ddcd1e8bf4d02362ae0f6ea96de3a5778d09cf
Change-Id: I83c00a19051de11601b44837f81500ad18483413
Change-Id: I10fa4fdcf56c2367c779971e4cd6c9f5f32b1a1a
…review v4

Change-Id: I49a5049d75e398aa9287aa4f9c3583460db6a178
Change-Id: Ib29775ffc1b41a6aad6bca437db7f36f70c256cf
Change-Id: Ie65fbfa202c41951a7cf1021420b615369c0b1bb
Change-Id: I1b75fde802cdd29d418cb4099d6ebaca53c9d507
Change-Id: I5178edc93c337a3450a8a18184d1efe35afc819f
Change-Id: I018f237745f065420ce70c2ce769bedf7f5652d1
…turn

Change-Id: Iaa951a9fe7d0a1a02e6e6b8182488f9b28d5132e
…ePoint

Change-Id: Ibeca6f9c8aa994257fa4ebd50802aafe1b3bc2ac
Change-Id: I74400bc84085deb7f205dd1b27cd3e4160965f10
Change-Id: Iad6e40b5af1074e77b003469be76c78dcda2c639
…oundaries

Change-Id: I55560a4ec22b9b66711f50f07f53e8023b71ebeb
…e in Stop() doc

- 'cancel' → 'abort' for WithImmediate (consistent with earlier rewrite)
- 'the stop commits automatically' → 'the loop shuts down automatically'
- 'A non-UntilIdleFor call commits the stop immediately' → 'A Stop() call without UntilIdleFor shuts down the loop immediately'
- 'still pending' → 'still waiting'

Change-Id: Icab4d29abb57a55d17fd8b79ff40e70986f0cc51
…ed doc

Same consolidation as Stop() doc — replace internal 'commit' jargon
with 'the loop shuts down' in user-facing documentation.

Change-Id: I0370380aa7312bfcb7d2b12b5fefdecff6d916c0
TurnLoop's buffer needs PushFront, TakeAll, Wakeup, and ClearWakeup —
operations that are not part of a general-purpose unbounded channel.
These were bolted onto internal.UnboundedChan, making it do double duty
as both a simple MPSC queue (for compose/react) and a turn-execution
buffer with priority requeue and idle-wakeup signaling.

Extract a dedicated turnBuffer[T] type in the adk package that owns all
TurnLoop-specific buffer semantics. Strip UnboundedChan back to its
original Send/TrySend/Receive/Close surface.

Change-Id: Ic6d0677b4550b98326bf582b608136286bfb61b1
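A reduced sketch of a dedicated buffer type with priority requeue is shown below. The type name `turnBuffer[T]` and the operations PushFront and TakeAll come from the commit message above; the mutex-protected slice, the `Push` method, and the omission of Wakeup/ClearWakeup are assumptions, since their semantics are not described on this page.

```go
package main

import (
	"fmt"
	"sync"
)

// turnBuffer is a hypothetical, simplified turn-execution buffer.
type turnBuffer[T any] struct {
	mu    sync.Mutex
	items []T
}

// Push appends an item at the back of the buffer.
func (b *turnBuffer[T]) Push(v T) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append(b.items, v)
}

// PushFront requeues an item ahead of everything already buffered.
func (b *turnBuffer[T]) PushFront(v T) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.items = append([]T{v}, b.items...)
}

// TakeAll drains the buffer and returns its contents in order.
func (b *turnBuffer[T]) TakeAll() []T {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.items
	b.items = nil
	return out
}

func main() {
	b := &turnBuffer[string]{}
	b.Push("second")
	b.Push("third")
	b.PushFront("first")
	fmt.Println(b.TakeAll())
	fmt.Println(len(b.TakeAll()))
}
```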
…afety

- Add attack_test.go with 12 adversarial tests covering UntilIdleFor
  concurrency, stop escalation, de-escalation guard, CanceledItems
  invariant, turnBuffer semantics, and concurrent Stop race conditions.
- Upgrade 2 assert.True(errors.As) to require.True in turn_loop_test.go
  where the extracted *CancelError was immediately dereferenced.

Change-Id: I18001c4b92d19bedb26ed005162d4778ee9c909d
@shentongmartin shentongmartin force-pushed the refactor/nested_cancel branch from f691573 to db499c0 Compare April 14, 2026 05:25
@shentongmartin shentongmartin merged commit f20ee8f into alpha/09 Apr 14, 2026
16 checks passed
@shentongmartin shentongmartin deleted the refactor/nested_cancel branch April 14, 2026 06:14
shentongmartin added a commit that referenced this pull request Apr 14, 2026
shentongmartin added a commit that referenced this pull request Apr 14, 2026
shentongmartin added a commit that referenced this pull request Apr 15, 2026