Skip to content

fix(adk): preserve nil agentCancelOpts in stopSignal.check to prevent UntilIdleFor from canceling agent#972

Merged
shentongmartin merged 4 commits intoalpha/09from
fix/turn-loop-idle-for-notify
Apr 17, 2026
Merged

fix(adk): preserve nil agentCancelOpts in stopSignal.check to prevent UntilIdleFor from canceling agent#972
shentongmartin merged 4 commits intoalpha/09from
fix/turn-loop-idle-for-notify

Conversation

@shentongmartin
Copy link
Copy Markdown
Contributor

@shentongmartin shentongmartin commented Apr 16, 2026

Problem

Stop(UntilIdleFor(...)) spuriously cancels the running agent instead of deferring the stop until idle.

Expected: Stop(UntilIdleFor(30s)) → agent finishes → idle timer ticks → loop exits
Actual:   Stop(UntilIdleFor(30s)) → agent canceled immediately → loop exits

Solution

Two fixes in stopSignal.check() and Stop():

  1. Nil-preservation in check(): append([]AgentCancelOption{}, nil...) returns []AgentCancelOption{} (non-nil empty slice), not nil. The watchStopSignal goroutine's tryCancel uses opts == nil to distinguish "no cancel intent" from "CancelImmediate" (empty opts). Added a nil guard so check() returns nil when agentCancelOpts is nil.

  2. Drop cancel opts when paired with UntilIdleFor: Stop(UntilIdleFor(d), WithImmediate()) is semantically contradictory — "wait for idle" + "cancel now". Stop() now clears agentCancelOpts when idleFor > 0. To escalate, users must issue a separate Stop(WithImmediate()) call.

Key Insight

UntilIdleFor operates on a different axis than cancel options. It controls when the loop exits (after idle duration), while cancel options control how the running agent is interrupted. Combining them in the same Stop() call is meaningless: if you're waiting for idle, there's no running agent to cancel. Cancel opts only make sense in a follow-up Stop() call that overrides the idle wait:

loop.Stop(UntilIdleFor(30 * time.Second))  // wait for idle
// ... later, if you need to abort:
loop.Stop(WithImmediate())                 // overrides idle, cancels agent

问题

Stop(UntilIdleFor(...)) 会错误地取消正在运行的 agent,而非等待空闲后再停止。

期望:Stop(UntilIdleFor(30s)) → agent 完成 → 空闲计时器计时 → 循环退出
实际:Stop(UntilIdleFor(30s)) → agent 被立即取消 → 循环退出

解决方案

stopSignal.check()Stop() 中进行了两处修复:

  1. check() 中的 nil 保留append([]AgentCancelOption{}, nil...) 返回非 nil 的空切片 []AgentCancelOption{}watchStopSignaltryCancel 通过 opts == nil 区分"无取消意图"和"CancelImmediate"(空 opts)。添加了 nil 判断,使 check()agentCancelOpts 为 nil 时直接返回 nil。

  2. UntilIdleFor 搭配 cancel opts 时丢弃后者Stop(UntilIdleFor(d), WithImmediate()) 语义矛盾——"等待空闲" + "立即取消"。Stop() 现在在 idleFor > 0 时清除 agentCancelOpts。要升级停止方式,用户需发起单独的 Stop(WithImmediate()) 调用。

核心洞察

UntilIdleFor 与 cancel 选项处于不同的语义轴。它控制的是循环何时退出(空闲一段时间后),而 cancel 选项控制的是如何中断正在运行的 agent。在同一个 Stop() 调用中组合它们没有意义:如果你在等待空闲,就没有正在运行的 agent 需要取消。Cancel opts 只在覆盖空闲等待的后续 Stop() 调用中才有意义。

@shentongmartin shentongmartin force-pushed the fix/turn-loop-idle-for-notify branch from 6c3a472 to aa29f91 Compare April 16, 2026 07:26
… UntilIdleFor from canceling agent

stopSignal.check() used append([]AgentCancelOption{}, s.agentCancelOpts...)
which converts nil to a non-nil empty slice. In watchStopSignal's tryCancel,
the nil check (opts == nil) then fails, causing an empty-opts call to
agentCancelFunc — equivalent to CancelImmediate. This means
Stop(UntilIdleFor(...)) spuriously cancels the running agent instead of
deferring the stop until idle.

Change-Id: Ifdbde679ea54bc45c876283ea7a00d6a9bb45e9c
@shentongmartin shentongmartin force-pushed the fix/turn-loop-idle-for-notify branch from aa29f91 to 00e8898 Compare April 16, 2026 08:29
…lity comment

Add TestStopSignalCheck_NilPreservedUnderConcurrentSignals to
turn_loop_test.go, verifying that the nil guard in stopSignal.check()
does not race with concurrent signal() calls under the race detector.

Add an inline comment to tryCancel's nil check cross-referencing the
three-way semantic documented on stopSignal.agentCancelOpts.

Change-Id: I8c63db0980bb93a405e936cc2a47fb47586fa7b4
@shentongmartin shentongmartin force-pushed the fix/turn-loop-idle-for-notify branch from 3f6f758 to 9f7a76d Compare April 16, 2026 10:43
Move all 12 TestAttack_* functions from attack_test.go into
turn_loop_test.go and delete the now-empty file. No test logic changes.

Change-Id: Ic0d8ce9ac043b2caebdf103e4df0fac573ec6f5b
@shentongmartin shentongmartin force-pushed the fix/turn-loop-idle-for-notify branch from 9f7a76d to e8de5a2 Compare April 16, 2026 11:30
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (alpha/09@7854d56). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             alpha/09     #972   +/-   ##
===========================================
  Coverage            ?   82.29%           
===========================================
  Files               ?      162           
  Lines               ?    20593           
  Branches            ?        0           
===========================================
  Hits                ?    16946           
  Misses              ?     2458           
  Partials            ?     1189           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

case 0 and case 1 in TestAttack_ConcurrentStopEscalation_RaceDetector
were both calling WithImmediate(). Restore case 0 to bare Stop() to
maintain the original 4-mode concurrent mix (bare, immediate, graceful
timeout, UntilIdleFor).

Change-Id: Id49dadc88b5d2719b0bb479c584216f869dfa86f
@shentongmartin shentongmartin force-pushed the fix/turn-loop-idle-for-notify branch from e57b092 to e2506a8 Compare April 16, 2026 12:39
@shentongmartin shentongmartin merged commit 77d5e7e into alpha/09 Apr 17, 2026
16 checks passed
@shentongmartin shentongmartin deleted the fix/turn-loop-idle-for-notify branch April 17, 2026 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants