fix(core): stop AbortSignal listener leak in long sessions (MaxListenersExceededWarning)#4366
Conversation
…s in long sessions
Users hit `MaxListenersExceededWarning: 1509 abort listeners added to
[AbortSignal]` in long interactive sessions. The agent runtime nests
parent→child controllers (masterAbortController → per-message round →
per-API-call round → tool execution) and each layer registered its own
`addEventListener('abort', ...)` on the parent without `{once:true}` or
reverse cleanup, so listeners accumulated on long-lived parents across
hundreds of model turns.
Add `utils/abortController.ts` with three helpers:
- `createAbortController(maxListeners = 50)` — factory that pre-caps the
signal so the warning never fires on per-request signals.
- `createChildAbortController(parent)` — WeakRef-based parent→child
propagation with `{once:true}` on the parent listener AND a reverse-cleanup
listener on the child that detaches the parent listener when the child
aborts. This is the key mechanism — short-lived children stop accumulating
dead listeners on long-lived parents.
- `combineAbortSignals(signals, {timeoutMs})` — N-way combiner that replaces
the existing one-input `combinedAbortSignal.ts` (kept as a `@deprecated`
shim so `httpHookRunner.ts` doesn't churn).
Migrate every production `new AbortController()` in `packages/core/src` (24
sites) to the helper. Wrap `_runReasoningLoopInner` per-iteration body and
`AgentHeadless.execute` in `try/finally` so the round controller is aborted
(triggering reverse cleanup) even when the model stream or tool execution
throws. Add `{once:true}` to the manual abort listeners in `hookRunner`,
`functionHookRunner`, and `message-bus` that were missing it. Remove the
`raiseAbortListenerCap` band-aid from `openaiContentGenerator/pipeline.ts` —
no longer needed now that the per-round signal carries `maxListeners=50`.
Add `cli/utils/warningHandler.ts` as a belt-and-suspenders: hides
`MaxListenersExceededWarning.*AbortSignal` from end users in production
(any shape Node ≥20 emits), keeps it visible under `DEBUG`/`QWEN_DEBUG`/
`NODE_ENV=development`. Uses `process.on('warning', ...)` without
`removeAllListeners` so third-party warning subscribers stay intact.
Direct reproducer in `docs/verification/abort-controller-refactor/` proves
the old pattern accumulates 2000 listeners over 2000 rounds while the new
pattern stays at 0.
📋 Review SummaryThis PR addresses a critical 🔍 General Feedback
🎯 Specific Feedback🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
This PR addresses long-session MaxListenersExceededWarning issues by introducing a shared AbortController/AbortSignal helper in packages/core and migrating runtime/tool code to use it, plus adding a CLI-level warning suppressor as a fallback.
Changes:
- Added
utils/abortController.tswithcreateAbortController,createChildAbortController(reverse-cleanup), andcombineAbortSignals, plus unit tests. - Migrated many core runtime/services/tools to use the helper and to
abort()child controllers infinallyblocks to trigger reverse-cleanup. - Added CLI
warningHandlerto suppress remaining AbortSignal-related MaxListenersExceededWarning in non-debug mode, and added verification artifacts/docs.
Reviewed changes
Copilot reviewed 28 out of 30 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/utils/fetch.ts | Uses createAbortController() for timeout-based fetch abort. |
| packages/core/src/utils/abortController.ts | New helper implementing capped controllers, parent→child propagation, reverse-cleanup, and signal combining. |
| packages/core/src/utils/abortController.test.ts | Unit tests covering helper behavior (including listener-count invariants and GC best-effort). |
| packages/core/src/tools/shell.ts | Migrates internal controllers to helper to avoid listener warnings/leaks. |
| packages/core/src/tools/monitor.ts | Migrates monitor entry controller creation to helper. |
| packages/core/src/tools/agent/agent.ts | Uses helper for background/foreground agent abort controllers; aborts child in finally for reverse-cleanup. |
| packages/core/src/services/chatRecordingService.ts | Migrates auto-title controller to helper. |
| packages/core/src/services/chatCompressionService.ts | Migrates fallback abort signal creation to helper. |
| packages/core/src/memory/manager.ts | Migrates dream abort controller creation to helper. |
| packages/core/src/hooks/hookRunner.ts | Adds { once: true } to abort listeners to avoid accumulation. |
| packages/core/src/hooks/functionHookRunner.ts | Adds { once: true } to abort listeners to avoid accumulation. |
| packages/core/src/hooks/combinedAbortSignal.ts | Deprecation shim delegating to combineAbortSignals. |
| packages/core/src/followup/speculation.ts | Replaces manual parent abort wiring with createChildAbortController + abort in finally. |
| packages/core/src/core/openaiContentGenerator/pipeline.ts | Removes per-request raiseAbortListenerCap workaround. |
| packages/core/src/core/client.ts | Migrates recall abort controller to helper. |
| packages/core/src/confirmation-bus/message-bus.ts | Adds { once: true } to abort listener. |
| packages/core/src/agents/runtime/agent-interactive.ts | Uses helper for master/round controllers and aborts round in finally for reverse-cleanup. |
| packages/core/src/agents/runtime/agent-headless.ts | Uses createChildAbortController(externalSignal) and aborts in outer finally to guarantee cleanup. |
| packages/core/src/agents/runtime/agent-core.ts | Uses per-round child controllers and aborts them in finally to prevent parent listener buildup. |
| packages/core/src/agents/background-agent-resume.ts | Migrates background-task controllers to helper. |
| packages/core/src/agents/arena/ArenaManager.ts | Migrates arena master controller and links agent controllers to master via createChildAbortController. |
| packages/cli/src/utils/warningHandler.ts | Adds process-level warning handler to suppress AbortSignal MaxListenersExceededWarning in non-debug. |
| packages/cli/src/utils/warningHandler.test.ts | Tests for warning suppression behavior and debug passthrough. |
| packages/cli/src/gemini.tsx | Installs the warning handler during CLI startup. |
| docs/verification/abort-controller-refactor/test-summary.txt | Captured test run summary artifact. |
| docs/verification/abort-controller-refactor/smoke-boot.log | Captured CLI boot smoke output artifact. |
| docs/verification/abort-controller-refactor/README.md | Manual verification plan and scenario matrix. |
| docs/verification/abort-controller-refactor/migration-completeness.txt | Captured migration completeness artifact (grep output). |
| docs/verification/abort-controller-refactor/listener-accumulation-repro.mjs | Standalone reproducer/proof script for listener accumulation vs new helper. |
| docs/verification/abort-controller-refactor/automated-results.md | Captured automated verification outputs and notes. |
Comments suppressed due to low confidence (2)
packages/core/src/utils/abortController.ts:142
- Auto-cleanup isn’t guaranteed if an input aborts during setup: the per-source handler aborts the controller, but the controller’s own
signal.addEventListener('abort', cleanup, ...)is only registered later. If a source abort triggers before that registration,cleanupwon’t run automatically and listeners may linger on other long-lived input signals unless the caller explicitly callscleanup(). One fix is to make cleanup registration resilient (e.g., ifdoneis already true, immediately run new cleanup fns), or to register the controller abort→cleanup hook before adding input listeners and ensure late-added cleanup fns still execute when already aborted.
for (const sourceSignal of signals) {
if (!sourceSignal) continue;
const handler = () => controller.abort(sourceSignal.reason);
sourceSignal.addEventListener('abort', handler, { once: true });
cleanups.push(() => sourceSignal.removeEventListener('abort', handler));
docs/verification/abort-controller-refactor/README.md:26
- The tmux example command hardcodes an absolute path into the repo worktree. Consider switching to
$WT/...(or a generic placeholder path) so the instructions work for any developer without editing and don’t leak local path details.
tmux new-session -d -s qwen-verify-XX
tmux pipe-pane -t qwen-verify-XX -o "cat >> $LOGDIR/XX-name.log"
tmux send-keys -t qwen-verify-XX 'cd /path/to/your/test/workspace && exec node /Users/jinye.djy/Projects/qwen-code/.claude/worktrees/joyful-honking-melody/packages/cli/dist/index.js' C-m
tmux attach -t qwen-verify-XX
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Four issues from the Copilot review: 1. combineAbortSignals — add a per-iteration `aborted` check inside the for-loop so we short-circuit if an input signal flips aborted between the initial scan and listener registration. In single-threaded JS this can't actually interleave, but the defensive check makes correctness obvious and protects against signals whose `aborted` getter has side effects. New test exercises the path via a Proxy that flips after the initial scan. 2. warningHandler docstring — was stale: said "AbortSignal / EventTarget" while the regex was tightened to AbortSignal-only in the previous review. 3. README.md — replace personal absolute path with `$WT` placeholder so the verification recipe is shareable. 4. README.md — replace the markdown table with per-scenario headed sections. Prettier had interpreted an inline `ps -ef | grep sleep` pipe character as a column separator, breaking the table rendering on GitHub. Per-section format is also easier to scan and edit.
… loop check The previous version set the Proxy's `aborted` to true before calling combineAbortSignals, so the initial `find` scan caught it and we took the fast path — not the per-iteration check the test was meant to validate. Switch to an access counter so `aborted` is false on the first read (during `find`) and true on subsequent reads (inside the loop). This forces the loop to enter, then catches the flip via the defensive per-iteration check before any listener is attached to the next input. Verified the test fails if the per-iteration check is removed.
…ng-melody
Two conflict resolutions, both folding the helper into main's new code:
- agent-core.ts: kept main's `randomUUID` import next to our
`createChildAbortController` import.
- client.ts: collapsed main's manual addEventListener + {once:true} +
removeEventListener-in-finally pattern (added in the recall-decoupling
PR) into a single `createChildAbortController(signal)` call. The
finally now triggers reverse cleanup via `controller.abort()` on
success, achieving the same listener-hygiene goal main wanted. Kept
main's `cancelPendingMemoryPrefetch()` short-circuit at the top.
…ortController repro passes lint CI Lint flagged 11 no-undef errors in docs/verification/abort-controller-refactor/listener-accumulation-repro.mjs (AbortController, console, process) because the project's flat config only declared Node globals for ./scripts/**/*.mjs. The reviewer's suggestion (`/* eslint-env node */`) doesn't work under ESLint 9 flat config — env directives are deprecated there. The proper fix is to extend the existing script-globals block to also cover the verification repro script under docs/.
Two real bugs the reviewer caught and I confirmed locally:
1. warningHandler.ts didn't actually suppress anything. Adding a
`process.on('warning')` listener does NOT prevent Node's default
onWarning printer from writing to stderr — the default is just an
ordinary listener registered in `lib/internal/process/warning.js`.
My previous code therefore:
- failed to suppress targeted AbortSignal warnings (they still hit
stderr via the default printer)
- produced a SECOND copy of every non-suppressed warning (default
printer + my handler's own stderr.write)
The unit tests missed it because they synthesised a fake warning and
called `process.listeners('warning')` directly rather than going
through `process.emitWarning`.
Fix: snapshot the existing `'warning'` listeners (which include the
default printer and any third-party telemetry hooks) BEFORE replacing
them. Install ours as the sole listener. For non-suppressed warnings
fan out to the captured set so the default printer + telemetry still
fire; for suppressed warnings stop here. Tests now use
`process.emit('warning', ...)` to drive the real listener chain, plus
a spawned-child integration test that asserts the real stderr from
`process.emitWarning` is empty for AbortSignal warnings and still
contains DeprecationWarning text.
2. abortController.createChildAbortController kept a WeakRef to the
child controller. A natural usage pattern — pass `child.signal` into
an async API and drop the controller object — could let the
controller be GC'd while the signal is still in use, after which
`parent.abort()` would no longer propagate. Reproduced with
`node --expose-gc`.
Fix: hold the child strongly via the parent's listener closure. The
reverse-cleanup listener still removes the closure when child aborts
(closure releases child → GC-eligible), and the parent's `{once:true}`
listener self-removes when parent fires (same effect). Net listener
accounting on long-lived parents is unchanged; the only difference is
the controller now stays alive long enough for propagation to reach
downstream consumers that hold only the signal. Tests updated: drop
the old `--expose-gc`-dependent assertion that abandoned children
GC immediately (that was a property of the OLD contract); add a
signal-only-retention test that verifies propagation under the new
contract without needing GC at all.
Verified: 32 helper/warning tests pass (incl. spawned-child stderr
integration); 363 affected caller tests pass; typecheck + prettier +
eslint clean for the touched files.
wenshao
left a comment
There was a problem hiding this comment.
Additional notes (files not in PR diff):
packages/core/src/hooks/promptHookRunner.ts:233— still usesnew AbortController()+ manualaddEventListener('abort', ...)without{once:true}or reverse cleanup. This is the same anti-pattern the PR eliminates at 24 other sites. Migration completeness grep (new AbortController) misses it because the file was untouched.packages/core/src/agents/runtime/agent-headless.test.ts:1121— the only test covering the "model call throws" error path isit.skip(...). The PR'stry/finallycleanup guarantee has no active automated test.
CI note: Test (ubuntu-latest) and Test (windows-latest) are failing; macOS and local runs pass. Likely pre-existing or flaky — not correlated with PR changes.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
…s orphan listeners + runtime DEBUG toggle Two real bugs the reviewer caught: 1. combineAbortSignals registered its cleanup listener on controller.signal AFTER the for-loop. Node does NOT fire 'abort' listeners added to an already-aborted signal, so when the per-iteration defensive check aborted the controller mid-loop, the cleanup never ran — orphaning every input-signal listener registered before the break, and leaving the (also-registered-after-the-break) setTimeout uncleared. Fix: skip timeout scheduling when controller.signal.aborted is already true post-loop, and when it's true call cleanup() synchronously instead of registering a doomed listener. Existing test for the mid-iteration path now also asserts that the pre-break input signal (a) has zero abort listeners — that's the assertion that catches the orphan bug. New test for the already-aborted-input + timeoutMs combination confirms the timer isn't scheduled (would otherwise overwrite the abort reason). 2. warningHandler captured isDebugMode() in a closure at init time, so toggling DEBUG / QWEN_DEBUG at runtime (e.g. via a /debug slash command) didn't update suppression behavior. Moved the check inside the handler — warnings are rare so the per-emit env-lookup cost is negligible. New test asserts a mid-stream DEBUG=1 flip starts forwarding suppressed warnings to the prior-listener chain.
…to actually exercise the new !aborted check
Reviewer correctly pointed out that the previous version of this test
took the pre-loop fast path (since `a.abort('pre')` ran before
`combineAbortSignals`), so it never reached the in-loop guard at
abortController.ts:138.
Switched to the Proxy `aborted`-getter pattern from the sibling
mid-iteration test (so the loop genuinely re-checks `aborted` and
short-circuits inside the for-loop), and added a `setTimeout` spy that
asserts the timer was never scheduled — this is the only observable
difference from "scheduled then immediately cleared by synchronous
cleanup()", which is what the timer-advance assertion alone couldn't
distinguish.
Verified by mutation testing: removing the guard makes the new test
fail; restoring it makes it pass. Refs PR QwenLM#4366.
… in combineAbortSignals Reviewer noted the timeout path only had an empty-input test, leaving the leak-sensitive case uncovered: when timeoutMs fires with a long-lived source signal in the input list, do the input-side listeners get released? They do (the timeout callback aborts the combined controller, which fires the auto-cleanup listener registered on its signal, which calls the per-input removeEventListener), but that path wasn't tested. Adds a test that snapshots the source listener count before, asserts it increased by 1 after combineAbortSignals returns, advances fake timers past timeoutMs, and asserts the count returns to baseline. Refs PR QwenLM#4366.
yiliang114
left a comment
There was a problem hiding this comment.
LGTM. Core mechanism is solid — the createChildAbortController reverse-cleanup pattern correctly prevents listener accumulation on long-lived parents, and all critical bugs surfaced during review (WeakRef GC issue, warningHandler not actually suppressing, combineAbortSignals orphan listeners) have been addressed.
The one minor gap (promptHookRunner.ts:233 still using raw new AbortController) is not a real leak — the signal there is per-round and short-lived — but could be cleaned up in a follow-up for migration consistency.
…ndows
CI failure on windows-latest:
AssertionError: expected '\r\nnode:internal/modules/run_main:12…'
to match /DeprecationWarning.*Plain deprecation/
Error [ERR_UNSUPPORTED_ESM_URL_SCHEME]: Only URLs with a scheme in:
file, data, and node are supported by the default ESM loader. On
Windows, absolute paths must be valid file:// URLs. Received protocol 'd:'
The e2e test wrote a child script with an `import "<helperPath>"` where
helperPath was a raw Windows absolute path (`D:\a\qwen-code\...`). Node's
ESM loader parses that as a URL on Windows and rejects the `D:` "scheme".
Converted the helper path to a `file://` URL via `pathToFileURL`. macOS
test still passes; the Windows-specific schemes-must-be-URL behavior is
now honored. Refs PR QwenLM#4366.
wenshao
left a comment
There was a problem hiding this comment.
Both Round 4 suggestions addressed: timeout-triggered listener cleanup test added for combineAbortSignals, and Windows E2E test fixed via pathToFileURL. All CI green (including Windows). LGTM! ✅ — qwen-latest-series-invite-beta-v34 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
Additional findings (not inline):
-
[Critical]
packages/core/src/hooks/promptHookRunner.ts:233— Missed migration site. Usesnew AbortController()with anonymoussignal.addEventListener('abort', () => { ... })— no{once: true}, noremoveEventListener, no reverse cleanup. Each hook invocation leaks one listener on the parent signal. This is the exact pattern the PR eliminates everywhere else. SimilarlygoalHook.ts:70andgoalHook.ts:169remain un-migrated. -
[Suggestion]
ArenaManager.ts:821— Per-agent child controllers are aborted in cancel/timeout paths but NOT on normal agent exit (handleAgentExit). Reverse-cleanup is deferred until session teardown. (Skipped inline due to existing comment at this line.) -
[Suggestion]
eslint.config.js— Nono-restricted-syntaxrule preventsnew AbortController()in new code. The migration convention will erode without a lint guard.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
…grate missed sites, tighten tests
Adopted 6 of the 7 review threads (skipping the debug-logging suggestion).
1. processFunctionCalls onAbort leak (CRITICAL): the new
`finally { roundAbortController.abort(); }` in _runReasoningLoopInner
would fire the `onAbort` handler in `processFunctionCalls` if
scheduler.schedule or batchDone threw (the explicit
removeEventListener at the old happy-path exit would be skipped),
emitting spurious "Tool call cancelled by user abort." TOOL_RESULT
events for every un-emitted callId — corrupting the transcript and
misleading the model on the next round. Fixed by wrapping schedule
+ batchDone in their own try/finally so removeEventListener always
runs before the outer finally's abort.
2. Migrate 3 new-from-main `new AbortController()` sites that this
PR's audit missed (they came in via the merge from main):
- goals/goalHook.ts (2 sites: judgeController, fallback signal) —
consistency
- hooks/promptHookRunner.ts (1 site: internalAbortController) —
real leak (manual addEventListener without {once:true} or
cleanup, exactly the pattern this PR exists to fix). Switched to
createChildAbortController + finally `internalAbortController.abort()`
for reverse cleanup on the success path.
3. Repro script (`listener-accumulation-repro.mjs`): inlined helper
diverged from production — used WeakRef on child, while production
was changed to strong-ref earlier in this PR. Updated the inlined
copy to match production exactly, with a comment noting the
intentional WeakRef-on-parent-only pattern.
4. warningHandler.ts: documented the snapshot-and-replace trade-offs
in the JSDoc (late-added listeners bypass our filter; late
`removeListener` calls have no effect on our fan-out). Tried the
re-snapshot-per-warning approach the reviewer suggested but it
doesn't work — `removeAllListeners('warning')` permanently removes
the snapshot from Node's tracking, so a `process.listeners('warning')`
filter at fan-out time always returns empty for prior listeners.
The current design is the right trade-off; documentation is the
correct fix.
5. abortController.test.ts: added three coverage gaps the reviewer
identified —
- createChildAbortController forwards custom maxListeners
- manual cleanup() before scheduled timeout fires cancels it
- timeoutMs <= 0 is treated as "no timeout"
6. Migrated `httpHookRunner.ts:202` (the lone caller of the deprecated
`createCombinedAbortSignal`) to `combineAbortSignals` directly,
then deleted `combinedAbortSignal.ts` + its test. All semantics
covered by `combineAbortSignals` tests in abortController.test.ts.
Refreshed `migration-completeness.txt` (now empty — clean grep).
Tests: 194 pass across abortController/warningHandler/agent-runtime/
followup/hooks/goal/promptHook suites. Typecheck + prettier clean.
…y the PR body The PR body's "End-to-end scenarios I drove locally" section points at docs/verification/abort-controller-refactor/scripts/02-lite.sh and 06-headless-sigint.sh. These are the actual reproducible commands behind the EXIT codes / warning counts reported there — checking them in so anyone can replay without copy-pasting from the PR description. Refs PR QwenLM#4366.
Two doc fixes the reviewer flagged: - migration-completeness.txt was a 0-byte file with a confusing cross-reference. Populated with the actual grep command + its "(no output)" result so the empty-output state is explicit. - automated-results.md still referenced combinedAbortSignal.test.ts (8 tests, @deprecated shim) — both files were deleted in 94e8c58 when httpHookRunner.ts migrated to combineAbortSignals directly. Replaced the line with a reference to httpHookRunner.test.ts. Also updated the test counts to reflect current state (26 abortController, 13 warningHandler — both grew with the review cycle) and removed the stale combinedAbortSignal.ts entry from the prettier-check command. Refs PR QwenLM#4366.
wenshao
left a comment
There was a problem hiding this comment.
Reviewed after the substantial multi-round refactor history. The core mechanism (reverse-cleanup, strong-ref-on-parent, {once:true} migration) is sound and all Critical concerns from prior rounds were addressed. Three remaining suggestions are inline; one minor doc nit follows.
Minor — docs/verification/abort-controller-refactor/migration-completeness.txt: committed as 0 bytes, while automated-results.md item 2 says "See migration-completeness.txt for the captured grep." Either capture the actual grep output or delete the empty file and drop the reference.
Needs human review (low confidence — not posted inline):
packages/core/src/hooks/promptHookRunner.ts:317-319— newinternalAbortController.abort()in finally fixes a real leak (the success-path migration the author already flagged elsewhere on this PR), butpromptHookRunner.test.tshas no listener-count assertion to lock the fix in.packages/core/src/hooks/httpHookRunner.ts:220,255— manualcleanup()on both success and catch branches works correctly today (cleanup is idempotent and auto-fires on abort), but it's exactly the manual-cleanup pattern the rest of this PR replaced withtry { ... } finally { cleanup(); }.packages/core/src/agents/runtime/agent-headless.test.ts:1121has a pre-existingit.skip(...)for the model-throw case — the new outertry/finallyinagent-headless.tsfires only on that path, so the new throw-cleanup behavior remains uncovered by CI.
— claude-opus-4-7 via Claude Code /qreview
Adopting 2 of 3 new review threads (the third — automated-results.md drift — was already fixed in 5aa7110). 1. packages/core/src/agents/arena/ArenaManager.test.ts: pin the master→agent abort cascade introduced by switching per-agent controllers to `createChildAbortController(this.masterAbortController)`. New test spawns ≥2 agents, calls `manager.cancel()`, and asserts every `agentState.abortController.signal.aborted === true`. Existing cancel test only checked backend + status; if a future refactor re-introduced independent controllers, the cascade would silently regress. 2. packages/core/src/followup/speculation.test.ts: cover the `startSpeculation` abort wiring introduced when the manual addEventListener + .finally removeEventListener pattern got replaced by createChildAbortController + .finally abort(). Three tests: - parent abort propagates to state.abortController (lifetime contract) - parent-already-aborted fast path returns aborted state - parent-signal listener count returns to baseline after the fire-and- forget loop settles (reverse-cleanup proof) Mocked `runWithForkedChatModel` and `OverlayFs` so the background loop is a no-op — these tests only assert the synchronous wiring, not the loop's content.
wenshao
left a comment
There was a problem hiding this comment.
No review findings at this commit. Downgraded from Approve to Comment: CI failing (Test on windows-latest, ubuntu-latest, macos-latest Node 22.x). Code review passes — core mechanism (createChildAbortController reverse-cleanup) is correct, all 24+ migration sites have proper try/finally cleanup, 26 unit tests + 13 warning handler tests all pass locally. — qwen-latest-series-invite-beta-v34 via Qwen Code /review
Two real CI blockers in the just-added speculation tests (TS2554 and
TS2339) plus stale doc counts the reviewer flagged.
1. saveCacheSafeParams takes 3 positional args (generationConfig,
history, model), not a single object. Compile error on every
platform. Fixed by switching to the correct shape; also moved
getEventListeners to a static `import` at the top of the file
(dynamic `await import('node:events')` exposes EventEmitter's
static method via the namespace type rather than as a direct
property, so destructuring fails type-check).
2. docs/verification/abort-controller-refactor/README.md still claimed
"18 + 1 GC" tests for abortController and "9" for warningHandler;
actual current counts are 26 and 13. Also dropped the stale
combinedAbortSignal reference and added a note about the new
ArenaManager cascade + startSpeculation wiring pin tests.
Refreshed smoke-boot.log against current built bin (still 0.15.11,
which is what package.json reports on this branch).
Refs PR QwenLM#4366.
wenshao
left a comment
There was a problem hiding this comment.
No review findings at e31432f. Downgraded from Approve to Comment: CI still running (15 checks pending). Both Round 9 Critical tsc items (saveCacheSafeParams arity + getEventListeners import) are correctly fixed; speculation.test.ts passes 10/10. — qwen-latest-series-invite-beta-v34 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
PR #4366 — Maintainer merge-validation reportDate: 2026-05-22 VerdictPASS — safe to merge. Every PR-author claim that could be reproduced locally was reproduced. What was validated and how1. Direct listener-accumulation reproducer — PASSThe OLD pattern reaches 2000 listeners on one parent in 2000 rounds (well past the 2. Focused unit suites — PASS (84 passed | 1 skipped)The two newest pinning tests added in 3. Build + boot smoke — PASSNo warning lines emitted during boot under 4. TypeScript strict-mode typecheck — PASS5. Headless scripted scenarios — PASSBoth scripts shipped under Log file: 6. Real interactive long-session via tmux — PASS ← the bug's actual reproduction surfaceDriven through tmux session Round breakdown:
This is the headline result. The original bug was reported as ESC behavior: the streaming essay (Grace Hopper / FORTRAN / Backus, partially 7. Migration completeness — PASSEvery production Scenarios not exercised hereThese were called out in the PR body and remain not reproduced live; the reasoning
Risks / things to keep an eye on post-merge
Evidence artifacts (all in
|
| File | Purpose |
|---|---|
listener-accumulation-repro.mjs |
Side-by-side OLD vs NEW listener count proof |
logs/02-lite-short-prompt.log |
Headless single-prompt + --trace-warnings output |
logs/06-headless-sigint.log |
Headless SIGINT-mid-stream output |
logs/interactive-long-session.log |
Headline: 50 rounds + ESC cancel, full tmux capture |
automated-results.md |
Pre-existing automated-test summary (still accurate) |
MERGE_VALIDATION.md |
This report |
|
Thanks for the fix — the core approach ( However, I think the scope here is significantly larger than it needs to be. A few observations: 1. Not all The actual leak only happens in the nested parent→child chain ( For reference, upstream Claude Code has the same 2. The blast radius makes review difficult 39 files changed, +1851/−621 for what is fundamentally a listener cleanup fix. This makes it hard to review confidently and increases the risk of unrelated regressions. Suggestion: Consider scoping this down to:
The remaining 20+ sites that are independent controllers don't benefit from the migration and just add churn. A smaller, focused PR would be much easier to review and safer to merge. |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅
This is a well-engineered fix for the AbortSignal listener leak. The three core primitives (createAbortController, createChildAbortController, combineAbortSignals) are correctly designed with sound lifetime invariants (reverse cleanup, {once: true} self-removal, WeakRef parent pinning). The migration across ~20 call sites in packages/core is consistent, and the warningHandler correctly suppresses MaxListenersExceededWarning for AbortSignal while preserving other warnings.
Deterministic checks: tsc clean (0 errors), eslint clean (0 errors), all tests pass (49 passed, 1 skipped).
CI: 18/18 checks passing.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. All findings are Suggestion-level only (warningHandler idempotency edge case, pipeline raiseAbortListenerCap enforcement gap, shell.ts AbortSignal.any() cleanup, test coverage gaps in throw paths). tsc/eslint clean, build passes, test failures pre-existing (unrelated). LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
…vert independent-controller migrations Adopting @yiliang114's review feedback (QwenLM#4366 review comment, 2026-05-22): keep only the migrations that fix the real leak path (the agent-runtime parent→child chain that accumulates listeners on a long-lived parent signal in long sessions) and revert the consistency-only migrations on independent short-lived controllers. Issue QwenLM#4423 confirms the user-visible bug is the nested-chain accumulation — the reverted sites do not contribute to that bug. Migrations KEPT: - agents/runtime/agent-interactive.ts (master + per-message round) - agents/runtime/agent-core.ts (per-iteration + wait + processFunctionCalls) - agents/runtime/agent-headless.ts (external → execution) - hooks/promptHookRunner.ts (real cleanup leak: addEventListener without {once:true}, never removed) - hooks/httpHookRunner.ts → combineAbortSignals direct (shim deleted) - hookRunner.ts / functionHookRunner.ts / message-bus.ts: {once:true} only - openaiContentGenerator/pipeline.ts band-aid removal (per-request signals are children of the per-round controller, which carries maxListeners=50) - warningHandler.ts belt-and-suspenders Migrations REVERTED (independent short-lived controllers; restored to `new AbortController()` + their original cleanup patterns): - agents/arena/ArenaManager.ts (master + per-agent) - agents/background-agent-resume.ts (3 sites) - core/client.ts (recall — restored manual addEventListener + finally removeEventListener pattern from main) - followup/speculation.ts (restored parentAbortHandler + finally removeEventListener) - goals/goalHook.ts (judgeController + fallback signal) - memory/manager.ts (dream controller) - services/chatCompressionService.ts (fallback signal) - services/chatRecordingService.ts (autoTitle controller) - tools/agent/agent.ts (fg + bg subagent controllers — restored manual onParentAbort + finally removeEventListener) - tools/monitor.ts (entryAc) - tools/shell.ts (promote + 3 entryAc) - utils/fetch.ts (fetchWithTimeout) Tests removed alongside the reverts: - ArenaManager.test.ts "cancels cascades..." — the cascade itself was an intentional behavioral improvement that's now reverted, so the pin-test belongs with it - speculation.test.ts "startSpeculation — abort-controller wiring" block (3 tests) — they tested helper-wired behavior we reverted Verification docs updated to reflect the narrower scope. Net change: 19 raw `new AbortController()` remain (intentional, per migration-completeness.txt rationale); previously was 0. Refs PR QwenLM#4366, issue QwenLM#4423.
|
Thanks for the careful review — adopted in 9323bc3. Reverted the 13 independent-controller migrations and kept only the parent→child sites that actually fix the user-visible bug (the agent-runtime nested chain plus Scope before: 39 files / +1851 −621, helper used in ~26 sites Mapping to your suggested scope:
Plus two things I'm keeping because they're real fixes, not consistency:
Reverted sites that the issue #4423 leak does NOT come from ( Also dropped two pin-tests that specifically tested the reverted behavior ( Verification doc + PR body updated to reflect the narrower scope. 478 affected tests still pass; typecheck + prettier clean. |
本地验证报告(maintainer review)把 PR head 直接 结论✅ 建议 MERGE。PR 自带的全部 unit + reproducer + headless E2E 在本机全部复现通过;no warnings;no PR-introduced regression。 验证矩阵
§1. core 18 failures 归因(均非 PR 引入)直接证据 —— 同 3 个失败文件在两侧独立运行结果完全相同:
PR 修复机制 — 关键技术点 review
风险与遗留
验证环境:macOS Darwin 25.4.0,Node v22.17.0,npm 11.8.0。tmux 多 worktree 并行( |
LaZzyMan
left a comment
There was a problem hiding this comment.
Review
Solid fix for the MaxListenersExceededWarning users hit in long sessions. The new helper bounds listener accumulation on long-lived parent signals via three mechanisms that together cover every termination path: {once:true} on the parent listener for the parent-aborts-first path, a reverse-cleanup listener on the child for the child-aborts-first path, and a WeakRef for the parent reference so a parent that gets GC'd before its child doesn't leave the child holding a strong reference. Already-aborted parents take a synchronous fast-path that skips listener registration entirely.
combineAbortSignals cleanup semantics also hold up: every registered input listener has a paired entry in the cleanup array, external abort and timeout both route through the same auto-cleanup, and the post-loop synchronous-cleanup fallback handles the mid-loop-break case where Node won't fire listeners on an already-aborted signal — exactly the orphan-listener bug addressed in earlier review rounds. The agent runtime refactors (interactive, core, headless) all use try/finally to guarantee the child controller is aborted on every exit path, so the reverse-cleanup fires regardless of normal completion, break, return, or exception. The processFunctionCalls onAbort removal in its finally is essential — without it, a throw between scheduling and completion would leak the listener and the outer round-controller abort would emit spurious cancellation results for un-emitted callIds. The promptHookRunner cleanup leak (manual addEventListener without {once:true} and no removal) is a real bug the migration closes.
Reproducer (OLD: 2000 listeners → NEW: 0), 25 helper unit tests, and 13 warning-handler tests all pass. Codex's independent review found no actionable regressions either.
Verdict
APPROVE — helper correctness and the migration's invariant (every created child controller is either aborted or its parent listener detached, on every code path) both hold up under inspection.
Summary
Fixes
MaxListenersExceededWarning: 1509 abort listeners added to [AbortSignal]that users hit in long interactive sessions.The agent runtime nests parent→child AbortControllers (
masterAbortController→ per-message round → per-API-call round → tool execution). Each layer registeredaddEventListener('abort', ...)on its parent without{once:true}or reverse cleanup, so listeners piled up on long-lived parents across hundreds of model turns. After ~30–40 rounds it tripped Node's leak warning.Two-layer fix:
Structural — new
packages/core/src/utils/abortController.tshelper:createAbortController(maxListeners = 50)— factory that pre-caps the signal.createChildAbortController(parent)— WeakRef-based propagation with{once:true}on the parent listener plus a reverse-cleanup listener on the child that detaches the parent listener when the child aborts. This is the core mechanism — short-lived children stop accumulating dead listeners on long-lived parents.combineAbortSignals(signals, {timeoutMs})— N-way combiner.Belt-and-suspenders —
packages/cli/src/utils/warningHandler.tshides any remainingMaxListenersExceededWarning.*AbortSignalfrom end users; debug mode (DEBUG=*/QWEN_DEBUG=*/NODE_ENV=development) keeps it visible.Scope (deliberately narrow per @yiliang114's review): only the parent→child chain that actually accumulates listeners on a long-lived parent signal is migrated. Independent short-lived controllers (per-shell-command, per-fetch, per-recall, per-arena-session, per-monitor, per-spawn, etc.) stay on raw
new AbortController()— they're GC'd at end of use and do not accumulate.Migrated call sites:
agents/runtime/agent-interactive.ts(master + per-message round)agents/runtime/agent-core.ts(per-iteration round + waitForExternalInputs + processFunctionCalls try/finally)agents/runtime/agent-headless.ts(external → execution)hooks/promptHookRunner.ts(had a real cleanup leak: manualaddEventListenerwithout{once:true}and never removed)hooks/httpHookRunner.ts(migrated from the deprecatedcreateCombinedAbortSignalshim tocombineAbortSignalsdirectly; shim deleted)Plus three
{once:true}-only fixes:hooks/hookRunner.ts/hooks/functionHookRunner.ts/confirmation-bus/message-bus.tsPlus
openaiContentGenerator/pipeline.tsband-aid removal (per-request OpenAI signals are now children of the per-round controller, which carriesmaxListeners=50).Direct proof
Self-contained reproducer at
docs/verification/abort-controller-refactor/listener-accumulation-repro.mjs:Test plan
Automated (passing locally and on CI)
node docs/verification/abort-controller-refactor/listener-accumulation-repro.mjs— direct OLD-vs-NEW comparisonvitest run packages/core/src/utils/abortController.test.ts— 26 tests (factory cap, child propagation incl. signal-only retention, reverse cleanup, fast path, undefined parent, custom maxListeners,combineAbortSignalssemantics incl. cleanup-cancels-timeout, timeout-cleans-input-listeners,timeoutMs <= 0boundary, mid-iteration defensive check, GC safety best-effort)vitest run packages/cli/src/utils/warningHandler.test.ts— 13 tests including a spawned-child integration test that asserts real Node default-printer stderr is empty for AbortSignal warnings and still contains unrelated DeprecationWarning textvitest run packages/core/src/hooks/httpHookRunner.test.ts— covers the migratedcombineAbortSignalsconsumer (the deprecatedcreateCombinedAbortSignalshim and its test file were removed once the lone caller migrated)grep -rn "new AbortController" packages/core/src --include="*.ts" | grep -v test | grep -v abortController.tsreturns the 19 intentionally-unmigrated independent controllers (seedocs/verification/abort-controller-refactor/migration-completeness.txtfor the captured list + rationale)npm run build:packagessucceeds;node packages/cli/dist/index.js --versionboots clean underNODE_OPTIONS=--trace-warningsEnd-to-end scenarios I drove locally
Scripts live under
docs/verification/abort-controller-refactor/scripts/. Each captures its run todocs/verification/abort-controller-refactor/logs/.Boot under
--trace-warnings— proves no warnings on startup.Expected: version line, no
(node) MaxListenersExceededWarning. Observed: clean.02-lite — single real-Qwen prompt under
--trace-warnings— proves the steady-state happy path emits no warning.Input:
--prompt "Reply with exactly 'OK' and nothing else."Expected: exit 0, response
OK, zeroMaxListenersExceededWarningmentions in stderr.Observed: matches expected.
06 — headless
--prompt+ SIGINT — proves SIGINT propagates cleanly through the abort plumbing all the way down.Input: long-essay prompt +
kill -INT <pid>after 6 s.Expected: exit code 130 (128 + SIGINT), no warnings emitted during the interrupted stream.
Observed: matches expected.
Direct listener-accumulation repro — head-to-head proof of the structural fix.
Input: 2000 short-lived child controllers on one long-lived parent.
Expected: OLD ≥ 1500 listeners (reproduces the user-reported bug), NEW = 0.
Observed: matches expected.
Scenarios that still need a human at the keyboard
These exercise interactive-TUI code paths and were not self-verifiable here. The behavior each one targets is covered by unit tests (
agent-interactive.test.tscancellation suite for ESC mid-stream;shell.test.tsfor shell-tool abort;agent.test.tsfor subagent cancel; the helper's own 26 tests for listener accounting). Listed for completeness so a reviewer or release manager can replay them:main— confirm warning triggers at ~30–40 rounds (needs a long mixed-tool session against the real API)MaxListenersExceededWarningafter 50+ roundsAbortSignalinstance count + per-signal listener count stableReview notes
agent-core.ts/agent-headless.ts, overly-broad warning regex, andprocess.removeAllListeners('warning')stripping third-party listeners) — all addressed in the same commit. Second pass confirmed no remaining blockers.abortController.test.tsdirectly verify the listener-count invariant: 1000 short-lived children on one long-lived parent leave 0 listeners.🤖 Generated with Qwen Code