fix(code): drain cloud queue for idle resumed runs stuck disconnected #2217
joshsny
left a comment
I think some of this logic might be getting unwieldy, and we might need to come back and simplify it afterwards, but it sounds like this will improve things at least for now. Great job finding the issue.
@joshsny agree completely, it's becoming a bit of a 🍝, we need some time to have a proper ACP cleanup of all this
Problem
When a cloud task is resumed from a snapshot and then goes idle, an SSE transport drop (or the watcher retry it triggers) flips the session to disconnected even though the run is still alive (in_progress) on the server. A user message sent in that state is queued because status !== "connected", and then nothing ever drains it, so the message stays queued forever.
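To make the stuck state concrete, here is a minimal sketch of the failure mode. It is a hypothetical model, not the actual session code: `SessionModel`, `sendMessage`, and every field other than the `status !== "connected"` check are names made up for illustration.

```typescript
// Hypothetical model of the stuck state; only the `status !== "connected"`
// check comes from the description above, everything else is illustrative.
type SessionStatus = "connected" | "disconnected";
type RunStatus = "in_progress" | "completed";

interface SessionModel {
  status: SessionStatus; // client-side view of the transport
  runStatus: RunStatus; // state of the run on the server
  queue: string[]; // messages waiting for a connected session
  sent: string[]; // messages actually delivered
}

// Queue the message unless the session is connected.
// Nothing in this model ever drains `queue` while the session stays disconnected.
function sendMessage(session: SessionModel, text: string): void {
  if (session.status !== "connected") {
    session.queue.push(text);
    return;
  }
  session.sent.push(text);
}

// The run is alive on the server, but the client believes it is disconnected.
const stuck: SessionModel = {
  status: "disconnected",
  runStatus: "in_progress",
  queue: [],
  sent: [],
};

sendMessage(stuck, "hello");
```

After the call, the message sits in `stuck.queue` and `stuck.sent` is still empty, even though `runStatus` says the run could accept it.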
Changes
agentReadyForRunId on the session, set when _posthog/run_started / _posthog/turn_complete is observed for the current run (survives hydrate-from-logs)
run_started for its run id, so recovery does not fire
tryRecoverIdleCloudQueue recovery path
handling #2159
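A rough sketch of the kind of guard these changes describe, under stated assumptions: the session shape, the exact conditions, and the helper name `drainIfAgentReady` are hypothetical and are not taken from the diff. Only `agentReadyForRunId`, the `in_progress` run state, and the disconnected status come from the description.

```typescript
// Hypothetical session shape; only `agentReadyForRunId`, "in_progress" and
// "disconnected" come from the PR description.
interface CloudSession {
  status: "connected" | "disconnected";
  runId: string;
  runStatus: "in_progress" | "completed" | "failed";
  agentReadyForRunId: string | null; // assumed to be set once the agent reported ready for a run
  queue: string[];
}

// Assumed guard: recover only when the client looks disconnected, the run is
// still alive on the server, and the agent was seen ready for this exact run.
function canRecoverIdleQueue(s: CloudSession): boolean {
  return (
    s.status === "disconnected" &&
    s.runStatus === "in_progress" &&
    s.agentReadyForRunId === s.runId &&
    s.queue.length > 0
  );
}

// Drain the queue through `send` if the guard passes; returns how many
// messages were handed over.
function drainIfAgentReady(s: CloudSession, send: (msg: string) => void): number {
  if (!canRecoverIdleQueue(s)) return 0;
  const pending = s.queue.splice(0); // empties the queue in place
  for (const msg of pending) send(msg);
  return pending.length;
}

// Usage: an idle resumed run whose agent was already seen ready.
const delivered: string[] = [];
const resumed: CloudSession = {
  status: "disconnected",
  runId: "run-1",
  runStatus: "in_progress",
  agentReadyForRunId: "run-1",
  queue: ["hello"],
};
const drained = drainIfAgentReady(resumed, (m) => delivered.push(m));

// A run that has not reported ready for its own run id is left alone.
const notReady: CloudSession = { ...resumed, agentReadyForRunId: null, queue: ["later"] };
const skipped = drainIfAgentReady(notReady, (m) => delivered.push(m));
```

Keying the readiness flag to a run id, rather than using a plain boolean, means a flag left over from an earlier run cannot trigger recovery for a new run that has not started yet.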