ISSUE-007 - Stop button for in-flight runs (cooperative cancellation) #25

@andrewmusselman

Description

Summary

Long-running agents sometimes need to be interrupted. The earlier "Cancel" button only aborted the client-side fetch; the server-side task kept running until completion. The only way to actually stop a run today is to restart the api container, which kills every other in-flight run too.

Details

Problem:

  • No surgical kill exists for a specific run.
  • Server-side asyncio.Task continues after browser disconnects.
  • task.cancel() is too aggressive: it raises CancelledError mid-await inside LLM HTTP calls, which can leave httpx connections in an unclean state.
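
To illustrate the last point, here is a minimal sketch of where task.cancel() lands. fake_llm_call is a hypothetical stand-in for an LLM HTTP request (the real call would go through httpx); the only thing being shown is that the CancelledError surfaces inside the pending await, not at a boundary the agent chooses.

```python
import asyncio

events = []

async def fake_llm_call():
    # Hypothetical stand-in for an HTTP request to an LLM provider.
    events.append("request sent")
    try:
        await asyncio.sleep(10)  # waiting on the response
        events.append("response received")
    except asyncio.CancelledError:
        # task.cancel() is delivered here, in the middle of the await.
        events.append("cancelled mid-await")
        raise

async def main():
    task = asyncio.create_task(fake_llm_call())
    await asyncio.sleep(0)  # let the task start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events

result = asyncio.run(main())
print(result)
```

The response is never observed by the coroutine, so any connection state tied to it has to be unwound from inside the exception handler.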

Proposed solution

Cooperative cancellation, layered:

  1. CancelToken via contextvar (same pattern as Trace). The agent runtime checks should_stop() between operations.
  2. Enforcement at structural boundaries. Wrap tools.call_llm, tools.data_store.*, and gofannon_client.call with an entry check — if stopping, raise AgentStopped immediately without executing. In-flight LLM calls finish naturally; only the next attempt to do anything observable raises.

UI:

  • Stop button next to Run; disabled when no run is in flight.
  • While stopping (after the click, before the run halts): the button is disabled and shows "Stopping… (after current LLM call completes)".
  • The run's outcome becomes a third status, stopped, shown as a neutral chip in the Progress Log (gray with a stop icon, not red).

Stop semantics for chained agents:

When agent X is stopping and X has called Y, Y stops too. Stop means the whole tree. Sharing the token through a contextvar makes this straightforward, because Y reads the same token as X.
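
A rough sketch of the layered design, under assumptions: only CancelToken, should_stop() and AgentStopped are names from this issue. The decorator stop_checked, the variable _current_token, and the toy call_llm / agent_x / agent_y functions are hypothetical and exist only to show the shape. The real wrappers would sit around tools.call_llm, tools.data_store.* and gofannon_client.call.

```python
import asyncio
import contextvars
import functools

class AgentStopped(Exception):
    """Raised at a tool boundary once a stop has been requested."""

class CancelToken:
    def __init__(self):
        self._stopped = False

    def request_stop(self):
        self._stopped = True

    def should_stop(self):
        return self._stopped

# Hypothetical name; mirrors the "token in a contextvar" idea from the issue.
_current_token = contextvars.ContextVar("cancel_token", default=None)

def should_stop():
    token = _current_token.get()
    return token is not None and token.should_stop()

def stop_checked(fn):
    """Entry check: refuse to start the operation if a stop is pending."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        if should_stop():
            raise AgentStopped(fn.__name__)
        return await fn(*args, **kwargs)
    return wrapper

calls = []

@stop_checked
async def call_llm(prompt):
    # Toy stand-in for tools.call_llm.
    calls.append(prompt)
    await asyncio.sleep(0.01)
    return f"reply to {prompt}"

async def agent_y():
    # Child agent: it reads the same contextvar, so it sees X's token.
    await call_llm("y-1")

async def agent_x(token):
    _current_token.set(token)
    await call_llm("x-1")
    token.request_stop()  # simulate the stop request arriving
    await agent_y()       # raises AgentStopped at Y's first boundary

async def main():
    try:
        await agent_x(CancelToken())
    except AgentStopped:
        return "stopped"
    return "completed"

status = asyncio.run(main())
print(status, calls)
```

Y never gets to issue its LLM call, and nothing is interrupted mid-await. If Y were started with asyncio.create_task instead of a plain await, the new task would copy the current context and still see the same token object.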

Acceptance Criteria

  • Fixed: CancelToken contextvar threaded through agent execution
  • Fixed: tools.call_llm, tools.data_store.*, gofannon_client.call check should_stop() on entry
  • Fixed: POST /runs/{run_id}/stop sets cancel token, responds 202
  • Fixed: UI Stop button next to Run, disabled appropriately
  • Fixed: "Stopping…" state shown after click until run actually halts
  • Fixed: RunRecord.status = "stopped" distinguishable from error
  • Test added: Stop during LLM call halts at next tool boundary (not mid-await)
  • Test added: Chained sub-agent stops when parent receives stop
  • Test added: Cleanup runs (e.g., http_client.aclose() in finally) execute before exit
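
The first and third test criteria could be exercised roughly as follows. This is a sketch, not the project's test suite: FakeClient, run_agent and scenario are hypothetical, and an asyncio.Event stands in for the CancelToken so the example stays self-contained.

```python
import asyncio

class AgentStopped(Exception):
    pass

class FakeClient:
    """Hypothetical stand-in for an httpx.AsyncClient."""
    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True

async def run_agent(stop_event, client, log):
    async def call_llm(prompt):
        # Entry check only; an in-flight call is never interrupted.
        if stop_event.is_set():
            raise AgentStopped(prompt)
        await asyncio.sleep(0.05)  # the "in-flight" LLM call
        log.append(f"finished {prompt}")

    try:
        await call_llm("first")
        await call_llm("second")
        return "completed"
    except AgentStopped:
        return "stopped"
    finally:
        await client.aclose()  # cleanup must run before exit

async def scenario():
    stop_event = asyncio.Event()
    client = FakeClient()
    log = []
    task = asyncio.create_task(run_agent(stop_event, client, log))
    await asyncio.sleep(0.01)  # the first call is now in flight
    stop_event.set()           # stop arrives mid-call
    status = await task
    return status, log, client.closed

status, log, closed = asyncio.run(scenario())
print(status, log, closed)
```

The first call completes even though the stop arrived while it was pending; the second call is refused at entry, and the client is closed on the way out.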

References

  • File: webapp/packages/api/user-service/services/agent_trace.py (Trace contextvar pattern)
  • File: webapp/packages/api/user-service/dependencies.py:_execute_agent_code
  • File: webapp/packages/webui/src/pages/AgentCreationFlow/RunsScreen.jsx
  • Tracker: FIXES.md item Q2 roadmap #6

Priority

Medium - Depends on ISSUE-003 (run registry) for the cancel token to live somewhere addressable by run_id.
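
For context on that dependency, a minimal sketch of what "addressable by run_id" could look like. RunRegistry and its methods are hypothetical (ISSUE-003 defines the real registry), and returning 404 for an unknown run is an assumption; only the 202 response comes from the acceptance criteria above.

```python
import threading

class CancelToken:
    def __init__(self):
        # threading.Event so the flag can be set from another thread safely.
        self._event = threading.Event()

    def request_stop(self):
        self._event.set()

    def should_stop(self):
        return self._event.is_set()

class RunRegistry:
    """Hypothetical in-memory registry: run_id -> CancelToken."""
    def __init__(self):
        self._tokens = {}

    def register(self, run_id):
        token = CancelToken()
        self._tokens[run_id] = token
        return token

    def unregister(self, run_id):
        self._tokens.pop(run_id, None)

    def request_stop(self, run_id):
        # Shape of a POST /runs/{run_id}/stop handler, minus the web framework.
        token = self._tokens.get(run_id)
        if token is None:
            return 404  # assumption: unknown run ids are rejected
        token.request_stop()
        return 202  # accepted; the run halts at its next tool boundary

registry = RunRegistry()
token_a = registry.register("run-a")
token_b = registry.register("run-b")
code = registry.request_stop("run-a")
print(code, token_a.should_stop(), token_b.should_stop())
```

Stopping run-a leaves run-b untouched, which is the surgical kill that restarting the api container cannot provide.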
