Support for Graceful Cancellation and Resource Cleanup via aclose()/cancel() on CrewStreamingOutput Streaming Objects #5312

@llaoj

Description

Feature Area

Performance optimization

Is your feature request related to an existing bug? Please link it here.

none

Describe the solution you'd like

Hi crewAI Team,

Background:
When using async streaming execution via Crew.akickoff (with stream=True), the returned object is a CrewStreamingOutput containing an async iterator over stream chunks. In web-serving scenarios (e.g., FastAPI with StreamingResponse), if the HTTP client disconnects, the server cancels the coroutine driving the response, and Python's async-generator protocol propagates that cancellation into the generator producing the stream.
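The mechanics can be sketched with a plain async generator standing in for the crew stream (illustrative only; chunk_stream is not crewAI's actual iterator): closing the generator early delivers GeneratorExit at the suspended yield, and any finally block becomes the natural cleanup hook.

```python
import asyncio

cleanup_ran = []

async def chunk_stream():
    # Stand-in for the async iterator returned by akickoff
    # (illustrative, not crewAI's actual implementation).
    try:
        for i in range(100):
            await asyncio.sleep(0)  # simulate waiting on the next LLM chunk
            yield f"chunk-{i}"
    finally:
        # Runs when the consumer closes the generator early (GeneratorExit);
        # this is where in-flight work could be cancelled.
        cleanup_ran.append(True)

async def consume():
    received = []
    stream = chunk_stream()
    async for chunk in stream:
        received.append(chunk)
        if len(received) == 3:
            break  # simulate the HTTP client disconnecting mid-stream
    await stream.aclose()  # without this, cleanup waits for garbage collection
    return received

print(asyncio.run(consume()))  # ['chunk-0', 'chunk-1', 'chunk-2']
print(cleanup_ran)  # [True]
```

Without the explicit aclose(), the finally block only runs whenever the garbage collector finalizes the generator, which is exactly the non-deterministic cleanup this issue is about.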

Problem / Current Behavior:
Currently, there is:

  • No aclose()/cancel() or equivalent method on CrewStreamingOutput or the underlying async iterator returned by akickoff,
  • No hook or mechanism that allows application code (or a FastAPI middleware, etc.) to signal that downstream/unfinished LLM sub-tasks, streams, or resource consumers should be immediately canceled/cleaned up,
  • As a result, if the HTTP client disconnects, agent/crew processing continues in the background until all tasks naturally complete, potentially wasting tokens, compute, and tying up resources.

Why this matters:

  • In real-world API and web applications, client-initiated disconnects are common.
  • Automatic and/or explicit cleanup (aclose(), cancel(), etc.) is critical for efficient resource management in LLM inference; without it, disconnects can lead to resource exhaustion, quota waste, or stuck threads.
  • Most async streaming libraries (aiostream, the OpenAI Python SDK, etc.) expose aclose()/close() for exactly this reason.

Request:

  • Please support an async cancellation/cleanup protocol for streaming:
    • Implement aclose() (and/or cancel()) methods on CrewStreamingOutput and all objects returned by the streaming kickoff pipeline (create_async_chunk_generator, etc.).
    • Make sure calling aclose() (or a similar method) aggressively cancels/aborts ALL in-flight agent/LLM sub-tasks, and releases any allocations, so that compute/GPU is promptly freed and no work is done after cancellation.
    • If cancellation is not possible (e.g. due to LLM provider constraints), at least ensure that resources will be freed at the earliest possible opportunity.
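As a rough sketch of what such a wrapper could do internally, the class below pairs the chunk generator with an asyncio.Task driving the in-flight work and cancels both on aclose(). All names here (ClosableStreamSketch, background_work, etc.) are hypothetical and do not reflect crewAI internals:

```python
import asyncio

class ClosableStreamSketch:
    # Hypothetical wrapper; names do not reflect crewAI internals.
    def __init__(self, chunk_gen, work_task):
        self._chunks = chunk_gen  # async generator of stream chunks
        self._work = work_task    # asyncio.Task driving agent/LLM calls

    def __aiter__(self):
        return self._chunks

    async def aclose(self):
        # Cancel in-flight work first, so no tokens are spent after close...
        self._work.cancel()
        try:
            await self._work
        except asyncio.CancelledError:
            pass
        # ...then close the chunk generator so its finally blocks run.
        await self._chunks.aclose()

events = []

async def background_work():
    try:
        await asyncio.sleep(3600)  # stand-in for long-running LLM calls
    except asyncio.CancelledError:
        events.append("work-cancelled")
        raise

async def chunks():
    try:
        while True:
            await asyncio.sleep(0)  # yield control, as a real stream would
            yield "chunk"
    finally:
        events.append("stream-closed")

async def main():
    streaming = ClosableStreamSketch(chunks(), asyncio.create_task(background_work()))
    async for _ in streaming:
        break  # consumer disconnects after the first chunk
    await streaming.aclose()
    return events

print(asyncio.run(main()))  # ['work-cancelled', 'stream-closed']
```

Cancelling the work task before closing the generator matters: it guarantees no further provider calls are issued between the two steps.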

Example Usage:
In frameworks like FastAPI:

streaming = await crew.akickoff(inputs=inputs)
try:
    async for chunk in streaming:
        ...
finally:
    # This should exist:
    await streaming.aclose()  # Should cancel agents, tasks, LLM calls, etc.

Thank you for your work. This feature would make crewAI much more robust and production-friendly for real-world streaming applications!

Describe alternatives you've considered

No response

Additional context

No response

Willingness to Contribute

Yes, I'd be happy to submit a pull request
