Feature Area
Performance optimization
Is your feature request related to an existing bug? Please link it here.
none
Describe the solution you'd like
Hi crewAI Team,
Background:
When using async streaming execution via Crew.akickoff (with stream=True), the returned object is a CrewStreamingOutput containing an async iterator over stream chunks. In web-serving scenarios (e.g., with FastAPI + StreamingResponse), if the HTTP client disconnects, Python's async generator protocol triggers cancellation of the surrounding coroutine/async generator.
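For context, the disconnect/cleanup behavior described above can be shown with a minimal, self-contained sketch (pure stdlib, no CrewAI code): breaking out of an `async for` does not finalize an async generator immediately, so an explicit `aclose()` is what guarantees prompt cleanup.

```python
import asyncio

async def chunk_stream():
    # Toy async generator standing in for a crew's stream of chunks.
    try:
        for i in range(100):
            await asyncio.sleep(0)
            yield f"chunk-{i}"
    finally:
        # An explicit aclose() throws GeneratorExit in here, so cleanup
        # runs promptly instead of waiting for garbage collection.
        pass

async def main():
    gen = chunk_stream()
    received = []
    async for chunk in gen:
        received.append(chunk)
        if len(received) == 3:
            break  # simulates the HTTP client disconnecting mid-stream
    # Breaking out of `async for` does NOT finalize the generator;
    # without an explicit aclose(), the finally block may run much later.
    await gen.aclose()
    return received

result = asyncio.run(main())
print(result)
```

This is exactly the hook a web framework (or application code) would call on client disconnect, and what is missing today for `CrewStreamingOutput`.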
Problem / Current Behavior:
Currently, there is:
- No aclose()/cancel() or equivalent method on CrewStreamingOutput or the underlying async iterator returned by akickoff.
- No hook or mechanism that allows application code (or a FastAPI middleware, etc.) to signal that downstream/unfinished LLM sub-tasks, streams, or resource consumers should be immediately canceled and cleaned up.
- As a result, if the HTTP client disconnects, agent/crew processing continues in the background until all tasks naturally complete, potentially wasting tokens and compute, and tying up resources.
Why this matters:
- In real-world API and web applications, client-initiated disconnects are common.
- Automatic and/or explicit cleanup (aclose, cancel, etc.) is critical for efficient resource management in LLM inference; without it, disconnects can lead to resource exhaustion, quota waste, or stuck threads.
- Most async streaming frameworks (aiostream, some OpenAI Python libraries, etc.) support aclose for this exact reason.
Request:
- Please support an async cancellation/cleanup protocol for streaming:
  - Implement aclose() (and/or cancel()) methods on CrewStreamingOutput and all objects returned by the streaming kickoff pipeline (create_async_chunk_generator, etc.).
  - Make sure calling aclose() (or a similar method) aggressively cancels/aborts ALL in-flight agent/LLM sub-tasks and releases any allocations, so that compute/GPU is promptly freed and no work is done after cancellation.
  - If cancellation is not possible (e.g., due to LLM provider constraints), at least ensure that resources are freed at the earliest possible opportunity.
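To make the request concrete, here is a purely illustrative sketch of the protocol being asked for: every name below (StreamingOutput, crew_worker, etc.) is hypothetical, not CrewAI's actual API. The idea is a thin wrapper whose aclose() first cancels the background execution task, then closes the chunk generator.

```python
import asyncio
from typing import AsyncGenerator

class StreamingOutput:
    """Hypothetical stand-in for CrewStreamingOutput (illustrative only)."""

    def __init__(self, chunks: AsyncGenerator, worker: asyncio.Task):
        self._chunks = chunks
        self._worker = worker  # background crew/agent execution

    def __aiter__(self):
        return self._chunks

    async def aclose(self) -> None:
        # 1. Cancel in-flight work first so no further tokens are spent.
        self._worker.cancel()
        try:
            await self._worker
        except asyncio.CancelledError:
            pass
        # 2. Then close the chunk generator so its cleanup runs now.
        await self._chunks.aclose()

async def main():
    queue: asyncio.Queue = asyncio.Queue()

    async def crew_worker():
        # Stands in for agent/LLM sub-tasks producing stream chunks.
        for i in range(1000):
            await queue.put(f"chunk-{i}")
            await asyncio.sleep(0)

    async def chunks():
        while True:
            yield await queue.get()

    worker = asyncio.create_task(crew_worker())
    streaming = StreamingOutput(chunks(), worker)

    out = []
    try:
        async for chunk in streaming:
            out.append(chunk)
            if len(out) == 2:
                break  # client disconnected mid-stream
    finally:
        await streaming.aclose()  # requested behavior: stop ALL work now

    assert worker.cancelled()  # no background work survives aclose()
    return out

result = asyncio.run(main())
print(result)
```

Cancelling the worker before closing the generator matters: it ensures no new chunks (and no new LLM calls) are produced while the stream is being torn down.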
Example Usage:
In frameworks like FastAPI:
streaming = await crew.akickoff(inputs=inputs)
try:
    async for chunk in streaming:
        ...
finally:
    # This should exist:
    await streaming.aclose()  # Should cancel agents, tasks, LLM calls, etc.
Thank you for your work. This feature would make crewAI much more robust and production-friendly for real-world streaming applications!
Describe alternatives you've considered
No response
Additional context
No response
Willingness to Contribute
Yes, I'd be happy to submit a pull request