What happened?
Using FastAPI + litellm in streaming mode:
- Azure OpenAI backend
  - When the browser forcibly closes the SSE/stream connection before completion, the stream stops but no exception is raised.
  - The `finally` block is executed only several minutes later.
- Amazon Bedrock backend
  - On the same client disconnect, an `asyncio.CancelledError` is raised immediately, the `finally` block runs right away, and the stream ends as expected.
I would expect Azure OpenAI to behave the same way and raise `asyncio.CancelledError` immediately when the client disconnects.
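As a workaround sketch (assuming Starlette's `Request.is_disconnected()`; the `slow_chunks` generator is a hypothetical stand-in for the litellm stream), the endpoint can poll for disconnects between chunks instead of relying on the provider to notice:

```python
import asyncio
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

async def slow_chunks():
    # Hypothetical stand-in for `async for chunk in response` on the litellm stream.
    for i in range(100):
        await asyncio.sleep(0.5)
        yield f"data: chunk {i}\n\n"

@app.post("/test")
async def test(request: Request):
    async def guarded():
        async for chunk in slow_chunks():
            # Bail out as soon as the client is gone. This is only checked
            # between chunks, so it cannot interrupt a long await on the provider.
            if await request.is_disconnected():
                print("client disconnected, stopping stream")
                break
            yield chunk

    return StreamingResponse(guarded(), media_type="text/event-stream")
```

This does not make Azure raise `asyncio.CancelledError`; it only bounds how long a dead stream keeps running.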
Reproduction code
```python
import asyncio
from typing import AsyncGenerator

from fastapi import APIRouter, FastAPI
from fastapi.responses import StreamingResponse
from litellm import CustomStreamWrapper, acompletion
from litellm.types.utils import StreamingChoices

router = APIRouter()


@router.post("/test")
async def test():
    return StreamingResponse(generator(), media_type="text/event-stream")


async def generator():
    # Azure OpenAI:
    model = "azure/gpt-4o"
    api_base = "https://foo-openai.openai.azure.com"
    aws_region_name = None

    # Amazon Bedrock (swap in to compare):
    # model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
    # api_base = None
    # aws_region_name = "ap-northeast-1"

    try:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Please provide a long answer."},
        ]
        response = await acompletion(
            messages=messages,
            model=model,
            api_base=api_base,
            aws_region_name=aws_region_name,
            stream=True,
        )
        if isinstance(response, (CustomStreamWrapper, AsyncGenerator)):
            async for chunk in response:
                choice = chunk.choices[0]
                if not isinstance(choice, StreamingChoices):
                    raise RuntimeError(f"Unexpected type: {type(choice)}")
                content = choice.delta.content
                if content:
                    yield f"data: {content}\n\n"
    except asyncio.CancelledError as e:
        # Bedrock raises this immediately; Azure does not
        print("asyncio.CancelledError", e)
        raise
    except Exception as e:
        print("Exception:", e)
        raise
    finally:
        print("finally called")


app = FastAPI()
app.include_router(router)
```
To reproduce:
- Start the FastAPI server.
- Hit `/test` from a browser or curl, then abort the request (e.g., close the tab or press Ctrl-C); the httpx script below does the same programmatically.
- Observe the different behaviors for Bedrock vs. Azure OpenAI.
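For a scriptable repro, this hypothetical httpx client (assuming the server is running on `localhost:8000`) aborts the stream after the first chunk, like closing a browser tab:

```python
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", "http://localhost:8000/test") as resp:
            async for line in resp.aiter_lines():
                print(line)
                break  # leave the stream early; httpx closes the connection on exit

asyncio.run(main())
```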
Environment
Library | Version |
---|---|
litellm | 1.72.2 |
FastAPI | 0.115.12 |
Python | 3.13.2 |
Backends tested | Azure OpenAI, Amazon Bedrock |
Expected behavior
`asyncio.CancelledError` should be raised immediately on client disconnect for the Azure OpenAI backend, mirroring Bedrock's behavior, so that cleanup in `finally` runs promptly.
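For reference, a provider-free sketch (plain asyncio, no litellm or FastAPI) of the propagation I would expect: cancelling the consuming task throws `asyncio.CancelledError` into the suspended generator, and its `finally` runs immediately:

```python
import asyncio

async def stream():
    try:
        while True:
            await asyncio.sleep(0.1)  # stands in for waiting on the provider
            yield "chunk"
    except asyncio.CancelledError:
        print("asyncio.CancelledError")  # Bedrock triggers this; Azure does not
        raise
    finally:
        print("finally called")

async def consume():
    async for _ in stream():
        pass

async def main():
    task = asyncio.create_task(consume())
    await asyncio.sleep(0.35)
    task.cancel()  # simulates the server reacting to a client disconnect
    try:
        await task
    except asyncio.CancelledError:
        pass

asyncio.run(main())
```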
Relevant log output
Are you a ML Ops Team?
No
What LiteLLM version are you on?
v1.72.2
Twitter / LinkedIn details
No response