
[Bug]: Azure OpenAI backend does not raise asyncio.CancelledError on client disconnect (works with Bedrock) #11546

@nshmura

Description


What happened?

Using FastAPI + litellm in streaming mode:

  • Azure OpenAI backend

    • When the browser forcibly closes the SSE/stream connection before completion, the stream stops but no exception is raised.
    • The finally block is executed only several minutes later.
  • Amazon Bedrock backend

    • On the same client disconnect, an asyncio.CancelledError is raised immediately, the finally block runs right away, and the stream ends as expected.

I would expect Azure OpenAI to behave the same way and raise asyncio.CancelledError immediately when the client disconnects.
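
For comparison, a plain async generator with no LiteLLM call shows the behavior I expect: as far as I understand Starlette's streaming implementation, it cancels the response task as soon as the client disconnects, so asyncio.CancelledError surfaces at the next await point. A minimal sketch (endpoint name and timing are illustrative):

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/ticks")
async def ticks():
    async def gen():
        try:
            while True:
                yield "data: tick\n\n"
                # Cancellation is delivered at this await point when the
                # client disconnects, so except/finally run promptly.
                await asyncio.sleep(1)
        except asyncio.CancelledError:
            print("asyncio.CancelledError")
            raise
        finally:
            print("finally called")

    return StreamingResponse(gen(), media_type="text/event-stream")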

Reproduction code

import asyncio
from typing import AsyncGenerator

from fastapi import APIRouter, FastAPI
from fastapi.responses import StreamingResponse
from litellm import CustomStreamWrapper, acompletion
from litellm.types.utils import StreamingChoices

router = APIRouter()

@router.post("/test")
async def test():
    return StreamingResponse(generator(), media_type="text/event-stream")

async def generator():
    # Azure OpenAI configuration:
    model = "azure/gpt-4o"
    api_base = "https://foo-openai.openai.azure.com"
    aws_region_name = None

    # Amazon Bedrock configuration (swap in to compare):
    # model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
    # api_base = None
    # aws_region_name = "ap-northeast-1"
    try:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",   "content": "Please provide a long answer."},
        ]

        response = await acompletion(
            messages=messages,
            model=model,
            api_base=api_base,
            aws_region_name=aws_region_name,
            stream=True,
        )

        if isinstance(response, (CustomStreamWrapper, AsyncGenerator)):
            async for chunk in response:
                choice = chunk.choices[0]
                if not isinstance(choice, StreamingChoices):
                    raise RuntimeError(f"Unexpected type: {type(choice)}")
                content = choice.delta.content
                if content:
                    yield f"data: {content}\n\n"

    except asyncio.CancelledError as e:
        # Bedrock raises this immediately; Azure does not
        print("asyncio.CancelledError", e)
        raise
    except Exception as e:
        print("Exception:", e)
        raise
    finally:
        print("finally called")

app = FastAPI()
app.include_router(router)

To reproduce:

  1. Start the FastAPI server.
  2. Hit /test from a browser or curl, then abort the request (e.g., close the tab or press Ctrl-C); a scripted alternative is sketched after this list.
  3. Observe the different behaviors for Bedrock vs Azure OpenAI.
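
The abort in step 2 can also be scripted. A small client sketch using httpx (the URL assumes the server is running locally on port 8000):

import asyncio

import httpx

async def main():
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", "http://127.0.0.1:8000/test") as response:
            async for _ in response.aiter_bytes():
                # Read one chunk, then leave the block; closing a partially
                # read streaming response drops the connection mid-stream,
                # simulating the browser abort.
                break

asyncio.run(main())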

Environment

  Library           Version
  litellm           1.72.2
  FastAPI           0.115.12
  Python            3.13.2
  Backends tested   Azure OpenAI, Amazon Bedrock

Expected behavior

asyncio.CancelledError should be raised immediately on client disconnect for the Azure OpenAI backend, mirroring Bedrock’s behavior, so that cleanup in finally runs promptly.
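
Until cancellation propagates promptly on the Azure path, one mitigation I can think of is to poll Starlette's Request.is_disconnected() between chunks and stop manually. A sketch (it only helps while chunks keep arriving, since the check runs between iterations):

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from litellm import acompletion

app = FastAPI()

@app.post("/test")
async def test(request: Request):
    return StreamingResponse(generator(request), media_type="text/event-stream")

async def generator(request: Request):
    response = await acompletion(
        model="azure/gpt-4o",  # same Azure configuration as the repro above
        messages=[{"role": "user", "content": "Please provide a long answer."}],
        stream=True,
    )
    try:
        async for chunk in response:
            # Manual disconnect check between chunks; it cannot help if the
            # backend stalls and no further chunk arrives.
            if await request.is_disconnected():
                break
            content = chunk.choices[0].delta.content
            if content:
                yield f"data: {content}\n\n"
    finally:
        print("finally called")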

Relevant log output

Are you an ML Ops Team?

No

What LiteLLM version are you on?

v1.72.2

Twitter / LinkedIn details

No response
