
[Bug]: Azure OpenAI backend does not raise asyncio.CancelledError on client disconnect (works with Bedrock) #11546

@nshmura

Description


What happened?

Using FastAPI + litellm in streaming mode:

  • Azure OpenAI backend

    • When the browser forcibly closes the SSE/stream connection before completion, the stream stops but no exception is raised.
    • The finally block is executed only several minutes later.
  • Amazon Bedrock backend

    • On the same client disconnect, an asyncio.CancelledError is raised immediately, the finally block runs right away, and the stream ends as expected.

I would expect Azure OpenAI to behave the same way and raise asyncio.CancelledError immediately when the client disconnects.
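
For comparison, a plain async generator with no LiteLLM call shows the behavior I expect: as far as I understand Starlette's streaming implementation, it cancels the response task as soon as the client disconnects, so asyncio.CancelledError surfaces at the next await point. A minimal sketch (endpoint name and timing are illustrative):

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/ticks")
async def ticks():
    async def gen():
        try:
            while True:
                yield "data: tick\n\n"
                # Cancellation is delivered at this await point when the
                # client disconnects, so except/finally run promptly.
                await asyncio.sleep(1)
        except asyncio.CancelledError:
            print("asyncio.CancelledError")
            raise
        finally:
            print("finally called")

    return StreamingResponse(gen(), media_type="text/event-stream")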

Reproduction code

import asyncio
from typing import AsyncGenerator

from fastapi import APIRouter, FastAPI
from fastapi.responses import StreamingResponse
from litellm import CustomStreamWrapper, acompletion
from litellm.types.utils import StreamingChoices

router = APIRouter()

@router.post("/test")
async def test():
    return StreamingResponse(generator(), media_type="text/event-stream")

async def generator():
    # Azure OpenAI configuration:
    model = "azure/gpt-4o"
    api_base = "https://foo-openai.openai.azure.com"
    aws_region_name = None

    # Amazon Bedrock configuration (swap in to compare):
    # model = "bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0"
    # api_base = None
    # aws_region_name = "ap-northeast-1"
    try:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",   "content": "Please provide a long answer."},
        ]

        response = await acompletion(
            messages=messages,
            model=model,
            api_base=api_base,
            aws_region_name=aws_region_name,
            stream=True,
        )

        if isinstance(response, (CustomStreamWrapper, AsyncGenerator)):
            async for chunk in response:
                choice = chunk.choices[0]
                if not isinstance(choice, StreamingChoices):
                    raise RuntimeError(f"Unexpected type: {type(choice)}")
                content = choice.delta.content
                if content:
                    yield f"data: {content}\n\n"

    except asyncio.CancelledError as e:
        # Bedrock raises this immediately; Azure does not
        print("asyncio.CancelledError", e)
        raise
    except Exception as e:
        print("Exception:", e)
        raise
    finally:
        print("finally called")

app = FastAPI()
app.include_router(router)

To reproduce:

  1. Start the FastAPI server.
  2. Hit /test from a browser or curl, then abort the request (e.g., close the tab or press Ctrl-C); a scripted alternative is sketched after this list.
  3. Observe the different behaviors for Bedrock vs Azure OpenAI.
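
The abort in step 2 can also be scripted. A small client sketch using httpx (the URL assumes the server is running locally on port 8000):

import asyncio

import httpx

async def main():
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("POST", "http://127.0.0.1:8000/test") as response:
            async for _ in response.aiter_bytes():
                # Read one chunk, then leave the block; closing a partially
                # read streaming response drops the connection mid-stream,
                # simulating the browser abort.
                break

asyncio.run(main())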

Environment

  Library           Version
  litellm           1.72.2
  FastAPI           0.115.12
  Python            3.13.2
  Backends tested   Azure OpenAI, Amazon Bedrock

Expected behavior

asyncio.CancelledError should be raised immediately on client disconnect for the Azure OpenAI backend, mirroring Bedrock’s behavior, so that cleanup in finally runs promptly.
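
Until cancellation propagates promptly on the Azure path, one mitigation I can think of is to poll Starlette's Request.is_disconnected() between chunks and stop manually. A sketch (it only helps while chunks keep arriving, since the check runs between iterations):

from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from litellm import acompletion

app = FastAPI()

@app.post("/test")
async def test(request: Request):
    return StreamingResponse(generator(request), media_type="text/event-stream")

async def generator(request: Request):
    response = await acompletion(
        model="azure/gpt-4o",  # same Azure configuration as the repro above
        messages=[{"role": "user", "content": "Please provide a long answer."}],
        stream=True,
    )
    try:
        async for chunk in response:
            # Manual disconnect check between chunks; it cannot help if the
            # backend stalls and no further chunk arrives.
            if await request.is_disconnected():
                break
            content = chunk.choices[0].delta.content
            if content:
                yield f"data: {content}\n\n"
    finally:
        print("finally called")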

Relevant log output

Are you an ML Ops Team?

No

What LiteLLM version are you on?

v1.72.2

Twitter / LinkedIn details

No response
