Background
PR #13255 adds an `ngx.sleep(0)` at the end of `parse_streaming_response()` (`apisix/plugins/ai-providers/base.lua`) to yield to the nginx scheduler inside the SSE streaming loop. Without it, when the upstream socket already has data buffered and the downstream client drains immediately, neither `body_reader()` nor `ngx.flush()` yields, so the loop monopolizes the worker CPU and blocks health checks and concurrent requests on the same worker.
As pointed out in #13255 (comment), that fix is a workaround: it prevents a single request from monopolizing the worker, but it does not solve the underlying problem.
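For context, the loop shape roughly looks like this. This is a simplified sketch, not the exact source of `parse_streaming_response()`; the `relay_sse` name and its body are illustrative only:

```lua
-- Sketch of the SSE relay loop (illustrative names, not the real code).
local function relay_sse(body_reader)
    while true do
        -- body_reader() can return already-buffered data without ever
        -- hitting the event loop, so this call may never yield.
        local chunk, err = body_reader()
        if err then
            return nil, err
        end
        if not chunk then
            break  -- upstream finished the stream
        end

        ngx.print(chunk)
        -- ngx.flush(true) also returns immediately when the downstream
        -- client drains the send buffer fast enough -- again, no yield.
        ngx.flush(true)

        -- PR #13255's workaround: force one trip through the nginx
        -- scheduler per chunk so other coroutines on this worker can run.
        ngx.sleep(0)
    end
    return true
end
```

When both the upstream read and the downstream flush complete synchronously, the `ngx.sleep(0)` is the only yield point in the entire loop.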
Real problems still to solve
- One worker, one client — if a single SSE client keeps the upstream busy forever, that client still consumes one full worker for the entire lifetime of the stream. `ngx.sleep(0)` only interleaves it with other coroutines on the same worker; it does not bound per-request CPU time.
- No backpressure / fairness across requests — a slow downstream client that never drains will keep the buffer full and the loop hot. We have no per-stream rate limiting or fair scheduling for SSE.
- No timeout for stalled streams — there is no upper bound on how long a streaming response can stay in the loop.
- Yield granularity is coarse — `ngx.sleep(0)` after every chunk is cheap-ish but still adds an event-loop hop per SSE event; for very chatty providers this is wasteful.
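On the missing-timeout point: one way to bound a stream's total lifetime is a deadline check inside the loop. This is a hedged sketch only; `max_stream_time` is a hypothetical limit, not an existing ai-proxy option:

```lua
-- Sketch: cap a streaming response's total wall-clock duration.
-- max_stream_time is hypothetical; it would come from plugin config.
local max_stream_time = 600  -- seconds
local started_at = ngx.now()

while true do
    local chunk, err = body_reader()
    if err or not chunk then
        break
    end

    -- ngx.now() is cached per event-loop iteration, but the
    -- ngx.sleep(0) below yields each pass, so it stays fresh enough.
    if ngx.now() - started_at > max_stream_time then
        ngx.log(ngx.WARN, "SSE stream exceeded max duration, aborting")
        break
    end

    ngx.print(chunk)
    ngx.flush(true)
    ngx.sleep(0)
end
```

A per-read idle timeout on the upstream cosocket would complement this, catching streams that stall rather than streams that merely run long.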
Possible directions
- Move SSE proxying to a dedicated lightweight path that uses cosocket reads with explicit yield points and a configurable max chunks-per-yield.
- Add per-stream timeouts and total-duration limits configurable on the ai-proxy plugin.
- Investigate whether nginx `proxy_buffering off` + native streaming (without going through the Lua body filter) can handle a subset of cases.
- Add a worker-level concurrency cap for streaming AI requests.
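The first direction above (a configurable max chunks-per-yield) could look roughly like this. A hedged sketch under the assumption that a new plugin config field exists; `chunks_per_yield` is hypothetical:

```lua
-- Sketch: yield to the scheduler every N chunks instead of every chunk,
-- amortizing the event-loop hop for chatty providers.
-- chunks_per_yield is a hypothetical ai-proxy config field.
local chunks_per_yield = 16
local sent = 0

while true do
    local chunk, err = body_reader()
    if err or not chunk then
        break
    end

    ngx.print(chunk)
    ngx.flush(true)

    sent = sent + 1
    if sent % chunks_per_yield == 0 then
        -- One scheduler hop per N SSE events keeps fairness while
        -- cutting the per-event overhead of ngx.sleep(0).
        ngx.sleep(0)
    end
end
```

`chunks_per_yield = 1` would reproduce today's behavior, so the knob degrades gracefully to the current workaround.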
Acceptance
`ngx.sleep(0)` workaround.