ai-proxy: streaming SSE loop can starve worker CPU under bursty upstreams #13256

@nic-6443

Background

PR #13255 adds an ngx.sleep(0) at the end of parse_streaming_response() (apisix/plugins/ai-providers/base.lua) to yield to the nginx scheduler inside the SSE streaming loop. Without it, when the upstream socket already has data buffered and the downstream client drains immediately, neither body_reader() nor ngx.flush() yields, so the loop monopolizes the worker CPU and blocks health checks and concurrent requests on the same worker.
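To make the starvation mechanism concrete, here is a minimal model of it in plain Python rather than APISIX Lua. Generators stand in for nginx coroutines, a bare `yield` stands in for `ngx.sleep(0)`, and the names `stream`, `run`, and `simulate` are hypothetical. It only illustrates cooperative scheduling; it does not reproduce the real `parse_streaming_response()`.

```python
from collections import deque


def stream(name, chunks, log, yield_each_chunk):
    """Model of one request loop running as a coroutine on a worker."""
    for i in range(chunks):
        # Stand-in for a read + flush pair that completes without blocking.
        log.append((name, i))
        if yield_each_chunk:
            # Stand-in for ngx.sleep(0): hand control back to the scheduler.
            yield


def run(tasks):
    """Round-robin scheduler: resume each task until its next yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)  # task yielded, so it goes to the back of the queue
        except StopIteration:
            pass  # task finished


def simulate(yield_each_chunk):
    """Run a 3-chunk SSE stream next to a 1-step health check."""
    log = []
    run([
        stream("sse", 3, log, yield_each_chunk),
        stream("healthcheck", 1, log, yield_each_chunk),
    ])
    return log
```

Without the yield, `simulate(False)` records all three `sse` chunks before the health check gets to run at all. With it, `simulate(True)` lets the health check run right after the first chunk.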

As pointed out in #13255 (comment), that fix is a workaround: it prevents a single request from monopolizing the worker, but it does not solve the underlying problem.

Real problems still to solve

  1. One worker, one client — if a single SSE client keeps the upstream busy forever, that client still consumes one full worker for the entire lifetime of the stream. ngx.sleep(0) only interleaves it with other coroutines on the same worker; it does not bound per-request CPU time.
  2. No backpressure / fairness across requests — a slow downstream client that never drains will keep the buffer full and the loop hot. We have no per-stream rate limiting or fair scheduling for SSE.
  3. No timeout for stalled streams — there is no upper bound on how long a streaming response can stay in the loop.
  4. Yield granularity is coarse: ngx.sleep(0) after every chunk is cheap-ish but still adds an event-loop hop per SSE event; for very chatty providers this is wasteful.
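As a rough illustration of point 4, the sketch below counts scheduler hops when the loop yields once per batch of chunks instead of after every chunk. It is a Python model, not APISIX code, and `scheduler_hops` and `max_chunks_per_yield` are hypothetical names.

```python
def scheduler_hops(total_chunks, max_chunks_per_yield):
    """Count yields for a stream that yields once per full batch of chunks."""
    if max_chunks_per_yield < 1:
        raise ValueError("max_chunks_per_yield must be >= 1")
    hops = 0
    since_yield = 0
    for _ in range(total_chunks):
        since_yield += 1
        if since_yield >= max_chunks_per_yield:
            hops += 1        # this is where the loop would yield to the scheduler
            since_yield = 0
    return hops
```

Under this model a 1000-chunk stream costs 1000 hops when yielding every chunk and 62 hops with a batch size of 16. The trade-off is that other coroutines may wait for up to `max_chunks_per_yield` chunks before they get a turn, so the batch size bounds latency for everything else on the worker.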

Possible directions

  • Move SSE proxying to a dedicated lightweight path that uses cosocket reads with explicit yield points and a configurable max chunks-per-yield.
  • Add per-stream timeouts and total-duration limits configurable on the ai-proxy plugin.
  • Investigate whether nginx proxy_buffering off + native streaming (without going through the Lua body filter) can handle a subset of cases.
  • Add a worker-level concurrency cap for streaming AI requests.
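The per-stream timeout and total-duration ideas could look roughly like the following. This is a Python model under stated assumptions, not a proposal for the actual plugin schema: `relay`, `idle_timeout_s`, `max_duration_s`, and `StreamLimitExceeded` are hypothetical names, `read_chunk` is assumed to return `None` at end of stream and `b""` when no data is available yet, and the clock is injectable so the logic can be exercised without real waiting.

```python
import time


class StreamLimitExceeded(Exception):
    """Raised when a stream stalls or runs longer than allowed."""


def relay(read_chunk, write_chunk, idle_timeout_s, max_duration_s,
          clock=time.monotonic):
    """Relay chunks until end of stream, enforcing two independent limits.

    Returns the number of chunks relayed.
    """
    started = clock()
    last_progress = started
    relayed = 0
    while True:
        chunk = read_chunk()
        now = clock()
        if chunk is None:
            return relayed  # upstream finished cleanly
        if now - started > max_duration_s:
            raise StreamLimitExceeded("total duration limit exceeded")
        if chunk == b"":
            # No data this round: only the idle timer matters here.
            if now - last_progress > idle_timeout_s:
                raise StreamLimitExceeded("stream stalled")
            continue
        write_chunk(chunk)
        last_progress = now
        relayed += 1
```

Keeping the two limits separate matters: the idle timer catches a stream that goes silent, while the duration cap catches a stream that stays busy forever, which is the case described in point 1 above. A real implementation would block or yield on the empty-read branch rather than spin.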

Acceptance
