Model discovery JSON can be corrupted by the inference proxy's SSE streaming path #1772

@shiju-nv

Problem Statement

GET /v1/models (model discovery) returns a single JSON model list. The sandbox inference proxy routes it through the Server-Sent Events streaming path. On a streaming size-cap or idle-timeout truncation, that path appends an SSE error frame to the body, which corrupts a payload the client parses as one JSON object.
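The failure mode can be reproduced in isolation. The following is a minimal sketch, not the proxy's code: the model list, the truncation point, and the text of the SSE error frame are all made-up stand-ins, since the issue does not show the real ones. It only illustrates that a truncated JSON body followed by an SSE frame no longer decodes as one JSON document.

```python
import json

# Hypothetical upstream body for GET /v1/models (the model id is made up).
model_list = json.dumps(
    {"object": "list", "data": [{"id": "example-model", "object": "model"}]}
)

# Hypothetical SSE error frame; the proxy's real frame text is not shown in the issue.
sse_error_frame = 'event: error\ndata: {"error": "stream truncated"}\n\n'


def parses_as_single_json(body: str) -> bool:
    """Return True if the whole body decodes as exactly one JSON document."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False


# What the client expects: the complete model list.
clean_ok = parses_as_single_json(model_list)

# What the streaming path may produce on truncation: a cut-off body
# with an SSE error frame appended to it.
truncated = model_list[: len(model_list) // 2]
corrupted_ok = parses_as_single_json(truncated + sse_error_frame)
```

Here `clean_ok` is true and `corrupted_ok` is false; a client that calls a JSON decoder on the response sees a parse error rather than a recognizable proxy error.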

Proposed Design

Make response framing a property of the inference protocol. Add a ResponseFraming field to InferenceApiPattern, set once per pattern in default_patterns. model_discovery and openai_embeddings are Buffered; the SSE protocols (chat completions, completions, responses, Anthropic messages) stay Streaming.
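The shape of this design might look roughly like the sketch below. It is an illustrative Python model under stated assumptions, not the project's implementation: `ResponseFraming`, `InferenceApiPattern`, `default_patterns`, `model_discovery`, and `openai_embeddings` are names from this issue, while the other protocol identifiers, the `protocol` field, and the `framing_for` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ResponseFraming(Enum):
    BUFFERED = "buffered"    # forward the body as one complete payload
    STREAMING = "streaming"  # forward as SSE; truncation may append an error frame


@dataclass(frozen=True)
class InferenceApiPattern:
    protocol: str             # hypothetical field name for the protocol identifier
    framing: ResponseFraming  # the new field proposed in this issue


def default_patterns() -> list[InferenceApiPattern]:
    """Framing is declared once per pattern, next to the protocol it belongs to."""
    return [
        InferenceApiPattern("model_discovery", ResponseFraming.BUFFERED),
        InferenceApiPattern("openai_embeddings", ResponseFraming.BUFFERED),
        # The identifiers below are assumed spellings of the SSE protocols.
        InferenceApiPattern("openai_chat_completions", ResponseFraming.STREAMING),
        InferenceApiPattern("openai_completions", ResponseFraming.STREAMING),
        InferenceApiPattern("openai_responses", ResponseFraming.STREAMING),
        InferenceApiPattern("anthropic_messages", ResponseFraming.STREAMING),
    ]


def framing_for(protocol: str) -> ResponseFraming:
    """Hypothetical lookup helper: return the framing declared for a protocol."""
    by_protocol = {p.protocol: p.framing for p in default_patterns()}
    return by_protocol[protocol]
```

With framing attached to the pattern, the response path can branch on `framing_for(...)` instead of assuming every matched route is an SSE stream.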

Alternatives Considered

Inspect the request's stream flag to choose framing per request. Deferred: it would also let non-streaming chat and completion responses be served buffered, but it is a larger change.
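For comparison, the deferred alternative could be sketched as follows. This assumes the flag in question is the boolean `stream` field that OpenAI-style chat and completion requests carry in their JSON body; the function name is hypothetical, and treating an unparsable body as non-streaming is an arbitrary choice made for this sketch.

```python
import json


def request_wants_stream(request_body: bytes) -> bool:
    """Return True only if the JSON request body sets "stream" to true."""
    try:
        payload = json.loads(request_body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        # Assumption for this sketch: an unparsable body is treated as non-streaming.
        return False
    return isinstance(payload, dict) and payload.get("stream") is True
```

Unlike the per-pattern field, this requires the proxy to read and parse each request body before it can decide how to frame the response, which is part of why it is the larger change.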

Agent Investigation

No response

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request

    Labels

    state:triage-needed: Opened without agent diagnostics and needs triage
