Description
Problem
When a client sends `stream: true` in a `/v1/chat/completions` request, the response either fails with a 500 error or is silently dropped, depending on the provider.
Root cause
`chat_handler` does not support streaming: it always calls `g_app.chat_completion()` and returns the result via `web.json_response()`. However, `process_chat()` only sets `stream=false` when the field is absent from the request:

```python
if "stream" not in chat:
    chat["stream"] = False
```

When a client explicitly sends `stream=true`:
- Provider-side failure: the upstream provider (e.g. Ollama) returns SSE (`text/event-stream`). `response_json()` tries to parse the SSE body as JSON, fails with "Expecting value: line 1 column 1 (char 0)", and the exception surfaces as an HTTP 500.
- Client-side failure: even if the provider call succeeds (e.g. with providers that ignore the stream flag), `chat_handler` returns plain JSON. Streaming clients like the Vercel AI SDK (`@ai-sdk/openai-compatible`) expect `text/event-stream` with `chat.completion.chunk` objects containing a `delta` field (see the example after this list). They silently discard the unexpected JSON and report no response.
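For reference, this is roughly what such a client expects on the wire; the id, timestamp, and model values below are illustrative:

```text
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"<any-model>","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"<any-model>","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```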
Impact
Any client that sends `stream: true` by default (most OpenAI-compatible SDKs do, including OpenClaw's embedded agent) gets either 500 errors or empty responses.
Reproduction
```bash
# From any client pointing at llmspy:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"<any-model>","messages":[{"role":"user","content":"hi"}],"stream":true}'
# Returns 500 or provider-dependent error
```

Proposed fix
- Force `stream=false` unconditionally in `process_chat()` so providers always return parseable JSON.
- In `chat_handler`, detect the client's original `stream` preference and, when `true`, convert the JSON response to SSE chunks in `chat.completion.chunk` format before sending (a sketch follows this list).
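A minimal sketch of the conversion step, assuming the project uses aiohttp (as `web.json_response()` suggests) and that the upstream call already returned a standard `chat.completion` object. The names `stream_chat_response` and `client_wants_stream` are illustrative, not the names used in the PR:

```python
import json
import time

from aiohttp import web


async def stream_chat_response(request: web.Request, completion: dict) -> web.StreamResponse:
    """Replay a non-streaming chat.completion object as SSE chunks.

    Called only when the client originally sent stream=true; process_chat()
    would first remember that preference (e.g. client_wants_stream) and then
    force chat["stream"] = False before contacting the provider.
    """
    resp = web.StreamResponse(
        status=200,
        headers={"Content-Type": "text/event-stream", "Cache-Control": "no-cache"},
    )
    await resp.prepare(request)

    base = {
        "id": completion.get("id", "chatcmpl-0"),
        "object": "chat.completion.chunk",
        "created": completion.get("created", int(time.time())),
        "model": completion.get("model", ""),
    }
    for choice in completion.get("choices", []):
        index = choice.get("index", 0)
        content = choice.get("message", {}).get("content", "")
        # One chunk carrying the full message as a single delta, then a
        # terminating chunk with an empty delta and the finish_reason.
        chunks = [
            {**base, "choices": [{"index": index,
                                  "delta": {"role": "assistant", "content": content},
                                  "finish_reason": None}]},
            {**base, "choices": [{"index": index, "delta": {},
                                  "finish_reason": choice.get("finish_reason", "stop")}]},
        ]
        for chunk in chunks:
            await resp.write(f"data: {json.dumps(chunk)}\n\n".encode("utf-8"))

    await resp.write(b"data: [DONE]\n\n")
    await resp.write_eof()
    return resp
```

`chat_handler` would then branch on the remembered preference: `web.json_response()` for non-streaming clients, the SSE path above otherwise.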
See PR for implementation.