This may not be an issue for most people, and I'm not savvy enough to know whether other chat hosts are affected the same way, but I hit this while using ds4-server with an OpenAI-compatible client and asked Opus to summarize what it found:
Streaming chat requests without a tools field bypass the structured OpenAI streamer and fall through to the plain SSE path, which emits every token (including the literal `</think>` closer) as content deltas. Reasoning never lands in reasoning_content, and clients like Jan that look for balanced `<think>...</think>` tags see only the closer.
Repro:
curl -N http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"deepseek-v4-flash","messages":[{"role":"user","content":"What is 17 * 23?"}],"stream":true}'
Observed: every delta is {"content":"..."}, including one {"content":"</think>"} mid-stream.
Expected: thinking tokens in reasoning_content deltas, no literal </think> on the wire.
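For reference, a correctly structured stream would look roughly like this on the wire (illustrative delta shapes only, not captured output; the reasoning text is made up):

data: {"choices":[{"delta":{"reasoning_content":"17 * 23 = 17 * 20 + 17 * 3 = 391"}}]}

data: {"choices":[{"delta":{"content":"17 * 23 = 391."}}]}

The broken path instead sends the reasoning tokens and the closing tag as plain content deltas, so a client that never saw an opening tag has nothing to pair the closer with.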
The cause is in ds4_server.c around line 4964: both structured_stream and openai_live_tools require j->req.has_tools, so non-tool chats skip openai_sse_stream_update (which already handles thinking correctly) and fall through to the plain sse_chunk path at line 5083. Dropping the has_tools requirement, so the OpenAI structured streamer runs for all OpenAI streaming chats, fixes it; verified locally against Jan.