fix(openai): coalesce system messages for self-hosted and open-model endpoints #3357

Merged
Sayt-0 merged 1 commit into main from fix/merge-system-messages-openai-compatible on Jul 1, 2026

Conversation

@Sayt-0 Sayt-0 commented Jul 1, 2026

Member

Problem

docker-agent emits one system message per source: the agent instruction plus each toolset's instructions (see session.go buildInvariantSystemMessages), and a handoff prompt in multi-agent teams. Some OpenAI-compatible backends reject a request that carries more than one system message.

Reported in #3344 (previously #2327): Qwen 3.5/3.6 served by vLLM fails with

HTTP 400: System message must be at the beginning.

The model's Jinja chat template calls raise_exception('System message must be at the beginning.') for any system message that is not at index 0, and vLLM surfaces that as the HTTP 400 above.
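
For illustration, the guard described above can be mirrored in Go as a small validator. This is a minimal sketch, not the template itself and not docker-agent code: the `Message` type and `validateStrictTemplate` are hypothetical names introduced here, and only the rule (a system message is accepted at index 0 only) and the error text come from the report.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical, minimal chat message (not docker-agent's type).
type Message struct {
	Role    string
	Content string
}

// errSystemNotFirst reuses the error text quoted in the report.
var errSystemNotFirst = errors.New("System message must be at the beginning.")

// validateStrictTemplate rejects any system message that is not at index 0,
// which is the rule the report attributes to the strict Qwen chat template.
func validateStrictTemplate(msgs []Message) error {
	for i, m := range msgs {
		if m.Role == "system" && i != 0 {
			return errSystemNotFirst
		}
	}
	return nil
}

func main() {
	before := []Message{
		{Role: "system", Content: "agent instruction"},
		{Role: "system", Content: "toolset instructions"},
		{Role: "user", Content: "hi"},
	}
	after := []Message{
		{Role: "system", Content: "agent instruction\n\ntoolset instructions"},
		{Role: "user", Content: "hi"},
	}
	fmt.Println(validateStrictTemplate(before)) // System message must be at the beginning.
	fmt.Println(validateStrictTemplate(after))  // <nil>
}
```

The `[system, system, user]` shape fails because the second system message sits at index 1, while the merged `[system, user]` shape passes.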

Root cause

The chat-completions path already coalesces consecutive system messages, but shouldMergeConsecutiveMessages restricted that behavior to a narrow allow-list, so common vLLM configs fell through:

| Config | Merged before this PR | Result |
| --- | --- | --- |
| custom provider (api_type auto-set to openai_chatcompletions) | yes | ok |
| provider: baseten / ovhcloud | yes | ok |
| provider: openai + base_url (self-hosted vLLM) | no | HTTP 400 |
| openrouter / nebius + Qwen | no | HTTP 400 |

The config in the report (provider: openai with a base_url pointing at a self-hosted vLLM server) is the third row.

Fix

Extend shouldMergeConsecutiveMessages to cover the endpoints that plausibly front a strict-template model, while leaving first-party APIs untouched:

  • provider: openai with a custom base_url (self-hosted vLLM / SGLang), the exact reported config.
  • Open-model host aliases that serve models such as Qwen: openrouter, nebius (alongside the existing baseten, ovhcloud).
  • Unchanged: explicit api_type: openai_chatcompletions (custom providers already merged).
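
The gating rules listed above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual shouldMergeConsecutiveMessages: the `endpointConfig` struct, its field names, and `shouldMergeSketch` are hypothetical, and the real function's signature and inputs are not shown in this PR description.

```go
package main

import "fmt"

// endpointConfig is a hypothetical stand-in for the provider settings that
// the gating decision depends on; the field names are assumptions.
type endpointConfig struct {
	Provider string // e.g. "openai", "openrouter", "mistral"
	BaseURL  string // non-empty when the user overrides the endpoint
	APIType  string // e.g. "openai_chatcompletions" when set explicitly
}

// openModelHosts lists the aliases named in the PR description.
var openModelHosts = map[string]bool{
	"baseten":    true,
	"ovhcloud":   true,
	"openrouter": true,
	"nebius":     true,
}

// shouldMergeSketch mirrors the three rules from the description:
// explicit api_type, open-model host alias, or openai with a custom base_url.
func shouldMergeSketch(c endpointConfig) bool {
	switch {
	case c.APIType == "openai_chatcompletions":
		return true
	case openModelHosts[c.Provider]:
		return true
	case c.Provider == "openai" && c.BaseURL != "":
		return true
	default:
		return false // first-party APIs keep their current behavior
	}
}

func main() {
	configs := []endpointConfig{
		{Provider: "openai", BaseURL: "http://localhost:8000/v1"},
		{Provider: "openai"},
		{Provider: "openrouter"},
		{Provider: "mistral"},
	}
	for _, c := range configs {
		fmt.Printf("%+v merge=%v\n", c, shouldMergeSketch(c))
	}
}
```

Keeping the decision as an explicit allow-list, rather than merging everywhere, is what lets first-party providers stay untouched.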

First-party APIs with a fixed model lineup (official OpenAI, Mistral, xAI, MiniMax, GitHub Copilot, OpenCode) tolerate multiple system messages and are deliberately excluded, so their behavior and recorded e2e cassettes are unchanged. Merging is a safe normalization (same-role content is concatenated), runs only on the Chat Completions path (the Responses path uses a separate converter), and matches what the DMR client already does.
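
A minimal sketch of the normalization itself, assuming a simplified message shape: consecutive system messages are folded into one by concatenating their content. The `Message` type, the `coalesceSystemMessages` name, and the blank-line separator are assumptions made for this example; the actual converter may join content differently.

```go
package main

import "fmt"

// Message is a hypothetical, minimal chat message (not docker-agent's type).
type Message struct {
	Role    string
	Content string
}

// coalesceSystemMessages folds each run of consecutive system messages into
// a single message. The "\n\n" separator is an assumption of this sketch.
// The input slice is left unmodified.
func coalesceSystemMessages(msgs []Message) []Message {
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs {
		if n := len(out); n > 0 && m.Role == "system" && out[n-1].Role == "system" {
			out[n-1].Content += "\n\n" + m.Content
			continue
		}
		out = append(out, m)
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "system", Content: "agent instruction"},
		{Role: "system", Content: "toolset instructions"},
		{Role: "user", Content: "hi"},
	}
	for _, m := range coalesceSystemMessages(msgs) {
		fmt.Printf("%s: %q\n", m.Role, m.Content)
	}
}
```

Because only adjacent system messages are joined, message order and all non-system messages are preserved.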

Validation

Reproduced end to end using the real binary output and the real upstream Qwen chat template:

| docker-agent binary | system messages sent | real strict Qwen3.5 chat_template.jinja rendered via jinja2 (the engine transformers/vLLM use) |
| --- | --- | --- |
| before | 2 ([system, system, user]) | rejected: System message must be at the beginning. |
| after | 1 ([system, user], merged) | rendered ok |

The request bodies were captured from a real docker-agent run --exec against a provider: openai + base_url endpoint. The template is the upstream strict version hosted at unsloth/Qwen3.5-2B@8b63d90c32e8 (the guard cited in #2327).

Tests (no network or GPU dependency, CI-safe):

  • TestReproIssue3344_QwenViaVLLM: fake vLLM/Qwen server enforcing the leading-system-message rule, covering provider: openai + base_url and the openrouter alias.
  • TestShouldMergeConsecutiveMessages_Gating: table asserting which endpoints merge, including that first-party mistral and xai do not.

Existing system_message_merge_test.go (baseten, ovhcloud, explicit api_type) and the Mistral e2e cassette tests (TestExec_Mistral, TestExec_Mistral_ToolCall) stay green.

Note on real-hardware testing

This change was not validated against a live vLLM deployment: no GPU was available to run vLLM with model weights. Instead the exact failing code path (server-side Jinja chat-template rendering) was reproduced faithfully with the real captured request body and the real upstream Qwen chat template, plus the CI tests above.

Re #3344

@Sayt-0 Sayt-0 requested a review from a team as a code owner July 1, 2026 09:30
@aheritier aheritier added area/providers For features/issues/fixes related to LLM providers (Bedrock, LiteLLM, Qwen, custom, etc.) area/providers/openai For features/issues/fixes related to the usage of OpenAI models kind/fix PR fixes a bug (maps to fix:). Use on PRs only. labels Jul 1, 2026
@Sayt-0 Sayt-0 force-pushed the fix/merge-system-messages-openai-compatible branch 2 times, most recently from 8270106 to 257a9d7 Compare July 1, 2026 09:42
@Sayt-0 Sayt-0 changed the title fix(openai): coalesce system messages for all OpenAI-compatible endpoints fix(openai): coalesce system messages for self-hosted and open-model endpoints Jul 1, 2026
…endpoints

docker-agent emits one system message per source (the agent instruction plus
each toolset's instructions). Strict server-side chat templates reject a request
that carries more than one system message: Qwen 3.5/3.6 served by vLLM fails with
"HTTP 400: System message must be at the beginning" because the model's Jinja
chat template only allows a system message at index 0 (issues #2327, #3344).

The chat-completions path already coalesced consecutive system messages, but only
for an allow-list (explicit api_type=openai_chatcompletions, baseten, ovhcloud),
so the reported config (provider: openai plus a base_url pointing at a self-hosted
vLLM server) fell through and hit the error.

Extend shouldMergeConsecutiveMessages to also cover an openai provider with a
custom base_url (self-hosted vLLM/SGLang) and the open-model host aliases that
serve strict-template models (openrouter, nebius, alongside baseten and ovhcloud).
First-party APIs with a fixed model lineup (official OpenAI, Mistral, xAI, ...)
tolerate multiple system messages and are left unchanged. The merge is a safe
normalization (same-role content is concatenated), runs only on the Chat
Completions path, and matches what the DMR client already does.

Re #3344
@Sayt-0 Sayt-0 force-pushed the fix/merge-system-messages-openai-compatible branch from 257a9d7 to ab96aa8 Compare July 1, 2026 10:12
@Sayt-0 Sayt-0 merged commit b611d93 into main Jul 1, 2026
8 checks passed
@rumpl rumpl deleted the fix/merge-system-messages-openai-compatible branch July 1, 2026 20:30