fix(openai): coalesce system messages for self-hosted and open-model endpoints by Sayt-0 · Pull Request #3357 · docker/docker-agent

Sayt-0 · 2026-07-01T09:30:17Z

Problem

docker-agent emits one system message per source: the agent instruction plus each toolset's instructions (see session.go buildInvariantSystemMessages), and a handoff prompt in multi-agent teams. Some OpenAI-compatible backends reject a request that carries more than one system message.

Reported in #3344 (previously #2327): Qwen 3.5/3.6 served by vLLM fails with

HTTP 400: System message must be at the beginning.

The model's Jinja chat template calls raise_exception('System message must be at the beginning.') for any system message that is not at index 0.

Root cause

The chat-completions path already coalesces consecutive system messages, but shouldMergeConsecutiveMessages restricted that merge to a narrow allow-list, so common vLLM configs fell through:

Config	merged before this PR	result
custom provider (api_type auto-set to `openai_chatcompletions`)	yes	ok
`provider: baseten` / `ovhcloud`	yes	ok
`provider: openai` + `base_url` (self-hosted vLLM)	no	HTTP 400
`openrouter` / `nebius` + Qwen	no	HTTP 400

The config in the report (provider: openai with a base_url pointing at a self-hosted vLLM server) is the third row.

Fix

Extend shouldMergeConsecutiveMessages to cover the endpoints that plausibly front a strict-template model, while leaving first-party APIs untouched:

provider: openai with a custom base_url (self-hosted vLLM / SGLang), the exact reported config.
Open-model host aliases that serve models such as Qwen: openrouter, nebius (alongside the existing baseten, ovhcloud).
Unchanged: explicit api_type: openai_chatcompletions (custom providers already merged).
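
As a rough illustration of the gating rules listed above, here is a minimal sketch. It is not the actual shouldMergeConsecutiveMessages implementation: the `endpoint` struct, its field names, and the `shouldMerge` helper are hypothetical and exist only to make the three cases concrete.

```go
package main

import "fmt"

// endpoint is a hypothetical stand-in for the provider settings the gate
// inspects; the real docker-agent types are not shown in this PR description.
type endpoint struct {
	Provider string // e.g. "openai", "openrouter", "mistral"
	BaseURL  string // non-empty when the default URL is overridden
	APIType  string // e.g. "openai_chatcompletions" when set explicitly
}

// openModelHosts holds the aliases the PR names as open-model hosts.
var openModelHosts = map[string]bool{
	"baseten":    true,
	"ovhcloud":   true,
	"openrouter": true,
	"nebius":     true,
}

// shouldMerge reports whether consecutive system messages would be merged
// under the rules described in the Fix section (assumed shape, see above).
func shouldMerge(e endpoint) bool {
	switch {
	case e.APIType == "openai_chatcompletions":
		return true // explicit api_type, already merged before this PR
	case openModelHosts[e.Provider]:
		return true // open-model host alias
	case e.Provider == "openai" && e.BaseURL != "":
		return true // openai provider pointed at a self-hosted server
	default:
		return false // first-party APIs stay untouched
	}
}

func main() {
	cases := []endpoint{
		{Provider: "openai", BaseURL: "http://localhost:8000/v1"},
		{Provider: "openai"},
		{Provider: "nebius"},
		{Provider: "mistral"},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> merge=%v\n", c, shouldMerge(c))
	}
}
```

The base URL in the example is a placeholder; any non-empty override takes the same branch in this sketch.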

First-party APIs with a fixed model lineup (official OpenAI, Mistral, xAI, MiniMax, GitHub Copilot, OpenCode) tolerate multiple system messages and are deliberately excluded, so their behavior and recorded e2e cassettes are unchanged. Merging is a safe normalization (same-role content is concatenated), runs only on the Chat Completions path (the Responses path uses a separate converter), and matches what the DMR client already does.
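
The normalization itself can be sketched as follows. This is an assumption-laden illustration, not the code in the PR: the `Message` type and `mergeConsecutiveSystem` name are hypothetical, the blank-line separator is a guess, and the sketch only merges the system role.

```go
package main

import "fmt"

// Message is a hypothetical, minimal chat message used only in this sketch.
type Message struct {
	Role    string
	Content string
}

// mergeConsecutiveSystem folds each run of adjacent system messages into a
// single message by concatenating their content. The "\n\n" separator is an
// assumption; the PR only states that same-role content is concatenated.
func mergeConsecutiveSystem(msgs []Message) []Message {
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs {
		n := len(out)
		if n > 0 && m.Role == "system" && out[n-1].Role == "system" {
			out[n-1].Content += "\n\n" + m.Content
			continue
		}
		out = append(out, m)
	}
	return out
}

func main() {
	in := []Message{
		{Role: "system", Content: "agent instruction"},
		{Role: "system", Content: "toolset instructions"},
		{Role: "user", Content: "hello"},
	}
	for _, m := range mergeConsecutiveSystem(in) {
		fmt.Printf("%s: %q\n", m.Role, m.Content)
	}
}
```

With this input, the two leading system messages collapse into one, so the request shape goes from `[system, system, user]` to `[system, user]`. The input slice is not modified.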

Validation

Reproduced end to end using the real binary output and the real upstream Qwen chat template:

docker-agent binary	system messages sent	real strict Qwen3.5 `chat_template.jinja` rendered via jinja2 (the engine transformers/vLLM use)
before	2 (`[system, system, user]`)	rejected: `System message must be at the beginning.`
after	1 (`[system, user]`, merged)	rendered ok

The request bodies were captured from a real docker-agent run --exec against a provider: openai + base_url endpoint. The template is the upstream strict version hosted at unsloth/Qwen3.5-2B@8b63d90c32e8 (the guard cited in #2327).

Tests (no network or GPU dependency, CI-safe):

TestReproIssue3344_QwenViaVLLM: fake vLLM/Qwen server enforcing the leading-system-message rule, covering provider: openai + base_url and the openrouter alias.
TestShouldMergeConsecutiveMessages_Gating: table asserting which endpoints merge, including that first-party mistral and xai do not.
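
For readers unfamiliar with this style of test, a fake server that enforces the leading-system-message rule could look roughly like the sketch below. It is not the test code from the PR: `newStrictServer`, `postRoles`, and the request types are hypothetical, and the handler only checks message order rather than rendering a real chat template.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// chatMessage and chatRequest model only the fields this sketch needs.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Messages []chatMessage `json:"messages"`
}

// newStrictServer starts an in-process fake endpoint that answers HTTP 400
// when a system message appears anywhere after index 0.
func newStrictServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req chatRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "bad request body", http.StatusBadRequest)
			return
		}
		for i, m := range req.Messages {
			if m.Role == "system" && i != 0 {
				http.Error(w, "System message must be at the beginning.", http.StatusBadRequest)
				return
			}
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"choices":[]}`)
	}))
}

// postRoles sends one placeholder message per role and returns the status code.
func postRoles(url string, roles ...string) (int, error) {
	var req chatRequest
	for _, role := range roles {
		req.Messages = append(req.Messages, chatMessage{Role: role, Content: "x"})
	}
	body, err := json.Marshal(req)
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	srv := newStrictServer()
	defer srv.Close()

	before, _ := postRoles(srv.URL, "system", "system", "user")
	after, _ := postRoles(srv.URL, "system", "user")
	fmt.Println("unmerged:", before, "merged:", after)
}
```

Because the server runs in-process via `net/http/httptest`, a test built this way needs no external network or GPU, which matches the CI-safe constraint stated above.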

Existing system_message_merge_test.go (baseten, ovhcloud, explicit api_type) and the Mistral e2e cassette tests (TestExec_Mistral, TestExec_Mistral_ToolCall) stay green.

Note on real-hardware testing

This change was not validated against a live vLLM deployment: no GPU was available to run vLLM with model weights. Instead the exact failing code path (server-side Jinja chat-template rendering) was reproduced faithfully with the real captured request body and the real upstream Qwen chat template, plus the CI tests above.

Re #3344

…endpoints

docker-agent emits one system message per source (the agent instruction plus each toolset's instructions). Strict server-side chat templates reject a request that carries more than one system message: Qwen 3.5/3.6 served by vLLM fails with "HTTP 400: System message must be at the beginning" because the model's Jinja chat template only allows a system message at index 0 (issues #2327, #3344).

The chat-completions path already coalesced consecutive system messages, but only for an allow-list (explicit api_type=openai_chatcompletions, baseten, ovhcloud), so the reported config (provider: openai plus a base_url pointing at a self-hosted vLLM server) fell through and hit the error.

Extend shouldMergeConsecutiveMessages to also cover an openai provider with a custom base_url (self-hosted vLLM/SGLang) and the open-model host aliases that serve strict-template models (openrouter, nebius, alongside baseten and ovhcloud). First-party APIs with a fixed model lineup (official OpenAI, Mistral, xAI, ...) tolerate multiple system messages and are left unchanged. The merge is a safe normalization (same-role content is concatenated), runs only on the Chat Completions path, and matches what the DMR client already does.

Re #3344

Sayt-0 requested a review from a team as a code owner July 1, 2026 09:30

aheritier added the area/providers, area/providers/openai, and kind/fix labels Jul 1, 2026

Sayt-0 force-pushed the fix/merge-system-messages-openai-compatible branch 2 times, most recently from 8270106 to 257a9d7 Compare July 1, 2026 09:42

Sayt-0 changed the title ~~fix(openai): coalesce system messages for all OpenAI-compatible endpoints~~ fix(openai): coalesce system messages for self-hosted and open-model endpoints Jul 1, 2026

dgageot approved these changes Jul 1, 2026

Sayt-0 force-pushed the fix/merge-system-messages-openai-compatible branch from 257a9d7 to ab96aa8 Compare July 1, 2026 10:12

Sayt-0 merged commit b611d93 into main Jul 1, 2026
8 checks passed

dgageot mentioned this pull request Jul 1, 2026

fix(openai): merge custom-provider system prompts #2382

Closed

rumpl deleted the fix/merge-system-messages-openai-compatible branch July 1, 2026 20:30

Sayt-0 merged 1 commit into main from fix/merge-system-messages-openai-compatible