Skip to content

bug: Large XML-heavy system prompt breaks tool calling across multiple Ollama models (Qwen3, Qwen2.5, Devstral) #24824

@andrewgwoodruff

Description

@andrewgwoodruff

Environment

  • OpenCode: 1.4.6
  • Model: qwen3-coder:30b via Ollama
  • Ollama: 0.20.7
  • OS: macOS Darwin 25.4.0 (Apple M5 Pro)
  • Project: ~73 skills loaded from .claude/skills/

Problem

When OpenCode loads skills from a Claude Code project, the system prompt reaches ~41KB, and the skills listing is serialized as XML tags (<available_skills>, <skill>, <description>, etc.). I'm thinking that this is causing qwen3-coder:30b to switch from OpenAI JSON tool-call format to its native HERMES XML format; OpenCode can't parse HERMES XML, so all tool invocations fail silently or log invalid Invalid Tool.

Reproduction

  1. Open a project with 50+ Claude Code skills in .claude/skills/
  2. Configure Ollama qwen3-coder:30b as the model (via @ai-sdk/openai-compatible provider pointing to http://localhost:11434/v1)
  3. Run any prompt that requires a tool call (e.g. "list top-level directories")
  4. Observe output like:
    <function=bash><parameter=command>ls</parameter></function>
    
    instead of a JSON tool_calls response

Root cause analysis

The system prompt size and XML structure is the trigger. Verified via proxy logging between OpenCode and Ollama:

  • OpenCode system prompt with skills loaded: 41,804 bytes, contains XML-tagged skill list
  • With this prompt + 45 tools → model outputs HERMES XML (fails, invalid Invalid Tool)
  • With a minimal plain-text prompt + same 45 tools via direct Ollama API → model outputs valid OpenAI JSON tool_calls (succeeds)
  • Streaming with 45 tools in isolation works correctly, so it seems that this is not an Ollama streaming bug

The XML tags in the skills listing (<skill>, <description>, <location>) appear to signal to qwen3-coder that it should respond in XML, overriding the tool-call format the model would otherwise use. Note: I haven't confirmed whether it's the XML structure specifically or just the prompt length; a 41KB plain-text prompt as a control would settle it, but the XML hypothesis is the stronger lead given the model behavior.

Possible mitigations

  1. Serialize skills in JSON or plain-text format instead of XML tags in the system prompt
  2. Add HERMES XML tool-call parsing as a fallback (similar to OpenCode can not support DeepSeek tool calling method: DSML(DeepSeek Makeup Language) #24566 for DSML)
  3. Expose a config option to disable skills injection into the system prompt for local/non-Claude models

Related

Metadata

Metadata

Assignees

Labels

coreAnything pertaining to core functionality of the application (opencode server stuff)

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions