llama.cpp qwen 3.5 error

**smallcode:**
latest version: 1.2.3

**llama.cpp:** latest version
model: unsloth/Qwen3.5-4B-Q4_K_M_unsloth.gguf

**.env:** 
SMALLCODE_MODEL=Qwen3.5-4B-Q4_K_M_unsloth.gguf
SMALLCODE_BASE_URL=http://127.0.0.1:8080/v1

**error:**
<img width="1905" height="72" alt="Image" src="https://github.com/user-attachments/assets/02def2db-2b4e-4ad8-bc18-0e7504eea7e3" />

**llama.cpp log:**
```
0.48.160.470 W srv    operator(): got exception: {"error":{"code":400,"message":"Unable to generate parser for this template. Automatic parser generation failed: \n------------\nWhile executing CallExpression at line 85, column 32 in source:\n...first %}↵            {{- raise_exception('System message must be at the beginnin...\n                                           ^\nError: Jinja Exception: System message must be at the beginning.","type":"invalid_request_error"}}
0.50.210.207 W srv    operator(): got exception: {"error":{"code":400,"message":"Unable to generate parser for this template. Automatic parser generation failed: \n------------\nWhile executing CallExpression at line 85, column 32 in source:\n...first %}↵            {{- raise_exception('System message must be at the beginnin...\n                                           ^\nError: Jinja Exception: System message must be at the beginning.","type":"invalid_request_error"}}
```

**Further explanation from Claude:**

**Root Cause**

When a request includes `tools`, llama.cpp tries to automatically generate a tool-call parser from the model's Jinja chat template (the log reports "Automatic parser generation failed"). Doing so executes the template, and the Qwen chat template contains a strict validation guard (line 85, column 32 according to the log):

```jinja
{%- if not loop.first %}
    {{- raise_exception('System message must be at the beginning.') }}
{%- endif %}
```

This raises an exception if **any `role: "system"` message appears at a position other than index 0** in the `messages` array.

SmallCode's architecture injects additional system-role content mid-conversation in several places:
- Knowledge injection (from `knowledge/` directory)
- Working memory / task re-injection on greeting regression
- Plan re-injection as a turn anchor
- Multi-file edit coordination headers

If any of these is appended as a new `{ role: "system", content: "..." }` object rather than merged into the first system message, the template raises the exception and llama.cpp rejects the request with HTTP 400 instead of generating a response.
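To confirm that this is what happens, the outgoing `messages` array can be checked just before it is sent. The following is a minimal sketch; `findMisplacedSystemMessages` and the sample messages are hypothetical and do not come from SmallCode or llama.cpp.

```js
// Hypothetical diagnostic helper (not part of SmallCode or llama.cpp).
// Returns the indices of all system messages that appear after index 0,
// i.e. the ones the template guard would reject.
function findMisplacedSystemMessages(messages) {
  const misplaced = [];
  messages.forEach((message, index) => {
    if (message.role === "system" && index > 0) {
      misplaced.push(index);
    }
  });
  return misplaced;
}

// Illustrative conversation with one mid-conversation system injection.
const example = [
  { role: "system", content: "You are a coding assistant." },
  { role: "user", content: "Hello" },
  { role: "system", content: "Plan: step 1 ..." },
  { role: "user", content: "Continue" },
];

console.log(findMisplacedSystemMessages(example)); // prints [ 2 ]
```

Logging this result before each API call would show which injection path, if any, produces the extra system message.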

**Expected Behavior**

All dynamic system-role injections should be **merged into a single system message at position 0**, not appended as additional system objects.

**Suggested Fix**

In the function that assembles the final `messages` array before each API call, consolidate all system-role content:

```js
// Instead of pushing a new system message:
// messages.push({ role: "system", content: knowledgeInjection }); // ❌

// Merge into the existing system message at index 0:
function buildMessages(systemParts, history) {
  const systemContent = systemParts.filter(Boolean).join("\n\n");
  return [
    { role: "system", content: systemContent },
    ...history.filter(m => m.role !== "system") // strip any stray system messages from history
  ];
}
```

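This ensures the `messages` array always has exactly one system message, always at index 0, which satisfies the ordering check in the Qwen chat template and in any other template that enforces it. Note that the filter drops stray system messages found in `history`, so their content is only preserved if it is also passed in through `systemParts`.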
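If the injection points cannot easily be changed, a variant is to normalize the array as a last step and fold every system message into the leading one instead of discarding it. This is only a sketch: `normalizeMessages` is a hypothetical name, and it assumes every system message has a plain string `content`.

```js
// Hypothetical last-step normalizer (not an existing SmallCode function).
// Assumption: system messages carry plain string content.
function normalizeMessages(messages) {
  const systemParts = [];
  const rest = [];
  for (const message of messages) {
    if (message.role === "system") {
      // Collect non-empty system content in its original order.
      if (typeof message.content === "string" && message.content.trim() !== "") {
        systemParts.push(message.content);
      }
    } else {
      rest.push(message);
    }
  }
  // No system content at all: return the conversation unchanged in order.
  if (systemParts.length === 0) {
    return rest;
  }
  // Exactly one system message, always at index 0.
  return [{ role: "system", content: systemParts.join("\n\n") }, ...rest];
}

const normalized = normalizeMessages([
  { role: "system", content: "You are a coding assistant." },
  { role: "user", content: "Hello" },
  { role: "system", content: "Plan: step 1 ..." },
  { role: "user", content: "Continue" },
]);
console.log(normalized.map((m) => m.role)); // prints [ 'system', 'user', 'user' ]
```

The trade-off is that injected text moves to the top of the prompt, so it no longer sits next to the turn it was meant to anchor.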

**Additional Notes**

- This likely affects all Qwen3-family models (Qwen3-4B, 8B, 14B, 32B, etc.) and any other model whose Jinja chat template enforces system-message ordering.
- The bug only surfaces when `tools` are present in the request, because that is when llama.cpp executes the template to generate the tool-call parser (the "Automatic parser generation failed" path in the log). Plain chat requests without tools may work even with the misordered messages.
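
The `tools` dependency could be checked with a small reproduction against the server from the `.env` above. The sketch below makes several assumptions: the `read_file` tool definition and message contents are made up for illustration, the request uses the OpenAI-compatible `/chat/completions` route under the configured base URL, and `fetch` is available globally (Node 18 or later).

```js
// Base URL and model name are taken from the .env in this report.
const BASE_URL = "http://127.0.0.1:8080/v1";
const MODEL = "Qwen3.5-4B-Q4_K_M_unsloth.gguf";

// Builds a request body with a system message at index 2.
// The tool definition is hypothetical and only serves to populate `tools`.
function buildReproPayload(withTools) {
  const payload = {
    model: MODEL,
    messages: [
      { role: "system", content: "You are a coding assistant." },
      { role: "user", content: "Hello" },
      { role: "system", content: "Injected mid-conversation." },
      { role: "user", content: "Continue" },
    ],
  };
  if (withTools) {
    payload.tools = [
      {
        type: "function",
        function: {
          name: "read_file",
          description: "Read a file from disk (illustrative only).",
          parameters: {
            type: "object",
            properties: { path: { type: "string" } },
            required: ["path"],
          },
        },
      },
    ];
  }
  return payload;
}

// Sends the payload and returns the HTTP status plus the raw response body.
async function sendRepro(withTools) {
  const response = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildReproPayload(withTools)),
  });
  return { status: response.status, body: await response.text() };
}
```

Calling `sendRepro(true).then(console.log)` and then `sendRepro(false).then(console.log)` with the server running would show whether the HTTP 400 really depends on `tools` being present.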


llama.cpp qwen 3.5 error #62
