smallcode: latest version (1.2.3)
llama.cpp: latest version
model: unsloth/Qwen3.5-4B-Q4_K_M_unsloth.gguf
.env:
SMALLCODE_MODEL=Qwen3.5-4B-Q4_K_M_unsloth.gguf
SMALLCODE_BASE_URL=http://127.0.0.1:8080/v1
error:

llama.cpp log:
0.48.160.470 W srv operator(): got exception: {"error":{"code":400,"message":"Unable to generate parser for this template. Automatic parser generation failed: \n------------\nWhile executing CallExpression at line 85, column 32 in source:\n...first %}↵ {{- raise_exception('System message must be at the beginnin...\n ^\nError: Jinja Exception: System message must be at the beginning.","type":"invalid_request_error"}}
0.50.210.207 W srv operator(): got exception: {"error":{"code":400,"message":"Unable to generate parser for this template. Automatic parser generation failed: \n------------\nWhile executing CallExpression at line 85, column 32 in source:\n...first %}↵ {{- raise_exception('System message must be at the beginnin...\n ^\nError: Jinja Exception: System message must be at the beginning.","type":"invalid_request_error"}}
Further explanation from Claude:
Root Cause
When a request includes tools, llama.cpp automatically generates a grammar from the model's Jinja chat template to enforce valid tool-call output. Qwen3's chat template contains a strict validation guard (around line 85):
    {%- if not loop.first %}
        {{- raise_exception('System message must be at the beginning.') }}
    {%- endif %}
This raises an exception if any role: "system" message appears at a position other than index 0 in the messages array.
SmallCode's architecture injects additional system-role content mid-conversation in several places:
- Knowledge injection (from the knowledge/ directory)
- Working memory / task re-injection on greeting regression
- Plan re-injection as a turn anchor
- Multi-file edit coordination headers
If any of these are appended as a new { role: "system", content: "..." } object rather than merged into the first system message, the Qwen3 template throws the exception and llama.cpp returns HTTP 400 before the request is even processed.
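One way to check this diagnosis is to scan the outgoing messages array for system messages past index 0 just before it is sent. The helper below is only a minimal sketch: the name findMisplacedSystemMessages is hypothetical and is not part of SmallCode or llama.cpp.

```javascript
// Hypothetical debugging helper (not part of SmallCode): returns the indices
// of every system-role message that is not at position 0.
function findMisplacedSystemMessages(messages) {
  const misplaced = [];
  messages.forEach((message, index) => {
    if (message.role === "system" && index !== 0) {
      misplaced.push(index);
    }
  });
  return misplaced;
}

// Example: a knowledge injection appended mid-conversation.
const example = [
  { role: "system", content: "You are a coding assistant." },
  { role: "user", content: "hi" },
  { role: "system", content: "Knowledge: ..." },
  { role: "user", content: "fix the bug" },
];
console.log(findMisplacedSystemMessages(example)); // logs the single index 2
```

Logging a non-empty result right before the API call would show which injection path is producing the extra system message.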
Expected Behavior
All dynamic system-role injections should be merged into a single system message at position 0, not appended as additional system objects.
Suggested Fix
In the function that assembles the final messages array before each API call, consolidate all system-role content:
    // Instead of pushing a new system message:
    // messages.push({ role: "system", content: knowledgeInjection }); // ❌

    // Merge into the existing system message at index 0:
    function buildMessages(systemParts, history) {
      const systemContent = systemParts.filter(Boolean).join("\n\n");
      return [
        { role: "system", content: systemContent },
        // Strip any stray system messages from history
        ...history.filter(m => m.role !== "system")
      ];
    }
This ensures the messages array always has exactly one system message, always at index 0, which satisfies the chat template requirements of Qwen3 and other strict models.
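Note that buildMessages discards any system messages it finds in history. If those messages carry content that should be kept (for example, a plan re-injection), a variant could fold them into the first system message instead. The following is a sketch under that assumption; normalizeSystemMessages is a hypothetical name, and it assumes message content is a plain string.

```javascript
// Hypothetical variant (not SmallCode's actual code): collapse every
// system-role message into one message at index 0, preserving their order.
function normalizeSystemMessages(messages, separator = "\n\n") {
  const systemParts = [];
  const rest = [];
  for (const message of messages) {
    if (message.role === "system") {
      // Assumes string content; empty or non-string content is skipped.
      if (typeof message.content === "string" && message.content.trim() !== "") {
        systemParts.push(message.content);
      }
    } else {
      rest.push(message);
    }
  }
  if (systemParts.length === 0) {
    return rest; // no system content at all: return the conversation unchanged
  }
  return [{ role: "system", content: systemParts.join(separator) }, ...rest];
}
```

Running this as the last step before the request is sent would keep the individual injection sites unchanged, at the cost of moving late injections to the top of the context, which may change how the model weighs them.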
Additional Notes
- This likely affects all Qwen3-family models (Qwen3-4B, 8B, 14B, 32B, etc.) and any other model whose Jinja chat template enforces system-message ordering.
- The bug only surfaces when tools are present in the request, because that is when llama.cpp executes the template for grammar generation. Plain chat requests without tools may work fine even with the broken message order.
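To reproduce this outside SmallCode, a request body along the following lines could be sent to the server. This is a sketch, not a captured request: the helper name buildReproPayload, the list_files tool, and the message contents are all made up for illustration. Only the model name comes from the .env above, and the tools array follows the OpenAI-style function-tool shape.

```javascript
// Hypothetical repro payload: a second system message past index 0,
// combined with a tools array, as described in this report.
function buildReproPayload(model) {
  return {
    model,
    messages: [
      { role: "system", content: "You are a coding assistant." },
      { role: "user", content: "Hello" },
      // Mimics a mid-conversation injection.
      { role: "system", content: "Injected knowledge goes here." },
      { role: "user", content: "List the files in this project." },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "list_files", // hypothetical tool, for illustration only
          description: "List files in a directory.",
          parameters: {
            type: "object",
            properties: { path: { type: "string" } },
            required: ["path"],
          },
        },
      },
    ],
  };
}

const payload = buildReproPayload("Qwen3.5-4B-Q4_K_M_unsloth.gguf");
console.log(JSON.stringify(payload, null, 2));
```

If the analysis above is right, POSTing this body to http://127.0.0.1:8080/v1/chat/completions should return the same 400, and sending it again with the third message removed should show whether message order alone is the trigger.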