Gemma 4 & Qwen 3.5 full prompt reprocessing from system prompt in OpenCode and Pi Coding Agent, but only with llama.cpp server as backend. #22474

@vevi33

Description


I have an issue where gemma-4 sometimes reprocesses the whole prompt after a long assistant message with many tool calls. It happens in OpenCode as well.
llama.cpp, HIP and Vulkan backends.
Oddly enough, it does not happen with LM Studio as the backend.
Using unsloth's GGUFs.

My config:

{
  "$schema": "https://opencode.ai/config.json",
  "model": "llama-cpp/Main31B",
  "compaction": {
    "auto": false,
    "prune": false
  },
  "share": "disabled",
  "small_model": "llama-cpp/Mini",
  "enabled_providers": ["llama-cpp"],
  "provider": {
    "llama-cpp": {
      "name": "llama-cpp",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {

Plugins

none

OpenCode version

latest

Steps to reproduce

I give the model a long task -> the model starts to think and makes tool calls (many SWA checkpoints saved) -> the model writes its final answer (no full reprocessing yet) -> BOOM, when I write a new message, everything after the system prompt is reprocessed.

After I reply to a long assistant message (which includes tool calls and file reads, each of which creates new checkpoints), it should restore from the last checkpoint, but there is not enough similarity, so it rereads everything after the system prompt.
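My understanding of the slot-selection step, as a hedged sketch: the server seems to score each cached slot by the longest common token prefix between its cached prompt and the new request. The normalization by the new prompt's length is my assumption from the `sim_best = 0.122` line in the log, not something I have confirmed in the llama.cpp source.

```python
def lcp_similarity(cached_tokens, new_tokens):
    # Longest common prefix between the cached prompt and the new request,
    # normalized by the new prompt's length (assumption).
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n / len(new_tokens)

# Numbers from the log: the new 12378-token prompt shares only ~1509
# tokens with the 13387-token cached prompt (roughly the system prompt).
cached = list(range(1509)) + [-1] * (13387 - 1509)  # toy token ids
new = list(range(12378))
sim = lcp_similarity(cached, new)
print(round(sim, 3))  # 0.122, just above the 0.100 selection threshold
```

So the slot is "selected", but since only the system-prompt prefix actually matches, nearly the whole prompt still has to be recomputed.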

Screenshot and/or share link

Relevant log:

slot release: id 0 | task 4647 | stop processing: n_tokens = 13387, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-gemma4
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.122 (> 0.100 thold), f_keep = 0.113
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 13387, total state size = 481.749 MiB
srv load: - looking for better prompt, base f_keep = 0.113, sim = 0.122
srv update: - cache state: 2 prompts, 4020.147 MiB (limits: 8196.000 MiB, 100096 tokens, 100096 est)
srv update: - prompt 0000022E4256B2A0: 5662 tokens, checkpoints: 3, 1310.724 MiB
srv update: - prompt 0000022E6CE4E500: 13387 tokens, checkpoints: 8, 2709.423 MiB
srv get_availabl: prompt cache update took 560.23 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4896 | processing task, is_child = 0
slot update_slots: id 0 | task 4896 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 12378
slot update_slots: id 0 | task 4896 | n_past = 1509, slot.prompt.tokens.size() = 13387, seq_id = 0, pos_min = 12260, n_swa = 1024
slot update_slots: id 0 | task 4896 | Checking checkpoint with [11604, 13138] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [11091, 12626] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [7530, 9065] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [7018, 8553] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [812, 2160] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [504, 1837] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [156, 1504] against 485...
slot update_slots: id 0 | task 4896 | restored context checkpoint (pos_min = 156, pos_max = 1504, n_tokens = 1505, n_past = 1504, size = 263.493 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 504, pos_max = 1837, n_tokens = 1838, n_swa = 1024, pos_next = 1504, size = 260.563 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 812, pos_max = 2160, n_tokens = 2161, n_swa = 1024, pos_next = 1504, size = 263.493 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 7018, pos_max = 8553, n_tokens = 8554, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 7530, pos_max = 9065, n_tokens = 9066, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 11091, pos_max = 12626, n_tokens = 12627, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 11604, pos_max = 13138, n_tokens = 13139, n_swa = 1024, pos_next = 1504, size = 299.823 MiB)
slot update_slots: id 0 | task 4896 | n_tokens = 1504, memory_seq_rm [1504, end)
slot update_slots: id 0 | task 4896 | prompt processing progress, n_tokens = 3552, batch.n_tokens = 2048, progress = 0.286961
slot update_slots: id 0 | task 4896 | n_tokens = 3552, memory_seq_rm [3552, end)
slot update_slots: id 0 | task 4896 | prompt processing progress, n_tokens = 5600, batch.n_tokens = 2048, progress = 0.452416
slot update_slots: id 0 | task 4896 | n_tokens = 5600, memory_seq_rm [5600, end)
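Reading the checkpoint lines above, a checkpoint only appears restorable if its whole position range ends at or before the point where the cached and new prompts diverge (`pos_next = 1504` here); the rest get erased. A small sketch of that selection, as my interpretation of the log rather than the actual llama.cpp logic:

```python
# SWA context checkpoints from the log, as (pos_min, pos_max) pairs.
checkpoints = [(11604, 13138), (11091, 12626), (7530, 9065),
               (7018, 8553), (812, 2160), (504, 1837), (156, 1504)]

pos_next = 1504  # where the cached and new prompts diverge

# Assumption: a checkpoint is usable only if it ends at or before pos_next;
# checkpoints reaching past the divergence point are invalidated.
usable = [c for c in checkpoints if c[1] <= pos_next]
erased = [c for c in checkpoints if c[1] > pos_next]
print(usable)       # [(156, 1504)] -> restore gives n_past = 1504
print(len(erased))  # 6 checkpoints erased
```

That would explain the behavior: only the earliest checkpoint survives, so roughly 11k tokens after position 1504 are reprocessed from scratch.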

Operating System

Win11

Terminal

PowerShell 7

Metadata

Labels

bug: Something isn't working
core: Anything pertaining to core functionality of the application (opencode server stuff)
windows
