Description
I have an issue where gemma-4 sometimes reprocesses the whole prompt after a long assistant message with many tool calls. It happens with OpenCode as well.
Backend: llama.cpp, HIP and Vulkan builds.
Oddly enough, it does not happen with LM Studio as the backend.
I am using unsloth's GGUFs.
My config:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "llama-cpp/Main31B",
  "compaction": {
    "auto": false,
    "prune": false
  },
  "share": "disabled",
  "small_model": "llama-cpp/Mini",
  "enabled_providers": ["llama-cpp"],
  "provider": {
    "llama-cpp": {
      "name": "llama-cpp",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
```
Plugins
none
OpenCode version
latest
Steps to reproduce
I ask for a long task → the model thinks and makes tool calls (many SWA checkpoints are saved) → the model writes its final answer (no full reprocessing yet) → then, when I write a new message, everything after the system prompt is reprocessed.
After I reply to a long assistant message (which includes tool calls and file reads, each creating new checkpoints), the server should restore the last checkpoint, but there is not enough similarity, so it rereads everything after the system prompt.
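The numbers in the log are consistent with the similarity being a simple shared-prefix fraction. This is only my reading of the slot-selection output, with illustrative variable names, not llama.cpp's actual code:

```python
# Reproducing the log's f_keep / sim_best from its token counts
# (illustrative names; my reading of llama.cpp's slot selection).
cached_tokens = 13387  # slot.prompt.tokens.size() in the log
new_tokens    = 12378  # task.n_tokens of the new request
shared_prefix = 1509   # n_past: tokens identical from the start

f_keep = shared_prefix / cached_tokens  # fraction of the cache kept
sim    = shared_prefix / new_tokens     # fraction of the new prompt covered

print(f"f_keep = {f_keep:.3f}")  # 0.113, matches the log
print(f"sim    = {sim:.3f}")     # 0.122, barely above the 0.100 threshold
```

So only about 12% of the new prompt matches the cached one, which is why the slot is still selected but almost everything ends up being recomputed.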
Screenshot and/or share link
Relevant log:
```
slot release: id 0 | task 4647 | stop processing: n_tokens = 13387, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-gemma4
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.122 (> 0.100 thold), f_keep = 0.113
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 13387, total state size = 481.749 MiB
srv load: - looking for better prompt, base f_keep = 0.113, sim = 0.122
srv update: - cache state: 2 prompts, 4020.147 MiB (limits: 8196.000 MiB, 100096 tokens, 100096 est)
srv update: - prompt 0000022E4256B2A0: 5662 tokens, checkpoints: 3, 1310.724 MiB
srv update: - prompt 0000022E6CE4E500: 13387 tokens, checkpoints: 8, 2709.423 MiB
srv get_availabl: prompt cache update took 560.23 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4896 | processing task, is_child = 0
slot update_slots: id 0 | task 4896 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 12378
slot update_slots: id 0 | task 4896 | n_past = 1509, slot.prompt.tokens.size() = 13387, seq_id = 0, pos_min = 12260, n_swa = 1024
slot update_slots: id 0 | task 4896 | Checking checkpoint with [11604, 13138] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [11091, 12626] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [7530, 9065] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [7018, 8553] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [812, 2160] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [504, 1837] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [156, 1504] against 485...
slot update_slots: id 0 | task 4896 | restored context checkpoint (pos_min = 156, pos_max = 1504, n_tokens = 1505, n_past = 1504, size = 263.493 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 504, pos_max = 1837, n_tokens = 1838, n_swa = 1024, pos_next = 1504, size = 260.563 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 812, pos_max = 2160, n_tokens = 2161, n_swa = 1024, pos_next = 1504, size = 263.493 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 7018, pos_max = 8553, n_tokens = 8554, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 7530, pos_max = 9065, n_tokens = 9066, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 11091, pos_max = 12626, n_tokens = 12627, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 11604, pos_max = 13138, n_tokens = 13139, n_swa = 1024, pos_next = 1504, size = 299.823 MiB)
slot update_slots: id 0 | task 4896 | n_tokens = 1504, memory_seq_rm [1504, end)
slot update_slots: id 0 | task 4896 | prompt processing progress, n_tokens = 3552, batch.n_tokens = 2048, progress = 0.286961
slot update_slots: id 0 | task 4896 | n_tokens = 3552, memory_seq_rm [3552, end)
slot update_slots: id 0 | task 4896 | prompt processing progress, n_tokens = 5600, batch.n_tokens = 2048, progress = 0.452416
slot update_slots: id 0 | task 4896 | n_tokens = 5600, memory_seq_rm [5600, end)
```
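To summarize the checkpoint churn in the log: with only 1509 matching tokens, the only checkpoint lying fully inside the matching prefix is [156, 1504]; every other checkpoint extends past the restore point (pos_next = 1504) and is erased. A sketch of that selection logic, reconstructed from the log output (illustrative, not the actual server code):

```python
# Checkpoint ranges (pos_min, pos_max) taken from the log above.
checkpoints = [
    (11604, 13138), (11091, 12626), (7530, 9065), (7018, 8553),
    (812, 2160), (504, 1837), (156, 1504),
]
n_past = 1509  # tokens of the new prompt that match the cached one

# Only a checkpoint entirely inside the matching prefix can be restored.
usable = [cp for cp in checkpoints if cp[1] < n_past]
restored = usable[0]    # (156, 1504), as in the log
pos_next = restored[1]  # processing resumes from this position

# Checkpoints reaching past pos_next cover state that will be rewritten.
erased = [cp for cp in checkpoints if cp != restored and cp[1] >= pos_next]

print(restored)     # (156, 1504)
print(len(erased))  # 6 checkpoints invalidated, matching the log
```

This is why one long divergence near the start of the conversation is enough to throw away all the later SWA checkpoints at once.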
Operating System
Windows 11
Terminal
PowerShell 7