Description
I have an issue where gemma-4 sometimes reprocesses the whole prompt after a long assistant message with many tool calls. It happens with OpenCode as well.
Backend: llama.cpp, HIP and Vulkan builds.
Oddly enough, it does not happen with LM Studio as the backend.
I am using unsloth's GGUFs.
My config:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "llama-cpp/Main31B",
  "compaction": {
    "auto": false,
    "prune": false
  },
  "share": "disabled",
  "small_model": "llama-cpp/Mini",
  "enabled_providers": ["llama-cpp"],
  "provider": {
    "llama-cpp": {
      "name": "llama-cpp",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://127.0.0.1:8080/v1"
      },
      "models": {
```
Plugins
none
OpenCode version
latest
Steps to reproduce
I ask for a long task → the model thinks and makes tool calls (many SWA checkpoints are saved) → the model writes its final answer (no full reprocessing yet) → then, when I write a new message, everything after the system prompt is reprocessed.
After I reply to a long assistant message (which includes tool calls and file reads, each creating new checkpoints), the server should restore the last checkpoint, but there is not enough similarity, so it rereads everything after the system prompt.
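The numbers in the log are consistent with the similarity being a simple shared-prefix fraction. This is only my reading of the slot-selection output, with illustrative variable names, not llama.cpp's actual code:

```python
# Reproducing the log's f_keep / sim_best from its token counts
# (illustrative names; my reading of llama.cpp's slot selection).
cached_tokens = 13387  # slot.prompt.tokens.size() in the log
new_tokens    = 12378  # task.n_tokens of the new request
shared_prefix = 1509   # n_past: tokens identical from the start

f_keep = shared_prefix / cached_tokens  # fraction of the cache kept
sim    = shared_prefix / new_tokens     # fraction of the new prompt covered

print(f"f_keep = {f_keep:.3f}")  # 0.113, matches the log
print(f"sim    = {sim:.3f}")     # 0.122, barely above the 0.100 threshold
```

So only about 12% of the new prompt matches the cached one, which is why the slot is still selected but almost everything ends up being recomputed.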
Screenshot and/or share link
Relevant log:
```
slot release: id 0 | task 4647 | stop processing: n_tokens = 13387, truncated = 0
srv update_slots: all slots are idle
srv params_from_: Chat format: peg-gemma4
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.122 (> 0.100 thold), f_keep = 0.113
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 13387, total state size = 481.749 MiB
srv load: - looking for better prompt, base f_keep = 0.113, sim = 0.122
srv update: - cache state: 2 prompts, 4020.147 MiB (limits: 8196.000 MiB, 100096 tokens, 100096 est)
srv update: - prompt 0000022E4256B2A0: 5662 tokens, checkpoints: 3, 1310.724 MiB
srv update: - prompt 0000022E6CE4E500: 13387 tokens, checkpoints: 8, 2709.423 MiB
srv get_availabl: prompt cache update took 560.23 ms
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 4896 | processing task, is_child = 0
slot update_slots: id 0 | task 4896 | new prompt, n_ctx_slot = 100096, n_keep = 0, task.n_tokens = 12378
slot update_slots: id 0 | task 4896 | n_past = 1509, slot.prompt.tokens.size() = 13387, seq_id = 0, pos_min = 12260, n_swa = 1024
slot update_slots: id 0 | task 4896 | Checking checkpoint with [11604, 13138] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [11091, 12626] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [7530, 9065] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [7018, 8553] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [812, 2160] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [504, 1837] against 485...
slot update_slots: id 0 | task 4896 | Checking checkpoint with [156, 1504] against 485...
slot update_slots: id 0 | task 4896 | restored context checkpoint (pos_min = 156, pos_max = 1504, n_tokens = 1505, n_past = 1504, size = 263.493 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 504, pos_max = 1837, n_tokens = 1838, n_swa = 1024, pos_next = 1504, size = 260.563 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 812, pos_max = 2160, n_tokens = 2161, n_swa = 1024, pos_next = 1504, size = 263.493 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 7018, pos_max = 8553, n_tokens = 8554, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 7530, pos_max = 9065, n_tokens = 9066, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 11091, pos_max = 12626, n_tokens = 12627, n_swa = 1024, pos_next = 1504, size = 300.018 MiB)
slot update_slots: id 0 | task 4896 | erased invalidated context checkpoint (pos_min = 11604, pos_max = 13138, n_tokens = 13139, n_swa = 1024, pos_next = 1504, size = 299.823 MiB)
slot update_slots: id 0 | task 4896 | n_tokens = 1504, memory_seq_rm [1504, end)
slot update_slots: id 0 | task 4896 | prompt processing progress, n_tokens = 3552, batch.n_tokens = 2048, progress = 0.286961
slot update_slots: id 0 | task 4896 | n_tokens = 3552, memory_seq_rm [3552, end)
slot update_slots: id 0 | task 4896 | prompt processing progress, n_tokens = 5600, batch.n_tokens = 2048, progress = 0.452416
slot update_slots: id 0 | task 4896 | n_tokens = 5600, memory_seq_rm [5600, end)
```
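To summarize the checkpoint churn in the log: with only 1509 matching tokens, the only checkpoint lying fully inside the matching prefix is [156, 1504]; every other checkpoint extends past the restore point (pos_next = 1504) and is erased. A sketch of that selection logic, reconstructed from the log output (illustrative, not the actual server code):

```python
# Checkpoint ranges (pos_min, pos_max) taken from the log above.
checkpoints = [
    (11604, 13138), (11091, 12626), (7530, 9065), (7018, 8553),
    (812, 2160), (504, 1837), (156, 1504),
]
n_past = 1509  # tokens of the new prompt that match the cached one

# Only a checkpoint entirely inside the matching prefix can be restored.
usable = [cp for cp in checkpoints if cp[1] < n_past]
restored = usable[0]    # (156, 1504), as in the log
pos_next = restored[1]  # processing resumes from this position

# Checkpoints reaching past pos_next cover state that will be rewritten.
erased = [cp for cp in checkpoints if cp != restored and cp[1] >= pos_next]

print(restored)     # (156, 1504)
print(len(erased))  # 6 checkpoints invalidated, matching the log
```

This is why one long divergence near the start of the conversation is enough to throw away all the later SWA checkpoints at once.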
Operating System
Windows 11
Terminal
PowerShell 7