
Inconsistent cache reuse and long delays despite stable request snapshots #20565

@rothnic

Description


I investigated a long-running session tree with severe intermittent delays and inconsistent prompt-cache reuse. The core issue is that delayed turns repeatedly occurred in both the primary session and delegated secondary sessions, but the strongest evidence does not support a simple explanation of "the system prompt changed on every slow turn."

The most concerning observation is that multiple delayed turns shared the same stored request-state snapshot hash while cache.read flipped between cache hit and cache miss. That suggests one of the following (the third possibility is illustrated in the sketch after this list):

  • cache reuse is unstable for effectively identical request state,
  • token/cache accounting is inconsistent,
  • or request construction differs in a way not reflected by the stored snapshot hash.
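To make that third possibility concrete, here is a minimal sketch. All names and the hashing scheme are my assumptions, not OpenCode's actual snapshot implementation: if the stored snapshot hash covers only a subset of the request, two requests can share a hash while the bytes a provider-side prompt cache keys on still differ.

```ts
import { createHash } from "node:crypto";

interface Request {
  system: string;
  messages: { role: string; content: string }[];
  tools: { name: string; description: string }[]; // e.g. ordering may vary per turn
}

// Assumed snapshot: hashes system prompt + messages, but NOT tool ordering.
function snapshotHash(req: Request): string {
  return createHash("sha256")
    .update(JSON.stringify({ system: req.system, messages: req.messages }))
    .digest("hex");
}

// What a provider-side prompt cache would actually key on: the full body.
function providerCacheKey(req: Request): string {
  return createHash("sha256").update(JSON.stringify(req)).digest("hex");
}

const base = {
  system: "You are a coding agent.",
  messages: [{ role: "user", content: "refactor module X" }],
};
const a: Request = {
  ...base,
  tools: [{ name: "bash", description: "run" }, { name: "edit", description: "edit" }],
};
const b: Request = {
  ...base,
  tools: [{ name: "edit", description: "edit" }, { name: "bash", description: "run" }],
};

console.log(snapshotHash(a) === snapshotHash(b));         // true  -> "same snapshot"
console.log(providerCacheKey(a) === providerCacheKey(b)); // false -> provider cache miss
```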

I also observed:

  • repeated delayed primary turns ranging from ~936s to ~2294s,
  • some delayed turns ending with MessageAbortedError,
  • negative input-token values on some cached turns (a possible arithmetic origin is sketched after this list),
  • sparse local logs that did not explain the slowdown window.
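On the negative input-token values: one way they could arise, purely as an assumption about the accounting and not a claim about OpenCode's actual code, is input being derived by subtracting cache-read tokens from a total that the subtraction can overshoot. The numbers below are taken from the 17:28:36 turn in example cluster 2.

```ts
// Assumption, for illustration only: if "input tokens" is derived as
// total prompt tokens minus cache-read tokens, an over-reported cache-read
// count drives the result negative.
const totalPromptTokens = 57_532; // hypothetical provider-reported total
const cacheReadTokens = 57_664;   // cache.read observed on the 17:28:36 turn

const derivedInput = totalPromptTokens - cacheReadTokens;
console.log(derivedInput); // -132, matching the negative value in the table below
```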

This feels related to the broader class of cache/accounting issues already discussed in issues like cache misses / low cache hit rate, dynamic system prompt invalidation, and negative token accounting, but this report adds one specific piece of evidence: same snapshot, mixed cache behavior during the slowdown period.

High-level aggregates

| Class     | Assistant turns | Avg duration | Max duration |
|-----------|-----------------|--------------|--------------|
| Primary   | 1221            | 32.9s        | 2293.7s      |
| Secondary | 209             | 49.4s        | 1515.9s      |

Error signals

| Signal                                       | Count observed | Notes                                                       |
|----------------------------------------------|----------------|-------------------------------------------------------------|
| MessageAbortedError in analyzed root session | 4              | At least one very long delayed primary turn ended this way  |
| Negative input-token values                  | observed       | Makes some token accounting unreliable                      |
| Sparse local logs                            | yes            | Available local logs did not explain the slowdown window    |

Stable snapshot / mixed cache evidence

Example cluster 1 - same snapshot across consecutive delayed primary turns:

| Time     | Snapshot stable? | Input tokens | Cache read | Result     |
|----------|------------------|--------------|------------|------------|
| 17:41:09 | yes              | 81,851       | 0          | tool-calls |
| 17:41:17 | yes              | 82,132       | 0          | tool-calls |
| 17:41:27 | yes              | 82,859       | 0          | tool-calls |
| 17:41:40 | yes              | 196          | 83,392     | tool-calls |
| 17:41:53 | yes              | -423         | 84,160     | tool-calls |
| 17:42:09 | yes              | 84,358       | 0          | tool-calls |

Example cluster 2 - same snapshot spanning secondary and primary turns:

| Time     | Class     | Snapshot stable? | Input tokens | Cache read | Result     |
|----------|-----------|------------------|--------------|------------|------------|
| 17:28:36 | Secondary | yes              | -132         | 57,664     | tool-calls |
| 17:29:56 | Secondary | yes              | 68,461       | 0          | tool-calls |
| 17:30:46 | Secondary | yes              | 69,245       | 0          | stop       |
| 17:30:46 | Primary   | yes              | 70,188       | 0          | tool-calls |
| 17:31:31 | Primary   | yes              | 72,065       | 0          | tool-calls |
| 17:32:26 | Secondary | yes              | 22,968       | 0          | tool-calls |
| 17:33:36 | Secondary | yes              | 11,868       | 23,360     | tool-calls |
| 17:33:50 | Secondary | yes              | 38,163       | 0          | tool-calls |

Delay pattern summary

| Observation                                              | Seen? | Notes                                                       |
|----------------------------------------------------------|-------|-------------------------------------------------------------|
| Repeated delayed primary turns over 15 minutes           | yes   | Multiple turns ranged from ~936s to ~2294s                  |
| Repeated delayed secondary turns                         | yes   | Some appear to be genuinely doing work, not necessarily hung |
| Delays only after compaction                             | no    | Slow turns appeared both before and after compaction        |
| Same request-state snapshot across multiple delayed turns | yes  | Important: slow clusters shared stable snapshots            |
| Cache hits and misses within same snapshot cluster       | yes   | Strong evidence against "prompt changed every turn"         |

What changed vs what stayed stable

Stayed stable during some misses:

  • stored request-state snapshot hash
  • overall result type (tool-calls repeatedly)
  • broad turn structure within the same cluster

Changed during some misses:

  • cache.read accounting (0 vs positive)
  • input-token accounting, including negative values
  • session branch (Primary vs Secondary) in some shared-snapshot clusters
  • occasional terminal outcome (stop vs tool-calls)

Why I think this is an OpenCode/core issue worth investigating

  • I could not confirm a dynamic system-prompt mutation on every slow turn.
  • The stored snapshot staying the same while cache behavior flips is the clearest anomaly.
  • The negative token values suggest accounting/reporting bugs may be obscuring the real failure mode.
  • Some primary turns ended in MessageAbortedError, which suggests provider/client abort handling may also be involved.

It would be very helpful to know (a diagnostic sketch follows this list):

  1. what exactly contributes to the stored snapshot hash,
  2. whether same-snapshot requests can still differ in provider-facing cache-relevant bytes,
  3. and whether cache.read = 0 in these cases means true cache miss, unreported cache state, or another accounting artifact.
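To help distinguish the possibilities in point 3, something like the following diagnostic could be added wherever the provider request is finally serialized. The hook shape and field names here are hypothetical, not an existing OpenCode surface: the idea is to log a hash of the exact provider-facing bytes next to the stored snapshot hash, so same-snapshot turns with differing bodies become visible and cache.read = 0 can be correlated with real byte changes.

```ts
import { createHash } from "node:crypto";

function bodyHash(body: unknown): string {
  return createHash("sha256").update(JSON.stringify(body)).digest("hex");
}

// Imagined call site, wherever the provider request body is finalized:
function logTurnDiagnostics(turn: {
  snapshotHash: string;  // the stored request-state snapshot hash
  providerBody: unknown; // the exact body sent to the provider
  cacheRead: number;     // reported cache.read tokens
  inputTokens: number;   // reported input tokens
}): void {
  console.log(
    JSON.stringify({
      snapshot: turn.snapshotHash,
      body: bodyHash(turn.providerBody),
      cacheRead: turn.cacheRead,
      input: turn.inputTokens,
    })
  );
}
```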

Plugins

Weave orchestration plugin, a local bash-wrapper plugin, and a command-safety plugin.

OpenCode version

0.0.0-master-202603312117

Steps to reproduce

  1. Run a long multi-turn session with a primary conversation that delegates into multiple secondary sessions.
  2. Let the session accumulate repeated tool-calls turns over time.
  3. Inspect stored session/message/part data for delayed turns.
  4. Group turns by stored request-state snapshot hash (see the analysis sketch after these steps).
  5. Observe that some repeated same-snapshot clusters alternate between cache.read > 0 and cache.read = 0, even while delays continue.
  6. In some cases, also observe negative input-token values or MessageAbortedError on delayed turns.
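Steps 4-5 can be scripted along these lines. The file name, field names, and layout below are hypothetical stand-ins for however session/message/part data is actually stored: group assistant turns by snapshot hash and flag clusters that mix cache.read > 0 and cache.read = 0.

```ts
import { readFileSync } from "node:fs";

interface Turn {
  time: string;
  snapshotHash: string;
  inputTokens: number;
  cacheRead: number;
}

// Assumed export of per-turn data to a flat JSON array.
const turns: Turn[] = JSON.parse(readFileSync("turns.json", "utf8"));

// Group turns by their stored request-state snapshot hash.
const bySnapshot = new Map<string, Turn[]>();
for (const t of turns) {
  const cluster = bySnapshot.get(t.snapshotHash) ?? [];
  cluster.push(t);
  bySnapshot.set(t.snapshotHash, cluster);
}

// Flag clusters with mixed cache behavior for identical snapshots.
for (const [hash, cluster] of bySnapshot) {
  const hits = cluster.filter((t) => t.cacheRead > 0).length;
  const misses = cluster.length - hits;
  if (hits > 0 && misses > 0) {
    console.log(`mixed cache behavior for snapshot ${hash}: ${hits} hits, ${misses} misses`);
    for (const t of cluster) {
      console.log(`  ${t.time} input=${t.inputTokens} cache.read=${t.cacheRead}`);
    }
  }
}
```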

Screenshot and/or share link

No share link included. I can provide more anonymized structured evidence if needed.

Operating System

macOS 26.2

Terminal

OpenCode CLI

Metadata

Labels

core: Anything pertaining to core functionality of the application (opencode server stuff)
Issue actions