Description
I investigated a long-running session tree with severe intermittent delays and inconsistent prompt-cache reuse. The core issue is that delayed turns repeatedly occurred in both the primary session and delegated secondary sessions, but the strongest evidence does not support a simple explanation of "the system prompt changed on every slow turn."
The most concerning observation is that multiple delayed turns shared the same stored request-state snapshot hash while `cache.read` flipped between cache hit and cache miss. That suggests one of:
- cache reuse is unstable for effectively identical request state,
- token/cache accounting is inconsistent,
- or request construction differs in a way not reflected by the stored snapshot hash.
I also observed:
- repeated delayed primary turns ranging from ~936s to ~2294s,
- some delayed turns ending with `MessageAbortedError`,
- negative input-token values on some cached turns,
- sparse local logs that did not explain the slowdown window.
This feels related to the broader class of cache/accounting issues already discussed in issues like cache misses / low cache hit rate, dynamic system prompt invalidation, and negative token accounting, but this report adds one specific piece of evidence: same snapshot, mixed cache behavior during the slowdown period.
High-level aggregates
| Class | Assistant turns | Avg duration | Max duration |
| --- | --- | --- | --- |
| Primary | 1221 | 32.9s | 2293.7s |
| Secondary | 209 | 49.4s | 1515.9s |
Error signals
| Signal | Count observed | Notes |
| --- | --- | --- |
| `MessageAbortedError` in analyzed root session | 4 | At least one very long delayed primary turn ended this way |
| Negative input-token values | observed | Makes some token accounting unreliable (see the sketch below) |
| Sparse local logs | yes | Available local logs did not explain the slowdown window |
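One hedged guess at how the negative values could arise (I have not confirmed this against OpenCode's source): if the reported input count is derived by subtracting cache-read tokens from a prompt total that already nets them out, the result goes negative whenever the cache read exceeds the residual prompt. A minimal TypeScript sketch of that hypothetical double subtraction, with made-up field names:

```ts
// Hypothetical illustration only: the field names below are assumptions,
// not OpenCode's actual usage-accounting schema.
interface Usage {
  promptTokens: number;    // suppose the provider already nets out cached tokens
  cacheReadTokens: number; // tokens served from the prompt cache
}

// Buggy derivation: subtracting cacheReadTokens a second time.
const derivedInput = (u: Usage) => u.promptTokens - u.cacheReadTokens;

// Reproduces the shape of the -423 row in cluster 1 below.
console.log(derivedInput({ promptTokens: 83_737, cacheReadTokens: 84_160 })); // -423
```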
Stable snapshot / mixed cache evidence
Example cluster 1 - same snapshot across consecutive delayed primary turns:
| Time | Snapshot stable? | Input tokens | Cache read | Result |
| --- | --- | --- | --- | --- |
| 17:41:09 | yes | 81,851 | 0 | `tool-calls` |
| 17:41:17 | yes | 82,132 | 0 | `tool-calls` |
| 17:41:27 | yes | 82,859 | 0 | `tool-calls` |
| 17:41:40 | yes | 196 | 83,392 | `tool-calls` |
| 17:41:53 | yes | -423 | 84,160 | `tool-calls` |
| 17:42:09 | yes | 84,358 | 0 | `tool-calls` |
Example cluster 2 - same snapshot spanning secondary and primary turns:
| Time | Class | Snapshot stable? | Input tokens | Cache read | Result |
| --- | --- | --- | --- | --- | --- |
| 17:28:36 | Secondary | yes | -132 | 57,664 | `tool-calls` |
| 17:29:56 | Secondary | yes | 68,461 | 0 | `tool-calls` |
| 17:30:46 | Secondary | yes | 69,245 | 0 | `stop` |
| 17:30:46 | Primary | yes | 70,188 | 0 | `tool-calls` |
| 17:31:31 | Primary | yes | 72,065 | 0 | `tool-calls` |
| 17:32:26 | Secondary | yes | 22,968 | 0 | `tool-calls` |
| 17:33:36 | Secondary | yes | 11,868 | 23,360 | `tool-calls` |
| 17:33:50 | Secondary | yes | 38,163 | 0 | `tool-calls` |
Delay pattern summary
| Observation | Seen? | Notes |
| --- | --- | --- |
| Repeated delayed primary turns over 15 minutes | yes | Multiple turns ranged from ~936s to ~2294s |
| Repeated delayed secondary turns | yes | Some appear to be genuinely doing work, not necessarily hung |
| Delays only after compaction | no | Slow turns appeared both before and after compaction |
| Same request-state snapshot across multiple delayed turns | yes | Important: slow clusters shared stable snapshots |
| Cache hits and misses within same snapshot cluster | yes | Strong evidence against "prompt changed every turn" |
What changed vs what stayed stable
Stayed stable during some misses:
- stored request-state `snapshot` hash
- overall result type (`tool-calls` repeatedly)
- broad turn structure within the same cluster

Changed during some misses:
- `cache.read` accounting (`0` vs positive)
- input-token accounting, including negative values
- session branch (`Primary` vs `Secondary`) in some shared-snapshot clusters
- occasional terminal outcome (`stop` vs `tool-calls`)
Why I think this is an OpenCode/core issue worth investigating
- I could not confirm a dynamic system-prompt mutation on every slow turn.
- The stored snapshot staying the same while cache behavior flips is the clearest anomaly.
- The negative token values suggest accounting/reporting bugs may be obscuring the real failure mode.
- Some primary turns ended in `MessageAbortedError`, which suggests provider/client abort handling may also be involved.
It would be very helpful to know:
- what exactly contributes to the stored `snapshot` hash,
- whether same-snapshot requests can still differ in provider-facing cache-relevant bytes (sketched below),
- and whether `cache.read = 0` in these cases means true cache miss, unreported cache state, or another accounting artifact.
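To make the second question concrete, here is a hedged sketch of the failure mode I suspect (all types and names are illustrative, not OpenCode's actual code): if the stored snapshot hash covers only part of the request, say the system prompt and message history, while the provider-facing payload also carries cache-relevant bytes such as tool schemas, then two requests with identical hashes can still diverge at the provider's cache layer:

```ts
import { createHash } from "node:crypto";

// Illustrative shapes only; assumptions about what a stored
// request-state snapshot might cover, not OpenCode's schema.
interface RequestState {
  systemPrompt: string;
  messages: string[];
  toolSchemas: object[]; // cache-relevant, but suppose it is NOT hashed
}

// A snapshot hash that omits toolSchemas: same-hash requests can
// still send different cache-relevant bytes to the provider.
function snapshotHash(state: RequestState): string {
  return createHash("sha256")
    .update(JSON.stringify({ system: state.systemPrompt, messages: state.messages }))
    .digest("hex");
}

const a: RequestState = { systemPrompt: "s", messages: ["m"], toolSchemas: [{ name: "bash" }] };
const b: RequestState = { ...a, toolSchemas: [{ name: "bash", description: "v2" }] };
console.log(snapshotHash(a) === snapshotHash(b)); // true, yet the payloads differ
```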
Plugins
Weave orchestration plugin, a local bash-wrapper plugin, and a command-safety plugin.
OpenCode version
0.0.0-master-202603312117
Steps to reproduce
- Run a long multi-turn session with a primary conversation that delegates into multiple secondary sessions.
- Let the session accumulate repeated `tool-calls` turns over time.
- Inspect stored session/message/part data for delayed turns.
- Group turns by stored request-state `snapshot` hash (see the sketch after these steps).
- Observe that some repeated same-snapshot clusters alternate between `cache.read > 0` and `cache.read = 0`, even while delays continue.
- In some cases, also observe negative input-token values or `MessageAbortedError` on delayed turns.
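For the grouping step, something along these lines works; the `Turn` record shape is an assumption about the stored message/part data, so adapt the field names to the actual schema:

```ts
// Illustrative record shape: field names are assumptions about the
// stored session data, not OpenCode's actual message/part schema.
interface Turn {
  time: string;
  snapshotHash: string;
  inputTokens: number;
  cacheRead: number;
}

// Group turns by snapshot hash and keep clusters that mix
// cache hits (cacheRead > 0) with misses (cacheRead === 0).
function mixedCacheClusters(turns: Turn[]): Map<string, Turn[]> {
  const byHash = new Map<string, Turn[]>();
  for (const t of turns) {
    const group = byHash.get(t.snapshotHash) ?? [];
    group.push(t);
    byHash.set(t.snapshotHash, group);
  }
  const mixed = new Map<string, Turn[]>();
  for (const [hash, group] of byHash) {
    const hits = group.filter((t) => t.cacheRead > 0).length;
    if (hits > 0 && hits < group.length) mixed.set(hash, group);
  }
  return mixed;
}
```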
Screenshot and/or share link
No share link included. I can provide more anonymized structured evidence if needed.
Operating System
macOS 26.2
Terminal
OpenCode CLI