Feature: Cache monitoring & automatic bust root-cause analysis (Dashboard + Doctor)
Type: Feature request / Idea
Components: packages/dashboard, packages/plugin/scripts/context-dump, doctor CLI, optional opencode-interceptor integration
Summary
Magic Context Dashboard already surfaces cache hit ratios and severity (stable / warning / bust / full_bust), but users cannot easily answer why a step busted, what changed in the prompt between steps, or whether the cause is MC-side (transform flush, system hash change, heuristics) vs provider-side (routing, implicit cache semantics, eviction). For openai-compatible providers (GLM, DeepSeek, etc.) this is especially painful because MC does not see wire payloads.
This issue proposes closing that gap with: (1) richer persisted cache diagnostics, (2) automatic root-cause classification with confidence, (3) Dashboard drill-down and export, and (4) a doctor check-cache <session_id> workflow built on existing context-dump + interceptor dumps.
Problem
User story
On one provider where I use GLM 5.1 I get streaks of busts but also streaks of working fine. I need to figure out what's going on there.
Users want to:
- Inspect cache events with enough detail to find bust causes (not just red bars).
- See per-message token deltas: what grew in input vs what was served from cache.read.
- Correlate MC transform decisions (defer / execute, ctx-flush, system prompt hash changes) with provider usage.
- Export diagnostics for bug reports (provider routing, model comparison).
- (Related) Edit [User] notes and some [Hist] fields (e.g. notes) in the Dashboard — separate but adjacent UX gap.
Real-world example
Investigation of ses_* on OneRouter showed:
| Model | Unique API steps | Stable (≥90% hit) | Bust (<50%) | Median cache.read | Max cache.read |
|---|---|---|---|---|---|
| DeepSeek v4 Pro | 110 | 84 | 20 | 139,136 | 168,064 |
| GLM 5.1 | 220 | 6 | 214 | 2,688 | 98,496 |
GLM can hit ~99% cache ratio when input is tiny (e.g. 131 tokens, 95,936 cached) but tool-loop steps with ~90k input and ~2.6k cache.read classify as bust on every step — indistinguishable in the Dashboard from a true MC-induced invalidation without deeper context.
Captured requests had no cache_control; OpenCode does not apply caching hints to @ai-sdk/openai-compatible today. The Dashboard showed bust while the logs showed decision=defer plus small cache hits: the metric and the root cause are different questions.
Current state (what already exists)
Dashboard — Cache tab & Cache Diagnostics page
- DbCacheEvent (packages/dashboard/src-tauri/src/db.rs): input_tokens, cache_read, cache_write, hit_ratio, severity, cause, turn_id, agent.
- log_parser.rs: severity from hit_ratio = cache_read / (input + cache_read + cache_write); detect_bust_cause() scans ±10 log lines for strings like system prompt hash, Execute pass, Historian output, decision=defer.
- CacheDiagnostics.tsx: session timeline, turn grouping, bust counts.
- Limitation: causes are log-string heuristics with a narrow window; no message-level diff, no wire dump linkage, no provider/model dimension on events.
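
To make the metric concrete, here is a minimal TypeScript sketch of the hit-ratio formula quoted above plus a severity mapping. The 0.9 and 0.5 thresholds are taken from the column headers in the real-world example table; the warning band and the cache_read = 0 rule for full_bust are assumptions, not confirmed behavior of log_parser.rs. The names Usage, hitRatio, and severityOf are illustrative only.

```typescript
// Severity labels as surfaced by the Dashboard.
type Severity = "stable" | "warning" | "bust" | "full_bust";

interface Usage {
  input: number; // uncached input tokens
  cacheRead: number; // tokens served from the provider cache
  cacheWrite: number; // tokens written to the provider cache
}

// hit_ratio = cache_read / (input + cache_read + cache_write)
function hitRatio(u: Usage): number {
  const total = u.input + u.cacheRead + u.cacheWrite;
  return total === 0 ? 0 : u.cacheRead / total;
}

// ASSUMPTION: >= 0.9 is stable and < 0.5 is bust (from the example table);
// the warning band and "cacheRead === 0 means full_bust" are guesses.
function severityOf(u: Usage): Severity {
  const r = hitRatio(u);
  if (r >= 0.9) return "stable";
  if (r >= 0.5) return "warning";
  return u.cacheRead === 0 ? "full_bust" : "bust";
}

// The two GLM shapes described in the real-world example.
const tinyInput: Usage = { input: 131, cacheRead: 95936, cacheWrite: 0 };
const toolLoop: Usage = { input: 90000, cacheRead: 2600, cacheWrite: 0 };
console.log(severityOf(tinyInput), severityOf(toolLoop));
```

With these inputs the tiny-input step lands in stable and the tool-loop step in bust, which is the pattern the Dashboard shows today without being able to say why.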
Plugin — context-dump script
- metrics.ts: perMessageCache with cache_bust when prevCacheRead > 0 && cacheRead < prevCacheRead.
- run-context-dump.ts → classifyBust(): richer MC-aware labels (historian injection, system/variant flush, nudge anchor / same turn, etc.) using OpenCode DB + context.db metadata.
- Output: JSON dump with original/transformed messages, stats, last 10 classified busts.
- Limitation: CLI-only; not integrated into Dashboard or Doctor; no interceptor request correlation.
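
The cache_bust rule from metrics.ts can be sketched as follows. Only the condition prevCacheRead > 0 && cacheRead < prevCacheRead comes from the script; the StepUsage shape and the markBusts helper are hypothetical names used for illustration.

```typescript
interface StepUsage {
  messageId: string;
  cacheRead: number;
}

interface PerMessageCache extends StepUsage {
  cacheBust: boolean;
}

// Flags a step as a bust when the previous step had cached tokens
// and this step read fewer of them.
function markBusts(steps: StepUsage[]): PerMessageCache[] {
  let prevCacheRead = 0;
  return steps.map((step) => {
    const cacheBust = prevCacheRead > 0 && step.cacheRead < prevCacheRead;
    prevCacheRead = step.cacheRead;
    return { ...step, cacheBust };
  });
}

const flagged = markBusts(
  [100, 200, 50, 0, 10].map((cacheRead, i) => ({ messageId: `msg_${i}`, cacheRead })),
);
console.log(flagged.map((f) => f.cacheBust));
```

Note that this rule is relative (a drop versus the previous step), while the Dashboard severity is absolute (a ratio threshold), so the two can disagree on the same step.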
opencode-interceptor
- Writes {seq}-{provider}-{timestamp}.{request,response,meta}.json per API call (src/intercept/dump.ts).
- Includes model, messages, headers (e.g. x-session-affinity), stream_options.include_usage.
- Limitation: responses often empty for SSE; MC does not consume these files today.
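
If MC were to consume these files, a first step would be parsing the basename pattern above. The sketch below assumes that seq and timestamp are plain digit runs and that the provider segment may itself contain hyphens; neither detail is confirmed by the interceptor source, and parseDumpName is a hypothetical helper.

```typescript
type DumpKind = "request" | "response" | "meta";

interface DumpName {
  seq: number;
  provider: string;
  timestamp: number; // ASSUMPTION: numeric, e.g. epoch milliseconds
  kind: DumpKind;
  basename: string; // "{seq}-{provider}-{timestamp}", shared by the three files
}

// ASSUMPTION: seq and timestamp are digits only; the greedy middle group
// lets the provider segment contain hyphens.
const DUMP_RE = /^(\d+)-(.+)-(\d+)\.(request|response|meta)\.json$/;

function parseDumpName(fileName: string): DumpName | null {
  const m = DUMP_RE.exec(fileName);
  if (m === null) return null;
  const provider = m[2] ?? "";
  return {
    seq: Number(m[1]),
    provider,
    timestamp: Number(m[3]),
    kind: m[4] as DumpKind,
    basename: `${m[1]}-${provider}-${m[3]}`,
  };
}

console.log(parseDumpName("0007-onerouter-1718000000000.meta.json"));
```

The basename field is what a diagnostics row could store to link a cache step back to its capture files.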
Gaps
| Gap | Impact |
|---|---|
| No unified step ID linking log event ↔ OpenCode message ↔ interceptor dump | Cannot drill from red bar to request body |
| classifyBust() (context-dump) not used by Dashboard | Dashboard causes are weaker than script |
| Low hit ratio on openai-compatible routes treated same as true invalidation | False alarm fatigue (GLM tool loops) |
| No delta view: Δinput, Δcache_read, Δtransformed_chars between steps | Cannot see what grew |
| No streak detection (N consecutive busts, model-specific baselines) | User's "streaks" pattern invisible |
| No export (JSON/CSV) from Cache tab | Hard to share with provider/MC issues |
| MC transform diagnostics not persisted per API step | Must grep magic-context.log manually |
| Non-Anthropic wire capture depends on optional interceptor | Doctor cannot mandate prerequisites |
Proposed solution (phased)
Phase 1 — Doctor: check-cache <session_id> (ship existing script)
Expose runContextDump() as a first-class doctor subcommand:
bunx @cortexkit/opencode-magic-context doctor check-cache ses_xxx
# Optional: --interceptor-dir ~/.local/share/.../opencode-interceptor/ses_xxx
# Optional: --json-out ./cache-report.json
Behavior:
- Run current context-dump pipeline (OpenCode DB + context.db + transform replay).
- Print summary table: bust count, last N busts with classification + detail (already in scripts/context-dump.ts).
- Preflight: if provider is not Anthropic, check for interceptor dump dir; warn if missing with setup instructions.
- Exit non-zero if bust rate exceeds threshold (optional --fail-on-bust-rate 0.5).
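
The exit-code behavior could look roughly like the sketch below. It assumes the bust rate is busted steps divided by all steps and that the check is strictly "greater than"; both are open design choices, and bustRate and exitCodeFor are hypothetical names.

```typescript
interface ClassifiedStep {
  cacheBust: boolean;
}

// Fraction of steps flagged as busts; 0 for an empty session.
function bustRate(steps: ClassifiedStep[]): number {
  if (steps.length === 0) return 0;
  return steps.filter((s) => s.cacheBust).length / steps.length;
}

// Returns the process exit code. Without a threshold the command
// always succeeds, so the flag stays opt-in.
function exitCodeFor(steps: ClassifiedStep[], failOnBustRate?: number): number {
  if (failOnBustRate === undefined) return 0;
  return bustRate(steps) > failOnBustRate ? 1 : 0;
}

// Step counts from the real-world example: GLM 214/220, DeepSeek 20/110.
const glm = Array.from({ length: 220 }, (_, i) => ({ cacheBust: i < 214 }));
const deepseek = Array.from({ length: 110 }, (_, i) => ({ cacheBust: i < 20 }));
console.log(exitCodeFor(glm, 0.5), exitCodeFor(deepseek, 0.5));
```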
Acceptance:
- Parity with bun run scripts/context-dump.ts <session_id>.
- Documented in doctor help and README.
Phase 2 — Persist cache diagnostics in context.db
New table (name TBD, e.g. cache_step_diagnostics):
| Column | Description |
|---|---|
| session_id, message_id, timestamp | Join keys |
| provider, model | From message.updated |
| input_tokens, cache_read, cache_write, hit_ratio | Usage snapshot |
| severity | Same rules as dashboard |
| mc_transform_decision | defer \| execute |
| mc_bust_signals | JSON: system_prompt_hash_changed, rematerialized, ctx_flush, heuristics_ran, … |
| classification, classification_detail | From classifyBust() logic |
| classification_confidence | high \| medium \| low |
| interceptor_dump_basename | Optional link to capture file |
| delta_input, delta_cache_read | vs previous assistant step |
Populate on message.updated when hasUsageTokens=true (plugin hook), reusing classify logic from context-dump.
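
As a rough illustration of the usage and delta columns, the sketch below builds one row from the current step and the previous assistant step. It covers only a subset of the proposed columns; the field types, the epoch-millisecond timestamp, and the null deltas for the first step are assumptions, and toRow is a hypothetical helper rather than existing plugin code.

```typescript
interface StepSnapshot {
  session_id: string;
  message_id: string;
  timestamp: number; // ASSUMPTION: epoch milliseconds
  provider: string;
  model: string;
  input_tokens: number;
  cache_read: number;
  cache_write: number;
}

interface DiagnosticsRow extends StepSnapshot {
  hit_ratio: number;
  delta_input: number | null; // null when there is no previous step
  delta_cache_read: number | null;
}

function toRow(curr: StepSnapshot, prev: StepSnapshot | null): DiagnosticsRow {
  const total = curr.input_tokens + curr.cache_read + curr.cache_write;
  return {
    ...curr,
    hit_ratio: total === 0 ? 0 : curr.cache_read / total,
    delta_input: prev ? curr.input_tokens - prev.input_tokens : null,
    delta_cache_read: prev ? curr.cache_read - prev.cache_read : null,
  };
}

const base = { session_id: "ses_xxx", provider: "onerouter", model: "GLM 5.1", cache_write: 0 };
const first: StepSnapshot = { ...base, message_id: "m1", timestamp: 1000, input_tokens: 131, cache_read: 95936 };
const second: StepSnapshot = { ...base, message_id: "m2", timestamp: 2000, input_tokens: 90000, cache_read: 2600 };
console.log(toRow(second, first));
```

Storing the deltas at write time keeps the Dashboard query a plain read, at the cost of having to recompute them if rows are ever backfilled out of order.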
Acceptance:
- Dashboard reads from DB first; log parser remains fallback for historical sessions.
- Schema migrated via existing plugin migration path.
Phase 3 — Dashboard: bust drill-down & export
3a. Cache event detail panel
When user expands a cache step/turn:
- Usage breakdown bar: input vs cache.read vs cache.write
- Classification + confidence + linked MC log excerpts (timestamp-matched)
- Delta row vs previous step: Δinput, Δcache_read, hit ratio change
- Transform summary: defer/execute, rematerialized m[0]/m[1], tag message count delta
- Link: "Open interceptor dump" (if basename present and file exists)
3b. Streak & model-aware views
- Detect consecutive busts ≥ N; highlight in timeline
- Optional filter: "show only provider/model X"
- Baseline hint for openai-compatible: if input > 50k and hit_ratio < 10%, label the step provider_low_cache_yield (informational); label it mc_likely_bust when an execute pass and a hash change fall in the same window
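
Streak detection and the baseline hint could be sketched like this. The 50k and 10% thresholds come from the bullet above; giving the MC-side label precedence over the provider-side one is an assumption, as are the helper names bustStreaks and baselineHint.

```typescript
type Severity = "stable" | "warning" | "bust" | "full_bust";

interface Streak {
  start: number; // index of the first busted step
  length: number; // number of consecutive busted steps
}

// Finds runs of at least minLength consecutive bust / full_bust steps.
function bustStreaks(severities: Severity[], minLength: number): Streak[] {
  const streaks: Streak[] = [];
  let start = -1;
  // Iterate one past the end so a streak that reaches the last step is closed.
  for (let i = 0; i <= severities.length; i++) {
    const s = severities[i];
    const isBust = s === "bust" || s === "full_bust";
    if (isBust && start < 0) start = i;
    if (!isBust && start >= 0) {
      if (i - start >= minLength) streaks.push({ start, length: i - start });
      start = -1;
    }
  }
  return streaks;
}

type Hint = "provider_low_cache_yield" | "mc_likely_bust" | null;

// ASSUMPTION: an execute pass with a hash change outranks the provider hint.
function baselineHint(inputTokens: number, hitRatio: number, executeWithHashChange: boolean): Hint {
  if (executeWithHashChange) return "mc_likely_bust";
  if (inputTokens > 50000 && hitRatio < 0.1) return "provider_low_cache_yield";
  return null;
}

console.log(bustStreaks(["bust", "bust", "stable", "bust", "full_bust", "bust"], 3));
console.log(baselineHint(90000, 0.028, false));
```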
3c. Export
- Export session cache report (JSON): all DbCacheEvent + diagnostics + bust classifications
- Export bust-only CSV for spreadsheets
- Button on Cache tab + Cache Diagnostics page
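
For the bust-only CSV, a minimal serializer might look like the following. The column set is an assumption drawn from fields named elsewhere in this issue, and the quoting follows the usual CSV convention of wrapping cells that contain commas, quotes, or newlines; BustRow, csvCell, and toBustCsv are hypothetical names.

```typescript
interface BustRow {
  message_id: string;
  model: string;
  input_tokens: number;
  cache_read: number;
  hit_ratio: number;
  cause: string;
}

const COLUMNS: (keyof BustRow)[] = [
  "message_id", "model", "input_tokens", "cache_read", "hit_ratio", "cause",
];

// Quotes a cell only when needed and doubles embedded quotes.
function csvCell(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toBustCsv(rows: BustRow[]): string {
  const lines = [COLUMNS.join(",")];
  for (const row of rows) {
    lines.push(COLUMNS.map((c) => csvCell(row[c])).join(","));
  }
  return lines.join("\n");
}

const csv = toBustCsv([
  { message_id: "msg_1", model: "GLM 5.1", input_tokens: 90000, cache_read: 2600, hit_ratio: 0.028, cause: "system prompt hash, changed" },
]);
console.log(csv);
```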
Acceptance:
- User can answer "what changed on this bust?" without leaving Dashboard for typical MC causes.
- Export opens in GitHub issue template for provider bugs.
Phase 4 — Wire-level root cause (interceptor integration)
For non-Anthropic providers, correlate interceptor dumps with cache steps:
- Match by timestamp ± duration (meta.json durationMs) and session affinity header.
- Compute request-level signals:
  - Message count, total request bytes
  - System prompt hash (first 2 system blocks)
  - Presence of cache_control / providerOptions
  - Model string
- Diff vs previous dump: which message indices changed (length/hash), new tools, reasoning blocks added
- Surface in Dashboard: "Request grew by +12,400 tokens in messages [45–47] (tool results)"
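
The "diff vs previous dump" step could be approximated by fingerprinting each message as length plus hash and comparing by index. This is a sketch under assumptions: the WireMessage shape is a simplification of whatever the request dump contains, and the inline FNV-1a-style hash (over UTF-16 code units) is a stand-in chosen to keep the example self-contained, not a proposal for the production hash.

```typescript
interface WireMessage {
  role: string;
  content: unknown;
}

// FNV-1a-style 32-bit hash over UTF-16 code units; illustrative only.
function fnv1a(text: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Length plus hash, matching the "length/hash" comparison in the list above.
function fingerprint(message: WireMessage): string {
  const json = JSON.stringify(message);
  return `${json.length}:${fnv1a(json)}`;
}

interface DumpDiff {
  changed: number[]; // indices present in both dumps with different content
  added: number[]; // indices that exist only in the newer dump
}

function diffMessages(prev: WireMessage[], curr: WireMessage[]): DumpDiff {
  const diff: DumpDiff = { changed: [], added: [] };
  curr.forEach((message, i) => {
    const before = prev[i];
    if (before === undefined) diff.added.push(i);
    else if (fingerprint(before) !== fingerprint(message)) diff.changed.push(i);
  });
  return diff;
}

const prevDump: WireMessage[] = [
  { role: "system", content: "You are helpful." },
  { role: "user", content: "List files" },
];
const currDump: WireMessage[] = [
  { role: "system", content: "You are helpful." },
  { role: "user", content: "List files in src/" },
  { role: "tool", content: "a.ts\nb.ts" },
];
console.log(diffMessages(prevDump, currDump));
```

A changed index early in the array (for example the system message) would point to a prefix invalidation, while additions at the tail are the normal growth of a conversation.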
Dependencies:
- opencode-interceptor enabled and capturing session
- Future: optional MC auth-plugin hook for OpenAI websocket providers (out of scope for v1)
Acceptance:
- doctor check-cache accepts --interceptor-dir and includes wire diff in output.
- Dashboard shows "wire diff unavailable" with setup link when dumps missing.
Phase 5 — Automatic root-cause engine (unify classifiers)
Merge and extend:
- log_parser.rs detect_bust_cause()
- run-context-dump.ts classifyBust()
- New rules from production cases:
| Cause ID | Detection signal |
|---|---|
| mc_system_prompt_hash | log: system prompt hash changed within Δt |
| mc_explicit_flush | ctx-flush / explicit_flush / rematerialized=true |
| mc_execute_heuristics | decision=execute + heuristics WILL RUN |
| mc_historian | historian write within 5m window (existing) |
| mc_storage_fatal | storage fatal (schema skew: annotate, not blame provider) |
| provider_cold_start | first event or cache_read=0 after long gap |
| provider_low_yield | high input, low cache.read, defer pass, no MC mutation |
| provider_routing_change | interceptor model/provider header changed (if exposed) |
| client_no_cache_hints | openai-compatible + no cache_control in dump |
| unknown | fallback |
Output: { cause_id, summary, evidence: [{source, snippet, timestamp}], confidence }.
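
One way to structure the unified engine is an ordered rule list where the first match wins. The sketch below implements only a subset of the cause IDs; the rule order, the confidence levels, the StepSignals input shape, and the reuse of the 50k / 10% thresholds for provider_low_yield are all assumptions made for illustration.

```typescript
type Confidence = "high" | "medium" | "low";

interface Evidence { source: string; snippet: string; timestamp: number; }

interface RootCause {
  cause_id: string;
  summary: string;
  evidence: Evidence[];
  confidence: Confidence;
}

// Hypothetical per-step inputs gathered from logs, context.db, and dumps.
interface StepSignals {
  systemPromptHashChanged: boolean;
  explicitFlush: boolean;
  decision: "defer" | "execute";
  inputTokens: number;
  cacheRead: number;
  isOpenAiCompatible: boolean;
  dumpHasCacheControl: boolean | null; // null when no interceptor dump exists
  evidence: Evidence[];
}

interface Rule {
  cause_id: string;
  summary: string;
  confidence: Confidence;
  matches(s: StepSignals): boolean;
}

// ASSUMPTION: MC-side causes are checked before provider-side ones.
const RULES: Rule[] = [
  { cause_id: "mc_system_prompt_hash", summary: "System prompt hash changed", confidence: "high",
    matches: (s) => s.systemPromptHashChanged },
  { cause_id: "mc_explicit_flush", summary: "Explicit flush or rematerialization", confidence: "high",
    matches: (s) => s.explicitFlush },
  { cause_id: "client_no_cache_hints", summary: "openai-compatible request without cache_control", confidence: "medium",
    matches: (s) => s.isOpenAiCompatible && s.dumpHasCacheControl === false },
  { cause_id: "provider_low_yield", summary: "High input, low cache.read on a defer pass", confidence: "low",
    matches: (s) => s.decision === "defer" && s.inputTokens > 50000 &&
      s.cacheRead / (s.inputTokens + s.cacheRead) < 0.1 },
];

function classify(s: StepSignals): RootCause {
  const rule = RULES.find((r) => r.matches(s));
  if (rule === undefined) {
    return { cause_id: "unknown", summary: "No rule matched", evidence: s.evidence, confidence: "low" };
  }
  return { cause_id: rule.cause_id, summary: rule.summary, evidence: s.evidence, confidence: rule.confidence };
}

const glmToolLoop: StepSignals = {
  systemPromptHashChanged: false, explicitFlush: false, decision: "defer",
  inputTokens: 90000, cacheRead: 2600, isOpenAiCompatible: true,
  dumpHasCacheControl: false, evidence: [],
};
console.log(classify(glmToolLoop).cause_id);
```

Keeping the rules as data rather than branching code would make it easier to share one definition between the doctor command, plugin persistence, and the Dashboard, for example through a generated JSON schema.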
Acceptance:
- Same engine in doctor, plugin persistence, and dashboard.
- Unit tests with fixtures from anonymized real sessions (GLM streak, DeepSeek stable chain).
Related: editable User / Historian notes (separate scope)
Discord also asked to edit [User] notes and [Hist] note fields in Dashboard. Recommend separate issue to avoid scope creep, but shared UX pattern: inline edit → RPC → context.db update with audit timestamp.
Non-goals (v1)
- MC becoming a MITM proxy for all providers (maintainer constraint)
- Fixing OpenCode/provider caching behavior (document as client_no_cache_hints finding only)
- Real-time websocket capture without auth plugin
Acceptance criteria (overall)
- doctor check-cache <session_id> ships with bust table + JSON export
Implementation notes
- Reuse buildDumpStats() / classifyBust() and avoid duplicating logic in Rust; consider shared WASM or JSON schema generated from TS.
- Dashboard already lazy-loads cache events (cacheActivated); extend get_session_cache_events to join diagnostics table.
- Respect PER_SESSION / TOTAL_CAP limits in CacheDiagnostics.tsx when adding fields.
References
- packages/plugin/scripts/context-dump/: metrics, classifyBust, runContextDump
- packages/dashboard/src-tauri/src/log_parser.rs: severity + detect_bust_cause
- packages/dashboard/src/components/CacheDiagnostics/CacheDiagnostics.tsx
- opencode-interceptor: src/intercept/dump.ts, per-session capture dirs
- Community reproduction: GLM 5.1 / OneRouter, hit_ratio ~2–3% on tool-loop steps vs ~99% on tiny-input continuations