(feat:) Cache monitoring & automatic bust root-cause analysis (Dashboard + Doctor) #130

@Zireael

Feature: Cache monitoring & automatic bust root-cause analysis (Dashboard + Doctor)

Type: Feature request / Idea
Components: packages/dashboard, packages/plugin/scripts/context-dump, doctor CLI, optional opencode-interceptor integration


Summary

Magic Context Dashboard already surfaces cache hit ratios and severity (stable / warning / bust / full_bust), but users cannot easily answer why a step busted, what changed in the prompt between steps, or whether the cause is MC-side (transform flush, system hash change, heuristics) vs provider-side (routing, implicit cache semantics, eviction). For openai-compatible providers (GLM, DeepSeek, etc.) this is especially painful because MC does not see wire payloads.

This issue proposes closing that gap with: (1) richer persisted cache diagnostics, (2) automatic root-cause classification with confidence, (3) Dashboard drill-down and export, and (4) a doctor check-cache <session_id> workflow built on existing context-dump + interceptor dumps.


Problem

User story

On one provider where I use GLM 5.1 I get streaks of busts but also streaks of working fine. I need to figure out what's going on there.

Users want to:

  1. Inspect cache events with enough detail to find bust causes (not just red bars).
  2. See per-message token deltas — what grew in input vs what was served from cache.read.
  3. Correlate MC transform decisions (defer / execute, ctx-flush, system prompt hash changes) with provider usage.
  4. Export diagnostics for bug reports (provider routing, model comparison).
  5. (Related) Edit [User] notes and some [Hist] fields (e.g. notes) in the Dashboard — separate but adjacent UX gap.

Real-world example

Investigation of ses_* on OneRouter showed:

| Model | Unique API steps | Stable (≥90% hit) | Bust (<50%) | Median cache.read | Max cache.read |
|---|---|---|---|---|---|
| DeepSeek v4 Pro | 110 | 84 | 20 | 139,136 | 168,064 |
| GLM 5.1 | 220 | 6 | 214 | 2,688 | 98,496 |

GLM can hit ~99% cache ratio when input is tiny (e.g. 131 tokens, 95,936 cached) but tool-loop steps with ~90k input and ~2.6k cache.read classify as bust on every step — indistinguishable in the Dashboard from a true MC-induced invalidation without deeper context.

Captured requests had no cache_control; OpenCode does not apply caching hints to @ai-sdk/openai-compatible today. The Dashboard showed bust while the logs showed decision=defer plus small cache hits: the metric and the root cause are different questions.


Current state (what already exists)

Dashboard — Cache tab & Cache Diagnostics page

  • DbCacheEvent (packages/dashboard/src-tauri/src/db.rs): input_tokens, cache_read, cache_write, hit_ratio, severity, cause, turn_id, agent.
  • log_parser.rs: severity from hit_ratio = cache_read / (input + cache_read + cache_write); detect_bust_cause() scans ±10 log lines for strings like system prompt hash, Execute pass, Historian output, decision=defer.
  • CacheDiagnostics.tsx: session timeline, turn grouping, bust counts.
  • Limitation: causes are log-string heuristics with a narrow window; no message-level diff, no wire dump linkage, no provider/model dimension on events.
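
As a rough illustration of the hit-ratio formula quoted above, the sketch below computes hit_ratio and maps it to a severity label. Only the formula and the four label names come from this issue. The cut-offs are assumptions: ≥0.9 for stable and <0.5 for bust are borrowed from the table headers in the real-world example, and treating cache_read = 0 as full_bust is a guess. The actual rules in log_parser.rs may differ.

```typescript
// Severity labels named in this issue; the numeric cut-offs below are assumptions.
type Severity = "stable" | "warning" | "bust" | "full_bust";

interface Usage {
  inputTokens: number;
  cacheRead: number;
  cacheWrite: number;
}

// hit_ratio = cache_read / (input + cache_read + cache_write)
function hitRatio(u: Usage): number {
  const total = u.inputTokens + u.cacheRead + u.cacheWrite;
  return total === 0 ? 0 : u.cacheRead / total;
}

// ASSUMPTION: thresholds are illustrative, not taken from log_parser.rs.
function severityOf(u: Usage): Severity {
  if (u.cacheRead === 0) return "full_bust";
  const r = hitRatio(u);
  if (r >= 0.9) return "stable";
  if (r >= 0.5) return "warning";
  return "bust";
}

// The two GLM shapes described in the real-world example:
const tinyInput: Usage = { inputTokens: 131, cacheRead: 95936, cacheWrite: 0 };
const toolLoop: Usage = { inputTokens: 90000, cacheRead: 2600, cacheWrite: 0 };
console.log(severityOf(tinyInput), severityOf(toolLoop));
```

Under these assumed thresholds, the tiny-input continuation lands in stable and the tool-loop step in bust, which matches the pattern reported above.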

Plugin — context-dump script

  • metrics.ts: perMessageCache with cache_bust when prevCacheRead > 0 && cacheRead < prevCacheRead.
  • run-context-dump.ts classifyBust(): richer MC-aware labels (historian injection, system/variant flush, nudge anchor / same turn, etc.) using OpenCode DB + context.db metadata.
  • Output: JSON dump with original/transformed messages, stats, last 10 classified busts.
  • Limitation: CLI-only; not integrated into Dashboard or Doctor; no interceptor request correlation.
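
The cache_bust condition described for metrics.ts can be sketched as a single pass over per-step usage. The type and function names here (StepUsage, flagBusts) are hypothetical; only the condition prevCacheRead > 0 && cacheRead < prevCacheRead comes from this issue.

```typescript
// Hypothetical shapes; not the actual types used in metrics.ts.
interface StepUsage {
  messageId: string;
  cacheRead: number;
}

interface FlaggedStep extends StepUsage {
  cacheBust: boolean;
  deltaCacheRead: number;
}

// Flags a step as a bust when the previous step read from cache
// and this step read strictly less.
function flagBusts(steps: StepUsage[]): FlaggedStep[] {
  let prevCacheRead = 0;
  return steps.map((step) => {
    const cacheBust = prevCacheRead > 0 && step.cacheRead < prevCacheRead;
    const flagged: FlaggedStep = {
      ...step,
      cacheBust,
      deltaCacheRead: step.cacheRead - prevCacheRead,
    };
    prevCacheRead = step.cacheRead;
    return flagged;
  });
}
```

Note that a step following a zero-read step is never flagged under this rule, so a cold start after a full bust shows up only on the step where the drop happened.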

opencode-interceptor

  • Writes {seq}-{provider}-{timestamp}.{request,response,meta}.json per API call (src/intercept/dump.ts).
  • Includes model, messages, headers (e.g. x-session-affinity), stream_options.include_usage.
  • Limitation: responses often empty for SSE; MC does not consume these files today.
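
Because each API call produces up to three files sharing one basename, a consumer could group them before correlating with cache steps. This is a minimal sketch: it assumes only the `.{request,response,meta}.json` suffix and a leading numeric seq from the pattern above, and it deliberately does not try to split provider from timestamp, since their exact formats are not specified here. groupDumps and DumpGroup are hypothetical names.

```typescript
type DumpKind = "request" | "response" | "meta";

interface DumpGroup {
  basename: string; // "{seq}-{provider}-{timestamp}", kept opaque
  seq: number | null; // leading numeric prefix, if present
  kinds: DumpKind[];
}

const DUMP_FILE_RE = /^(.+)\.(request|response|meta)\.json$/;

// Groups interceptor file names by shared basename, ordered by seq.
function groupDumps(fileNames: string[]): DumpGroup[] {
  const groups = new Map<string, DumpGroup>();
  for (const name of fileNames) {
    const match = DUMP_FILE_RE.exec(name);
    if (!match) continue; // ignore unrelated files
    const basename = match[1] ?? "";
    const kind = match[2] as DumpKind;
    let group = groups.get(basename);
    if (!group) {
      const seqMatch = /^(\d+)-/.exec(basename);
      group = { basename, seq: seqMatch ? Number(seqMatch[1]) : null, kinds: [] };
      groups.set(basename, group);
    }
    group.kinds.push(kind);
  }
  return Array.from(groups.values()).sort((a, b) => (a.seq ?? 0) - (b.seq ?? 0));
}
```

The basename returned here is the value that the proposed interceptor_dump_basename column would store.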

Gaps

| Gap | Impact |
|---|---|
| No unified step ID linking log event ↔ OpenCode message ↔ interceptor dump | Cannot drill from red bar to request body |
| classifyBust() (context-dump) not used by Dashboard | Dashboard causes are weaker than script |
| Low hit ratio on openai-compatible routes treated same as true invalidation | False alarm fatigue (GLM tool loops) |
| No delta view: Δinput, Δcache_read, Δtransformed_chars between steps | Cannot see what grew |
| No streak detection (N consecutive busts, model-specific baselines) | User's "streaks" pattern invisible |
| No export (JSON/CSV) from Cache tab | Hard to share with provider/MC issues |
| MC transform diagnostics not persisted per API step | Must grep magic-context.log manually |
| Non-Anthropic wire capture depends on optional interceptor | Doctor cannot mandate prerequisites |

Proposed solution (phased)

Phase 1 — Doctor: check-cache <session_id> (ship existing script)

Expose runContextDump() as a first-class doctor subcommand:

bunx @cortexkit/opencode-magic-context doctor check-cache ses_xxx
# Optional: --interceptor-dir ~/.local/share/.../opencode-interceptor/ses_xxx
# Optional: --json-out ./cache-report.json

Behavior:

  • Run current context-dump pipeline (OpenCode DB + context.db + transform replay).
  • Print summary table: bust count, last N busts with classification + detail (already in scripts/context-dump.ts).
  • Preflight: if provider is not Anthropic, check for interceptor dump dir; warn if missing with setup instructions.
  • Exit non-zero if bust rate exceeds threshold (optional --fail-on-bust-rate 0.5).
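
The optional --fail-on-bust-rate behavior could reduce to a small pure function, sketched below. It assumes that both bust and full_bust count toward the rate and that "exceeds" means strictly greater than the threshold; neither choice is settled by this issue.

```typescript
// ASSUMPTION: both "bust" and "full_bust" count as busts for the rate.
function bustRate(severities: string[]): number {
  if (severities.length === 0) return 0;
  const busts = severities.filter((s) => s === "bust" || s === "full_bust").length;
  return busts / severities.length;
}

// Returns the process exit code: 1 when a threshold is given and the
// bust rate is strictly above it, otherwise 0.
function exitCodeFor(severities: string[], failOnBustRate?: number): number {
  if (failOnBustRate === undefined) return 0;
  return bustRate(severities) > failOnBustRate ? 1 : 0;
}
```

Keeping the decision separate from the CLI wiring makes it easy to unit-test without spawning the doctor process.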

Acceptance:

  • Parity with bun run scripts/context-dump.ts <session_id>.
  • Documented in doctor help and README.

Phase 2 — Persist cache diagnostics in context.db

New table (name TBD, e.g. cache_step_diagnostics):

| Column | Description |
|---|---|
| session_id, message_id, timestamp | Join keys |
| provider, model | From message.updated |
| input_tokens, cache_read, cache_write, hit_ratio | Usage snapshot |
| severity | Same rules as dashboard |
| mc_transform_decision | defer or execute |
| mc_bust_signals | JSON: system_prompt_hash_changed, rematerialized, ctx_flush, heuristics_ran, … |
| classification, classification_detail | From classifyBust() logic |
| classification_confidence | high, medium, or low |
| interceptor_dump_basename | Optional link to capture file |
| delta_input, delta_cache_read | vs previous assistant step |

Populate on message.updated when hasUsageTokens=true (plugin hook), reusing classify logic from context-dump.
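
One possible row shape for the proposed table, with the delta columns derived from the previous assistant step, is sketched below. The column names follow the table above; the UsageEvent input type and the toRow helper are hypothetical, and severity, mc_bust_signals, and classification_detail are omitted for brevity.

```typescript
type Decision = "defer" | "execute";
type Confidence = "high" | "medium" | "low";

// Hypothetical input: what the plugin hook might know on message.updated.
interface UsageEvent {
  sessionId: string;
  messageId: string;
  timestamp: number;
  provider: string;
  model: string;
  inputTokens: number;
  cacheRead: number;
  cacheWrite: number;
  decision: Decision | null;
}

// Subset of the proposed cache_step_diagnostics columns.
interface CacheStepDiagnosticsRow {
  session_id: string;
  message_id: string;
  timestamp: number;
  provider: string;
  model: string;
  input_tokens: number;
  cache_read: number;
  cache_write: number;
  hit_ratio: number;
  mc_transform_decision: Decision | null;
  classification: string | null;
  classification_confidence: Confidence | null;
  interceptor_dump_basename: string | null;
  delta_input: number | null; // null on the first step of a session
  delta_cache_read: number | null;
}

function toRow(ev: UsageEvent, prev: UsageEvent | null): CacheStepDiagnosticsRow {
  const total = ev.inputTokens + ev.cacheRead + ev.cacheWrite;
  return {
    session_id: ev.sessionId,
    message_id: ev.messageId,
    timestamp: ev.timestamp,
    provider: ev.provider,
    model: ev.model,
    input_tokens: ev.inputTokens,
    cache_read: ev.cacheRead,
    cache_write: ev.cacheWrite,
    hit_ratio: total === 0 ? 0 : ev.cacheRead / total,
    mc_transform_decision: ev.decision,
    classification: null, // filled in later by the classifier
    classification_confidence: null,
    interceptor_dump_basename: null, // filled in when a dump is matched
    delta_input: prev ? ev.inputTokens - prev.inputTokens : null,
    delta_cache_read: prev ? ev.cacheRead - prev.cacheRead : null,
  };
}
```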

Acceptance:

  • Dashboard reads from DB first; log parser remains fallback for historical sessions.
  • Schema migrated via existing plugin migration path.

Phase 3 — Dashboard: bust drill-down & export

3a. Cache event detail panel

When user expands a cache step/turn:

  • Usage breakdown bar: input vs cache.read vs cache.write
  • Classification + confidence + linked MC log excerpts (timestamp-matched)
  • Delta row vs previous step: Δinput, Δcache_read, hit ratio change
  • Transform summary: defer/execute, rematerialized m[0]/m[1], tag message count delta
  • Link: "Open interceptor dump" (if basename present and file exists)

3b. Streak & model-aware views

  • Detect consecutive busts ≥ N; highlight in timeline
  • Optional filter: "show only provider/model X"
  • Baseline hint for openai-compatible: if input > 50k and hit_ratio < 10%, label as provider_low_cache_yield (informational) vs mc_likely_bust when execute+hash change in same window
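
The streak detection and the baseline hint above might look like the following sketch. The 50k and 10% thresholds come from the bullet above; the field names, and the choice to check the MC-side signal before the provider-side one, are assumptions.

```typescript
interface StreakRange {
  start: number; // index of first bust in the streak
  end: number; // index of last bust in the streak (inclusive)
}

// Finds runs of at least minLen consecutive busts.
function findStreaks(steps: Array<{ isBust: boolean }>, minLen: number): StreakRange[] {
  const streaks: StreakRange[] = [];
  let start = -1;
  // Iterate one past the end so a trailing streak is closed too.
  for (let i = 0; i <= steps.length; i++) {
    const bust = steps[i]?.isBust === true;
    if (bust && start < 0) start = i;
    if (!bust && start >= 0) {
      if (i - start >= minLen) streaks.push({ start, end: i - 1 });
      start = -1;
    }
  }
  return streaks;
}

interface BaselineInput {
  inputTokens: number;
  hitRatio: number;
  decision: "defer" | "execute";
  systemHashChanged: boolean;
}

type BaselineLabel = "mc_likely_bust" | "provider_low_cache_yield" | null;

// ASSUMPTION: MC-side evidence wins when both conditions hold.
function baselineLabel(step: BaselineInput): BaselineLabel {
  if (step.decision === "execute" && step.systemHashChanged) return "mc_likely_bust";
  if (step.inputTokens > 50000 && step.hitRatio < 0.1) return "provider_low_cache_yield";
  return null;
}
```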

3c. Export

  • Export session cache report (JSON): all DbCacheEvent + diagnostics + bust classifications
  • Export bust-only CSV for spreadsheets
  • Button on Cache tab + Cache Diagnostics page
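
A bust-only CSV export could be as small as the sketch below. The column set mirrors a subset of the DbCacheEvent fields listed earlier; the BustRow type and bustsToCsv helper are hypothetical, and quoting follows the usual CSV convention of doubling embedded quotes.

```typescript
interface BustRow {
  timestamp: string;
  model: string;
  input_tokens: number;
  cache_read: number;
  hit_ratio: number;
  severity: string;
  cause: string;
}

const CSV_COLUMNS: Array<keyof BustRow> = [
  "timestamp", "model", "input_tokens", "cache_read", "hit_ratio", "severity", "cause",
];

// Quotes a field only when it contains a comma, quote, or line break.
function csvField(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Emits a header line plus one line per bust / full_bust row.
function bustsToCsv(rows: BustRow[]): string {
  const busts = rows.filter((r) => r.severity === "bust" || r.severity === "full_bust");
  const lines = [CSV_COLUMNS.join(",")];
  for (const row of busts) {
    lines.push(CSV_COLUMNS.map((col) => csvField(row[col])).join(","));
  }
  return lines.join("\n");
}
```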

Acceptance:

  • User can answer "what changed on this bust?" without leaving Dashboard for typical MC causes.
  • Export opens in GitHub issue template for provider bugs.

Phase 4 — Wire-level root cause (interceptor integration)

For non-Anthropic providers, correlate interceptor dumps with cache steps:

  1. Match by timestamp ± duration (meta.json durationMs) and session affinity header.
  2. Compute request-level signals:
    • Message count, total request bytes
    • System prompt hash (first 2 system blocks)
    • Presence of cache_control / providerOptions
    • Model string
  3. Diff vs previous dump: which message indices changed (length/hash), new tools, reasoning blocks added
  4. Surface in Dashboard: "Request grew by +12,400 tokens in messages [45–47] (tool results)"
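
Step 3 above (diff vs previous dump) can be approximated by fingerprinting each message and comparing by index. This sketch assumes each dumped request exposes a messages array with role and content. It uses a simple FNV-1a hash purely to stay dependency-free; a real implementation would more likely use a cryptographic hash. All names here are hypothetical.

```typescript
interface WireMessage {
  role: string;
  content: unknown; // string or structured parts, depending on provider
}

// 32-bit FNV-1a over UTF-16 code units; adequate for change detection only.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function fingerprint(msg: WireMessage): string {
  return fnv1a(msg.role + "\u0000" + JSON.stringify(msg.content));
}

interface MessageDiff {
  changed: number[]; // indices present in both dumps whose fingerprint differs
  added: number[]; // indices only present in the newer dump
  removedCount: number; // trailing messages dropped from the older dump
}

function diffMessages(prev: WireMessage[], next: WireMessage[]): MessageDiff {
  const changed: number[] = [];
  const added: number[] = [];
  next.forEach((msg, i) => {
    const before = prev[i];
    if (before === undefined) added.push(i);
    else if (fingerprint(before) !== fingerprint(msg)) changed.push(i);
  });
  return { changed, added, removedCount: Math.max(0, prev.length - next.length) };
}
```

A change at a low index (for example the system prompt at index 0) invalidates every cached prefix after it, while additions at the tail are the expected shape of a healthy tool loop, so the lowest changed index is usually the most informative number to surface.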

Dependencies:

  • opencode-interceptor enabled and capturing session
  • Future: optional MC auth-plugin hook for OpenAI websocket providers (out of scope for v1)

Acceptance:

  • doctor check-cache accepts --interceptor-dir and includes wire diff in output.
  • Dashboard shows "wire diff unavailable" with setup link when dumps missing.

Phase 5 — Automatic root-cause engine (unify classifiers)

Merge and extend:

  • log_parser.rs detect_bust_cause()
  • run-context-dump.ts classifyBust()
  • New rules from production cases:

| Cause ID | Detection signal |
|---|---|
| mc_system_prompt_hash | log: system prompt hash changed within Δt |
| mc_explicit_flush | ctx-flush / explicit_flush / rematerialized=true |
| mc_execute_heuristics | decision=execute + heuristics WILL RUN |
| mc_historian | historian write within 5m window (existing) |
| mc_storage_fatal | storage fatal (schema skew; annotate, not blame provider) |
| provider_cold_start | first event or cache_read=0 after long gap |
| provider_low_yield | high input, low cache.read, defer pass, no MC mutation |
| provider_routing_change | interceptor model/provider header changed (if exposed) |
| client_no_cache_hints | openai-compatible + no cache_control in dump |
| unknown | fallback |

Output: { cause_id, summary, evidence: [{source, snippet, timestamp}], confidence }.
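
One way to structure the unified engine is an ordered rule list where the first match wins, sketched below for a handful of the cause IDs above. The output shape follows the line above; the StepSignals fields, rule order, confidence values, and the 50k / 10% cut-offs for provider_low_yield (borrowed from Phase 3b) are all assumptions.

```typescript
type Confidence = "high" | "medium" | "low";

interface Evidence {
  source: string;
  snippet: string;
  timestamp: string;
}

interface RootCause {
  cause_id: string;
  summary: string;
  evidence: Evidence[];
  confidence: Confidence;
}

// Hypothetical per-step signals gathered from logs, context.db, and dumps.
interface StepSignals {
  systemPromptHashChanged: boolean;
  explicitFlush: boolean;
  isFirstEvent: boolean;
  decision: "defer" | "execute";
  inputTokens: number;
  cacheRead: number;
  evidence: Evidence[];
}

interface Rule {
  cause_id: string;
  summary: string;
  confidence: Confidence;
  matches(s: StepSignals): boolean;
}

// ASSUMPTION: order encodes priority; MC-side causes are checked first.
const RULES: Rule[] = [
  { cause_id: "mc_system_prompt_hash", summary: "System prompt hash changed", confidence: "high",
    matches: (s) => s.systemPromptHashChanged },
  { cause_id: "mc_explicit_flush", summary: "Explicit context flush", confidence: "high",
    matches: (s) => s.explicitFlush },
  { cause_id: "provider_cold_start", summary: "First event in session", confidence: "medium",
    matches: (s) => s.isFirstEvent },
  { cause_id: "provider_low_yield", summary: "High input, low cache.read on a defer pass", confidence: "low",
    matches: (s) => s.decision === "defer" && s.inputTokens > 50000 &&
      s.cacheRead / (s.inputTokens + s.cacheRead) < 0.1 },
];

function classify(signals: StepSignals): RootCause {
  const rule = RULES.find((r) => r.matches(signals));
  return {
    cause_id: rule ? rule.cause_id : "unknown",
    summary: rule ? rule.summary : "No rule matched",
    evidence: signals.evidence,
    confidence: rule ? rule.confidence : "low",
  };
}
```

Keeping the rules as data makes it straightforward to serialize them (or their fixtures) for reuse outside TypeScript, which relates to the shared-schema idea in the implementation notes.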

Acceptance:

  • Same engine in doctor, plugin persistence, and dashboard.
  • Unit tests with fixtures from anonymized real sessions (GLM streak, DeepSeek stable chain).

Related: editable User / Historian notes (separate scope)

Discord also asked to edit [User] notes and [Hist] note fields in Dashboard. Recommend separate issue to avoid scope creep, but shared UX pattern: inline edit → RPC → context.db update with audit timestamp.


Non-goals (v1)

  • MC becoming a MITM proxy for all providers (maintainer constraint)
  • Fixing OpenCode/provider caching behavior (document as client_no_cache_hints finding only)
  • Real-time websocket capture without auth plugin

Acceptance criteria (overall)

  • doctor check-cache <session_id> ships with bust table + JSON export
  • Dashboard cache step shows classification, deltas, and MC evidence
  • Session cache report exportable from UI
  • Interceptor dumps linked when present; clear setup path when absent
  • GLM-style "chronic low hit ratio" distinguishable from MC flush busts
  • Docs: "Diagnosing cache busts" covering Anthropic dumps, interceptor, limitations

Implementation notes

  • Reuse buildDumpStats() / classifyBust() — avoid duplicating logic in Rust; consider shared WASM or JSON schema generated from TS.
  • Dashboard already lazy-loads cache events (cacheActivated); extend get_session_cache_events to join diagnostics table.
  • Respect PER_SESSION / TOTAL_CAP limits in CacheDiagnostics.tsx when adding fields.

References

  • packages/plugin/scripts/context-dump/ — metrics, classifyBust, runContextDump
  • packages/dashboard/src-tauri/src/log_parser.rs — severity + detect_bust_cause
  • packages/dashboard/src/components/CacheDiagnostics/CacheDiagnostics.tsx
  • opencode-interceptor src/intercept/dump.ts, per-session capture dirs
  • Community reproduction: GLM 5.1 / OneRouter, hit_ratio ~2–3% on tool-loop steps vs ~99% on tiny-input continuations
