ms: storage health + query latency + maintenance ops in Settings #451
Merged
Adds a self-contained, dependency-free observability layer over the
local SQLite metrics store, surfaced in the SPA's Settings page so
operators can see at a glance whether the store is healthy and act on
it without shelling into the box.
Backend (internal/cmgr/ms):
- stats.go: opStats{count,total,max,last} per op + Stats.Snapshot. The
shared `track` helper instruments every public method via one-line
defer; new ops register in Stats.all and the SPA picks them up.
- health.go: DBHealth aggregates file size, page/freelist counts, row
counts, last rule write, and the stats snapshot. Maintenance ops
(CleanupOlderThan, Vacuum, Truncate, ResetStats) return a unified
MaintenanceResult with duration + before/after byte counts.
- ms.go: nodeRows/ruleRows atomic.Int64 caches keep Health() off the
COUNT(*) hot path; reconciled via recountRows on startup, Vacuum,
and Truncate. INSERT OR REPLACE may overcount briefly — bounded.
- Truncate requires confirm == "yes I am sure" exactly; a defaulted
JSON field cannot wipe data.
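The one-line-defer instrumentation described above can be sketched roughly as follows. This is a minimal illustration, not the code in stats.go: only the names opStats, track, Stats and Snapshot come from the commit message. The mutex, the all() method shape, the OpSnapshot type, and the store/AddNode stand-ins are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// opStats holds the four counters named in the commit message.
// Guarding them with a mutex is an assumption of this sketch.
type opStats struct {
	mu    sync.Mutex
	count int64
	total time.Duration
	max   time.Duration
	last  time.Duration
}

// track starts a timer and returns the func that records the elapsed time.
// Intended call shape: defer track(&ms.stats.addNode)()
func track(s *opStats) func() {
	start := time.Now()
	return func() {
		d := time.Since(start)
		s.mu.Lock()
		defer s.mu.Unlock()
		s.count++
		s.total += d
		s.last = d
		if d > s.max {
			s.max = d
		}
	}
}

// Stats groups the per-op counters (field names are hypothetical).
type Stats struct {
	addNode   opStats
	queryNode opStats
}

type namedOp struct {
	name string
	op   *opStats
}

// all is the single registry of ops; a new op only needs one entry here.
func (st *Stats) all() []namedOp {
	return []namedOp{
		{name: "add_node", op: &st.addNode},
		{name: "query_node", op: &st.queryNode},
	}
}

// OpSnapshot is a hypothetical read-only copy of one op's counters.
type OpSnapshot struct {
	Name             string
	Count            int64
	Total, Max, Last time.Duration
}

// Snapshot copies every registered op's counters under its lock.
func (st *Stats) Snapshot() []OpSnapshot {
	ops := st.all()
	out := make([]OpSnapshot, 0, len(ops))
	for _, o := range ops {
		o.op.mu.Lock()
		out = append(out, OpSnapshot{Name: o.name, Count: o.op.count,
			Total: o.op.total, Max: o.op.max, Last: o.op.last})
		o.op.mu.Unlock()
	}
	return out
}

// store stands in for the real metrics store.
type store struct{ stats Stats }

func (ms *store) AddNode() {
	defer track(&ms.stats.addNode)()
	time.Sleep(time.Millisecond) // stand-in for the real INSERT
}

func main() {
	ms := &store{}
	ms.AddNode()
	ms.AddNode()
	for _, s := range ms.stats.Snapshot() {
		fmt.Printf("%-10s count=%d max=%s\n", s.Name, s.Count, s.Max)
	}
}
```

Because track returns a closure, the timer starts when the defer statement is evaluated and stops when the method returns, so the defer line belongs at the top of each instrumented method.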
cmgr: Cmgr interface gains DBHealth/DBCleanup/DBVacuum/DBTruncate/
DBResetStats, returning ErrMetricsDisabled when the store is not
configured.
Web (internal/web):
- 5 new routes under /api/v1/db/* (auth-gated via the existing api
group middleware). dbMaintenanceErr maps cmgr/ms domain errors onto
HTTP status uniformly.
- Settings.tsx grows Storage / Query Latency / Maintenance cards;
Truncate uses prompt() with literal-match confirm; every op funnels
through one runMaint helper for consistent loading/toast state.
Tests: round-trip, cleanup row-affected, truncate confirm strictness,
reset-stats — all green under -race.
Density / IA pass on /settings, plus the copy-button bug surfaced on
LAN (plain-HTTP) deployments where navigator.clipboard is undefined.
Layout:
- Drop the standalone "theme" card: broken and already covered by the
sidebar toggle, no need for a duplicate.
- Drop the "api surface" card: a hardcoded endpoint enumeration with
no operator value; OpenAPI is the right home if we ever want it.
- Fold the "reload configuration" card into the runtime-configuration
card's right slot. One button + one paragraph no longer steals a
whole grid cell; the reload status pill renders inline.
- Group storage / query-latency / maintenance under a "database"
section title; group the updates panel under "updates". Adds the
vertical hierarchy that 11 sibling cards in a 2-col grid lacked.
- Maintenance card switches from md:grid-cols-3 to flex-wrap so the
three actions hug the start instead of floating in unequal cells;
status pill moves to the card-header right slot.
- Latency table drops the redundant "last" column and pins column
widths via table-fixed + colgroup so the right edge no longer
overflows the card on lg viewports.
UpdatesPanel:
- Collapse "current build" + "check for updates" into a single card.
Build DescList sits in the body; channel selector + Check button +
nightly/stable pill move into the card-header right slot.
- Drop the in-panel h2 "updates" header: the new SectionTitle in
Settings owns that label, and rendering it twice was redundant.
- "update progress" stays as a separate conditional card.
Copy bug:
- New util/clipboard.ts wraps navigator.clipboard with the legacy
document.execCommand fallback. Plain-HTTP origins (e.g. ehco on a
LAN IP) are not secure contexts, so navigator.clipboard is undefined
and the previous catch-and-ignore meant the button appeared dead.
The fallback works in non-secure contexts and is the canonical
pattern for this scenario.
- Settings calls copyText() instead of doing its own try/catch; the
helper is generic enough for any future copy affordance.
Summary
Adds a self-contained, dependency-free observability layer over the local SQLite metrics store, embedded directly in ehco's own admin SPA. No external Prometheus / OTel collector — ehco does it all itself.
Three new cards in Settings:
- Storage: […] rule_metrics table inline so a broken sync pipeline is visible.
- Query latency: […] (add_node, add_rule, query_node, query_rule, cleanup, vacuum, truncate), since process start. Reset button.
Design notes
- The track(*opStats) func() helper is the only instrumentation shape: every public ms method carries a single defer track(&ms.stats.X)() line.
- Row counts are atomic.Int64-cached. This avoids a SELECT COUNT(*) on every Settings refresh; the cache is reconciled by recountRows() after Vacuum/Truncate and at startup. INSERT OR REPLACE can briefly overcount on duplicate PKs; the drift is bounded, documented, and self-healing.
- Truncate requires confirm == "yes I am sure" exactly (a string, not a bool, so a defaulted JSON field can never wipe data).
- Routes are auth-gated by the existing /api/v1 echo group middleware; confirm strings are a second line of defence, not the first.
API surface
When the underlying store is disabled (no upstream sync URL), every endpoint returns 503 via cmgr.ErrMetricsDisabled.
Test plan
- go test -race ./internal/cmgr/ms/... (round-trip, cleanup, truncate strictness, reset; all PASS)
- make lint clean
- make test full suite green
- make ui builds (Vite OK, +~3KB gzipped)
- rule_metrics=0 warning shows on the boxes that surfaced in the original investigation
- add_node p~max stays sub-ms on real traffic
Why
Direct outcome of investigating PR #443's claim that SQLite "can't hold up". Probing a real node showed a 2.5MB db with all queries <1ms, but rule_metrics empty (a separate sync-pipeline bug). Decision: stay on SQLite long-term; build observability so future "can't hold up" judgements are data-driven, and surface the empty-table case loudly so the underlying bug doesn't hide.
🤖 Generated with Claude Code