feat(gastown): add real-time throughput gauges and timeseries charts #1432
…d gas tank

Add metrics collection infrastructure to TownDO with periodic snapshots, token/cost/LOC reporting from the container, and a full observability UI with live gauges and zoomable timeseries charts.

Closes #1297
// ── Town Metrics ────────────────────────────────────────────────────────

app.post('/api/towns/:townId/usage', c =>
WARNING: These metrics ingestion routes are behind the wrong auth middleware
/api/towns/:townId/* is already wrapped in kiloAuthMiddleware + townAuthMiddleware, but the container calls /usage and /loc with a container-scoped JWT. kiloAuthMiddleware only accepts NEXTAUTH user tokens, so both POSTs will 401 and the throughput gauges/timeseries will stay empty unless these routes are moved under authMiddleware (or otherwise excluded from the user-auth middleware).
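The split the comment asks for can be pictured as choosing an auth scheme per route. The sketch below is a minimal, framework-free illustration rather than the PR's code: `AuthKind`, `CONTAINER_ROUTES`, and `authFor` are hypothetical names, and only the two route paths are taken from the comment above.

```typescript
// Hypothetical sketch: decide which auth scheme should guard a request.
// Only the /usage and /loc paths come from the review comment.
type AuthKind = 'container-jwt' | 'user-session';

// Routes the container calls with its container-scoped JWT.
const CONTAINER_ROUTES: RegExp[] = [
  /^\/api\/towns\/[^/]+\/usage$/,
  /^\/api\/towns\/[^/]+\/loc$/,
];

function authFor(method: string, path: string): AuthKind {
  const isContainerRoute =
    method.toUpperCase() === 'POST' &&
    CONTAINER_ROUTES.some(re => re.test(path));
  // Everything else under /api/towns/:townId/* keeps the user-auth chain.
  return isContainerRoute ? 'container-jwt' : 'user-session';
}
```

In a real router the same effect would more likely come from registering the two ingestion routes before, or outside, the wildcard user-auth middleware; the function form here only makes the intended mapping explicit.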
private initPromise: Promise<void> | null = null;
private _ownerUserId: string | undefined;
private usageAccumulator = metrics.createUsageAccumulator();
private lastSnapshotAt: string | null = null;
WARNING: This metrics state is lost on every Durable Object restart
usageAccumulator and lastSnapshotAt live only in memory. When the DO is evicted or redeployed between alarm ticks, the next snapshot falls back to 1970-01-01 for its delta queries, so events_since_last/beads_*_since_last jump to all-time totals, and any token/cost usage reported since the previous tick disappears. Persisting the accumulator/baseline (or deriving the baseline from the latest snapshot row) avoids those restart-induced spikes and gaps.
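The second option mentioned above, deriving the baseline from the latest snapshot row, could look roughly like the following. This is a sketch under stated assumptions: `SnapshotRow`, its `captured_at` column, and `resolveBaseline` are hypothetical names, and the actual `metrics_snapshots` schema may differ.

```typescript
// Hypothetical row shape; the real metrics_snapshots columns may differ.
interface SnapshotRow {
  captured_at: string; // ISO-8601 timestamp of the snapshot
}

const EPOCH = '1970-01-01T00:00:00.000Z';

// Choose the "since" timestamp for the next snapshot's delta queries.
// Prefer the in-memory value, then the newest persisted row, then the epoch.
function resolveBaseline(
  inMemoryLastSnapshotAt: string | null,
  latestRow: SnapshotRow | undefined,
): string {
  if (inMemoryLastSnapshotAt !== null) return inMemoryLastSnapshotAt;
  if (latestRow !== undefined) return latestRow.captured_at;
  // No snapshot has ever been written, so all-time totals are the right delta.
  return EPOCH;
}
```

After an eviction the in-memory value is `null`, so the baseline comes from storage and the epoch fallback is only reached for a brand-new town. This covers the delta spike only; token and cost usage accumulated in memory would still need to be persisted separately.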
}

// Report aggregate LOC to TownDO
if (townId && apiUrl && authToken && (totalAdditions > 0 || totalDeletions > 0)) {
WARNING: The poller never clears LOC back to zero
This POST is skipped whenever totalAdditions and totalDeletions both become 0. Because setLoc() keeps the last reported snapshot until another /loc call arrives, the town can keep showing stale non-zero line counts after a branch is rebased/merged cleanly or after all agents go idle. The poller needs to report the 0/0 state as well so the gauges can reset.
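One possible shape for the fix is to gate the POST on a change in totals rather than on the totals being non-zero. The sketch below assumes the poller can remember what it last sent; `LocTotals` and `shouldReportLoc` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical sketch: report LOC whenever it changes, including a drop to 0/0.
interface LocTotals {
  additions: number;
  deletions: number;
}

function shouldReportLoc(
  current: LocTotals,
  lastReported: LocTotals | null,
): boolean {
  // First poll after start-up: always report so the server-side value is
  // established, even when it is 0/0.
  if (lastReported === null) return true;
  return (
    current.additions !== lastReported.additions ||
    current.deletions !== lastReported.deletions
  );
}
```

Compared with posting unconditionally every 30 seconds, change-based reporting keeps idle towns quiet while still letting the gauges reset when the totals fall back to zero.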
Code Review Summary
Status: 3 Issues Found | Recommendation: Address before merge
Other Observations (not in diff): None.
Files Reviewed: 16 files
Reviewed by gpt-5.4-20260305 · 1,348,890 tokens
Due to the monorepo restructure, you will need to recreate this PR on a new branch from main. Pass the prompt found at https://github.com/Kilo-Org/cloud/blob/main/plans/monorepo-migration-prompt.md to your coding agent while running in this branch. Please close this PR once done, or if you don't plan to proceed with it.
Summary
- metrics_snapshots SQLite table stores periodic snapshots (agent counts, bead counts, event deltas, token/cost/LOC data) every alarm tick with 7-day retention
- message.completed/assistant.completed events extract token usage and POST to /api/towns/:townId/usage; a 30s diffSummary poller (modeled after kilo's GitStatsPoller) reports LOC diffs to /api/towns/:townId/loc
- getMetricsTimeseries tRPC procedure with bucketed aggregation
- GET /api/gastown/balance Next.js route for gas tank balance (personal + org-scoped)

Closes #1297
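For readers unfamiliar with bucketed aggregation, the idea behind the timeseries procedure can be sketched as follows. This is an illustration only: `Point` and `bucketAverage` are hypothetical names, and averaging is an assumed aggregate; the PR does not say which aggregate `getMetricsTimeseries` uses, and delta-style columns may be better summed.

```typescript
// Hypothetical sketch of fixed-width time bucketing.
interface Point {
  t: number; // epoch milliseconds
  value: number;
}

// Group points into fixed-width buckets and average the values in each bucket.
function bucketAverage(points: Point[], bucketMs: number): Point[] {
  if (bucketMs <= 0) throw new RangeError('bucketMs must be positive');
  const sums = new Map<number, { total: number; count: number }>();
  for (const p of points) {
    // Align each timestamp to the start of its bucket.
    const start = Math.floor(p.t / bucketMs) * bucketMs;
    const acc = sums.get(start) ?? { total: 0, count: 0 };
    acc.total += p.value;
    acc.count += 1;
    sums.set(start, acc);
  }
  // Emit one point per non-empty bucket, ordered by bucket start time.
  return Array.from(sums.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([t, acc]) => ({ t, value: acc.total / acc.count }));
}
```

Empty buckets are simply absent from the result, so a chart that needs a continuous axis would have to fill the gaps itself.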
Verification
- pnpm typecheck passes with no new errors (pre-existing errors in packages/db, src/lib/bot/run.ts, secret-ui-adapter unchanged)
- AlarmStatus type occurrences updated in router.d.ts (flat router, wrapped router, admin flat, admin wrapped)
- metrics_snapshots schema includes all 16 columns with correct SQLite types
- reportTokenUsage and pollLocStats auth patterns match existing broadcastEvent pattern
Visual Changes
N/A (no screenshots available — this adds new UI sections to the gastown town overview and observability pages)
Reviewer Notes
- The metrics_snapshots table is auto-created via initMetricsTables() in ensureInitialized(). Existing towns will get the table on the next alarm tick, so no migration is needed.
- Token usage extraction depends on the message.completed event properties shape. The handler normalizes multiple formats (usage.inputTokens, input_tokens, nested usage object), but the exact payload shape should be verified against the live SDK.
- The LOC poller uses the worktree.diffSummary() API (same as kilo's GitStatsPoller). The poller is gated on agent.defaultBranch being set, so lightweight/triage agents without git worktrees are skipped.
- The balance route (/api/gastown/balance) uses the same getUserFromAuth pattern as /api/gastown/token, so it adds no new auth surface.
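The payload normalization described in the notes above might be structured along these lines. This is a hedged sketch, not the PR's handler: `TokenUsage`, `pickNumber`, and `normalizeUsage` are hypothetical names, and the `outputTokens`/`output_tokens` keys are assumed by symmetry with the input keys that the notes mention.

```typescript
// Hypothetical sketch of tolerant token-usage extraction.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

type Dict = Record<string, unknown>;

function isDict(v: unknown): v is Dict {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

// Return the first finite, non-negative number found under the given keys.
function pickNumber(obj: Dict, keys: string[]): number | undefined {
  for (const key of keys) {
    const v = obj[key];
    if (typeof v === 'number' && Number.isFinite(v) && v >= 0) return v;
  }
  return undefined;
}

function normalizeUsage(properties: unknown): TokenUsage | null {
  if (!isDict(properties)) return null;
  // Look in a nested `usage` object first, then at the top level.
  const nested = properties['usage'];
  const sources: Dict[] = isDict(nested) ? [nested, properties] : [properties];
  for (const src of sources) {
    const input = pickNumber(src, ['inputTokens', 'input_tokens']);
    // Output key names are an assumption; the notes only name input keys.
    const output = pickNumber(src, ['outputTokens', 'output_tokens']);
    if (input !== undefined || output !== undefined) {
      return { inputTokens: input ?? 0, outputTokens: output ?? 0 };
    }
  }
  return null; // nothing recognizable: the caller should skip reporting
}
```

Returning `null` for unrecognized payloads, rather than zeros, lets the caller tell "no usage data" apart from "zero tokens used", which matters when the exact SDK shape is still unverified.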