
feat(gastown): add real-time throughput gauges and timeseries charts #1432

Open
jrf0110 wants to merge 1 commit into main from 1297-realtime-gauges

Conversation


@jrf0110 jrf0110 commented Mar 23, 2026

Summary

  • Add metrics collection infrastructure to TownDO: a metrics_snapshots SQLite table stores periodic snapshots (agent counts, bead counts, event deltas, token/cost/LOC data) every alarm tick with 7-day retention
  • Wire container → TownDO reporting: message.completed/assistant.completed events extract token usage and POST to /api/towns/:townId/usage; a 30s diffSummary poller (modeled after kilo's GitStatsPoller) reports LOC diffs to /api/towns/:townId/loc
  • Add live throughput gauges to the town overview: radial SVG gauges for tokens/sec, cost/sec, lines changed, active agents, plus a gas tank fuel gauge with color transitions and time-remaining estimate
  • Add zoomable timeseries charts (1h/6h/24h/7d) to the observability tab: token throughput, cost over time, agent utilization, bead velocity, and lines of code — all powered by a new getMetricsTimeseries tRPC procedure with bucketed aggregation
  • Add GET /api/gastown/balance Next.js route for gas tank balance (personal + org-scoped)
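
A minimal sketch of what bucketed aggregation over snapshots could look like, assuming counter-type fields (per-tick deltas) are summed and gauge-type fields (absolute readings) are averaged within each bucket. The `Snapshot` and `Bucket` shapes and the `bucketSnapshots` name are hypothetical and are not taken from the PR's `getMetricsTimeseries` implementation.

```typescript
// Hypothetical snapshot shape: field names are assumptions, not the PR's actual columns.
interface Snapshot {
  takenAtMs: number;    // snapshot time, epoch milliseconds
  tokensDelta: number;  // counter-type: tokens reported since the previous snapshot
  locAdditions: number; // gauge-type: absolute additions vs. the base branch
}

interface Bucket {
  bucketStartMs: number;
  tokens: number;       // SUM of deltas within the bucket (assumed)
  locAdditions: number; // AVG of gauge readings within the bucket
}

function bucketSnapshots(snapshots: Snapshot[], bucketMs: number): Bucket[] {
  // Group snapshots by the start of the bucket they fall into.
  const groups = new Map<number, Snapshot[]>();
  for (const snapshot of snapshots) {
    const start = Math.floor(snapshot.takenAtMs / bucketMs) * bucketMs;
    const rows = groups.get(start) ?? [];
    rows.push(snapshot);
    groups.set(start, rows);
  }
  // Emit buckets in chronological order.
  return Array.from(groups.entries())
    .sort(([a], [b]) => a - b)
    .map(([bucketStartMs, rows]) => ({
      bucketStartMs,
      tokens: rows.reduce((sum, row) => sum + row.tokensDelta, 0),
      locAdditions:
        rows.reduce((sum, row) => sum + row.locAdditions, 0) / rows.length,
    }));
}
```

A wider zoom range (for example 7d) would pass a larger `bucketMs` so the chart receives a bounded number of points.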

Closes #1297

Verification

  • pnpm typecheck — passes with no new errors (pre-existing errors in packages/db, src/lib/bot/run.ts, secret-ui-adapter unchanged)
  • Verified all 4 AlarmStatus type occurrences updated in router.d.ts (flat router, wrapped router, admin flat, admin wrapped)
  • Verified metrics_snapshots schema includes all 16 columns with correct SQLite types
  • Manually reviewed that the auth patterns in the container's reportTokenUsage and pollLocStats match the existing broadcastEvent pattern

Visual Changes

N/A (no screenshots available — this adds new UI sections to the gastown town overview and observability pages)

Reviewer Notes

  • The metrics_snapshots table is auto-created via initMetricsTables() in ensureInitialized(). Existing towns will get the table on next alarm tick — no migration needed.
  • Token usage reporting depends on the SDK's message.completed event properties shape. The handler normalizes multiple formats (usage.inputTokens, input_tokens, nested usage object) but the exact payload shape should be verified against the live SDK.
  • LOC stats use the SDK's worktree.diffSummary() API (same as kilo's GitStatsPoller). The poller is gated on agent.defaultBranch being set — lightweight/triage agents without git worktrees are skipped.
  • The gas tank balance route (/api/gastown/balance) uses the same getUserFromAuth pattern as /api/gastown/token — no new auth surface.
  • LOC values in snapshots are absolute (total additions/deletions vs base branch), not deltas. The timeseries uses AVG aggregation for these gauge-type values.
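
The payload normalization mentioned in the notes above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's handler: only `usage.inputTokens` and `input_tokens` are named in the description, so the output-token keys, the `TokenUsage` shape, and the `normalizeUsage` name are hypothetical.

```typescript
// Hypothetical normalized shape; keys beyond inputTokens/input_tokens are assumptions.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Return the first finite number found under any of the candidate keys.
function pickNumber(
  source: Record<string, unknown>,
  keys: string[],
): number | undefined {
  for (const key of keys) {
    const value = source[key];
    if (typeof value === "number" && Number.isFinite(value)) return value;
  }
  return undefined;
}

function normalizeUsage(properties: unknown): TokenUsage | null {
  if (!isRecord(properties)) return null;
  // Prefer a nested `usage` object when present, otherwise read the top level.
  const nested = properties["usage"];
  const source = isRecord(nested) ? nested : properties;
  const inputTokens = pickNumber(source, ["inputTokens", "input_tokens"]);
  const outputTokens = pickNumber(source, ["outputTokens", "output_tokens"]);
  // No recognizable token fields: signal "nothing to report".
  if (inputTokens === undefined && outputTokens === undefined) return null;
  return { inputTokens: inputTokens ?? 0, outputTokens: outputTokens ?? 0 };
}
```

Returning `null` for unrecognized payloads lets the caller skip the POST instead of reporting zeros, which keeps an SDK shape change visible as missing data rather than as a silent drop to zero usage.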

…d gas tank

Add metrics collection infrastructure to TownDO with periodic snapshots,
token/cost/LOC reporting from the container, and a full observability UI
with live gauges and zoomable timeseries charts.

Closes #1297

// ── Town Metrics ────────────────────────────────────────────────────────

app.post('/api/towns/:townId/usage', c =>

WARNING: These metrics ingestion routes are behind the wrong auth middleware

/api/towns/:townId/* is already wrapped in kiloAuthMiddleware + townAuthMiddleware, but the container calls /usage and /loc with a container-scoped JWT. kiloAuthMiddleware only accepts NEXTAUTH user tokens, so both POSTs will 401 and the throughput gauges/timeseries will stay empty unless these routes are moved under authMiddleware (or otherwise excluded from the user-auth middleware).
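
To illustrate the ordering problem without depending on the worker's actual router, here is a framework-free sketch in which the first matching rule decides which auth scheme applies. The `AuthKind` labels, the rule table, and `authFor` are hypothetical; the real fix would be made in the middleware registration in gastown.worker.ts.

```typescript
// Hypothetical labels for the two auth schemes discussed in the comment above.
type AuthKind = "container-jwt" | "user-session";

interface RouteRule {
  method: string; // HTTP method, or "*" for any
  pattern: RegExp;
  auth: AuthKind;
}

// Order matters: the container-facing ingestion routes are listed before the
// catch-all, so they are not swallowed by the user-session rule.
const rules: RouteRule[] = [
  {
    method: "POST",
    pattern: /^\/api\/towns\/[^/]+\/(usage|loc)$/,
    auth: "container-jwt",
  },
  { method: "*", pattern: /^\/api\/towns\/[^/]+\/.*$/, auth: "user-session" },
];

// Return the auth scheme of the first rule that matches, or null if none does.
function authFor(method: string, path: string): AuthKind | null {
  for (const rule of rules) {
    const methodMatches = rule.method === "*" || rule.method === method;
    if (methodMatches && rule.pattern.test(path)) return rule.auth;
  }
  return null;
}
```

With only the catch-all rule present, a POST to `/usage` would resolve to `user-session`, which is the situation the comment describes.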

private initPromise: Promise<void> | null = null;
private _ownerUserId: string | undefined;
private usageAccumulator = metrics.createUsageAccumulator();
private lastSnapshotAt: string | null = null;

WARNING: This metrics state is lost on every Durable Object restart

usageAccumulator and lastSnapshotAt live only in memory. When the DO is evicted or redeployed between alarm ticks, the next snapshot falls back to 1970-01-01 for its delta queries, so events_since_last/beads_*_since_last jump to all-time totals, and any token/cost usage reported since the previous tick disappears. Persisting the accumulator/baseline (or deriving the baseline from the latest snapshot row) avoids those restart-induced spikes and gaps.
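
A rough sketch of both suggestions, assuming a key-value store that survives restarts. A plain `Map` stands in for durable storage here; a real Durable Object would use its asynchronous storage API instead. `PersistedUsageAccumulator`, `resolveBaseline`, and the `pendingTokens` key are hypothetical names.

```typescript
// Stand-in for durable storage (assumption: a real DO would use async storage).
type DurableStore = Map<string, number>;

class PersistedUsageAccumulator {
  private readonly store: DurableStore;

  constructor(store: DurableStore) {
    this.store = store;
  }

  // Write through on every report so a restart cannot drop pending usage.
  add(tokens: number): void {
    this.store.set("pendingTokens", this.pending() + tokens);
  }

  pending(): number {
    return this.store.get("pendingTokens") ?? 0;
  }

  // Return the pending total and reset it, as a snapshot tick would.
  drain(): number {
    const total = this.pending();
    this.store.delete("pendingTokens");
    return total;
  }
}

// Pick the delta baseline: the in-memory value if present, else the newest
// snapshot row's timestamp, else the epoch (only for towns with no snapshots).
function resolveBaseline(
  inMemory: string | null,
  latestSnapshotAt: string | null,
): string {
  return inMemory ?? latestSnapshotAt ?? "1970-01-01T00:00:00.000Z";
}
```

Deriving the baseline from the latest snapshot row needs no extra persisted state, since the row already exists in metrics_snapshots.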

}

// Report aggregate LOC to TownDO
if (townId && apiUrl && authToken && (totalAdditions > 0 || totalDeletions > 0)) {

WARNING: The poller never clears LOC back to zero

This POST is skipped whenever totalAdditions and totalDeletions both become 0. Because setLoc() keeps the last reported snapshot until another /loc call arrives, the town can keep showing stale non-zero line counts after a branch is rebased/merged cleanly or after all agents go idle. The poller needs to report the 0/0 state as well so the gauges can reset.
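
The simplest fix is to drop the `> 0` condition and POST on every poll. If redundant POSTs are a concern, one option is change detection against the last successfully reported totals, sketched below. `LocTotals` and `shouldReportLoc` are hypothetical names, not part of the PR.

```typescript
interface LocTotals {
  additions: number;
  deletions: number;
}

// Report on the first poll and whenever the totals differ from the last
// successfully reported values. A transition to 0/0 counts as a change, so the
// gauges can reset.
function shouldReportLoc(
  lastReported: LocTotals | null,
  current: LocTotals,
): boolean {
  if (lastReported === null) return true;
  return (
    lastReported.additions !== current.additions ||
    lastReported.deletions !== current.deletions
  );
}
```

The caller would update `lastReported` only after the POST succeeds, so a failed request is retried on the next poll.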


kilo-code-bot bot commented Mar 23, 2026

Code Review Summary

Status: 3 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
|---|---|
| CRITICAL | 0 |
| WARNING | 3 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
|---|---|---|
| cloudflare-gastown/src/gastown.worker.ts | 563 | /usage and /loc are protected by kiloAuthMiddleware, so container JWT posts will 401 |
| cloudflare-gastown/src/dos/Town.do.ts | 231 | Metrics accumulator and snapshot baseline live only in memory, so DO restarts create spikes/gaps |
| cloudflare-gastown/container/src/process-manager.ts | 246 | LOC polling suppresses 0/0 updates, leaving stale non-zero line counts in the gauges |

Other Observations (not in diff)

None.

Files Reviewed (16 files)
  • cloudflare-gastown/container/src/process-manager.ts - 1 issue
  • cloudflare-gastown/container/src/types.ts - 0 issues
  • cloudflare-gastown/src/db/tables/metrics-snapshots.table.ts - 0 issues
  • cloudflare-gastown/src/dos/Town.do.ts - 1 issue
  • cloudflare-gastown/src/dos/town/metrics.ts - 0 issues
  • cloudflare-gastown/src/gastown.worker.ts - 1 issue
  • cloudflare-gastown/src/handlers/town-metrics.handler.ts - 0 issues
  • cloudflare-gastown/src/trpc/router.ts - 0 issues
  • cloudflare-gastown/src/trpc/schemas.ts - 0 issues
  • src/app/(app)/gastown/[townId]/TownOverviewPageClient.tsx - 0 issues
  • src/app/(app)/gastown/[townId]/observability/ObservabilityPageClient.tsx - 0 issues
  • src/app/api/gastown/balance/route.ts - 0 issues
  • src/components/gastown/TerminalBar.tsx - 0 issues
  • src/components/gastown/ThroughputGauges.tsx - 0 issues
  • src/lib/gastown/types/router.d.ts - 0 issues
  • src/lib/gastown/types/schemas.d.ts - 0 issues



Reviewed by gpt-5.4-20260305 · 1,348,890 tokens

@jeanduplessis

Due to the monorepo restructure, you will need to recreate this PR on a new branch from main. Pass the prompt found at https://github.com/Kilo-Org/cloud/blob/main/plans/monorepo-migration-prompt.md to your coding agent while running in this branch. Please close this PR once done, or if you don't plan to proceed with it.
