feat(gastown): add manual container token refresh in town settings #1102

Merged
jrf0110 merged 2 commits into main from 1101-token-refresh
Mar 15, 2026

jrf0110 commented Mar 15, 2026

Summary

  • Add a "Refresh Token" button in the town settings Container section that lets operators manually force-refresh the container JWT, bypassing the 1-hour alarm throttle
  • New forceRefreshContainerToken() public RPC method on TownDO and refreshContainerToken tRPC mutation with ownership verification
  • Useful when agents hit auth failures and the alarm hasn't fired yet, or after config changes that need immediate propagation
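
The ownership check in front of the force refresh could look roughly like the sketch below. It is not the PR's tRPC code: `Deps`, `TownRecord`, `getTown`, `getTownStub`, and `ForbiddenError` are hypothetical names used only for illustration, and the procedure is written as a plain function so the example stays self-contained.

```typescript
// Hypothetical shapes. Only forceRefreshContainerToken and the
// refreshContainerToken mutation name come from the PR description.
type TownRecord = { id: string; ownerUserId: string };

interface TownStub {
  forceRefreshContainerToken(): Promise<void>;
}

interface Deps {
  getTown(townId: string): Promise<TownRecord | undefined>;
  getTownStub(townId: string): TownStub;
}

class ForbiddenError extends Error {}

async function refreshContainerToken(
  deps: Deps,
  userId: string,
  townId: string,
): Promise<void> {
  const town = await deps.getTown(townId);
  // Treat "missing" and "not yours" identically so town IDs are not leaked.
  if (!town || town.ownerUserId !== userId) {
    throw new ForbiddenError("Town not found or not owned by caller");
  }
  // Only reached for the owner: delegate to the town's RPC method.
  await deps.getTownStub(townId).forceRefreshContainerToken();
}
```

Checking ownership before resolving the stub means a non-owner never triggers any work on the town object.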

Closes #1101

Verification

  • pnpm typecheck — all 28 workspace projects pass cleanly

Visual Changes

New "Container" section at the bottom of the town settings page with a "Refresh Token" button (spinning icon while pending, success/error toasts).

Reviewer Notes

  • The existing refreshContainerToken in Town.do.ts is private and throttled to once per hour. The new forceRefreshContainerToken bypasses the throttle but still updates lastContainerTokenRefreshAt, so the next alarm-driven refresh doesn't fire unnecessarily.
  • The router.d.ts type bridge file was updated manually to include the new procedure in both the inner gastownRouter and outer wrappedGastownRouter types.
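
A minimal sketch of the throttle interaction described in the first note, assuming the two paths share one refresh routine. The class name, `mintToken`, `now`, and `currentToken` are hypothetical; in the PR the throttled method is private, but it is left public here so the behavior can be exercised directly.

```typescript
// Illustrative only: not the actual Town.do.ts implementation.
const REFRESH_INTERVAL_MS = 60 * 60 * 1000; // the 1-hour alarm throttle

class ContainerTokenRefresher {
  lastContainerTokenRefreshAt = 0;
  currentToken: string | null = null;
  private readonly mintToken: () => Promise<string>;
  private readonly now: () => number;

  constructor(mintToken: () => Promise<string>, now: () => number = Date.now) {
    this.mintToken = mintToken;
    this.now = now;
  }

  // Alarm-driven path: a no-op if a refresh happened within the last hour.
  async refreshContainerToken(): Promise<boolean> {
    if (this.now() - this.lastContainerTokenRefreshAt < REFRESH_INTERVAL_MS) {
      return false;
    }
    await this.doRefresh();
    return true;
  }

  // Manual path: skips the throttle check but still records the timestamp,
  // so the next alarm sees a recent refresh and does nothing.
  async forceRefreshContainerToken(): Promise<void> {
    await this.doRefresh();
  }

  private async doRefresh(): Promise<void> {
    this.currentToken = await this.mintToken();
    this.lastContainerTokenRefreshAt = this.now();
  }
}
```

Injecting the clock keeps the throttle testable without waiting an hour.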


kilo-code-bot (bot) commented Mar 15, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity | Count
CRITICAL | 0
WARNING | 2
SUGGESTION | 0

Issue Details

WARNING

File | Line | Issue
cloudflare-gastown/gastown-grafana-dash-1.json | 1109 | Removing the row cap makes the "Top Error Messages" panel unbounded and risks slow/high-cardinality queries.
cloudflare-gastown/gastown-grafana-dash-1.json | 1820 | Removing the row cap makes the per-user analytics panel unbounded and risks slow dashboard loads on busy towns.

Other Observations (not in diff)

N/A

Files Reviewed (5 files)
  • cloudflare-gastown/gastown-grafana-dash-1.json - 2 issues
  • cloudflare-gastown/src/dos/Town.do.ts - 0 issues
  • cloudflare-gastown/src/trpc/router.ts - 0 issues
  • src/app/(app)/gastown/[townId]/settings/TownSettingsPageClient.tsx - 0 issues
  • src/lib/gastown/types/router.d.ts - 0 issues

Reviewed by gpt-5.4-20260305 · 823,449 tokens

The $timeFilter macro already bounds the result set sufficiently.
Hardcoded LIMITs were silently truncating data in 9 panels.
  "nullifySparse": false,
- "query": "SELECT blob5 AS error_message, blob1 AS event, SUM(_sample_interval) AS count FROM gastown_events WHERE $timeFilter AND blob5 != '' GROUP BY error_message, event ORDER BY count DESC LIMIT 30",
- "rawSql": "SELECT blob5 AS error_message, blob1 AS event, SUM(_sample_interval) AS count FROM gastown_events WHERE $timeFilter AND blob5 != '' GROUP BY error_message, event ORDER BY count DESC LIMIT 30",
+ "query": "SELECT blob5 AS error_message, blob1 AS event, SUM(_sample_interval) AS count FROM gastown_events WHERE $timeFilter AND blob5 != '' GROUP BY error_message, event ORDER BY count DESC",

WARNING: Keep a row cap on this high-cardinality panel

Removing the LIMIT turns the "Top Error Messages" table into an unbounded GROUP BY error_message, event over the whole time window. On production data that can return thousands of rows, which is likely to slow the ClickHouse query and make the dashboard much heavier to load.

  "nullifySparse": false,
- "query": "SELECT blob2 AS user_id, SUM(_sample_interval) AS total_events, SUM(IF(blob5 != '', _sample_interval, 0)) AS error_count, SUM(IF(blob5 != '', _sample_interval, 0)) / SUM(_sample_interval) AS error_rate, SUM(_sample_interval * double1) / SUM(_sample_interval) AS avg_latency_ms, COUNT(DISTINCT blob6) AS town_count FROM gastown_events WHERE $timeFilter AND blob2 != '' GROUP BY user_id ORDER BY total_events DESC LIMIT 25",
- "rawSql": "SELECT blob2 AS user_id, SUM(_sample_interval) AS total_events, SUM(IF(blob5 != '', _sample_interval, 0)) AS error_count, SUM(IF(blob5 != '', _sample_interval, 0)) / SUM(_sample_interval) AS error_rate, SUM(_sample_interval * double1) / SUM(_sample_interval) AS avg_latency_ms, COUNT(DISTINCT blob6) AS town_count FROM gastown_events WHERE $timeFilter AND blob2 != '' GROUP BY user_id ORDER BY total_events DESC LIMIT 25",
+ "query": "SELECT blob2 AS user_id, SUM(_sample_interval) AS total_events, SUM(IF(blob5 != '', _sample_interval, 0)) AS error_count, SUM(IF(blob5 != '', _sample_interval, 0)) / SUM(_sample_interval) AS error_rate, SUM(_sample_interval * double1) / SUM(_sample_interval) AS avg_latency_ms, COUNT(DISTINCT blob6) AS town_count FROM gastown_events WHERE $timeFilter AND blob2 != '' GROUP BY user_id ORDER BY total_events DESC",

WARNING: Bound this per-user query before shipping

This panel now groups every distinct user_id in the selected window with no LIMIT. On busy towns that can explode to thousands of rows and materially increase both query time and Grafana render time; the previous top-N cap kept the panel useful without making it unbounded.
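
One way to catch this class of change before merge is a small check over the dashboard JSON. The sketch below is a crude heuristic, not an existing tool: it assumes panels carry a `targets` array with `rawSql` or `query` strings (as in the diff above), treats any GROUP BY query whose text does not end in a numeric LIMIT as unbounded, and uses hypothetical names (`findUnboundedPanels`, `Dashboard`, `Panel`, `Target`). Time-bucketed series queries would need an allowlist.

```typescript
// Minimal view of a Grafana dashboard: only the fields this check reads.
type Target = { rawSql?: string; query?: string };
type Panel = { title?: string; targets?: Target[]; panels?: Panel[] };
type Dashboard = { panels?: Panel[] };

// Matches a trailing "LIMIT <n>" with an optional semicolon.
const HAS_LIMIT = /\bLIMIT\s+\d+\s*;?\s*$/i;
const HAS_GROUP_BY = /\bGROUP\s+BY\b/i;

function findUnboundedPanels(dashboard: Dashboard): string[] {
  const offenders: string[] = [];
  const visit = (panels: Panel[] = []): void => {
    for (const panel of panels) {
      const unbounded = (panel.targets ?? []).some((t) => {
        const sql = t.rawSql ?? t.query ?? "";
        return HAS_GROUP_BY.test(sql) && !HAS_LIMIT.test(sql);
      });
      if (unbounded) offenders.push(panel.title ?? "(untitled)");
      visit(panel.panels); // collapsed rows nest their panels
    }
  };
  visit(dashboard.panels);
  return offenders;
}
```

Run against the dashboard file in CI, a non-empty result would flag panels such as "Top Error Messages" once their cap is removed.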

jrf0110 merged commit 5de9b52 into main on Mar 15, 2026
18 checks passed
jrf0110 deleted the 1101-token-refresh branch on March 15, 2026 16:21
Linked issue: Manual container token refresh button in town settings