feat(gastown): add proactive idle-container stop in TownDO alarm #3113

Merged: kilo-code-bot[bot] merged 3 commits into gastown-staging from gt/toast/71ae2361 on May 7, 2026

jrf0110 (Contributor) commented May 7, 2026

Summary

Add proactive idle-container stop to the TownDO alarm handler. When a town has no active work and the mayor has been idle for >5 minutes, the alarm now calls container.stop() to force a graceful SIGTERM drain instead of waiting for Cloudflare's port-idle timer (which gets reset by PTY WebSocket keep-alives from dashboard terminals). This targets the root cause of 300+ active containers for ~100 active users.

  • New stopContainerIfIdle() method with dependency-injected logic in town/container-idle-stop.ts for testability
  • Wired into alarm handler just before the re-arm block
  • Emits container.idle_stop event with reason (no_active_work / mayor_idle_Xm) for observability
  • 2-minute throttle prevents thrash; failed stops are retried next tick (throttle not set on failure)
  • 13 unit tests covering all branches including edge cases (null mayor, null last_activity_at, getState failure, stop failure, throttle)
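
For readers without the diff open, the idle decision described in the bullets above might look roughly like this. It is a minimal sketch, not the code in container-idle-stop.ts: the `Mayor` and `IdleInput` shapes, the name `idleStopReason`, and the choice to treat a null mayor or a null last_activity_at as "nothing to wait for" are all assumptions. Only the 5-minute threshold and the two reason formats come from the summary.

```typescript
// Hypothetical shapes; the PR description does not show the real types.
type Mayor = { last_activity_at: string | null };

type IdleInput = {
  hasActiveWork: boolean;
  mayor: Mayor | null;
  nowMs: number;
};

// ">5 minutes" of mayor idleness, as stated in the summary.
const MAYOR_IDLE_THRESHOLD_MS = 5 * 60_000;

// Returns a reason string when the container looks idle, or null to leave it alone.
function idleStopReason(input: IdleInput): string | null {
  if (input.hasActiveWork) return null;

  const last = input.mayor?.last_activity_at;
  // Assumption: with no mayor, or no recorded activity, there is nothing to wait for.
  if (last == null) return 'no_active_work';

  const idleMs = input.nowMs - Date.parse(last);
  // An unparseable timestamp yields NaN; be conservative and do not stop.
  if (Number.isNaN(idleMs)) return null;
  if (idleMs <= MAYOR_IDLE_THRESHOLD_MS) return null;

  return `mayor_idle_${Math.floor(idleMs / 60_000)}m`;
}
```

Keeping the decision as a pure function of its inputs is one way to make each branch (null mayor, null timestamp, below threshold) reachable from a unit test without a Durable Object.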

Verification

  • Created a town with a running container, confirmed the alarm fires stopContainerIfIdle() after the idle threshold
  • All 13 unit tests pass

Visual Changes

N/A

Reviewer Notes

  • The implementation delegates to a standalone sub-module (container-idle-stop.ts) with injected dependencies, making it fully testable without DO infrastructure
  • getState() is used (control-plane RPC) instead of fetch() or warmUp() to avoid waking a sleeping container
  • The ! non-null assertion on mayor.last_activity_at was removed in favor of != null guards, consistent with the project coding style
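
The injected-dependency shape and the "throttle only on success" behaviour mentioned in this PR could be sketched as follows. Everything here is illustrative: `IdleStopDeps`, `createIdleStopper`, the in-memory `lastStopAt`, and the `{ status }` state shape are assumptions, and the real module may persist the throttle differently. Only getState(), stop(), the 2-minute throttle, the container.idle_stop event name, and the 'running' / 'healthy' states are taken from this thread.

```typescript
// Hypothetical state shape; only the 'running' and 'healthy' values come from the PR.
type ContainerState = { status: string };

// Hypothetical dependency bundle so the logic can run without DO infrastructure.
interface IdleStopDeps {
  getState(): Promise<ContainerState>; // control-plane RPC; should not wake the container
  stop(): Promise<void>;               // graceful SIGTERM drain
  idleReason(): string | null;         // null means "not idle"
  emit(event: string, data: Record<string, unknown>): void;
  now(): number;                       // milliseconds
}

const STOP_THROTTLE_MS = 2 * 60_000;
const ACTIVE_STATES = new Set(['running', 'healthy']);

function createIdleStopper(deps: IdleStopDeps): () => Promise<boolean> {
  // Assumption: throttle kept in memory; a real DO would likely persist it.
  let lastStopAt: number | null = null;

  return async function stopContainerIfIdle(): Promise<boolean> {
    const now = deps.now();
    if (lastStopAt != null && now - lastStopAt < STOP_THROTTLE_MS) return false;

    const reason = deps.idleReason();
    if (reason == null) return false;

    try {
      const state = await deps.getState();
      if (!ACTIVE_STATES.has(state.status)) return false;
      await deps.stop();
    } catch {
      // getState or stop failed: leave the throttle unset so the next tick retries.
      return false;
    }

    lastStopAt = now;
    deps.emit('container.idle_stop', { reason });
    return true;
  };
}
```

In a test, each dependency can be a plain stub (for example, a stop() that rejects once and then resolves) to confirm that a failed stop is retried on the next call while a successful one is throttled for two minutes.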

John Fawcett added 2 commits May 7, 2026 16:07
When a town has no active work and the mayor has been idle for >5min,
the alarm handler now calls container.stop() to force a graceful SIGTERM
drain instead of waiting for Cloudflare's port-idle timer (which gets
reset by PTY WebSocket keep-alives). This targets the root cause of
300+ active containers for ~100 active users.

- Add stopContainerIfIdle() with dependency-injected logic in
  town/container-idle-stop.ts for testability
- Wire into alarm handler just before the re-arm block
- Emit container.idle_stop event with reason for observability
- 2min throttle prevents thrash; failed stops are retried next tick
- 13 unit tests covering all branches

Replace mayor.last_activity_at! with null-safe checks using
mayor.last_activity_at != null guards, consistent with coding
style that forbids ! non-null assertions.

Comment thread on services/gastown/src/dos/town/container-idle-stop.ts (outdated)

kilo-code-bot Bot (Contributor) commented May 7, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (3 files)
  • services/gastown/src/dos/Town.do.ts
  • services/gastown/src/dos/town/container-idle-stop.ts
  • services/gastown/src/dos/town/container-idle-stop.test.ts

Reviewed by gpt-5.5-2026-04-23 · 318,978 tokens

The state guard only checked for 'running', but containers can also
report 'healthy' as an active state (consistent with gastown.worker.ts).
Added 'healthy' to the guard and a corresponding test.

kilo-code-bot Bot merged commit 62e1f13 into gastown-staging on May 7, 2026
2 checks passed
kilo-code-bot Bot deleted the gt/toast/71ae2361 branch on May 7, 2026 16:24