[Gastown] No proactive GitHub token refresh — installation tokens (ghs_) expire every hour, stranding agent work on 401 #4393

Description

@kilo-code-bot

What happened?

GitHub App installation tokens (ghs_ prefix, used as GIT_TOKEN for git push/fetch/PR) expire one hour after they are minted, but Gastown never refreshes them proactively. It refreshes only reactively, after a git operation has already failed with 401.

Root cause (confirmed by reading container source):

  1. git-manager.ts:212 has refreshGitToken(rigId), which calls POST /api/towns/{townId}/rigs/{rigId}/refresh-git-token to mint a fresh installation token. This works: I tested it manually and got a fresh token.
  2. execWithAuthRetry() (git-manager.ts:345) calls refreshGitToken reactively, ONLY when a container-side git operation fails with 401/403, and then retries with the fresh token. This works for container operations (clone/fetch/merge).
  3. BUT the code comment at git-manager.ts:210 says: "periodic refresh timer in process-manager (if added)". The proactive timer was never implemented.
  4. token-refresh.ts only handles the CONTAINER JWT (GASTOWN_CONTAINER_TOKEN), with a refreshTokenIfNearExpiry() call at boot. It does NOT handle GIT_TOKEN.
  5. When agents are idle for >1 hour, the ghs_ token silently expires. The agent's own git push (running in a subprocess via credential-store files in /tmp) then hits 401 with no auto-retry, and the work is stranded until a manual refresh.
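
The reactive path described in items 2 and 5 can be sketched as follows. This is a minimal illustration, not the code in git-manager.ts: `withAuthRetry`, `GitResult`, and the injected `run`/`refresh` callbacks are hypothetical names, and the real execWithAuthRetry may detect auth failures differently. It shows why an operation that goes through the wrapper recovers from an expired token.

```typescript
// Hypothetical sketch of a reactive auth-retry wrapper. All names here are
// illustrative; the real execWithAuthRetry in git-manager.ts may differ.
type GitResult = { ok: boolean; status?: number };

async function withAuthRetry(
  run: (token: string) => Promise<GitResult>, // performs one git operation
  refresh: () => Promise<string>,             // mints a fresh token
  token: string,
): Promise<GitResult> {
  const first = await run(token);
  const authFailed = first.status === 401 || first.status === 403;
  if (first.ok || !authFailed) return first; // success, or a non-auth error
  // Refresh only after the failure, then retry exactly once.
  return run(await refresh());
}

// Demo: an expired token fails once, then succeeds after the refresh.
const calls: string[] = [];
const fakeRun = async (token: string): Promise<GitResult> => {
  calls.push(token);
  return token === "fresh" ? { ok: true } : { ok: false, status: 401 };
};
const retried = withAuthRetry(fakeRun, async () => "fresh", "expired");
```

A git push that the agent launches in its own subprocess never passes through a wrapper like this, so nothing refreshes the token or retries when it gets the 401.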

Impact: This has caused 3+ stranded-work incidents in our town (b1d2b62b). Story 1.1 and Story 1.3 both completed locally but couldn't push due to 401. Each required manual intervention. The v4 convoy (16 beads) mass-failed because all agents hit 401 simultaneously.

Fix needed: Add a periodic timer in process-manager.ts that calls refreshGitToken(rigId) for every active rig roughly every 50 minutes, ahead of the 1-hour expiry. refreshGitToken already rewrites the credential-store files, so the infrastructure exists and only the timer is missing.
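
A rough sketch of what such a timer could look like is below. Only refreshGitToken(rigId) is named in this issue; `startGitTokenRefresh`, `refreshAllRigs`, `getActiveRigIds`, and the constants are assumptions made for illustration, and how process-manager.ts actually tracks active rigs is not known here.

```typescript
// Hypothetical sketch of the missing timer. Only refreshGitToken(rigId) is
// named in the issue; every other name and signature here is an assumption.
const TOKEN_TTL_MS = 60 * 60 * 1000;        // installation tokens last 1 hour
const REFRESH_INTERVAL_MS = 50 * 60 * 1000; // refresh ~10 minutes early

type RefreshGitToken = (rigId: string) => Promise<unknown>;

// Refresh each rig in turn and return the IDs that failed, so one bad rig
// does not prevent the others from getting a fresh token.
async function refreshAllRigs(
  rigIds: string[],
  refreshGitToken: RefreshGitToken,
): Promise<string[]> {
  const failed: string[] = [];
  for (const rigId of rigIds) {
    try {
      await refreshGitToken(rigId);
    } catch {
      failed.push(rigId);
    }
  }
  return failed;
}

// Start the periodic refresh; the returned function stops it.
function startGitTokenRefresh(
  getActiveRigIds: () => string[],
  refreshGitToken: RefreshGitToken,
  intervalMs: number = REFRESH_INTERVAL_MS,
): () => void {
  if (intervalMs >= TOKEN_TTL_MS) {
    throw new RangeError("refresh interval must be shorter than the token TTL");
  }
  const timer = setInterval(() => {
    void refreshAllRigs(getActiveRigIds(), refreshGitToken).then((failed) => {
      if (failed.length > 0) {
        console.warn(`git token refresh failed for rigs: ${failed.join(", ")}`);
      }
    });
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Note that setInterval fires for the first time only after one full interval, so this sketch assumes each rig already holds a token minted at startup.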

Additionally: the user created a 90-day PAT via github.com, but Gastown uses the GitHub App installation token (ghs_, 1hr) instead. Consider allowing the town config's git_auth.github_token to accept a static PAT (ghp_/github_pat_) and skipping installation-token generation when a long-lived PAT is provided.
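
The PAT check could be as small as a prefix test. The sketch below assumes the configured value arrives as an optional string; `classifyGitHubToken` and `shouldMintInstallationToken` are hypothetical helpers, not existing Gastown functions. The prefixes themselves are GitHub's published token formats: ghs_ for installation tokens, ghp_ for classic PATs, and github_pat_ for fine-grained PATs.

```typescript
// Hypothetical helpers; neither function exists in Gastown today.
type GitTokenKind = "installation" | "classic-pat" | "fine-grained-pat" | "unknown";

// Classify a GitHub token by its documented prefix.
function classifyGitHubToken(token: string): GitTokenKind {
  if (token.startsWith("ghs_")) return "installation";
  if (token.startsWith("github_pat_")) return "fine-grained-pat";
  if (token.startsWith("ghp_")) return "classic-pat";
  return "unknown";
}

// Mint an installation token unless the config already supplies a PAT.
function shouldMintInstallationToken(configuredToken: string | undefined): boolean {
  if (!configuredToken) return true;
  const kind = classifyGitHubToken(configuredToken);
  return kind !== "classic-pat" && kind !== "fine-grained-pat";
}
```

With this rule an unset value, a ghs_ token, or an unrecognized string would all fall back to the existing installation-token path.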

Town: b1d2b62b-b236-48c0-8558-3b32057470be. Rigs: e29ba178-b878-499f-9b8f-a3d3f8bdcc88 (ForgetfulDrinkerMKII).

Area

Container / Git

Context

  • Town ID: b1d2b62b-b236-48c0-8558-3b32057470be
  • Agent: Mayor (d0f9f7a6-3442-4cd2-bc2f-e6acf48d536a)
  • Rig ID: e29ba178-b878-499f-9b8f-a3d3f8bdcc88

Recent Errors

The ghs_ GitHub App installation token returns HTTP 401 Bad credentials after ~1 hour. refreshGitToken() exists but is only called reactively on 401 failure (execWithAuthRetry). No proactive periodic timer is implemented.

Filed automatically by the Mayor via gt_report_bug.
