[Amber] Long running sessions have ServiceAccount tokens expire #445

@bobbravo2

Issue Description

Fix WebSocket Reconnection Errors for Long-Running Sessions

Problem Analysis

The HTTP 500 error occurs when:

1. Long-running sessions (>45 minutes) have expired ServiceAccount tokens
2. The user switches devices (desktop to mobile), triggering a WebSocket reconnection
3. The backend returns HTTP 500 instead of a proper 401 for the expired token
4. The runner cannot recover from the authentication failure

Root Causes Identified

1. Token Expiration: Runner ServiceAccount tokens expire after 45 minutes (runnerTokenRefreshTTL in components/operator/internal/handlers/helpers.go)
2. Backend Error Handling: SSAR (SelfSubjectAccessReview) failures return HTTP 500 instead of 401 (components/backend/handlers/middleware.go:309)
3. No Token Refresh: The runner doesn't refresh its token during reconnection attempts (components/runners/runner-shell/runner_shell/core/transport_ws.py)
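
Root cause 3 suggests re-reading the token before every reconnection attempt instead of reusing the one loaded at startup. The sketch below illustrates that idea only; it is not the code in transport_ws.py. `AuthError`, `read_token`, `reconnect_with_fresh_token`, and the `connect` callable are hypothetical names, and it assumes the refreshed token is available to the runner as a file on disk.

```python
import time
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical: raised by `connect` when the backend rejects the token (HTTP 401)."""


def read_token(path: Path) -> str:
    """Read the current token from disk, dropping surrounding whitespace."""
    return path.read_text(encoding="utf-8").strip()


def reconnect_with_fresh_token(
    connect: Callable[[str], T],
    token_path: Path,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect(token)` until it succeeds, re-reading the token each time.

    Only AuthError triggers a retry; any other exception propagates at once.
    The last AuthError is re-raised when all attempts are used up.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        # Assumption: something else (e.g. the operator) rewrites this file
        # when it refreshes the token, so a re-read picks up the new value.
        token = read_token(token_path)
        try:
            return connect(token)
        except AuthError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

This only helps once the backend reports expired tokens as 401: with the current HTTP 500 response, a client cannot tell an expired token from a genuine server fault.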

Implementation Plan

Phase 1: Immediate Fix - Backend Error Handling

Update ValidateProjectContext in components/backend/handlers/middleware.go:

  • Return HTTP 401 for authentication failures instead of HTTP 500
  • Add specific error messages to distinguish token expiration from other SSAR failures
  • Log token validation failures with context
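
The decision logic behind these three points can be sketched independently of the handler. The real change belongs in the Go function ValidateProjectContext; the Python below only illustrates the mapping. `AuthFailure`, `status_for`, the logger name, and the message strings are all hypothetical, and how the handler detects each failure kind is left open.

```python
import logging
from enum import Enum
from http import HTTPStatus
from typing import Tuple

logger = logging.getLogger("middleware_sketch")  # hypothetical logger name


class AuthFailure(Enum):
    """Hypothetical classification of why token validation failed."""
    TOKEN_EXPIRED = "token_expired"            # token was valid once, now past expiry
    TOKEN_REJECTED = "token_rejected"          # malformed, revoked, or unknown token
    REVIEW_UNAVAILABLE = "review_unavailable"  # the SSAR call itself could not complete


# Authentication problems map to 401; only genuine server-side faults stay 500.
_RESPONSES = {
    AuthFailure.TOKEN_EXPIRED: (HTTPStatus.UNAUTHORIZED, "ServiceAccount token expired"),
    AuthFailure.TOKEN_REJECTED: (HTTPStatus.UNAUTHORIZED, "authentication failed"),
    AuthFailure.REVIEW_UNAVAILABLE: (
        HTTPStatus.INTERNAL_SERVER_ERROR,
        "access review could not be performed",
    ),
}


def status_for(failure: AuthFailure, project: str) -> Tuple[HTTPStatus, str]:
    """Return the (status, message) pair for a failure and log it with context."""
    status, message = _RESPONSES[failure]
    logger.warning(
        "token validation failed: kind=%s project=%s status=%d",
        failure.value, project, int(status),
    )
    return status, message
```

In the Go handler, one possible way to detect a rejected token is `IsUnauthorized` from `k8s.io/apimachinery/pkg/api/errors` applied to the error returned by the SSAR call; whether that fits the existing code at middleware.go:309 is an assumption to verify.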

Files to Fix (optional)

components/backend/handlers/middleware.go

Fix Type

Code Formatting

Confirmation

  • I understand this will create an automated PR
  • The changes are low-risk and reversible
  • All tests should continue to pass after fixes

Metadata

Labels

amber:auto-fix - Amber agent: automated low-risk fixes (formatting, linting)
