Issue Description
Fix WebSocket Reconnection Errors for Long-Running Sessions
Problem Analysis
The HTTP 500 error occurs when:
- A long-running session (over 45 minutes) is still using a ServiceAccount token that has expired
- The user switches devices (desktop to mobile), which triggers a WebSocket reconnection
- The backend returns HTTP 500 instead of the correct 401 for the expired token
- The runner cannot recover from the authentication failure
Root Causes Identified
- Token Expiration: Runner ServiceAccount tokens expire after 45 minutes (runnerTokenRefreshTTL in components/operator/internal/handlers/helpers.go)
- Backend Error Handling: SelfSubjectAccessReview (SSAR) failures return HTTP 500 instead of 401 (components/backend/handlers/middleware.go:309)
- No Token Refresh: The runner doesn't refresh its token during reconnection attempts (components/runners/runner-shell/runner_shell/core/transport_ws.py)
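To illustrate the third root cause, here is a minimal sketch of a reconnect loop that re-reads the token on every attempt instead of reusing the one from the original connection. It is not the code in transport_ws.py: `reconnect_with_fresh_token`, `AuthError`, `connect` and `load_token` are hypothetical names, and how the runner actually obtains a refreshed token is not stated in this issue.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Hypothetical: raised by `connect` when the backend answers 401."""


def reconnect_with_fresh_token(
    connect: Callable[[str], T],
    load_token: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `connect`, loading the token again before each attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        # Assumption: load_token() returns the current token each time it
        # is called (for example by re-reading it from wherever it is stored).
        token = load_token()
        try:
            return connect(token)
        except (AuthError, ConnectionError) as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff between attempts: 1s, 2s, 4s, ...
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError("reconnect failed after all attempts") from last_error
```

The point of the sketch is only the placement of `load_token()` inside the loop: a token cached once at startup would keep failing after it expires, however many times the connection is retried.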
Implementation Plan
Phase 1: Immediate Fix - Backend Error Handling
Update ValidateProjectContext in components/backend/handlers/middleware.go:
- Return HTTP 401 for authentication failures instead of HTTP 500
- Add specific error messages to distinguish token expiration from other SSAR failures
- Log token validation failures with context
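The decision logic behind the three points above can be sketched as follows. The real handler is Go; this Python version only shows the mapping and is not the project's code. `SsarFailure`, `upstream_status` and `response_for_ssar_failure` are hypothetical names, and the sketch assumes the failed access review exposes the status code returned by the Kubernetes API server.

```python
import logging
from http import HTTPStatus
from typing import Optional, Tuple

logger = logging.getLogger("middleware")


class SsarFailure(Exception):
    """Hypothetical wrapper for a failed access review call."""

    def __init__(self, upstream_status: Optional[int], detail: str = ""):
        super().__init__(detail)
        self.upstream_status = upstream_status


def response_for_ssar_failure(err: SsarFailure, project: str) -> Tuple[int, str]:
    """Pick the HTTP status and message to return for a failed review."""
    if err.upstream_status == HTTPStatus.UNAUTHORIZED:
        # The token was rejected (expired or invalid): an authentication
        # failure, so the client should get 401 and can re-authenticate.
        logger.warning("token validation failed: project=%s detail=%s", project, err)
        return int(HTTPStatus.UNAUTHORIZED), "token expired or invalid"
    # Anything else is treated as a server-side failure in this sketch.
    logger.error("access review failed: project=%s detail=%s", project, err)
    return int(HTTPStatus.INTERNAL_SERVER_ERROR), "access review failed"
```

Keeping the two messages distinct lets a client such as the runner tell "get a new token and retry" apart from "the backend is broken".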
Files to Fix (optional)
components/backend/handlers/middleware.go
Fix Type
Code Formatting
Confirmation
- I understand this will create an automated PR
- The changes are low-risk and reversible
- All tests should continue to pass after fixes