fix: replace fixed 15s WebSocket reconnect with exponential backoff + jitter #5617
Greptile Summary
This PR replaces the fixed 15-second WebSocket reconnect with exponential backoff and jitter.
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant CP as CaptureProvider
participant T as Timer (single-shot)
participant WS as WebSocket
Note over CP: WebSocket disconnects
CP->>CP: onClosed() / onError()
CP->>CP: _startKeepAliveServices()
CP->>CP: _getReconnectDelay()<br/>(2^attempt capped at 120s, with jitter)
CP->>CP: _reconnectAttempt++
CP->>T: Schedule Timer(delay)
T-->>CP: Timer fires
CP->>WS: _initiateWebsocket()
alt Socket connects successfully
WS-->>CP: onConnected()
CP->>CP: _reconnectAttempt = 0
else Socket creation fails (null)
CP->>CP: _startKeepAliveServices()<br/>(reschedule with increased backoff)
else Socket connects then drops
WS-->>CP: onClosed() / onError()
CP->>CP: _startKeepAliveServices()<br/>(reschedule with increased backoff)
end
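The flow in the diagram can be sketched as a small single-shot retry loop. This is a minimal illustration, not the PR's `CaptureProvider`: the class name `ReconnectLoop` and the injected `connect` and `delayFor` callbacks are hypothetical, chosen so the sketch stays independent of the real socket code.

```dart
import 'dart:async';

/// Hypothetical helper (not from the PR): a single-shot reconnect loop
/// that keeps rescheduling itself until a connect attempt succeeds.
class ReconnectLoop {
  ReconnectLoop({required this.connect, required this.delayFor});

  /// Assumed contract: completes with true once the socket is connected.
  final Future<bool> Function() connect;

  /// Maps the attempt number (0, 1, 2, ...) to a wait time.
  final Duration Function(int attempt) delayFor;

  int attempt = 0;
  Timer? _timer;

  /// Called on close/error: schedule exactly one retry.
  void schedule() {
    _timer?.cancel(); // never keep two pending retries
    final delay = delayFor(attempt);
    attempt++;
    _timer = Timer(delay, _fire);
  }

  Future<void> _fire() async {
    bool ok;
    try {
      ok = await connect();
    } catch (_) {
      ok = false; // treat a thrown error like a failed attempt
    }
    if (ok) {
      attempt = 0; // mirrors the reset in onConnected()
    } else {
      schedule(); // failed attempt: retry with a longer backoff
    }
  }

  void dispose() => _timer?.cancel();
}
```

The key property is that every path out of `_fire()` either resets the counter or schedules the next timer, so the chain cannot end silently.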
Last reviewed commit: a67c19b
}

Duration _getReconnectDelay() {
  final baseDelay = min(pow(2, _reconnectAttempt).toInt(), _maxBackoffSeconds);
toInt() before min() will crash on high attempt counts
.toInt() is called on pow(2, _reconnectAttempt) before min clamps the value. Since pow returns a double, once _reconnectAttempt reaches 1024, pow(2, 1024) returns double.infinity, and double.infinity.toInt() throws an UnsupportedError. With the backoff cap at 120s and full jitter, attempt 1024 can be reached in ~34 hours of persistent disconnection.
Move .toInt() after the min call so the cap is applied first:
- final baseDelay = min(pow(2, _reconnectAttempt).toInt(), _maxBackoffSeconds);
+ final baseDelay = min(pow(2, _reconnectAttempt), _maxBackoffSeconds).toInt();
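An alternative that avoids depending on how `pow` behaves for huge exponents is to clamp the exponent itself before computing the power. Dart documents `pow` as returning an `int` when both arguments are ints, so overflow behaviour may differ between native and web builds; clamping first sidesteps the question. A minimal sketch, assuming "full jitter" means a uniform draw from zero up to the capped backoff. The names `backoffCeilingSeconds` and `reconnectDelay` are hypothetical, not from the PR.

```dart
import 'dart:math';

const int maxBackoffSeconds = 120;

// 2^7 = 128 already exceeds the 120 s cap, so larger exponents add nothing.
const int maxExponent = 7;

/// Upper bound of the backoff window for a given attempt, in seconds.
int backoffCeilingSeconds(int attempt) {
  // Clamp the exponent *before* raising 2 to it, so the power stays small.
  final exponent = min(max(attempt, 0), maxExponent);
  return min(1 << exponent, maxBackoffSeconds);
}

/// Full jitter (assumed definition): uniform in [0, ceiling] milliseconds.
Duration reconnectDelay(int attempt, {Random? random}) {
  final rng = random ?? Random();
  final ceilingMs = backoffCeilingSeconds(attempt) * 1000;
  return Duration(milliseconds: rng.nextInt(ceilingMs + 1));
}
```

With this shape the ceiling follows 1, 2, 4, ..., 64, 120 seconds and stays at 120 for every later attempt. Note that a draw of zero means an immediate retry; a non-zero floor may be preferable in practice.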
Implements exponential backoff (1s->2s->4s...->120s cap) with full jitter for WebSocket reconnection, preventing synchronized reconnect storms. Fixes #5527
a67c19b to 8951ee0
@kodjima33 The switch from a periodic to a one-shot timer means the early returns (user not signed in, device not ready) silently abandon the retry chain: no follow-up timer is scheduled, which can leave the WebSocket permanently disconnected during active recording until someone intervenes manually. Could you revert this and add tests covering the early-return paths, so that the retry always reschedules on transient failures, before re-merging? Let us know if you need a hand with testing. by AI for @beastoin
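One way to make the early-return paths testable without timers is to pull the decision out into a pure function. This is only a sketch: `RetryDecision`, `decideRetry`, `applyDecision`, and the `recordingActive` flag are hypothetical names, and the PR's actual guard conditions may differ.

```dart
/// Hypothetical outcome of one timer tick (not from the PR).
enum RetryDecision { connectNow, retryLater, stop }

/// Decides what a fired reconnect timer should do next.
RetryDecision decideRetry({
  required bool userSignedIn,
  required bool deviceReady,
  required bool recordingActive,
}) {
  // Assumption: with no active recording there is nothing to keep alive.
  if (!recordingActive) return RetryDecision.stop;
  // Transient preconditions: skip this attempt but keep the chain alive.
  if (!userSignedIn || !deviceReady) return RetryDecision.retryLater;
  return RetryDecision.connectNow;
}

/// Applies a decision; a transient failure always triggers reschedule().
void applyDecision(
  RetryDecision decision, {
  required void Function() connect,
  required void Function() reschedule,
}) {
  switch (decision) {
    case RetryDecision.connectNow:
      connect();
      break;
    case RetryDecision.retryLater:
      reschedule();
      break;
    case RetryDecision.stop:
      break; // intentional end of the retry chain
  }
}
```

Because `decideRetry` has no side effects, each early-return path can be covered by a plain unit test asserting that it yields `retryLater` rather than falling through silently.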
Summary
Replaces the fixed 15-second WebSocket reconnection timer with exponential backoff and full jitter to prevent synchronized reconnect storms.
Changes
- Added _reconnectAttempt counter and _maxBackoffSeconds = 120
- Added _getReconnectDelay(): exponential backoff (1s→2s→4s→...→120s cap) with full jitter
- Replaced Timer.periodic(Duration(seconds: 15)) with single-shot Timer using computed delay
- Reset _reconnectAttempt = 0 on successful connection in onConnected()

Note: The diff includes some auto-formatting changes alongside the substantive logic changes.
Fixes #5527