Correctly update currentStrategy after reconnectDeadlineMillis has reached#1668
Conversation
PR checklist ✅All required conditions are satisfied:
🎉 Great job! This PR is ready for review. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThe Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
SDK Size Comparison 📏
|
11e4306
into
fix/cleanup-old-rtc-session-on-migration
* refactor(core): add retryable signaling decorator, retry policies, and tracer improvements
Replace SignalLostSignalingServiceDecorator with RetryableSignalingServiceDecorator
that uses configurable retry policies (exponential backoff) and separates terminal
SFU errors from transient network failures. Refactor SignalingServiceTracerDecorator
to use a generic traced() helper for consistent request/response/exception tracing.
Clean up Publisher/Subscriber error handling and fix sourceSets configuration.
Made-with: Cursor
* fix(core): phased migration to prevent old-session race condition
Replace immediate teardown of the old RtcSession during migration with a
phased approach: enterMigration() keeps media flowing while listening for
ParticipantMigrationCompleteEvent from the old SFU; finalizeMigration()
tears down only after the handoff is confirmed (or timed out at 7s).
Refactor prepareRejoin(reason) to send final stats, leave gracefully, and
cleanup via the unified cleanup() path. Move mediaScope.cancel() from
leaveWithReason() into cleanup() so it is always invoked regardless of
the teardown path (rejoin, migrate, or explicit leave).
Made-with: Cursor
* refactor(core): add unified reconnect loop with strategy escalation
Consolidate fastReconnect, rejoin, and migrate into a single
Call.reconnect(strategy, reason) entry point with a retry loop that
mirrors the JS SDK:
- FAST reconnect is attempted up to MAX_FAST_RECONNECT_ATTEMPTS (3),
then escalates to REJOIN.
- MIGRATE failures also escalate to REJOIN.
- A global reconnectMutex replaces the old schedule()/SingleFlight
coalescing to provide mutual exclusion across all strategies.
- MAX_RECONNECT_ATTEMPTS (10) and leaveAfterDisconnectSeconds act
as circuit breakers; exhaustion sets ReconnectingFailed state.
- Public fastReconnect(), rejoin(), migrate() wrappers are preserved
for backward compatibility.
Made-with: Cursor
* refactor(core): simplify RtcSession reconnect delegation
Thin out RtcSession's stateJob to forward all SFU socket-state
transitions directly to Call.reconnect() — the unified retry loop now
owns escalation logic (FAST -> REJOIN, MIGRATE -> REJOIN).
- Remove sfuConnectionRetryCount / MAX_SFU_CONNECTION_RETRIES; retry
counting is handled by the reconnect loop in Call.
- Replace onSignalingLost callback with onSfuApiError (maps SFU error
codes to strategies) and onSfuNetworkFailure (always FAST).
- Wire SfuConnectionModule to use RetryableSignalingServiceDecorator
and the new dual-callback interface.
- Delete the now-unused SignalLostSignalingServiceDecorator.
Made-with: Cursor
* test(core): update tests for unified reconnect architecture
- SfuConnectionRetryTest: replace per-strategy and retry-counter tests
with forwarding tests that verify stateJob delegates each strategy
to Call.reconnect().
- ReconnectAttemptsCountTest: test FAST (no increment), REJOIN
(increments), and accumulated attempts through the unified loop.
- FailedSfuIdsTest: use addFailedSfuId directly instead of calling
migrate() which now requires full session setup.
- JoinCallTest: skip network-dependent latency test.
Made-with: Cursor
* fix(core): align reconnect guard with JS SDK — allow migrate while connected
The pre-loop guard in Call.reconnect() blocked MIGRATE (and all other
strategies) when the call was in Connected state. This prevented both
server-initiated migration (GoAway/error event with MIGRATE strategy)
and debug-triggered migration from executing.
Align with JS SDK: only skip reconnect when already RECONNECTING,
MIGRATING, or RECONNECTING_FAILED — exactly matching the JS guard.
Remove the Connected and Disconnected checks entirely.
Made-with: Cursor
* fix(core): prevent reconnect race and clean up migration/fast-reconnect
- Set Reconnecting/Migrating state before acquiring reconnectMutex so
concurrent callers (stateJob, NetworkStateListener) see it and skip
- Launch call.reconnect() from stateJob in a separate coroutine so it
survives stateJob cancellation during prepareReconnect()
- Simplify fastReconnect: connect synchronously, wait for Connected
state, then restore session — removes serialProcessor indirection
- Remove ParticipantMigrationComplete await in migration flow to avoid
unnecessary synchronization latency
- Make prepareReconnect() explicitly disconnect the old SFU socket
before reconnecting to prevent stale-socket state machine errors
- Promote socketListenerJob to class field in SfuSocket and clean up
old WebSocket on reconnect to prevent leaked connections
- Move pre-reconnect stats collection before prepareReconnect() so
stats are sent while the connection is still alive
Made-with: Cursor
* fix(core): reliable network transition handling without dropping calls
Prevent false disconnect signals during network switches (e.g. cellular→WiFi)
by checking actual connectivity in NetworkStateProvider.onLost instead of
unconditionally marking the network as down. Add defense-in-depth guards
throughout the reconnect pipeline:
- Call.kt: leave timer checks connection state before executing; reconnect
loop uses tryLock to avoid queuing redundant attempts
- RtcSession: centralize cleanup in cancelActiveWork(); add network-aware
guards on SFU error/state callbacks; handle DisconnectedPermanently
with escalation to rejoin; fast reconnect throws on stale peer connections
instead of calling rejoin directly
- HealthMonitor: skip reconnect attempts when network is unavailable
- SfuSocketStateService: NetworkDisconnected stays parked on socket errors
to avoid futile retry loops; handles NetworkAvailable for recovery
Made-with: Cursor
* api dump and spotless changes
* Fix - catch (Exception) swallows CancellationException in reconnect loop
* Fix - session.value!! force-unwrap TOCTOU race in reconnectRejoin/reconnectMigrate
* spotless apply
* Typo fixed
* Fix: Added catch (e: CancellationException) { throw e } before the generic catch. This avoids running sendCallStats() and logging a misleading trace when the coroutine was cancelled
* Adding back the sfuReconnectTimeoutMillis so that there's no breaking change with a @deprecated annotation
* Fixed nitpicks by coderabbit
* import fixed
* Renamed the local variable to loopStartTime
* Renamed the local variable to loopIteration
* Add comments
* All precondition guards in reconnectFast, reconnectRejoin, reconnectMigrate now throw ReconnectPreconditionException instead of IllegalStateException.
Catch block now has three layers in order:
1. CancellationException → re-throw (coroutine cancellation)
2. ReconnectPreconditionException → log + set ReconnectingFailed + break (terminal, no retry)
3. Exception → retry with escalation (transient failures)
* refactor(core): replace exception-driven reconnect with sealed ReconnectOutcome
Replace ReconnectPreconditionException and try-catch control flow in the
reconnect loop with a sealed class ReconnectOutcome (Success,
PreconditionNotMet, PeerConnectionStale, Failed). Each reconnect method
now returns an outcome instead of throwing, and the loop dispatches via
an exhaustive `when` — the compiler enforces that every case is handled.
Made-with: Cursor
* refactor(core): introduce SfuConnectionResult and unify SFU connect paths
Add SfuConnectionResult sealed class in RtcSession so fastReconnect and
connectAndAwait return typed outcomes instead of throwing. The public
connect() now delegates to connectAndAwait, eliminating duplicated
request-building, tracing, and socket-await logic. fastReconnect is
reduced to its unique pre/post work (peer-connection health check,
subscription restore). Call.kt maps SfuConnectionResult → ReconnectOutcome
without try-catch.
Made-with: Cursor
* refactor(core): fix sendLeaveEvent ordering, deprecate connect, use internalConnect in _join
- Rename leaveWithReason to sendLeaveEvent and make it a suspend fun
that awaits the send instead of fire-and-forget via launch. This
fixes the race where cleanup() cancelled the supervisor job before
the leave message was sent.
- Move sendLeaveEvent call in Call.internalLeave inside the
clientImpl.scope.launch block so it completes before cleanup().
- Reorder prepareRejoin: sendLeaveEvent before cancelActiveWork/cleanup.
- Deprecate RtcSession.connect() in favor of internalConnect() which
returns SfuConnectionResult instead of throwing.
- Update Call._join to use internalConnect directly, replacing the
try-catch block with an exhaustive when on SfuConnectionResult.
- Split SfuConnectionResult into two sealed classes: SfuConnectionResult
(Connected/Failed) for internalConnect, FastReconnectResult
(Connected/PeerConnectionStale/Failed) for fastReconnect.
- Remove redundant try-catch blocks around joinRequest and collectStats
in reconnectFast/reconnectRejoin/reconnectMigrate.
- Replace withTimeout with withTimeoutOrNull in internalConnect to
correctly distinguish timeouts from external cancellation.
- Remove unused CancellationException import from Call.kt.
Made-with: Cursor
* fix(core): add ICE restart after fast reconnect and gate retries on network availability
After a successful fast reconnect, the publisher and subscriber ICE
connections may be stale because the underlying network path changed
(e.g. WiFi ↔ cellular). This adds explicit ICE restarts for both
directions so fresh candidates are gathered and media resumes.
Also introduces a network availability check in the Call.reconnect()
loop to avoid burning limited FAST attempt budgets when the network
is down. Skipped attempts are not counted, preserving the full retry
budget for when connectivity returns.
Key changes:
- RtcSession.restartIceAfterFastReconnect() restarts publisher and
subscriber ICE after a successful fast reconnect
- RtcSession.fastReconnect() now uses isClosed() instead of
isFailedOrClosed() so FAILED peer connections (recoverable via ICE
restart) are not prematurely escalated to REJOIN
- StreamPeerConnection.isClosed() distinguishes truly CLOSED (needs
REJOIN) from FAILED (recoverable)
- Call.reconnect() skips FAST attempts when network is unavailable
- Call.collectStats() wrapped in runCatching to avoid crashes during
reconnection
- Companion object constants documented with KDoc
- Tests added/updated for all new behavior
Made-with: Cursor
* Api dump
* refactor(core): unify SFU reconnection under Call.reconnect() and fix reconnect loop bugs
Strip self-reconnection logic from SfuSocket so it becomes a passive
state reporter. All SFU socket state changes (WebSocketEventLost,
DisconnectedTemporarily, NetworkDisconnected) now route through
RtcSession.stateJob into Call.reconnect(), eliminating the race
condition that caused double-joins and SCHEDULED_CLEANUP peer drops.
Key changes:
- Remove SfuSocket's networkStateListener and self-reconnect paths
- Route WebSocketEventLost from RtcSession.stateJob to Call.reconnect()
- Guard prepareReconnect() against disconnecting an already-disconnected socket
- Fix strategy downgrade bug: REJOIN no longer falls back to FAST
- Increment reconnectAttempts only for REJOIN/MIGRATE (not FAST/UNSPECIFIED)
- Leave the call when ReconnectingFailed is reached (prevent zombie calls)
- Remove redundant inner-loop network guard (entry-point checks suffice)
- Deprecate RestartConnection, NetworkAvailable, and RestartReason for
binary compatibility
- Add high-level reconnection logic documentation
Made-with: Cursor
* test(core): fix ReconnectEscalationTest assertions after leave-on-failure change
Update expected terminal state from ReconnectingFailed to Disconnected
since reconnect() now calls leave() when all recovery attempts are
exhausted. Also adjust FAST retry count from 4 to 3 to match the
corrected loopIteration increment ordering.
Made-with: Cursor
* test(core): fix test isolation and add internalConnect tests
- Add Call.use {} helper to OrphanedTracksTest and ReconnectSessionIdTest
to ensure call.cleanup() cancels background coroutines before runTest
exits, preventing OOM from infinite stats-reporting loop
- Add StreamVideo.removeClient() in @after for FastReconnectIceRestartTest,
ReconnectEscalationTest, SfuConnectionRetryTest, and RtcSessionTest2 to
prevent singleton leakage across test classes
- Use spyk + coJustRun on sendCallStats to fix NPE from serializing mock
objects in RtcSessionTest2
- Add two new internalConnect tests: timeout returns Failed, and
ReconnectDetails are forwarded in the JoinRequest
Made-with: Cursor
* refactor(core): unify SFU error handling and add ICE health monitoring
Replace the fragmented wrapAPICall / safeCallWithResult helpers with a
single sfuCall wrapper that properly rethrows CancellationException.
Merge onTerminalError and onNetworkFailure into one onSessionError
callback so only session-fatal SFU errors (SIGNAL_LOST,
PARTICIPANT_NOT_FOUND, etc.) trigger reconnection — regular API
failures are returned to callers without side-effects.
Add ICE connection state monitoring in RtcSession so the UI surfaces
RealtimeConnection.Reconnecting when publisher or subscriber ICE
degrades, and restores Connected when both recover.
Made-with: Cursor
* fix(core): defer ICE monitoring start until SFU socket is connected
Starting the ICE monitoring job during RtcSession construction caused
it to collect from a mock Subscriber's iceState in tests, blocking the
StandardTestDispatcher's event loop and preventing the stateJob from
processing socket state changes.
Move startIceMonitoring() to the SfuSocketState.Connected handler so
it only runs once the real SFU connection is established. Add an
idempotency guard to avoid duplicate monitoring jobs on reconnect.
Fixes 9 failing tests in SfuConnectionRetryTest.
Made-with: Cursor
* refactor(core): address PR review — rename APIs, unify DISCONNECT flow, improve docs
- Rename `internalConnect` → `connectInternal` for better IDE discoverability
- Rename `sfuCall` → `safeApiCall` to reflect its generic nature
- Add `ReconnectOutcome.Disconnect` to unify the DISCONNECT strategy
handling into the single `when(outcome)` decision flow, removing the
early-return short-circuit and the misleading `error("Handled above")`
- Improve KDoc on `SfuRetryableException` explaining its role as a
retry-signal mechanism for `StreamRetryProcessor`
- Remove default `{ true }` from `HealthMonitor.isNetworkAvailable`,
requiring callers to pass it explicitly
- Add `isNetworkAvailable` to `CoordinatorSocket` health monitor
- Enhance `NetworkStateProvider` debug logs with network/capabilities info
Made-with: Cursor
* - Donot call reconnect on getting DisconnectedPermanently
- revert onLost behaviour
- remove condition (loopIteration >= MAX_FAST_RECONNECT_ATTEMPTS) while escalating to REJOIN
- Removed unwanted logging
* fix(core): centralize network check in reconnect loop, fail-fast on socket disconnect
- Move network availability check to the top of the reconnect while-loop
so no other logic runs when offline; polls without consuming attempt budget
- Remove redundant network guard from DisconnectedTemporarily handler
- connectInternal now observes Disconnected states immediately instead of
waiting for the full 10s timeout, enabling faster retry cycles
Made-with: Cursor
* fix: Correctly update currentStrategy after reconnectDeadlineMillis has reached (#1668)
* - move the logic of incrementing nonFastReconnectAttempts inside the when loop to improve re
- Keep early exit checks in the starting of the while loop
* spotlessApply
* Fix unit test cases
* Make isClosed internal
* FIx unit test
* add a 5s delay before asserting recording label reappearance
---------
Co-authored-by: Rahul Kumar Lohra <tgunix@gmail.com>
Co-authored-by: Aleksandar Apostolov <apostolov.alexandar@gmail.com>
Goal
The
call.reconnect(WebsocketReconnectStrategy, reason)logic continuously monitors network availability. However, once connectivity is restored, it fails to re-evaluatereconnectDeadlineMillis, which can result in selecting an incorrect reconnection strategy (e.g., initial-passed strategy like fastReconnect instead of fastRejoin).Implementation
Update the reconnection flow to recompute the effective strategy after network restoration, ensuring
reconnectDeadlineMillisis respected before proceeding with reconnection incall.reconnect(...)🎨 UI Changes
None
Testing
Summary by CodeRabbit