Correctly update currentStrategy after reconnectDeadlineMillis has reached by rahul-lohra · Pull Request #1668 · GetStream/stream-video-android

rahul-lohra · 2026-04-30T23:31:09Z

Goal

The call.reconnect(WebsocketReconnectStrategy, reason) logic continuously monitors network availability. However, once connectivity is restored, it fails to re-evaluate reconnectDeadlineMillis, which can result in selecting an incorrect reconnection strategy (e.g., initial-passed strategy like fastReconnect instead of fastRejoin).

Implementation

Update the reconnection flow to recompute the effective strategy after network restoration, ensuring reconnectDeadlineMillis is respected before proceeding with reconnection in call.reconnect(...)

🎨 UI Changes

None

Testing

Join a call with two participants
Disable internet for < 10 seconds → expect fastReconnect
Disable internet for ~15 seconds → expect fastRejoin
Verify both participants reconnect successfully and media streams are restored

Summary by CodeRabbit

Bug Fixes
- Improved call reconnection handling by adapting reconnection strategy based on elapsed time, resulting in better management of timeout scenarios during network interruptions.

…as reached

github-actions · 2026-04-30T23:31:21Z

PR checklist ✅

All required conditions are satisfied:

Title length is OK (or ignored by label).
At least one pr: label exists.
Sections ### Goal, ### Implementation, and ### Testing are filled.

🎉 Great job! This PR is ready for review.

coderabbitai · 2026-04-30T23:37:27Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ac5db85e-990c-40ed-9454-baaf9e9e45f4

📥 Commits

Reviewing files that changed from the base of the PR and between a12bdd9 and ccc5a53.

📒 Files selected for processing (1)

stream-video-android-core/src/main/kotlin/io/getstream/video/android/core/Call.kt

Walkthrough

The reconnect function in the Call class now dynamically escalates reconnection strategies based on elapsed time. It maintains a mutable currentStrategy that can be promoted from FAST or UNSPECIFIED to REJOIN when approaching the reconnectDeadlineMillis threshold, with added debug logging to track strategy transitions.

Changes

Cohort / File(s)	Summary
Reconnection Strategy Escalation `stream-video-android-core/src/main/kotlin/io/getstream/video/android/core/Call.kt`	Enhanced `reconnect` function to maintain mutable `currentStrategy` across loop iterations, escalating from `FAST`/`UNSPECIFIED` to `REJOIN` based on elapsed time vs deadline, with debug logging reporting original and escalated strategies.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppity hop, the reconnect's grown wise,
Watching the clock as the deadline draws nigh,
FAST becomes REJOIN when time starts to slip,
Strategy escalates on this reconnection trip! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change—updating currentStrategy based on reconnectDeadlineMillis—which is the core fix in the changeset.
Description check	✅ Passed	The description includes Goal and Implementation sections with clear context, and Testing section with specific scenarios; however, UI Changes section is marked 'None' and the contributor/reviewer checklists are empty.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/correctly-update-currentstrategy

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 60 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-30T23:38:13Z

SDK Size Comparison 📏

SDK	Before	After	Difference	Status
stream-video-android-core	12.02 MB	12.04 MB	0.02 MB	🟢
stream-video-android-ui-xml	5.68 MB	5.68 MB	0.00 MB	🟢
stream-video-android-ui-compose	6.28 MB	6.28 MB	0.00 MB	🟢

@deprecated

* refactor(core): add retryable signaling decorator, retry policies, and tracer improvements Replace SignalLostSignalingServiceDecorator with RetryableSignalingServiceDecorator that uses configurable retry policies (exponential backoff) and separates terminal SFU errors from transient network failures. Refactor SignalingServiceTracerDecorator to use a generic traced() helper for consistent request/response/exception tracing. Clean up Publisher/Subscriber error handling and fix sourceSets configuration. Made-with: Cursor * fix(core): phased migration to prevent old-session race condition Replace immediate teardown of the old RtcSession during migration with a phased approach: enterMigration() keeps media flowing while listening for ParticipantMigrationCompleteEvent from the old SFU; finalizeMigration() tears down only after the handoff is confirmed (or timed out at 7s). Refactor prepareRejoin(reason) to send final stats, leave gracefully, and cleanup via the unified cleanup() path. Move mediaScope.cancel() from leaveWithReason() into cleanup() so it is always invoked regardless of the teardown path (rejoin, migrate, or explicit leave). Made-with: Cursor * refactor(core): add unified reconnect loop with strategy escalation Consolidate fastReconnect, rejoin, and migrate into a single Call.reconnect(strategy, reason) entry point with a retry loop that mirrors the JS SDK: - FAST reconnect is attempted up to MAX_FAST_RECONNECT_ATTEMPTS (3), then escalates to REJOIN. - MIGRATE failures also escalate to REJOIN. - A global reconnectMutex replaces the old schedule()/SingleFlight coalescing to provide mutual exclusion across all strategies. - MAX_RECONNECT_ATTEMPTS (10) and leaveAfterDisconnectSeconds act as circuit breakers; exhaustion sets ReconnectingFailed state. - Public fastReconnect(), rejoin(), migrate() wrappers are preserved for backward compatibility. Made-with: Cursor * refactor(core): simplify RtcSession reconnect delegation Thin out RtcSession's stateJob to forward all SFU socket-state transitions directly to Call.reconnect() — the unified retry loop now owns escalation logic (FAST -> REJOIN, MIGRATE -> REJOIN). - Remove sfuConnectionRetryCount / MAX_SFU_CONNECTION_RETRIES; retry counting is handled by the reconnect loop in Call. - Replace onSignalingLost callback with onSfuApiError (maps SFU error codes to strategies) and onSfuNetworkFailure (always FAST). - Wire SfuConnectionModule to use RetryableSignalingServiceDecorator and the new dual-callback interface. - Delete the now-unused SignalLostSignalingServiceDecorator. Made-with: Cursor * test(core): update tests for unified reconnect architecture - SfuConnectionRetryTest: replace per-strategy and retry-counter tests with forwarding tests that verify stateJob delegates each strategy to Call.reconnect(). - ReconnectAttemptsCountTest: test FAST (no increment), REJOIN (increments), and accumulated attempts through the unified loop. - FailedSfuIdsTest: use addFailedSfuId directly instead of calling migrate() which now requires full session setup. - JoinCallTest: skip network-dependent latency test. Made-with: Cursor * fix(core): align reconnect guard with JS SDK — allow migrate while connected The pre-loop guard in Call.reconnect() blocked MIGRATE (and all other strategies) when the call was in Connected state. This prevented both server-initiated migration (GoAway/error event with MIGRATE strategy) and debug-triggered migration from executing. Align with JS SDK: only skip reconnect when already RECONNECTING, MIGRATING, or RECONNECTING_FAILED — exactly matching the JS guard. Remove the Connected and Disconnected checks entirely. Made-with: Cursor * fix(core): prevent reconnect race and clean up migration/fast-reconnect - Set Reconnecting/Migrating state before acquiring reconnectMutex so concurrent callers (stateJob, NetworkStateListener) see it and skip - Launch call.reconnect() from stateJob in a separate coroutine so it survives stateJob cancellation during prepareReconnect() - Simplify fastReconnect: connect synchronously, wait for Connected state, then restore session — removes serialProcessor indirection - Remove ParticipantMigrationComplete await in migration flow to avoid unnecessary synchronization latency - Make prepareReconnect() explicitly disconnect the old SFU socket before reconnecting to prevent stale-socket state machine errors - Promote socketListenerJob to class field in SfuSocket and clean up old WebSocket on reconnect to prevent leaked connections - Move pre-reconnect stats collection before prepareReconnect() so stats are sent while the connection is still alive Made-with: Cursor * fix(core): reliable network transition handling without dropping calls Prevent false disconnect signals during network switches (e.g. cellular→WiFi) by checking actual connectivity in NetworkStateProvider.onLost instead of unconditionally marking the network as down. Add defense-in-depth guards throughout the reconnect pipeline: - Call.kt: leave timer checks connection state before executing; reconnect loop uses tryLock to avoid queuing redundant attempts - RtcSession: centralize cleanup in cancelActiveWork(); add network-aware guards on SFU error/state callbacks; handle DisconnectedPermanently with escalation to rejoin; fast reconnect throws on stale peer connections instead of calling rejoin directly - HealthMonitor: skip reconnect attempts when network is unavailable - SfuSocketStateService: NetworkDisconnected stays parked on socket errors to avoid futile retry loops; handles NetworkAvailable for recovery Made-with: Cursor * api dump and spotless changes * Fix - catch (Exception) swallows CancellationException in reconnect loop * Fix - session.value!! force-unwrap TOCTOU race in reconnectRejoin/reconnectMigrate * spotless apply * Typo fixed * Fix: Added catch (e: CancellationException) { throw e } before the generic catch. This avoids running sendCallStats() and logging a misleading trace when the coroutine was cancelled * Adding back the sfuReconnectTimeoutMillis so that there's no breaking change with a @deprecated annotation * Fixed nitpicks by coderabbit * import fixed * Renamed the local variable to loopStartTime * Renamed the local variable to loopIteration * Add comments * All precondition guards in reconnectFast, reconnectRejoin, reconnectMigrate now throw ReconnectPreconditionException instead of IllegalStateException. Catch block now has three layers in order: 1. CancellationException → re-throw (coroutine cancellation) 2. ReconnectPreconditionException → log + set ReconnectingFailed + break (terminal, no retry) 3. Exception → retry with escalation (transient failures) * refactor(core): replace exception-driven reconnect with sealed ReconnectOutcome Replace ReconnectPreconditionException and try-catch control flow in the reconnect loop with a sealed class ReconnectOutcome (Success, PreconditionNotMet, PeerConnectionStale, Failed). Each reconnect method now returns an outcome instead of throwing, and the loop dispatches via an exhaustive `when` — the compiler enforces that every case is handled. Made-with: Cursor * refactor(core): introduce SfuConnectionResult and unify SFU connect paths Add SfuConnectionResult sealed class in RtcSession so fastReconnect and connectAndAwait return typed outcomes instead of throwing. The public connect() now delegates to connectAndAwait, eliminating duplicated request-building, tracing, and socket-await logic. fastReconnect is reduced to its unique pre/post work (peer-connection health check, subscription restore). Call.kt maps SfuConnectionResult → ReconnectOutcome without try-catch. Made-with: Cursor * refactor(core): fix sendLeaveEvent ordering, deprecate connect, use internalConnect in _join - Rename leaveWithReason to sendLeaveEvent and make it a suspend fun that awaits the send instead of fire-and-forget via launch. This fixes the race where cleanup() cancelled the supervisor job before the leave message was sent. - Move sendLeaveEvent call in Call.internalLeave inside the clientImpl.scope.launch block so it completes before cleanup(). - Reorder prepareRejoin: sendLeaveEvent before cancelActiveWork/cleanup. - Deprecate RtcSession.connect() in favor of internalConnect() which returns SfuConnectionResult instead of throwing. - Update Call._join to use internalConnect directly, replacing the try-catch block with an exhaustive when on SfuConnectionResult. - Split SfuConnectionResult into two sealed classes: SfuConnectionResult (Connected/Failed) for internalConnect, FastReconnectResult (Connected/PeerConnectionStale/Failed) for fastReconnect. - Remove redundant try-catch blocks around joinRequest and collectStats in reconnectFast/reconnectRejoin/reconnectMigrate. - Replace withTimeout with withTimeoutOrNull in internalConnect to correctly distinguish timeouts from external cancellation. - Remove unused CancellationException import from Call.kt. Made-with: Cursor * fix(core): add ICE restart after fast reconnect and gate retries on network availability After a successful fast reconnect, the publisher and subscriber ICE connections may be stale because the underlying network path changed (e.g. WiFi ↔ cellular). This adds explicit ICE restarts for both directions so fresh candidates are gathered and media resumes. Also introduces a network availability check in the Call.reconnect() loop to avoid burning limited FAST attempt budgets when the network is down. Skipped attempts are not counted, preserving the full retry budget for when connectivity returns. Key changes: - RtcSession.restartIceAfterFastReconnect() restarts publisher and subscriber ICE after a successful fast reconnect - RtcSession.fastReconnect() now uses isClosed() instead of isFailedOrClosed() so FAILED peer connections (recoverable via ICE restart) are not prematurely escalated to REJOIN - StreamPeerConnection.isClosed() distinguishes truly CLOSED (needs REJOIN) from FAILED (recoverable) - Call.reconnect() skips FAST attempts when network is unavailable - Call.collectStats() wrapped in runCatching to avoid crashes during reconnection - Companion object constants documented with KDoc - Tests added/updated for all new behavior Made-with: Cursor * Api dump * refactor(core): unify SFU reconnection under Call.reconnect() and fix reconnect loop bugs Strip self-reconnection logic from SfuSocket so it becomes a passive state reporter. All SFU socket state changes (WebSocketEventLost, DisconnectedTemporarily, NetworkDisconnected) now route through RtcSession.stateJob into Call.reconnect(), eliminating the race condition that caused double-joins and SCHEDULED_CLEANUP peer drops. Key changes: - Remove SfuSocket's networkStateListener and self-reconnect paths - Route WebSocketEventLost from RtcSession.stateJob to Call.reconnect() - Guard prepareReconnect() against disconnecting an already-disconnected socket - Fix strategy downgrade bug: REJOIN no longer falls back to FAST - Increment reconnectAttempts only for REJOIN/MIGRATE (not FAST/UNSPECIFIED) - Leave the call when ReconnectingFailed is reached (prevent zombie calls) - Remove redundant inner-loop network guard (entry-point checks suffice) - Deprecate RestartConnection, NetworkAvailable, and RestartReason for binary compatibility - Add high-level reconnection logic documentation Made-with: Cursor * test(core): fix ReconnectEscalationTest assertions after leave-on-failure change Update expected terminal state from ReconnectingFailed to Disconnected since reconnect() now calls leave() when all recovery attempts are exhausted. Also adjust FAST retry count from 4 to 3 to match the corrected loopIteration increment ordering. Made-with: Cursor * test(core): fix test isolation and add internalConnect tests - Add Call.use {} helper to OrphanedTracksTest and ReconnectSessionIdTest to ensure call.cleanup() cancels background coroutines before runTest exits, preventing OOM from infinite stats-reporting loop - Add StreamVideo.removeClient() in @after for FastReconnectIceRestartTest, ReconnectEscalationTest, SfuConnectionRetryTest, and RtcSessionTest2 to prevent singleton leakage across test classes - Use spyk + coJustRun on sendCallStats to fix NPE from serializing mock objects in RtcSessionTest2 - Add two new internalConnect tests: timeout returns Failed, and ReconnectDetails are forwarded in the JoinRequest Made-with: Cursor * refactor(core): unify SFU error handling and add ICE health monitoring Replace the fragmented wrapAPICall / safeCallWithResult helpers with a single sfuCall wrapper that properly rethrows CancellationException. Merge onTerminalError and onNetworkFailure into one onSessionError callback so only session-fatal SFU errors (SIGNAL_LOST, PARTICIPANT_NOT_FOUND, etc.) trigger reconnection — regular API failures are returned to callers without side-effects. Add ICE connection state monitoring in RtcSession so the UI surfaces RealtimeConnection.Reconnecting when publisher or subscriber ICE degrades, and restores Connected when both recover. Made-with: Cursor * fix(core): defer ICE monitoring start until SFU socket is connected Starting the ICE monitoring job during RtcSession construction caused it to collect from a mock Subscriber's iceState in tests, blocking the StandardTestDispatcher's event loop and preventing the stateJob from processing socket state changes. Move startIceMonitoring() to the SfuSocketState.Connected handler so it only runs once the real SFU connection is established. Add an idempotency guard to avoid duplicate monitoring jobs on reconnect. Fixes 9 failing tests in SfuConnectionRetryTest. Made-with: Cursor * refactor(core): address PR review — rename APIs, unify DISCONNECT flow, improve docs - Rename `internalConnect` → `connectInternal` for better IDE discoverability - Rename `sfuCall` → `safeApiCall` to reflect its generic nature - Add `ReconnectOutcome.Disconnect` to unify the DISCONNECT strategy handling into the single `when(outcome)` decision flow, removing the early-return short-circuit and the misleading `error("Handled above")` - Improve KDoc on `SfuRetryableException` explaining its role as a retry-signal mechanism for `StreamRetryProcessor` - Remove default `{ true }` from `HealthMonitor.isNetworkAvailable`, requiring callers to pass it explicitly - Add `isNetworkAvailable` to `CoordinatorSocket` health monitor - Enhance `NetworkStateProvider` debug logs with network/capabilities info Made-with: Cursor * - Donot call reconnect on getting DisconnectedPermanently - revert onLost behaviour - remove condition (loopIteration >= MAX_FAST_RECONNECT_ATTEMPTS) while escalating to REJOIN - Removed unwanted logging * fix(core): centralize network check in reconnect loop, fail-fast on socket disconnect - Move network availability check to the top of the reconnect while-loop so no other logic runs when offline; polls without consuming attempt budget - Remove redundant network guard from DisconnectedTemporarily handler - connectInternal now observes Disconnected states immediately instead of waiting for the full 10s timeout, enabling faster retry cycles Made-with: Cursor * fix: Correctly update currentStrategy after reconnectDeadlineMillis has reached (#1668) * - move the logic of incrementing nonFastReconnectAttempts inside the when loop to improve re - Keep early exit checks in the starting of the while loop * spotlessApply * Fix unit test cases * Make isClosed internal * FIx unit test * add a 5s delay before asserting recording label reappearance --------- Co-authored-by: Rahul Kumar Lohra <tgunix@gmail.com> Co-authored-by: Aleksandar Apostolov <apostolov.alexandar@gmail.com>

fix: Correctly update currentStrategy after reconnectDeadlineMillis h…

ccc5a53

…as reached

rahul-lohra self-assigned this Apr 30, 2026

rahul-lohra requested a review from a team as a code owner April 30, 2026 23:31

rahul-lohra added the pr:internal Internal or infra-only changes label Apr 30, 2026

PratimMallick approved these changes May 1, 2026

View reviewed changes

PratimMallick merged commit 11e4306 into fix/cleanup-old-rtc-session-on-migration May 1, 2026
10 of 13 checks passed

PratimMallick deleted the fix/correctly-update-currentstrategy branch May 1, 2026 06:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correctly update currentStrategy after reconnectDeadlineMillis has reached#1668

Correctly update currentStrategy after reconnectDeadlineMillis has reached#1668
PratimMallick merged 1 commit intofix/cleanup-old-rtc-session-on-migrationfrom
fix/correctly-update-currentstrategy

rahul-lohra commented Apr 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

github-actions Bot commented Apr 30, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Apr 30, 2026

Review ran into problems

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rahul-lohra commented Apr 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Goal

Implementation

🎨 UI Changes

Testing

Summary by CodeRabbit

Uh oh!

github-actions Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR checklist ✅

Uh oh!

coderabbitai Bot commented Apr 30, 2026

Walkthrough

Changes

Estimated code review effort

Poem

Review ran into problems

Uh oh!

github-actions Bot commented Apr 30, 2026

SDK Size Comparison 📏

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rahul-lohra commented Apr 30, 2026 •

edited by coderabbitai Bot

Loading

github-actions Bot commented Apr 30, 2026 •

edited

Loading