[ECO-5646] Fix behaviour when internet connection is not available#1320

Merged
sacOO7 merged 2 commits into main from fix/primary-fallback-on-reconnection on Nov 24, 2025

Conversation

@sacOO7
Collaborator

@sacOO7 sacOO7 commented Nov 18, 2025

Fixed #1319

Summary by CodeRabbit

  • Bug Fixes

    • Client now consistently prefers the default realtime host and more explicitly handles instant-retry and connection-key clearing, improving reconnection reliability.
  • Tests

    • Added tests verifying default-host-first behavior after non-retryable errors and network-down scenarios.
  • Chores

    • Added a test helper to simulate non-retryable disconnects and made the retryable-disconnect helper optionally wait for the Disconnected state.

@github-actions github-actions bot temporarily deployed to staging/pull/1320/features November 18, 2025 13:27 Inactive
- Updated code to use defaultRealtimeHost in connecting state when timed out
@sacOO7 sacOO7 force-pushed the fix/primary-fallback-on-reconnection branch from 4c40fd1 to 957977c Compare November 19, 2025 13:13
@coderabbitai

coderabbitai bot commented Nov 19, 2025

Walkthrough

Host selection in the realtime workflow now defaults connectingHost to the default realtime host and only switches to a fallback for non-timeout triggers; disconnect handling replaces a (retry, clearKey) tuple with an explicit clear-key action plus a single CheckInstantRetryFlag(); two tests and a test helper were added/updated.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Realtime workflow refactor: src/IO.Ably.Shared/Realtime/Workflows/RealtimeWorkflow.cs | Initialize connectingHost with default realtime host; override to fallback only when the command is triggered by a non-timeout message; replace GetDisconnectFlags() (tuple) with CheckInstantRetryFlag() returning a single bool; explicitly clear the connection key when requested; update SetDisconnectedStateCommand path to use the single-flag logic. |
| Connection fallback tests: src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs | Added tests WhenNonRetryableError_ShouldAlwaysTryDefaultHostFirst and WhenInternetConnectionIsDown_ShouldAlwaysTryDefaultHostFirst; shorten TTLs and assert all rest/transport attempts use the default realtime host; added using IO.Ably.Tests.Infrastructure. |
| Test helper extension: src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs | Updated DisconnectWithRetryableError(this AblyRealtime client, bool waitForDisconnectedState = true) to accept an optional waitForDisconnectedState flag; added DisconnectWithNonRetryableError(this AblyRealtime client, bool waitForDisconnectedState = true) which publishes a Disconnected protocol message with a non-retryable Forbidden error and conditionally awaits the Disconnected state. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Workflow as RealtimeWorkflow
    participant Transport

    rect rgba(200,230,255,0.4)
    Note over Workflow: SetConnectingStateCommand
    Workflow->>Workflow: connectingHost = default realtime host
    alt Triggered by non-timeout message
        Workflow->>Workflow: connectingHost = fallback host
    else Triggered by timeout
        Workflow->>Workflow: keep default host
    end
    Workflow->>Transport: connect(connectingHost)
    end

    rect rgba(230,240,220,0.4)
    Note over Workflow: SetDisconnectedStateCommand
    alt cmd.ClearConnectionKey == true
        Workflow->>Workflow: Clear connection key explicitly
    end
    Workflow->>Workflow: retryInstantly = CheckInstantRetryFlag()
    alt retryInstantly && CanConnectToAbly()
        Workflow->>Workflow: Trigger SetConnectingStateCommand
    else
        Workflow->>Workflow: Remain Disconnected / schedule retry
    end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review host-selection branches in RealtimeWorkflow.cs for correctness (timeout vs non-timeout triggers).
  • Validate CheckInstantRetryFlag() logic and its interactions with CanConnectToAbly() and retry scheduling.
  • Confirm explicit clear-key handling in SetDisconnectedStateCommand.
  • Verify the two new tests and the updated test helper (AblyRealtimeTestExtensions.cs) correctly simulate scenarios and assertions.

Poem

🐇
I hop to default first, paws light and spry,
No needless detours across the sky.
Flags trimmed like carrots, tidy and neat,
Tests snugly placed to keep the path sweet.
A happy hop for realtime feet!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly describes the main change: fixing behavior when internet connection is not available, which aligns with the primary objective and changes in the codebase. |
| Linked Issues check | ✅ Passed | The changes implement the core requirement: redesigning host selection logic to avoid fallback attempts when internet connectivity check fails, and ensuring default host is tried first when internet is down [#1319, ECO-5646]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing fallback behavior: workflow logic refactoring, test extensions for simulating connectivity failures, and new test cases validating the fix align with the stated objectives. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Jira integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9ef34d3 and 85fbc68.

📒 Files selected for processing (2)
  • src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs (1 hunks)
  • src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs
🧰 Additional context used
🧬 Code graph analysis (1)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs (1)
src/IO.Ably.Shared/Types/ProtocolMessage.cs (4)
  • ProtocolMessage (18-289)
  • ProtocolMessage (101-105)
  • ProtocolMessage (107-111)
  • ProtocolMessage (113-117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net8.0)
  • GitHub Check: check (net9.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net9.0)
  • GitHub Check: check (net8.0)
  • GitHub Check: check (net9.0)
  • GitHub Check: check
  • GitHub Check: check (net6.0)
  • GitHub Check: check
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net8.0)
🔇 Additional comments (2)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs (2)

23-34: LGTM! Good addition of flexibility while maintaining safety.

The optional waitForDisconnectedState parameter with a safe default value (true) provides test flexibility while preserving backward compatibility. The conditional await pattern prevents race conditions when state synchronization is required.


36-47: Previous review concern properly addressed.

The method is now async and conditionally awaits the Disconnected state, eliminating the race condition flagged in the previous review. The implementation mirrors DisconnectWithRetryableError for consistency, and the use of HttpStatusCode.Forbidden (403) correctly represents a non-retryable error.


@github-actions github-actions bot temporarily deployed to staging/pull/1320/features November 19, 2025 13:14 Inactive
@sacOO7 sacOO7 marked this pull request as ready for review November 20, 2025 12:40
@github-actions github-actions bot temporarily deployed to staging/pull/1320/features November 20, 2025 12:40 Inactive

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (1)

346-385: Consider reducing the ConditionalAwaiter timeout and adding explanatory comments.

The test logic correctly validates that the client always tries the default host first after non-recoverable disconnects. However, there are a few concerns:

  1. Timeout value: The ConditionalAwaiter timeout of 120 seconds (line 374) is very conservative—6× the ConnectionStateTtl of 20 seconds. The actual wait should be around 20-25 seconds based on the retry configuration. Consider reducing this to 30-40 seconds to fail faster if something goes wrong, improving test suite performance.

  2. Test complexity: The test uses an event handler to create a disconnect injection loop, relying on ConnectionStateTtl expiry to break the cycle. This timing-sensitive approach could be brittle. Consider adding a comment explaining the test strategy:

    // Strategy: Inject non-recoverable disconnects on each Disconnected->Connecting transition
    // to create a retry loop. The loop continues until ConnectionStateTtl expires (20s),
    // forcing transition to Suspended. Verify that all connection attempts use the default host.
  3. Comment precision: Line 357's comment states "limited disconnected retries upto 20 seconds" but doesn't clarify that this is achieved by reducing ConnectionStateTtl, not by counting retries.

Apply this diff to improve clarity:

-            // Reduced connectionStateTTL for limited disconnected retries upto 20 seconds
+            // Reduce ConnectionStateTtl to 20 seconds to force Suspended state after retries
             client.State.Connection.ConnectionStateTtl = TimeSpan.FromSeconds(20);

Consider this diff to reduce the timeout:

-            await new ConditionalAwaiter(() => client.Connection.State == ConnectionState.Suspended, null, 120);
+            await new ConditionalAwaiter(() => client.Connection.State == ConnectionState.Suspended, null, 40);
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 89ab4cd and 0c15dc3.

📒 Files selected for processing (2)
  • src/IO.Ably.Shared/Realtime/Workflows/RealtimeWorkflow.cs (3 hunks)
  • src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (3)
src/IO.Ably.Shared/Realtime/Workflows/RealtimeWorkflow.cs (7)
  • Task (133-192)
  • Task (197-252)
  • Task (254-279)
  • Task (287-560)
  • Task (686-698)
  • Task (700-915)
  • Task (1042-1070)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs (3)
  • Task (23-31)
  • Task (33-44)
  • FakeProtocolMessageReceived (12-15)
src/IO.Ably.Tests.Shared/Infrastructure/AblyRealtimeSpecs.cs (6)
  • AblyRealtime (66-72)
  • AblyRealtime (74-85)
  • AblyRealtime (87-95)
  • AblyRealtime (99-105)
  • AblyRealtime (107-113)
  • AblyRealtime (134-141)
src/IO.Ably.Shared/Realtime/Workflows/RealtimeWorkflow.cs (3)
src/IO.Ably.Shared/ClientOptions.cs (1)
  • FullRealtimeHost (175-197)
src/IO.Ably.Shared/Realtime/AttemptsHelpers.cs (2)
  • AttemptsHelpers (8-79)
  • GetHost (32-78)
src/IO.Ably.Shared/Types/ErrorInfo.cs (2)
  • IsRetryableStatusCode (207-210)
  • IsRetryableStatusCode (246-249)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: check (net8.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check (net9.0)
  • GitHub Check: check (net7.0)
  • GitHub Check: check
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net9.0)
  • GitHub Check: check (net8.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check
🔇 Additional comments (4)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (2)

10-10: LGTM!

The new using directive is required for the ConditionalAwaiter used in the test below.


387-393: LGTM!

The helper method correctly sends a non-recoverable disconnect message (HTTP 403 Forbidden). The extraction improves test readability and follows established patterns in the codebase.

src/IO.Ably.Shared/Realtime/Workflows/RealtimeWorkflow.cs (2)

729-738: Host selection logic correctly prioritizes default host on timeout reconnects.

The implementation aligns with the PR objectives by ensuring that timeout-triggered reconnection attempts always use the default host first, preventing premature fallback host cycling when internet connectivity is lost.

The timeout detection at line 735 uses the established codebase pattern for command trigger identification. String-based trigger messages via .TriggeredBy("StateName.OnTimeOut()") are the standard convention throughout the workflow, consistently applied by all state timeout handlers (DisconnectedState, SuspendedState, ClosingState). The Contains("OnTimeOut()") check reliably distinguishes timeout-triggered commands because only timeout handlers use this pattern, and no alternative mechanism (enum, boolean flag, or constant) exists in the command structure.
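The host choice described above can be sketched roughly as follows. This is a minimal illustration, not the code from RealtimeWorkflow.cs: the class name, the method name `ChooseConnectingHost`, and the `nextFallbackHost` delegate are hypothetical stand-ins, and only the `Contains("OnTimeOut()")` check and the "default first" rule come from the review text.

```csharp
using System;

// Hypothetical sketch of the host-selection rule; names are illustrative only.
public static class HostSelectionSketch
{
    // triggeredBy:      the command's trigger string, e.g. "DisconnectedState.OnTimeOut()"
    // defaultHost:      the default realtime host
    // nextFallbackHost: assumed delegate standing in for the SDK's fallback picker
    public static string ChooseConnectingHost(
        string triggeredBy,
        string defaultHost,
        Func<string> nextFallbackHost)
    {
        // Always start from the default realtime host.
        var connectingHost = defaultHost;

        // Timeout handlers tag their commands with "...OnTimeOut()".
        var isTimeoutTrigger = triggeredBy != null && triggeredBy.Contains("OnTimeOut()");

        // Only non-timeout triggers are allowed to move to a fallback host.
        if (!isTimeoutTrigger)
        {
            connectingHost = nextFallbackHost() ?? defaultHost;
        }

        return connectingHost;
    }
}
```

With this shape, a reconnect scheduled by a state timeout keeps the default host, while any other trigger defers to the fallback picker.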


800-850: Refactored retry logic correctly integrates internet connectivity check.

The changes successfully implement the PR objectives:

  1. Lines 800-803: Explicit connection key clearing is more readable than the previous tuple-based approach.

  2. Lines 837-850: CheckInstantRetryFlag() properly gates instant retry behind internet connectivity verification:

    • Preserves explicit retry requests via cmd.RetryInstantly
    • Calls CanConnectToAbly() for retryable errors/exceptions (the "internet-up" check)
    • Returns false otherwise, preventing premature fallback attempts

Verification confirms CanConnectToAbly() is production-ready:

  • Uses a 4-second timeout (reasonable, non-blocking)
  • Queries a dedicated internet check endpoint (internet-up.ably-realtime.com) rather than the main API, correctly distinguishing network unavailability from Ably service issues
  • Handles exceptions safely, logging appropriately for debugging
  • Well-tested with existing test coverage

This ensures the SDK only attempts fallbacks when internet connectivity is confirmed, preventing connections to distant fallback regions when the network is actually down.
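The three-way decision listed above can be sketched as below. This follows the review comment's description rather than the merged source: the method name, its parameters, the async signature, and the `canConnectToAbly` delegate are assumptions made for illustration, and the real CheckInstantRetryFlag() may be shaped differently.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the instant-retry decision; not the SDK's actual method.
public static class InstantRetrySketch
{
    // retryInstantly:    explicit request carried by the command (cmd.RetryInstantly)
    // hasRetryableError: whether the disconnect carried a retryable error or exception
    // canConnectToAbly:  assumed delegate standing in for the internet-up check
    public static async Task<bool> CheckInstantRetryAsync(
        bool retryInstantly,
        bool hasRetryableError,
        Func<Task<bool>> canConnectToAbly)
    {
        // An explicit retry request is honoured as-is.
        if (retryInstantly)
        {
            return true;
        }

        // A retryable failure only retries instantly if the network is reachable.
        if (hasRetryableError)
        {
            return await canConnectToAbly();
        }

        // Anything else waits for the normal retry schedule.
        return false;
    }
}
```

Passing the connectivity check as a delegate keeps the sketch testable without a network; in this shape the check is never invoked for non-retryable errors.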

@sacOO7 sacOO7 force-pushed the fix/primary-fallback-on-reconnection branch from 0c15dc3 to 9ef34d3 Compare November 20, 2025 13:16
@sacOO7 sacOO7 requested a review from ttypic November 20, 2025 13:17
@github-actions github-actions bot temporarily deployed to staging/pull/1320/features November 20, 2025 13:17 Inactive

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (1)

373-373: Clarify timeout units for better readability.

The ConditionalAwaiter timeout parameter 120 appears on lines 373 and 423. The units (milliseconds vs. seconds) are unclear from the call site alone. Consider adding a comment or using a named constant to improve test readability.

Example:

-await new ConditionalAwaiter(() => client.Connection.State == ConnectionState.Suspended, null, 120);
+const int timeoutSeconds = 120;
+await new ConditionalAwaiter(() => client.Connection.State == ConnectionState.Suspended, null, timeoutSeconds);

Also applies to: 423-423

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 0c15dc3 and 9ef34d3.

📒 Files selected for processing (2)
  • src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs (1 hunks)
  • src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs (1)
src/IO.Ably.Shared/Types/ProtocolMessage.cs (4)
  • ProtocolMessage (18-289)
  • ProtocolMessage (101-105)
  • ProtocolMessage (107-111)
  • ProtocolMessage (113-117)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/ConnectionFallbackSpecs.cs (1)
src/IO.Ably.Tests.Shared/Realtime/ConnectionSpecs/AblyRealtimeTestExtensions.cs (3)
  • Task (23-31)
  • Task (41-52)
  • DisconnectWithNonRetryableError (33-39)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: check
  • GitHub Check: check (net9.0)
  • GitHub Check: check (net8.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net9.0)
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check (net7.0)
  • GitHub Check: check (net8.0)
  • GitHub Check: check (net6.0)
  • GitHub Check: check (net8.0)
  • GitHub Check: check (net9.0)
  • GitHub Check: check

Contributor

@ttypic ttypic left a comment

LGTM

@sacOO7 sacOO7 merged commit 30f785e into main Nov 24, 2025
16 checks passed
@sacOO7 sacOO7 deleted the fix/primary-fallback-on-reconnection branch November 24, 2025 13:00
Development

Successfully merging this pull request may close these issues.

Check behaviour of SDK when internet connectivity lost

2 participants