fix(llc, core): coalesce queryChannels on flaky-network reconnects#2652

Merged · xsahil03x merged 7 commits into master from fix/recover-state-on-reconnect · May 18, 2026

@xsahil03x xsahil03x commented May 18, 2026

Linear: FLU-493

Description of the pull request

Customer report (Abwaab): a live session with ~260 students hit the per-user QueryChannels rate limit. Telemetry showed bursts of 13+ QC requests in 258 ms from a single iOS device on unstable cellular. Three independent SDK behaviors were compounding into the storm:

  1. Two QCs fired per connectionRecovered (client.dart direct call + each StreamChannelListController.refresh()).
  2. connectivity_plus emits multiple events per real network transition; each one forced a fresh openConnection (bypassing the WS internal backoff), and each successful reconnect fanned out into more QCs.
  3. The in-flight cache that was supposed to dedup concurrent identical queries had a TOCTOU race — the cache write happened after an offline-await, so N siblings in the same tick all missed the cache and each fired their own HTTP request.

This PR addresses all three at the right layer.
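A minimal sketch of the race described in item 3, not the SDK's code: `racyQuery`, `_fetch`, `_pending`, and `httpCalls` are hypothetical names, and the `await Future<void>.value()` line merely stands in for the offline await. Because the slot is written only after that await, every caller in the same tick passes the cache check before any of them stores a future.

```dart
import 'dart:async';

// Hypothetical stand-ins for illustration; none of these names are SDK APIs.
final _pending = <String, Future<String>>{};
var httpCalls = 0;

// Pretend HTTP request that counts how many times it was actually issued.
Future<String> _fetch(String key) async {
  httpCalls++;
  await Future<void>.delayed(const Duration(milliseconds: 10));
  return 'channels for $key';
}

/// Racy ordering: check, await, then write. Callers that start in the same
/// event-loop tick all see an empty slot and each issue their own request.
Future<String> racyQuery(String key) async {
  final cached = _pending[key];
  if (cached != null) return cached;

  // Stands in for the offline await that ran before the cache write.
  await Future<void>.value();

  final future = _fetch(key).whenComplete(() {
    _pending.remove(key);
  });
  _pending[key] = future; // Too late: siblings already passed the check.
  return future;
}
```

Calling `racyQuery` three times with the same key in one tick issues three requests under these assumptions, which is the pattern the dedup was meant to prevent.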

Commits

feat(client): add recoverStateOnReconnect option… — cherry-pick of a5dfcd3b5 from PR #2636 (originally merged into v10.0.0, then reverted). Adds a recoverStateOnReconnect flag on StreamChatClient (default true for raw-LLC consumers); StreamChatCore sets it to false because list controllers handle their own recovery. Also drops backgroundKeepAlive from 1 min → 15 s. Kills the client-level redundant QC on reconnect.

fix(core): debounce connectivity events… — adds a 3 s debounceTime on the connectivity_plus stream in _ChatLifecycleManager. Long enough to absorb a typical cellular handover (≤2 s), short enough to stay below the user-perceptual "chat feels slow" threshold on a real reconnect. Coalesces rapid flap patterns into one trailing-edge reconnect.
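For illustration, a trailing-edge debounce can be sketched with only `dart:async`. This is an assumption-laden stand-in, not the PR's implementation (the review summary further down says the SDK applies rxdart's `debounceTime`); `debounceTrailing` is a hypothetical helper name, and unlike some debounce implementations it drops a pending event when the source closes.

```dart
import 'dart:async';

/// Hypothetical helper (not part of the SDK): trailing-edge debounce.
/// An event is forwarded only after [window] passes with no newer event.
Stream<T> debounceTrailing<T>(Stream<T> source, Duration window) {
  late StreamController<T> controller;
  StreamSubscription<T>? subscription;
  Timer? timer;

  controller = StreamController<T>(
    onListen: () {
      subscription = source.listen(
        (event) {
          // A newer event supersedes the pending one and restarts the window.
          timer?.cancel();
          timer = Timer(window, () => controller.add(event));
        },
        onError: controller.addError,
        onDone: () {
          // Simplification: a still-pending event is dropped on close.
          timer?.cancel();
          controller.close();
        },
      );
    },
    onCancel: () async {
      timer?.cancel();
      await subscription?.cancel();
    },
  );
  return controller.stream;
}
```

With a 3 s window, a flap such as wifi, none, mobile arriving within a second would surface only the final `mobile` event, so a reconnect handler attached to the debounced stream would run once.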

test(core): wait for background timer… — small test follow-up to the cherry-pick: the no-handler disconnect path now goes through the keep-alive timer, so the test must run under runAsync with a short keep-alive and await the timer. Matches the pattern of the existing timer-expiry test (and matches commit 308581d24 from PR #2636).

fix(llc): coalesce concurrent queryChannels via InFlightCache — fixes the TOCTOU race. Extracts the dedup pattern into a generic InFlightCache<K, V> utility (mirroring the Android SDK's DistinctChatApi / DistinctCall design). Cache slot is reserved synchronously after the hash check, so concurrent callers find the in-flight future before the first await yields control. Concurrent callers share both success AND failure outcomes — the fall-through-on-error retry path is removed because it would defeat the dedup precisely when it matters most (during a 429 rate-limit storm).
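The coalescing idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the merged code: the class name comes from this PR, but the `run` method name and its signature are hypothetical, and the real utility may differ in detail.

```dart
import 'dart:async';

/// Illustrative sketch only; the SDK's actual InFlightCache may differ.
class InFlightCache<K, V> {
  final _inFlight = <K, Future<V>>{};

  /// Hypothetical method name. Runs [compute] for [key] unless a call for
  /// the same key is already in flight; then the pending future is shared.
  Future<V> run(K key, Future<V> Function() compute) {
    final pending = _inFlight[key];
    if (pending != null) return pending;

    // Future.sync converts a synchronous throw from compute into a failed
    // future, so the slot is still cleaned up by whenComplete.
    late final Future<V> future;
    future = Future.sync(compute).whenComplete(() {
      // Free the slot once settled so later callers start a fresh request.
      if (identical(_inFlight[key], future)) {
        _inFlight.remove(key);
      }
    });

    // Reserve the slot synchronously, before any await can yield control.
    _inFlight[key] = future;
    return future;
  }
}
```

Since there is no `await` between the lookup and the slot write, callers in the same tick receive the same future, and they share its error if it fails. A caller arriving after settlement finds the slot empty and triggers a new request.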

Impact

For the customer's burst pattern (13 QCs / 258 ms screenshot):

  • Per-reconnect QCs: 13 → 1 (client-level call gated, controller refresh deduped, cache race fixed)
  • Per-minute under sustained flap: ~130-260 QCs → ~12 QCs (3+ s coalescing window, single refresh per actual reconnect)

Test plan

  • dart test packages/stream_chat/test/src/client/client_test.dart — 161 tests pass (incl. 4 new regression tests for concurrent dedup, sequential refresh, different-filter no-share, concurrent error sharing)
  • dart test packages/stream_chat/test/src/core/util/in_flight_cache_test.dart — 6 unit tests pass (coalescing, slot lifecycle, error sharing, retry-after-failure, sync-throw safety)
  • flutter test packages/stream_chat_flutter_core/test/stream_chat_core_test.dart — 15 tests pass (incl. 2 new regression tests for debounce coalescing and post-debounce-window behavior)
  • Manually verify on a flaky cellular network that reconnect doesn't trigger a QC storm

Screenshots / Videos

n/a — no UI changes.

Summary by CodeRabbit

  • New Features

    • Add an option to control whether the client re-queries active channels after a reconnect.
    • Coalesce concurrent identical online channel queries so multiple callers share one request.
  • Behavior & Reliability

    • Online channel queries now time out after 30s and surface a warning on timeout.
    • Connectivity events are debounced to avoid rapid reconnect cycles.
  • Performance & Battery

    • Default background keep-alive reduced from 60s to 15s for better battery usage.
  • Tests

    • Added regression tests covering query coalescing and connectivity debounce behavior.

xsahil03x added 4 commits May 18, 2026 02:31
connectivity_plus emits multiple events per real network transition
(cell handovers, brief drops, type changes). Each emit was calling
maybeReconnect() which bypasses the WS internal backoff via
closeConnection() + openConnection(), and each successful reconnect
fires connectionRecovered — fanning out into queryChannels refreshes
from every list controller.

Debounce the connectivity stream by 3 s so a flapping window collapses
into a single reconnect at the trailing edge. Long enough to absorb a
typical cellular handover (≤2 s), short enough to stay below the
user-perceptual "chat feels slow" threshold on a real reconnect.

Adds two regression tests and updates the three existing connectivity
tests to pump past the debounce window instead of relying on
pumpAndSettle (which doesn't reliably wait for Timer-based work).

After the recoverStateOnReconnect commit, the no-handler disconnect
path also goes through the backgroundKeepAlive timer rather than
closing immediately. Run the test under runAsync with a short
keep-alive and await the timer before verifying closeConnection,
matching the existing timer-expiry test pattern.

The cache that was supposed to dedup in-flight queryChannels had a
TOCTOU race: the cache write happened after an offline-await, so N
sibling callers in the same event-loop tick all missed the cache and
each fired its own HTTP request. On reconnect, a burst of message.new
events for unknown cids could fan out into 13+ concurrent queryChannels
calls — observed as the rate-limit storm in production telemetry.

Extract the dedup pattern into InFlightCache<K, V> — a generic
map-backed coalescer that reserves the slot synchronously after the
hash check, so siblings find the in-flight future before the first
await yields control. Mirrors the DistinctChatApi / DistinctCall design
already used across the Android SDK.

Concurrent callers share both success and failure outcomes; the
fall-through-on-error retry path is removed because it would defeat the
dedup precisely when it matters most (during a 429 rate-limit storm).
Sequential callers arriving after the future settles still see an
empty cache and start fresh via the whenComplete cleanup.

Adds unit tests for InFlightCache covering coalescing, slot lifecycle,
error sharing, retry-after-failure, and synchronous-throw safety. Adds
four regression tests on client.queryChannels covering the same
behaviors at the SDK integration level.

coderabbitai Bot commented May 18, 2026

Walkthrough

This PR adds an InFlightCache to coalesce concurrent identical channel queries and exposes a client-level recoverStateOnReconnect control. StreamChatCore now debounces connectivity events (3s) and disables client recovery when it manages reconnection itself. Background keep-alive defaults drop to 15s, and tests are added or updated for these behaviors.

Changes

Request Coalescing and Connectivity Recovery Flow

| Layer / File(s) | Summary |
|---|---|
| In-flight cache deduplication utility: packages/stream_chat/lib/src/core/util/in_flight_cache.dart, packages/stream_chat/test/src/core/util/in_flight_cache_test.dart | InFlightCache<K, V> shares in-flight futures across concurrent callers requesting the same key, clears the slot after settlement, and is validated with tests covering coalescing, per-key isolation, error sharing, and re-invocation after failures. |
| StreamChatClient recovery control and queryChannels coalescing: packages/stream_chat/lib/src/client/client.dart, packages/stream_chat/test/src/client/client_test.dart, packages/stream_chat/CHANGELOG.md | StreamChatClient adds recoverStateOnReconnect constructor parameter and setter to control client-side recovery on reconnect; replaces manual per-hash future caching with InFlightCache to coalesce concurrent identical queryChannels calls while emitting offline results per caller; tests verify coalescing into a single HTTP request, sequential call independence, parameter-based cache isolation, and shared error outcomes; changelog updated. |
| StreamChatCore connectivity debouncing and lifecycle changes: packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart, packages/stream_chat_flutter_core/test/stream_chat_core_test.dart, packages/stream_chat_flutter_core/CHANGELOG.md | StreamChatCore imports rxdart, applies skip(1) + debounceTime(3s) to connectivity events, forwards debounced events to the lifecycle manager, sets client.recoverStateOnReconnect = false on init/client swap, refactors background handling to cancel timers/subscriptions first, and updates tests for deterministic debounce and background timing; changelog updated. |
| Background keep-alive timeout adjustment: packages/stream_chat_flutter/lib/src/stream_chat.dart, packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart, packages/stream_chat_flutter_core/test/stream_chat_core_test.dart | Default backgroundKeepAlive duration reduced from 1 minute to 15 seconds in StreamChat, StreamChatCore, _ChatLifecycleManager, and test helper, uniformly shortening the app-background disconnection timeout. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • Brazol

Poem

🐰 A rabbit hops through request queues,
Sharing futures, dodging dues—
When wifi flutters near and far,
Debounce keeps the reconnect bar.
Fifteen seconds, quick and spry,
Recovery hops by and by.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly summarizes the main fix: coalescing queryChannels requests during network reconnects to prevent burst queries on flaky networks. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart (1)

223-227: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Recreate lifecycle manager when client or lifecycle config changes.

Line 273 handles new client flags, but _lifecycleManager (Line 223) still points to the old client/config. After a widget update, connectivity/app lifecycle callbacks can keep driving the old client instance.

💡 Suggested fix
-  late final _lifecycleManager = _ChatLifecycleManager(
-    client: client,
-    backgroundKeepAlive: widget.backgroundKeepAlive,
-    onBackgroundEvent: widget.onBackgroundEventReceived,
-  );
+  late _ChatLifecycleManager _lifecycleManager;
+
+  _ChatLifecycleManager _createLifecycleManager() => _ChatLifecycleManager(
+        client: client,
+        backgroundKeepAlive: widget.backgroundKeepAlive,
+        onBackgroundEvent: widget.onBackgroundEventReceived,
+      );

   @override
   void initState() {
     super.initState();
+    _lifecycleManager = _createLifecycleManager();
     WidgetsBinding.instance.addObserver(this);
     _subscribeToConnectivityChange(widget.connectivityStream);
@@
   @override
   void didUpdateWidget(StreamChatCore oldWidget) {
     super.didUpdateWidget(oldWidget);
-    if (widget.client != oldWidget.client) {
-      widget.client.recoverStateOnReconnect = false;
-    }
+    final lifecycleInputsChanged =
+        widget.client != oldWidget.client ||
+        widget.backgroundKeepAlive != oldWidget.backgroundKeepAlive ||
+        widget.onBackgroundEventReceived != oldWidget.onBackgroundEventReceived;
+
+    if (widget.client != oldWidget.client) {
+      widget.client.recoverStateOnReconnect = false;
+    }
+
+    if (lifecycleInputsChanged) {
+      _lifecycleManager.dispose();
+      _lifecycleManager = _createLifecycleManager();
+      _subscribeToConnectivityChange(widget.connectivityStream);
+      return;
+    }
+
     final connectivityStream = widget.connectivityStream;
     if (connectivityStream != oldWidget.connectivityStream) {
       _unsubscribeFromConnectivityChange();
       _subscribeToConnectivityChange(connectivityStream);
     }
   }

Also applies to: 271-280

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart` around lines
223 - 227, The _lifecycleManager is created with the initial client and
lifecycle options but isn't recreated when those inputs change; update the
widget's update handling (e.g., didUpdateWidget) to detect changes to client,
widget.backgroundKeepAlive, or widget.onBackgroundEventReceived and dispose the
existing _lifecycleManager and create a new _ChatLifecycleManager bound to the
new client and options; ensure you call the existing _lifecycleManager.dispose()
(or equivalent) before replacing it so the old manager stops observing
connectivity/app lifecycle for the old client.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e5fe132-8a33-4810-b8d9-19d8234ceceb

📥 Commits

Reviewing files that changed from the base of the PR and between 1168767 and 33ef5c9.

📒 Files selected for processing (7)
  • packages/stream_chat/lib/src/client/client.dart
  • packages/stream_chat/lib/src/core/util/in_flight_cache.dart
  • packages/stream_chat/test/src/client/client_test.dart
  • packages/stream_chat/test/src/core/util/in_flight_cache_test.dart
  • packages/stream_chat_flutter/lib/src/stream_chat.dart
  • packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart
  • packages/stream_chat_flutter_core/test/stream_chat_core_test.dart

@xsahil03x xsahil03x force-pushed the fix/recover-state-on-reconnect branch from 6f9ffd8 to 33ef5c9 Compare May 18, 2026 02:54

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 65.38462% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.29%. Comparing base (1139f35) to head (1634d38).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/stream_chat/lib/src/client/client.dart | 30.00% | 7 Missing ⚠️ |
| ...am_chat_flutter_core/lib/src/stream_chat_core.dart | 81.81% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2652      +/-   ##
==========================================
- Coverage   65.30%   65.29%   -0.01%     
==========================================
  Files         422      423       +1     
  Lines       26640    26642       +2     
==========================================
+ Hits        17396    17397       +1     
- Misses       9244     9245       +1     


xsahil03x and others added 2 commits May 18, 2026 15:54
- llc 🐞 Fixed: queryChannels coalescing via in-flight cache.
- llc ✅ Added: StreamChatClient.recoverStateOnReconnect setter.
- core 🐞 Fixed: 3s connectivity-event debounce in StreamChatCore.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@xsahil03x xsahil03x merged commit ed12a7d into master May 18, 2026
21 of 23 checks passed
@xsahil03x xsahil03x deleted the fix/recover-state-on-reconnect branch May 18, 2026 14:13