fix(llc, core): coalesce queryChannels on flaky-network reconnects#2652
Conversation
…dKeepAlive duration
connectivity_plus emits multiple events per real network transition (cell handovers, brief drops, type changes). Each emit was calling maybeReconnect() which bypasses the WS internal backoff via closeConnection() + openConnection(), and each successful reconnect fires connectionRecovered — fanning out into queryChannels refreshes from every list controller. Debounce the connectivity stream by 3 s so a flapping window collapses into a single reconnect at the trailing edge. Long enough to absorb a typical cellular handover (≤2 s), short enough to stay below the user-perceptual "chat feels slow" threshold on a real reconnect. Adds two regression tests and updates the three existing connectivity tests to pump past the debounce window instead of relying on pumpAndSettle (which doesn't reliably wait for Timer-based work).
After the recoverStateOnReconnect commit, the no-handler disconnect path also goes through the backgroundKeepAlive timer rather than closing immediately. Run the test under runAsync with a short keep-alive and await the timer before verifying closeConnection, matching the existing timer-expiry test pattern.
The cache that was supposed to dedup in-flight queryChannels had a TOCTOU race: the cache write happened after an offline-await, so N sibling callers in the same event-loop tick all missed the cache and each fired its own HTTP request. On reconnect, a burst of message.new events for unknown cids could fan out into 13+ concurrent queryChannels calls — observed as the rate-limit storm in production telemetry. Extract the dedup pattern into InFlightCache<K, V> — a generic map-backed coalescer that reserves the slot synchronously after the hash check, so siblings find the in-flight future before the first await yields control. Mirrors the DistinctChatApi / DistinctCall design already used across the Android SDK. Concurrent callers share both success and failure outcomes; the fall-through-on-error retry path is removed because it would defeat the dedup precisely when it matters most (during a 429 rate-limit storm). Sequential callers arriving after the future settles still see an empty cache and start fresh via the whenComplete cleanup. Adds unit tests for InFlightCache covering coalescing, slot lifecycle, error sharing, retry-after-failure, and synchronous-throw safety. Adds four regression tests on client.queryChannels covering the same behaviors at the SDK integration level.
📝 WalkthroughWalkthroughThis PR adds an InFlightCache to coalesce concurrent identical channel queries, exposes client-level ChangesRequest Coalescing and Connectivity Recovery Flow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart (1)
223-227:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRecreate lifecycle manager when
clientor lifecycle config changes.Line 273 handles new client flags, but
_lifecycleManager(Line 223) still points to the oldclient/config. After a widget update, connectivity/app lifecycle callbacks can keep driving the old client instance.💡 Suggested fix
- late final _lifecycleManager = _ChatLifecycleManager( - client: client, - backgroundKeepAlive: widget.backgroundKeepAlive, - onBackgroundEvent: widget.onBackgroundEventReceived, - ); + late _ChatLifecycleManager _lifecycleManager; + + _ChatLifecycleManager _createLifecycleManager() => _ChatLifecycleManager( + client: client, + backgroundKeepAlive: widget.backgroundKeepAlive, + onBackgroundEvent: widget.onBackgroundEventReceived, + ); `@override` void initState() { super.initState(); + _lifecycleManager = _createLifecycleManager(); WidgetsBinding.instance.addObserver(this); _subscribeToConnectivityChange(widget.connectivityStream); @@ `@override` void didUpdateWidget(StreamChatCore oldWidget) { super.didUpdateWidget(oldWidget); - if (widget.client != oldWidget.client) { - widget.client.recoverStateOnReconnect = false; - } + final lifecycleInputsChanged = + widget.client != oldWidget.client || + widget.backgroundKeepAlive != oldWidget.backgroundKeepAlive || + widget.onBackgroundEventReceived != oldWidget.onBackgroundEventReceived; + + if (widget.client != oldWidget.client) { + widget.client.recoverStateOnReconnect = false; + } + + if (lifecycleInputsChanged) { + _lifecycleManager.dispose(); + _lifecycleManager = _createLifecycleManager(); + _subscribeToConnectivityChange(widget.connectivityStream); + return; + } + final connectivityStream = widget.connectivityStream; if (connectivityStream != oldWidget.connectivityStream) { _unsubscribeFromConnectivityChange(); _subscribeToConnectivityChange(connectivityStream); } }Also applies to: 271-280
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart` around lines 223 - 227, The _lifecycleManager is created with the initial client and lifecycle options but isn't recreated when those inputs change; update the widget's update handling (e.g., didUpdateWidget) to detect changes to client, widget.backgroundKeepAlive, or widget.onBackgroundEventReceived and dispose the existing _lifecycleManager and create a new _ChatLifecycleManager bound to the new client and options; ensure you call the existing _lifecycleManager.dispose() (or equivalent) before replacing it so the old manager stops observing connectivity/app lifecycle for the old client.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@packages/stream_chat_flutter_core/lib/src/stream_chat_core.dart`:
- Around line 223-227: The _lifecycleManager is created with the initial client
and lifecycle options but isn't recreated when those inputs change; update the
widget's update handling (e.g., didUpdateWidget) to detect changes to client,
widget.backgroundKeepAlive, or widget.onBackgroundEventReceived and dispose the
existing _lifecycleManager and create a new _ChatLifecycleManager bound to the
new client and options; ensure you call the existing _lifecycleManager.dispose()
(or equivalent) before replacing it so the old manager stops observing
connectivity/app lifecycle for the old client.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e5fe132-8a33-4810-b8d9-19d8234ceceb
📒 Files selected for processing (7)
packages/stream_chat/lib/src/client/client.dartpackages/stream_chat/lib/src/core/util/in_flight_cache.dartpackages/stream_chat/test/src/client/client_test.dartpackages/stream_chat/test/src/core/util/in_flight_cache_test.dartpackages/stream_chat_flutter/lib/src/stream_chat.dartpackages/stream_chat_flutter_core/lib/src/stream_chat_core.dartpackages/stream_chat_flutter_core/test/stream_chat_core_test.dart
6f9ffd8 to
33ef5c9
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2652 +/- ##
==========================================
- Coverage 65.30% 65.29% -0.01%
==========================================
Files 422 423 +1
Lines 26640 26642 +2
==========================================
+ Hits 17396 17397 +1
- Misses 9244 9245 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- llc 🐞 Fixed: queryChannels coalescing via in-flight cache. - llc ✅ Added: StreamChatClient.recoverStateOnReconnect setter. - core 🐞 Fixed: 3s connectivity-event debounce in StreamChatCore. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linear: FLU-493
Description of the pull request
Customer report (Abwaab): a live session with ~260 students hit the per-user QueryChannels rate limit. Telemetry showed bursts of 13+ QC requests in 258 ms from a single iOS device on unstable cellular. Three independent SDK behaviors were compounding into the storm:
connectionRecovered(client.dart direct call + eachStreamChannelListController.refresh()).connectivity_plusemits multiple events per real network transition; each one forced a freshopenConnection(bypassing the WS internal backoff), each successful reconnect fanned out into more QCs.This PR addresses all three at the right layer.
Commits
feat(client): add recoverStateOnReconnect option…— cherry-pick ofa5dfcd3b5from PR #2636 (originally merged intov10.0.0, then reverted). Adds arecoverStateOnReconnectflag onStreamChatClient(defaulttruefor raw-LLC consumers);StreamChatCoresets it tofalsebecause list controllers handle their own recovery. Also dropsbackgroundKeepAlivefrom 1 min → 15 s. Kills the client-level redundant QC on reconnect.fix(core): debounce connectivity events…— adds a 3 sdebounceTimeon theconnectivity_plusstream in_ChatLifecycleManager. Long enough to absorb a typical cellular handover (≤2 s), short enough to stay below the user-perceptual "chat feels slow" threshold on a real reconnect. Coalesces rapid flap patterns into one trailing-edge reconnect.test(core): wait for background timer…— small test follow-up to the cherry-pick: the no-handler disconnect path now goes through the keep-alive timer, so the test must run underrunAsyncwith a short keep-alive and await the timer. Matches the pattern of the existing timer-expiry test (and matches commit308581d24from PR #2636).fix(llc): coalesce concurrent queryChannels via InFlightCache— fixes the TOCTOU race. Extracts the dedup pattern into a genericInFlightCache<K, V>utility (mirroring the Android SDK'sDistinctChatApi/DistinctCalldesign). Cache slot is reserved synchronously after the hash check, so concurrent callers find the in-flight future before the first await yields control. Concurrent callers share both success AND failure outcomes — the fall-through-on-error retry path is removed because it would defeat the dedup precisely when it matters most (during a 429 rate-limit storm).Impact
For the customer's burst pattern (13 QCs / 258 ms screenshot):
Test plan
dart test packages/stream_chat/test/src/client/client_test.dart— 161 tests pass (incl. 4 new regression tests for concurrent dedup, sequential refresh, different-filter no-share, concurrent error sharing)dart test packages/stream_chat/test/src/core/util/in_flight_cache_test.dart— 6 unit tests pass (coalescing, slot lifecycle, error sharing, retry-after-failure, sync-throw safety)flutter test packages/stream_chat_flutter_core/test/stream_chat_core_test.dart— 15 tests pass (incl. 2 new regression tests for debounce coalescing and post-debounce-window behavior)Screenshots / Videos
n/a — no UI changes.
Summary by CodeRabbit
New Features
Behavior & Reliability
Performance & Battery
Tests