Simplify FlowMap write path with synchronized + @Volatile by rossdanderson · Pull Request #7 · caplin/DataSource-Extensions

rossdanderson · 2026-05-21T13:31:39Z

Serialize FlowMap writes through a synchronized block (drops the per-subscriber TreeMap reordering layer) and back the state with a @Volatile var instead of MutableStateFlow (one atomic op per write instead of two).

Throughput vs main:

| Benchmark | main | branch | Δ |
|---|---|---|---|
| asFlowWithStateCollection | 1442 | 2408 | +67% |
| concurrentPutCycling (4 contended writers) | 2091 | 3373 | +61% |
| putChanging | 14296 | 16559 | +16% |
| putAllSmall | 19376 | 21809 | +13% |
| putCycling (256 keys) | 9635 | 10790 | +12% |
| asFlowCollection | 38.3 | 41.0 | +7% |

Small wins elsewhere; no regressions outside noise. End-to-end drained throughput is unchanged (dispatcher-limited).


Drop the per-subscriber TreeMap reordering (orderedSignal) by serializing writes through a single synchronized block. The previous design CAS'd the state version and then called tryEmit asynchronously, so concurrent writers could emit out of order; each subscriber then rebuffered via a TreeMap to restore version order. With writes serialized at the producer, the whole reordering layer disappears (~40 lines removed) and most paths get faster.

Also add benchmarks that exercise real mutations (putChanging, putCycling, putAllLarge, putChangingWithSubscribers) and contended writes (concurrentPutCycling with @Threads(4)). The existing putSingle and putWithSubscribers put the same key/value repeatedly, so they short-circuit after iteration 1 and measure the no-op fast path, not the put.

Throughput deltas (before -> after):

  asFlowWithStateCollection         +63% (subscriber initial collection)
  asFlowCollection                  +16%
  putAllSmall                       +14%
  concurrentPutCycling (4 threads)  +12%
  putWithSubscribers, putSingle     +3%
  putAllLarge, getFromLarge         ~flat
  putChanging                       -9%
  putAndRemove                      -14%
  putChangingWithSubscribers(1)     -19%
  putCycling (256 keys, single)     -39%

Subscriber and collection paths see large gains. Single-threaded mutating writes regress because each put now pays for both a synchronized monitor and a separate MutableStateFlow.value setter (its own internal CAS). A follow-up replacing MutableStateFlow with a @Volatile var (valueFlow derived from signal) should reclaim that.
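A minimal sketch of the serialized write path described above, not the actual FlowMap code. `SerializedMap`, `Event`, `State` and the `emit` callback are hypothetical names; `emit` stands in for the `tryEmit` on the shared flow, and a plain copied `Map` stands in for whatever map type the real class uses. It uses only the Kotlin standard library so the ordering property can be checked in isolation.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import kotlin.concurrent.thread

// Hypothetical event and state types; the real FlowMap types are not shown in this PR.
data class Event<K, V>(val version: Long, val key: K, val value: V)
data class State<K, V>(val version: Long, val map: Map<K, V>)

class SerializedMap<K, V>(private val emit: (Event<K, V>) -> Unit) {
    private val lock = Any()

    // Written only while holding `lock`; @Volatile lets readers skip the lock.
    @Volatile
    private var state = State<K, V>(0L, emptyMap())

    fun get(key: K): V? = state.map[key]

    fun put(key: K, value: V) {
        synchronized(lock) {
            val current = state
            // No-op fast path: the same value is already stored under this key.
            if (current.map.containsKey(key) && current.map[key] == value) return
            val next = State(current.version + 1, current.map + (key to value))
            state = next
            // Emitting inside the lock keeps emission order equal to version order,
            // so subscribers need no reordering buffer.
            emit(Event(next.version, key, value))
        }
    }
}

fun main() {
    val seen = CopyOnWriteArrayList<Long>()
    val map = SerializedMap<Int, Int> { seen.add(it.version) }
    // Four contended writers cycling over 256 keys, each put changing the value.
    val writers = (0 until 4).map { w ->
        thread { repeat(1_000) { i -> map.put(i % 256, w * 1_000 + i) } }
    }
    writers.forEach { it.join() }
    check(seen == seen.sorted()) { "events left the producer out of version order" }
    println("events=${seen.size}, all in version order")
}
```

Because the version bump and the emit happen under the same monitor, no writer can publish version n+1 before version n, which is the property the per-subscriber TreeMap previously had to restore.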

With writes already serialized through the synchronized block, the MutableStateFlow was redundant - every put paid for a monitor acquire/release AND the StateFlow's internal CAS-based setter. Removing the second atomic op reclaims the write-path regressions the previous commit introduced.

valueFlow now derives from `signal` instead of `state`. Same per-key semantics; the one behavioural difference is that slow valueFlow consumers no longer benefit from StateFlow's implicit conflation, which matches the documented contract ("Events can be conflated with [conflate]" - opt-in, not implicit).

Throughput deltas vs the previous commit (ops/ms):

  putCycling                        +87% (was -39% vs original)
  putChangingWithSubscribers(1)     +28%
  putChanging                       +24% (was -9% vs original)
  putAndRemove                      +19% (was -14% vs original)
  concurrentPutCycling              +17%
  putChangingWithSubscribers(100)   +8%
  most others                       ~flat or small wins
  asFlowCollection                  -9% (still +5% vs original)
  asFlowWithStateCollection         -28% (still +17% vs original)

Write paths are now clear wins instead of the trade-off they were after synchronized alone. Subscriber-collection gives back some of the gain but stays ahead of the original.
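As a rough illustration of deriving a per-key value flow from the event signal, assuming kotlinx.coroutines is on the classpath: the `Event` type and this `valueFlow` helper are hypothetical and only show the filter-by-event-key idea. The real valueFlow may also emit the current value first or handle removals, which this sketch does not attempt.

```kotlin
import kotlinx.coroutines.CoroutineStart
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.filter
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.flow.take
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical event type carried by the signal.
data class Event<K, V>(val key: K, val value: V)

// Per-key view of the signal: keep only events for `key`, no map lookup needed.
fun <K, V> valueFlow(signal: Flow<Event<K, V>>, key: K): Flow<V> =
    signal.filter { it.key == key }.map { it.value }

fun main() = runBlocking {
    // Large buffer so tryEmit never fails, mirroring the Int.MAX_VALUE capacity mentioned in this PR.
    val signal = MutableSharedFlow<Event<String, Int>>(extraBufferCapacity = Int.MAX_VALUE)
    val received = mutableListOf<Int>()

    // UNDISPATCHED makes the collector subscribe before the emits below run.
    val job = launch(start = CoroutineStart.UNDISPATCHED) {
        valueFlow(signal, "a").take(2).toList(received)
    }

    signal.tryEmit(Event("a", 1))
    signal.tryEmit(Event("b", 9)) // filtered out: different key
    signal.tryEmit(Event("a", 2))
    job.join()
    println(received)
}
```

A slow consumer that only cares about the latest value can opt in to conflation explicitly with `valueFlow(signal, "a").conflate()`; without it every event for the key is delivered.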

The existing putChangingWithSubscribers measures writer-only throughput: events go into the MutableSharedFlow buffer (Int.MAX_VALUE capacity) and the writer returns without waiting for subscribers to consume. That hides the real fan-out cost and produces a misleading non-monotonic curve where n=10 looks slower than n=100 - the writer is paying full O(N) wake-up cost at n=10 but mostly just buffering at n=100.

putChangingWithDrainingSubscribers blocks after each put until every subscriber has actually run its onEach. End-to-end results are monotonic in subscriber count, as expected:

  n=1   | 278 ops/ms (vs writer-only 3819 - 14x ahead)
  n=10  | 60 ops/ms  (vs writer-only 62 - same; subscribers keep up)
  n=100 | 6.2 ops/ms (vs writer-only 566 - 91x ahead)

Subscribers signal via an AtomicLong counter that the writer spin-waits on. A blocking Phaser/CountDownLatch on the subscriber side would deadlock with 100 subscribers on Dispatchers.Default (~12 threads) - workers would park before all parties could arrive.

Class-level KDoc table refreshed with this run's numbers, including the new benchmark.
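The draining pattern can be sketched roughly as follows, assuming kotlinx.coroutines and Java 9+ (for `Thread.onSpinWait`). `runDraining` is a hypothetical helper, not the benchmark from this PR: it has no JMH harness and emits plain integers rather than FlowMap events.

```kotlin
import java.util.concurrent.atomic.AtomicLong
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.CoroutineStart
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.cancel
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.collect
import kotlinx.coroutines.flow.onEach
import kotlinx.coroutines.launch

// Emits `puts` events to `subscribers` collectors and waits after each emit
// until every collector has run its onEach. Returns the total consumed count.
fun runDraining(subscribers: Int, puts: Int): Long {
    val signal = MutableSharedFlow<Int>(extraBufferCapacity = Int.MAX_VALUE)
    val consumed = AtomicLong()
    val scope = CoroutineScope(Dispatchers.Default)

    repeat(subscribers) {
        // UNDISPATCHED: each collector subscribes on the calling thread before
        // any emit happens, then resumes on Dispatchers.Default.
        scope.launch(start = CoroutineStart.UNDISPATCHED) {
            // Subscribers never block; they only bump a counter.
            signal.onEach { consumed.incrementAndGet() }.collect()
        }
    }

    repeat(puts) { i ->
        signal.tryEmit(i)
        val target = (i + 1).toLong() * subscribers
        // The writer spins; only this thread waits, so the dispatcher's
        // worker threads stay free to run the subscribers.
        while (consumed.get() < target) Thread.onSpinWait()
    }

    scope.cancel()
    return consumed.get()
}

fun main() {
    println(runDraining(subscribers = 100, puts = 50))
}
```

Keeping the wait on the writer side is the point of the design: if each subscriber parked on a latch instead, the limited pool behind Dispatchers.Default could fill with parked workers before the remaining subscribers ever got a thread.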

rossdanderson added 4 commits May 21, 2026 13:35

Filter valueFlow by event key instead of map lookup

6576bb5

rossdanderson merged commit 647d4bd into main May 21, 2026
6 checks passed

rossdanderson merged 4 commits into main from flowmap-performance
