Skip to content

fix: Dispatcher shutdown race leaves late-opened Reactor consumers running (#4075)#4081

Merged
iancooper merged 5 commits intoBrighterCommand:masterfrom
thomhurst:fix/4075-dispatcher-shutdown-race
Apr 26, 2026
Merged

fix: Dispatcher shutdown race leaves late-opened Reactor consumers running (#4075)#4081
iancooper merged 5 commits intoBrighterCommand:masterfrom
thomhurst:fix/4075-dispatcher-shutdown-race

Conversation

@thomhurst
Copy link
Copy Markdown
Contributor

Fixes #4075.

Root cause

Dispatcher.Start() flipped State = DS_RUNNING before opening consumers, so Receive()'s 100ms busy-wait could return while some consumers were still in ConsumerState.Shut. A Shut(subscription) or End() call racing into that window would no-op against those consumers — Consumer.Shut() only enqueues a quit when State == Open. The control task then opened them, leaving orphaned Reactor performers parked in Task.Delay(EmptyChannelDelay).GetAwaiter().GetResult() with no way to receive a quit message. End() returned the control task, which was blocked in Task.WaitAny waiting for performers that would never stop.

This matches the hang dump in #4075 exactly: control task in WaitAny, two Reactor performers parked in Task.Delay.

Fix

Two layers:

  1. Dispatcher.Start() — open every consumer and register its task before publishing State = DS_RUNNING. Replaced the 100ms busy-wait poll with a TaskCompletionSource<bool>, which gives callers of Start() an explicit happens-before edge over the opens — Receive() cannot return until every performer is Open. Exceptions during open propagate to the caller via TrySetException so it never deadlocks.

  2. Consumer — added a state lock + _shutRequested flag. A Shut() arriving before Open() records intent; the subsequent Open() honours it and stays Shut (no performer is started). Dispatcher.Start() filters out consumers whose Job is null when registering tasks. This is defence in depth for any future caller path that opens consumers off the control task.

Verification

New regression test: tests/Paramore.Brighter.Core.Tests/MessageDispatch/Reactor/When_dispatcher_shuts_immediately_after_receive_should_not_hang.cs — three invariants, each looped 25–50 iterations to amplify the race:

  • Receive() post-condition: every consumer is Open.
  • Shut immediately after ReceiveEnd finishes within 10s.
  • End immediately after Receive → finishes within 10s.

Pre-fix: all three fail (matches reporter's iter-16 local repro). Post-fix: 3/3 green. Full MessageDispatch suite (83 tests) passes on both net9.0 and net10.0.

Test plan

  • dotnet test --filter FullyQualifiedName~DispatcherShutImmediatelyAfterReceiveTests — green
  • dotnet test --filter FullyQualifiedName~MessageDispatch — 83/83 green on net9.0 + net10.0
  • dotnet test --filter "FullyQualifiedName~ControlBus|FullyQualifiedName~ServiceActivator" — green
  • CI run for the originally-flaky When_A_Message_Dispatcher_Shuts_A_Connection and When_A_Message_Dispatcher_Restarts_A_Connection_After_All_Connections_Have_Stopped

codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

thomhurst added a commit to thomhurst/Brighter that referenced this pull request Apr 26, 2026
Extract control-loop body into RunControlLoop, then split into
TryOpenConsumers, WaitForPerformersToStop, HandleNextStoppedPerformer,
and RemoveConsumerForTask. Each helper has at most one level of
conditional nesting, killing the "Bumpy Road Ahead" flag on
PR BrighterCommand#4081 without changing behavior.
codescene-delta-analysis[bot]

This comment was marked as outdated.

codescene-delta-analysis[bot]

This comment was marked as outdated.

thomhurst added a commit to thomhurst/Brighter that referenced this pull request Apr 26, 2026
Extract control-loop body into RunControlLoop, then split into
TryOpenConsumers, WaitForPerformersToStop, HandleNextStoppedPerformer,
and RemoveConsumerForTask. Each helper has at most one level of
conditional nesting, killing the "Bumpy Road Ahead" flag on
PR BrighterCommand#4081 without changing behavior.
@thomhurst thomhurst force-pushed the fix/4075-dispatcher-shutdown-race branch from 200c059 to 4c8d675 Compare April 26, 2026 12:54
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Member

@iancooper iancooper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Useful fix.

@iancooper iancooper added 3 - Done Bug .NET Pull requests that update .net code V10.X labels Apr 26, 2026
…) returns (BrighterCommand#4075)

Dispatcher.Start() flipped State to DS_RUNNING before opening consumers, so
Receive()'s busy-wait could return while some consumers were still Shut. A
Shut()/End() racing into that window no-op'd against not-yet-Open consumers
(Consumer.Shut only acts when State == Open). The control task then opened
those consumers, leaving orphan performers parked in Task.Delay forever and
hanging End() in Task.WaitAny.

Open every consumer and register its task before publishing DS_RUNNING, and
replace the 100ms poll with a TaskCompletionSource that gives callers an
explicit happens-before edge over the opens. Add a lock + pending-shut flag
on Consumer so a Shut() arriving before Open() is honoured (defence in depth
for any future caller path that opens off the control task).
Extract control-loop body into RunControlLoop, then split into
TryOpenConsumers, WaitForPerformersToStop, HandleNextStoppedPerformer,
and RemoveConsumerForTask. Each helper has at most one level of
conditional nesting, killing the "Bumpy Road Ahead" flag on
PR BrighterCommand#4081 without changing behavior.
- Collapse TryOpenConsumers (bool, dead false-branch) into void
  OpenConsumers; move TCS signalling and try/catch up into
  RunControlLoop where they belong.
- Iterate Consumers directly instead of OfType<Consumer>(); the
  IAmAConsumer interface already exposes Open/Job/JobId.
@thomhurst thomhurst force-pushed the fix/4075-dispatcher-shutdown-race branch from 4c8d675 to 8a1d90d Compare April 26, 2026 14:13
codescene-delta-analysis[bot]

This comment was marked as outdated.

Copy link
Copy Markdown

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Health Improved (1 files improve in Code Health)

Gates Passed
4 Quality Gates Passed

See analysis details in CodeScene

View Improvements
File Code Health Impact Categories Improved
Dispatcher.cs 8.55 → 9.39 Complex Method, Deep, Nested Complexity

Quality Gate Profile: Clean Code Collective
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

@iancooper iancooper merged commit f94246c into BrighterCommand:master Apr 26, 2026
25 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 - Done Bug .NET Pull requests that update .net code V10.X

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dispatcher shutdown race can leave late-opened Reactor consumers running after Shut()/End()

2 participants