Skip to content

fix: serialize TaskController lifecycle ops to prevent orphaned election tasks#146

Merged
Keshoid merged 3 commits into
release/nodectl/v0.5.0from
feature/sma-89-config-elections-enable-race-concurrent-calls-spawn
May 18, 2026
Merged

fix: serialize TaskController lifecycle ops to prevent orphaned election tasks#146
Keshoid merged 3 commits into
release/nodectl/v0.5.0from
feature/sma-89-config-elections-enable-race-concurrent-calls-spawn

Conversation

@Keshoid
Copy link
Copy Markdown
Contributor

@Keshoid Keshoid commented May 16, 2026

Problem

config elections enable/include/exclude HTTP handlers fire tokio::spawn(restart()) without serialization. TaskController::restart() is disable().await; enable().await with the inner std::sync::Mutex<State> released between phases (and dropped across handle.await). Two concurrent restarts can interleave such that a task is spawned by caller B, then its cancel/handle are overwritten by caller A's resumed enable() — orphaning B's task until process shutdown.

Fix

Add a tokio::sync::Mutex<()> (lifecycle) to TaskController. Public enable / disable / restart each acquire it first, then delegate to private _locked helpers (original bodies unchanged). restart() now holds the lock across the full disable→enable sequence, so concurrent callers queue instead of racing.

The existing std::sync::Mutex<State> is kept for short critical sections so status() stays non-blocking and isn't held back by long-running lifecycle work.

Scope (deliberately minimal)

  • HTTP handlers still use detached tokio::spawn(restart()) — now safe because restarts queue on lifecycle. Switching them to await would block the HTTP response on restart duration, which is a separate UX decision.
  • The "config read at spawn time" ordering note from the ticket is not addressed — harmless once atomicity is fixed.

Closes SMA-89

Copilot AI review requested due to automatic review settings May 16, 2026 21:00
@linear
Copy link
Copy Markdown

linear Bot commented May 16, 2026

SMA-89

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR serializes TaskController lifecycle operations to prevent concurrent restarts from interleaving and orphaning spawned service tasks.

Changes:

  • Added an async lifecycle mutex to serialize enable, disable, and restart.
  • Split lifecycle operations into public locked methods and private locked helpers.
  • Added a concurrent restart regression smoke test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/node-control/service/src/task/task_manager.rs Outdated
Comment thread src/node-control/service/src/task/task_manager.rs Outdated
Comment thread src/node-control/service/src/task/task_manager.rs
@Keshoid Keshoid requested a review from Lapo4kaKek May 17, 2026 18:32
@Keshoid Keshoid merged commit 8e48a2e into release/nodectl/v0.5.0 May 18, 2026
8 of 9 checks passed
@Keshoid Keshoid deleted the feature/sma-89-config-elections-enable-race-concurrent-calls-spawn branch May 18, 2026 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants