
fix(service): race condition in force_reload() #98

Merged
Keshoid merged 7 commits into release/nodectl/v0.4.0 from fix/task-reload
Apr 17, 2026
Conversation

Contributor

@Keshoid Keshoid commented Apr 16, 2026

Summary

  • Fix a race condition in force_reload(): the config snapshot was captured before reload() ran, but by the time reload() finished rebuilding runtime state (vault, RPC, wallets, pools), another concurrent REST API request could have mutated the config, so the runtime state was rebuilt from a stale config
  • Split reload into two clear methods: reload_state() (reads current config from store, used by force_reload) and reload(config) (accepts an external config, used for file-change reloads)
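The difference between the two reload paths can be illustrated with a minimal, synchronous sketch. Everything here is an assumption made for illustration: the `Config`, `RuntimeState`, and `RuntimeConfig` shapes, the `update` and `built_from` helpers, and the locking scheme are hypothetical, and the real methods are async and rebuild far more than a single field. The sketch only shows why reading the config at rebuild time avoids building from a stale snapshot.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical stand-in for the service config.
#[derive(Clone, Debug, PartialEq)]
pub struct Config {
    pub excluded: Vec<String>,
}

/// Hypothetical stand-in for the rebuilt runtime state (vault, RPC, wallets, pools).
#[derive(Debug, PartialEq)]
pub struct RuntimeState {
    pub built_from: Config,
}

pub struct RuntimeConfig {
    store: RwLock<Arc<Config>>,
    // Assumption: a single lock serializes rebuilds so two reloads cannot interleave.
    state: Mutex<Option<RuntimeState>>,
}

impl RuntimeConfig {
    pub fn new(config: Config) -> Self {
        Self { store: RwLock::new(Arc::new(config)), state: Mutex::new(None) }
    }

    /// Returns the config currently held in the store.
    pub fn get(&self) -> Arc<Config> {
        self.store.read().unwrap().clone()
    }

    /// Simulates a REST request that mutates the stored config.
    pub fn update(&self, f: impl FnOnce(&mut Config)) {
        let mut guard = self.store.write().unwrap();
        let mut next = (**guard).clone();
        f(&mut next);
        *guard = Arc::new(next);
    }

    /// File-change path: rebuilds state from a config supplied by the caller.
    /// (This sketch does not model writing that config back to the store.)
    pub fn reload(&self, config: Config) {
        let mut state = self.state.lock().unwrap();
        *state = Some(RuntimeState { built_from: config });
    }

    /// Forced path: reads the config from the store at rebuild time,
    /// so a mutation that landed earlier is always included.
    pub fn reload_state(&self) {
        let mut state = self.state.lock().unwrap();
        let current = (*self.get()).clone();
        *state = Some(RuntimeState { built_from: current });
    }

    pub fn force_reload(&self) {
        self.reload_state();
    }

    /// Returns the config the current runtime state was built from, if any.
    pub fn built_from(&self) -> Option<Config> {
        self.state.lock().unwrap().as_ref().map(|s| s.built_from.clone())
    }
}
```

In this sketch, capturing a snapshot with `get()`, applying `update(...)`, and then calling `reload(snapshot)` reproduces the stale rebuild, while `force_reload()` picks up the mutation because it reads the store only once the rebuild has started.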

Copilot AI review requested due to automatic review settings April 16, 2026 22:53
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to ensure runtime state/task behavior updates correctly when elections-related configuration changes, and improves CI artifact collection for nodectl e2e runs.

Changes:

  • Add an info log when a forced runtime cache reload is triggered.
  • Change elections include/exclude REST handlers to signal config_changed instead of directly restarting the elections task.
  • Update the nodectl e2e GitHub Actions workflow to always upload logs and set artifact retention/behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/node-control/service/src/runtime_config.rs | Adds logging around force_reload() to improve observability of cache reload triggers. |
| src/node-control/service/src/http/http_server_task.rs | Routes elections include/exclude mutations through the central config_changed notification mechanism. |
| .github/workflows/nodectl-e2e.yml | Adjusts e2e artifact upload to run always and adds retention/handling options. |

Comments suppressed due to low confidence (1)

.github/workflows/nodectl-e2e.yml:78

  • The e2e runner (run_singlehost_nodectl.py) writes the bootstrap/script log to src/node/tests/test_run_net_py/singlehost-bootstrap.log by default (not under tmp/). This workflow removed the bootstrap log from the uploaded artifact, so failures/successes may be harder to diagnose. Consider adding src/node/tests/test_run_net_py/singlehost-bootstrap.log to the path list (or otherwise ensure the script log is included).
          if-no-files-found: warn
          path: |
            src/node/tests/test_run_net_py/tmp/nodectl-config.json
            src/node/tests/test_run_net_py/tmp/nodectl-service.log


/// Use after REST mutations that change structural config (entities, endpoints).
pub async fn force_reload(&self) -> anyhow::Result<()> {
    let config = (*self.get()).clone();
    tracing::info!("force reloading is triggered");

Copilot AI Apr 16, 2026

The log message reads a bit ungrammatically ("force reloading is triggered") and is likely to become a frequently-emitted info log. Consider rewording to something clearer/consistent with other logs (e.g., "force reload triggered") and/or lowering to debug if this is primarily for troubleshooting.

Suggested change
- tracing::info!("force reloading is triggered");
+ tracing::info!("force reload triggered");

Contributor Author

log removed

}
})
.map_err(|e| AppError::internal(e.to_string()))?;


Copilot AI Apr 16, 2026

Switching from directly restarting elections_task to config_changed.notify_one() means applying include/exclude now depends on the service loop successfully completing runtime_cfg.force_reload() before tasks are restarted. Since force_reload() reopens the vault and reconnects RPC, a transient vault/RPC failure can prevent elections include/exclude from taking effect even though the config update succeeded. Consider keeping a direct elections-task restart for this endpoint (or using a lighter signal that restarts tasks without a full cache rebuild) so enable/disable can still apply when reload fails for unrelated reasons.

Suggested change
if let Err(e) = state.elections_task.restart().await {
    tracing::warn!(
        "failed to directly restart elections task after exclude update: {e}"
    );
}

Contributor Author

restored.
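The concern in this thread can be sketched as a handler step that restarts the elections task directly and treats a failed restart as a warning rather than an error. This is a rough, synchronous illustration only: `ElectionsTask`, `ConfigChanged`, and `after_exclude_update` are hypothetical stand-ins, the real handler is async and logs through tracing, and whether the merged handler also keeps the config_changed notification is not shown in this conversation (the sketch assumes it does).

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical stand-in for the elections task handle.
#[derive(Default)]
pub struct ElectionsTask {
    pub restarts: AtomicUsize,
    pub fail_restart: AtomicBool,
}

impl ElectionsTask {
    /// Restarts the task, or fails if `fail_restart` is set (simulated failure).
    pub fn restart(&self) -> Result<(), String> {
        if self.fail_restart.load(Ordering::SeqCst) {
            return Err("task is unavailable".to_string());
        }
        self.restarts.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
}

/// Hypothetical stand-in for the config_changed notifier; it only counts calls.
#[derive(Default)]
pub struct ConfigChanged {
    pub notified: AtomicUsize,
}

impl ConfigChanged {
    pub fn notify_one(&self) {
        self.notified.fetch_add(1, Ordering::SeqCst);
    }
}

/// Runs after the exclude list has been persisted.
/// Returns the warnings a real handler would log instead of failing the request.
pub fn after_exclude_update(task: &ElectionsTask, config_changed: &ConfigChanged) -> Vec<String> {
    let mut warnings = Vec::new();
    // Direct restart: does not depend on a full cache rebuild succeeding.
    if let Err(e) = task.restart() {
        warnings.push(format!(
            "failed to directly restart elections task after exclude update: {e}"
        ));
    }
    // Assumption: the broader reload is still signalled for the service loop.
    config_changed.notify_one();
    warnings
}
```

The design point is that the two effects are independent: a failed restart produces a warning but does not stop the notification, and a later reload failure cannot undo a restart that already happened.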

Comment on lines 531 to 533

- let task = state.elections_task.clone();
- tokio::spawn(async move {
-     let _ = task.restart().await;
- });
+ state.config_changed.notify_one();


Copilot AI Apr 16, 2026

Same concern as in the exclude handler: using config_changed.notify_one() makes the include operation’s effect contingent on a successful runtime_cfg.force_reload() cycle (vault/RPC reconnect). If reload fails, tasks won’t restart and the new bindings enablement may not take effect until a later reload succeeds. Consider restarting the elections task directly for this change (or separating “restart tasks” from “rebuild caches”).

Contributor Author

restored
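The alternative raised in this thread, separating "restart tasks" from "rebuild caches", might look roughly like the sketch below. It is not the approach the PR took (the direct restart was restored instead), and every name here (`Signal`, `Outcome`, `run_service_loop`) is hypothetical. A standard-library channel stands in for whatever notification mechanism the service loop actually uses.

```rust
use std::sync::mpsc;

/// Hypothetical signals a REST handler could send to the service loop.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Signal {
    /// Restart tasks only; no vault/RPC reconnect is needed.
    RestartTasks,
    /// Rebuild runtime caches, then restart tasks if the rebuild succeeds.
    RebuildCaches,
}

/// Counters describing what the loop did, for illustration.
#[derive(Debug, Default, PartialEq)]
pub struct Outcome {
    pub rebuilds_ok: usize,
    pub rebuilds_failed: usize,
    pub task_restarts: usize,
}

/// Processes signals until every sender has been dropped.
pub fn run_service_loop(
    signals: mpsc::Receiver<Signal>,
    mut rebuild_caches: impl FnMut() -> Result<(), String>,
) -> Outcome {
    let mut out = Outcome::default();
    for signal in signals {
        match signal {
            // The light signal never touches the rebuild path.
            Signal::RestartTasks => out.task_restarts += 1,
            Signal::RebuildCaches => match rebuild_caches() {
                Ok(()) => {
                    out.rebuilds_ok += 1;
                    out.task_restarts += 1;
                }
                // A failed rebuild skips the restart tied to it.
                Err(_) => out.rebuilds_failed += 1,
            },
        }
    }
    out
}
```

With this split, an include/exclude handler would send `Signal::RestartTasks`, so a transient vault or RPC failure during a rebuild cannot block the change from taking effect.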

@Keshoid Keshoid added run-e2e and removed run-e2e labels Apr 17, 2026
@Keshoid Keshoid removed the run-e2e label Apr 17, 2026
@Keshoid Keshoid added run-e2e and removed run-e2e labels Apr 17, 2026

linear Bot commented Apr 17, 2026

@Keshoid Keshoid changed the title from "Fix runtime state reload on elections settings change" to "fix(service): race condition in force_reload()" Apr 17, 2026
@Keshoid Keshoid added run-e2e and removed run-e2e labels Apr 17, 2026
@Keshoid Keshoid merged commit 6bfa77d into release/nodectl/v0.4.0 Apr 17, 2026
11 of 12 checks passed
@Keshoid Keshoid deleted the fix/task-reload branch April 17, 2026 12:39
3 participants