chore(orchestrator): log more info about local CUP before persisting it #7487

pierugo-dfinity · 2025-10-30T13:59:09Z

This PR logs more useful information (especially the state hash) about the local CUP just before persisting it in the orchestrator.

This is useful when the orchestrator breaks after an upgrade in a way that prevents provisioning read-only SSH keys to recover the subnet. In that case, there is no easy way to know the latest state hash to include in the recovery CUP, other than hoping that the recovery operator's node is up to date. Logging information about the CUP just before rebooting removes this requirement, as long as the latest logs were scraped before the node reboots.

Edit: Following the PR comments, the original solution had a flaw: if the node reboots too fast, the logs might not be scraped before the reboot. Since the state manager already logs the state hash before the CUP is actually created, we can rely on that log instead. The original twin PR (#7525), intended to test the functionality, now relies on the log from the state manager, which prevents that log from being removed in the future. It is also now open, since we do not need to wait for the current PR to be merged to mainnet NNS. The two PRs are independent.

Still, including the state hash in the orchestrator cannot hurt and this PR does just that.
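
For illustration only, here is a minimal sketch of the "log, then persist" ordering described above. It is not the code from this PR: `LocalCup`, `to_hex`, `describe_cup`, `persist_cup`, and the log wording are all hypothetical stand-ins. The real orchestrator uses its own CUP types and logger.

```rust
use std::fmt::Write as _;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical, simplified stand-in for a catch-up package (CUP).
/// The real type in the orchestrator carries much more data.
struct LocalCup {
    height: u64,
    state_hash: Vec<u8>,
    /// Serialized form that gets written to disk.
    bytes: Vec<u8>,
}

/// Hex-encode a byte slice so the hash can be copied out of the logs.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail, so the result is ignored.
        let _ = write!(out, "{:02x}", b);
    }
    out
}

/// Build the log line (assumed wording) with the CUP height and state hash.
fn describe_cup(cup: &LocalCup) -> String {
    format!(
        "Persisting local CUP: height={}, state_hash={}",
        cup.height,
        to_hex(&cup.state_hash)
    )
}

/// Log the CUP details first, then write the CUP to `path`.
fn persist_cup(cup: &LocalCup, path: &Path) -> io::Result<()> {
    // Logging before the write means the hash is already in the logs
    // even if something goes wrong afterwards.
    eprintln!("{}", describe_cup(cup));
    fs::write(path, &cup.bytes)
}
```

In this sketch a recovery operator would only need the scraped log line to read off the state hash, without access to the node itself.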

Regarding the original 2-second sleep at the end of the orchestrator to let Vector scrape late logs: there may be a way to persist logs before rebooting and ask `systemd-journal-gatewayd` to serve logs from the previous boot, but I do not think it is worth the effort (we would need to change the Vector configs, for example) just to recover a few missing log lines.



chore(orchestrator): add local CUP state hash metric

a9e12a7

github-actions bot added the chore label Oct 30, 2025

pierugo-dfinity added 2 commits October 31, 2025 15:37

Revert "chore(orchestrator): add local CUP state hash metric"

3408fd9

This reverts commit a9e12a7.

chore: log local CUP info

6f87909

pierugo-dfinity changed the title ~~chore(orchestrator): add local CUP state hash metric~~ chore(orchestrator): log local CUP info after persisting it Nov 4, 2025

pierugo-dfinity mentioned this pull request Nov 4, 2025

test(orchestrator): compare latest computed root hash pre-upgrade with CUP state hash post-upgrade #7525

Merged

pierugo-dfinity added 2 commits November 4, 2025 16:41

refactor: reword old log and move confirmation log

0f99dee

refactor: move sleep at the end of orchestrator stopping

58badb0

pierugo-dfinity added the CI_ALL_BAZEL_TARGETS label (runs all bazel targets and uploads them to S3) Nov 4, 2025

pierugo-dfinity added 9 commits November 4, 2025 17:05

style

2cdeb9c

chore: remove sleep

d0b8963

fix: clippy

3bb735e

Revert "fix: clippy"

5c0a596

This reverts commit 3bb735e.

Revert "chore: remove sleep"

f4ac3b0

This reverts commit d0b8963.

Reapply "chore: remove sleep"

bf7b931

This reverts commit f4ac3b0.

Reapply "fix: clippy"

4e6dc25

This reverts commit 5c0a596.

Merge branch 'master' into pierugo/orchestrator/local-cup-hash-metric

0dd87af

fix: sleep a bit before exiting

f07e934

pierugo-dfinity marked this pull request as ready for review November 13, 2025 12:54

pierugo-dfinity requested a review from a team as a code owner November 13, 2025 12:54

github-actions bot added the @consensus label Nov 13, 2025

eichhorl reviewed Nov 13, 2025

rs/orchestrator/src/main.rs (outdated)

pierugo-dfinity added 2 commits November 19, 2025 08:45

Revert "fix: sleep a bit before exiting"

6c27d6d

This reverts commit f07e934.

chore: remove second log

3900a73

pierugo-dfinity changed the title ~~chore(orchestrator): log local CUP info after persisting it~~ chore(orchestrator): log more info about local CUP before persisting it Nov 19, 2025

Merge branch 'master' into pierugo/orchestrator/local-cup-hash-metric

47c12a1

eichhorl approved these changes Nov 20, 2025


pierugo-dfinity added this pull request to the merge queue Nov 26, 2025

Merged via the queue into master with commit b0ea5e9 Nov 26, 2025
69 of 70 checks passed

pierugo-dfinity deleted the pierugo/orchestrator/local-cup-hash-metric branch November 26, 2025 13:27
