feat(ic-admin): Added take-subnet-offline-for-repairs. #7361

Merged
daniel-wong-dfinity-org merged 1 commit into master from take-subnet-offline-for-repairs-daniel-wong
Oct 22, 2025

@daniel-wong-dfinity-org daniel-wong-dfinity-org commented Oct 21, 2025

@github-actions github-actions bot added the feat label Oct 21, 2025
@daniel-wong-dfinity-org daniel-wong-dfinity-org force-pushed the take-subnet-offline-for-repairs-daniel-wong branch 2 times, most recently from 1f82bcf to 54691b1 Compare October 21, 2025 13:33
@daniel-wong-dfinity-org daniel-wong-dfinity-org marked this pull request as ready for review October 21, 2025 13:33
@daniel-wong-dfinity-org daniel-wong-dfinity-org requested a review from a team as a code owner October 21, 2025 13:33
@github-actions github-actions bot left a comment

This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):

  1. Update unreleased_changelog.md (if there are behavior changes, even if they are
    non-breaking).

  2. Are there BREAKING changes?

  3. Is a data migration needed?

  4. Security review?

How to Satisfy This Automatic Review

  1. Go to the bottom of the pull request page.

  2. Look for where it says this bot is requesting changes.

  3. Click the three dots to the right.

  4. Select "Dismiss review".

  5. In the text entry box, respond to each of the numbered items in the previous
    section by declaring one of the following:

  • Done.

  • $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No
    canister behavior changes.", or for item 2, "Existing APIs
    behave as before.".

Brief Guide to "Externally Visible" Changes

"Externally visible behavior change" is very often due to some NEW canister API.

Changes to EXISTING APIs are more likely to be "breaking".

If these changes are breaking, make sure that clients know how to migrate and how to
maintain continuity of operations.

If your changes are behind a feature flag, do NOT add entries to
unreleased_changelog.md in this PR. Instead, add them later, in the PR
that enables these changes in production.

Reference(s)

For a more comprehensive checklist, see here.


@daniel-wong-dfinity-org daniel-wong-dfinity-org dismissed github-actions[bot]’s stale review October 21, 2025 13:41
  1. No canister behavior changes.
  2. Non-breaking
  3. All existing data is fine.
  4. Will happen.
Base automatically changed from set_subnet_operational_level-impl-daniel-wong to master October 21, 2025 13:48
@daniel-wong-dfinity-org daniel-wong-dfinity-org force-pushed the take-subnet-offline-for-repairs-daniel-wong branch from 54691b1 to ee18e25 Compare October 22, 2025 08:59
@daniel-wong-dfinity-org daniel-wong-dfinity-org added this pull request to the merge queue Oct 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Oct 22, 2025
@daniel-wong-dfinity-org daniel-wong-dfinity-org added this pull request to the merge queue Oct 22, 2025
Merged via the queue into master with commit 0711ee3 Oct 22, 2025
29 checks passed
@daniel-wong-dfinity-org daniel-wong-dfinity-org deleted the take-subnet-offline-for-repairs-daniel-wong branch October 22, 2025 12:42
///
/// This is the first step of subnet recovery. Previously the first step was
/// done using propose-to-update-subnet. However, that does not support
/// changing ssn_node_state_write_access, which is needed when a subnet has
Suggested change
- /// changing ssn_node_state_write_access, which is needed when a subnet has
+ /// changing ssh_node_state_write_access, which is needed when a subnet has

let node_id = PrincipalId::from_str(node_id).map_err(|err| format!("{err}"))?;
let node_id = NodeId::from(node_id);

// Parse public_keys, by simply splitting on ','.<
Suggested change
- // Parse public_keys, by simply splitting on ','.<
+ // Parse public_keys, by simply splitting on ','.

let node_id = parts.next().ok_or("Missing node ID.")?;
let public_keys = parts
.next()
.ok_or("Missing semicolon separating node ID and SSH public key.")?
Suggested change
- .ok_or("Missing semicolon separating node ID and SSH public key.")?
+ .ok_or("Missing semicolon separating node ID and SSH public keys.")?
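
For readers following the three parsing fragments above, here is a minimal, standard-library-only sketch of how a flag value holding one node ID and a comma-separated list of SSH public keys could be parsed. It is not the code from this PR: `NodeSshAccess`, `parse_node_ssh_access`, and `SEPARATOR` are hypothetical names, the node ID is kept as a plain `String` rather than converted through `PrincipalId` and `NodeId`, and the `;` separator is an assumption taken from the wording of the error message.

```rust
/// Hypothetical stand-in for the flag value type. The real
/// `NodeSshAccessFlagValue` holds a `NodeId`; this sketch keeps a `String`
/// so that it compiles without any IC crates.
#[derive(Debug, PartialEq)]
struct NodeSshAccess {
    node_id: String,
    public_keys: Vec<String>,
}

/// Assumed separator between the node ID and the keys. The error message in
/// the diff calls it a "semicolon", so this sketch uses ';'.
const SEPARATOR: char = ';';

fn parse_node_ssh_access(s: &str) -> Result<NodeSshAccess, String> {
    // Split into at most two parts so the key list stays in one piece.
    let mut parts = s.splitn(2, SEPARATOR);

    let node_id = parts.next().ok_or("Missing node ID.")?.trim();
    let public_keys = parts
        .next()
        .ok_or("Missing semicolon separating node ID and SSH public keys.")?;

    // `splitn` always yields a first part, so an empty ID needs its own check.
    if node_id.is_empty() {
        return Err("Missing node ID.".to_string());
    }

    // Parse public_keys by simply splitting on ','. Empty entries are dropped,
    // so "node;" yields an empty key list (a choice made for this sketch).
    let public_keys: Vec<String> = public_keys
        .split(',')
        .map(|key| key.trim().to_string())
        .filter(|key| !key.is_empty())
        .collect();

    Ok(NodeSshAccess {
        node_id: node_id.to_string(),
        public_keys,
    })
}
```

Under these assumptions, calling `parse_node_ssh_access("node-1;ssh-ed25519 AAAA alice,ssh-ed25519 BBBB bob")` returns one node ID and two keys, while an input without the separator is rejected with the "Missing semicolon" error.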

/// clobber. However, in practice, it would probably be empty to begin with,
/// so most likely, this won't be an issue.
#[clap(long, value_parser, num_args(1..))]
pub ssh_node_state_write_access: Vec<NodeSshAccessFlagValue>,
@eichhorl eichhorl Oct 22, 2025

Does this mean we need to pass at least one value? Ideally we would be able to switch to these new proposals entirely. That means it should also be possible to use them to recover subnets without SEV, in which case we don't need to deploy any ssh_node_state_write_access.

The same applies to ssh_readonly_access: You could imagine sending the proposal without any public keys, just to halt the subnet.

github-merge-queue bot pushed a commit that referenced this pull request Oct 23, 2025
github-merge-queue bot pushed a commit that referenced this pull request Feb 27, 2026
…very (#8718)

As part of SEV Recovery, recovery operators might eventually need to
provision write-access SSH keys to patch the state. Registry changes
were implemented in previous PRs (notably
#7361), and orchestrator changes in
#7265.

This PR implements the required changes in `ic-recovery`. The tool now
uses the new `ProposeToTakeSubnetOfflineForRepairs` `ic-admin` command
to provision an SSH key to a given `NodeId`, which the tool asks for in
the `Halt` step. To help the recovery operator choose, this step will
now print the heights of all nodes next to their `NodeId`. In contrast
to regular recoveries, that first step will have to be run on a local
machine instead of directly on a node.

Here is what the new `Halt` step would look like:

<img width="3816" height="1106" alt="image"
src="https://github.com/user-attachments/assets/77bb1119-b211-4baf-b2c9-abf21adfebe6"
/>

---------

Co-authored-by: Jason Zhu <jason.zhu@dfinity.org>
Co-authored-by: Ruediger Birkner <ruediger.birkner@dfinity.org>