Merge branch 'kpop_con_922' into 'master'
feat(Consensus, CON-922): add a state to the recovery tool so that it's possible to resume the previous execution.

To make the recovery tool resumable, the following changes have been made:

1. The Recovery structs (i.e. AppSubnetRecovery, NNSRecoverySameNodes, NNSRecoveryFailoverNodes) have been made serializable.
2. Added two new user prompts at the beginning of the program: one asks whether to resume the recovery when an appropriate state file has been found, and the other is shown when the arguments passed to the program have changed.
3. Steps that were completed in the previous run are explicitly skipped.
4. After each step is executed, the state is saved to disk.

Jira ticket(s): [CON-922](https://dfinity.atlassian.net/browse/CON-922?atlOrigin=eyJpIjoiMDhjZjU4NDY5MzE0NGM4MTkyYjZhNDcyMmU2NTVkMTMiLCJwIjoiaiJ9)

One pager: [CON-922](https://docs.google.com/document/d/1ALqyUXprY30aweyk123PVMtT2N4RUdZ8Cb9TSavK97w/edit#)

Tests:

1. Added some unit tests;
2. Ran the tool:
   1. broke an app subnet;
   2. ran the tool to recover it;
   3. CTRL-C'ed out of it after the "DownloadState" step;
   4. reran the tool (see the output below);
   5. confirmed that the subnet is recovered (see the attached image).
```
kpop@zh1-spm34:~/repos/ic/rs$ cargo run --bin ic-recovery -- --dir /home/kpop/recovery-test-2 --nns-url http://[2602:fb2b:100:10:5000:12ff:feec:cae3]:8080 --replica-version 1ff87140826d378d74854a01364583d12c18fc3d --test app-subnet-recovery --subnet-id nmb2x-aqod7-k257h-gho3i-tl3q2-v3poy-kp2a4-daytu-sqtv6-4l7kg-jae
warning: /home/kpop/repos/ic/rs/boundary_node/ic_balance_exporter/Cargo.toml: unused manifest key: bin.0.src
Finished dev [unoptimized + debuginfo] target(s) in 1.15s
Running `target/debug/ic-recovery --dir /home/kpop/recovery-test-2 --nns-url 'http://[2602:fb2b:100:10:5000:12ff:feec:cae3]:8080' --replica-version 1ff87140826d378d74854a01364583d12c18fc3d --test app-subnet-recovery --subnet-id nmb2x-aqod7-k257h-gho3i-tl3q2-v3poy-kp2a4-daytu-sqtv6-4l7kg-jae`
Mar 06 09:20:16.135 INFO Recovery state file found with parameters
{
  "recovery_args": {
    "dir": "/home/kpop/recovery-test-2",
    "nns_url": "http://[2602:fb2b:100:10:5000:12ff:feec:cae3]:8080/",
    "replica_version": {
      "version_id": "1ff87140826d378d74854a01364583d12c18fc3d"
    },
    "key_file": null,
    "test_mode": true
  },
  "subcommand_args": {
    "AppSubnetRecovery": {
      "subnet_id": "nmb2x-aqod7-k257h-gho3i-tl3q2-v3poy-kp2a4-daytu-sqtv6-4l7kg-jae",
      "upgrade_version": null,
      "replacement_nodes": null,
      "pub_key": "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIG7mlrQoHBB+Iq16f/R0hMjCWVf8Xc/WgbfRME6cvJ00 kamil.popielarz@Kamil-Popielarzs-MacBook-Pro.local",
      "download_node": "2602:fb2b:100:10:5000:c3ff:feeb:5b52",
      "keep_downloaded_state": true,
      "upload_node": null,
      "ecdsa_subnet_id": null,
      "last_executed_step": "DownloadState"
    }
  },
  "neuron_args": null
}
Mar 06 09:20:16.135 INFO Resume previously started recovery?
[y/n] y
Mar 06 09:20:33.097 INFO
Mar 06 09:20:33.097 INFO ###############################
Mar 06 09:20:33.097 INFO V App Subnet Recovery V
Mar 06 09:20:33.097 INFO ###############################
Mar 06 09:20:33.098 INFO NNS Url: http://[2602:fb2b:100:10:5000:12ff:feec:cae3]:8080/
Mar 06 09:20:33.098 INFO Starting recovery of subnet with ID:
Mar 06 09:20:33.098 INFO -> nmb2x-aqod7-k257h-gho3i-tl3q2-v3poy-kp2a4-daytu-sqtv6-4l7kg-jae
Mar 06 09:20:33.098 INFO Binary version:
Mar 06 09:20:33.098 INFO -> Some(ReplicaVersion { version_id: "1ff87140826d378d74854a01364583d12c18fc3d" })
Mar 06 09:20:33.098 INFO Creating recovery directory in "/home/kpop/recovery-test-2"
Mar 06 09:20:33.098 INFO Press [ENTER] to continue...
Mar 06 09:20:37.545 INFO ic-admin exists, skipping download.
Mar 06 09:20:37.545 INFO nns.pem exists, skipping download of NNS public key
Mar 06 09:20:37.545 INFO Syncing registry local store
Mar 06 09:20:37.545 INFO Continuing with public key:
-----BEGIN PUBLIC KEY-----
MIGCMB0GDSsGAQQBgtx8BQMBAgEGDCsGAQQBgtx8BQMCAQNhAJYUx5HHCrc6jJ94
tF18BKuqBbfuvhxfP162gypbLSJ/pXydTON52qZ47im/J8lGLBX91IqmmicWY2CC
wg9SawgklTe1UAeJ/tMJEWKqHaiQ++3LIAlgrI01kIHrUWoJXQ==
-----END PUBLIC KEY-----
Mar 06 09:20:37.545 WARN Downloaded key is NOT equal to included NNS public key
Mar 06 09:20:37.549 INFO s:/n:/ic_registry_replicator/ic_registry_replicator Local registry store is not empty, skipping initialization.
Mar 06 09:20:37.553 INFO Skipping already executed step Halt
Mar 06 09:20:37.553 INFO Skipping already executed step DownloadCertifications
Mar 06 09:20:37.554 INFO Skipping already executed step MergeCertificationPools
Mar 06 09:20:37.554 INFO Skipping already executed step DownloadState
Mar 06 09:20:37.554 INFO
Mar 06 09:20:37.554 INFO ####################
Mar 06 09:20:37.554 INFO V ICReplay V
Mar 06 09:20:37.554 INFO ####################
Mar 06 09:20:37.554 INFO Delete old checkpoints found in /home/kpop/recovery-test-2/recovery/working_dir/data/ic_state/checkpoints, and execute: ic-replay /home/kpop/recovery-test-2/recovery/working_dir/ic.json5 --subnet-id nmb2x-aqod7-k257h-gho3i-tl3q2-v3poy-kp2a4-daytu-sqtv6-4l7kg-jae
Mar 06 09:20:37.554 INFO Execute now? [y/n]
```

![image](/uploads/eb086ddbb9189a68bff1bc6cb6d37333/image.png)

See merge request dfinity-lab/public/ic!11107
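For readers who want a feel for the skip-and-save behaviour described in the commit message, here is a minimal sketch. It is not the code from this merge request: `Step`, `STEPS`, `load_last_step`, `save_last_step`, and `run` are hypothetical names, only the five steps visible in the log above are modelled, and the state is stored as a plain-text step name rather than the serialized structure the tool prints.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical subset of recovery steps, listed in execution order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Step {
    Halt,
    DownloadCertifications,
    MergeCertificationPools,
    DownloadState,
    ICReplay,
}

const STEPS: [Step; 5] = [
    Step::Halt,
    Step::DownloadCertifications,
    Step::MergeCertificationPools,
    Step::DownloadState,
    Step::ICReplay,
];

impl Step {
    fn name(self) -> &'static str {
        match self {
            Step::Halt => "Halt",
            Step::DownloadCertifications => "DownloadCertifications",
            Step::MergeCertificationPools => "MergeCertificationPools",
            Step::DownloadState => "DownloadState",
            Step::ICReplay => "ICReplay",
        }
    }

    fn from_name(s: &str) -> Option<Step> {
        STEPS.iter().copied().find(|step| step.name() == s)
    }
}

/// Reads the last executed step from the state file.
/// A missing file or an unrecognized step name yields `None`.
fn load_last_step(path: &Path) -> io::Result<Option<Step>> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(Step::from_name(contents.trim())),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// Overwrites the state file with the name of the step that just finished.
fn save_last_step(path: &Path, step: Step) -> io::Result<()> {
    fs::write(path, step.name())
}

/// Runs every step in order, skipping those recorded as completed, and
/// saves the state after each executed step. Returns the steps that ran.
fn run(
    path: &Path,
    mut exec: impl FnMut(Step) -> io::Result<()>,
) -> io::Result<Vec<Step>> {
    // Index of the first step that still has to run (0 when starting fresh).
    let resume_from = match load_last_step(path)? {
        Some(last) => STEPS.iter().position(|&s| s == last).map_or(0, |i| i + 1),
        None => 0,
    };
    let mut executed = Vec::new();
    for (i, &step) in STEPS.iter().enumerate() {
        if i < resume_from {
            println!("Skipping already executed step {:?}", step);
            continue;
        }
        exec(step)?;
        save_last_step(path, step)?;
        executed.push(step);
    }
    Ok(executed)
}
```

Calling `run` with a state file that contains `DownloadState` would skip the first four steps and execute only `ICReplay`, mirroring the rerun in the log. Saving after every step means an interruption loses at most the step that was in progress.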