Skip to content

Implement control flow for restoring zones#589

Merged
bal-e merged 15 commits intomainfrom
zonedata-restores
Apr 29, 2026
Merged

Implement control flow for restoring zones#589
bal-e merged 15 commits intomainfrom
zonedata-restores

Conversation

@bal-e
Copy link
Copy Markdown
Contributor

@bal-e bal-e commented Apr 15, 2026

This PR implements the zone data state machine logic for #550, and re-organizes Cascade's support for persistence (which now includes restoration) into a new persistence module. It does not break Cascade; we can merge it and then make a PR that implements the actual persistence and restoration operations (based on #550).


  • If you are changing Rust code or integration tests (Cargo.*, crates/, etc/, integration-tests/, src/):
    • Did you run the integration tests with act through the act-wrapper (as described in TESTING.md)?

@bal-e bal-e self-assigned this Apr 15, 2026
@bal-e bal-e force-pushed the zonedata-restores branch from c02d7c8 to b3fbe08 Compare April 20, 2026 08:10
@bal-e
Copy link
Copy Markdown
Contributor Author

bal-e commented Apr 20, 2026

The ixfr-in test is broken due to #567, but all other tests pass, and I believe the PR is ready to go.

@bal-e bal-e marked this pull request as ready for review April 20, 2026 08:12
@bal-e bal-e requested a review from ximon18 April 20, 2026 08:15
@bal-e bal-e changed the title Implement restoring functionality in cascade-zonedata Implement control flow for restoring zones Apr 20, 2026
Copy link
Copy Markdown
Member

@ximon18 ximon18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a while to understand but looks good to me. The only slight thought I have is that there are multiple things called persist and restore and only one of them does actual persist and restore, e.g. a Persister doesn't persist anything nor does a Restorer restore anything, instead they give the required read/write access (respectively) to the underlying zone data storage that needs to be written to/read from to do persistence/restoration. So maybe the names could be made a bit more distinct.

Comment thread src/manager.rs Outdated
Comment thread src/center.rs
center,
}
.storage()
.abandon_loaded_restoration(restorer);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find abandoning something here that isn't clear that it is in progress (unless you deep dive into the zone data storage state machine) confusing.

Would it not be simpler and clearer to let the zone go through restoration and not actually restore anything because there is nothing to restore?

Or since this is the only place that Zone::new() is called, make Zone::new() initialize the state machine into the passive state or have it do this transition past restoration?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that "abandon" here is hard to understand. I had initially attempted to add a constructor for ZoneDataStorage that directly provides the zone viewer objects; but that complicates the zone initialization procedure (either those viewer objects are also stored in StorageState, or they are directly passed out of Zone::new()). The current approach seemed more compact, but I didn't realize how strange it looks.

Would it not be simpler and clearer to let the zone go through restoration and not actually restore anything because there is nothing to restore?

Hmm... that's not a bad idea. I'm just worried that the resulting logs will confuse operators ("why is Cascade trying to restore a newly created zone from disk?").

OTOH, the end result of this "just try restoring" approach would be the same -- abandon_loaded_restoration() would be called.

Or since this is the only place that Zone::new() is called, make Zone::new() initialize the state machine into the passive state or have it do this transition past restoration?

See above; I was concerned about the (cognitive and syntactic) overhead of juggling the three viewer objects through Zone::new() and its internals.

I agree that the current approach is not ideal, but the alternatives also have drawbacks. I'm open to changing this if we can ascertain that one option is definitely better.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would of course be own suggestion:

let the zone go through restoration and not actually restore anything because there is nothing to restore

You also don't need to log anything if there's nothing to restore so that would resolve your concern:

Hmm... that's not a bad idea. I'm just worried that the resulting logs will confuse operators ("why is Cascade trying to restore a newly created zone from disk?").

Comment thread src/persistence/mod.rs Outdated
Comment thread crates/zonedata/src/restorer.rs Outdated
Comment thread crates/zonedata/src/restorer.rs Outdated
Comment thread src/state/v1.rs
// The caller will extract 'zones' and 'policies' beforehand.
zones: _,
policies: _,
// TODO: More fields.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which "more fields"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I should have elaborated here. I have organized this code in a future-proof way; should we ever store more data in the global state (e.g. information about TSIG keys that does not belong in the TSIG store file), and add it to Spec, this pattern-match will fail so we remember to update it. The "more fields" here refers to those future field additions. Without that future consideration, this function does not have a purpose and can be removed; but IMO it's better to keep it here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could remove this specific comment and make a more general one at the top of the function body.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to simplify the code by removing the function and, maybe, leave a comment at the top to say this could be extended in future (though I wonder if that really adds any value).

Comment thread src/state/v1.rs
// TODO: More fields.
};

// TODO: Initialize fields from 'Spec'.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which "fields from 'Spec'"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same "more fields"; see the prior discussion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I could remove this comment in favor of one at the top of the function body.

Comment thread src/zone/state/mod.rs
Comment thread src/zone/machine.rs
Comment thread src/zone/storage.rs Outdated
ximon18 added a commit that referenced this pull request Apr 28, 2026
Will need changing based on PR #589.
@ximon18 ximon18 mentioned this pull request Apr 28, 2026
4 tasks
bal-e added 15 commits April 29, 2026 12:06
This allows 'cascaded' to handle persistence instead of
'cascade-zonedata'.
This better distinguishes the control flow for creating a _new_ zone
from the control flow for restoring a zone from state files at startup.
This aids in detecting which zones need data restore operations.
These components will handle zone data persistence.
The new control flow is:

- A loaded instance is approved.
- 'ZoneHandle::approve_loaded()' is called:
 - 'ZoneStorageHandle::accept_loaded()' is called.
   - It now returns a 'LoadedZonePersister'.
 - 'ZonePersistenceHandle::start_loaded_persistence()' is called.
   - It spawns a background task that calls 'persist_loaded()'.
- Persistence finishes (successfully).
- 'ZoneHandle::begin_signing()' is called.
  - 'ZoneStorageHandle::start_sign()' is called.
    - It now returns a 'SignedZoneBuilder'.
  - 'SignerZoneHandle::enqueue_new_sign()' is called.
@bal-e bal-e force-pushed the zonedata-restores branch from c7f61aa to 54d532d Compare April 29, 2026 10:28
@bal-e
Copy link
Copy Markdown
Contributor Author

bal-e commented Apr 29, 2026

As per internal discussion, we are merging this and leaving leftover issues (which are all quite minor) to be resolved later.

@bal-e bal-e merged commit 2ff775f into main Apr 29, 2026
9 checks passed
@bal-e bal-e deleted the zonedata-restores branch April 29, 2026 10:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants