Conversation
66437c2 to
868714e
Compare
There was a problem hiding this comment.
This looks great, @tertsdiepraam. This is a really important aspect of Cascade and I'm glad that we're finally addressing it. I have a lot of comments, but they are nits or starting points for important top-level discussions. Adding to them:
- Can we merge this PR sooner rather than later (possibly ignoring many of my comments), and does it preserve Cascade's existing functionality? E.g. I think some history events are no longer recorded.
- Where/how will we store and present loader errors to the user?
- The state transition methods need some basic documentation (eventually they should describe the state transitions and mention panics; for now a one-liner would be good).
- Some logging at state transitions would be helpful.
- I forgot something: when an instance is soft-rejected, we may want to stay in that state until a new operation (e.g. loading or re-signing) is requested. Can that fit under this implementation?
|
|
||
| // Initiate the load immediately, if the data storage is not busy. | ||
| if let Some(builder) = self.zone().storage().start_load() { | ||
| if let Some(builder) = self.zone().try_start_load() { |
There was a problem hiding this comment.
Hmmm, I'm not sure whether the zone state machine needs to handle this step. Making a LoadedZoneBuilder available is IMO the zone data storage's job. Perhaps this could remain a .storage() method and in the success case, we call a .start_load() or .mark_load_started() method on the zone state machine?
There was a problem hiding this comment.
We talked about this a bit more and the actions to take are:
- Make
try_start_loadcheck the storage before the zone state machine. - Make
on_passivereact to the zone storage passive state again.
src/signer/mod.rs
Outdated
| Ok(()) => { | ||
| let built = builder.finish().unwrap_or_else(|_| unreachable!()); | ||
| handle.storage().finish_sign(built); | ||
| handle.start_signed_review(built); |
There was a problem hiding this comment.
This reminds me of an important matter to discuss: while it makes sense that signed review should begin immediately after signing finishes, it was nice to just mark that an operation is complete.
| } | ||
| } | ||
|
|
||
| #[derive(Debug, Default)] |
There was a problem hiding this comment.
For each of these states, we should document their expectations (i.e. our invariants) regarding the zone data storage state machine, and other parts of Cascade.
| if review { | ||
| info!("Initiating review of newly-loaded instance"); | ||
| info!("Initiating review of newly-loaded instance"); | ||
|
|
||
| // TODO: 'on_seek_approval_for_zone' tries to lock zone state. | ||
| std::mem::drop(state); | ||
| // TODO: 'on_seek_approval_for_zone' tries to lock zone state. | ||
| std::mem::drop(state); | ||
|
|
||
| center.unsigned_review_server.on_seek_approval_for_zone( | ||
| ¢er, | ||
| &zone, | ||
| domain::base::Serial(serial.into()), | ||
| ); | ||
| center.unsigned_review_server.on_seek_approval_for_zone( | ||
| ¢er, | ||
| &zone, | ||
| domain::base::Serial(serial.into()), | ||
| ); | ||
|
|
||
| state = zone.state.lock().unwrap(); | ||
| } | ||
| state = zone.state.lock().unwrap(); |
There was a problem hiding this comment.
We need to discuss how the state machine should flow if review is disabled.
868714e to
5e4a680
Compare
5e4a680 to
29017b9
Compare
bal-e
left a comment
There was a problem hiding this comment.
There are a few leftover comments and minor nits, but overall LGTM. I'm really excited to see this land.
crates/cli/src/commands/zone.rs
Outdated
| zone, | ||
| review_stage: _, | ||
| }) => { | ||
| println!("Overrode {stage} review for '{zone}'"); |
There was a problem hiding this comment.
- "Overrode" -> "Overridden"?
for '{zone}'->for '{zone}'
| ) -> Result<(), ZoneReloadError> { | ||
| let mut zone_state = zone.state.lock().expect("lock is not poisoned"); | ||
| if let Some(reason) = zone_state.halted(true) { | ||
| if let Some(reason) = zone_state.halted_reason() { |
There was a problem hiding this comment.
Right, this was the case I was thinking of when I mentioned that .halted_reason() shouldn't be easily accessible. It's not urgent, but I think this check should be moved to the HTTP server -- we just need to check where it's called from.
There was a problem hiding this comment.
Yeah that would be fine
src/zone/machine.rs
Outdated
| } | ||
| } | ||
|
|
||
| impl<'a> ZoneHandle<'a> {} |
There was a problem hiding this comment.
Surprised Clippy doesn't catch that!
Adds a state machine per zone to verify that all the operations we do to it are correct and will make it easier to report what's currently going on.
Additionally, this overhauls the halting mechanism to have 3 different halting states: a rejected loaded zone, a failed signing operation and a rejected signed zone. Each of these can be fixed with
cascade zone reset <zone>, which will put the state back to waiting. The rejected states can also be overridden withcascade zone override --(un)signed <zone>(bikesheddable).If you are changing Rust code or integration tests (
Cargo.*,crates/,etc/,integration-tests/,src/):actthrough theact-wrapper(as described inTESTING.md)?If you are adding/deleting man pages:
man_pagesconfig indoc/manual/source/conf.py?Cargo.toml?If you are modifying man pages: