storage: when envd reconnects, report current status at latest timestamp #31535
teskje left a comment:
Seems fine! I agree it's not ideal with the reliance on clock synchronization between different clusterd processes.
Have you considered selecting the occurred_at time in the controller, before writing down a new status? That would make sure that times are always increasing and cannot race. It would change the semantics of occurred_at from "when the cluster learned about it" to "when the controller learned about it" but does that matter?
That's a good point, but I didn't want to futz with the semantics too much, plus if we just took the "now" timestamp on envd that also wouldn't protect against clock skew. I think the proper solution would be to use the timestamp oracle, but the controller (who writes down these status updates) doesn't have access to that right now. So it's a whole long tail of things that would have to be changed. 🙈
This makes it so that status updates from the latest deployment, for example when coming out of read-only mode, sort last in mz_source/sink_status_history, and so get surfaced as the latest status in mz_source/sink_statuses.

This fixes (or makes less likely) a bug where a failing/stalled source/sink in the old deployment would crowd out the running status of objects in the new deployment.

This fix works, but doesn't feel ideal. A problem is that the raw status collections are append-only collections, so `envd` will simply append status updates to the end. One could argue that the smarts should be in `envd`/the controller. But ultimately, it would also have to append a status update that "trumps" the earlier updates, so it would do the same thing that we're achieving with this change.

Fixes MaterializeInc/database-issues#8863
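To make the ordering argument concrete, here is a minimal sketch in Python. It is not the code in this PR: `StatusUpdate`, `latest_status`, and `report_on_reconnect` are hypothetical names, and the sketch assumes the surfaced status is simply the row with the greatest `occurred_at` in an append-only history. It also shows the clock-skew caveat raised in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusUpdate:
    object_id: str
    status: str
    occurred_at: int  # illustrative timestamp in milliseconds


def latest_status(history, object_id):
    """Return the update with the greatest occurred_at; later appends win ties."""
    latest = None
    for update in history:
        if update.object_id != object_id:
            continue
        if latest is None or update.occurred_at >= latest.occurred_at:
            latest = update
    return latest


def report_on_reconnect(history, object_id, current_status, now_ms):
    """Append the current status again, stamped with the reporter's local clock.

    The history is append-only, so earlier rows are never rewritten.
    """
    update = StatusUpdate(object_id, current_status, now_ms)
    history.append(update)
    return update


# The old deployment left a stalled status behind; without a fresh report
# that stale row is what gets surfaced.
history = [
    StatusUpdate("u1", "running", 100),
    StatusUpdate("u1", "stalled", 150),
]
before = latest_status(history, "u1").status

# The new deployment reconnects and re-reports its current status.
report_on_reconnect(history, "u1", "running", now_ms=200)
after = latest_status(history, "u1").status

# Caveat: if the old deployment's clock ran ahead, its row still sorts last.
skewed = [StatusUpdate("u2", "stalled", 300)]
report_on_reconnect(skewed, "u2", "running", now_ms=200)
with_skew = latest_status(skewed, "u2").status
```

In this sketch `before` is the stale status, `after` is the re-reported one, and `with_skew` shows why the approach still depends on roughly synchronized clocks.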
7ef41e6 to 25101bf
True... Another proper solution would be to remove the