-
Notifications
You must be signed in to change notification settings - Fork 36
introduce resumable backups #492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
26afc6f to
f33f408
Compare
tstirrat15
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, had a question
f33f408 to
4eb56d4
Compare
derives the file name from the zed context name
adds a flag that allows the user defining the number of relationships they'd like to be sent per bulk export page
4eb56d4 to
01a2035
Compare
this commit changes the backup create command so that a canceled backup can be resumed. A new marker file is added with the last written bulk export cursor. This was not added to the OCF file because OCF container is not meant for in-place updates, but streaming append operations. The marker is updated every time a bulk export page is successfully written. For convenience the command now also supports creating a backup without a file name: in that case it will derive the backup file name from the zed context name. The new logic detects several scenarios like: - a backup exists, but no marker exists (meaning it completed) - backup does not exist, but marker is left behind (it gets truncated) The marker file will be removed only when the backup completes. This new logic does not guarantee relationships may not get duplicated. When terminated gracefully the system should write the last page received and close and flush the files. But if the process was abruptly terminated (e.g. SIGKILL) it could lead to relationships being written to the OCF file, but the marker not being updated.
01a2035 to
b13830c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker, but I was able to get an error by running against SpiceDB seeded using the test data:
ERR terminated with errors error="error receiving relationships: %!w(<nil>): set flag --max-message-size=9793576 to increase the maximum allowable size
- print backup stats always, regardless termination reason - improve handling of ResourceExhausted error, instead of fixing all exit paths, wrap the command function so the error is always evaluated. - fixes malformed error message due to error variable overlap - made resilient to existing backup with empty marker file
@alecmerdler this is a known issue, which can be addressed by setting the I fixed the confusing error message in cea939a, thanks for reporting it! |
| module github.com/authzed/zed | ||
|
|
||
| go 1.23.8 | ||
| go 1.24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be in a separate pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no because I needed it for t.Context()
alecmerdler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix!
This PR modifies the backup create command to allow an incomplete backup to be resumed.
A new marker file is added with the last written bulk export cursor. This was not added to the OCF file because OCF container is not meant for in-place updates, but streaming append operations. The marker is updated every time a bulk export page is successfully written.
For convenience, the command now also supports creating a backup without a file name; in this case, it will derive the backup file name from the Zed context name. A new
page-limitflag has also been added to allow users to specify the number of relationships to ingest per page. This may be useful as a mechanism to throttle the number of relationships read from the datastore.The new logic detects several scenarios like:
The marker file will be removed only when the backup completes.
This new logic does not guarantee that relationships will not be duplicated. When terminated gracefully, the system should write the last page received, close, and flush the files. But if the process was abruptly terminated (i.e. SIGKILL) it could lead to relationships being written to the OCF file, but the marker not being updated.