Resumable jobs #11

Closed
tomwhite opened this issue May 28, 2022 · 1 comment

Comments

@tomwhite
Member

While a job runs, intermediate output is written to Zarr files. If a job is interrupted, it could be resumed from the point at which it stopped.

Cubed Array objects can be pickled, so it is already possible to reload them (and the underlying Plan object containing the DAG) as long as they have been saved first.
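As a rough sketch of the save-then-reload step, the standard pickle module is enough. The helper names save_for_resume and load_for_resume are hypothetical, and the dict below is only a stand-in for a Cubed Array (and its Plan); none of this is Cubed's own API.

```python
import pickle
import tempfile
from pathlib import Path


def save_for_resume(obj, path):
    """Pickle obj to path so it can be reloaded after an interruption."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_for_resume(path):
    """Reload a previously pickled object from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a picklable Cubed Array; a real job would pass the array itself.
job_state = {"array": "result", "plan": ["a", "b", "c"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "job.pkl"
    save_for_resume(job_state, path)
    restored = load_for_resume(path)
```

The important part is that the save happens before the job starts, since there is nothing to reload otherwise.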

To detect where in the DAG to resume from, we can skip any intermediate Zarr files that have all of their chunks written. The check we need to do is nchunks_initialized == nchunks. (Since we specify that write_empty_chunks is True, we know that every chunk will be written out, even if it is composed of empty fill values.)
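A minimal sketch of that check, assuming the arrays are visited in DAG order. The function names are hypothetical, and SimpleNamespace objects stand in for zarr arrays here so the snippet runs without any Zarr stores; only the nchunks and nchunks_initialized attributes are used.

```python
from types import SimpleNamespace


def is_fully_written(arr):
    """True if every chunk of the array has been written."""
    return arr.nchunks_initialized == arr.nchunks


def arrays_to_compute(arrays_in_dag_order):
    """Return the arrays that still need computing, skipping completed ones."""
    return [arr for arr in arrays_in_dag_order if not is_fully_written(arr)]


# Stand-ins for intermediate Zarr arrays: "a" finished, "b" was interrupted
# part-way through, and "c" was never started.
arrays = [
    SimpleNamespace(name="a", nchunks=4, nchunks_initialized=4),
    SimpleNamespace(name="b", nchunks=4, nchunks_initialized=2),
    SimpleNamespace(name="c", nchunks=4, nchunks_initialized=0),
]

todo = [arr.name for arr in arrays_to_compute(arrays)]
print(todo)  # ['b', 'c']
```

Note that "b" is recomputed in full, including the two chunks it had already written.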

In the future it might be possible to resume at the level of individual chunks, but for now starting at the level of a Zarr array (and rewriting any chunks that have already been written) is sufficient.

This depends on #10 for testing - we can use it to see how many tasks actually ran, to check that earlier arrays were not re-written after resuming a partially completed job.

@tomwhite
Member Author

Fixed in c66edff
