Correctly handle deletion of empty datasets #248
It's been a while, but it seems correct. If you want to be extra sure, you could try testing more extensive combinations of different versions, including or not including the dataset, with and without non-fill data. I do rather wish I had made use of hypothesis when developing this library (#239), as its state machine strategies would be a perfect fit for extensively testing this sort of thing. It's something I would heavily suggest adding if we ever decide to add any significant new functionality.
I really like the idea of making hypothesis do the heavy lifting for the tests. I'm going to try to spend a little more time doing some general upkeep on this library, so there's a good chance I'll do this in the future.
I defer to @asmeurer, looks good to me, though!
Your observation that empty datasets are "special" and cannot be virtual seems to explain what's going on here. I think whenever we ran into this problem it did involve versions which were empty datasets.
This PR modifies how empty datasets are deleted, closing #244.
Changes

Reorganized delete_versions and _recreate_raw_data to help separate logic a little more. I don't think we gain anything by redefining _walk, _get_np_fillvalue, and _delete_dataset during each call, but I can change this back if it's just confusing.

Empty datasets are now simply deleted. This is done because empty datasets cannot be virtual; see 85507e9. In other words, empty datasets at _version_data/versions/<version>/<name> are real, and since they don't refer to data in _version_data/<name>/raw_data as they would if they held actual data, they can safely just be deleted. As far as I can tell, this is a limitation of h5py and hdf5 being unable to store empty virtual datasets.