Move read_hdf to Blockwise #7625
Thanks @rjzamora! @ian-r-rose, could you take a look at this when you get a moment?
lock=None,
mode="a",
):
class HDFFunctionWrapper:
Would love to be able to use a Protocol here. Some day...
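For context, a `typing.Protocol` (Python 3.8+) would let the interface that a wrapper like `HDFFunctionWrapper` must satisfy be stated structurally instead of by convention. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the protocol name `ColumnProjectableIO`, its `columns`/`project_columns` members, the `(key, start, stop)` part layout, and the in-memory `ToyStoreReader` are all hypothetical and are not the dask API. The lock stands in for the serialization HDF5 reads usually need.

```python
import threading
from typing import Dict, List, Optional, Protocol, Sequence, Tuple, runtime_checkable

# A "part" here is (key, start_row, stop_row); this layout is an assumption.
Part = Tuple[str, int, int]


@runtime_checkable
class ColumnProjectableIO(Protocol):
    """Hypothetical structural interface for a partition-reading callable."""

    @property
    def columns(self) -> Optional[List[str]]: ...

    def project_columns(self, columns: Sequence[str]) -> "ColumnProjectableIO": ...

    def __call__(self, part: Part) -> Dict[str, list]: ...


class ToyStoreReader:
    """Hypothetical stand-in for an HDF reader. It never subclasses the
    Protocol; it only has to provide the same members."""

    def __init__(self, store: Dict[str, Dict[str, list]], columns=None, lock=None):
        self.store = store
        self._columns = list(columns) if columns is not None else None
        # Assumption: reads are serialized with a lock, since HDF5 access is
        # commonly not thread-safe.
        self.lock = lock if lock is not None else threading.Lock()

    @property
    def columns(self) -> Optional[List[str]]:
        return self._columns

    def project_columns(self, columns: Sequence[str]) -> "ToyStoreReader":
        # Return a new reader (sharing the lock) rather than mutating this one.
        return ToyStoreReader(self.store, columns=columns, lock=self.lock)

    def __call__(self, part: Part) -> Dict[str, list]:
        key, start, stop = part
        with self.lock:
            table = self.store[key]
            names = self._columns if self._columns is not None else list(table)
            return {name: table[name][start:stop] for name in names}
```

Because the check is structural, `isinstance(ToyStoreReader({}), ColumnProjectableIO)` holds without inheritance. Note that `runtime_checkable` only verifies that the members exist, not their signatures.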
@@ -123,6 +125,16 @@ def test_to_hdf_multiple_nodes():
out = dd.read_hdf(fn, "/data*")
assert_eq(df16, out)


# Test getitem optimization
with tmpfile("h5") as fn:
Nice
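The test added here exercises the getitem optimization, i.e. pushing a column selection made after `dd.read_hdf` down into the read itself so unused columns are never loaded. A toy, stdlib-only sketch of that idea follows, assuming an in-memory table in place of an HDF file; `ReadLayer`, `GetItemLayer`, `project_getitem`, and `execute` are hypothetical names and do not reflect dask's optimizer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class ReadLayer:
    """Hypothetical IO layer: one read task per (start, stop) row range."""
    parts: Tuple[Tuple[int, int], ...]
    columns: Optional[Tuple[str, ...]] = None  # None means "all columns"


@dataclass(frozen=True)
class GetItemLayer:
    """Hypothetical column selection applied to every partition of `source`."""
    source: ReadLayer
    columns: Tuple[str, ...]


Layer = Union[ReadLayer, GetItemLayer]


def project_getitem(layer: Layer) -> Layer:
    """Fold a column selection into the read beneath it, so unselected columns
    are never read. Assumes the selection is a subset of what the read returns."""
    if isinstance(layer, GetItemLayer):
        return ReadLayer(parts=layer.source.parts, columns=layer.columns)
    return layer


def execute(layer: Layer, table: Dict[str, list], reads: List[str]) -> List[Dict[str, list]]:
    """Evaluate a layer against an in-memory table, logging each column read."""
    if isinstance(layer, GetItemLayer):
        partitions = execute(layer.source, table, reads)
        return [{c: p[c] for c in layer.columns} for p in partitions]
    names = layer.columns if layer.columns is not None else tuple(table)
    out = []
    for start, stop in layer.parts:
        reads.extend(names)  # record every column actually read for this part
        out.append({c: table[c][start:stop] for c in names})
    return out
```

Both the naive and the projected graph produce the same partitions; the projected one simply logs fewer column reads, which is the property an `assert_eq`-style test can check alongside correctness.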
if division and global_divisions:
    global_divisions = global_divisions[:-1] + division
elif division:
    global_divisions = division
Ouch, this is tricky. Can we make the divisions global to begin with?
Since this PR is just a refactor, most of the actual logic is a copy of what already existed (in other words, this is not my code, and I haven't thought it through carefully). That said, I can try to revisit this later and simplify the logic a bit.
Yeah, I can appreciate not wanting to mess with that logic too much.
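To make the thread concrete: the diff builds global divisions by repeatedly dropping the running tuple's last boundary and appending the next file's divisions. The sketch below mirrors only those four lines (`merge_incrementally`) and adds one hypothetical way to "make the divisions global to begin with" (`build_globally`), assuming every file has known, sorted, non-overlapping index ranges and files are visited in index order. How the surrounding `read_hdf` code handles unknown divisions is not shown in this excerpt and is not modeled here.

```python
from typing import Optional, Sequence, Tuple

Divisions = Tuple[int, ...]


def merge_incrementally(per_file: Sequence[Optional[Divisions]]) -> Optional[Divisions]:
    """Mirror of the merge rule in the diff: drop the running tuple's final
    boundary (the previous file's max) and append the next file's divisions."""
    global_divisions: Optional[Divisions] = None
    for division in per_file:
        if division and global_divisions:
            global_divisions = global_divisions[:-1] + division
        elif division:
            global_divisions = division
    return global_divisions


def build_globally(partition_bounds: Sequence[Tuple[int, int]]) -> Divisions:
    """Hypothetical one-pass alternative: collect each partition's (min, max)
    index across all files, then emit every partition start plus the last max."""
    if not partition_bounds:
        raise ValueError("need at least one partition")
    starts = tuple(lo for lo, _ in partition_bounds)
    return starts + (partition_bounds[-1][1],)
```

For two files whose partitions cover index ranges `(0, 4), (5, 9)` and `(10, 14), (15, 19)`, both functions produce `(0, 5, 10, 15, 19)`; the second avoids the tuple-splicing step the reviewer found tricky.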
Apologies for the delayed response here, @rjzamora. Would you mind taking care of the merge conflict? Then we can merge this in.
No worries! The conflict should be resolved. I will also fix any problems if a fresh CI run turns up failures.
I'm going to merge this since it's passing CI :)
Supersedes #7284 (since that PR is now quite stale).
Follows the same approach as #7415 and #7615, using Blockwise for Dask DataFrame's read_hdf. A slight refactor of the original logic was required.
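In broad strokes, the Blockwise approach describes all `read_hdf` partitions as one compact layer (a single IO function plus one input description per partition) rather than as an eagerly materialized dict of tasks. The stdlib-only toy below illustrates that shape; `ToyIOLayer`, `compute`, `read_rows`, and the `(path, start, stop)` parts are hypothetical, and this is not dask's actual `Blockwise` implementation.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, int]
Task = Tuple[Callable[[Any], Any], Any]


class ToyIOLayer:
    """Hypothetical compact IO layer: one function plus one input per partition.
    Individual tasks are generated only when the layer is materialized."""

    def __init__(self, name: str, io_func: Callable[[Any], Any], inputs: Sequence[Any]):
        self.name = name
        self.io_func = io_func
        self.inputs = list(inputs)

    def __len__(self) -> int:
        return len(self.inputs)

    def materialize(self) -> Dict[Key, Task]:
        # Output partition i depends only on inputs[i]: a blockwise pattern.
        return {(self.name, i): (self.io_func, inp) for i, inp in enumerate(self.inputs)}

    def with_io_func(self, io_func: Callable[[Any], Any]) -> "ToyIOLayer":
        # Swapping the function (e.g. for a column-projected reader) touches
        # one attribute instead of rewriting every task.
        return ToyIOLayer(self.name, io_func, self.inputs)


def compute(graph: Dict[Key, Task]) -> Dict[Key, Any]:
    """Run each task serially; a real scheduler could run them in parallel."""
    return {key: func(arg) for key, (func, arg) in graph.items()}


def read_rows(part: Tuple[str, int, int]) -> List[int]:
    """Hypothetical IO function: pretend rows are just their integer positions."""
    _path, start, stop = part
    return list(range(start, stop))
```

Keeping the IO function separate from the per-partition inputs is what makes rewrites such as column projection cheap: the optimizer replaces one callable instead of visiting every task.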