Validate that rechunking is possible earlier in graph generation #10336
if old_shape != new_shape:
    if not (
        math.isnan(old_shape) and math.isnan(new_shape)
    ) or not np.array_equal(old_dim, new_dim, equal_nan=True):
Using `np.array_equal` avoids issues with the provenance of nan values (#10157 (comment), https://github.com/dask/distributed/pull/7856/files#r1205199004)
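For readers who have not followed the linked discussions, here is a minimal sketch of what "provenance of nan values" may refer to. It assumes the problem is that plain tuple comparison only treats two nans as equal when they are the same object; the variable names are illustrative and not taken from the PR.

```python
import numpy as np

# Chunks built from the canonical np.nan object.
canonical = (np.nan, np.nan)
# Chunks whose nans were created elsewhere, e.g. via float("nan").
foreign = (float("nan"), float("nan"))

# Tuple comparison checks identity first, so it only "works" when every nan
# is the very same object.
same_objects = canonical == (np.nan, np.nan)
different_objects = canonical == foreign

# np.array_equal with equal_nan=True compares by value and treats nans as
# equal regardless of which object they are.
robust = np.array_equal(canonical, foreign, equal_nan=True)
strict = np.array_equal(canonical, foreign)

print(same_objects, different_objects, robust, strict)
```

Under that assumption, the `equal_nan=True` comparison is the only one of the four that does not depend on where the nan objects came from.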
cc @wence-
Makes sense, I think.
Looks overall fine, though I am confused why the test disappears.
def test_changing_raises():
    with pytest.raises(ValueError) as record:
        old_to_new(((np.nan, np.nan), (4, 4)), ((np.nan, np.nan, np.nan), (4, 4)))

    assert "unchanging" in str(record.value)
Is this test now no longer valid?
Yes, the test is no longer valid. It triggered validation within `old_to_new`, but with this change, `old_to_new` assumes that validation has already taken place (see changes to docstring).
I've added the same test case in a different place where it's still valid.
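To illustrate the split between up-front validation and `old_to_new`, here is a rough sketch of a standalone validation step. `validate_rechunk_sketch` is a hypothetical stand-in, not the `_validate_rechunk` added in this PR; it only reuses the condition visible in the diff above, and the error text is illustrative.

```python
import math

import numpy as np


def validate_rechunk_sketch(old_chunks, new_chunks):
    """Raise ValueError if rechunking looks impossible (hypothetical helper)."""
    if len(old_chunks) != len(new_chunks):
        raise ValueError("old and new chunks must have the same number of dimensions")

    for old_dim, new_dim in zip(old_chunks, new_chunks):
        # The sum is nan as soon as any chunk size along this dimension is unknown.
        old_shape, new_shape = sum(old_dim), sum(new_dim)
        if old_shape == new_shape:
            continue
        both_unknown = math.isnan(old_shape) and math.isnan(new_shape)
        # Same condition as in the diff: unknown dimensions must keep their
        # chunks exactly, and known dimensions must keep their total size.
        if not both_unknown or not np.array_equal(old_dim, new_dim, equal_nan=True):
            raise ValueError(
                "Chunks must be unchanging along dimensions with missing values."
            )


# The removed test case, pointed at the validation step instead of old_to_new.
try:
    validate_rechunk_sketch(
        ((np.nan, np.nan), (4, 4)), ((np.nan, np.nan, np.nan), (4, 4))
    )
except ValueError as exc:
    print("rejected:", exc)
```

With a check like this running first, a function further down the pipeline can assume its inputs describe a feasible rechunk, which is what the updated docstring states.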
Thanks, missed that subtlety.
Looks good as far as I can see. 👍
This function expects that the arguments have been pre-processed by
:func:`dask.array.core.normalize_chunks`. In particular any ``nan`` values should
have been replaced (and are so by :func:`dask.array.core.normalize_chunks`)
by the canonical ``np.nan``. It also expects that the arguments have been validated
with `_validate_rechunk` and rechunking is thus possible.
Nit: we want double backticks around `_validate_rechunk` for Sphinx.
I'll make note of it!
Is there a good way to lint this? 🤔
Not that I've run into (though I've also not specifically looked for one). Though I'd definitely use it if one existed!
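In the absence of a known tool, a small regex-based check could be a starting point. The following is only a sketch: `find_single_backticks` is a hypothetical helper, and it covers just the common reStructuredText cases (double-backtick literals, roles such as :func:, and hyperlink references) rather than the full grammar.

```python
import re

# Order matters: literals, roles and hyperlink references are consumed first,
# so only leftover single-backtick spans are reported.
_TOKEN = re.compile(
    r"""
      (?P<literal>``.+?``)
    | (?P<role>:[\w:.+-]+:`[^`\n]+`)
    | (?P<link>`[^`\n]+`__?)
    | (?P<single>`[^`\n]+`)
    """,
    re.VERBOSE,
)


def find_single_backticks(text):
    """Return (line number, span) pairs for single-backtick spans in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _TOKEN.finditer(line):
            if match.lastgroup == "single":
                hits.append((lineno, match.group()))
    return hits


# The docstring text from this PR, used as sample input.
DOC = """\
This function expects that the arguments have been pre-processed by
:func:`dask.array.core.normalize_chunks`. In particular any ``nan`` values should
have been replaced (and are so by :func:`dask.array.core.normalize_chunks`)
by the canonical ``np.nan``. It also expects that the arguments have been validated
with `_validate_rechunk` and rechunking is thus possible.
"""

for lineno, span in find_single_backticks(DOC):
    print(f"line {lineno}: {span} uses single backticks")
```

Something like this could be wired into pre-commit as a local hook, though it would likely need an allowlist for projects that set a Sphinx default role on purpose.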
This PR consolidates (almost) all validation and moves it to an earlier point in graph generation to avoid duplication between task-based and P2P-based rechunking (https://github.com/dask/distributed/pull/7856/files#r1205195557)
pre-commit run --all-files