Validate that rechunking is possible earlier in graph generation #10336

Merged
hendrikmakait merged 3 commits into dask:main from restructure-rechunk-validation on Jun 12, 2023

hendrikmakait
Member

@hendrikmakait hendrikmakait commented Jun 7, 2023

This PR consolidates (almost) all validation and moves it to an earlier point in graph generation to avoid duplication between task-based and P2P-based rechunking (https://github.com/dask/distributed/pull/7856/files#r1205195557)

  • Tests added / passed
  • Passes pre-commit run --all-files

@github-actions github-actions bot added the array label Jun 7, 2023
if old_shape != new_shape:
    if not (
        math.isnan(old_shape) and math.isnan(new_shape)
    ) or not np.array_equal(old_dim, new_dim, equal_nan=True):
Member Author

Using np.array_equal avoids issues with the provenance of nan values (#10157 (comment), https://github.com/dask/distributed/pull/7856/files#r1205199004)
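
A minimal sketch of the provenance issue mentioned above, assuming "provenance" refers to whether two NaNs are the same Python object. The variable names are illustrative only and do not come from the PR.

```python
import numpy as np

# Two NaNs of different provenance: NumPy's module-level constant
# and a freshly constructed float.
canonical = np.nan
fresh = float("nan")

# NaN never compares equal with ==, not even to itself.
nan_equals_itself = canonical == canonical

# Tuple equality checks object identity before falling back to ==,
# so the result depends on whether both sides hold the same NaN object.
same_object = (canonical,) == (canonical,)
different_object = (fresh,) == (canonical,)

# np.array_equal with equal_nan=True compares by value and treats NaNs
# as equal, regardless of which object they are.
robust = np.array_equal((fresh, 4), (canonical, 4), equal_nan=True)

print(nan_equals_itself, same_object, different_object, robust)
# prints: False True False True
```

Plain tuple comparison therefore only works when every NaN is the same object, whereas the `np.array_equal` check does not depend on that.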

Member Author

cc @wence-

Contributor

Makes sense, I think.

@hendrikmakait hendrikmakait added the needs review label Jun 7, 2023
Contributor

@wence- wence- left a comment

Looks overall fine, though I am confused why the test disappears.

Comment on lines -775 to -779
def test_changing_raises():
    with pytest.raises(ValueError) as record:
        old_to_new(((np.nan, np.nan), (4, 4)), ((np.nan, np.nan, np.nan), (4, 4)))

    assert "unchanging" in str(record.value)
Contributor

Is this test now no longer valid?

Member Author

Yes, the test is no longer valid. It triggered validation within old_to_new, but with this change, old_to_new assumes that validation has already taken place (see changes to docstring).

Member Author

I've added the same test case in a different place where it's still valid.

Contributor

Thanks, missed that subtlety.
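
The pattern discussed in this thread (validate once up front so that later steps can assume valid input) can be sketched as follows. This is a hypothetical stand-in modeled on the condition quoted in the diff, not dask's actual `_validate_rechunk`. The function name `validate_rechunk_sketch`, the dimension-count check, and the error messages are assumptions made for illustration.

```python
import math

import numpy as np


def validate_rechunk_sketch(old_chunks, new_chunks):
    """Raise ValueError if rechunking from old_chunks to new_chunks is impossible.

    Hypothetical illustration only; names and messages are assumptions.
    """
    if len(old_chunks) != len(new_chunks):
        raise ValueError("old and new chunks must have the same number of dimensions")
    for old_dim, new_dim in zip(old_chunks, new_chunks):
        old_shape, new_shape = sum(old_dim), sum(new_dim)
        if old_shape == new_shape:
            continue  # known, matching extent along this dimension
        # Reached for mismatched known extents and for any NaN extent,
        # because NaN != NaN is always true.
        both_unknown = math.isnan(old_shape) and math.isnan(new_shape)
        if not both_unknown or not np.array_equal(old_dim, new_dim, equal_nan=True):
            raise ValueError(
                "chunks must keep their total size and be unchanging "
                "along dimensions with unknown (NaN) sizes"
            )


# The case from the removed test: the chunking along the NaN dimension changes.
message = None
try:
    validate_rechunk_sketch(
        ((np.nan, np.nan), (4, 4)), ((np.nan, np.nan, np.nan), (4, 4))
    )
except ValueError as err:
    message = str(err)
```

With a check like this run first, a function such as `old_to_new` can skip its own validation, which is why the test for the error had to move to where the validation now lives.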

Contributor

@milesgranger milesgranger left a comment

Looks good as far as I can see. 👍

@hendrikmakait hendrikmakait removed the needs review label Jun 12, 2023
@hendrikmakait hendrikmakait merged commit cb7d780 into dask:main Jun 12, 2023
24 checks passed
@hendrikmakait hendrikmakait deleted the restructure-rechunk-validation branch June 12, 2023 10:43
  This function expects that the arguments have been pre-processed by
  :func:`dask.array.core.normalize_chunks`. In particular any ``nan`` values should
  have been replaced (and are so by :func:`dask.array.core.normalize_chunks`)
- by the canonical ``np.nan``.
+ by the canonical ``np.nan``. It also expects that the arguments have been validated
+ with `_validate_rechunk` and rechunking is thus possible.
Member

Nit: we want double backticks around _validate_rechunk for sphinx

Member Author

I'll make note of it!

Member Author

Is there a good way to lint this? 🤔

Member

Not that I've run into (though I've also not specifically looked for one). Though I'd definitely use it if one existed!

4 participants