Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ordering when indexing a dask array with a bool dask array #5151

Merged
merged 2 commits into from Aug 5, 2019

Conversation

jrbourbeau
Copy link
Member

@jrbourbeau jrbourbeau commented Jul 24, 2019

This PR partially addresses #3184 and builds on #3193 . To get the correct ordering when indexing, we ravel both the array being indexed and the boolean mask when their chunk sizes are known (ravel currently isn't supported for arrays with nan chunks). In the case that there are unknown chunks, we raise an informative warning and the current behavior is maintained.

  • Tests added / passed
  • Passes black dask / flake8 dask

"When slicing a dask array of unknown chunks with a boolean mask "
"dask array, the output array may have a different ordering "
"compared to the equivalent NumPy operation.",
stacklevel=3,
Copy link
Member

@jcrist jcrist Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unsafe, should we eventually error here instead (a warning is certainly better now than silently being incorrect before)? If we can't maintain order reductions are still fine, but other code may break.

Copy link
Member Author

@jrbourbeau jrbourbeau Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this isn't ideal. I'm slightly in favor of just raising a warning in order to not break any existing user code. The warning will at least inform them that the existing behavior doesn't match up with NumPy. Do you have other thoughts / feel strongly about warning vs. erroring?

Copy link
Member

@jakirkham jakirkham Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose another option would be to have a config flag that users set to get NumPy ordering. We could then warn that this only works correctly for some cases (like arrays of known size). This would mean users have to opt-in to this behavior and they are warned in advance of the consequences.

Copy link
Member

@jcrist jcrist Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against adding another config option - we're already swimming in a sea of config options, I think we should be opinionated in places like this. If we really do need to make this configurable, then it should be with a keyword argument to allow the non-compatible behavior (strict_ordering=False?), with the default to numpy compatible/errors (the error could point to this flag). I think the number of users a breaking change like this would affect would be small, and the risk of incorrect output is worse IMO than breaking their code. Thoughts?

Copy link
Member Author

@jrbourbeau jrbourbeau Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also prefer not to add a configuration option for this. Rather, treat the ordering mismatch as a bug and either warn or raise.

I think the number of users a breaking change like this would affect would be small, and the risk of incorrect output is worse IMO than breaking their code

I'm not sure how many would be affected by erroring here. A good compromise might be to warn now and elevate to an error after we've issued a release with the warning included.

Copy link
Member

@jcrist jcrist Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how many would be affected by erroring here. A good compromise might be to warn now and elevate to an error after we've issued a release with the warning included.

A warning that's elevated to an error later would be fine with me.

Copy link
Member Author

@jrbourbeau jrbourbeau Jul 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added a note about the warning changing to an error in a future release

Copy link
Member

@jakirkham jakirkham Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we happy with UserWarning or is there another warning type we should be using here?

Copy link
Member

@jcrist jcrist Aug 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use UserWarning for most things in dask, fine by me.

@jcrist
Copy link
Member

@jcrist jcrist commented Jul 25, 2019

cc @jakirkham, since you were active on the related issues/PRs.

@jrbourbeau
Copy link
Member Author

@jrbourbeau jrbourbeau commented Jul 31, 2019

@jcrist @jakirkham just checking in. Any other comments or concerns?

Copy link
Member

@jakirkham jakirkham left a comment

One final question above (in the existing thread). Otherwise LGTM. Thanks for working on this @jrbourbeau!

@jakirkham
Copy link
Member

@jakirkham jakirkham commented Aug 5, 2019

@jcrist, are you happy enough to merge here or is there more needed from your perspective?

@jcrist jcrist merged commit 4bd4ae3 into dask:master Aug 5, 2019
2 checks passed
@jcrist
Copy link
Member

@jcrist jcrist commented Aug 5, 2019

Thanks James

@jrbourbeau jrbourbeau deleted the bool-indexing branch Aug 5, 2019
@jrbourbeau
Copy link
Member Author

@jrbourbeau jrbourbeau commented Aug 5, 2019

Thanks for the review @jcrist @jakirkham!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants