Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ordering when indexing a dask array with a bool dask array #5151

Merged
merged 2 commits into from Aug 5, 2019

Conversation

@jrbourbeau
Copy link
Member

commented Jul 24, 2019

This PR partially addresses #3184 and builds on #3193 . To get the correct ordering when indexing, we ravel both the array being indexed and the boolean mask when their chunk sizes are known (ravel currently isn't supported for arrays with nan chunks). In the case that there are unknown chunks, we raise an informative warning and the current behavior is maintained.

  • Tests added / passed
  • Passes black dask / flake8 dask

@jrbourbeau jrbourbeau added the array label Jul 24, 2019

"When slicing a dask array of unknown chunks with a boolean mask "
"dask array, the output array may have a different ordering "
"compared to the equivalent NumPy operation.",
stacklevel=3,

This comment has been minimized.

Copy link
@jcrist

jcrist Jul 25, 2019

Member

This seems unsafe, should we eventually error here instead (a warning is certainly better now than silently being incorrect before)? If we can't maintain order reductions are still fine, but other code may break.

This comment has been minimized.

Copy link
@jrbourbeau

jrbourbeau Jul 25, 2019

Author Member

I agree this isn't ideal. I'm slightly in favor of just raising a warning in order to not break any existing user code. The warning will at least inform them that the existing behavior doesn't match up with NumPy. Do you have other thoughts / feel strongly about warning vs. erroring?

This comment has been minimized.

Copy link
@jakirkham

jakirkham Jul 25, 2019

Member

I suppose another option would be to have a config flag that users set to get NumPy ordering. We could then warn that this only works correctly for some cases (like arrays of known size). This would mean users have to opt-in to this behavior and they are warned in advance of the consequences.

This comment has been minimized.

Copy link
@jcrist

jcrist Jul 29, 2019

Member

I'm against adding another config option - we're already swimming in a sea of config options, I think we should be opinionated in places like this. If we really do need to make this configurable, then it should be with a keyword argument to allow the non-compatible behavior (strict_ordering=False?), with the default to numpy compatible/errors (the error could point to this flag). I think the number of users a breaking change like this would affect would be small, and the risk of incorrect output is worse IMO than breaking their code. Thoughts?

This comment has been minimized.

Copy link
@jrbourbeau

jrbourbeau Jul 29, 2019

Author Member

I'd also prefer not to add a configuration option for this. Rather, treat the ordering mismatch as a bug and either warn or raise.

I think the number of users a breaking change like this would affect would be small, and the risk of incorrect output is worse IMO than breaking their code

I'm not sure how many would be affected by erroring here. A good compromise might be to warn now and elevate to an error after we've issued a release with the warning included.

This comment has been minimized.

Copy link
@jcrist

jcrist Jul 29, 2019

Member

I'm not sure how many would be affected by erroring here. A good compromise might be to warn now and elevate to an error after we've issued a release with the warning included.

A warning that's elevated to an error later would be fine with me.

This comment has been minimized.

Copy link
@jrbourbeau

jrbourbeau Jul 29, 2019

Author Member

Just added a note about the warning changing to an error in a future release

This comment has been minimized.

Copy link
@jakirkham

jakirkham Aug 5, 2019

Member

Are we happy with UserWarning or is there another warning type we should be using here?

This comment has been minimized.

Copy link
@jcrist

jcrist Aug 5, 2019

Member

We use UserWarning for most things in dask, fine by me.

@jcrist

This comment has been minimized.

Copy link
Member

commented Jul 25, 2019

cc @jakirkham, since you were active on the related issues/PRs.

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Jul 31, 2019

@jcrist @jakirkham just checking in. Any other comments or concerns?

@jakirkham
Copy link
Member

left a comment

One final question above (in the existing thread). Otherwise LGTM. Thanks for working on this @jrbourbeau!

@jakirkham

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

@jcrist, are you happy enough to merge here or is there more needed from your perspective?

@jcrist jcrist merged commit 4bd4ae3 into dask:master Aug 5, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jcrist

This comment has been minimized.

Copy link
Member

commented Aug 5, 2019

Thanks James

@jrbourbeau jrbourbeau deleted the Quansight-Labs:bool-indexing branch Aug 5, 2019

@jrbourbeau

This comment has been minimized.

Copy link
Member Author

commented Aug 5, 2019

Thanks for the review @jcrist @jakirkham!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.