Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for __array_function__ protocol #4567
Add support for __array_function__ protocol #4567
Changes from 12 commits
5f2efe2
a8f706c
ab7c79f
eae9515
8c09a19
4e385a7
017e0da
6ff03a5
3c87f00
4b7ef9b
c49945f
9441111
9d129f3
990472c
025c4f7
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in many cases a dask array function will work with numpy objects (or presumably anything on which it can effectively call
asarray
. I suggest a test likenp.tensordot(dask_array, numpy_array)
and replacing theall
above with anany
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
__array_function__
protocol guarantees that it will only get called ifdask.array.Array
is in types. So if that's all you wanted to check, you could drop this entirely.The problem is that you want to support operations involving some but not all other array types. For example,
xarray.DataArray
wraps dask arrays, not the other way around. Sodask.Array.__array_function__
should returnNotImplemented
in that case, leaving it upxarray.DataArray.__array_function__
to define the operation.We basically need some protocol or registration system to mark arrays as "can be coerced into dask array", e.g., either
or
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, that's an interesting problem. Maybe we try to turn all of the args into dask arrays with
da.asarray
and, if that works, proceed?I'm not sure that this would produce the correct behavior for
xarray.DataArray
though. I genuinely don't know what the right behavior is there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think my idea for mixins can help with this. Specifically, we could check the specific mixins that Dask absolutely requires, I’m assuming that’d be the indexing mixin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually like this idea for its simplicity.
I also like this idea in general, but I think then it would be best before to agree on all mixins we expect to have/support. Plus, if I understood correctly, it feels like this will require much more type checking, which may become complicated to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that that would only help to address the second point above. I'm not sure that it would solve it though. As a worst case scenario I suspect that a project like TensorFlow would not inherit from those classes. We could get NumPy, Dask, Sparse, and Xarray on board but I'm not sure about any of the others out there.
I personally think that duck typing is more likely to work across projects than explicit typing.
Mixins may still be a good idea for the reason you mention, you can easily implement a lot of NumPy-like functionality without too much effort, but I don't think that they're a good choice to test for Numpy-ness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that it's hard to get traction for mixins or add dependencies, but adding a protocol isn't hard.
While you're right that TensorFlow doesn't plan on implementing any of these protocols, it has always historically done its own thing, and it has its own distributed system. I suspect that with Google's historical "rewrite the world and do it better" philosophy, it'll be hard to get them to play along anyway.
In contrast, anyone specifically aiming to be a NumPy duck array won't have a problem implementing protocols, and that is what I'm planning to do, to allow mixins to implement methods and so forth based on the implementation of protocols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the time being, I've used @shoyer and @mrocklin suggestions replacing only that last condition in
__array_function__
to:It seems to me that the mixin discussion has grown enough that we should move it to a Dask issue, discussed there and later review the current code. Anybody disagrees with my proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since nobody opposed, I moved the discussion to #4583.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that with @shoyer 's statements below we can drop this conditional entirely. It should always pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in this, the right argument of
assert_eq()
is used as baseline, andx
being a NumPy array, it fails here:Therefore, to me it makes sense to make both arguments NumPy arrays and let it fallback to NumPy's implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reverse should be done, with
weights
being coerced toda.Array
with the same chunks asx
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I think I agree with @pentschev here. The left argument,
e
, is testing the dask/dask situation while the right argument is testing the numpy/numpy situation.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would cause
da.bincount()
to be dispatched, then we would be testing againste
which computes exactly that.