-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add item method to Dask Array #3630
base: main
Are you sure you want to change the base?
Conversation
dask/array/core.py
Outdated
else: | ||
args = np.unravel_index(args[0], self.shape) | ||
|
||
return self[args].astype(object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a size 0 scalar dask array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From playing around with it locally, seems to be.
Though now that you ask this am thinking we need a check to ensure args
has the same length as self.ndim
.
Are there any other cases we should be checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The slicing gets the scalar. The object
conversion seems to be needed due to a type mismatch that occurs otherwise between the NumPy and Dask implementations (e.g. float
vs. float64
).
Adds the `item` method for Dask Arrays much like the same named method in NumPy. Can turn an N-D singleton into a scalar. Also can allow indexing a single value with raveled or unraveled coordinates. Converts the result to a Python object consistent with the way that NumPy behaves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the approach seems sensible to me. One small comment.
dask/array/tests/test_array_core.py
Outdated
for s in range(ndim): | ||
a = np.ones(s * (1,)) | ||
d = da.from_array(a, chunks=1) | ||
assert_eq(a.item(), d.item()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to expand assert_eq
to also check that the type of the two sides is equal after computing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWICT that appears to already happening. Originally had a type mismatch before adding astype(object)
above, which this test caught. So added that in.
Compare the results of `item` on NumPy Arrays and Dask Arrays to ensure they both act the same.
Included testing of a few more types for good measure. |
If we have a scalar dask array, what can it compute to? Now it looks like
the answer includes: numpy scalars, Python scalars and 0d numpy arrays? It
might be better to use dask.delayed objects here, at least for Python
scalars.
…On Sun, Jun 17, 2018 at 10:38 AM jakirkham ***@***.***> wrote:
Included testing of a few more types for good measure.
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#3630 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABKS1mX12gZiuROmZtT7_JqF3Y1MBdTSks5t9pQjgaJpZM4UqtGJ>
.
|
Would it be ok if we returned a NumPy scalar instead? IOW dropping the |
Thoughts? |
In my view, the main use of |
Checking in, what's the status here? |
There appears to be some disagreement over what the return type should be. |
It looks like the reason to provide a Python scalar is because that's what Numpy does (this makes sense to me). I may have missed it above, but is there an argument for why to return a Numpy scalar? If not, then I suggest that we return a Python scalar. |
FWIW, #3630 (comment) makes sense to me too. I've only ever used |
Yeah understood. It's just unfortunate that all of the relevant metadata gets lost in the process of going to |
Looks like the tests will have to be updated. |
@jakirkham are you interested in finishing up this one up? It seems pretty close. |
I updated to return the a In [4]: da.ones(1)
Out[4]: dask.array<ones, shape=(1,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>
In [5]: da.ones(1).item()
Out[5]: Delayed('item-4d13499d-428c-4164-bf69-752d2d330e48')
In [6]: _.compute()
Out[6]: 1.0
In [7]: type(_)
Out[7]: float
Yes, it is lost which may be unfortunate. But I think in the typical use-case, this is OK. If people want a Python object back, then they're probably OK losing things like the dtype. |
Is this ready for merge? |
We should probably wait for a +1 from @jakirkham, since he had some reservations about this behavior. |
re-pinging @jakirkham to see if you're OK with the changes I pushed to your branch in 20529cc. |
Hi @jakirkham are you happy with what Tom has added here? It seems like this one is very close to being merged. |
Fixes #2959
Adds an
item
method for Dask Arrays equivalent to NumPy ndarray'sitem
method.flake8 dask