Add item method to Dask Array #3630

Open: wants to merge 6 commits into base: main


@jakirkham (Member) commented Jun 17, 2018

Fixes #2959

Adds an `item` method to Dask Arrays, equivalent to NumPy's `ndarray.item` method.

  • Tests added / passed
  • Passes flake8 dask
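For reference, the NumPy behavior this PR mirrors can be checked with NumPy alone (no Dask involved). This minimal sketch shows that `item()` with no arguments collapses a size-1 array of any dimensionality to a built-in Python scalar, and raises `ValueError` when the array has more than one element:

```python
import numpy as np

# A size-1 array of any dimensionality collapses to a Python scalar.
for ndim in range(4):
    a = np.ones(ndim * (1,))  # shapes (), (1,), (1, 1), (1, 1, 1)
    value = a.item()
    print(a.shape, type(value).__name__, value)

# With more than one element, item() needs an index; without one it raises.
b = np.arange(6).reshape(2, 3)
error = None
try:
    b.item()
except ValueError as exc:
    error = exc
print(repr(error))
```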

else:
    args = np.unravel_index(args[0], self.shape)

return self[args].astype(object)
Member commented:

This is a 0-d (scalar) dask array?

Member Author commented:

From playing around with it locally, it seems to be.

Though now that you ask, I think we need a check to ensure `args` has the same length as `self.ndim`.

Are there any other cases we should be checking?
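One way to picture the `args` check discussed above is a small normalizing helper that turns `item`-style arguments into a full N-D index tuple before slicing. This is a hypothetical sketch (the name `normalize_item_args` is invented here, not taken from the PR), following NumPy's rules as I understand them: no arguments requires size 1, a single integer is a flat index (negative values allowed), a single tuple is unpacked, and otherwise exactly `ndim` indices are required.

```python
import math

import numpy as np


def normalize_item_args(args, shape):
    """Hypothetical helper: convert ndarray.item-style arguments into an
    N-D index tuple for an array with the given shape."""
    ndim = len(shape)
    size = math.prod(shape)  # math.prod(()) == 1, so 0-d arrays have size 1
    if len(args) == 1 and isinstance(args[0], tuple):
        args = args[0]  # item((i, j)) is accepted like item(i, j)
    if len(args) == 0:
        if size != 1:
            raise ValueError("can only convert an array of size 1 to a Python scalar")
        return (0,) * ndim
    if len(args) == 1:
        # A single integer is a flat (raveled) index; negatives count from the end.
        flat = int(args[0])
        if flat < 0:
            flat += size
        if not 0 <= flat < size:
            raise IndexError(f"index {args[0]} is out of bounds for size {size}")
        if ndim == 0:
            return ()
        return tuple(int(i) for i in np.unravel_index(flat, shape))
    if len(args) != ndim:
        # The check mentioned above: one index per dimension.
        raise ValueError("incorrect number of indices for array")
    return tuple(args)


a = np.arange(6).reshape(2, 3)
print(normalize_item_args((4,), a.shape))
print(normalize_item_args((-1,), a.shape))
```

The resulting tuple can then be used to slice the array, so the flat and N-D forms share one code path.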

Member Author commented:

The slicing gets the scalar. The object conversion seems to be needed because of a type mismatch that otherwise occurs between the NumPy and Dask implementations (e.g. `float` vs. `float64`).
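The `float` vs. `float64` mismatch can be reproduced on the NumPy side alone. In this sketch, plain indexing returns a NumPy scalar, `item()` returns a built-in `float`, and casting a 0-d result to `object` dtype stores a built-in `float` as well, which is why `.astype(object)` lines the two up. It does not show how Dask computes the 0-d object array.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

via_item = a.item(1, 2)  # NumPy's item(): built-in Python float
via_index = a[1, 2]      # plain indexing: NumPy scalar (numpy.float64)

# Casting a 0-d float64 array to object dtype stores a built-in float;
# indexing with () pulls that element back out.
via_object = np.asarray(a[1, 2]).astype(object)[()]

for name, value in [("item", via_item), ("index", via_index), ("object", via_object)]:
    print(f"{name:>6}: {type(value).__name__} {value!r}")
```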

Adds an `item` method to Dask Arrays, mirroring the method of the same name
in NumPy. It can turn an N-D singleton into a scalar, and it can also
index a single value with raveled (flat) or unraveled (N-D) coordinates.
Converts the result to a Python object, consistent with NumPy's
behavior.
@mrocklin (Member) left a comment:

In general the approach seems sensible to me. One small comment.

for s in range(ndim):
    a = np.ones(s * (1,))
    d = da.from_array(a, chunks=1)
    assert_eq(a.item(), d.item())
Member commented:

Would it make sense to expand `assert_eq` to also check that the types of the two sides are equal after computing?
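To make the suggestion concrete, here is a hypothetical helper (`assert_eq_with_type` is invented for illustration and is not Dask's `assert_eq`) that evaluates anything exposing a `.compute()` method and then compares both value and exact type. It is intended for scalar results such as those returned by `item`:

```python
import numpy as np


def assert_eq_with_type(expected, actual):
    """Hypothetical scalar comparison: evaluate lazy inputs, then require
    equal values and identical types."""
    # Anything with a .compute() method (e.g. a Dask collection) is evaluated first.
    if hasattr(expected, "compute"):
        expected = expected.compute()
    if hasattr(actual, "compute"):
        actual = actual.compute()
    assert type(expected) is type(actual), (
        f"type mismatch: {type(expected).__name__} vs {type(actual).__name__}"
    )
    assert expected == actual, f"value mismatch: {expected!r} vs {actual!r}"


a = np.ones((1, 1))
assert_eq_with_type(a.item(), 1.0)  # both are built-in floats

# Equal values but different types (float vs numpy.float64) are rejected.
try:
    assert_eq_with_type(a.item(), a[0, 0])
    mismatch_caught = False
except AssertionError:
    mismatch_caught = True
print(mismatch_caught)
```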

@jakirkham (Member Author) commented Jun 17, 2018:

FWICT that already appears to be happening. I originally had a type mismatch before adding `astype(object)` above, which this test caught, so I added that in.

Compare the results of `item` on NumPy Arrays and Dask Arrays to ensure
they behave the same.
@jakirkham (Member Author):

Included testing of a few more types for good measure.
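The expanded tests themselves aren't reproduced here. As an illustration of why covering several dtypes is worthwhile, this NumPy-only snippet shows that `item()` maps different dtypes to different built-in Python types, so a type check on one dtype says little about the others:

```python
import numpy as np

# One single-element array per dtype.
samples = {
    "int32": np.array([7], dtype=np.int32),
    "float32": np.array([0.5], dtype=np.float32),
    "complex128": np.array([1 + 2j], dtype=np.complex128),
    "bool": np.array([True], dtype=bool),
}

# item() returns a built-in Python type that depends on the dtype.
item_types = {name: type(arr.item()).__name__ for name, arr in samples.items()}
print(item_types)
```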

@shoyer (Member) commented Jun 17, 2018 via email

@jakirkham (Member Author):

Would it be OK if we returned a NumPy scalar instead? In other words, dropping the `.astype(object)`.
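One subtlety of returning a NumPy scalar instead of a Python scalar: some NumPy scalar types subclass the matching built-in type and others do not, so `isinstance` checks written against NumPy's `item()` would pass for some dtypes and fail for others. A quick NumPy-only check:

```python
import numpy as np

# Which NumPy scalar types are also instances of the built-in type?
checks = {
    "float64 is a float": isinstance(np.float64(1.0), float),
    "complex128 is a complex": isinstance(np.complex128(1j), complex),
    "float32 is a float": isinstance(np.float32(1.0), float),
    "int64 is an int": isinstance(np.int64(1), int),
    "bool_ is a bool": isinstance(np.bool_(True), bool),
}
for description, result in checks.items():
    print(f"{description}: {result}")
```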

@jakirkham (Member Author):

Thoughts?

@shoyer (Member) commented Jun 19, 2018:

In my view, the main use of `ndarray.item()` is to convert a 0d array into a Python scalar. So I would be inclined to create a `dask.delayed` object holding a Python scalar, not a 0d dask array.

@mrocklin (Member):

Checking in, what's the status here?

@jakirkham (Member Author):

There appears to be some disagreement over what the return type should be.

@mrocklin (Member):

> There appears to be some disagreement over what the return type should be.

It looks like the reason to provide a Python scalar is that this is what NumPy does (this makes sense to me). I may have missed it above, but is there an argument for returning a NumPy scalar?

If not, then I suggest that we return a Python scalar.

@TomAugspurger (Member) commented Aug 8, 2019:

FWIW, #3630 (comment) makes sense to me too. I've only ever used `.item()` to convert a NumPy object to a Python scalar.

@jakirkham (Member Author):

Yeah, understood. It's just unfortunate that all of the relevant metadata gets lost in the process of going to Delayed.

@TomAugspurger (Member):

Looks like the tests will have to be updated.

@jsignell (Member):

@jakirkham are you interested in finishing this one up? It seems pretty close.

@TomAugspurger (Member):

I updated it to return a Delayed object that holds a Python scalar. Hope that's OK @jakirkham

In [4]: da.ones(1)
Out[4]: dask.array<ones, shape=(1,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>

In [5]: da.ones(1).item()
Out[5]: Delayed('item-4d13499d-428c-4164-bf69-752d2d330e48')

In [6]: _.compute()
Out[6]: 1.0

In [7]: type(_)
Out[7]: float

> It's just unfortunate that all of the relevant metadata gets lost in the process of going to Delayed.

Yes, it is lost, which may be unfortunate. But I think in the typical use case this is OK: if people want a Python object back, they're probably fine with losing things like the dtype.
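What "losing things like the dtype" means can be seen on the NumPy side alone: a NumPy scalar carries `dtype`, `shape`, and `ndim`, while the Python scalar returned by `item()` carries none of them. Recovering the original precision then requires knowing the dtype from elsewhere:

```python
import numpy as np

x = np.float32(0.1)  # NumPy scalar: has dtype, shape, and ndim
y = x.item()         # built-in float: no dtype metadata

print(x.dtype, x.shape, x.ndim)
print(type(y).__name__, hasattr(y, "dtype"))

# Round-tripping only works if the caller remembers the original dtype.
print(np.float32(y) == x)
```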

@jsignell (Member):

Is this ready for merge?

@TomAugspurger (Member):

We should probably wait for a +1 from @jakirkham, since he had some reservations about this behavior.

@TomAugspurger (Member):

Re-pinging @jakirkham to see if you're OK with the changes I pushed to your branch in 20529cc.

Base automatically changed from master to main March 8, 2021 20:19
@GenevieveBuckley (Contributor):

Hi @jakirkham, are you happy with what Tom has added here? It seems like this one is very close to being merged.

@scharlottej13 added the enhancement label Mar 10, 2022
Labels: array, enhancement
Projects: None yet
Development: successfully merging this pull request may close these issues:
Implementing item method for Dask Arrays
7 participants