Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multimethods for indexing routines #67

Merged
merged 14 commits into from
Aug 16, 2020

Conversation

joaosferreira
Copy link
Collaborator

This PR intends to expand unumpy's coverage of indexing routines. The multimethods added so far are the following:

Generating index arrays

  • indices
  • ix_
  • ravel_multi_index
  • unravel_index
  • diag_indices
  • diag_indices_from
  • mask_indices
  • tril_indices
  • tril_indices_from
  • triu_indices
  • triu_indices_from

Some observations:

  1. The failing tests are in the Sparse backend. Because ones creates a COO with a fill_value of 1.0, the call to the next method fails since a zero fill_value is necessary.
  2. I'm working on default implementations for ravel_multi_index and unravel_index. I'll push them as soon as I have a working prototype but tips on how to implement these are welcomed.
  3. There are three classes that should be added as well since they are part of these routines, they are c_, r_ and s_.

Note: I'll be adding more multimethods that target these routines later.

@hameerabbasi
Copy link
Collaborator

@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Jul 27, 2020

  1. The failing tests are in the Sparse backend. Because ones creates a COO with a fill_value of 1.0, the call to the next method fails since a zero fill_value is necessary.

xfail the tests for this backend.

@joaosferreira
Copy link
Collaborator Author

Some notes regarding the default implementation for ravel_multi_index added in 0981d47:

  1. It iterates through the 2nd dimension of multi_index to apply the corresponding mode to each index since a tuple of modes is supported. There may be a way of doing this by iterating over dims like in this implementation which would be more efficient. If not then this implementation might be discarded because of an expensive loop.

  2. On implementing the order kwarg, when using column-major (Fortran-style) order with 2 dimensions I think you only need to swap rows with columns but for more dimensions I'm not sure.

@joaosferreira
Copy link
Collaborator Author

Same as for ravel_multi_index, the default implementation for unravel_index added in bda0ab9 is also missing the order kwarg.

@hameerabbasi
Copy link
Collaborator

For the order='F' kwarg, one just has to traverse the shape/indices in reverse. If the end-result is multiple indices, then it might also be necessary to flip them around, as you have calculated them with a reversed shape.

@joaosferreira
Copy link
Collaborator Author

The multimethods added in the last commit:

Indexing-like operations

  • take
  • take_along_axis
  • choose
  • diagonal
  • select

Inserting data into arrays

  • place
  • put
  • put_along_axis
  • putmask
  • fill_diagonal

Iterating over arrays

  • nditer
  • ndenumerate
  • ndindex
  • nested_iters

Some observations:

  1. Should as_stridedbe added? If yes place it in a separate file with a path like unumpy/lib/stride_tricks/_multimethods.py and set the mutimethod's domain to numpy.lib.stride_tricks? Similarly with Arrayterator?
  2. Less related to unumpy but I found that NumPy's insert behaves weirdly when you pass it a list with only one out-of-bounds index to its obj argument. Here is an example:
In [2]: onp.insert([1, 2, 3, 4, 5], [99], -5)                                   
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-5f700fb21b7e> in <module>
----> 1 onp.insert([1, 2, 3, 4, 5], [99], -5)

<__array_function__ internals> in insert(*args, **kwargs)

~/miniconda3/envs/unumpy/lib/python3.7/site-packages/numpy/lib/function_base.py in insert(arr, obj, values, axis)
   4575             raise IndexError(
   4576                 "index %i is out of bounds for axis %i with "
-> 4577                 "size %i" % (obj, axis, N))
   4578         if (index < 0):
   4579             index += N

TypeError: %i format: a number is required, not list

The error message is not correct. Only problem with unumpy is that some of the defaults that use insert might produce this strange error.

  1. The default for fill_diagonal seems rather simple using item assignment but this not viable because of read-only arrays. Can you think of another way of implementing it?

  2. The class flatiter can't be instantiated. What to do with classes such as this one that don't have a __call__ method? Also, it seems that c_, r_, and s_ are called with brackets so I'm guessing that these need a metaclass which overrides __getitem__.

  3. putmask seems very similar to place. Maybe the former's default should be the latter.

  4. Not sure how to fix the failing test for place in the Dask backend. Maybe it needs a custom a default.

Note: Looking into implementing other defaults as well.

@joaosferreira joaosferreira marked this pull request as ready for review August 3, 2020 10:48
@hameerabbasi
Copy link
Collaborator

hameerabbasi commented Aug 3, 2020

  1. hould as_stridedbe added? If yes place it in a separate file with a path like unumpy/lib/stride_tricks/_multimethods.py and set the mutimethod's domain to numpy.lib.stride_tricks? Similarly with Arrayterator?

stride_tricks, probably not. Arrayterator is doable.

  1. Less related to unumpy but I found that NumPy's insert behaves weirdly when you pass it a list with only one out-of-bounds index to its obj argument. Here is an example:

The best path is a good error message.

  1. The default for fill_diagonal seems rather simple using item assignment but this not viable because of read-only arrays. Can you think of another way of implementing it?

It shouldn't have a default, it requires mutability of arrays.

  1. The class flatiter can't be instantiated. What to do with classes such as this one that don't have a __call__ method? Also, it seems that c_, r_, and s_ are called with brackets so I'm guessing that these need a metaclass which overrides __getitem__.

You're correct on the ?_ classes. For flatiter, can you check what NumPy does to instantiate it?

  1. putmask seems very similar to place. Maybe the former's default should be the latter.

Correct, although I'd use a lambda to guard it, as the argument names are different. See #67 (comment)

  1. Not sure how to fix the failing test for place in the Dask backend. Maybe it needs a custom a default.

Skip it, Dask arrays are immutable.

@peterbell10 can you check this PR for CuPy?

@peterbell10
Copy link
Contributor

  1. putmask seems very similar to place. Maybe the former's default should be the latter.

Their behaviour is slightly different. putmask is similar to a[mask] = b[mask] whereas place is more like a[mask] = b.

(np.fill_diagonal, ([[0, 0], [0, 0]], 1), {}),
(np.nditer, ([[1, 2, 3]],), {}),
(np.ndenumerate, ([[1, 2], [3, 4]],), {}),
(np.ndindex, (3, 2, 1), {}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is failing because overridden_class isn't implemented for the cupy backend.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could try implement it but I will not be able to test it. CI does not report the tests in CuPy, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to just copy from the other backends.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since CI doesn't complain about CuPy I think I will add overriden_class to it in a separate PR if that's okay with you.

unumpy/_multimethods.py Show resolved Hide resolved
@@ -326,6 +346,16 @@ def test_functions_coerce_with_dtype(backend, method, args, kwargs):
(np.dsplit, ([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], 2), {}),
(np.hsplit, ([[1, 2], [3, 4]], 2), {}),
(np.vsplit, ([[1, 2], [3, 4]], 2), {}),
(np.ix_, ([0, 1], [2, 4]), {}),
(np.unravel_index, ([22, 41, 37], (7, 6)), {}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs an xfail. It's broken on cupy's side.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What reason message do you think it's appropriate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"cupy.unravel_index is broken in version 6.0"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ix_ is working or broken as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ix_ works.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what error is being raised by cupy.unravel_index (e.g., ValueError, TypeError)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeError.

@joaosferreira
Copy link
Collaborator Author

You're correct on the ?_ classes. For flatiter, can you check what NumPy does to instantiate it?

According to the docs, it's instantiated by x.flat for an array x.

Skip it, Dask arrays are immutable.

By skipping do you mean adding it to the exceptions list, xfail or removing it entirely?

@hameerabbasi
Copy link
Collaborator

Add it to the exceptions list.

@joaosferreira
Copy link
Collaborator Author

Add it to the exceptions list.

Adding it to the list still produces the error. Maybe xfailing it?

@hameerabbasi
Copy link
Collaborator

It seems most likely a compute_blocks call is needed somewhere.

@joaosferreira
Copy link
Collaborator Author

It seems most likely a compute_blocks call is needed somewhere.

Not sure what you mean.

@hameerabbasi
Copy link
Collaborator

Not sure what you mean.

My bad, it was compute_chunk_sizes.

nonzero_idxs = unumpy.nonzero(filt)[0].compute_chunk_sizes()

You need the same approach taken for trim_zeros.

@joaosferreira
Copy link
Collaborator Author

joaosferreira commented Aug 4, 2020

You need the same approach taken for trim_zeros.

Dask doesn't have delete and copyto which are needed for Dask's implementation of place. Use Numpy's methods instead (not sure if this works)? If not then I think the implementation will just return NotImplemented which is not necessarily an issue.

Edit: Actually if NotImplemented is returned in the custom implementation then unumpy falls back to the default which is what we want to avoid in the first place. In the end maybe we should just xfail it.

if method is np.s_:
assert isinstance(ret, slice)
else:
assert isinstance(ret, onp.ndarray)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's interesting, do these return numpy.ndarrays for other backends as well? If so, that seems to be a bug in the backends.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These classes are not dispatching their arguments and since the backends don't have them (at least Dask and Sparse) unumpy overrides them to NumPy classes which in turn convert the arguments into numpy.ndarrays. I think that they should dispatch their arguments but either way the NumPy classes will be operating on ndarrays from other backends (it works in some cases). Other option could be to just return NotImplemented.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will debug this, see what's going wrong, but if what you say is correct, NotImplemented is the way to go.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe create a list of not implemented classes in each backend and inside overridden_class check if self is in that list, if yes then return NotImplemented. I tried dispatching the list-like arguments in __getitem__. The Dask tests pass because of this behaviour:

In [4]: a = da.from_array(onp.asarray([1, 2, 3]))                               

In [5]: res = onp.c_[a]                                                         
/Users/joaoferreira/miniconda3/envs/unumpy/lib/python3.7/site-packages/dask/array/core.py:1356: FutureWarning: The `numpy.ndim` function is not implemented by Dask array. You may want to use the da.map_blocks function or something similar to silence this warning. Your code may stop working in a future release.
  FutureWarning,

In [6]: res                                                                     
Out[6]: 
array([[1],
       [2],
       [3]])

In [7]: type(res)                                                               
Out[7]: numpy.ndarray

And Sparse tests fail because of this:

In [8]: a = sparse.as_coo(onp.asarray([1, 2, 3]))                               

In [9]: res = onp.c_[a]                                                         
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-f3f273ef9019> in <module>
----> 1 res = onp.c_[a]

~/miniconda3/envs/unumpy/lib/python3.7/site-packages/numpy/lib/index_tricks.py in __getitem__(self, key)
    385             else:
    386                 item_ndim = ndim(item)
--> 387                 newobj = array(item, copy=False, subok=True, ndmin=ndmin)
    388                 if trans1d != -1 and item_ndim < ndmin:
    389                     k2 = ndmin - item_ndim

~/miniconda3/envs/unumpy/lib/python3.7/site-packages/sparse/_sparse_array.py in __array__(self, **kwargs)
    221         if not AUTO_DENSIFY:
    222             raise RuntimeError(
--> 223                 "Cannot convert a sparse array to dense automatically. "
    224                 "To manually densify, use the todense method."
    225             )

RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.

Even if they all passed I'm guessing that we want to avoid calling NumPy's methods/classes with ndarrays from other backends.

Copy link
Collaborator Author

@joaosferreira joaosferreira Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fixed by the latest commits. The classes that are not present in a given backend return NotImplemented.

Note: I had to overwrite _unwrapped in the call methods because when the backends returned NotImplemented the classes were defaulting to _unwrapped at the end of the call stack which made them fail.

@joaosferreira
Copy link
Collaborator Author

I will fast-forward merge master if that's okay with you. That way I can keep working on this PR knowing that #72 fixes some of its issues.

@joaosferreira
Copy link
Collaborator Author

This should be good to merge now.

@hameerabbasi hameerabbasi merged commit f5a1d62 into Quansight-Labs:master Aug 16, 2020
@hameerabbasi
Copy link
Collaborator

Thanks @joaosferreira!

@joaosferreira joaosferreira deleted the indexing-routines branch August 18, 2020 16:00
joaosferreira added a commit to joaosferreira/unumpy that referenced this pull request Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants