
Investigate CuPy after NEP-18 support #4462

Closed
mrocklin opened this issue Feb 6, 2019 · 34 comments

@mrocklin
Member

mrocklin commented Feb 6, 2019

Building off of #3007, it would be good to further investigate CuPy, especially now that they have added support for NEP-18 in cupy/cupy#1650.

To investigate this further, I recommend the following steps:

  1. Install CuPy with that change included (depending on when the last release was, it might only be in development versions)

  2. Install Dask

  3. Allocate a cupy array of random data

    import cupy
    x = cupy.random.random((1000, 1000))
  4. Split that array into a dask array

    import dask.array as da
    d = da.from_array(x, chunks=(100, 100), asarray=False)
  5. Start trying out Numpy syntax, and see what breaks

    x.sum().compute()
    x.mean().compute()
    (x + x.T).sum(axis=1).compute()
    ...  # there are many other things that we could do here
    u, s, v = da.linalg.svd(x)
    s.compute()

Probably after this we will learn about many flaws in Dask Array, in CuPy, or in both.

cc @zronaghi @pentschev

@pentschev
Member

This should work after my PR cupy/cupy#2026 is merged.

I know the example given in the description is just a reference, but since some corrections are necessary, below is a fully functional sample (after the PR mentioned above is merged):

import cupy
x = cupy.random.random((5000, 1000))

import dask.array as da
d = da.from_array(x, chunks=(1000, 1000), asarray=False)

x.sum()
x.mean()
(x[:1000] + x[:1000].T).sum(axis=1)

u, s, v = da.linalg.svd(d)
s.compute()

@mrocklin
Member Author

That's pretty exciting! I'll add a couple comments to the CuPy PR.

Some follow ups:

  1. We'll also need to call .compute() at the end of the other computations, like x.mean(), to trigger the actual computation and verify that things work at runtime.
  2. It would be interesting to characterize the performance difference of the SVD implementation between CuPy and NumPy (a rough timing sketch follows this list). At some point we should probably also try the distributed scheduler, which has more diagnostics. See http://docs.dask.org/en/latest/diagnostics-distributed.html and the YouTube video on that page.
  3. It would be useful to build up to the algorithms in dask-glm and see if any of them can be made to work easily. Here is an example: http://examples.dask.org/machine-learning/glm.html
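
A rough timing sketch for point 2 (illustrative only: the CuPy timing needs a device synchronization to be meaningful, and a real benchmark would use the distributed scheduler's diagnostics mentioned above):

import time

import cupy
import numpy
import dask.array as da

def time_svd(x, chunks=(1000, 1000)):
    # Wrap the input array, build the SVD graph, and time only compute()
    d = da.from_array(x, chunks=chunks, asarray=False)
    u, s, v = da.linalg.svd(d)
    start = time.perf_counter()
    s.compute()
    cupy.cuda.Stream.null.synchronize()  # wait for any pending GPU kernels
    return time.perf_counter() - start

print("numpy:", time_svd(numpy.random.random((5000, 1000))))
print("cupy: ", time_svd(cupy.random.random((5000, 1000))))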

@mrocklin
Member Author

Ah, whoops. @pentschev mentioned in side channels that I probably meant d.mean().compute() above. The .compute() method is a Dask method that turns a lazy Dask object into a concrete NumPy or CuPy one.
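
For illustration, a minimal sketch of that lazy/concrete distinction (whether each reduction actually succeeds on CuPy-backed chunks is exactly what this issue is investigating):

import cupy
import dask.array as da

x = cupy.random.random((1000, 1000))
d = da.from_array(x, chunks=(100, 100), asarray=False)

lazy = d.sum()           # a lazy dask graph; nothing has executed yet
result = lazy.compute()  # executes the graph and returns a concrete array
print(type(lazy), type(result))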

@pentschev
Member

Currently, d.mean().compute() fails:

Traceback (most recent call last):
  File "dask_ex.py", line 9, in <module>
    d.mean().compute()
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/base.py", line 398, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/threaded.py", line 76, in get
    pack_exception=pack_exception, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 459, in get_async
    raise_exception(exc, tb)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/compatibility.py", line 112, in reraise
    raise exc
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/local.py", line 230, in execute_task
    result = _execute_task(task, data)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/optimization.py", line 942, in __call__
    dict(zip(self.inkeys, args)))
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/compatibility.py", line 93, in apply
    return func(*args, **kwargs)
  File "/home/nfs/pentschev/.local/lib/python3.5/site-packages/dask/array/reductions.py", line 336, in mean_chunk
    result['n'] = n
ValueError: object __array__ method not producing an array

@mrocklin
Member Author

So the relevant function is here:

def mean_chunk(x, sum=chunk.sum, numel=numel, dtype='f8', **kwargs):
    n = numel(x, dtype=dtype, **kwargs)
    total = sum(x, dtype=dtype, **kwargs)
    empty = empty_lookup.dispatch(type(n))
    result = empty(n.shape, dtype=[('total', total.dtype), ('n', n.dtype)])
    result['n'] = n
    result['total'] = total
    return result

It looks like we have our own custom empty_lookup dispatch object, which will probably end up creating a NumPy array. Then, when we assign a CuPy array into that array, CuPy complains because it doesn't like coercing itself into NumPy. The empty_lookup dispatcher was probably written before NEP-18 arrived.

In principle, we probably want to use np.empty_like instead, which takes an array object and dispatches based on that (NumPy, CuPy, sparse, or whatever), so the NEP-18 dispatching magic can take over.

However, this requires some finesse, because we still need to support non-NEP-18-enabled installs (which are currently the majority). I'm not yet sure what the right approach is here.
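
For illustration, a minimal sketch of the dispatch behavior being described (this assumes a NEP-18-enabled NumPy, i.e., NumPy 1.16 with the experimental flag discussed later in this thread):

import numpy as np
import cupy

x = cupy.random.random((10, 10))

# With __array_function__ enabled, np.empty_like dispatches on the input
# type, so this returns a CuPy array rather than a NumPy one
out = np.empty_like(x)
print(type(out))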

@pentschev
Member

@mrocklin thanks for the detailed description. I will try to dive a bit into the code and see if I can suggest a fix that covers both cases. Is it safe to assume that existing tests cover the pre-NEP-18 case?

@mrocklin
Member Author

You should also make sure that you have the sparse library installed, which is another NumPy-like array library that we like to support:

pip install sparse

@mrocklin
Member Author

I also wouldn't dive too deeply into the mean issue if it's proving to be a rabbit hole. I think it's probably more valuable to search broadly first, uncover many issues, and then we can think a bit more holistically about how to handle things. There are others we can also rely on when diving in deeply here.

@mrocklin
Member Author

Here is a minimal example with dask-glm, which I think serves as a good general application here.

import numpy as np
import dask.array as da

X = np.random.random((1000, 10))  # ideally we swap this out with cupy
dX = da.from_array(X, chunks=(100, 10))
y = np.ones(1000)  # ideally we swap this out with cupy
dy = da.from_array(y, chunks=(100,))

import dask_glm.algorithms
dask_glm.algorithms.admm(dX, dy, max_iter=5)
dask_glm.algorithms.proximal_grad(dX, dy, max_iter=5)
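
A sketch of the CuPy variant suggested by the comments in the example above (whether these solvers run without hitting unimplemented functions is exactly what needs investigating):

import cupy
import dask.array as da
import dask_glm.algorithms

X = cupy.random.random((1000, 10))
dX = da.from_array(X, chunks=(100, 10), asarray=False)
y = cupy.ones(1000)
dy = da.from_array(y, chunks=(100,), asarray=False)

dask_glm.algorithms.admm(dX, dy, max_iter=5)
dask_glm.algorithms.proximal_grad(dX, dy, max_iter=5)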

@pentschev
Member

After NEP-18, Dask will be capable of operating on arrays from NumPy-like libraries, e.g., CuPy. For functions that the backend library doesn't yet implement, such as take_along_axis() (which Dask's argtopk() depends on), a TypeError is simply raised:

TypeError: no implementation found for 'numpy.take_along_axis' on types that implement __array_function__: [<class 'cupy.core.core.ndarray'>]

I think a discussion on how to handle this case is important for the near future.

@mrocklin
Member Author

I see two general approaches to situations like these:

  1. Implement take_along_axis in CuPy, either by asking them to implement it or by implementing it ourselves. Raising it as an issue is probably useful regardless.
  2. Change Dask code to avoid specialized functions in favor of more generic and commonly implemented ones (see the sketch below). If there is no significant performance or complexity concern, then I suspect this is fine.
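
As an illustration of option 2, take_along_axis can often be rewritten with plain fancy indexing, which most NumPy-like libraries already support. A hypothetical sketch for the 2-D case only (note that the index arrays here are created with NumPy; a fully backend-agnostic version would also need to create them with the input's own library, which is the creation-function problem discussed further below):

import numpy as np

def take_along_axis_2d(arr, indices, axis):
    # Hypothetical fallback: emulate np.take_along_axis for 2-D inputs
    # using plain fancy indexing
    if axis == 0:
        cols = np.arange(arr.shape[1])
        return arr[indices, cols[None, :]]
    else:
        rows = np.arange(arr.shape[0])
        return arr[rows[:, None], indices]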

@pentschev
Member

Please don't get too attached to the example I presented; it was chosen at random. Implementing all missing functions is probably out of the question, as there are hundreds of them:

https://docs-cupy.chainer.org/en/stable/reference/comparison.html

@mrocklin
Member Author

mrocklin commented Feb 13, 2019

Understood. The two approaches above probably apply to the other cases as well. If it's easy to add to CuPy, then let's do it. Otherwise, let's see if we can easily avoid the operation in Dask Array. Either approach is probably somewhat useful outside of this endeavor too.

@pentschev
Member

From my latest bug hunt, I've found some new issues, starting with those directly caused by CuPy implementations that are not fully NumPy-compliant:

  • cupy.einsum(): doesn't implement arguments like order= and casting=; Dask sets them, so CuPy fails on the unrecognized arguments
  • cupy.histogram(): doesn't implement weights= and fails when Dask passes it. Additionally, several other arguments don't exist, making it a limited implementation of NumPy's (minimal reproducers are sketched at the end of this comment)

For the two issues mentioned above, the options I see for now are:

  1. Open issues on GitHub (but they are likely not to be implemented for quite some time)
  2. Implement them ourselves, eventually (which may also take a lot of time)
  3. Warn that Dask+CuPy doesn't work for those functions

There is also a creation-function question:

  • dask.array.eye(), dask.array.fromfunction(): should we add *_like() variants (as in empty_like()) to allow the creation of Dask arrays based on CuPy?

Please note that I'm mentioning only CuPy here because that's what I've mostly been working on, but all of these issues will eventually come up when we want to work with other NumPy-like libraries. Also, the issues above are not an exhaustive list, just the ones I've found so far; they will most likely occur for other functions too.
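
For reference, hypothetical minimal reproducers for the two incompatibilities above (both failed at the time of writing because Dask forwards keyword arguments that CuPy's implementations don't accept):

import cupy
import dask.array as da

x = cupy.random.random((1000,))
d = da.from_array(x, chunks=100, asarray=False)

# cupy.einsum() doesn't accept the order=/casting= keywords Dask forwards:
da.einsum('i->', d).compute()

# cupy.histogram() doesn't accept the weights= keyword Dask forwards:
h, bins = da.histogram(d, bins=10, range=(0, 1))
h.compute()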

@shoyer @jakirkham this may interest you.

@pentschev
Member

On a more general note, I think it may be helpful to set milestones for Dask's NEP-18 support. At first it seemed that there wouldn't be many issues, but unfortunately we have many, due both to features that Dask relies on but CuPy lacks, and to various bugs in both libraries.

My suggestion is that we come up with subsets of functions (probably ordered by importance) and tackle them in that order, to prevent us from working only on the least important issues while the important ones are forgotten in the short term. Since I'm relatively new to Dask, I don't think I'm qualified to come up with such a list, so any input here is very welcome.

@mrocklin
Member Author

In a side-channel conversation about trying to get dask-glm working, @pentschev brought up cases like the following, where we create an array of zeros with a type like the input array's, but with a new shape:

(n, p) = X.shape
...
z = np.zeros(p)

In this particular case in dask-glm it probably doesn't matter, because p is likely to be very small, and so keeping that array on the CPU is probably fine (if not preferable).

In general, though, how do we handle this? One approach would be to get a NumPy-like module out of an array:

module = get_module(input_array)

x = module.zeros(shape=(...))

In this way we keep the dispatchability of the __array_function__ approach, but without explicitly requiring that an array be an input.
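
A minimal sketch of such a hypothetical get_module helper (the name and behavior are illustrative only):

import numpy

def get_module(x):
    # Illustrative only: map an array's type back to its top-level module,
    # falling back to NumPy for anything unrecognized
    root = type(x).__module__.split('.')[0]
    if root == 'cupy':
        import cupy
        return cupy
    if root == 'sparse':
        import sparse
        return sparse
    return numpy

With something like this, the dask-glm line above becomes z = get_module(X).zeros(p), and the result lives wherever X lives.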

cc @shoyer @hameerabbasi

@shoyer
Member

shoyer commented Feb 25, 2019

What about adding a shape keyword argument to NumPy's zeros_like, ones_like and full_like? There is some precedent for overriding attributes of the original array with the dtype argument already.
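
Under that proposal, the call would look like this (note that the shape= keyword did not yet exist in NumPy's *_like functions at the time of this comment; the PR that eventually added it, numpy/numpy#13046, comes up later in this thread):

import numpy as np

x = np.ones((1000, 10))

# Proposed: override the prototype's shape the same way dtype= can
# already override its dtype
z = np.zeros_like(x, shape=(10,))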

@hameerabbasi
Contributor

I would much rather get started on a "back-end" NEP, similar to @mrocklin's suggestion. Dispatching creation functions and other API calls that do not take in an array is a recurring pattern... I'd prefer a NEP to playing whack-a-mole with the issues that pop up around them.

@pentschev
Member

Just a reminder: I think @mrocklin is now referring to the more general NumPy case, but we had a similar discussion about approaches to solving this within Dask in #4490.

@mrocklin
Member Author

When looking at the dask-glm solvers in particular, I suspect that admm, bfgs, and proximal-grad are likely the most performant.

@jakirkham
Member

After reading through PR #4543, I was thinking about whether we might be able to get a rough empty_like implementation to use until PR numpy/numpy#13046 lands. This might unblock some things.

I tried to come up with something on the plane and made this:

import numpy


def resize(arr, shape):
    if type(arr) is numpy.ma.masked_array:
        # NumPy masked arrays raise an error from the in-place ndarray
        # resize method, so use numpy.ma.resize instead
        out = numpy.ma.resize(arr, shape)
    else:
        # ndarray.resize requires the array to own its data, hence the
        # copy; enlarging fills the new entries with zeros
        out = arr.copy()
        out.resize(shape)

    return out


def zeros_like(prototype, shape=None, dtype=None):
    # Standardize input arguments
    dtype = (dtype or prototype.dtype)
    shape = (shape or prototype.shape)

    # Create an empty array
    empty = resize(prototype, prototype.ndim * (0,))

    # Generate a new array from the empty array
    arr = empty.astype(dtype)
    arr = resize(arr, shape)

    return arr

This works on NumPy arrays, NumPy masked arrays, and SciPy sparse matrices.

I did not have Sparse or CuPy available at the time, so I hadn't tested those. Now that I'm looking at them, they both lack resize, meaning this wouldn't work as is.

Maybe we could add resize implementations to them ( pydata/sparse#241 ) ( cupy/cupy#2101 )? Alternatively, there might be another path forward that doesn't use resize. Anyway, I figured I'd share this in case there are other thoughts on the approach.
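
For reference, a quick usage sketch of the helper above on a plain NumPy array (ndarray.resize zero-fills new entries when enlarging, which is what makes zeros_like work):

import numpy

a = numpy.arange(6.0).reshape(2, 3)
z = zeros_like(a, shape=(4, 5))          # helper defined above
print(z.shape, z.dtype, float(z.sum()))  # (4, 5) float64 0.0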

@pentschev
Member

@jakirkham supposing there were a resize() in all libraries, what I don't like about your proposal is the arr.copy(); for large arrays this would be prohibitively expensive. But if we add specializations for different libraries (as you did for masked_array), we could always check the input array and create a new one based on its type. Sure, it would be sub-optimal from a code perspective, as we would need to explicitly add support for each library, but it would work; plus, there aren't too many libraries supporting __array_function__ yet.

If doing this is acceptable until we get shape added to the *_like() functions, then it would indeed help us in some cases.

@jakirkham
Member

That's a fair complaint. Here's why the copy is there:

  1. It would be nice if the resize implementation did not mutate the input array.
  2. resize requires that the data be owned by the object it is called on.

If we didn't care about 1, we could replace the copy with arr = numpy.require(arr, requirements="O"), which will only copy if the array doesn't own its data.

@pentschev
Member

@jakirkham sorry, maybe I wasn't clear about what I meant. I do understand why the copy was there, but I think it's too prohibitive. In fact, avoiding a copy was the whole reason we're pushing to have a shape argument in the *_like() functions.

@pentschev
Member

In other words: I agree that what you suggested would be a nice workaround if we could avoid this potentially huge overhead of data duplication.

@jakirkham
Member

No worries. Just wanted to make sure we were on the same page.

If we go the numpy.require route, we would sometimes avoid the copy. Am not sure how often "sometimes" is currently.

It's worth thinking about. Let me know if you have any ideas. ;)

@jakirkham
Member

As to the specializations, I think we didn't want to go down that road for a permanent solution. Maybe we could reconsider for a temporary one?

@pentschev
Member

As to the specializations, I think we didn't want to go down that road for a permanent solution. Maybe we could reconsider for a temporary one?

I was trying to avoid this even as a temporary solution, but if that's acceptable, we could go down that road. There's no need to make it permanent anyway; I believe that once we can create *_like() arrays with a shape argument, we can solve this issue for good. At least, I don't currently see anything else that would prevent that from being a permanent solution.

@system123

The following example fails for me with CuPy 6.1.0 and the master branch of Dask:

import cupy
x = cupy.random.random((5000, 1000))

import dask.array as da
d = da.from_array(x, chunks=(1000, 1000), asarray=False)

u, s, v = da.linalg.svd(d)
s.compute()

The error I am getting is: ValueError: object __array__ method not producing an array

@pentschev
Member

The error I am getting is: ValueError: object __array__ method not producing an array

You need NumPy 1.16.x and the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 set. Starting with NumPy 1.17, the environment variable will no longer be necessary.
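
For example (NumPy reads the variable at import time, so it must be set before numpy is first imported):

import os
os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"] = "1"  # before importing numpy

import numpy  # NumPy 1.16.x: __array_function__ protocol now enabled
import cupy
import dask.array as da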

@jrbourbeau
Member

@system123 does setting the environment variable NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1 as @pentschev suggested fix your issue?

@system123

Just got around to testing it and it is working after setting the env variable.

@jrbourbeau
Member

Great, glad to hear it. Thanks for the fix @pentschev

@quasiben
Member

I think this is largely supported now and we can close. If there are edge cases, we should open new issues and resolve them there.
