Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify moment chunk functions to return dicts #4519

Merged
merged 1 commit into from Feb 25, 2019
Merged

Conversation

@pentschev
Copy link
Member

@pentschev pentschev commented Feb 22, 2019

This is a follow-up on #4513.

@mrocklin

  • Tests added / passed
  • Passes flake8 dask
@mrocklin
Copy link
Member

@mrocklin mrocklin commented Feb 22, 2019

Thanks @pentschev !

I'm curious, do cupy operations like std work after these changes?

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Feb 22, 2019

@pentschev you might also consider the following change to the sparse tests

diff --git a/dask/array/tests/test_sparse.py b/dask/array/tests/test_sparse.py
index 459609de..35b89f6d 100644
--- a/dask/array/tests/test_sparse.py
+++ b/dask/array/tests/test_sparse.py
@@ -33,6 +33,7 @@ functions = [
     lambda x: x.T,
     lambda x: da.transpose(x, (1, 2, 0)),
     lambda x: x.sum(),
+    lambda x: x.std(),
     lambda x: x.dot(np.arange(x.shape[-1])),
     lambda x: x.dot(np.eye(x.shape[-1])),
     lambda x: da.tensordot(x, np.ones(x.shape[:2]), axes=[(0, 1), (0, 1)]),

When I do this I get the following traceback:

dask/core.py:119: in _execute_task
    return func(*args2)
dask/compatibility.py:93: in apply
    return func(*args, **kwargs)
dask/array/reductions.py:385: in moment_chunk
    n = numel(A, **kwargs).astype(np.int64)
dask/array/reductions.py:323: in numel
    return chunk.sum(np.ones_like(x), **kwargs)
../../Software/anaconda/lib/python3.6/site-packages/numpy/core/numeric.py:263: in ones_like
    res = empty_like(a, dtype=dtype, order=order, subok=subok)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <COO: shape=(1, 2, 2), dtype=float64, nnz=2, fill_value=0.0>, kwargs = {}, densify = False

    def __array__(self, **kwargs):
        from . import _AUTO_DENSIFICATION_ENABLED as densify
        if not densify:
>           raise RuntimeError('Cannot convert a sparse array to dense automatically. '
                               'To manually densify, use the todense method.')
E           RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.

../sparse/sparse/sparse_array.py:213: RuntimeError

I think that numel currently computes the number of elements in an array by creating an array of ones and then doing the same reduction on that array of ones. We should be able to avoid this (and be considerably faster) by looking at the shape and the axis argument and doing a bit of thinking.

@pentschev
Copy link
Member Author

@pentschev pentschev commented Feb 22, 2019

Thanks @pentschev !

I'm curious, do cupy operations like std work after these changes?

Well, AFAIK, there are no really "elaborate" tests for that, on my own simple tests it did. I will add that as well to #4486.

@pentschev
Copy link
Member Author

@pentschev pentschev commented Feb 22, 2019

Regarding the x.std() sparse test, I tried it as well in current master (i.e., without changes in this PR and without those in #4513), it also fails, just like x.mean() that you added to #4513. Should we also tackle this one in a different issue too?

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Feb 22, 2019

I think that in general using the sparse library might be a good proxy for testing cupy. The sparse library is probably more sensitive to accidentally interacting with numpy arrays than cupy is, and is much easier to run on CI systems :)

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Feb 22, 2019

Regarding the x.std() sparse test, I tried it as well in current master (i.e., without changes in this PR and without those in #4513), it also fails, just like x.mean() that you added to #4513. Should we also tackle this one in a different issue too?

Right, we haven't supported these operations well before. I think that with this PR we're one step closer to supporting operations like std. I'd be happy to see that change either in another PR or in this one. I don't have a strong preference.

@pentschev
Copy link
Member Author

@pentschev pentschev commented Feb 22, 2019

My preference is generally to do things in smaller tasks rather than trying to fix all at once. I will then open an issue to tackle std() and mean() in a separate PR so we can free ourselves of this issue here instead of having a thread that extends forever, ok?

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Feb 22, 2019

Sounds good. Thanks @pentschev

@mrocklin
Copy link
Member

@mrocklin mrocklin commented Feb 22, 2019

Merging in 24 hours if there are no further comments.

@pentschev
Copy link
Member Author

@pentschev pentschev commented Feb 22, 2019

Opened new issue to tackle the sparse error in #4523.

@mrocklin mrocklin merged commit b1430f0 into dask:master Feb 25, 2019
2 checks passed
@pentschev pentschev deleted the moment-dict branch Mar 12, 2019
jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this issue May 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants