
Inplace operators behaviour for dask arrays inconsistent with numpy #5199

Open

ivirshup opened this issue Aug 1, 2019 · 6 comments

@ivirshup
ivirshup commented Aug 1, 2019

In-place operators modify numpy arrays in place, but end up producing a new object for dask arrays.

# Using dask v2.2.0, numpy 1.17.0
import numpy as np
import dask.array as da
x_np = np.zeros(5)
y_np = x_np
y_np += 1

x_da = da.zeros(5)
y_da = x_da
y_da += 1

print(x_np == y_np)
# [ True  True  True  True  True]
print(id(x_np) == id(y_np))
# True
print(x_da.compute() == y_da.compute())
# [False False False False False]
print(id(x_da) == id(y_da))
# False

I'm not sure whether you'd qualify this as a bug, since the behaviour of the two libraries is known to not line up completely. Related issues have also been brought up before, like #2588 and #2000.

I think a warning for this behaviour could be useful. Alternatively, this could just behave like an in-place operation. Here are some proof-of-concept implementations; if implemented for real, this should probably be done through the ufunc machinery.

import dask.array as da
import numpy as np

# Something like this
def dask_iadd(x, y):
    np.add(x, y, out=x)
    return x
da.core.Array.__iadd__ = dask_iadd

a = da.ones((10, 10))
b = a
b += 1
print(id(a) == id(b))
# True
print(np.all(a.compute() == b.compute()))
# True
Previous crude implementation
import dask.array as da

def _replace(x, y):
    x._meta = y._meta
    x.dask = y.dask
    x.name = y.name
    x._chunks = y.chunks
    return x

da.core.Array.__iadd__ = lambda x, y: _replace(x, x + y)
# etc

z = da.zeros((2, 2))
z += 1
z.compute()
#  array([[1., 1.],
#         [1., 1.]])
ivirshup changed the title from "Inplace operators behaviour dask arrays inconsistent with numpy" to "Inplace operators behaviour for dask arrays inconsistent with numpy" on Aug 1, 2019
@martindurant
Member

Truly in-place operations would not be appropriate: Dask tasks are functional and tokenized, so you do not want inputs changing. Consider how you would make this work when you have workers in different processes or on different machines; you might modify the in-memory version, but that would have no effect on copies of the data held elsewhere, and it would be lost if the worker went down.
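For reference, a quick check (assuming dask and numpy are installed; exact key names vary by version) that an operation never overwrites existing graph keys, it only adds new ones that reference them:

```python
import dask.array as da

x = da.ones(5, chunks=5)
y = x + 1  # builds a new layer; does not touch x's tasks

# The result gets a brand-new top-level name...
print(x.name != y.name)
# True

# ...while x's original tasks are still present, unmodified, in y's graph.
print(any(key[0] == x.name for key in dict(y.__dask_graph__())))
# True
```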

@ivirshup
Author

ivirshup commented Aug 1, 2019

Does the current implementation of __setitem__ avoid these issues?

@martindurant
Member

Yes, exactly

In [14]: dict(x.dask)
Out[14]:
{('ones-c4a83f4b990021618d55e0fa61a351d6',
  0): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,)),
 ('ones-c4a83f4b990021618d55e0fa61a351d6',
  1): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,))}

In [15]: x[x] = 5

In [16]: dict(x.dask)
Out[16]:
{('where-fc316acce29c6d2b5cef2f2593835bd6', 0): (subgraph_callable,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 0),
  5,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 0)),
 ('where-fc316acce29c6d2b5cef2f2593835bd6', 1): (subgraph_callable,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 1),
  5,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 1)),
 ('ones-c4a83f4b990021618d55e0fa61a351d6',
  0): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,)),
 ('ones-c4a83f4b990021618d55e0fa61a351d6',
  1): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,))}

Note how the "ones" remain in the graph.

Or, with concrete values: see below how the dask array's values change, but the original input array's do not

In [19]: inarr = np.ones(10)
In [20]: x = da.from_array(inarr, chunks=5)
In [22]: x[x] = 5
In [23]: x.compute()
Out[23]: array([5., 5., 5., 5., 5., 5., 5., 5., 5., 5.])

In [24]: inarr
Out[24]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
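A toy model of the mechanism at work here (the names and structure below are made up for illustration, not dask internals): `__setitem__` can mutate the *wrapper* object while keeping the task graph itself immutable. The old tasks stay in the graph, and the wrapper just points at a new output key.

```python
class LazyArray:
    """Minimal stand-in for a lazy collection: a task graph plus an output key."""

    def __init__(self, graph, key):
        self.graph = graph  # mapping: key -> task tuple
        self.key = key      # which key this wrapper's value comes from

    def __setitem__(self, mask, value):
        new_key = "where-" + str(len(self.graph))
        # Add a new task referencing the old key; the old task is untouched.
        self.graph = {**self.graph, new_key: ("where", mask, value, self.key)}
        self.key = new_key

g = {"ones-0": ("ones", 5)}
x = LazyArray(g, "ones-0")
x[True] = 5

print("ones-0" in x.graph)  # original task still in the graph
# True
print(x.key)                # wrapper now points at the new output
# where-1
```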

@martindurant
Member

OK to close?

@ivirshup
Author

ivirshup commented Aug 2, 2019

I'm not sure I understand your point. Aren't in-place operators equivalent to something like this?

a = da.ones(5)
b = a
b[da.ones(5, dtype=bool)] = b + 1

I think I may not have been clear about what I meant by "in-place operators". I don't think the actual computation has to happen in place; that is entirely up to the execution engine. I meant that dask arrays don't follow the semantics of augmented assignment operators as used by numpy and, more broadly, by mutable objects in Python, though I think they could.

In the issues I referenced, it sounded like this was a goal which dask was pursuing. I would be happy to make a PR implementing these.
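For reference, the Python-level semantics in question: `b += 1` resolves to `b = type(b).__iadd__(b, 1)` when `__iadd__` is defined, so a type can opt in to in-place semantics simply by returning the same object, regardless of how the work is executed underneath (assuming numpy is installed for the first half):

```python
import numpy as np

a = np.zeros(3)
b = a
b += 1              # ndarray.__iadd__ mutates and returns the same object
print(b is a)       # both names still refer to one array
# True

t = (1, 2)
u = t
u += (3,)           # tuples have no __iadd__; falls back to u = u + (3,)
print(u is t)       # u has been rebound to a new tuple
# False
```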

@jsignell
Member

I was looking at this and trying to figure out what the intention of those linked PRs was. My read is that they were more focused on allowing easy assignment patterns and less concerned with modifying dask collections in-place. That being said, I don't see a good reason why the behavior that you are talking about here and that you sketched out in your original post couldn't be supported.
