
Inplace operators behaviour for dask arrays inconsistent with numpy #5199

Open

ivirshup opened this issue Aug 1, 2019 · 6 comments

@ivirshup
ivirshup commented Aug 1, 2019

In-place operators modify numpy arrays in place, but end up producing a new object for dask arrays.

# Using dask v2.2.0, numpy 1.17.0
import numpy as np
import dask.array as da
x_np = np.zeros(5)
y_np = x_np
y_np += 1

x_da = da.zeros(5)
y_da = x_da
y_da += 1

print(x_np == y_np)
# [ True  True  True  True  True]
print(id(x_np) == id(y_np))
# True
print(x_da.compute() == y_da.compute())
# [False False False False False]
print(id(x_da) == id(y_da))
# False

I'm not sure whether you'd qualify this as a bug, since the behaviour of the two libraries is known to not line up completely. Related issues have also been brought up before, like #2588 and #2000.

I think a warning for this behaviour could be useful. Alternatively, this could just behave like an in-place operation. Here are some proof-of-concept implementations; if implemented for real, this should probably be done through the ufunc machinery.

import dask.array as da
import numpy as np

# Something like this
def dask_iadd(x, y):
    np.add(x, y, out=x)
    return x
da.core.Array.__iadd__ = dask_iadd

a = da.ones((10, 10))
b = a
b += 1
print(id(a) == id(b))
# True
print(np.all(a.compute() == b.compute()))
# True
Previous crude implementation
import dask.array as da

def _replace(x, y):
    x._meta = y._meta
    x.dask = y.dask
    x.name = y.name
    x._chunks = y.chunks
    return x

da.core.Array.__iadd__ = lambda x, y: _replace(x, x + y)
# etc

z = da.zeros((2, 2))
z += 1
z.compute()
#  array([[1., 1.],
#         [1., 1.]])
ivirshup changed the title from "Inplace operators behaviour dask arrays inconsistent with numpy" to "Inplace operators behaviour for dask arrays inconsistent with numpy" on Aug 1, 2019
@martindurant
Member

Truly in-place operations would not be appropriate: Dask tasks are functional and tokenized, so you do not want inputs changing. Consider how you would make this work when you have workers in different processes or on different machines; you might modify the in-memory version, but that would have no effect on copies of the data held elsewhere, and it would be lost if the worker went down.
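For reference, a quick check (assuming dask and numpy are installed; exact key names vary by version) that an operation never overwrites existing graph keys, it only adds new ones that reference them:

```python
import dask.array as da

x = da.ones(5, chunks=5)
y = x + 1  # builds a new layer; does not touch x's tasks

# The result gets a brand-new top-level name...
print(x.name != y.name)
# True

# ...while x's original tasks are still present, unmodified, in y's graph.
print(any(key[0] == x.name for key in dict(y.__dask_graph__())))
# True
```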

@ivirshup
Author

ivirshup commented Aug 1, 2019

Does the current implementation of __setitem__ avoid these issues?

@martindurant
Member

Yes, exactly

In [14]: dict(x.dask)
Out[14]:
{('ones-c4a83f4b990021618d55e0fa61a351d6',
  0): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,)),
 ('ones-c4a83f4b990021618d55e0fa61a351d6',
  1): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,))}

In [15]: x[x] = 5

In [16]: dict(x.dask)
Out[16]:
{('where-fc316acce29c6d2b5cef2f2593835bd6', 0): (subgraph_callable,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 0),
  5,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 0)),
 ('where-fc316acce29c6d2b5cef2f2593835bd6', 1): (subgraph_callable,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 1),
  5,
  ('ones-c4a83f4b990021618d55e0fa61a351d6', 1)),
 ('ones-c4a83f4b990021618d55e0fa61a351d6',
  0): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,)),
 ('ones-c4a83f4b990021618d55e0fa61a351d6',
  1): (functools.partial(<function ones at 0x10871a2f0>, dtype=dtype('float64')), (5,))}

Note how the "ones" remain in the graph.

Or, with concrete values: see below how the dask array's values change, but the original input array's do not

In [19]: inarr = np.ones(10)
In [20]: x = da.from_array(inarr, chunks=5)
In [22]: x[x] = 5
In [23]: x.compute()
Out[23]: array([5., 5., 5., 5., 5., 5., 5., 5., 5., 5.])

In [24]: inarr
Out[24]: array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
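A toy model of the mechanism at work here (the names and structure below are made up for illustration, not dask internals): `__setitem__` can mutate the *wrapper* object while keeping the task graph itself immutable. The old tasks stay in the graph, and the wrapper just points at a new output key.

```python
class LazyArray:
    """Minimal stand-in for a lazy collection: a task graph plus an output key."""

    def __init__(self, graph, key):
        self.graph = graph  # mapping: key -> task tuple
        self.key = key      # which key this wrapper's value comes from

    def __setitem__(self, mask, value):
        new_key = "where-" + str(len(self.graph))
        # Add a new task referencing the old key; the old task is untouched.
        self.graph = {**self.graph, new_key: ("where", mask, value, self.key)}
        self.key = new_key

g = {"ones-0": ("ones", 5)}
x = LazyArray(g, "ones-0")
x[True] = 5

print("ones-0" in x.graph)  # original task still in the graph
# True
print(x.key)                # wrapper now points at the new output
# where-1
```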

@martindurant
Member

OK to close?

@ivirshup
Author

ivirshup commented Aug 2, 2019

I'm not sure I understand your point. Aren't in-place operators equivalent to something like this?

a = da.ones(5)
b = a
b[da.ones(5, dtype=bool)] = b + 1

I think I may not have been clear about what I meant by "in-place operators". I don't think the actual computation has to happen in place; that is entirely up to the execution engine. I meant that dask arrays don't follow the semantics of augmented assignment operators as used by numpy and, more broadly, by mutable objects in Python, though I think they could.

In the issues I referenced, it sounded like this was a goal which dask was pursuing. I would be happy to make a PR implementing these.
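For reference, the Python-level semantics in question: `b += 1` resolves to `b = type(b).__iadd__(b, 1)` when `__iadd__` is defined, so a type can opt in to in-place semantics simply by returning the same object, regardless of how the work is executed underneath (assuming numpy is installed for the first half):

```python
import numpy as np

a = np.zeros(3)
b = a
b += 1              # ndarray.__iadd__ mutates and returns the same object
print(b is a)       # both names still refer to one array
# True

t = (1, 2)
u = t
u += (3,)           # tuples have no __iadd__; falls back to u = u + (3,)
print(u is t)       # u has been rebound to a new tuple
# False
```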

@jsignell
Member

I was looking at this and trying to figure out what the intention of those linked PRs was. My read is that they were more focused on allowing easy assignment patterns and less concerned with modifying dask collections in-place. That being said, I don't see a good reason why the behavior that you are talking about here and that you sketched out in your original post couldn't be supported.
