Inplace operators behaviour for dask arrays inconsistent with numpy #5199
Comments
Truly in-place operations would not be appropriate: Dask tasks are functional and tokenized, so you do not want their inputs to change. Consider how this would work with workers in different processes or on different machines: you might modify the in-memory version, but that would have no effect on copies of the data held elsewhere, and the change would be lost if the worker went down.
Does the current implementation of
Yes, exactly
Note how the "ones" remain in the graph. Or, with concrete values, see below how the dask array's values change but the original array's do not.
OK to close?
I'm not sure I understand your point. Aren't in-place operators equivalent to something like this?

    import dask.array as da

    a = da.ones(5)
    b = a
    b[da.ones(5, dtype=bool)] = b + 1

I think I may not have been clear about what I meant by "in-place operators". I don't think the actual computation has to happen in place; that is entirely up to the execution engine. I meant that the semantics of augmented assignment operators, as used by numpy and, more broadly, by mutable objects in Python, aren't being followed by dask arrays, though I think they could be. In the issues I referenced, it sounded like this was a goal dask was pursuing. I would be happy to make a PR implementing these.
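To make the distinction about augmented assignment concrete, here is a minimal sketch using only built-in types (no numpy or dask involved). A type that defines `__iadd__` is updated in place, so every name bound to the object sees the change; a type without it falls back to `__add__`, and the name on the left is simply rebound to a new object.

```python
# Mutable type: list defines __iadd__, so += updates the existing object.
a = [1, 1, 1]
b = a
b += [2]
print(a is b, a)  # True [1, 1, 1, 2]

# Immutable type: tuple has no __iadd__, so += falls back to __add__
# and rebinds d to a new tuple; c is left untouched.
c = (1, 1, 1)
d = c
d += (2,)
print(c is d, c)  # False (1, 1, 1)
```

In these terms, numpy arrays behave like the list case, while the behaviour reported in this issue for dask arrays matches the tuple case.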
I was looking at this and trying to figure out what the intention of those linked PRs was. My read is that they were more focused on allowing easy assignment patterns and less concerned with modifying dask collections in-place. That being said, I don't see a good reason why the behavior that you are talking about here and that you sketched out in your original post couldn't be supported.
Inplace operators modify numpy arrays inplace, but end up making a copy of dask arrays.
I'm not sure if you'd qualify this as a bug, since the behaviour of the two is known not to line up completely. I have also seen that related issues have been brought up, like #2588 and #2000.
I think a warning for this behavior could be useful. Alternatively, I think this could just be an inplace operation. Here are some proof of concept implementations. This should probably be done with ufuncs if implemented.
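As a rough illustration of what in-place semantics could look like for a lazy collection, here is a toy model in plain Python. `LazyArray`, its graph layout and `compute` are all hypothetical: they are not Dask's API, and this is not the proof of concept mentioned above. The point is only that `__iadd__` can repoint the collection object at a new task while leaving every existing task untouched.

```python
import itertools

_ids = itertools.count()  # unique suffixes for task keys


class LazyArray:
    """Toy lazy collection (hypothetical; not dask.array.Array)."""

    def __init__(self, graph, key):
        # graph maps key -> ("source", values) or ("add", parent_key, scalar)
        self.graph = graph
        self.key = key  # key of the task that produces this collection

    @classmethod
    def from_list(cls, values):
        key = f"source-{next(_ids)}"
        return cls({key: ("source", tuple(values))}, key)

    def __add__(self, scalar):
        # Functional: copy the graph and append one task; nothing existing changes.
        key = f"add-{next(_ids)}"
        graph = {**self.graph, key: ("add", self.key, scalar)}
        return LazyArray(graph, key)

    def __iadd__(self, scalar):
        # "In place" only at the collection level: point this object at the new task.
        new = self + scalar
        self.graph, self.key = new.graph, new.key
        return self

    def compute(self):
        return _evaluate(self.graph, self.key)


def _evaluate(graph, key):
    kind, *args = graph[key]
    if kind == "source":
        return list(args[0])
    parent_key, scalar = args  # kind == "add"
    return [x + scalar for x in _evaluate(graph, parent_key)]


a = LazyArray.from_list([1.0, 1.0, 1.0])
source_key = a.key
b = a
b += 1
print(a is b)               # True
print(a.compute())          # [2.0, 2.0, 2.0]
print(a.graph[source_key])  # ('source', (1.0, 1.0, 1.0))
```

Because the source task is never modified, anything already holding that task's result stays valid; only the Python object that `a` and `b` both name has changed.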
Previous crude implementation