New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return da.Array
rather than dd.Series
for non-ufunc elementwise functions on dd.Series
#8558
Conversation
@jorisvandenbossche or @rjzamora would either of you be able to sanity check this? |
Have we tested this on older versions of NumPy (like 1.18)? |
That's the version of numpy installed on the python 3.7 tests https://github.com/dask/dask/runs/4781480587?check_suite_focus=true for instance |
Do you mean that |
In fact this is how it did work in the past ( #1669 ), which interestingly is the same PR that adds Given how old that PR is and how little that code has changed, I wonder if something else is causing issues here. |
That is what I mean, it seems like with this PR we come a lot closer to matching the numpy + pandas object behavior: import numpy as np
import pandas as pd
import dask.array as da
import dask.dataframe as dd
s = pd.Series(np.random.randn(100))
ds = dd.from_pandas(s, 3)
da.sin(ds) # dd.Series
np.sin(s) # pd.Series
da.real(ds) # da.Array
np.real(s) # np.Array |
The
Isn't it the other way around, i.e. further from numpy/pandas' behaviour? With the numpy / pandas combo, calling a numpy ufunc on a pandas Series returns a pandas Series ( While you mention that with this PR it will start returning a |
Sorry @jorisvandenbossche I think I'm missing something, can you provide an example that gives surprising behavior? The example that I gave above seems to match the numpy on pandas behavior. |
Ah, sorry, I misinterpreted your code snippet in #8558 (comment). I wasn't looking carefully enough to notice the difference between So just to be very explicit, the following wasn't actually a correct statement:
So for ufuncs the output did not change (according to the code snippet in #8558 (comment), the ufunc So yes, that indeed seems to bring the behaviour closer to numpy/pandas. |
Ah sorry for the confusion @jorisvandenbossche I don't know why I thought those were all ufuncs 🙃 |
@jsignell Does this PR need a review? Do you have a suggestion for someone to do that? |
This PR needs a decision more than it needs review. I'll add a deprecation cycle and then it'll be ready for review. |
48150cd
to
bb40230
Compare
grrr it's hard to deprecate this in a way that is at all usable. I'll think some more about it unless anyone else has ideas. |
@ncclementi in case you are looking into "test upstream" failures this resolves 12 or so, but it is kind of hard to implement in a backwards compatible way. |
@jsignell Thanks for the heads up, it looks like there are not many left. I think if we find a solution to #8722 (comment) that should solve all upstream problems. |
bb40230
to
bb3bd23
Compare
I talked with @jcrist and we agreed that since the previous behavior was really more of a bug than anything else, we think it's ok to merge this without a deprecation. |
__array_wrap__
da.Array
rather than dd.Series
for non-ufunc elementwise functions on dd.Series
I'll try kicking off ci later when github is healthier |
I'm planning on merging this when tests pass. |
Thanks @jsignell |
This seems to work fine, but it does change the output type of ufuncs. I think this is better than it was originally and this code is very old, but I'd appreciate a second set of eyes. Maybe someone from @dask/array