Upstream fix for numeric_only default [WIP] #9241
Thanks for starting this Naty! I would think that the approach would be to change the default on the method definition itself. The part that is up in the air is whether the default value of |
Do you mean like here for example for the dask/dask/dataframe/groupby.py Lines 1475 to 1480 in ecbab9d
Do you want to change it on the method to be |
Oh. I hadn't fully internalized that pandas is explicitly trying to make people specify |
When you go to add numeric_only to groupby methods you can probably use the |
@ncclementi @ian-r-rose and I looked at this today. According to this changelog entry:
Next,
Should we support this intermediate It'll be nice to get a consensus beforehand because we'll be touching many functions with this. Side note: the current docs (example here) say we don't support |
I think short term we should add support for
Yeah I think that's right and it raises a warning unless you explicitly set numeric_only:
In [1]: import pandas as pd
...:
...: df = pd.DataFrame({"a": [1,2,3], "b": ["a", "b", "c"]})
In [2]: df.mean()
<ipython-input-2-c61f0c8f89b5>:1: FutureWarning: The default value of numeric_only in DataFrame.mean is deprecated. In a future version, it will default to False. In addition, specifying 'numeric_only=None' is deprecated. Select only valid columns or specify the value of numeric_only to silence this warning.
df.mean()
Out[2]:
a 2.0
dtype: float64
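For reference, the warning text above names two ways to silence it: pass numeric_only explicitly, or select only the valid columns first. A minimal sketch of both, reusing the same frame as the session above (the variable names are just for illustration, and exact warning behaviour may differ between pandas versions):

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Option 1: state the value of numeric_only explicitly.
    explicit = df.mean(numeric_only=True)
    # Option 2: select only the numeric columns before reducing.
    selected = df[["a"]].mean()

# Collect any warnings that mention numeric_only; none are expected here.
numeric_only_warnings = [w for w in caught if "numeric_only" in str(w.message)]
print(explicit)
print(selected)
print(len(numeric_only_warnings))
```

Both calls reduce only column a, so neither depends on the deprecated default.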
I think it would be fairly easy to support. And it would eagerly fail (like pandas does) most of the time since we try to run functions on the meta first.
I don't like this idea very much.
I think even longer term the intention in pandas is to get rid of this kwarg, so we can probably just jump to False and delete the whole thing when pandas goes to 2.0 and we can just stop supporting versions of pandas where |
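To make the trade-off in this thread concrete, here is a rough, standard-library-only sketch of a reduction that warns when numeric_only is left unspecified and fails eagerly when it is False. Everything in it (column_means, _NO_DEFAULT, _is_numeric, the dict-of-lists "frame") is hypothetical and only illustrates the pattern; it is not how dask or pandas implement it.

```python
import warnings
from statistics import fmean

# Hypothetical sentinel meaning "the caller did not pass numeric_only".
_NO_DEFAULT = object()


def _is_numeric(values):
    # Treat ints and floats (but not bools) as numeric for this sketch.
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def column_means(columns, numeric_only=_NO_DEFAULT):
    """Mean of each column in a dict of lists (illustration only)."""
    if numeric_only is _NO_DEFAULT:
        # Transitional behaviour: warn, then fall back to dropping columns.
        warnings.warn(
            "The default value of numeric_only is deprecated; pass it explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        numeric_only = True
    if numeric_only:
        columns = {k: v for k, v in columns.items() if _is_numeric(v)}
    else:
        # numeric_only=False: fail eagerly instead of silently dropping columns.
        bad = [k for k, v in columns.items() if not _is_numeric(v)]
        if bad:
            raise TypeError(f"cannot take the mean of non-numeric columns: {bad}")
    return {k: fmean(v) for k, v in columns.items()}


data = {"a": [1, 2, 3], "b": ["a", "b", "c"]}
print(column_means(data, numeric_only=True))
```

Jumping straight to False, as suggested above, would amount to deleting the sentinel branch and keeping only the eager TypeError path.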
Linking here the reply on the pandas repo about the defaults status for all the affected operations. |
I started to work on a fix for the numeric_only CI failure we are seeing. I've only covered one file, but I wonder if this is the approach we want to pursue. If so I can continue adding all the other files.
pre-commit run --all-files
cc: @jsignell @ian-r-rose and @pavithraes because I know you were looking at this one.