Fix std to work with numeric_only for pandas 2.0 #9960
Tests are now passing.
if PANDAS_GT_200:
    numeric_only = False
else:
    warn_numeric_only = True
Can we set numeric_only=True here so this function always returns {"numeric_only": numeric_only}? Or is returning {} meaningful later on?
Yes, the point of returning {} here is to make sure we never pass no_default further to pandas, to use their default instead.
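A minimal sketch of the idea discussed above, assuming the helper receives a sentinel default and a version flag. The names `NO_DEFAULT`, `numeric_only_kwargs`, and `pandas_gt_200` are hypothetical stand-ins for illustration, not the names used in this PR or in pandas:

```python
class _NoDefault:
    """Hypothetical stand-in for a 'no default given' sentinel."""


NO_DEFAULT = _NoDefault()


def numeric_only_kwargs(numeric_only=NO_DEFAULT, pandas_gt_200=False):
    """Build the kwargs to forward to pandas without ever leaking the sentinel.

    Assumption: on pandas >= 2.0 an unset value is resolved to False;
    on older pandas the keyword is omitted so pandas applies its own default.
    """
    if numeric_only is NO_DEFAULT:
        if pandas_gt_200:
            return {"numeric_only": False}
        # Returning {} means the keyword is simply not passed on.
        return {}
    return {"numeric_only": numeric_only}
```

A caller would then forward the result with `**numeric_only_kwargs(...)`, so the sentinel itself never reaches the underlying pandas method.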
ddof=ddof,
enforce_metadata=False,
numeric_only=numeric_only,
**numeric_kwargs,
Could you merge main into this PR? We recently bumped pre-commit hook versions and they may want this to be after parent_meta.
kwargs = {} if numeric_only is None else {"numeric_only": numeric_only}

ctx = contextlib.nullcontext()
if numeric_only is False or (PANDAS_GT_200 and numeric_only is None):
Doesn't need to happen in this PR, but I can imagine pulling this logic out into a little utility helper if we end up using it in lots of places
Yeah. It's a little different in different places, but most can be generalized.
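One possible shape for such a helper, sketched with the standard library only. It classifies the expected outcome from the same condition that the test hunks repeat; `Expect` and `std_axis1_expectation` are hypothetical names, and the mapping reflects only the dask-side branch quoted in this thread:

```python
from enum import Enum


class Expect(Enum):
    OK = "ok"          # no warning or error expected
    WARNS = "warns"    # a FutureWarning is expected
    RAISES = "raises"  # an exception is expected


def std_axis1_expectation(numeric_only, pandas_gt_200):
    """Classify the expected outcome for a given numeric_only value.

    Mirrors the condition used in the tests:
    numeric_only is False, or pandas >= 2.0 with numeric_only unset -> error;
    numeric_only unset on older pandas -> warning; otherwise nothing.
    """
    if numeric_only is False or (pandas_gt_200 and numeric_only is None):
        return Expect.RAISES
    if numeric_only is None:
        return Expect.WARNS
    return Expect.OK
```

A test could translate the result into `pytest.raises(...)`, `pytest.warns(...)`, or `contextlib.nullcontext()`. Since the logic differs slightly between call sites, any shared helper would probably need extra parameters.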
pctx = contextlib.nullcontext()
dctx = contextlib.nullcontext()
if numeric_only is False or (PANDAS_GT_200 and numeric_only is None):
    dctx = pytest.raises(NotImplementedError, match="numeric_only")
    pctx = pytest.raises(TypeError)
elif numeric_only is None:
    dctx = pytest.warns(FutureWarning, match="numeric_only")
    if PANDAS_GT_130:
        pctx = pytest.warns(FutureWarning, match="numeric_only")
Why is there a mismatch here between pandas and dask?
Right now, in both quantile and std, I'm using the same helper function, _numeric_only_maybe_warn. It replaces the old @_numeric_only decorator that the rest of the methods use.
If you look here, I'm not checking for a lower threshold of pandas version before issuing a warning, just that we're below 2.0:
Line 152 in db4761a
Pandas has different warning behavior for these two functions below 1.5: one of them warns, the other doesn't. I think it actually makes sense to warn of future changes to numeric behavior with both. And this way, I can also reuse the helper function.
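To illustrate the behavior described here, the following is a rough sketch of a shared helper that warns on any pandas version below 2.0, with no lower bound. The function name, the `pandas_major` parameter, and the message text are assumptions made for this example and are not copied from `_numeric_only_maybe_warn`:

```python
import warnings

_UNSET = object()  # hypothetical sentinel for "numeric_only was not passed"


def maybe_warn_numeric_only(numeric_only=_UNSET, pandas_major=1):
    """Emit a FutureWarning when numeric_only is unset and pandas is below 2.0.

    There is deliberately no lower version bound, so every caller
    (for example both quantile and std) warns consistently.
    Returns True if a warning was issued.
    """
    if numeric_only is _UNSET and pandas_major < 2:
        warnings.warn(
            "The default value of numeric_only will change in a future "
            "version; pass numeric_only explicitly.",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False
```

Under this sketch, dask can warn in cases where the installed pandas version stays silent, which is the mismatch the test accounts for with separate pandas and dask contexts.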
with dctx:
    assert_eq(ddf2.std(axis=1, **kwargs), expected2)

with warnings.catch_warnings(record=True):
    warnings.filterwarnings("ignore", category=FutureWarning)
    # pandas issues a warning with 1.5, but not 1.3
Hmm, based on the comment here I don't quite understand this change, as we shouldn't be expecting a warning in 1.3
dask/dask/dataframe/tests/test_dataframe.py
Lines 1522 to 1530 in e893d53
Maybe I'm missing something though
with check_numeric_only_deprecation():
    assert_eq(result, expected)
It looks like we're now warning at compute time -- is there a way we can avoid that?
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
This fixes the test failure in upstream:
Unlike #9952, this PR does not make numeric_only changes anywhere in DataFrame/Series methods, besides std.
Xref #9736.
pre-commit run --all-files