Skip to content

Fix std to work with numeric_only for pandas 2.0#9960

Merged
jrbourbeau merged 9 commits intodask:mainfrom
j-bennet:j-bennet/9736-fix-test-datetime-std-2
Feb 17, 2023
Merged

Fix std to work with numeric_only for pandas 2.0#9960
jrbourbeau merged 9 commits intodask:mainfrom
j-bennet:j-bennet/9736-fix-test-datetime-std-2

Conversation

@j-bennet
Copy link
Copy Markdown
Contributor

This fixes the test failure in upstream:

dask/dataframe/tests/test_arithmetics_reduction.py::test_datetime_std_across_axis1_null_results[False]: TypeError: float() argument must be a string or a real number, not 'Timestamp'
dask/dataframe/tests/test_arithmetics_reduction.py::test_datetime_std_across_axis1_null_results[True]: TypeError: float() argument must be a string or a real number, not 'Timestamp'

Unlike #9952, this PR does not make numeric_only changes anywhere in DataFrame/Series methods, besides std.

Xref #9736.

  • Tests added / passed
  • Passes pre-commit run --all-files

@j-bennet
Copy link
Copy Markdown
Contributor Author

Tests are now passing.

if PANDAS_GT_200:
numeric_only = False
else:
warn_numeric_only = True
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set numeric_only=True here so this function always returns {"numeric_only": numeric_only}? Or is returning {} meaningful later on?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the point of returning {} here is to make sure we never pass no_default further to pandas, to use their default instead.

ddof=ddof,
enforce_metadata=False,
numeric_only=numeric_only,
**numeric_kwargs,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you merge main into this PR? We recently bumped pre-commit hook versions and they may want this to be after parent_meta.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged.

kwargs = {} if numeric_only is None else {"numeric_only": numeric_only}

ctx = contextlib.nullcontext()
if numeric_only is False or (PANDAS_GT_200 and numeric_only is None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to happen in this PR, but I can imagine pulling this logic out into a little utility helper if we end up using it in lots of places

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It's a little different in different places, but most can be generalized.

Comment on lines +1659 to +1667
pctx = contextlib.nullcontext()
dctx = contextlib.nullcontext()
if numeric_only is False or (PANDAS_GT_200 and numeric_only is None):
dctx = pytest.raises(NotImplementedError, match="numeric_only")
pctx = pytest.raises(TypeError)
elif numeric_only is None:
dctx = pytest.warns(FutureWarning, match="numeric_only")
if PANDAS_GT_130:
pctx = pytest.warns(FutureWarning, match="numeric_only")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a mismatch here between pandas and dask?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, in both quantile and std, I'm using the same helper function, _numeric_only_maybe_warn. It replaces the old @_numeric_only decorator that the rest of the methods use.

If you look here, I'm not checking for a lower threshhold of pandas version to issue a warning, just that we're below 2.0:

warn_numeric_only = True

Pandas has different warning behavior for these two functions below 1.5, one of them warns, the other doesn't. I think it actually makes sense to warn of future changes to numeric behavior with both. And this way, I can also reuse the helper function.

Comment on lines +1699 to +1700
with dctx:
assert_eq(ddf2.std(axis=1, **kwargs), expected2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment here

Comment on lines +1613 to +1615
with warnings.catch_warnings(record=True):
warnings.filterwarnings("ignore", category=FutureWarning)
# pandas issues a warning with 1.5, but not 1.3
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, based on the comment here I don't quite understand this change as shouldn't be expecting a warning in 1.3

@contextlib.contextmanager
def assert_numeric_only_default_warning(numeric_only):
if numeric_only is None and PANDAS_GT_150 and not PANDAS_GT_200:
ctx = pytest.warns(FutureWarning, match="default value of numeric_only")
else:
ctx = contextlib.nullcontext()
with ctx:
yield

Maybe I'm missing something tough

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I made Dask warn below 1.5 here:

warn_numeric_only = True

See this comment for the reasoning.

Comment on lines +1620 to +1621
with check_numeric_only_deprecation():
assert_eq(result, expected)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're now warning at compute time -- is there a way we can avoid that?

@jrbourbeau jrbourbeau changed the title Selectively fix std to work with numeric_only for pandas 2.0 compatibility Fix std to work with numeric_only for pandas 2.0 Feb 17, 2023
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @j-bennet

@jrbourbeau jrbourbeau merged commit 0eb4bd0 into dask:main Feb 17, 2023
@j-bennet j-bennet deleted the j-bennet/9736-fix-test-datetime-std-2 branch February 18, 2023 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants