Fix `std` to work with `numeric_only` for `pandas` 2.0 by j-bennet · Pull Request #9960 · dask/dask

j-bennet · 2023-02-14T23:17:17Z

This fixes the test failure in upstream:

dask/dataframe/tests/test_arithmetics_reduction.py::test_datetime_std_across_axis1_null_results[False]: TypeError: float() argument must be a string or a real number, not 'Timestamp'
dask/dataframe/tests/test_arithmetics_reduction.py::test_datetime_std_across_axis1_null_results[True]: TypeError: float() argument must be a string or a real number, not 'Timestamp'

Unlike #9952, this PR does not make numeric_only changes anywhere in DataFrame/Series methods, besides std.

Xref #9736.

Tests added / passed
Passes pre-commit run --all-files


j-bennet · 2023-02-16T06:41:01Z

Tests are now passing.

jrbourbeau · 2023-02-16T18:49:07Z

dask/dataframe/core.py

+            if PANDAS_GT_200:
+                numeric_only = False
+            else:
+                warn_numeric_only = True


Can we set numeric_only=True here so this function always returns {"numeric_only": numeric_only}? Or is returning {} meaningful later on?

Yes, returning {} is meaningful: it ensures we never pass no_default on to pandas, so pandas falls back to its own default.
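
The idea in this reply can be illustrated with a minimal sketch. It is a simplification, not the dask implementation: `NO_DEFAULT`, `numeric_only_kwargs`, and `fake_std` are hypothetical names, and the version-dependent branch from the diff above is left out.

```python
# Hypothetical stand-in for a "no value given" sentinel; not the pandas object.
NO_DEFAULT = object()


def numeric_only_kwargs(numeric_only=NO_DEFAULT):
    """Build keyword arguments to forward to a reduction.

    Returns an empty dict when the caller did not set ``numeric_only``,
    so the sentinel is never forwarded and the callee applies its own default.
    """
    if numeric_only is NO_DEFAULT:
        return {}
    return {"numeric_only": numeric_only}


def fake_std(values, numeric_only=True):
    # Toy reduction (assumption) used only to show how the kwargs are forwarded.
    return ("std", numeric_only)


print(fake_std([1, 2], **numeric_only_kwargs()))       # callee default applies
print(fake_std([1, 2], **numeric_only_kwargs(False)))  # explicit value forwarded
```

Splatting an empty dict leaves the callee's signature default untouched, which is why returning {} differs from always returning {"numeric_only": numeric_only}.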

jrbourbeau · 2023-02-16T18:50:08Z

dask/dataframe/core.py

                ddof=ddof,
                enforce_metadata=False,
-                numeric_only=numeric_only,
+                **numeric_kwargs,


Could you merge main into this PR? We recently bumped pre-commit hook versions and they may want this to be after parent_meta.

jrbourbeau · 2023-02-16T19:00:16Z

dask/dataframe/tests/test_arithmetics_reduction.py

+    kwargs = {} if numeric_only is None else {"numeric_only": numeric_only}
+
+    ctx = contextlib.nullcontext()
+    if numeric_only is False or (PANDAS_GT_200 and numeric_only is None):


Doesn't need to happen in this PR, but I can imagine pulling this logic out into a small utility helper if we end up using it in lots of places.

Yeah. It's a little different in different places, but most can be generalized.
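
One possible shape for such a helper is sketched below. It only classifies the expected behavior and uses the standard library alone. `Expected` and `expected_outcome` are hypothetical names, and the branching mirrors the dask-side conditions shown in the test diffs in this review rather than any helper that exists in dask.

```python
from enum import Enum


class Expected(Enum):
    ERROR = "error"      # the reduction is expected to raise
    WARNING = "warning"  # the reduction is expected to emit a FutureWarning
    OK = "ok"            # neither an error nor a warning is expected


def expected_outcome(numeric_only, pandas_gt_200):
    """Classify what a datetime ``std`` test should expect.

    ``pandas_gt_200`` stands in for the PANDAS_GT_200 flag and is passed
    explicitly so the helper stays independent of the installed pandas.
    """
    if numeric_only is False or (pandas_gt_200 and numeric_only is None):
        return Expected.ERROR
    if numeric_only is None:
        return Expected.WARNING
    return Expected.OK
```

A test could then map `Expected.ERROR` to `pytest.raises(...)`, `Expected.WARNING` to `pytest.warns(...)`, and `Expected.OK` to `contextlib.nullcontext()`, keeping the per-test differences in that mapping.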

jrbourbeau · 2023-02-16T19:02:08Z

dask/dataframe/tests/test_arithmetics_reduction.py

+    pctx = contextlib.nullcontext()
+    dctx = contextlib.nullcontext()
+    if numeric_only is False or (PANDAS_GT_200 and numeric_only is None):
+        dctx = pytest.raises(NotImplementedError, match="numeric_only")
+        pctx = pytest.raises(TypeError)
+    elif numeric_only is None:
+        dctx = pytest.warns(FutureWarning, match="numeric_only")
+        if PANDAS_GT_130:
+            pctx = pytest.warns(FutureWarning, match="numeric_only")


Why is there a mismatch here between pandas and dask?

Right now, in both quantile and std, I'm using the same helper function, _numeric_only_maybe_warn. It replaces the old @_numeric_only decorator that the rest of the methods use.

If you look here, I'm not checking for a lower threshold of pandas version to issue a warning, just that we're below 2.0:

dask/dask/dataframe/core.py

Line 152 in db4761a

warn_numeric_only = True

Pandas has different warning behavior for these two functions below 1.5: one of them warns, the other doesn't. I think it actually makes sense to warn of future changes to numeric_only behavior with both. And this way, I can also reuse the helper function.
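
A rough sketch of the "warn for everything below 2.0" rule described here, under stated assumptions: `maybe_warn_numeric_only` and its `pandas_major` parameter are hypothetical, the warning text is a placeholder, and the real helper in dask/dataframe/core.py may be structured differently.

```python
import warnings


def maybe_warn_numeric_only(numeric_only, pandas_major):
    """Warn whenever ``numeric_only`` is left unset and pandas is older than 2.0.

    There is deliberately no lower version bound, so the warning is emitted
    uniformly for every pre-2.0 release. Returns True if a warning was issued.
    """
    if numeric_only is None and pandas_major < 2:
        warnings.warn(
            "The default value of numeric_only will change in a future version.",
            FutureWarning,
            stacklevel=2,
        )
        return True
    return False
```

With a single upper bound, the same check can serve both std and quantile, at the cost of warning on versions where pandas itself stays silent.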


jrbourbeau · 2023-02-16T19:03:15Z

dask/dataframe/tests/test_arithmetics_reduction.py

+    with dctx:
+        assert_eq(ddf2.std(axis=1, **kwargs), expected2)


Similar comment here


jrbourbeau · 2023-02-16T20:25:58Z

dask/dataframe/tests/test_dataframe.py

+        with warnings.catch_warnings(record=True):
+            warnings.filterwarnings("ignore", category=FutureWarning)
+            # pandas issues a warning with 1.5, but not 1.3


Hmm, based on the comment here I don't quite understand this change, as we shouldn't be expecting a warning in 1.3

dask/dask/dataframe/tests/test_dataframe.py

Lines 1522 to 1530 in e893d53

    @contextlib.contextmanager
    def assert_numeric_only_default_warning(numeric_only):
        if numeric_only is None and PANDAS_GT_150 and not PANDAS_GT_200:
            ctx = pytest.warns(FutureWarning, match="default value of numeric_only")
        else:
            ctx = contextlib.nullcontext()

        with ctx:
            yield

Maybe I'm missing something though

Yes, I made Dask warn below 1.5 here:

dask/dask/dataframe/core.py

Line 152 in db4761a

warn_numeric_only = True

See this comment for the reasoning.
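
For context, the suppression pattern from the diff under discussion behaves as in the standard-library sketch below. `legacy_reduction` is a hypothetical function that always emits a FutureWarning; it stands in for a call that warns on some library versions but not others.

```python
import warnings


def legacy_reduction():
    # Hypothetical function: stands in for a call that may emit a FutureWarning.
    warnings.warn("default value of numeric_only will change", FutureWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Ignored warnings are dropped before they can be recorded, so the block
    # passes whether or not the underlying call warns.
    warnings.filterwarnings("ignore", category=FutureWarning)
    result = legacy_reduction()
```

Because the filter swallows the warning, this pattern tolerates both the warning and the non-warning case, whereas a `pytest.warns` context requires the warning to be raised.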

jrbourbeau · 2023-02-16T20:26:18Z

dask/dataframe/tests/test_dataframe.py

+        with check_numeric_only_deprecation():
+            assert_eq(result, expected)


It looks like we're now warning at compute time -- is there a way we can avoid that?

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>

jrbourbeau

Thanks @j-bennet

Selectively fix DataFrame.std to work with numeric_only.

7983fc6

j-bennet requested a review from jrbourbeau February 14, 2023 23:17

github-actions bot added the dataframe label Feb 14, 2023

j-bennet mentioned this pull request Feb 14, 2023

Align numeric_only errors and warnings in DataFrame aggregations for pandas 2.0 compatibility #9952

Closed

2 tasks

j-bennet added the upstream label Feb 15, 2023

j-bennet added 4 commits February 15, 2023 08:57

test-upstream

8e0056d

Merge remote-tracking branch 'upstream/main' into j-bennet/9736-fix-t…

45021f0

…est-datetime-std-2

Warn below 1.5 as well.

cd7d457

Fix for pandas < 1.3.

806eabc

jrbourbeau reviewed Feb 16, 2023


j-bennet and others added 4 commits February 16, 2023 12:31

Merge branch 'main' into j-bennet/9736-fix-test-datetime-std-2

db4761a

Apply suggestions from code review

b583ec3

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>

Review feedback.

2a33820

Set default in quantile

568d5b1

jrbourbeau changed the title ~~Selectively fix std to work with numeric_only for pandas 2.0 compatibility~~ Fix std to work with numeric_only for pandas 2.0 Feb 17, 2023

jrbourbeau approved these changes Feb 17, 2023


jrbourbeau merged commit 0eb4bd0 into dask:main Feb 17, 2023

j-bennet deleted the j-bennet/9736-fix-test-datetime-std-2 branch February 18, 2023 02:13

