Add ``numeric_only`` support to ``min``, ``max`` and ``prod`` by phofl · Pull Request #10219 · dask/dask

phofl · 2023-04-24T15:06:20Z

Closes #xxxx
Tests added / passed
Passes pre-commit run --all-files

Sits on top of #10194

# Conflicts:
#   dask/dataframe/_compat.py
#   dask/dataframe/tests/test_arithmetics_reduction.py

hendrikmakait

The code generally looks good to me, but someone with more knowledge of this part of the codebase should take another look.

j-bennet · 2023-04-28T22:53:41Z

dask/dataframe/core.py

            return result

-    @_numeric_only
    @derived_from(pd.DataFrame)


I see you removed the decorator. The decorator did two things: it raised on numeric_only=False and filtered the underlying data on numeric_only=True. Are both irrelevant now?

pandas takes care of filtering for numeric_only=True, so we technically don't need it here. Is there a reason why we would want to do it ourselves?

+1 for offloading to pandas where we can
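
To illustrate the point about letting pandas do the filtering, here is a minimal sketch using plain pandas (not dask). The frame and its column names are made up for illustration; the only behaviour relied on is that pandas reductions accept numeric_only=True and drop non-numeric columns themselves.

```python
import pandas as pd

# Hypothetical frame mixing numeric and non-numeric columns.
df = pd.DataFrame(
    {
        "ints": [1, 2, 3],
        "floats": [0.5, 1.5, 2.5],
        "strings": ["a", "b", "c"],
    }
)

# pandas drops the object column itself; no pre-filtering is needed.
results = {
    func: getattr(df, func)(numeric_only=True)
    for func in ["sum", "prod", "min", "max"]
}

for func, series in results.items():
    # Each result is indexed by the surviving (numeric) column names.
    print(func, list(series.index))
```

Because the selection happens inside pandas, a wrapper that filters columns beforehand would only repeat the same work.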

j-bennet · 2023-04-28T22:55:03Z

dask/dataframe/tests/test_arithmetics_reduction.py


    ddf = dd.from_pandas(df, 3)
-    funcs = ["sum"]
+    funcs = ["sum", "prod", "product", "min", "max"]


It might be good to move these into pytest.mark.parametrize, if possible. There, you could also mark some as xfail.

Yeah, I thought about this as well; I modelled it after the test below, which uses the same pattern. Not sure why it was written that way, though performance might be a reason. Happy to change it if you prefer.

One thing about xfails: They really slow down the test suite if you have too many of them.

Not sure why the original tests were formatted this way. Not a huge deal either way, but FWIW I also prefer pytest.mark.parametrize. Pushed a tiny commit that moves funcs into a parameterization.

One thing about xfails: They really slow down the test suite if you have too many of them.

Yeah, fair point. We could avoid pytest.mark.parametrize in these cases or even just mark them as skip instead of xfail -- this isn't exactly the same, but may be close enough

Yeah I prefer parametrisation as well. Saw the loop logic a couple of times in the test suite, so wasn't sure if this was preferred here.

Yeah, skip is better performance-wise.
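
For reference, the parametrized form discussed in this thread could look roughly like the sketch below. It uses plain pandas rather than dask, the test name is hypothetical, and the skipped entry is picked arbitrarily just to show the syntax; it is not a statement about which reductions actually fail.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "func",
    [
        "sum",
        "prod",
        "min",
        "max",
        # Arbitrary example of skipping a single parameter; unlike xfail,
        # a skipped parameter is never executed.
        pytest.param("product", marks=pytest.mark.skip(reason="illustration only")),
    ],
)
def test_numeric_only_reductions(func):
    # Hypothetical frame with one numeric and one non-numeric column.
    df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
    result = getattr(df, func)(numeric_only=True)
    # Only the numeric column should survive the reduction.
    assert list(result.index) == ["a"]
```

Each parameter shows up as its own test case in the pytest report, which is the main readability gain over looping through a funcs list inside one test.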

jrbourbeau

Thanks @phofl for the updates here and @j-bennet for reviewing

jrbourbeau · 2023-04-30T16:46:52Z

dask/dataframe/tests/test_arithmetics_reduction.py

-                getattr(ddf, func)()
+                getattr(ddf, func)(numeric_only=False)
+
+            warning = FutureWarning


Nice -- this is a neat way to determine warning behavior later on
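
The diff only shows the variable being assigned, so how it is consumed later is an assumption here. A minimal sketch of the general pattern, using only the standard library (in a real test suite pytest.warns would be the usual tool); expect_warning and legacy_reduction are hypothetical names:

```python
import contextlib
import warnings


@contextlib.contextmanager
def expect_warning(warning):
    """Record warnings in the block and check that `warning` was among them.

    Passing None skips the check.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield
    if warning is not None:
        assert any(issubclass(w.category, warning) for w in caught)


def legacy_reduction(values):
    # Hypothetical stand-in for a call that emits a deprecation-style warning.
    warnings.warn("default behaviour will change", FutureWarning)
    return sum(values)


# An earlier branch decides what to expect ...
warning = FutureWarning

# ... and the later assertion just consumes the variable.
with expect_warning(warning):
    total = legacy_reduction([1, 2, 3])
```

Deciding the expected warning class once and passing it along keeps the assertion code identical across branches.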

jrbourbeau · 2023-04-30T16:52:23Z

dask/dataframe/core.py

            return result

-    @_numeric_only
    @derived_from(pd.DataFrame)


+1 for offloading to pandas where we can

jrbourbeau · 2023-04-30T16:53:43Z

dask/dataframe/tests/test_arithmetics_reduction.py

            getattr(df, func)(**kwargs),
            getattr(ddf, func)(**kwargs),
-            check_dtype=func in ["mean", "max"],
+            check_dtype=func in ["mean"],


I was curious if this was still needed, so I removed it and locally things passed. Including a small change to remove the special check_dtype handling here -- let's just use the default of always checking.

Yeah I think this makes sense. Didn't pay too much attention, since I expect that we get rid of this test soonish
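
As a small illustration of what check_dtype controls, the sketch below uses pandas.testing.assert_frame_equal as a stand-in for the assert_eq helper used in the test. The two frames are made up, and the assumption is that the dask helper's check_dtype option behaves comparably.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"a": [1, 2]})  # int64 column
right = pd.DataFrame({"a": [1.0, 2.0]})  # float64 column, same values

# With dtype checking disabled, equal values are enough.
assert_frame_equal(left, right, check_dtype=False)

# With the default (check_dtype=True), the dtype mismatch is reported.
try:
    assert_frame_equal(left, right)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

print(dtype_mismatch_detected)
```

Always checking dtypes is the stricter option, so dropping the special case can only make the comparison catch more.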

jrbourbeau · 2023-04-30T16:53:51Z

dask/dataframe/tests/test_arithmetics_reduction.py

            getattr(df_numerics, func)(),
            getattr(ddf_numerics, func)(),
-            check_dtype=func in ["mean", "max"],
+            check_dtype=func in ["mean"],


Similar thing here

jrbourbeau · 2023-04-30T16:57:00Z

dask/dataframe/tests/test_arithmetics_reduction.py


    ddf = dd.from_pandas(df, 3)
-    funcs = ["sum"]
+    funcs = ["sum", "prod", "product", "min", "max"]


Not sure why the original tests were formatted this way. Not a huge deal either way, but FWIW I also prefer pytest.mark.parametrize. Pushed a tiny commit that moves funcs into a parameterization.

One thing about xfails: They really slow down the test suite if you have too many of them.

Yeah, fair point. We could avoid pytest.mark.parametrize in these cases or even just mark them as skip instead of xfail -- this isn't exactly the same, but may be close enough

jrbourbeau · 2023-04-30T17:42:29Z

Note: test failures are unrelated to the changes in this PR and are being addressed over in #10241

phofl added 6 commits April 13, 2023 17:32

Start implementing numeric only

0b554f3

Implement numeric_only support for DataFrame.sum

f64009f

Merge remote-tracking branch 'upstream/main' into numeric_only

8a5e7d9

Start with others

d41dc68

Update

3b45137

Add numeric only support to min, max and prod

e6dc2c0

github-actions bot added the dataframe label Apr 24, 2023

phofl added 4 commits April 27, 2023 10:31

Merge remote-tracking branch 'upstream/main' into numeric_only_others

c56e863

Merge remote-tracking branch 'upstream/main' into numeric_only_others

84499df

# Conflicts:
#   dask/dataframe/_compat.py
#   dask/dataframe/tests/test_arithmetics_reduction.py

Refactor

fcd3c8b

Refactor

3656a17

phofl mentioned this pull request Apr 28, 2023

Enable numeric_only=False for DataFrame.count #10234

Merged

3 tasks

phofl requested a review from jrbourbeau April 28, 2023 11:04

hendrikmakait reviewed Apr 28, 2023

j-bennet reviewed Apr 28, 2023

phofl and others added 3 commits April 30, 2023 11:37

Merge remote-tracking branch 'upstream/main' into numeric_only_others

bf6368e

Refactor

726c07b

Minor test refactor

3470668

jrbourbeau approved these changes Apr 30, 2023

Typo

10cd61c

jrbourbeau merged commit 7db8efc into dask:main Apr 30, 2023

phofl deleted the numeric_only_others branch April 30, 2023 21:59

4 participants
