Implement numeric_only for skew and kurtosis #10258

Merged
jrbourbeau merged 2 commits into dask:main from numeric_only_skew on May 17, 2023

Collaborator

@phofl phofl commented May 4, 2023

  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

cc @jrbourbeau

Not sure what the policy is, but this changes behaviour for pandas 2.0. With this PR,

from datetime import datetime

import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame(
    {
        "int": [1, 2, 3, 4, 5, 6, 7, 8],
        "dt": [pd.NaT] + [datetime(2010, i, 1) for i in range(1, 8)],
    }
)
ddf = dd.from_pandas(df, npartitions=2).skew()

raises like pandas does, but without this PR it drops the dt column. I guess we would prefer not to raise here without deprecating first, since we did not show any deprecation warnings?

Nothing changes with pandas < 2.0.
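
For reference, the pandas side of this behaviour can be sketched without dask. This is only an illustration under the assumption of a pandas 2.x install: with `numeric_only=True` the `dt` column is left out of the reduction, while `numeric_only=False` is expected to raise a `TypeError` because skew is not defined for datetime data. The variable names `numeric_result` and `raised` are made up for the example.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame(
    {
        "int": [1, 2, 3, 4, 5, 6, 7, 8],
        "dt": [pd.NaT] + [datetime(2010, i, 1) for i in range(1, 8)],
    }
)

# Explicit opt-in: only numeric columns take part in the reduction,
# so the datetime column "dt" does not appear in the result.
numeric_result = df.skew(numeric_only=True)

# Asking for all columns is expected to fail on the datetime column
# (assumption: pandas reports this as a TypeError).
try:
    df.skew(numeric_only=False)
    raised = False
except TypeError:
    raised = True
```

Passing `numeric_only=True` explicitly is the way to keep the old "drop the dt column" result regardless of the default.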

@phofl phofl closed this May 5, 2023
@phofl phofl reopened this May 5, 2023
Contributor

@j-bennet j-bennet left a comment

Nice. Make sure to test upstream as well.

Member

@jrbourbeau jrbourbeau left a comment

Thanks @phofl. Just one small comment

@@ -1426,6 +1426,36 @@ def test_reductions_frame_dtypes_numeric_only(func):
)


@pytest.mark.parametrize("func", ["skew", "kurtosis"])
def test_skew_kurt_numeric_only_false(func):
Member

Can we add an assert_eq to make sure the result from dask matches pandas?

Also, is numeric_only=True tested elsewhere?

Collaborator Author

Both of them are tested somewhere else, that's why I added only the False case here.

Contributor

j-bennet commented May 8, 2023

I guess we would prefer not to raise here without deprecating first, since we did not show any deprecation warnings?

I would rather raise. We should have added a deprecation earlier, that's true, but at least right now we have the chance to catch up with pandas' behavior. Supporting diverging behaviors and following one step behind feels like extra work for not much benefit. Just my 2c.
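
To make the trade-off concrete, here is a rough sketch of what the "deprecate first" alternative could have looked like. It is not what this PR does (the PR raises, matching pandas). The helper `skew_with_deprecation` and its warning text are hypothetical and exist only for this illustration; they are not part of dask or pandas.

```python
import warnings

import pandas as pd


def skew_with_deprecation(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: drop non-numeric columns, but warn about it."""
    numeric = df.select_dtypes(include="number")
    dropped = [col for col in df.columns if col not in numeric.columns]
    if dropped:
        # Warn instead of raising, so callers get one release cycle to adapt.
        warnings.warn(
            f"Dropping non-numeric columns {dropped}; this will raise in a "
            "future version. Pass numeric_only=True to keep this behaviour.",
            FutureWarning,
            stacklevel=2,
        )
    return numeric.skew()


df = pd.DataFrame(
    {
        "int": [1, 2, 3, 4, 5, 6, 7, 8],
        "dt": pd.date_range("2010-01-01", periods=8),
    }
)

# Record warnings so the FutureWarning can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = skew_with_deprecation(df)
```

Carrying a shim like this means maintaining behaviour that pandas has already removed, which is the extra work the comment above argues against.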

Collaborator Author

phofl commented May 16, 2023

That's what we are doing, so all good

@jrbourbeau jrbourbeau merged commit bcee469 into dask:main May 17, 2023
53 checks passed
@phofl phofl deleted the numeric_only_skew branch May 17, 2023 07:52
3 participants