SeriesGroupBy.agg does not accept Aggregates when passed as kwargs #10836

Open
flying-sheep opened this issue Jan 19, 2024 · 3 comments
Labels: bug (Something is broken), dataframe

Comments

@flying-sheep

SeriesGroupBy.agg validates its keyword arguments with pandas’ validate_func_kwargs, which doesn’t recognize dask’s Aggregation class:

columns, arg = validate_func_kwargs(kwargs)
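
For readers following along, here is a minimal, stdlib-only sketch of why a check like this would reject the object. It assumes the pandas helper accepts only strings and callables in named aggregations, and that dask's Aggregation object is not itself callable; both are assumptions about the real libraries. The `Aggregation` class, `validate_named_aggs`, and the `extra_types` parameter are hypothetical stand-ins, not the pandas or dask API.

```python
class Aggregation:
    """Hypothetical stand-in for dd.Aggregation: it holds callables
    but defines no __call__, so callable(instance) is False."""

    def __init__(self, name, chunk, agg):
        self.name = name
        self.chunk = chunk
        self.agg = agg


def validate_named_aggs(kwargs, extra_types=()):
    """Simplified sketch of a str-or-callable check on named aggregations.

    extra_types is a hypothetical escape hatch showing one way such a
    check could be taught about additional aggregation classes.
    """
    if not kwargs:
        raise TypeError("Must provide 'func' or named aggregation **kwargs.")
    funcs = []
    for func in kwargs.values():
        accepted = isinstance(func, (str, *extra_types)) or callable(func)
        if not accepted:
            raise TypeError(
                f"func is expected but received {type(func).__name__} in **kwargs."
            )
        funcs.append(func)
    return list(kwargs), funcs


custom_sum = Aggregation(
    name="custom_sum",
    chunk=lambda s: s.sum(),
    agg=lambda s0: s0.sum(),
)

# A string aggregation passes the check.
columns, funcs = validate_named_aggs({"sum": "sum"})

# The stand-in Aggregation is neither a str nor callable, so it is rejected.
try:
    validate_named_aggs({"sum": custom_sum})
    rejected = False
except TypeError:
    rejected = True

# Allowing the class explicitly lets the same kwargs through.
columns_ok, funcs_ok = validate_named_aggs(
    {"sum": custom_sum}, extra_types=(Aggregation,)
)
```

If those assumptions hold, the fix would be on the dask side: either validate the kwargs without delegating to the pandas helper, or special-case Aggregation before delegating.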

@github-actions github-actions bot added the needs triage Needs a response from a contributor label Jan 19, 2024
@hendrikmakait
Member

@flying-sheep, thanks for creating this issue. Please provide a minimal reproducer (see http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports for guidelines) to allow us to investigate your problem.

@hendrikmakait hendrikmakait added needs info Needs further information from the user and removed needs triage Needs a response from a contributor labels Jan 19, 2024
@flying-sheep
Author

flying-sheep commented Jan 19, 2024

Not really necessary, since you can see the problem directly in the linked line, but sure.

import pandas as pd
import dask.dataframe as dd

custom_sum = dd.Aggregation(
    name='custom_sum',
    chunk=lambda s: s.sum(),
    agg=lambda s0: s0.sum()
)

df = pd.DataFrame(dict(a=[1, 1, 1, 1, 1], g=[5, 6, 6, 6, 7]))
ddf = dd.from_pandas(df, npartitions=2)

ddf.groupby('g')['a'].agg(sum="sum").compute()  # works
ddf.groupby('g')['a'].agg(sum=custom_sum).compute()  # broken

@hendrikmakait
Member

hendrikmakait commented Jan 19, 2024

Thanks for adding the reproducer! Please keep in mind that the time of contributors is limited, so having a reproducer ready allows us to move more quickly. It is also a great starting point for a regression test.

@hendrikmakait hendrikmakait added dataframe bug Something is broken and removed needs info Needs further information from the user labels Jan 19, 2024
@flying-sheep flying-sheep changed the title SeriesGroupBy does not accept dd.Aggregates SeriesGroupBy does not accept dd.Aggregates when passed as kwargs Jan 23, 2024
@flying-sheep flying-sheep changed the title SeriesGroupBy does not accept dd.Aggregates when passed as kwargs SeriesGroupBy.agg does not accept Aggregates when passed as kwargs Jan 23, 2024