Allow str and list in aggfunc in DataFrameGroupby.agg #828

Merged
ueshin merged 13 commits into databricks:master from charlesdong1991:named_agg on Oct 1, 2019

@charlesdong1991
Contributor

Right now, GroupBy.agg does not accept a str or a list as aggfunc, but pandas allows both. Before implementing named aggregation, I think it is better to deal with this first.
For example, in pandas we can write:

[Screenshot, 2019-09-24 10:58 PM: the pandas example]

With this PR, Koalas accepts the same input:

[Screenshot, 2019-09-24 11:00 PM: the same call in Koalas]
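As a rough illustration of what accepting a str or a list means, the sketch below expands each accepted aggfunc form into a per-column mapping of function names. normalize_aggfunc is a hypothetical helper, not the Koalas implementation; it assumes every function is named by a string (such as "min") and that agg_columns holds the non-key columns being aggregated.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical types: in this sketch every aggregation function is named by a
# string such as "min" or "max".
AggFuncs = Union[str, List[str]]
AggSpec = Union[AggFuncs, Dict[str, AggFuncs]]


def normalize_aggfunc(func_or_funcs: AggSpec, agg_columns: Sequence[str]) -> Dict[str, List[str]]:
    """Expand a str, list, or dict aggfunc into {column: [function names]}."""
    if isinstance(func_or_funcs, dict):
        # Dict form: already per column; wrap a single name in a list.
        return {
            col: [funcs] if isinstance(funcs, str) else list(funcs)
            for col, funcs in func_or_funcs.items()
        }
    if isinstance(func_or_funcs, str):
        # Single name: apply it to every non-key column.
        return {col: [func_or_funcs] for col in agg_columns}
    if isinstance(func_or_funcs, list):
        # List of names: apply each one to every non-key column.
        return {col: list(func_or_funcs) for col in agg_columns}
    raise TypeError("aggfunc must be a str, a list of str, or a dict")


# Usage, assuming the non-key columns are "height" and "weight":
print(normalize_aggfunc("min", ["height", "weight"]))
# → {'height': ['min'], 'weight': ['min']}
print(normalize_aggfunc(["min", "max"], ["height", "weight"]))
# → {'height': ['min', 'max'], 'weight': ['min', 'max']}
print(normalize_aggfunc({"height": ["min", "max"]}, ["height", "weight"]))
# → {'height': ['min', 'max']}
```

This only covers which functions run on which columns. In pandas, a single string yields flat column labels while a list yields two-level (column, function) labels, which this sketch does not model.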

@charlesdong1991
Contributor Author

I am not sure whether this will pass, since the order of the group keys changes, as you can see in the screenshots I attached. Any suggestions for a fix are welcome.

@charlesdong1991
Contributor Author

charlesdong1991 commented Sep 27, 2019

Hmm, even with the current setup, animals.groupby('kind').agg({'height': ['min', 'max']}) returns its index in a different order than pandas does. Is it mandatory to produce exactly the same output as pandas? The result looks correct except that the row order differs. If it is, we should probably reorder the rows before returning the result. @ueshin @HyukjinKwon any thoughts?

@ueshin
Contributor

ueshin commented Sep 27, 2019

@charlesdong1991 We don't guarantee row order unless there is a specific reason to; for example, Series.value_counts has a sort argument.
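Because row order is not guaranteed, a test can only compare grouped results after putting both into a canonical order, which is what sorting by the index does. A minimal pure-Python sketch of that idea; sorted_by_key, assert_same_groups, and the sample rows are illustrative assumptions, not Koalas or pandas APIs.

```python
from typing import List, Tuple

# Hypothetical stand-in for a grouped result: (group key, aggregated value)
# rows in whatever order the engine happened to produce them.
Rows = List[Tuple[str, float]]


def sorted_by_key(rows: Rows) -> Rows:
    """Order rows by group key, much like calling .sort_index() on a result."""
    return sorted(rows, key=lambda row: row[0])


def assert_same_groups(actual: Rows, expected: Rows) -> None:
    """Compare two grouped results without depending on their row order."""
    assert sorted_by_key(actual) == sorted_by_key(expected), (actual, expected)


# Illustrative values: the same groups, returned in different orders.
pandas_like = [("cat", 9.1), ("dog", 6.0)]
koalas_like = [("dog", 6.0), ("cat", 9.1)]

assert pandas_like != koalas_like              # a direct comparison fails
assert_same_groups(koalas_like, pandas_like)   # an order-insensitive one passes
```

Sorting both sides, rather than forcing the engine to emit a particular order, keeps the no-ordering guarantee intact while still checking every group's value.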

@charlesdong1991
Contributor Author

Thanks for your comment. I changed the test slightly to fix the row-order issue that was failing the tests, and added some docstrings to agg. Feel free to take a look. @ueshin

Comment thread databricks/koalas/groupby.py Outdated
@charlesdong1991
Contributor Author

Any follow-up review would be much appreciated ^^ @ueshin @HyukjinKwon

Comment thread databricks/koalas/groupby.py Outdated

```python
else:
    group_keyname = [key.name for key in self._groupkeys]
    agg_cols = [key for key in self._kdf.columns if key not in group_keyname]
```
Contributor

I guess self._agg_columns should work?

Contributor Author

Nice one! thanks! @ueshin
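The comprehension in the diff derives the columns to aggregate by dropping the group-key columns; the reviewer suggests self._agg_columns instead. A small, hypothetical standalone version of that filtering (aggregation_columns is not a Koalas function) shows it also works when column labels are tuples, as they are with multi-index columns:

```python
from typing import Hashable, List, Sequence


def aggregation_columns(columns: Sequence[Hashable], group_keys: Sequence[Hashable]) -> List[Hashable]:
    """Return every column label that is not a group key, keeping column order."""
    keys = set(group_keys)  # labels must be hashable, as pandas column labels are
    return [col for col in columns if col not in keys]


# Flat column labels.
print(aggregation_columns(["kind", "height", "weight"], ["kind"]))
# → ['height', 'weight']

# Multi-index column labels are tuples, which the same filter handles.
print(aggregation_columns([("a", "kind"), ("a", "height"), ("b", "weight")], [("a", "kind")]))
# → [('a', 'height'), ('b', 'weight')]
```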

Contributor

@ueshin ueshin left a comment

LGTM, pending tests.

@ueshin
Contributor

ueshin commented Oct 1, 2019

@charlesdong1991 oh, one more thing I'd like to ask.
Could you add tests for multi-index columns as well, just in case?

@charlesdong1991
Contributor Author

@ueshin Sure! Added; it passes the tests locally, so let's see whether it's okay on CI.

@ueshin
Contributor

ueshin commented Oct 1, 2019

@charlesdong1991 Thanks!
LGTM again, pending tests.

Comment thread databricks/koalas/tests/test_groupby.py Outdated

```python
for aggfunc in agg_funcs:
    sorted_agg_kdf = kdf.groupby('kind').agg(aggfunc)
    sorted_agg_pdf = pdf.groupby('kind').agg(aggfunc)
```
Contributor

We might need .sort_index()?

Contributor Author

Hmm, it seems to have the same index order as pandas when the index is numeric, but I can add one to make sure.

@softagram-bot

Softagram Impact Report for pull/828 (head commit: f1e3c6b)


@ueshin
Contributor

ueshin commented Oct 1, 2019

The latest failure seems unrelated to this PR.
Let me retrigger the tests.

@ueshin
Contributor

ueshin commented Oct 1, 2019

Thanks! merging.

@ueshin ueshin merged commit 1462291 into databricks:master Oct 1, 2019


4 participants