Implement Series.aggregate and agg #816
Codecov Report
@@            Coverage Diff             @@
##           master     #816      +/-   ##
==========================================
+ Coverage   94.34%   94.36%   +0.02%
==========================================
  Files          32       32
  Lines        5849     5854       +5
==========================================
+ Hits         5518     5524       +6
+ Misses        331      330       -1
    raise ValueError("If the given function is a list, it "
                     "should only contains function names as strings.")
elif isinstance(func, str):
    return eval("self.{}()".format(func))
We should avoid eval() as far as possible. getattr(self, func)() instead?
@ueshin Thanks for the review, ueshin :) I totally agree. Fixed it!
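To illustrate the suggestion, here is a minimal sketch of name-based dispatch with getattr. `TinySeries` is a hypothetical stand-in written for this example, not the Koalas Series class, and its `aggregate` only handles the string case.

```python
class TinySeries:
    """Hypothetical stand-in for a Series; not the Koalas class."""

    def __init__(self, values):
        self._values = list(values)

    def min(self):
        return min(self._values)

    def max(self):
        return max(self._values)

    def aggregate(self, func):
        # Look the method up by name instead of building source text for eval().
        if isinstance(func, str):
            return getattr(self, func)()
        raise ValueError("func must be a function name given as a string.")


s = TinySeries([1, 2, 3, 4])
result_min = s.aggregate("min")
result_max = s.aggregate("max")
```

With getattr, a string that is not a method name fails with AttributeError, whereas eval() would run whatever expression the formatted string happens to form.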
if isinstance(func, list):
    if all((isinstance(f, str) for f in func)):
        rows = OrderedDict((f, eval("self.{}()".format(f), dict(self=self))) for f in func)
        return Series(rows)
This runs Spark jobs many times. In this case, I think we can reuse DataFrame's aggregate.
@ueshin Thanks again!! fixed it :)
Anyway, I have a question (it may be a very basic one 😿):
is it right that every function call in the OrderedDict comprehension (when eval runs) triggers a Spark job each time?
Yes, each aggregate function call on a Series triggers sdf.head(2) in
koalas/databricks/koalas/series.py
Lines 3139 to 3149 in cbcb502
@ueshin Oh, now I totally get it. Thanks ueshin!! 😃
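The cost difference discussed in this thread can be sketched with a toy model. `CountingFrame`, `agg_one`, and `agg_many` are hypothetical names invented for this example; nothing here touches Spark or Koalas. It only assumes, as the review suggests, that each per-function call costs one job while a batched aggregate can compute all results in a single pass.

```python
class CountingFrame:
    """Hypothetical stand-in for a Spark-backed column; counts simulated jobs."""

    def __init__(self, values):
        self._values = list(values)
        self.jobs = 0

    def _run_job(self, funcs):
        # One simulated "job" = one pass that evaluates every requested aggregate.
        self.jobs += 1
        table = {"min": min, "max": max, "sum": sum}
        return {name: table[name](self._values) for name in funcs}

    def agg_one(self, name):
        # Per-function call: a separate job for each aggregate.
        return self._run_job([name])[name]

    def agg_many(self, names):
        # Batched call: all aggregates share a single job.
        return self._run_job(names)


per_call = CountingFrame([1, 2, 3, 4])
per_call_result = {f: per_call.agg_one(f) for f in ["min", "max", "sum"]}

batched = CountingFrame([1, 2, 3, 4])
batched_result = batched.agg_many(["min", "max", "sum"])
```

Both approaches return the same values; in this toy model the comprehension runs three simulated jobs where the batched call runs one.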
ueshin left a comment
Otherwise, LGTM, pending tests.
  import re
  import inspect
- from collections import Iterable
+ from collections import Iterable, OrderedDict
nit: this change is no longer needed.
Softagram Impact Report for pull/816 (head commit: cbcb502)
Thanks! Merging.
* upstream/master:
  Updated the koalas logo in readme.md
  Adding koalas-logo without label
  Adding Koalas logo to readme
  Adding koalas logo
  Clean pandas usage in frame.agg (databricks#821)
  Implement Series.aggregate and agg (databricks#816)
  Raise a more helpful error for duplicated columns in Join (databricks#820)

Like pandas Series.aggregate (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.aggregate.html),
I implemented the aggregate function for Series.
Example:
(The example above uses the pandas one.)