[SPARK-29535][SQL] ADD some aggregate functions for Column in RelationalGroupedDataset.scala #26192
TomokoKomiyama wants to merge 1 commit into apache:master
|
Can one of the admins verify this patch? |
HyukjinKwon left a comment:
I won't add APIs just for consistency. You can use agg, right?
|
@HyukjinKwon |
|
Sorry, @TomokoKomiyama . I also agree with @HyukjinKwon . cc @sarutak |
|
@dongjoon-hyun She doesn't mind closing this PR. |
|
+1, too. I'll close this. Thanks. |
|
Thank you all! Especially, @TomokoKomiyama and @sarutak . |
What changes were proposed in this pull request?
Add five aggregation functions that take Column-typed parameters:
- mean(Column, Column*)
- max(Column, Column*)
- avg(Column, Column*)
- min(Column, Column*)
- sum(Column, Column*)

Why are the changes needed?
Passing Column-typed parameters to these aggregation functions currently requires wrapping them in agg(), which is redundant:
    df.groupBy("_c0").agg(max($"_c1"))
Other methods on the grouped dataset, such as pivot(), accept Column arguments, but these aggregation functions do not. With this change, the following also works:
    df.groupBy("_c0").max($"_c1")

Does this PR introduce any user-facing change?
Yes. Users will be able to pass Column-typed parameters to the aggregation functions (mean, max, avg, min, sum) without agg():
    df.groupBy("_c0").max($"_c1")

How was this patch tested?
Manually tested.
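For reference, the two forms that already exist without this patch can be compared side by side. The following is a minimal sketch, assuming a local SparkSession and a small made-up DataFrame with the `_c0` and `_c1` columns from the description; the object and helper names (`GroupedAggExample`, `maxViaAgg`, `maxViaName`) are hypothetical and not part of the PR.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, max, min, sum}

// Hypothetical helper object for illustration only; not part of the PR.
object GroupedAggExample {
  // Column-based form that works today: the aggregate is wrapped in agg().
  def maxViaAgg(df: DataFrame): DataFrame =
    df.groupBy("_c0").agg(max(col("_c1")))

  // String-based shortcut that RelationalGroupedDataset already provides.
  def maxViaName(df: DataFrame): DataFrame =
    df.groupBy("_c0").max("_c1")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("grouped-agg-example")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data using the column names from the PR description.
    val df = Seq(("a", 1), ("a", 3), ("b", 2)).toDF("_c0", "_c1")

    maxViaAgg(df).show()
    maxViaName(df).show()

    // agg() also accepts several Column aggregates in a single call.
    df.groupBy(col("_c0"))
      .agg(max(col("_c1")), min(col("_c1")), avg(col("_c1")), sum(col("_c1")))
      .show()

    spark.stop()
  }
}
```

Because `agg()` already takes arbitrary Column expressions, including several at once, the proposed overloads would mainly save the explicit `agg(...)` wrapper for the single-function case.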