Skip to content

[SPARK-29535][SQL] ADD some aggregate functions for Column in RelationalGroupedDataset.scala#26192

Closed
TomokoKomiyama wants to merge 1 commit intoapache:masterfrom
TomokoKomiyama:add-col
Closed

[SPARK-29535][SQL] ADD some aggregate functions for Column in RelationalGroupedDataset.scala#26192
TomokoKomiyama wants to merge 1 commit intoapache:masterfrom
TomokoKomiyama:add-col

Conversation

@TomokoKomiyama
Copy link
Contributor

What changes were proposed in this pull request?

Add five aggregation functions with Column type parameters.

mean(Column, Column*)
max(Column, Column*)
avg(Column, Column*)
min(Column, Column*)
sum(Column, Column*)

Why are the changes needed?

If we want pass Column type parameters to some aggregation functions with agg(), but it's redundant.

df.groupBy("_c0").agg(max($"_c1"))

Other aggregation functions such as pivot() can use Column arguments, but these aggregation functions can't use it.
df.groupBy("_c0").max($"_c1")

Does this PR introduce any user-facing change?

Yes.

We will be able to pass Column type parameters to aggregation functions(mean, max, avg, min, sum) without agg().
df.groupBy("_c0").max($"_c1")

How was this patch tested?

Manually tested.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't add APIs just for consistency. You can use agg, right?

@TomokoKomiyama
Copy link
Contributor Author

TomokoKomiyama commented Oct 23, 2019

@HyukjinKwon
We can use agg, but I think it would be easier for users to use these aggregation functions with Colmun type in the similar way with String one.
df.groupBy("_c0").max("_c1")
df.groupBy("_c0").max($"_c1")
Other functions can pass String type parameters because of the legacy reason and these will be removed someday, right?
I think it's better to pass Column type parameters to these aggregation functions(min, max...) so that the user is not confused at the time when legacy functions removed.

@dongjoon-hyun
Copy link
Member

Sorry, @TomokoKomiyama . I also agree with @HyukjinKwon .

cc @sarutak

@sarutak
Copy link
Member

sarutak commented Oct 25, 2019

@dongjoon-hyun She doesn't mind closing this PR.

@maropu
Copy link
Member

maropu commented Oct 25, 2019

+1, too. I'll close this. Thanks.

@maropu maropu closed this Oct 25, 2019
@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Oct 25, 2019

Thank you all! Especially, @TomokoKomiyama and @sarutak .
IIRC, there was a discussion before similarly and we made the same decision (to minimize the API.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants