[SPARK-40707][CONNECT] Add groupby to connect DSL and test more than one grouping expressions #38155

Closed

@amaliujia amaliujia commented Oct 7, 2022

What changes were proposed in this pull request?

  1. Add groupby to the connect DSL and test it with more than one grouping expression.
  2. Pass a limited set of data types through the connect proto for LocalRelation's attributes.
  3. Clean up an unused trait in the testing code.

Why are the changes needed?

Enhance connect's support for GROUP BY.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

@amaliujia
Contributor Author

R: @cloud-fan

@amaliujia amaliujia changed the title [SPARK-40707] Add groupby to connect DSL and test more than one grouping expressions [SPARK-40707][CONNECT] Add groupby to connect DSL and test more than one grouping expressions Oct 7, 2022
@AmplabJenkins

Can one of the admins verify this patch?


val groupingSet = proto.Aggregate.GroupingSet.newBuilder()
for (groupingExpr <- groupingExprs) {
  groupingSet.addAggregateExpressions(groupingExpr)
Contributor

why do we add group by expression to aggregate expressions?

Contributor Author

renamed the proto to make this clear.
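
For readers following the thread, here is a minimal sketch of the distinction the question above is drawing: grouping expressions decide how rows are bucketed, while result (aggregate) expressions compute one value per bucket, so they belong in separate fields. The case classes and the `groupBy` helper below are hypothetical stand-ins, not the generated connect proto classes or the actual DSL; only the field names `grouping_expressions` and `result_expressions` come from this PR.

```scala
object GroupBySketch {
  // Hypothetical stand-ins for the proto messages; NOT the generated connect classes.
  final case class Expr(name: String)
  final case class AggFunc(name: String, args: Seq[Expr])
  final case class Aggregate(
      groupingExpressions: Seq[Expr],   // mirrors `grouping_expressions`
      resultExpressions: Seq[AggFunc])  // mirrors `result_expressions`

  // Illustrative builder: grouping keys and aggregate results are kept in separate fields.
  def groupBy(groupingExprs: Expr*)(aggExprs: AggFunc*): Aggregate =
    Aggregate(groupingExprs.toList, aggExprs.toList)

  // Two grouping expressions and one aggregate function.
  val agg: Aggregate =
    groupBy(Expr("id"), Expr("name"))(AggFunc("sum", Seq(Expr("amount"))))
}
```

Keeping the two lists apart means a consumer never has to guess which expressions are keys and which are computed results.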

@amaliujia
Contributor Author

amaliujia commented Oct 10, 2022

@cloud-fan PR updated. PTAL.

Expression filter = 2;
}
repeated Expression grouping_expressions = 2;
repeated AggregateFunction result_expressions = 3;
Contributor Author

@amaliujia amaliujia Oct 10, 2022

@cloud-fan I am still keeping this as AggregateFunction. proto.Expression is too general a type for now.

connect does not have a NamedExpression. I will follow up to improve this.

This PR is about improving grouping_expressions anyway.

Contributor

A follow-up improvement SGTM. I don't think we even need AggregateFunction. The SQL parser usually just generates UnresolvedFunction, and the analyzer looks up the function and figures out whether it is a scalar, aggregate, window, or table-valued function.
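
As a rough illustration of the flow described above, the sketch below has a parser-side node that carries only a name and arguments, and a later lookup step that decides what kind of function it is. Everything here is an assumption made for illustration: the `registry` map, the `FunctionKind` hierarchy, and `resolve` are hypothetical and are not Spark's actual analyzer or function registry API.

```scala
object FunctionResolutionSketch {
  // What a parser can emit without knowing the function's kind.
  final case class UnresolvedFunction(name: String, args: Seq[String])

  // Hypothetical classification produced by the lookup step.
  sealed trait FunctionKind
  case object ScalarFn extends FunctionKind
  case object AggregateFn extends FunctionKind
  case object WindowFn extends FunctionKind
  case object TableValuedFn extends FunctionKind

  // Hypothetical registry standing in for the analyzer's function lookup.
  val registry: Map[String, FunctionKind] = Map(
    "upper" -> ScalarFn,
    "sum" -> AggregateFn,
    "row_number" -> WindowFn,
    "range" -> TableValuedFn)

  // Resolve by case-insensitive name; unknown names are reported, not guessed.
  def resolve(f: UnresolvedFunction): Either[String, FunctionKind] =
    registry.get(f.name.toLowerCase).toRight(s"Undefined function: ${f.name}")
}
```

Under this design the client-side message only needs a generic "function call" shape, because the classification happens after parsing.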

@amaliujia amaliujia force-pushed the support_more_than_one_grouping_set branch from cc48ed7 to 33f59ed Compare October 11, 2022 02:29
* This object offers methods to convert to/from connect proto to catalyst types.
*/
object TypeProtoConverter {
def toCatalystType(t: proto.Type): DataType = {
Contributor

Shall we name it proto.DataType, and rename this object to DataTypeProtoConverter?
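
To make the converter under discussion concrete, here is a small self-contained sketch of a two-way mapping between a proto-side type and a catalyst-side type. The types below are hypothetical stand-ins defined only for this example (they are not `proto.Type` or Spark's `DataType`), and the choice of integer and string as the supported set is an assumption, not something stated in this thread.

```scala
object TypeConverterSketch {
  // Hypothetical stand-ins for the proto-side types.
  sealed trait ProtoType
  case object ProtoInt extends ProtoType
  case object ProtoString extends ProtoType
  case object ProtoUnsupported extends ProtoType

  // Hypothetical stand-ins for the catalyst-side types.
  sealed trait CatalystType
  case object IntegerType extends CatalystType
  case object StringType extends CatalystType

  // Only a limited set of types is mapped; anything else fails loudly.
  def toCatalystType(t: ProtoType): CatalystType = t match {
    case ProtoInt    => IntegerType
    case ProtoString => StringType
    case other       => throw new IllegalArgumentException(s"Unsupported type: $other")
  }

  // Reverse direction, total over the supported catalyst-side types.
  def toProtoType(t: CatalystType): ProtoType = t match {
    case IntegerType => ProtoInt
    case StringType  => ProtoString
  }
}
```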

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4e4a848 Oct 11, 2022
@amaliujia amaliujia deleted the support_more_than_one_grouping_set branch October 11, 2022 04:35