New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40707][CONNECT] Add groupby to connect DSL and test more than one grouping expressions #38155
[SPARK-40707][CONNECT] Add groupby to connect DSL and test more than one grouping expressions #38155
Conversation
R: @cloud-fan |
Can one of the admins verify this patch? |
connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala
Outdated
Show resolved
Hide resolved
|
||
val groupingSet = proto.Aggregate.GroupingSet.newBuilder() | ||
for (groupingExpr <- groupingExprs) { | ||
groupingSet.addAggregateExpressions(groupingExpr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we add group by expression to aggregate expressions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed the proto to make this clear.
@cloud-fan PR updated. PLAT. |
Expression filter = 2; | ||
} | ||
repeated Expression grouping_expressions = 2; | ||
repeated AggregateFunction result_expressions = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I still keep this as AggregateFunction
. proto.Expression
is a too general type for now.
connect does not have a NamedExpression
. I will follow up on this to improve.
This PR is to improve the grouping_expressions anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
followup improvement SGTM. I don't think we even need AggregateFunction
. The SQL parser usually just generate UnresolvedFunction
, and the analyzer will look up the function and figure out if it's scalar/aggregate/window/table value function.
cc48ed7
to
33f59ed
Compare
* This object offers methods to convert to/from connect proto to catalyst types. | ||
*/ | ||
object TypeProtoConverter { | ||
def toCatalystType(t: proto.Type): DataType = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we name it proto.DataType
? And rename this object to DateTypeProtoConverter
thanks, merging to master! |
What changes were proposed in this pull request?
groupby
to connect DSL and test more than one grouping expressionsTrait
in the testing code.Why are the changes needed?
Enhance connect's support for GROUP BY.
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT