[SPARK-54292][SQL] Support aggregate functions and GROUP BY in |> SELECT pipe operators #52987
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
dongjoon-hyun
left a comment
Thank you for adding this new feature, @dtenedor.
Could you update this PR, @dtenedor?
.doc("When true, aggregate functions can be used in |> SELECT and other pipe operator " +
  "clauses without requiring the |> AGGREGATE keyword. When false, aggregate functions " +
  "must be used exclusively with the |> AGGREGATE clause for proper aggregation semantics.")
.version("4.2.0")
Got it.
This one is passing CI and seems self-contained and safe. Do you approve?
dongjoon-hyun
left a comment
Sure, +1, LGTM (for Apache Spark 4.2.0). Thank you, @dtenedor.
OK, great. I will merge it to the
if (ctx.aggregationClause != null && !conf.pipeOperatorAllowAggregateInSelect) {
  operationNotAllowed(
    "|> SELECT with a GROUP BY clause is not allowed when " +
      "spark.sql.allowAggregateInSelectWithPipeOperator is disabled", ctx)
This makes me wonder why we need the config. If people don't use aggregate functions in pipe SELECT, the config never matters to them. If they do, they will see this error message and enable the config immediately.
cc @srielau
I had the chance to speak with Wenchen in person about this -- it is a good point, the config doesn't really help in any scenarios here. I will remove it.
Thanks all for reviews -- merging to master.
What changes were proposed in this pull request?
This PR allows aggregate functions and GROUP BY to be used in |> SELECT pipe operators. Previously, these were only allowed in |> AGGREGATE pipe operators.
Example queries now supported:
-- Aggregate in SELECT
table employees |> select sum(salary) as total_salary;
-- Aggregate with GROUP BY
table orders |> select customer_id, count(*) as order_count group by customer_id;
-- Chained operations
table data |> where status = 'active' |> select sum(value) as total;
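For comparison, here is a rough sketch of how the same three results could be written with the existing |> AGGREGATE operator. It reuses the table and column names from the examples above and assumes that |> AGGREGATE ... GROUP BY returns the grouping columns followed by the aggregates, so the output columns should line up with the |> SELECT forms. It is an illustration, not part of the patch or its tests.

```sql
-- Assumed |> AGGREGATE equivalent of: table employees |> select sum(salary) as total_salary
table employees |> aggregate sum(salary) as total_salary;

-- Assumed equivalent of the GROUP BY example; the grouping column customer_id
-- is not listed explicitly because |> AGGREGATE emits grouping columns first
table orders |> aggregate count(*) as order_count group by customer_id;

-- Assumed equivalent of the chained example
table data |> where status = 'active' |> aggregate sum(value) as total;
```

The |> SELECT forms read closer to a conventional SELECT ... GROUP BY query, which appears to be the motivation given in the next section.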
Why are the changes needed?
By lifting this restriction (with an opt-out mechanism), we make the SQL pipe operator syntax more intuitive while maintaining backwards compatibility.
Does this PR introduce any user-facing change?
Yes, but it is backwards compatible:
Aggregate functions in |> SELECT will now work instead of throwing PIPE_OPERATOR_CONTAINS_AGGREGATE_FUNCTION errors
Queries using |> AGGREGATE or non-aggregate pipe operators are unaffected
Backwards Compatibility Guarantee:
How was this patch tested?
Unit Tests: Added comprehensive test coverage in pipe-operators.sql:
|> AGGREGATE still works correctly
Golden Files: Regenerated and verified pipe-operators.sql.out and analyzer results
Test Execution: All tests pass successfully:
Was this patch authored or co-authored using generative AI tooling?
Yes, claude-4.5-sonnet with manual editing and approval.