[SPARK-44870][SQL] Convert HashAggregate to SortAggregate if all grouping expressions are in child output orderings#42557

Closed
wankunde wants to merge 1 commit into apache:master from wankunde:hash_to_sort

@wankunde (Contributor) commented Aug 18, 2023

What changes were proposed in this pull request?

This PR adds a new function, SortOrder.satisfiesExpressions(orderings: Seq[SortOrder], groupExpressions: Seq[Expression]). The ReplaceHashWithSortAgg rule uses it when converting a HashAggregate to a SortAggregate, to determine whether the child's output ordering satisfies the grouping expressions.

Example query 1:

SELECT a, b, count(1)
FROM values(1, 1, 1), (2, 2, 2) t1(a, b, c)
JOIN values(1, 1, 1), (2, 2, 2) t2(d, e, f)
ON a = d
AND b = e
GROUP BY b, a

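The grouping expressions are b, a and the child output orderings are a.asc, b.asc. Although the grouping expressions are listed in a different order, each one appears in the child output orderings, so SortOrder.satisfiesExpressions() returns true.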

Example query 2:

SELECT a, b, count(1)
FROM values(1, 1, 1), (2, 2, 2) t1(a, b, c)
JOIN values(1, 1, 1), (2, 2, 2) t2(d, e, f)
ON a = d
AND b = e
GROUP BY a, b, d

The grouping expressions are a, b, d and the child output orderings are a.asc, b.asc. Although d has no ordering of its own, it can still be found in a.asc.children, so SortOrder.satisfiesExpressions() returns true here as well.
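
The check described by the two examples can be sketched as follows. This is a simplified, self-contained model and not the code from this PR: SortKey is a hypothetical stand-in for Catalyst's SortOrder, expressions are modelled as plain strings, and the prefix rule is an assumption about what makes the conversion safe rather than something stated in the description.

```scala
object OrderingSketch {
  // Hypothetical stand-in for SortOrder: `children` holds the sort expression
  // together with any expressions known to be equal to it (e.g. a join key).
  final case class SortKey(children: Set[String])

  // Assumed rule: take the longest prefix of the child orderings in which every
  // ordering matches some grouping expression, then require every grouping
  // expression to be covered by that prefix.
  def satisfiesExpressions(orderings: Seq[SortKey], groupExpressions: Seq[String]): Boolean = {
    val groups = groupExpressions.toSet
    val prefix = orderings.takeWhile(o => o.children.exists(groups.contains))
    groups.forall(g => prefix.exists(_.children.contains(g)))
  }

  def main(args: Array[String]): Unit = {
    // Child output orderings from the example queries: a.asc (equal to d), b.asc (equal to e).
    val childOrderings = Seq(SortKey(Set("a", "d")), SortKey(Set("b", "e")))
    println(satisfiesExpressions(childOrderings, Seq("b", "a")))      // example query 1
    println(satisfiesExpressions(childOrderings, Seq("a", "b", "d"))) // example query 2
    println(satisfiesExpressions(childOrderings, Seq("b")))           // b alone is not a prefix
  }
}
```

Under these assumptions both example queries pass the check, while grouping by b alone does not, because rows sorted by a and then b do not keep equal values of b adjacent.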

Why are the changes needed?

This allows more HashAggregate operators to be converted to SortAggregate, which is intended to improve aggregate performance.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests.

…ping expressions are in child output orderings
@github-actions github-actions bot added the SQL label Aug 18, 2023
@wankunde wankunde changed the title [WIP][SPARK-44870][SQL] Convert HashAggregate to SortAggregate if all grouping expressions are in child output orderings [SPARK-44870][SQL] Convert HashAggregate to SortAggregate if all grouping expressions are in child output orderings Aug 19, 2023
@wankunde (Contributor, Author)

Hi @wangyum, could you help review this PR? Thanks.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Nov 28, 2023
@github-actions github-actions bot closed this Nov 29, 2023