[SPARK-45795][SQL] DS V2 supports push down Mode #43661

Closed
wants to merge 2 commits into from

beliefer
Contributor

@beliefer beliefer commented Nov 5, 2023

What changes were proposed in this pull request?

This PR translates the aggregate function MODE so that it can be pushed down.

The constructor of the aggregate function MODE takes a deterministic parameter. When multiple values share the greatest frequency, any one of them is returned if deterministic is false or not defined, and the lowest value is returned if deterministic is true.
If deterministic is true, the semantics are the same as the following syntax, which some databases (e.g. H2, Postgres) support:
MODE() WITHIN GROUP (ORDER BY col)

Note: MODE() WITHIN GROUP (ORDER BY col) does not support the DISTINCT keyword.
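
To make the tie-breaking rule concrete, here is a minimal, self-contained sketch. It is not the code in this PR and does not use Spark's classes: `ModeSketch`, `mode`, and `toWithinGroupSql` are hypothetical names chosen for illustration. It assumes integer inputs, and it assumes that a DISTINCT aggregate is simply not translated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ModeSketch {
    // Hypothetical helper: returns the most frequent value, or null for empty input.
    // With deterministic = true, ties resolve to the lowest value.
    // With deterministic = false, any tied value may be returned
    // (here: whichever the hash map happens to yield first).
    public static Integer mode(List<Integer> values, boolean deterministic) {
        Map<Integer, Long> counts = new HashMap<>();
        for (Integer v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        // Find the greatest frequency.
        long best = 0L;
        for (long c : counts.values()) {
            best = Math.max(best, c);
        }
        // Collect every value that reaches that frequency.
        List<Integer> tied = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            if (e.getValue() == best) {
                tied.add(e.getKey());
            }
        }
        if (tied.isEmpty()) {
            return null;
        }
        return deterministic ? Collections.min(tied) : tied.get(0);
    }

    // Hypothetical helper: renders the ordered-set form described above.
    // Returns empty for DISTINCT, which this syntax does not support.
    public static Optional<String> toWithinGroupSql(String column, boolean isDistinct) {
        if (isDistinct) {
            return Optional.empty();
        }
        return Optional.of("MODE() WITHIN GROUP (ORDER BY " + column + ")");
    }

    public static void main(String[] args) {
        // 1 and 3 both occur twice; the deterministic variant picks the lower one.
        List<Integer> data = Arrays.asList(3, 1, 3, 1, 2);
        System.out.println(mode(data, true));
        System.out.println(toWithinGroupSql("salary", false).orElse("<not pushed down>"));
    }
}
```

In this sketch the non-deterministic branch returns whichever tied value comes first, which mirrors the "any of the values" wording above. Only the deterministic branch corresponds to the `WITHIN GROUP (ORDER BY col)` form.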

Why are the changes needed?

So that DS V2 can push down the MODE aggregate function.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New test cases.

Was this patch authored or co-authored using generative AI tooling?

'No'.

@github-actions github-actions bot added the SQL label Nov 5, 2023
@beliefer beliefer force-pushed the SPARK-45795 branch 2 times, most recently from 4aff81f to d0d33ea Compare November 5, 2023 11:19
@beliefer
Contributor Author

beliefer commented Nov 6, 2023

ping @huaxingao @cloud-fan

@beliefer
Contributor Author

@huaxingao @cloud-fan

@beliefer beliefer closed this in f69f791 Dec 16, 2023
@beliefer
Contributor Author

Merged to master.
@HyukjinKwon Thank you!

@@ -41,6 +41,7 @@
* <li><pre>REGR_R2(input1, input2)</pre> Since 3.4.0</li>
* <li><pre>REGR_SLOPE(input1, input2)</pre> Since 3.4.0</li>
* <li><pre>REGR_SXY(input1, input2)</pre> Since 3.4.0</li>
* <li><pre>MODE(input1[, inverse])</pre> Since 4.0.0</li>
Contributor

This is very hacky. We are effectively redefining the SQL semantics of the second parameter for the MODE function only. Do we have existing examples?

Contributor Author

Got it. Let's improve it.

3 participants