
[AURON #2181] Implement native support for cume_dist window function#2205

Draft
weimingdiit wants to merge 1 commit intoapache:masterfrom
weimingdiit:feat/native-window-cume_dist

@weimingdiit
Contributor

Which issue does this PR close?

Closes #2181

Rationale for this change

Auron does not currently support the cume_dist() window function in native execution, so queries using it fall back to Spark.

cume_dist() must follow Spark semantics exactly. For each row, it returns the number of rows in the partition that precede the current row or are peers of it (equal on the ORDER BY keys), divided by the total number of rows in the partition. Consequently, all rows in the same peer group return the same value. Because the result depends on both the full partition size and the peer group boundaries, and the partition size is only known once the whole partition has been read, it cannot be computed correctly by the existing purely streaming window path.
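Written as a formula, with $P$ the current row's partition and $r' \preceq r$ meaning that $r'$ precedes $r$ or is a peer of $r$ in the ORDER BY ordering:

```math
\operatorname{cume\_dist}(r) = \frac{\left|\{\, r' \in P : r' \preceq r \,\}\right|}{|P|}
```

For example, a partition whose ORDER BY keys are 10, 20, 20, 30 yields 1/4, 3/4, 3/4 and 4/4, that is 0.25, 0.75, 0.75 and 1.0: both rows with key 20 count every row up to the end of their peer group.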

What changes are included in this PR?

This PR adds native support for the cume_dist() window function end to end.

The main changes are:

  • Add CUME_DIST to the window function protobuf and planner conversion path.
  • Extend NativeWindowBase to recognize Spark's CumeDist expression and serialize it into the native window plan.
  • Add a native CumeDistProcessor that computes cume_dist() with Spark-compatible semantics:
    • rows in the same peer group produce the same value
    • the result is peer_group_end_position / partition_size, where peer_group_end_position is the 1-based position of the last row in the current row's peer group
    • single-row partitions return 1.0
  • Extend native window execution with a full-partition processing path for window functions that need the complete partition (for example, its total row count) before producing output.
  • Use that full-partition path for cume_dist() to ensure correctness.
  • Add regression tests on both the native execution side and the Spark SQL side.
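The full-partition computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's actual `CumeDistProcessor`: it assumes rows are already sorted by partition key and then by ORDER BY key, works on plain slices rather than Arrow record batches, and uses the hypothetical helper names `cume_dist_partition` and `cume_dist_sorted`. Keys are compared with `PartialEq`, so `Option` keys treat NULLs as peers of each other, but float keys would not reproduce Spark's NaN-equals-NaN ordering rule.

```rust
/// Hypothetical helper: cume_dist() over ONE complete partition whose
/// ORDER BY keys are already sorted. The whole partition must be available
/// because every value is divided by the partition size `n`.
fn cume_dist_partition<K: PartialEq>(order_keys: &[K]) -> Vec<f64> {
    let n = order_keys.len();
    let mut out = vec![0.0; n];
    let mut start = 0;
    while start < n {
        // Advance `end` past every peer of the row at `start`.
        let mut end = start + 1;
        while end < n && order_keys[end] == order_keys[start] {
            end += 1;
        }
        // `end` (exclusive) equals the 1-based position of the last peer,
        // so all peers share peer_group_end_position / partition_size.
        let value = end as f64 / n as f64;
        for v in &mut out[start..end] {
            *v = value;
        }
        start = end;
    }
    out
}

/// Hypothetical driver: splits rows sorted by partition key into contiguous
/// partitions and evaluates cume_dist() over each complete partition.
fn cume_dist_sorted<P: PartialEq, K: PartialEq>(partition_keys: &[P], order_keys: &[K]) -> Vec<f64> {
    assert_eq!(partition_keys.len(), order_keys.len());
    let mut out = Vec::with_capacity(order_keys.len());
    let mut start = 0;
    while start < partition_keys.len() {
        // Find where the current partition ends.
        let mut end = start + 1;
        while end < partition_keys.len() && partition_keys[end] == partition_keys[start] {
            end += 1;
        }
        out.extend(cume_dist_partition(&order_keys[start..end]));
        start = end;
    }
    out
}

fn main() {
    // Partition "a" has keys 10, 20, 20, 30; partition "b" has a single row.
    let partitions = ["a", "a", "a", "a", "b"];
    let keys = [10, 20, 20, 30, 5];
    // Prints [0.25, 0.75, 0.75, 1.0, 1.0]
    println!("{:?}", cume_dist_sorted(&partitions, &keys));
}
```

The peer group scan is what makes tied rows share a value, and the single-row partition "b" returns 1.0 as required. A real implementation would additionally have to buffer incoming batches until a partition boundary is seen, which is the role of the full-partition path.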

Are there any user-facing changes?

Yes.

Queries using the cume_dist() window function can now stay on the native execution path instead of falling back to Spark, as long as the rest of the plan is supported by Auron. No new user-facing configuration is introduced.

How was this patch tested?

CI.

…ction

Signed-off-by: weimingdiit <weimingdiit@gmail.com>

Development

Successfully merging this pull request may close these issues.

Implement native support for cume_dist window function
