[AURON #2180] Implement native support for nth_value window function #2203

Draft
weimingdiit wants to merge 1 commit into apache:master from weimingdiit:feat/native-window-nth_value

@weimingdiit
Contributor

Which issue does this PR close?

Closes #2180

Rationale for this change

Auron did not previously support native execution of Spark's nth_value window function, so queries using nth_value fell back to Spark execution.

Spark defines nth_value(input, offset) as returning the value of the row at position offset, counted from the beginning of the current window frame, with offset starting from 1. When IGNORE NULLS is specified, rows whose input is null are skipped when counting toward the target position. To improve Spark compatibility, Auron needs native support for both the regular and IGNORE NULLS variants.
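
For reference, the semantics described above can be sketched over a single, already materialized frame. This is only an illustration of the definition, not the code in this PR: the function name `nth_value`, the `Option<T>` encoding of SQL NULL, and the choice to return NULL for an offset of 0 are all assumptions made for the example.

```rust
/// Returns the value at the 1-based `offset` within `frame`, or `None`
/// (standing in for SQL NULL) when the frame has fewer than `offset`
/// qualifying rows. `frame` holds the input values in frame order.
fn nth_value<T: Clone>(frame: &[Option<T>], offset: usize, ignore_nulls: bool) -> Option<T> {
    if offset == 0 {
        // Offsets are 1-based. This sketch returns NULL for 0 and does not
        // model how Spark itself treats an invalid offset.
        return None;
    }
    if ignore_nulls {
        // Skip NULL inputs, then take the offset-th remaining value.
        frame.iter().flatten().nth(offset - 1).cloned()
    } else {
        // Count every row; the row at that position may itself be NULL.
        frame.get(offset - 1).cloned().flatten()
    }
}
```

With the frame `[10, NULL, 30, 40]` and offset 2, the regular variant lands on the NULL row and yields NULL, while the IGNORE NULLS variant skips it and yields 30.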

The current native window executor only supports cumulative row-frame semantics, so this change scopes native conversion to the frame that is already supported correctly:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
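
Under this frame, the frame for each row is the prefix of the partition ending at that row, so it only grows. Once the target row has been reached, the result stays fixed for the rest of the partition, which allows a single pass with constant state. The following is a minimal sketch of that idea under stated assumptions; `NthValueState`, `push`, and `nth_value_over_partition` are hypothetical names and are not taken from the Auron processor added in this PR.

```rust
/// Running state for one partition under
/// ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
struct NthValueState<T> {
    offset: usize,      // 1-based target position
    ignore_nulls: bool, // whether NULL inputs are skipped when counting
    seen: usize,        // qualifying rows counted so far
    reached: bool,      // true once the offset-th qualifying row was consumed
    result: Option<T>,  // value at that row (NULL is possible without IGNORE NULLS)
}

impl<T: Clone> NthValueState<T> {
    fn new(offset: usize, ignore_nulls: bool) -> Self {
        Self { offset, ignore_nulls, seen: 0, reached: false, result: None }
    }

    /// Consumes the current row and returns nth_value for the frame ending at it.
    fn push(&mut self, value: Option<T>) -> Option<T> {
        if !self.reached {
            let counts = !self.ignore_nulls || value.is_some();
            if counts {
                self.seen += 1;
                if self.seen == self.offset {
                    self.reached = true;
                    self.result = value;
                }
            }
        }
        // Before the target row is reached this is NULL; afterwards it is fixed.
        self.result.clone()
    }
}

/// Evaluates nth_value for every row of one partition, in row order.
fn nth_value_over_partition<T: Clone>(
    rows: &[Option<T>],
    offset: usize,
    ignore_nulls: bool,
) -> Vec<Option<T>> {
    let mut state = NthValueState::new(offset, ignore_nulls);
    rows.iter().cloned().map(|v| state.push(v)).collect()
}
```

For the partition `[1, NULL, 3, 4]` with offset 2, the IGNORE NULLS variant produces `[NULL, NULL, 3, 3]`, while the regular variant produces NULL for every row because the second row is itself NULL.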

What changes are included in this PR?

This PR adds end-to-end native support for the nth_value window function across the Spark frontend, planner, and native engine.

The changes include:

  • adding new protobuf/planner window function variants for NTH_VALUE and NTH_VALUE_IGNORE_NULLS
  • extending Spark-side window conversion in NativeWindowBase to recognize Spark NthValue
  • using reflection to access NthValue fields so the code remains compatible with Spark 3.0, where this expression class is not available
  • adding a native Rust processor to evaluate nth_value
  • implementing both standard counting semantics and IGNORE NULLS semantics
  • restricting native conversion to the supported cumulative row frame
  • adding Rust-side execution tests
  • adding Scala integration tests to verify result correctness and native operator conversion

Are there any user-facing changes?

Yes.
Queries using nth_value can now be executed natively when they use the supported frame:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Both of the following forms are supported:

  • nth_value(input, offset)
  • nth_value(input, offset) IGNORE NULLS

Queries using unsupported window frames will continue to fall back to Spark execution.
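
The fallback rule can be pictured as a simple predicate on the window frame. The sketch below only models the rule as stated here. In the PR the decision is made during Spark-side conversion, and the types `FrameKind`, `Bound`, `WindowFrame`, and the function `can_run_nth_value_natively` are hypothetical names introduced for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FrameKind {
    Rows,
    Range,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Bound {
    UnboundedPreceding,
    Preceding(u64),
    CurrentRow,
    Following(u64),
    UnboundedFollowing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct WindowFrame {
    kind: FrameKind,
    lower: Bound,
    upper: Bound,
}

/// True only for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
/// Any other frame is treated as unsupported, which corresponds to
/// falling back to Spark execution.
fn can_run_nth_value_natively(frame: &WindowFrame) -> bool {
    matches!(
        frame,
        WindowFrame {
            kind: FrameKind::Rows,
            lower: Bound::UnboundedPreceding,
            upper: Bound::CurrentRow,
        }
    )
}
```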

How was this patch tested?

CI.

…ction

Signed-off-by: weimingdiit <weimingdiit@gmail.com>