[SPARK-56916][SQL] Simplify ElementAt array codegen under ANSI mode by gengliangwang · Pull Request #55941 · apache/spark

gengliangwang · 2026-05-17T23:31:15Z

Title: [SPARK-56916][SQL] Simplify ElementAt array codegen under ANSI mode
Base: master (independent)
Head: gengliangwang:SPARK-56916-element-at

What changes were proposed in this pull request?

Introduce ArrayUtils.java with a single helper elementAtIndexExact(int length, int index, QueryContext context) and use it from ElementAt's ArrayType branch in both doGenCode and doElementAt (eval).

The helper normalizes a 1-based element_at index against the array length and returns the 0-based position, throwing invalidElementAtIndexError for out-of-bound and invalidIndexOfZeroError for zero index. The caller still emits the type-specific arr.get(pos, dataType) (not the helper, since the return type depends on the array element type).
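For readers without the diff at hand, the index normalization described above can be sketched roughly as follows. This is a minimal, self-contained approximation, not the code in this PR: the class name `ElementAtIndexSketch` is hypothetical, the `QueryContext` parameter is dropped, and plain JDK exceptions stand in for `invalidElementAtIndexError` and `invalidIndexOfZeroError`. Widening the index to `long` before `Math.abs` is a choice made for this sketch.

```java
/** Hypothetical stand-in for the helper described above; not the Spark source. */
final class ElementAtIndexSketch {
  private ElementAtIndexSketch() {}

  /**
   * Maps a 1-based element_at index (negative values count from the end) to a
   * 0-based array position, or throws if the index is zero or out of bounds.
   */
  static int elementAtIndexExact(int length, int index) {
    // Widen before abs so that Integer.MIN_VALUE cannot overflow.
    if (length < Math.abs((long) index)) {
      // Stand-in for invalidElementAtIndexError.
      throw new ArrayIndexOutOfBoundsException(
          "index " + index + " is out of bounds for array of length " + length);
    }
    if (index == 0) {
      // Stand-in for invalidIndexOfZeroError.
      throw new IllegalArgumentException("element_at index must not be 0");
    }
    return index > 0 ? index - 1 : length + index;
  }

  public static void main(String[] args) {
    System.out.println(elementAtIndexExact(3, 1));   // first element: prints 0
    System.out.println(elementAtIndexExact(3, -1));  // last element: prints 2
  }
}
```

The caller would then read the element at the returned position with whatever typed accessor matches the array's element type.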

The non-ANSI branch is left inline because it can choose between defaultValueOutOfBound (an Option[Expression] that requires codegen access) or null.

Why are the changes needed?

Part of SPARK-56908 (umbrella). The ANSI ElementAt codegen body was the largest single inline body in collectionOperations.scala -- the helper collapses ~12 lines to ~3 per call site.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *CollectionExpressionsSuite"

59/59 pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

gengliangwang · 2026-05-17T23:32:34Z

Stack overview (SPARK-56908 umbrella)

This PR is part of a stack of 8 PRs against SPARK-56908. Order:

1. #55934: [SPARK-56909][SQL] Simplify Cast to int/long codegen under ANSI mode (this stack base)
2. #55935: [SPARK-56910][SQL] Simplify Cast to byte/short codegen under ANSI mode
3. #55936: [SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode
4. #55937: [SPARK-56912][SQL] Simplify Cast to boolean codegen under ANSI mode
5. #55939: [SPARK-56914][SQL] Simplify decimal arithmetic codegen under ANSI mode (depends on #55936)
6. #55938: [SPARK-56913][SQL] Simplify BinaryArithmetic byte/short codegen under ANSI mode (independent)
7. #55940: [SPARK-56915][SQL] Simplify MakeDate/MakeInterval codegen under ANSI mode (independent)
8. #55941: [SPARK-56916][SQL] Simplify ElementAt array codegen under ANSI mode (independent)

PRs 1-4 are linearly stacked on each other (each branch is based on the previous one). PR 5 (decimal arithmetic) is stacked on top of PR 3 (cast decimal) since it uses CastUtils.changePrecisionExact. PRs 6, 7, 8 branch off master independently.

cloud-fan

Summary

Prior state and problem. ElementAt.doGenCode for ANSI mode contained ~12 lines of inline codegen for index validation (length check, zero-index check, sign normalization), with the same logic duplicated in Scala in doElementAt (eval). Per the SPARK-56908 umbrella, this was the largest single inline body in collectionOperations.scala.

Design approach. Extract the ANSI-mode validation into a Java static helper ArrayUtils.elementAtIndexExact(int length, int index, QueryContext) and call it from both eval and codegen. Each method now splits case _: ArrayType into a failOnError branch (uses the helper) and a non-failOnError branch (kept inline — only the ANSI branch is unified in this PR).

Key design decisions. The helper returns the validated 0-based int; the type-specific arr.get(pos, dataType) remains at the call site so the helper stays independent of element type.
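To illustrate why keeping the typed read at the call site keeps the helper independent of element type, here is a small sketch. Everything in it is hypothetical: `CallSiteSketch` and `positionOf` are illustration names, plain Java arrays stand in for Spark's array data, and a single JDK exception replaces the two Spark error constructors.

```java
/** Hypothetical illustration of the helper/call-site split; not the Spark source. */
final class CallSiteSketch {
  private CallSiteSketch() {}

  // Stand-in for the helper: 1-based index in, 0-based position out.
  static int positionOf(int length, int index) {
    if (index == 0 || length < Math.abs((long) index)) {
      throw new IllegalArgumentException(
          "invalid index " + index + " for array of length " + length);
    }
    return index > 0 ? index - 1 : length + index;
  }

  // Each call site keeps its own typed read; only the index math is shared.
  static long elementAt(long[] arr, int index) {
    return arr[positionOf(arr.length, index)];
  }

  static String elementAt(String[] arr, int index) {
    return arr[positionOf(arr.length, index)];
  }

  public static void main(String[] args) {
    System.out.println(elementAt(new long[] {10L, 20L, 30L}, -1));  // prints 30
    System.out.println(elementAt(new String[] {"a", "b"}, 1));      // prints a
  }
}
```

Because the helper only ever sees a length and an index, adding another element type means adding another typed read, with no change to the validation logic.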

Implementation sketch.

New file ArrayUtils.java in org.apache.spark.sql.catalyst.expressions with a single static method.
ElementAt.doElementAt and ElementAt.doGenCode each gain a new case _: ArrayType if failOnError => branch.

Behavior verified case-by-case against pre-PR (OOB, zero index, negative index, empty array). Codegen scaffolding (nullCheck) is identical to pre-PR.
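A case-by-case comparison of this kind can also be done mechanically. The sketch below is one way to do it, under loud assumptions: `inlineForm` is a hypothetical branch-by-branch rendering of the checks, not the actual pre-PR codegen, `helperForm` is a stand-in for the new helper, and JDK exceptions replace the Spark error constructors. It compares the outcome (position or exception type) of both forms over a small grid of lengths and indexes, which covers the out-of-bound, zero-index, negative-index, and empty-array cases.

```java
import java.util.function.IntBinaryOperator;

/** Hypothetical equivalence check between two forms of the index logic. */
final class ElementAtCaseCheck {
  private ElementAtCaseCheck() {}

  // Stand-in for the helper form.
  static int helperForm(int length, int index) {
    if (length < Math.abs((long) index)) {
      throw new ArrayIndexOutOfBoundsException("out of bounds: " + index);
    }
    if (index == 0) {
      throw new IllegalArgumentException("zero index");
    }
    return index > 0 ? index - 1 : length + index;
  }

  // Hypothetical inline form: the same checks written out branch by branch.
  static int inlineForm(int length, int index) {
    if (index == 0) {
      throw new IllegalArgumentException("zero index");
    }
    if (index > 0) {
      if (index > length) {
        throw new ArrayIndexOutOfBoundsException("out of bounds: " + index);
      }
      return index - 1;
    }
    if (-(long) index > length) {
      throw new ArrayIndexOutOfBoundsException("out of bounds: " + index);
    }
    return length + index;
  }

  // Reduces a call to a comparable outcome: the position or the exception type.
  static String outcome(IntBinaryOperator f, int length, int index) {
    try {
      return "pos=" + f.applyAsInt(length, index);
    } catch (RuntimeException e) {
      return e.getClass().getSimpleName();
    }
  }

  // True if both forms agree for every length in [0, maxLength] and every
  // index in [-maxLength - 1, maxLength + 1].
  static boolean agree(int maxLength) {
    for (int length = 0; length <= maxLength; length++) {
      for (int index = -maxLength - 1; index <= maxLength + 1; index++) {
        String a = outcome(ElementAtCaseCheck::helperForm, length, index);
        String b = outcome(ElementAtCaseCheck::inlineForm, length, index);
        if (!a.equals(b)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(agree(8));
  }
}
```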

LGTM with two minor nits inline.

cloud-fan · 2026-05-18T12:30:08Z

+ * of inline length / zero / sign-normalization codegen with a return of
+ * the normalized array position (0-based).
+ */
+public final class ArrayUtils {


The stack uses per-operation naming (CastUtils, ArithmeticUtils, DateTimeConstructorUtils). ArrayUtils is broader than its single element_at-specific helper, and there's already an ArrayExpressionUtils.java in the same package that serves array-expression helpers. Risk: future readers won't know which utility class to look in, and ArrayUtils becomes a magnet for unrelated array helpers.

Consider renaming to ElementAtUtils (matches DateTimeConstructorUtils-style per-operation naming), or folding elementAtIndexExact into the existing ArrayExpressionUtils. WDYT?

cloud-fan · 2026-05-18T12:30:08Z

+ * {@link ElementAt} on {@code ArrayType}: a single call replaces ~12 lines
+ * of inline length / zero / sign-normalization codegen with a return of
+ * the normalized array position (0-based).


The "a single call replaces ~12 lines ..." clause describes the PR's effect rather than the helper's contract: once merged, the original 12-line inline form isn't visible to future readers. The peer CastUtils.java doesn't include similar line-count claims.

Suggested change

- * {@link ElementAt} on {@code ArrayType}: a single call replaces ~12 lines
- * of inline length / zero / sign-normalization codegen with a return of
- * the normalized array position (0-based).
+ * {@link ElementAt} on {@code ArrayType}.

gengliangwang mentioned this pull request May 17, 2026

[SPARK-56915][SQL] Simplify MakeDate/MakeInterval codegen under ANSI mode #55940

Open

gengliangwang requested review from cloud-fan and viirya May 17, 2026 23:39

cloud-fan approved these changes May 18, 2026
