[SPARK-54731][SQL] Support Reverse expression to handle BinaryType by performing raw byte-level reversal by xiaoxuandev · Pull Request #55051 · apache/spark

xiaoxuandev · 2026-03-27T05:17:29Z

What changes were proposed in this pull request?

When Spark's reverse function is called on a BinaryType column, it previously threw a type mismatch error because BinaryType was not in the accepted input types. This patch adds BinaryType support to the Reverse expression.

The changes:

1. Add BinaryType to the inputTypes TypeCollection in Reverse.
2. Add a BinaryType match case in the interpreted execution path (doReverse) using Scala's Array.reverse.
3. Add a BinaryType match case in the code generation path (doGenCode) with a dedicated binaryCodeGen method that performs byte-level reversal via a for loop.
4. Update @ExpressionDescription to document binary support (since 4.2.0).
5. Update data type mismatch error test expectations to include BINARY in the accepted types.
6. Add end-to-end SQL test for reverse on BinaryType in DataFrameFunctionsSuite covering both interpreted and codegen paths.
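
The interpreted and codegen paths described above are expected to give the same result for the same input. As a rough illustration only (plain Python, not Spark's Scala or its generated Java; `reverse_bytes_loop` and `reverse_bytes_builtin` are hypothetical names, not identifiers from the patch), the two styles of byte-level reversal could be sketched as:

```python
from typing import Optional


def reverse_bytes_loop(data: Optional[bytes]) -> Optional[bytes]:
    """Index-based reversal, loosely mirroring the for loop in the codegen path."""
    if data is None:  # None models a SQL NULL input
        return None
    n = len(data)
    out = bytearray(n)
    for i in range(n):
        # Byte i of the output is byte (n - 1 - i) of the input.
        out[i] = data[n - 1 - i]
    return bytes(out)


def reverse_bytes_builtin(data: Optional[bytes]) -> Optional[bytes]:
    """Library-based reversal, loosely analogous to Array.reverse in the interpreted path."""
    return None if data is None else data[::-1]


sample = bytes.fromhex("CAFE")
print(reverse_bytes_loop(sample).hex())     # feca
print(reverse_bytes_builtin(sample).hex())  # feca
```

Both helpers return a new value and leave the input untouched, which is the behavior one would want from an expression that must not mutate its input row.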

Why are the changes needed?

Without this fix, calling reverse on a BinaryType column fails with a type error. This is inconsistent with other Spark functions like length and concat that already support BinaryType.

Does this PR introduce any user-facing change?

Yes. The reverse function now accepts BinaryType input and returns the byte-reversed binary value. Previously this would throw an AnalysisException.
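
Because the reversal works on raw bytes rather than characters, reversing the binary encoding of a string is not, in general, the same as reversing the string and then encoding it. A small Python sketch (not Spark code; it assumes UTF-8 and uses only the standard library) shows the difference for a multi-byte character:

```python
text = "a\u00e9"  # "a" followed by U+00E9, which is two bytes in UTF-8: 0xC3 0xA9
encoded = text.encode("utf-8")

char_reversed = text[::-1].encode("utf-8")  # reverse the characters, then encode
byte_reversed = encoded[::-1]               # reverse the raw bytes

print(encoded.hex())        # 61c3a9
print(char_reversed.hex())  # c3a961
print(byte_reversed.hex())  # a9c361
```

The byte-reversed value is no longer valid UTF-8, which is fine for binary data but is worth keeping in mind when casting between string and binary around a call to reverse.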

How was this patch tested?

- Unit tests in CollectionExpressionsSuite: multi-byte reversal, empty byte array, single byte, byte array containing 0x00, null input.
- End-to-end test in DataFrameFunctionsSuite: reverse on BinaryType column through both interpreted and codegen (cached) paths.
- Updated data type mismatch tests to reflect the new accepted type set.
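
The unit-test cases listed above can be read as input/expected pairs. The following is a plain-Python illustration of those cases, not the Scala tests from the PR; `reverse_bytes` is a hypothetical stand-in for the expression, and the specific byte values are made up for the example.

```python
def reverse_bytes(data):
    """Hypothetical stand-in for reverse on a binary value; None models SQL NULL."""
    return None if data is None else bytes(reversed(data))


cases = [
    (b"\x01\x02\x03", b"\x03\x02\x01"),  # multi-byte reversal
    (b"", b""),                          # empty byte array
    (b"\x7f", b"\x7f"),                  # single byte
    (b"\x01\x00\x02", b"\x02\x00\x01"),  # an embedded 0x00 is not a terminator
    (None, None),                        # null input
]

for given, expected in cases:
    assert reverse_bytes(given) == expected
```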

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

HyukjinKwon · 2026-03-27T06:46:07Z

Is there any reference for other systems that support reverse on binary?

xiaoxuandev · 2026-03-27T20:00:08Z

@HyukjinKwon For internal consistency within Spark, Concat, Length, and OctetLength already accept both StringType and BinaryType, so Reverse is the odd one out.

For external references, PostgreSQL 18 recently added reverse() support for binary type: https://neon.com/postgresql/postgresql-18/array-bytea-improvements

HyukjinKwon

Looks fine to me. Let's make CI happy. cc @cloud-fan

xiaoxuandev · 2026-04-07T17:31:54Z

@HyukjinKwon Thanks for the review! CI passed after a retry.

cloud-fan · 2026-04-08T12:47:45Z

Does the reverse function in other databases also allow the binary type? It looks a bit weird.

HyukjinKwon · 2026-04-08T22:14:58Z

I think he provided https://neon.com/postgresql/postgresql-18/array-bytea-improvements as an example.

cloud-fan

The implementation is clean and follows the established pattern used by Length, BitLength, and Concat for BinaryType support. Both interpreted and codegen paths are correct, and test coverage is good.

One minor documentation gap: the PySpark reverse docstring (python/pyspark/sql/functions/builtin.py) still says "returns a reversed string or an array with elements in reverse order" without mentioning binary. Consider updating it in a follow-up.

cloud-fan · 2026-04-09T08:39:05Z

  since = "1.5.0",
  note = """
    Reverse logic for arrays is available since 2.4.0.
+    Reverse logic for binary is available since 4.2.0.


nit: The usage and note sections mention binary, but the examples section doesn't include a binary example. Length and BitLength both demonstrate binary usage with x'hex' syntax. Consider adding one here, e.g.:

> SELECT _FUNC_(x'CAFE');
 FE CA

xiaoxuandev · 2026-04-14

Updated, thanks for the review!

…forming raw byte-level reversal

### What changes were proposed in this pull request?

When Spark's `reverse` function is called on a BinaryType column, it previously threw a type mismatch error because BinaryType was not in the accepted input types. This patch adds BinaryType support to the Reverse expression.

The changes:

1. Add BinaryType to the inputTypes TypeCollection in Reverse.
2. Add a BinaryType match case in the interpreted execution path (doReverse) using Scala's Array.reverse.
3. Add a BinaryType match case in the code generation path (doGenCode) with a dedicated binaryCodeGen method that performs byte-level reversal via a for loop.
4. Update @ExpressionDescription to document binary support (since 4.2.0), fix parameter name from `array` to `expr`.
5. Update data type mismatch error test expectations to include BINARY in the accepted types.
6. Add end-to-end SQL test for reverse on BinaryType in DataFrameFunctionsSuite covering both interpreted and codegen paths.

### Why are the changes needed?

Without this fix, calling `reverse` on a BinaryType column fails with a type error. This is inconsistent with other Spark functions like `length` and `concat` that already support BinaryType.

### Does this PR introduce _any_ user-facing change?

Yes. The `reverse` function now accepts BinaryType input and returns the byte-reversed binary value. Previously this would throw an AnalysisException.

### How was this patch tested?

- Unit tests in CollectionExpressionsSuite: multi-byte reversal, empty byte array, single byte, byte array containing 0x00, null input.
- End-to-end test in DataFrameFunctionsSuite: reverse on BinaryType column through both interpreted and codegen (cached) paths.
- Updated data type mismatch tests to reflect the new accepted type set.
- Scalastyle and ExpressionsSchemaSuite pass.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

HyukjinKwon

LGTM

HyukjinKwon · 2026-04-14T23:19:04Z

Merged to master.

HyukjinKwon reviewed Apr 5, 2026

cloud-fan approved these changes Apr 9, 2026

xiaoxuandev added 2 commits April 14, 2026 13:13

Update docs and examples

cbbe133

xiaoxuandev force-pushed the fix-54731 branch from 16a9652 to cbbe133 Compare April 14, 2026 20:14

xiaoxuandev changed the title ~~[SPARK-54731][SQL] Fix Reverse expression to handle BinaryType by performing raw byte-level reversal~~ [SPARK-54731][SQL] Support Reverse expression to handle BinaryType by performing raw byte-level reversal Apr 14, 2026

HyukjinKwon approved these changes Apr 14, 2026

HyukjinKwon closed this in 3bcfa1e Apr 14, 2026

xiaoxuandev wants to merge 2 commits into apache:master from xiaoxuandev:fix-54731
