Skip to content

[SPARK-54731][SQL] Support Reverse expression to handle BinaryType by performing raw byte-level reversal#55051

Closed
xiaoxuandev wants to merge 2 commits into
apache:masterfrom
xiaoxuandev:fix-54731
Closed

[SPARK-54731][SQL] Support Reverse expression to handle BinaryType by performing raw byte-level reversal#55051
xiaoxuandev wants to merge 2 commits into
apache:masterfrom
xiaoxuandev:fix-54731

Conversation

@xiaoxuandev
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

When Spark's reverse function is called on a BinaryType column, it previously threw a type mismatch error because BinaryType was not in the accepted input types. This patch adds BinaryType support to the Reverse expression.

The changes:

  1. Add BinaryType to the inputTypes TypeCollection in Reverse.
  2. Add a BinaryType match case in the interpreted execution path (doReverse) using Scala's Array.reverse.
  3. Add a BinaryType match case in the code generation path (doGenCode) with a dedicated binaryCodeGen method that performs byte-level reversal via a for loop.
  4. Update @ExpressionDescription to document binary support (since 4.2.0).
  5. Update data type mismatch error test expectations to include BINARY in the accepted types.
  6. Add end-to-end SQL test for reverse on BinaryType in DataFrameFunctionsSuite covering both interpreted and codegen paths.

Why are the changes needed?

Without this fix, calling reverse on a BinaryType column fails with a type error. This is inconsistent with other Spark functions like length and concat that already support BinaryType.

Does this PR introduce any user-facing change?

Yes. The reverse function now accepts BinaryType input and returns the byte-reversed binary value. Previously this would throw an AnalysisException.

How was this patch tested?

  • Unit tests in CollectionExpressionsSuite: multi-byte reversal, empty byte array, single byte, byte array containing 0x00, null input.
  • End-to-end test in DataFrameFunctionsSuite: reverse on BinaryType column through both interpreted and codegen (cached) paths.
  • Updated data type mismatch tests to reflect the new accepted type set.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

@HyukjinKwon
Copy link
Copy Markdown
Member

Are there any reference that supports binary with reverse?

@xiaoxuandev
Copy link
Copy Markdown
Contributor Author

@HyukjinKwon For internal consistency within Spark, Concat, Length, and OctetLength already accept both StringType and BinaryType, so Reverse is the odd one out.

For external references, PostgreSQL 18 recently added reverse() support for binary type: https://neon.com/postgresql/postgresql-18/array-bytea-improvements

Copy link
Copy Markdown
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. Let's make CI happy. cc @cloud-fan

@xiaoxuandev
Copy link
Copy Markdown
Contributor Author

@HyukjinKwon Thanks for the review! CI passed after a retry.

@cloud-fan
Copy link
Copy Markdown
Contributor

does reverse function in other databases also allow binary type? looks a bit weird

@HyukjinKwon
Copy link
Copy Markdown
Member

I think he provided https://neon.com/postgresql/postgresql-18/array-bytea-improvements as an example.

Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is clean and follows the established pattern used by Length, BitLength, and Concat for BinaryType support. Both interpreted and codegen paths are correct, and test coverage is good.

One minor documentation gap: the PySpark reverse docstring (python/pyspark/sql/functions/builtin.py) still says "returns a reversed string or an array with elements in reverse order" without mentioning binary. Consider updating it in a follow-up.

since = "1.5.0",
note = """
Reverse logic for arrays is available since 2.4.0.
Reverse logic for binary is available since 4.2.0.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The usage and note sections mention binary, but the examples section doesn't include a binary example. Length and BitLength both demonstrate binary usage with x'hex' syntax. Consider adding one here, e.g.:

      > SELECT _FUNC_(x'CAFE');
       FE CA

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks for the review!

…forming raw byte-level reversal

### What changes were proposed in this pull request?
When Spark's `reverse` function is called on a BinaryType column, it previously threw a type mismatch error because BinaryType was not in the accepted input types. This patch adds BinaryType support to the Reverse expression.

The changes:
1. Add BinaryType to the inputTypes TypeCollection in Reverse.
2. Add a BinaryType match case in the interpreted execution path (doReverse) using Scala's Array.reverse.
3. Add a BinaryType match case in the code generation path (doGenCode) with a dedicated binaryCodeGen method that performs byte-level reversal via a for loop.
4. Update @ExpressionDescription to document binary support (since 4.2.0), fix parameter name from `array` to `expr`.
5. Update data type mismatch error test expectations to include BINARY in the accepted types.
6. Add end-to-end SQL test for reverse on BinaryType in DataFrameFunctionsSuite covering both interpreted and codegen paths.

### Why are the changes needed?
Without this fix, calling `reverse` on a BinaryType column fails with a type error. This is inconsistent with other Spark functions like `length` and `concat` that already support BinaryType.

### Does this PR introduce _any_ user-facing change?
Yes. The `reverse` function now accepts BinaryType input and returns the byte-reversed binary value. Previously this would throw an AnalysisException.

### How was this patch tested?
- Unit tests in CollectionExpressionsSuite: multi-byte reversal, empty byte array, single byte, byte array containing 0x00, null input.
- End-to-end test in DataFrameFunctionsSuite: reverse on BinaryType column through both interpreted and codegen (cached) paths.
- Updated data type mismatch tests to reflect the new accepted type set.
- Scalastyle and ExpressionsSchemaSuite pass.

### Was this patch authored or co-authored using generative AI tooling?
Yes, co-authored with Kiro.
@xiaoxuandev xiaoxuandev changed the title [SPARK-54731][SQL] Fix Reverse expression to handle BinaryType by performing raw byte-level reversal [SPARK-54731][SQL] Support Reverse expression to handle BinaryType by performing raw byte-level reversal Apr 14, 2026
Copy link
Copy Markdown
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HyukjinKwon
Copy link
Copy Markdown
Member

Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants