[SPARK-56912][SQL] Simplify Cast to boolean codegen under ANSI mode #55937
Draft
gengliangwang wants to merge 4 commits into
This was referenced May 17, 2026
Stack overview (SPARK-56908 umbrella)
This PR is part of a stack of 8 PRs against SPARK-56908. Order:
PRs 1-4 are linearly stacked on each other (each branch is based on the previous one). PR 5 (decimal arithmetic) is stacked on top of PR 3 (cast decimal) since it uses |
e911725 to 5bb514b
### What changes were proposed in this pull request?
Introduce `CastUtils.java` and use it from `Cast.scala` to collapse the
multi-line ANSI overflow-check codegen for casts that target `int` and
`long` into one-line static-method calls. Source and target `DataType`
constants used in the overflow error message live as `private static final`
fields on the helper class, so the happy path performs no per-row
`references[]` lookups.
Helpers added:
* `longToIntExact(long)` for narrowing `long -> int`.
* `floatToIntExact(float)`, `doubleToIntExact(double)` for fractional -> int.
* `floatToLongExact(float)`, `doubleToLongExact(double)` for fractional -> long.
`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` and `castFractionToIntegralTypeCode`
dispatch on the target type: `int` (and `long` for the fraction case) emit a
`CastUtils.<...>Exact` call; byte/short targets keep the inline body
(refactored in SPARK-56910).
* Eval paths for `castToInt` add ANSI `LongType` / `FloatType` / `DoubleType`
cases, and `castToLong` adds `FloatType` / `DoubleType` cases, both
delegating to the new helpers.
### Why are the changes needed?
Part of SPARK-56908. The current ANSI cast codegen emits 5-line inline
overflow blocks per call site. Multiplied across the many cast paths in a
TPC-DS plan, this contributes meaningfully to the generated source size and
to Janino compile time, and pushes whole-stage methods closer to the 64KB
JVM method limit.
### Does this PR introduce _any_ user-facing change?
No. The compiled behavior is identical; only the emitted Java source
text changes.
### How was this patch tested?
`build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *ExpressionClassIdentitySuite"`
312/312 pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.x
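For readers skimming the stack, here is a minimal sketch of what exact-narrowing helpers like the ones named in the commit message above could look like. It is an assumption, not the contents of `CastUtils.java`: the class name `IntExactSketch` is hypothetical, and `ArithmeticException` stands in for Spark's own overflow error and its `DataType` constants.

```java
// Hypothetical sketch only; not the actual CastUtils.java.
final class IntExactSketch {
    private IntExactSketch() {}

    // long -> int: the value survives the round trip only if it fits in 32 bits.
    static int longToIntExact(long v) {
        if (v == (int) v) {
            return (int) v;
        }
        throw overflow(v + "L", "INT");
    }

    // double -> int: accept any value whose truncation toward zero fits in int.
    // NaN fails the comparison and therefore throws.
    static int doubleToIntExact(double v) {
        if (Math.floor(v) <= Integer.MAX_VALUE && Math.ceil(v) >= Integer.MIN_VALUE) {
            return (int) v;
        }
        throw overflow(v + "D", "INT");
    }

    // Stand-in for the overflow error built from the source and target types.
    private static ArithmeticException overflow(String value, String target) {
        return new ArithmeticException(
            "The value " + value + " cannot be cast to " + target + " due to an overflow");
    }
}
```

With helpers of this shape, a generated call site shrinks to a single assignment such as `int result = IntExactSketch.longToIntExact(value);`, and the throw path stays out of the generated method body.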
### What changes were proposed in this pull request?
Extend `CastUtils.java` with helpers for `byte` and `short` ANSI cast
targets and use them from `Cast.scala`. Drops the byte/short-target
dispatch (and the now-unused `lowerAndUpperBound` Scala helper) added
in SPARK-56909 -- after this PR, all integral and fractional narrowing
ANSI casts share the same `CastUtils.<...>Exact` one-line codegen.
Helpers added:
* `shortToByteExact(short)`, `intToByteExact(int)`, `longToByteExact(long)`
* `intToShortExact(int)`, `longToShortExact(long)`
* `floatToByteExact(float)`, `doubleToByteExact(double)`
* `floatToShortExact(float)`, `doubleToShortExact(double)`
`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` / `castFractionToIntegralTypeCode`
no longer dispatch on target type -- the helper-name pattern
`${integralPrefix(from)}To${target.capitalize}Exact` covers all four
target types.
* Eval paths for `castToByte` and `castToShort` add ANSI cases for
`ShortType` / `IntegerType` / `LongType` / `FloatType` / `DoubleType`
source types that delegate to the new helpers; the existing
`exactNumeric.toInt(b) + bounds-check` fallback now only handles the
remaining `Decimal` source.
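The helper-name pattern above can be illustrated with a small sketch. Everything here is hypothetical: `HelperNameSketch`, `helperName`, and `emitCast` are illustration-only names, the source prefixes are assumed to be the Java primitive names (`short`, `int`, `long`, ...), and `ArithmeticException` stands in for Spark's overflow error.

```java
// Hypothetical sketch of composing the one-line codegen call; not Cast.scala itself.
final class HelperNameSketch {
    private HelperNameSketch() {}

    // "byte" -> "Byte", mirroring target.capitalize in the pattern.
    static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // ("int", "byte") -> "intToByteExact"
    static String helperName(String fromPrefix, String target) {
        return fromPrefix + "To" + capitalize(target) + "Exact";
    }

    // Builds the single line of Java source emitted per call site.
    static String emitCast(String fromPrefix, String target, String input, String result) {
        return result + " = CastUtils." + helperName(fromPrefix, target) + "(" + input + ");";
    }

    // One possible body for a byte-target helper: a round-trip check, then throw.
    static byte intToByteExact(int v) {
        if (v == (byte) v) {
            return (byte) v;
        }
        throw new ArithmeticException(
            "The value " + v + " cannot be cast to TINYINT due to an overflow");
    }
}
```

Because the name is derived from the source and target types alone, the same template serves byte, short, int, and long targets without a per-target branch.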
### Why are the changes needed?
Part of SPARK-56908 (umbrella). The original byte/short ANSI cast bodies
were 5 lines each across 8 call sites; this PR collapses them to one
line per call site, matching the int/long target work from SPARK-56909.
### Does this PR introduce _any_ user-facing change?
No. The compiled behavior is identical; only the emitted Java source
text changes.
### How was this patch tested?
```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
*CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite \
*ExpressionClassIdentitySuite"
```
312/312 pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.x
### What changes were proposed in this pull request?
Extend `CastUtils.java` with two helpers for decimal precision adjustment
and use them from `Cast.changePrecision` (both the eval and codegen
implementations). The new helpers mutate the input `Decimal` in place
(matching the behavior of the existing inline codegen), so they're safe to
call on the temporary produced by `Decimal.fromString(...)` /
`Decimal.apply(...)` / decimal-arithmetic results.
Helpers added:
* `changePrecisionExact(Decimal, int, int, QueryContext)`: ANSI throw on
overflow, preserves the per-call-site `QueryContext` so error messages keep
their query-origin info.
* `changePrecisionOrNull(Decimal, int, int)`: non-ANSI, returns `null` on
overflow (no `QueryContext` needed).
`Cast.scala` changes:
* `changePrecision` eval method dispatches on `nullOnOverflow` and delegates
to the appropriate helper.
* `changePrecision` codegen method has three branches now: the existing
`canNullSafeCast` fast path (unchanged), a `nullOnOverflow` branch (inline),
and the ANSI throw branch which now emits a one-line
`CastUtils.changePrecisionExact(...)` call instead of the 5-line `if/else`
overflow block.
### Why are the changes needed?
Part of SPARK-56908 (umbrella). The ANSI throw branch of
`Cast.changePrecision` is hit by every cast to decimal that may overflow
(very common in TPC-DS, where `cast(int as decimal(7,2))` is widespread).
Collapsing the 5-line inline body to one line shrinks the generated Java
source for those plans.
### Does this PR introduce _any_ user-facing change?
No. The compiled behavior is identical; only the emitted Java source
text changes.
### How was this patch tested?
```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
*CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *DecimalSuite \
*ExpressionClassIdentitySuite"
```
337/337 pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.x
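The split between the throwing and null-returning precision helpers described in the commit message above can be sketched as follows. This is a simplified illustration under stated assumptions: `java.math.BigDecimal` stands in for Spark's `Decimal` (so the in-place mutation is not modeled), half-up rounding is assumed, the `QueryContext` parameter is omitted, and `ChangePrecisionSketch` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical sketch; BigDecimal is immutable, unlike Spark's Decimal.
final class ChangePrecisionSketch {
    private ChangePrecisionSketch() {}

    // Non-ANSI flavor: round to the target scale, return null if the
    // result needs more digits than the target precision allows.
    static BigDecimal changePrecisionOrNull(BigDecimal d, int precision, int scale) {
        BigDecimal rounded = d.setScale(scale, RoundingMode.HALF_UP);
        return rounded.precision() <= precision ? rounded : null;
    }

    // ANSI flavor: same check, but overflow raises an error instead of null.
    static BigDecimal changePrecisionExact(BigDecimal d, int precision, int scale) {
        BigDecimal rounded = changePrecisionOrNull(d, precision, scale);
        if (rounded == null) {
            throw new ArithmeticException(
                d + " cannot be represented as Decimal(" + precision + ", " + scale + ")");
        }
        return rounded;
    }
}
```

For a `decimal(7,2)` target, `12345.678` rounds to `12345.68` and fits, while `123456` would need eight digits and takes the overflow path.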
### What changes were proposed in this pull request?
Extend `CastUtils.java` with `stringToBooleanExact(UTF8String, QueryContext)`
and use it from `Cast.scala` for the ANSI `String -> Boolean` cast path
(both eval and codegen). The non-ANSI path keeps the inline
`if/else if/else evNull = true` form because it has no error to throw.
### Why are the changes needed?
Part of SPARK-56908 (umbrella). The ANSI String->Boolean cast emits an
8-line `if (isTrueString) … else if (isFalseString) … else throw` block in
codegen. This PR collapses it to a one-line
`CastUtils.stringToBooleanExact(...)` call.
### Does this PR introduce _any_ user-facing change?
No. The compiled behavior is identical; only the emitted Java source
text changes.
### How was this patch tested?
```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
*AnsiCastSuite *TryCastSuite"
```
204/204 pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.x
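A rough sketch of the three-way branch that the commit message above moves into a helper. It is not the real `stringToBooleanExact`: plain `String` replaces `UTF8String`, the `QueryContext` argument is dropped, `IllegalArgumentException` stands in for Spark's invalid-input error, and the accepted literal sets below are an assumption about which strings count as true or false.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; class name and literal sets are assumptions.
final class StringToBooleanSketch {
    private StringToBooleanSketch() {}

    private static final Set<String> TRUE_STRINGS = Set.of("t", "true", "y", "yes", "1");
    private static final Set<String> FALSE_STRINGS = Set.of("f", "false", "n", "no", "0");

    // ANSI flavor: anything that is neither a true nor a false literal is an error.
    static boolean stringToBooleanExact(String s) {
        String normalized = s.trim().toLowerCase(Locale.ROOT);
        if (TRUE_STRINGS.contains(normalized)) {
            return true;
        }
        if (FALSE_STRINGS.contains(normalized)) {
            return false;
        }
        throw new IllegalArgumentException(
            "The value '" + s + "' cannot be cast to BOOLEAN because it is malformed");
    }
}
```

The non-ANSI path has no counterpart to the final `throw`, which is consistent with the commit message's note that it keeps the inline form and sets the null flag instead.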
5bb514b to 5a88ec7
Title: [SPARK-56912][SQL] Refactor Cast to boolean codegen under ANSI mode
Base: master (stacked on PR 1→2→3)
Head: gengliangwang:SPARK-56912-cast-boolean
### What changes were proposed in this pull request?
Extend `CastUtils.java` with `stringToBooleanExact(UTF8String, QueryContext)` and use it from `Cast.scala` for the ANSI `String -> Boolean` cast path (both eval and codegen). The non-ANSI path keeps the inline `if/else if/else evNull = true` form because it has no error to throw.
### Why are the changes needed?
Part of SPARK-56908 (umbrella). The ANSI String->Boolean cast emits an 8-line `if (isTrueString) … else if (isFalseString) … else throw` block in codegen. This PR collapses it to a one-line `CastUtils.stringToBooleanExact(...)` call.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
204/204 pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.x