[SPARK-57208][SQL] Simplify Ascii codegen by extracting a static Java helper#56267
Closed
LuciferYang wants to merge 1 commit into
Closed
[SPARK-57208][SQL] Simplify Ascii codegen by extracting a static Java helper#56267LuciferYang wants to merge 1 commit into
LuciferYang wants to merge 1 commit into
Conversation
dongjoon-hyun
approved these changes
Jun 2, 2026
… helper Move Ascii's first-character logic into a single ExpressionImplUtils.ascii(UTF8String) helper. nullSafeEval delegates to it and doGenCode becomes a one-line defineCodeGen call, so eval and codegen share one implementation instead of duplicating the substring/codePointAt block. This collapses the inlined block into one invokestatic per call site (fewer constant-pool entries, smaller generated method), helping with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work. Part of SPARK-56908.
76715b1 to
2078851
Compare
Contributor
Author
|
rebased |
LuciferYang
added a commit
that referenced
this pull request
Jun 3, 2026
… helper ### What changes were proposed in this pull request? `Ascii` implemented the same first-character logic twice — once in `nullSafeEval` and once inlined in `doGenCode` (a ~6-line `substring(0, 1)` / `numChars() > 0 ? codePointAt(0) : 0` block). This moves that logic into a single `ExpressionImplUtils.ascii(UTF8String): int` helper; `nullSafeEval` delegates to it and `doGenCode` becomes a one-line `defineCodeGen` call, so eval and codegen share one implementation. This follows the other SPARK-56908 helpers in `ExpressionImplUtils` and lands alongside Crc32 (#56222), regexp (#56223), Chr (#56224), Acosh (#56228), and Asinh (#56229), which all append to the same file. ### Why are the changes needed? It collapses the inlined block into a single `invokestatic` per `ascii` call site — fewer constant-pool entries and a smaller generated method, which helps with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work — and removes the duplicated logic between eval and codegen. Part of SPARK-56908. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing `StringExpressionsSuite` `ascii` tests, plus a new assertion for a supplementary-plane code point (`ascii('😀') = 128512`) that pins the `codePointAt` (rather than `charAt`) behavior. `checkEvaluation` runs both the interpreted and codegen paths. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8) Closes #56267 from LuciferYang/ascii-codegen-helper. Authored-by: YangJie <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit 1fe8493) Signed-off-by: yangjie01 <yangjie01@baidu.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Asciiimplemented the same first-character logic twice — once innullSafeEvaland once inlined indoGenCode(a ~6-linesubstring(0, 1)/numChars() > 0 ? codePointAt(0) : 0block). This moves that logic into a singleExpressionImplUtils.ascii(UTF8String): inthelper;nullSafeEvaldelegates to it anddoGenCodebecomes a one-linedefineCodeGencall, so eval and codegen share one implementation.This follows the other SPARK-56908 helpers in
ExpressionImplUtilsand lands alongside Crc32 (#56222), regexp (#56223), Chr (#56224), Acosh (#56228), and Asinh (#56229), which all append to the same file.Why are the changes needed?
It collapses the inlined block into a single
invokestaticperasciicall site — fewer constant-pool entries and a smaller generated method, which helps with the JVM 64KB method / constant-pool limits, Janino compile time, and JIT work — and removes the duplicated logic between eval and codegen. Part of SPARK-56908.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing
StringExpressionsSuiteasciitests, plus a new assertion for a supplementary-plane code point (ascii('😀') = 128512) that pins thecodePointAt(rather thancharAt) behavior.checkEvaluationruns both the interpreted and codegen paths.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)