[SPARK-57245][SQL] Simplify FormatNumber codegen by extracting a static Java helper#56299
Closed
LuciferYang wants to merge 1 commit into
Closed
[SPARK-57245][SQL] Simplify FormatNumber codegen by extracting a static Java helper#56299LuciferYang wants to merge 1 commit into
LuciferYang wants to merge 1 commit into
Conversation
…ic Java helper FormatNumber.doGenCode inlined the integer-scale number pattern build (the default grouping pattern plus `d` decimal places, applied to a DecimalFormat), an exact transliteration of the same block in nullSafeEval. The codegen also threaded a `pattern` StringBuilder through mutable state. Move the pattern build into a static helper `ExpressionImplUtils.applyNumberFormatScale` that both eval and codegen call, mirroring the Ascii helper. This removes the duplicated block, drops the `pattern` StringBuilder (an instance field in eval and a mutable-state slot in codegen), and shrinks the generated Java for the integer-`d` case to a single call. The string-pattern path and the per-`d` caching are unchanged; behavior is preserved.
Member
Contributor
Author
|
@gengliangwang This should comply with the preceding criteria. |
gengliangwang
approved these changes
Jun 4, 2026
Member
|
@LuciferYang Thanks for working on this! |
LuciferYang
added a commit
that referenced
this pull request
Jun 4, 2026
…ic Java helper ### What changes were proposed in this pull request? This is a sub-task of [SPARK-56908](https://issues.apache.org/jira/browse/SPARK-56908) (reduce generated Java size in whole-stage codegen). For an integer `d`, `FormatNumber.doGenCode` inlined the number-pattern build (the default grouping pattern followed by `d` decimal places, applied to a `DecimalFormat`), an exact transliteration of the same block in `nullSafeEval`. The codegen also threaded a `pattern` `StringBuilder` through mutable state: ```java if (d >= 0) { pattern.delete(0, pattern.length()); if (d != lastDValue) { pattern.append("#,###,###,###,###,###,##0"); if (d > 0) { pattern.append("."); for (int i = 0; i < d; i++) { pattern.append("0"); } } lastDValue = d; numberFormat.applyLocalizedPattern(pattern.toString()); } ... } ``` This PR moves the pattern build into a static helper `ExpressionImplUtils.applyNumberFormatScale(DecimalFormat, String defaultFormat, int scale)` that both `eval` and `doGenCode` call, mirroring the `ascii` helper added in SPARK-57208. The integer-`d` codegen becomes: ```java if (d >= 0) { if (d != lastDValue) { lastDValue = d; ExpressionImplUtils.applyNumberFormatScale(numberFormat, "#,###,###,###,###,###,##0", d); } ... } ``` This removes the duplicated block, drops the `pattern` `StringBuilder` (an instance field in `eval` and a mutable-state slot in codegen), and removes a dead `pattern.delete(...)` that the string-pattern branch never read. The string-pattern path and the per-`d` caching (only rebuild when `d` changes) are unchanged. ### Why are the changes needed? To reduce the size of the generated Java in whole-stage codegen, as tracked by SPARK-56908. The integer-`d` branch shrinks to a single helper call, and the generated class drops one mutable-state field (the `pattern` `StringBuilder`). The same block is also deduplicated from `eval`. ### Does this PR introduce _any_ user-facing change? No. This is a codegen/refactor change; `eval`, `nullable`, `dataType`, and results are unchanged, so SQL output and golden files are unaffected. ### How was this patch tested? Existing tests, which exercise the changed path in both interpreted and codegen modes: - `StringExpressionsSuite` "format_number / FormatNumber" (catalyst) covers integer `d` with scale > 0, scale = 0, and scale < 0 (null) across all numeric input types, plus the string-pattern path and null cases. - `StringFunctionsSuite` "number format function" (sql/core). ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Claude Opus 4.8) Closes #56299 from LuciferYang/formatnumber-codegen-helper. Authored-by: YangJie <yangjie01@baidu.com> Signed-off-by: yangjie01 <yangjie01@baidu.com> (cherry picked from commit d172a62) Signed-off-by: yangjie01 <yangjie01@baidu.com>
Contributor
Author
|
Merged into master/branch-4.x. Thanks @gengliangwang |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This is a sub-task of SPARK-56908 (reduce generated Java size in whole-stage codegen).
For an integer
d,FormatNumber.doGenCodeinlined the number-pattern build (the default grouping pattern followed byddecimal places, applied to aDecimalFormat), an exact transliteration of the same block innullSafeEval. The codegen also threaded apatternStringBuilderthrough mutable state:This PR moves the pattern build into a static helper
ExpressionImplUtils.applyNumberFormatScale(DecimalFormat, String defaultFormat, int scale)that bothevalanddoGenCodecall, mirroring theasciihelper added in SPARK-57208. The integer-dcodegen becomes:This removes the duplicated block, drops the
patternStringBuilder(an instance field inevaland a mutable-state slot in codegen), and removes a deadpattern.delete(...)that the string-pattern branch never read. The string-pattern path and the per-dcaching (only rebuild whendchanges) are unchanged.Why are the changes needed?
To reduce the size of the generated Java in whole-stage codegen, as tracked by SPARK-56908. The integer-
dbranch shrinks to a single helper call, and the generated class drops one mutable-state field (thepatternStringBuilder). The same block is also deduplicated fromeval.Does this PR introduce any user-facing change?
No. This is a codegen/refactor change;
eval,nullable,dataType, and results are unchanged, so SQL output and golden files are unaffected.How was this patch tested?
Existing tests, which exercise the changed path in both interpreted and codegen modes:
StringExpressionsSuite"format_number / FormatNumber" (catalyst) covers integerdwith scale > 0, scale = 0, and scale < 0 (null) across all numeric input types, plus the string-pattern path and null cases.StringFunctionsSuite"number format function" (sql/core).Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)