perf: array-free fast paths for ScalarValue::cast_to #22576
Open
alamb wants to merge 1 commit into
`ScalarValue::cast_to_with_options` always built a single-row array and ran the arrow cast kernel, even for trivial conversions. Add two fast paths that produce exactly the same result without that allocation:

- identity casts (`source_type == target_type`) clone the value directly
- conversions among the string types (`Utf8`/`LargeUtf8`/`Utf8View`) are value-preserving and just rewrap the string

Test changes:

- `check_scalar_cast` now also calls `cast_to` and asserts it agrees with the arrow cast kernel, so `cast_round_trip` exercises the fast paths.
- Extend `cast_round_trip` with the successful-cast type combinations covered by `try_cast_literal_to_type` (identity, numeric widen/narrow, int<->decimal, decimal rescale, timestamp unit/tz, int64<->timestamp, string conversions, dictionary unwrap, binary->fixed-size-binary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Jefffrey approved these changes May 28, 2026
Comment on lines +4237 to +4243

    if matches!(
        (&source_type, target_type),
        (
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
        )
    ) {
Contributor
Suggested change

    - if matches!(
    -     (&source_type, target_type),
    -     (
    -         DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
    -         DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
    -     )
    - ) {
    + if source_type.is_string() && target_type.is_string() {
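The suggestion is only a drop-in replacement if `is_string()` returns true for exactly `Utf8`, `LargeUtf8`, and `Utf8View` and for nothing else. The sketch below checks that equivalence on a toy `DataType` enum. Everything in it is hypothetical and written for illustration: the enum is not arrow's `DataType`, the `is_string` body is an assumption about the real method's semantics, and `both_strings_matches`, `both_strings_helper`, and `forms_agree` are invented names.

```rust
// Toy stand-in for arrow's DataType (hypothetical; only a few variants).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int32,
    Binary,
    Utf8,
    LargeUtf8,
    Utf8View,
}

impl DataType {
    // ASSUMPTION: the real helper covers exactly these three variants.
    pub fn is_string(&self) -> bool {
        matches!(self, DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View)
    }
}

// The condition as written in the PR: one pattern over the (source, target) pair.
pub fn both_strings_matches(source: &DataType, target: &DataType) -> bool {
    matches!(
        (source, target),
        (
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
        )
    )
}

// The condition as written in the review suggestion.
pub fn both_strings_helper(source: &DataType, target: &DataType) -> bool {
    source.is_string() && target.is_string()
}

pub const ALL_TYPES: [DataType; 5] = [
    DataType::Int32,
    DataType::Binary,
    DataType::Utf8,
    DataType::LargeUtf8,
    DataType::Utf8View,
];

// Exhaustively compare both forms over every (source, target) pair of the toy enum.
pub fn forms_agree() -> bool {
    ALL_TYPES.iter().all(|s| {
        ALL_TYPES
            .iter()
            .all(|t| both_strings_matches(s, t) == both_strings_helper(s, t))
    })
}
```

If the real `is_string()` ever accepted an additional type, the fast path would start handling a conversion the original pattern excluded, so the helper's definition is worth confirming before applying the suggestion.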
Which issue does this PR close?
Rationale for this change
`ScalarValue::cast_to_with_options` always builds a single-row array and runs the arrow cast kernel, even for trivial conversions. For two very common cases (casting a value to its own type, and converting between the string types) that array allocation and kernel dispatch are pure overhead.
What changes are included in this PR?
Two array-free fast paths in `cast_to_with_options` that produce exactly the same result as the existing array + arrow-kernel path:
- identity casts (`source_type == target_type`) clone the value directly
- conversions among the string types (`Utf8`/`LargeUtf8`/`Utf8View`) are value-preserving and just rewrap the string
Everything else still goes through the existing arrow path, so behavior is unchanged.
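A minimal sketch of how the two fast paths can be structured, assuming a toy model rather than the real types: the `DataType` and `Scalar` enums below are simplified, hypothetical stand-ins for arrow's `DataType` and DataFusion's `ScalarValue`, and `cast_fast_path` is an invented name. Returning `None` stands for "fall back to the existing array + kernel path", which is not modeled here.

```rust
// Hypothetical, simplified stand-ins; not the real arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Utf8,
    LargeUtf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Scalar {
    Int32(Option<i32>),
    Int64(Option<i64>),
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
}

impl Scalar {
    pub fn data_type(&self) -> DataType {
        match self {
            Scalar::Int32(_) => DataType::Int32,
            Scalar::Int64(_) => DataType::Int64,
            Scalar::Utf8(_) => DataType::Utf8,
            Scalar::LargeUtf8(_) => DataType::LargeUtf8,
            Scalar::Utf8View(_) => DataType::Utf8View,
        }
    }

    /// Returns `Some(result)` when a fast path applies, or `None` when the
    /// caller should fall back to the general (array-based) cast.
    pub fn cast_fast_path(&self, target: &DataType) -> Option<Scalar> {
        // Fast path 1: identity cast, just clone the value.
        if &self.data_type() == target {
            return Some(self.clone());
        }

        // Fast path 2: string -> string keeps the value (including NULL)
        // and only changes the wrapping variant.
        let s = match self {
            Scalar::Utf8(s) | Scalar::LargeUtf8(s) | Scalar::Utf8View(s) => s,
            _ => return None,
        };
        match target {
            DataType::Utf8 => Some(Scalar::Utf8(s.clone())),
            DataType::LargeUtf8 => Some(Scalar::LargeUtf8(s.clone())),
            DataType::Utf8View => Some(Scalar::Utf8View(s.clone())),
            _ => None,
        }
    }
}
```

In this model, `Scalar::Utf8(Some("a".to_string())).cast_fast_path(&DataType::Utf8View)` rewraps the string, while an `Int32` to `Int64` cast returns `None` and would go through the general path.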
Are these changes tested?
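Yes. `check_scalar_cast` now also calls `cast_to` and asserts that it agrees with the arrow cast kernel, and `cast_round_trip` is extended with more successful-cast type combinations.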
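The testing idea in the commit message, asserting that `cast_to` agrees with the arrow cast kernel, is a differential check: run the fast path and the reference path on the same input and compare. The sketch below shows that pattern on a toy model. All names (`StrType`, `StrScalar`, `cast_via_array`, `cast_fast`, `fast_path_agrees`) are hypothetical, and the one-element `Vec` only imitates the single-row array; it is not the arrow kernel.

```rust
// Hypothetical toy model of the three string types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StrScalar {
    pub ty: StrType,
    pub value: Option<String>,
}

/// Reference path: materialize a one-row "column", convert it, read row 0 back.
/// This stands in for building a single-row array and running a cast kernel.
pub fn cast_via_array(s: &StrScalar, target: StrType) -> StrScalar {
    let column: Vec<Option<String>> = vec![s.value.clone()];
    // String-to-string conversion leaves the values themselves unchanged.
    let converted: Vec<Option<String>> = column.into_iter().collect();
    StrScalar {
        ty: target,
        value: converted.into_iter().next().flatten(),
    }
}

/// Fast path: rewrap the value under the target type, no column allocated.
pub fn cast_fast(s: &StrScalar, target: StrType) -> StrScalar {
    StrScalar {
        ty: target,
        value: s.value.clone(),
    }
}

/// Differential check: both paths must produce the same scalar.
pub fn fast_path_agrees(s: &StrScalar, target: StrType) -> bool {
    cast_fast(s, target) == cast_via_array(s, target)
}
```

Running the check over every source/target pair, with both a non-null and a null value, is the toy equivalent of extending a round-trip test with more type combinations.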
Are there any user-facing changes?
No.
`cast_to`/`cast_to_with_options` return the same results as before, just faster for these cases. No API changes.
Partially 🤖 Generated with Claude Code