perf: array-free fast paths for ScalarValue::cast_to #22576

Open
alamb wants to merge 1 commit into apache:main from alamb:alamb/cast-to-fast-paths

@alamb alamb commented May 27, 2026

Which issue does this PR close?

Rationale for this change

ScalarValue::cast_to_with_options always builds a single-row array and runs the arrow cast kernel, even for trivial conversions. For two very common cases (casting a value to its own type, and converting between the string types), the array allocation and kernel dispatch are pure overhead.

What changes are included in this PR?

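Two array-free fast paths in cast_to_with_options that produce exactly the same result as the existing array + arrow-kernel path:

- Identity casts (source_type == target_type) clone the value directly.
- Conversions among the string types (Utf8/LargeUtf8/Utf8View) are value-preserving and just rewrap the string.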

Everything else still goes through the existing arrow path, so behavior is unchanged.
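As a rough illustration of the dispatch order described above, here is a minimal, self-contained sketch. The `DataType` and `ScalarValue` enums below are toy stand-ins invented for this example, not the real arrow / DataFusion definitions, and `cast_via_array` is a hypothetical placeholder for the existing array + kernel path.

```rust
// Toy stand-ins for arrow's `DataType` and DataFusion's `ScalarValue`.
// Hypothetical simplifications for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
    LargeUtf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int32(Option<i32>),
    Int64(Option<i64>),
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
}

impl ScalarValue {
    fn data_type(&self) -> DataType {
        match self {
            ScalarValue::Int32(_) => DataType::Int32,
            ScalarValue::Int64(_) => DataType::Int64,
            ScalarValue::Utf8(_) => DataType::Utf8,
            ScalarValue::LargeUtf8(_) => DataType::LargeUtf8,
            ScalarValue::Utf8View(_) => DataType::Utf8View,
        }
    }

    /// Returns the string payload if this is one of the string variants.
    fn as_string(&self) -> Option<&Option<String>> {
        match self {
            ScalarValue::Utf8(s) | ScalarValue::LargeUtf8(s) | ScalarValue::Utf8View(s) => Some(s),
            _ => None,
        }
    }

    fn cast_to(&self, target: &DataType) -> Result<ScalarValue, String> {
        // Fast path 1: identity cast, clone the value directly.
        if &self.data_type() == target {
            return Ok(self.clone());
        }
        // Fast path 2: string -> string, rewrap the same payload.
        if let Some(s) = self.as_string() {
            match target {
                DataType::Utf8 => return Ok(ScalarValue::Utf8(s.clone())),
                DataType::LargeUtf8 => return Ok(ScalarValue::LargeUtf8(s.clone())),
                DataType::Utf8View => return Ok(ScalarValue::Utf8View(s.clone())),
                _ => {}
            }
        }
        // Everything else falls through to the general (slow) path.
        self.cast_via_array(target)
    }

    /// Hypothetical placeholder for the array + cast-kernel path.
    /// Only one conversion is modelled here to keep the sketch short.
    fn cast_via_array(&self, target: &DataType) -> Result<ScalarValue, String> {
        match (self, target) {
            (ScalarValue::Int32(v), DataType::Int64) => Ok(ScalarValue::Int64(v.map(i64::from))),
            _ => Err(format!("unsupported cast {:?} -> {:?}", self.data_type(), target)),
        }
    }
}
```

The point of the ordering is that the two cheap checks run before any array is built, so the slow path is only reached when neither applies.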

Are these changes tested?

Yes, new tests are added.

Are there any user-facing changes?

No. cast_to / cast_to_with_options return the same results as before, just faster for these cases. No API changes.

Partially 🤖 Generated with Claude Code

`ScalarValue::cast_to_with_options` always built a single-row array and
ran the arrow cast kernel, even for trivial conversions. Add two
fast paths that produce exactly the same result without that allocation:

- identity casts (`source_type == target_type`) clone the value directly
- conversions among the string types (`Utf8`/`LargeUtf8`/`Utf8View`) are
  value-preserving and just rewrap the string

Test changes:
- `check_scalar_cast` now also calls `cast_to` and asserts it agrees with
  the arrow cast kernel, so `cast_round_trip` exercises the fast paths.
- Extend `cast_round_trip` with the successful-cast type combinations
  covered by `try_cast_literal_to_type` (identity, numeric widen/narrow,
  int<->decimal, decimal rescale, timestamp unit/tz, int64<->timestamp,
  string conversions, dictionary unwrap, binary->fixed-size-binary).
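The testing idea in the first bullet (assert that the fast path agrees with the reference path for every combination) could look roughly like the sketch below. All names here (`Kind`, `Scalar`, `cast_via_array`, `cast_fast`, `check_scalar_casts`) are hypothetical toy definitions, and a `Vec` stands in for a single-row arrow array; this is not the actual `check_scalar_cast` helper.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Utf8,
    LargeUtf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
struct Scalar {
    kind: Kind,
    value: Option<String>,
}

/// Reference path: round-trips the value through a one-element "array"
/// (a Vec here), mimicking the allocation the real slow path performs.
fn cast_via_array(s: &Scalar, target: Kind) -> Scalar {
    let array: Vec<Option<String>> = vec![s.value.clone()];
    // String-to-string casts preserve values, so the "kernel" is a no-op.
    let value = array.into_iter().next().unwrap();
    Scalar { kind: target, value }
}

/// Fast path: rewrap the payload without building an array.
fn cast_fast(s: &Scalar, target: Kind) -> Scalar {
    Scalar { kind: target, value: s.value.clone() }
}

/// Asserts that both paths agree for every source/target pair and
/// returns the number of combinations checked.
fn check_scalar_casts(values: &[Option<String>]) -> usize {
    const KINDS: [Kind; 3] = [Kind::Utf8, Kind::LargeUtf8, Kind::Utf8View];
    let mut checked = 0;
    for value in values {
        for &source in KINDS.iter() {
            for &target in KINDS.iter() {
                let scalar = Scalar { kind: source, value: value.clone() };
                assert_eq!(
                    cast_fast(&scalar, target),
                    cast_via_array(&scalar, target),
                    "fast path disagrees for {:?} -> {:?}",
                    source,
                    target
                );
                checked += 1;
            }
        }
    }
    checked
}
```

Including a null (`None`) value alongside a non-null one is a cheap way to make sure the shortcut handles both.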

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the common Related to common crate label May 27, 2026
@alamb alamb added the performance Make DataFusion faster label May 27, 2026
@alamb alamb marked this pull request as ready for review May 27, 2026 19:52
@alamb alamb changed the title perf: array-free fast paths for ScalarValue::cast_to perf: array-free fast paths for ScalarValue::cast_to May 27, 2026
Comment on lines +4237 to +4243

    if matches!(
        (&source_type, target_type),
        (
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
        )
    ) {
Suggested change

    - if matches!(
    -     (&source_type, target_type),
    -     (
    -         DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
    -         DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
    -     )
    - ) {
    + if source_type.is_string() && target_type.is_string() {
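The suggestion is behavior-preserving only if `is_string()` returns true for exactly `Utf8`, `LargeUtf8`, and `Utf8View`. The sketch below checks that equivalence on a toy `DataType` whose `is_string` is defined under that assumption; the enum and the helper names are hypothetical and do not come from arrow itself.

```rust
// Toy stand-in for arrow's `DataType`; only a few variants are modelled.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Binary,
    Utf8,
    LargeUtf8,
    Utf8View,
}

impl DataType {
    /// Assumption: true for exactly the three string types.
    fn is_string(&self) -> bool {
        matches!(self, DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View)
    }
}

/// The condition as originally written in the PR.
fn is_string_cast_verbose(source: &DataType, target: &DataType) -> bool {
    matches!(
        (source, target),
        (
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
            DataType::Utf8 | DataType::LargeUtf8 | DataType::Utf8View,
        )
    )
}

/// The condition as written in the suggested change.
fn is_string_cast_short(source: &DataType, target: &DataType) -> bool {
    source.is_string() && target.is_string()
}

/// Exhaustively compares both conditions over every pair of modelled types.
fn conditions_agree() -> bool {
    const ALL: [DataType; 5] = [
        DataType::Int64,
        DataType::Binary,
        DataType::Utf8,
        DataType::LargeUtf8,
        DataType::Utf8View,
    ];
    ALL.iter().all(|s| {
        ALL.iter()
            .all(|t| is_string_cast_verbose(s, t) == is_string_cast_short(s, t))
    })
}
```

If the real helper ever covered an additional type, the shorter form would silently widen the fast path, so the exhaustive comparison is the part worth keeping.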
