perf: avoid redundant string allocation in LIKE prefix pruning (follow-up to #22562) #22574

Closed
alamb wants to merge 3 commits into apache:main from alamb:alamb/cast-literal-owned

@alamb alamb commented May 27, 2026

Which issue does this PR close?

Rationale for this change

#22562 fixes LIKE 'prefix%' pruning on Utf8View/LargeUtf8 columns by casting the synthesized bound literal to the column type via try_cast_literal_to_type. As @alamb noted, that path re-allocates the prefix string: string_literal_as wraps an already-owned String in ScalarValue::Utf8, and the borrowing try_cast_literal_to_type copies it again through to_string().

This PR lets the string be moved into the casted value instead of copied.
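To make the difference between copying and moving concrete, here is a minimal sketch. It uses a toy `Scalar` enum and hypothetical `cast_borrowed` / `cast_owned` helpers rather than DataFusion's real `ScalarValue` and `try_cast_literal_to_type`, so treat it as an illustration of the idea, not the PR's code.

```rust
// Toy stand-in for DataFusion's ScalarValue (an assumption for illustration).
#[derive(Debug, PartialEq)]
pub enum Scalar {
    Utf8(Option<String>),
    Utf8View(Option<String>),
}

/// Borrowing cast: it can only read the string, so it has to allocate a copy.
pub fn cast_borrowed(value: &Scalar) -> Scalar {
    match value {
        Scalar::Utf8(s) | Scalar::Utf8View(s) => {
            Scalar::Utf8View(s.as_deref().map(|v| v.to_string()))
        }
    }
}

/// Owning cast: it takes the value by move, so the heap buffer is reused.
pub fn cast_owned(value: Scalar) -> Scalar {
    match value {
        Scalar::Utf8(s) | Scalar::Utf8View(s) => Scalar::Utf8View(s),
    }
}

/// Address of the string's heap buffer, used to observe copy vs. move.
pub fn buffer_ptr(value: &Scalar) -> Option<*const u8> {
    match value {
        Scalar::Utf8(s) | Scalar::Utf8View(s) => s.as_ref().map(|v| v.as_ptr()),
    }
}
```

Comparing `buffer_ptr` before and after each cast shows that the borrowed path yields a new buffer, while the owned path keeps the original one.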

What changes are included in this PR?

  • try_cast_literal_to_type now accepts either an owned ScalarValue or a &ScalarValue. When an owned value is passed, string-to-string casts move the underlying String rather than re-allocating it.
    • This uses a small IntoScalarCow trait rather than impl Into<Cow<'_, ScalarValue>>, because std has no blanket From<&T> for Cow<'_, T> and ScalarValue lives in another crate (orphan rule), so &ScalarValue can't convert into a Cow directly.
    • All existing &value call sites keep working unchanged.
  • string_literal_as (LIKE prefix pruning) passes the owned prefix through and returns None for unsupported target types, so pruning is skipped cleanly instead of emitting a literal whose type does not match the column statistics.
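The changes listed above could look roughly like the sketch below. Only the `IntoScalarCow` name comes from this PR; the trait's exact shape, the `ToyScalar` and `ToyType` enums, and the `cast_literal_sketch` and `string_literal_as_sketch` functions are assumptions made for illustration and are not DataFusion's actual definitions.

```rust
use std::borrow::Cow;

// Toy stand-ins for ScalarValue and DataType (assumptions, not the real types).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyScalar {
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32, // stands in for any target the string cast does not support
}

// Guessed shape of the trait: convert owned or borrowed input into a Cow.
pub trait IntoScalarCow<'a> {
    fn into_scalar_cow(self) -> Cow<'a, ToyScalar>;
}

impl<'a> IntoScalarCow<'a> for ToyScalar {
    fn into_scalar_cow(self) -> Cow<'a, ToyScalar> {
        Cow::Owned(self)
    }
}

impl<'a> IntoScalarCow<'a> for &'a ToyScalar {
    fn into_scalar_cow(self) -> Cow<'a, ToyScalar> {
        Cow::Borrowed(self)
    }
}

/// Simplified string-to-string cast accepting owned or borrowed input.
pub fn cast_literal_sketch<'a>(
    value: impl IntoScalarCow<'a>,
    target: &ToyType,
) -> Option<ToyScalar> {
    // into_owned() moves an owned value and clones only a borrowed one.
    let s = match value.into_scalar_cow().into_owned() {
        ToyScalar::Utf8(s) | ToyScalar::LargeUtf8(s) | ToyScalar::Utf8View(s) => s,
    };
    match target {
        ToyType::Utf8 => Some(ToyScalar::Utf8(s)),
        ToyType::LargeUtf8 => Some(ToyScalar::LargeUtf8(s)),
        ToyType::Utf8View => Some(ToyScalar::Utf8View(s)),
        ToyType::Int32 => None, // unsupported target: the caller skips pruning
    }
}

/// Passes the owned prefix through; None means "do not prune".
pub fn string_literal_as_sketch(prefix: String, target: &ToyType) -> Option<ToyScalar> {
    cast_literal_sketch(ToyScalar::Utf8(Some(prefix)), target)
}
```

In this sketch a call such as `cast_literal_sketch(&value, &ToyType::Utf8View)` still compiles and clones, which mirrors the claim that existing `&value` call sites keep working, while passing `value` by move reuses its string buffer.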

Cast behavior is otherwise identical — only the redundant string copy is removed.

Note

This is stacked on #22562 (its commits are included here). It will need a rebase onto main once #22562 merges, and will conflict in string_literal_as — that's expected and will be resolved then.

Are these changes tested?

Yes — try_cast_literal_to_type gets a new owned-input unit test; existing cast tests and the prune_like_prefix row-group pruning test continue to pass.

Are there any user-facing changes?

try_cast_literal_to_type's signature changes from (&ScalarValue, &DataType) to (impl IntoScalarCow, &DataType), and a new IntoScalarCow trait is added in datafusion-expr-common. Existing callers passing &ScalarValue are source-compatible. No behavior changes.

🤖 Generated with Claude Code

@github-actions github-actions Bot added logical-expr Logical plan and expressions core Core DataFusion crate labels May 27, 2026
Let `try_cast_literal_to_type` accept either an owned `ScalarValue` or a
`&ScalarValue` (via a small `IntoScalarCow` trait), so a caller that
already owns a string can have it *moved* into the casted value instead
of re-allocated. Previously the LIKE prefix pruning helper wrapped an
owned `String` in `ScalarValue::Utf8` and called the borrowing cast,
which copied the string again via `to_string()`.

`string_literal_as` now passes the owned value through and returns `None`
for unsupported target types (so pruning is skipped cleanly rather than
emitting a literal whose type does not match the column statistics).

All existing `&value` call sites keep working unchanged. Behavior of the
cast is otherwise identical (string-to-string is just moved when owned).

Follow-up to apache#22562 (per review comment).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alamb alamb force-pushed the alamb/cast-literal-owned branch from 99ea9eb to 0e03991 Compare May 27, 2026 17:28

alamb commented May 27, 2026

After putzing with this for a while, I think the first thing I need to do is consolidate the cast implementations. Will do that first and then optimize.
