perf: avoid redundant string allocation in LIKE prefix pruning (follow-up to #22562)#22574
Closed
alamb wants to merge 3 commits into
Closed
perf: avoid redundant string allocation in LIKE prefix pruning (follow-up to #22562)#22574alamb wants to merge 3 commits into
alamb wants to merge 3 commits into
Conversation
Let `try_cast_literal_to_type` accept either an owned `ScalarValue` or a `&ScalarValue` (via a small `IntoScalarCow` trait), so a caller that already owns a string can have it *moved* into the casted value instead of re-allocated. Previously the LIKE prefix pruning helper wrapped an owned `String` in `ScalarValue::Utf8` and called the borrowing cast, which copied the string again via `to_string()`. `string_literal_as` now passes the owned value through and returns `None` for unsupported target types (so pruning is skipped cleanly rather than emitting a literal whose type does not match the column statistics). All existing `&value` call sites keep working unchanged. Behavior of the cast is otherwise identical (string-to-string is just moved when owned). Follow-up to apache#22562 (per review comment). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
99ea9eb to
0e03991
Compare
Contributor
Author
|
After putzing with this a while. I think the first thing I need to do is to consolidate the cast implemenations. Will do that first and then optii´ |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
#22562 fixes
LIKE 'prefix%'pruning onUtf8View/LargeUtf8columns by casting the synthesized bound literal to the column type viatry_cast_literal_to_type. As @alamb noted, that path re-allocates the prefix string:string_literal_aswraps an already-ownedStringinScalarValue::Utf8, and the borrowingtry_cast_literal_to_typecopies it again throughto_string().This PR lets the string be moved into the casted value instead of copied.
What changes are included in this PR?
try_cast_literal_to_typenow accepts either an ownedScalarValueor a&ScalarValue. When an owned value is passed, string-to-string casts move the underlyingStringrather than re-allocating it.IntoScalarCowtrait rather thanimpl Into<Cow<'_, ScalarValue>>, becausestdhas no blanketFrom<&T> for Cow<'_, T>andScalarValuelives in another crate (orphan rule), so&ScalarValuecan't convert into aCowdirectly.&valuecall sites keep working unchanged.string_literal_as(LIKE prefix pruning) passes the owned prefix through and returnsNonefor unsupported target types, so pruning is skipped cleanly instead of emitting a literal whose type does not match the column statistics.Cast behavior is otherwise identical — only the redundant string copy is removed.
Note
This is stacked on #22562 (its commits are included here). It will need a rebase onto
mainonce #22562 merges, and will conflict instring_literal_as— that's expected and will be resolved then.Are these changes tested?
Yes —
try_cast_literal_to_typegets a new owned-input unit test; existing cast tests and theprune_like_prefixrow-group pruning test continue to pass.Are there any user-facing changes?
try_cast_literal_to_type's signature changes from(&ScalarValue, &DataType)to(impl IntoScalarCow, &DataType), and a newIntoScalarCowtrait is added indatafusion-expr-common. Existing callers passing&ScalarValueare source-compatible. No behavior changes.🤖 Generated with Claude Code