Add nilike support in comparison #1846
Codecov Report
@@ Coverage Diff @@
## master #1846 +/- ##
==========================================
+ Coverage 83.49% 83.50% +0.01%
==========================================
Files 201 201
Lines 56903 56951 +48
==========================================
+ Hits 47511 47559 +48
Misses 9392 9392
49b5eae to 45b524e
I think this is a good step forward, although I left some questions.
result.append(
    !left
        .value(i)
        .to_uppercase()
Given this allocates a new string, I wonder how much faster this fast path actually is? Do you have some benchmark results?
I have simply re-used this code from the already existing `ilike` comparator function, so I have no idea regarding that. I agree, it does make sense to do some benchmarks. Although I feel `ends_with` and `starts_with` sound simple enough to be much faster than a regexp match implicitly.
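For readers following the thread, the fast path under discussion can be sketched roughly as below. This is a minimal illustration using only the standard library, not the code from the PR: the helper name `nilike_fast_path` is hypothetical, escape sequences are deliberately excluded from the fast path, and the general wildcard case (which the kernel handles with a regex) is left as `None`.

```rust
/// Hypothetical helper (not part of arrow-rs): evaluates `value NOT ILIKE pattern`
/// when the pattern is simple enough to avoid a regex, and returns `None` otherwise.
fn nilike_fast_path(value: &str, pattern: &str) -> Option<bool> {
    // Anything containing a wildcard or an escape character is "not simple".
    let is_complex = |s: &str| s.contains('%') || s.contains('_') || s.contains('\\');

    // Note: this allocates a new String per value, which is the cost the
    // reviewer asks about above.
    let upper_value = value.to_uppercase();

    // Pattern of the form "prefix%": a case-insensitive starts_with check.
    if let Some(prefix) = pattern.strip_suffix('%') {
        if !is_complex(prefix) {
            return Some(!upper_value.starts_with(&prefix.to_uppercase()));
        }
    }
    // Pattern of the form "%suffix": a case-insensitive ends_with check.
    if let Some(suffix) = pattern.strip_prefix('%') {
        if !is_complex(suffix) {
            return Some(!upper_value.ends_with(&suffix.to_uppercase()));
        }
    }
    // Pattern without wildcards: a plain case-insensitive equality check.
    if !is_complex(pattern) {
        return Some(upper_value != pattern.to_uppercase());
    }
    // General case: the caller would fall back to a regex match here.
    None
}
```

Whether this actually beats the regex path depends on how the per-value allocation compares with the matching cost, which is what a benchmark would need to establish.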
@@ -3984,6 +4067,60 @@ mod tests {
        vec![false, true, false, false]
    );

    test_utf8!(
Maybe a test of non-ASCII characters, e.g. Â â or something
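As an illustration of why such a test is worthwhile, the standalone sketch below (standard library only; the helper `eq_ignore_case_unicode` is a hypothetical name, not something from the PR) shows that uppercasing with `to_uppercase` treats Â and â as equal, while an ASCII-only comparison does not.

```rust
/// Hypothetical helper: case-insensitive equality via Unicode uppercasing,
/// mirroring the `to_uppercase` approach used in the diff above.
fn eq_ignore_case_unicode(a: &str, b: &str) -> bool {
    a.to_uppercase() == b.to_uppercase()
}

/// ASCII-only comparison, shown for contrast: it only folds A-Z / a-z.
fn eq_ignore_case_ascii(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}
```

A non-ASCII test case guards against a future change that swaps the Unicode-aware conversion for an ASCII-only one.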
/// [`LargeStringArray`] and a scalar.
///
/// See the documentation on [`like_utf8`] for more details.
pub fn nilike_utf8_scalar<OffsetSize: OffsetSizeTrait>(
It occurs to me that we now have four methods `ilike_utf8_scalar`, `nlike_utf8_scalar`, `like_utf8_scalar`, `nilike_utf8_scalar` where the only difference appears to be an optional post-conversion on `left.value(i)`. Perhaps we could refactor out the common logic into a function taking `F: Fn(&str) -> Cow<'_, str>` or something?
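One possible shape for that refactor is sketched below. It is an assumption-laden illustration, not the eventual arrow-rs code: it works on a plain `&[&str]` instead of a string array, matches only a literal prefix instead of a full LIKE pattern, and every function name in it is hypothetical.

```rust
use std::borrow::Cow;

/// Hypothetical shared kernel: normalizes each value, applies the predicate,
/// and optionally negates the result.
fn compare_scalar_sketch<F, P>(values: &[&str], normalize: F, predicate: P, negate: bool) -> Vec<bool>
where
    F: Fn(&str) -> Cow<'_, str>,
    P: Fn(&str) -> bool,
{
    values
        .iter()
        .map(|&v| predicate(&*normalize(v)) != negate)
        .collect()
}

/// Case-sensitive variants borrow the value unchanged (no allocation).
fn identity(s: &str) -> Cow<'_, str> {
    Cow::Borrowed(s)
}

/// Case-insensitive variants uppercase the value (allocates).
fn upper(s: &str) -> Cow<'_, str> {
    Cow::Owned(s.to_uppercase())
}

// The four public functions reduce to thin wrappers over the shared kernel.
fn like_prefix(values: &[&str], prefix: &str) -> Vec<bool> {
    compare_scalar_sketch(values, identity, |v| v.starts_with(prefix), false)
}

fn nlike_prefix(values: &[&str], prefix: &str) -> Vec<bool> {
    compare_scalar_sketch(values, identity, |v| v.starts_with(prefix), true)
}

fn ilike_prefix(values: &[&str], prefix: &str) -> Vec<bool> {
    let prefix = prefix.to_uppercase();
    compare_scalar_sketch(values, upper, |v| v.starts_with(prefix.as_str()), false)
}

fn nilike_prefix(values: &[&str], prefix: &str) -> Vec<bool> {
    let prefix = prefix.to_uppercase();
    compare_scalar_sketch(values, upper, |v| v.starts_with(prefix.as_str()), true)
}
```

Returning `Cow` lets the case-sensitive variants avoid allocating while the case-insensitive ones still own their uppercased copy.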
I had this in mind as well but decided to open the PR nevertheless considering there are already three similar functions (which means whoever accepted them in was okay with that). I'll refactor those and see how that goes.
Would you like me to get this PR in, and we can maybe follow up on the feedback in another PR? I think this is a good step forward as is. Edit: merging master should fix the MIRI failures.
I am up for making the changes you suggested, although if you think it's better to get this in sooner and improve on it later, I can rebase now.
Merging this one in to keep the queue down -- Thank you @MazterQyou and @tustvold. I would love to see a refactor PR that reduces the code duplication.
Can drop this after rebase on commit 9860aa7 "Add nilike support in comparison (apache#1846)", first released in 17.0.0
Which issue does this PR close?
Closes #1845.
Rationale for this change
The `comparison` kernel has the functions `like` and `nlike`, which perform `LIKE` and `NOT LIKE` operations. `ilike` was also added back in late 2021, which makes it possible to perform `ILIKE` (case-insensitive `LIKE`) operations. However, `nilike`, which is to `ilike` what `nlike` is to `like` (`NOT ILIKE`), has been missing. It is a useful addition that provides the `NOT` counterpart of the already supported `ILIKE`.
What changes are included in this PR?
This PR adds `nilike` functions to the `comparison` kernel that perform the SQL `left NOT ILIKE right` operation on arrays and scalars, as well as related tests and benches.
Are there any user-facing changes?