-
Notifications
You must be signed in to change notification settings - Fork 791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Datum based like kernels (#4595) #4732
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrow-string/src/predicate.rs
Outdated
/// A string based predicate | ||
pub enum Predicate<'a> { | ||
Eq(&'a str), | ||
IEqAscii(&'a str), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it is obvious to others that IEqAscii
means "case insensitive ascii equal" but it might help to explicitly explain what each enum here is
|
||
/// Evaluate this predicate against the given haystack | ||
pub fn evaluate(&self, haystack: &str) -> bool { | ||
match self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty neat to mostly use the built in Rust code 👍
dict_function!("ENDSWITH(left, right)", ends_with_dict, ends_with); | ||
dict_function!("CONTAINS(left, right)", contains_dict, contains); | ||
|
||
/// Perform SQL `left LIKE right` operation on [`StringArray`] / [`LargeStringArray`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation and example seems valuable -- could you please port it over to Predicate::like
?
/// | ||
/// Case insensitive version of [`like_utf8`] | ||
/// | ||
/// Note: this only implements loose matching as defined by the Unicode standard. For example, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this detail, likewise would be nice to transfer over
None => r.value(0), | ||
}; | ||
op_scalar(op, l, l_v, scalar) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a nice formulation ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, it took much bashing my head against my keyboard 😆
The performance is likely not spectacular, but I'm also not sure how important it is to optimise the performance of non-scalar like kernels
@@ -936,34 +567,6 @@ mod tests { | |||
vec![true] | |||
); | |||
|
|||
#[test] | |||
fn test_replace_like_wildcards() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these tests should be carried over as tests for fn regex_like(pattern: &str, case_insensitive: bool) -> Result<Regex, ArrowError> {
It is the last of the dyn binary kernels. There are a few kernels, e.g. regex and some boolean kernels, that never had dyn nor scalar variants, but this is the last of the dyn scalar kernels AFAIK |
Do you plan to add such |
I personally don't, but we could probably create tickets and see if someone in the community wishes to pick this up |
I think creating tickets would be a good idea (though of course, I love tickets) |
Which issue does this PR close?
Closes #4595
Closes #2837
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?