Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datum based like kernels (#4595) #4732

Merged
merged 5 commits into from
Aug 25, 2023
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Aug 24, 2023

Which issue does this PR close?

Closes #4595
Closes #2837

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels Aug 24, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice @tustvold -- thank you ❤️

I think there is some documentation / examples as well as a few tests that got lost that I think we should restore, but otherwise this looks 👨‍🍳 👌

Closes #2837

🎉 is this really the last set of kernels that need this treatment?

/// A string based predicate
pub enum Predicate<'a> {
Eq(&'a str),
IEqAscii(&'a str),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is obvious to others that IEqAscii means "case insensitive ascii equal" but it might help to explicitly explain what each enum here is


/// Evaluate this predicate against the given haystack
pub fn evaluate(&self, haystack: &str) -> bool {
match self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty neat to mostly use the built in Rust code 👍

dict_function!("ENDSWITH(left, right)", ends_with_dict, ends_with);
dict_function!("CONTAINS(left, right)", contains_dict, contains);

/// Perform SQL `left LIKE right` operation on [`StringArray`] / [`LargeStringArray`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This documentation and example seems valuable -- could you please port it over to Predicate::like?

///
/// Case insensitive version of [`like_utf8`]
///
/// Note: this only implements loose matching as defined by the Unicode standard. For example,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this detail, likewise would be nice to transfer over

None => r.value(0),
};
op_scalar(op, l, l_v, scalar)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a nice formulation ❤️

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, it took much bashing my head against my keyboard 😆

The performance is likely not spectacular, but I'm also not sure how important it is to optimise the performance of non-scalar like kernels

@@ -936,34 +567,6 @@ mod tests {
vec![true]
);

#[test]
fn test_replace_like_wildcards() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these tests should be carried over as tests for fn regex_like(pattern: &str, case_insensitive: bool) -> Result<Regex, ArrowError> {

@tustvold
Copy link
Contributor Author

🎉 is this really the last set of kernels that need this treatment?

It is the last of the dyn binary kernels. There are a few kernels, e.g. regex and some boolean kernels, that never had dyn nor scalar variants, but this is the last of the dyn scalar kernels AFAIK

@alamb
Copy link
Contributor

alamb commented Aug 25, 2023

It is the last of the dyn binary kernels. There are a few kernels, e.g. regex and some boolean kernels, that never had dyn nor scalar variants, but this is the last of the dyn scalar kernels AFAIK

Do you plan to add such Datum for consistency?

@tustvold
Copy link
Contributor Author

Do you plan to add such Datum for consistency?

I personally don't, but we could probably create tickets and see if someone in the community wishes to pick this up

@alamb
Copy link
Contributor

alamb commented Aug 25, 2023

I personally don't, but we could probably create tickets and see if someone in the community wishes to pick this up

I think creating tickets would be a good idea (though of course, I love tickets)

@tustvold tustvold merged commit 221f5d2 into apache:master Aug 25, 2023
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datum Based String Kernels Inconsistent Dyn Scalar Kernels
2 participants