[C++] Add find_substring_regex kernel and implement ignore_case for find_substring #28857

Closed
asfimport opened this issue Jun 24, 2021 · 3 comments


asfimport commented Jun 24, 2021

The find_substring compute function uses the MatchSubstringOptions options class. However, when I set ignore_case to TRUE, I get the following error:

 Error: NotImplemented: find_substring with ignore_case

R code to replicate the error is below, though it depends on a currently unmerged branch:

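library(arrow)
library(dplyr)

df <- tibble(x = c("Foo and Bar", "baz and qux and quux"))

# arrow_find_substring() is only available on the unmerged branch mentioned above
df %>%
  Table$create() %>%
  mutate(x = arrow_find_substring(x, options = list(pattern = "b", ignore_case = TRUE))) %>%
  collect()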

Since case-insensitive search will be implemented using RE2, this is also an opportunity to add a find_substring_regex compute function.

Reporter: Nicola Crane / @thisisnic
Assignee: David Li / @lidavidm

Note: This issue was originally created as ARROW-13157. Please see the migration documentation for further details.


David Li / @lidavidm:
IIRC, this should be doable, but it takes some work: we would use RE2 to do the case-insensitive search, but RE2 doesn't return the match position unless you have a capture group. With a capture group, however, you can't use the 'literal' option anymore and have to escape all regex characters in the search string. RE2 has a QuoteMeta function that does this for you, so it shouldn't be that bad. I'll probably pick this up soon unless someone else wants to do it.
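
The approach described in this comment can be sketched outside of Arrow. The following is a minimal illustration in Python, with the standard library's re module standing in for RE2 and re.escape standing in for RE2's QuoteMeta. The function name find_substring_ignore_case and the -1 return value for "no match" are assumptions made for this example; this is not the kernel's actual implementation.

```python
import re


def find_substring_ignore_case(value: str, pattern: str) -> int:
    """Return the index of the first case-insensitive literal match, or -1.

    Hypothetical helper for illustration only; not part of Arrow.
    """
    # Escape regex metacharacters so the pattern is matched literally
    # (re.escape plays the role that QuoteMeta would play with RE2).
    escaped = re.escape(pattern)
    # Wrap the escaped literal in a capture group and read the match
    # position back from that group.
    regex = re.compile("(" + escaped + ")", re.IGNORECASE)
    match = regex.search(value)
    return match.start(1) if match else -1


values = ["Foo and Bar", "baz and qux and quux"]
print([find_substring_ignore_case(v, "b") for v in values])  # [8, 0]
```

Because the pattern is escaped first, a search string such as "a.b" only matches a literal dot and does not behave like a regex wildcard.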


Ian Cook / @ianmcook:
RE2 also treats everything between \Q and \E in a regex as literal text, although then you need to escape literal \E in the search string.
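
The \Q...\E alternative can be sketched as plain string manipulation. The helper below (quote_literal is a hypothetical name) wraps the search string in \Q and \E and handles an embedded \E by closing the quoted section, emitting an escaped backslash followed by E, and reopening the quote. This is the same trick Java's Pattern.quote uses; whether RE2 accepts the result unchanged is an assumption that is not verified here.

```python
def quote_literal(text: str) -> str:
    r"""Wrap text in \Q...\E so a regex engine treats it as a literal.

    Hypothetical helper for illustration only. An embedded \E would end
    the quoted section early, so each one is replaced by:
      \E   (close the quote)
      \\E  (an escaped backslash followed by a plain E)
      \Q   (reopen the quote)
    """
    return r"\Q" + text.replace(r"\E", r"\E\\E\Q") + r"\E"


print(quote_literal("a.b"))
print(quote_literal(r"x\Ey"))
```

Compared with escaping every metacharacter, this keeps the search string readable in the final regex, at the cost of special handling for the single sequence \E.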


Antoine Pitrou / @pitrou:
Issue resolved by pull request #10597.
