ARROW-13879: [C++] Mixed support for binary types in regex functions #11233
@lidavidm Since you are working on ARROW-13878, which is related, please review this PR. This PR only adds variable-width binary support.
In general I don't think we should add binary support to utf8_ or ascii_ functions, as they make assumptions about input encoding that don't hold for binary values. (The user can perform a safe or unsafe cast if needed.) Also, we need to make sure we actually test with non-UTF8 values and we should make sure RE2 handles this properly.
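The encoding concern in this comment can be sketched outside of Arrow. The snippet below is only an illustration using Python's standard `re` module, not RE2 or the Arrow kernels, and the sample values are hypothetical. It shows that a checked ("safe") decode rejects non-UTF8 bytes, while a byte-oriented pattern still matches them, with `.` consuming one byte rather than one code point.

```python
import re

# Hypothetical sample values: the first is valid UTF-8, the second is not.
valid = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
invalid = b"caf\xff\xfe"              # 0xFF never occurs in well-formed UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (analogous to a checked cast)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A bytes pattern makes Python's re match byte by byte, with no decoding step.
byte_pattern = re.compile(rb"caf.")

safe_results = [is_valid_utf8(v) for v in (valid, invalid)]
byte_matches = [byte_pattern.match(v).group() for v in (valid, invalid)]

# On decoded text, '.' consumes the whole two-byte character instead.
text_match = re.match(r"caf.", valid.decode("utf-8")).group()

print(safe_results)   # validity of each sample
print(byte_matches)   # what the byte-wise pattern matched
print(text_match)     # what the text pattern matched
```

The difference between `byte_matches[0]` and `text_match` is the kind of behavior a test with non-UTF8 input would need to pin down for a regex engine operating on binary values.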
Looks good.
@pitrou If you get a chance, please take a look at this PR. It is a draft mainly because it is missing additional tests, pending review fixes, and updates to comments/docs.
Thanks, overall this looks good, I left a few small comments.
Some pretty minor nits but this looks good to me.
@github-actions autotune
It appears my suggestions made some lines too long. Can you run clang-format? I tried autotune but it didn't work for some reason.
…rts/ends_with, and split_pattern
Benchmark runs are scheduled for baseline = e7158c6 and contender = be665ef. be665ef is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This PR extends variable-width binary types support for string functions:
* find_substring[_regex]
* count_substring[_regex]
* match_substring[_regex]
* split_pattern[_regex]
* replace_substring[_regex]
* match_like
* starts/ends_with
* extract_regex

Also, updates several scalar string kernel/function registrations.

Closes apache#11233 from edponce/ARROW-13879-Mixed-support-for-binary-types-in-regex-

Authored-by: Eduardo Ponce <edponce00@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>