ARROW-14389: [C++][Gandiva] Fix performance bug with LIKE expressions #11471

jvictorhuguenin
Contributor

@jvictorhuguenin jvictorhuguenin commented Oct 20, 2021

Of patterns like %abc% and %ab-c%, only the former was being optimized into an is_substr expression. The latter was not, because the regex used to identify those cases did not match it.
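
As a rough illustration of the case described above (not the code in like_holder.cc), a pattern of the form %literal% can be reduced to a substring check whenever the literal contains no LIKE wildcards, whatever other punctuation it contains, such as `-`. The helper name `SubstrLiteral` is hypothetical, and the sketch assumes no escape character is in use.

```cpp
#include <optional>
#include <string>

// Hypothetical helper, not part of Gandiva. Returns the text between the
// leading and trailing '%' when the LIKE pattern has the form %literal% and
// the literal contains no LIKE wildcards ('%' or '_').
// Assumption: no escape character is configured for the pattern.
std::optional<std::string> SubstrLiteral(const std::string& pattern) {
  // Need at least one character between the two '%' signs.
  if (pattern.size() < 3 || pattern.front() != '%' || pattern.back() != '%') {
    return std::nullopt;
  }
  std::string literal = pattern.substr(1, pattern.size() - 2);
  // Any remaining wildcard means a plain substring search is not equivalent.
  if (literal.find_first_of("%_") != std::string::npos) {
    return std::nullopt;
  }
  return literal;
}
```

With this check, both `%abc%` and `%ab-c%` yield a literal, while `%a_c%` does not and would still need the general LIKE path.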

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@jvictorhuguenin jvictorhuguenin changed the title ARROW-14389: [C++][Gandiva] Fix performance bug with LIKE expressions with reserved characters ARROW-14389: [C++][Gandiva][WIP] Fix performance bug with LIKE expressions with reserved characters Oct 20, 2021
@jvictorhuguenin jvictorhuguenin changed the title ARROW-14389: [C++][Gandiva][WIP] Fix performance bug with LIKE expressions with reserved characters ARROW-14389: [C++][Gandiva] WIP Fix performance bug with LIKE expressions with reserved characters Oct 20, 2021
@jvictorhuguenin
Contributor Author

I was able to fix the regex for is_substr matching, but I'm still trying to figure out the right regex for both starts_with and ends_with matching so that they can also accept reserved characters.
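
For the starts_with and ends_with cases mentioned here, one way to picture the classification is sketched below. It deliberately avoids a regex, so it is not the approach taken in this PR. The names `LikeKind`, `SimpleLike`, and `ClassifyPrefixSuffix` are hypothetical, and the sketch again assumes no escape character.

```cpp
#include <optional>
#include <string>

// Hypothetical types and helper, not part of Gandiva.
enum class LikeKind { kStartsWith, kEndsWith };

struct SimpleLike {
  LikeKind kind;
  std::string literal;
};

// Classifies "literal%" as starts_with and "%literal" as ends_with, provided
// the literal contains no LIKE wildcards ('%' or '_').
// Assumption: no escape character is configured for the pattern.
std::optional<SimpleLike> ClassifyPrefixSuffix(const std::string& pattern) {
  if (pattern.size() < 2) return std::nullopt;
  const bool leading = pattern.front() == '%';
  const bool trailing = pattern.back() == '%';
  // Exactly one end must carry the '%' wildcard.
  if (leading == trailing) return std::nullopt;
  std::string literal =
      leading ? pattern.substr(1) : pattern.substr(0, pattern.size() - 1);
  if (literal.find_first_of("%_") != std::string::npos) return std::nullopt;
  return SimpleLike{leading ? LikeKind::kEndsWith : LikeKind::kStartsWith,
                    literal};
}
```

Reserved characters such as `-` or `.` pass through as part of the literal, since only `%` and `_` are treated specially.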

@vvellanki
Contributor

There are some failing tests. Did you get a chance to look at them?

@vvellanki vvellanki left a comment

Can you also measure the performance of the modified implementation?

cpp/src/gandiva/like_holder.cc (outdated, resolved)
cpp/src/gandiva/like_holder.cc (outdated, resolved)
@jvictorhuguenin jvictorhuguenin changed the title ARROW-14389: [C++][Gandiva] WIP Fix performance bug with LIKE expressions with reserved characters ARROW-14389: [C++][Gandiva] Fix performance bug with LIKE expressions Nov 16, 2021
@vvellanki
Contributor

@pravindra Can you review this change?

cpp/src/gandiva/like_holder.cc (outdated, resolved)
cpp/src/gandiva/like_holder.cc (resolved)
@jvictorhuguenin jvictorhuguenin force-pushed the feature/fix-performance-like-expr-bug branch 2 times, most recently from ac4888b to 75f277f Compare December 2, 2021 23:39
@jvictorhuguenin jvictorhuguenin force-pushed the feature/fix-performance-like-expr-bug branch from 75f277f to a653fd9 Compare December 7, 2021 15:40
@jvictorhuguenin jvictorhuguenin force-pushed the feature/fix-performance-like-expr-bug branch 2 times, most recently from 936bddf to 97e8cea Compare December 15, 2021 13:22
@jvictorhuguenin jvictorhuguenin force-pushed the feature/fix-performance-like-expr-bug branch from 97e8cea to b27b6cb Compare December 21, 2021 14:51
@ViniciusSouzaRoque ViniciusSouzaRoque force-pushed the feature/fix-performance-like-expr-bug branch from b27b6cb to bc72d4d Compare December 21, 2021 19:08
@jvictorhuguenin jvictorhuguenin force-pushed the feature/fix-performance-like-expr-bug branch 5 times, most recently from 9cab935 to d39c27e Compare January 13, 2022 13:08
@jvictorhuguenin jvictorhuguenin force-pushed the feature/fix-performance-like-expr-bug branch 2 times, most recently from 29f68f6 to a6987e7 Compare January 20, 2022 14:53
@jvictorhuguenin jvictorhuguenin force-pushed the feature/fix-performance-like-expr-bug branch from a6987e7 to 38980ca Compare January 24, 2022 12:02
@pravindra pravindra closed this in bf0ee3f Feb 1, 2022
@ursabot
ursabot commented Feb 1, 2022

Benchmark runs are scheduled for baseline = adfb913 and contender = bf0ee3f. bf0ee3f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.36% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.22% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request Feb 4, 2022
For patterns like %abc% and %ab-c%, the latter wasn't being optimized to become an is_substr expression because of the regex used to identify those cases.

Closes apache#11471 from jvictorhuguenin/feature/fix-performance-like-expr-bug

Authored-by: jvictorhuguenin <j.victorhuguenin2018@gmail.com>
Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>