fix(compute): LIKE/ILIKE/NLIKE escape parenthesis #1042

Merged
merged 1 commit into from
Dec 20, 2021

@ovr ovr commented Dec 13, 2021

Hello!

Which issue does this PR close?

I found that DF behaves strangely with LIKE '%(%)%':

> SELECT 'int' LIKE '%(%)%', 'int(255)' LIKE '%(%)%';
+--------------------------------+-------------------------------------+
| Utf8("int") Like Utf8("%(%)%") | Utf8("int(255)") Like Utf8("%(%)%") |
+--------------------------------+-------------------------------------+
| true                           | true                                |
+--------------------------------+-------------------------------------+
1 row in set. Query took 0.010 seconds.

But it should return false for the first case, because 'int' doesn't contain any parentheses.

I found that the prepared regexp is wrong: it contains unescaped ( and ), which should be escaped with \.
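
To make the failure mode concrete: translating '%(%)%' without escaping gives the regex .*(.*).*, where the parentheses form a capture group around .* instead of matching literal characters, so any string matches, including 'int'. Below is a minimal sketch of the translation with the parentheses escaped. The helper name like_to_regex_pattern and the ^...$ anchoring are assumptions for illustration, not the exact kernel code. It uses only std and returns the pattern string rather than compiling it.

```rust
/// Hypothetical helper (not the arrow kernel itself): translate a SQL LIKE
/// pattern into a regex pattern string, escaping only `(` and `)`.
fn like_to_regex_pattern(like: &str) -> String {
    let body = like
        .replace('%', ".*") // `%` matches any run of characters
        .replace('_', ".") // `_` matches exactly one character
        .replace('(', r"\(") // literal opening parenthesis
        .replace(')', r"\)"); // literal closing parenthesis
    // Anchoring is an assumption: LIKE must match the whole value.
    format!("^{}$", body)
}
```

With this translation, '%(%)%' becomes ^.*\(.*\).*$, which requires a literal ( followed later by a literal ). Other regex metacharacters such as . and + are still passed through unescaped in this sketch.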


Thanks

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 13, 2021
Signed-off-by: Dmitry Patsura <talk@dmtry.me>
@ovr ovr force-pushed the arrow-like-escape-parenthesis branch from d69347a to e6e3244 Compare December 13, 2021 15:08
@codecov-commenter

Codecov Report

Merging #1042 (e6e3244) into master (7e44ca8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1042   +/-   ##
=======================================
  Coverage   82.31%   82.31%           
=======================================
  Files         168      168           
  Lines       49031    49037    +6     
=======================================
+ Hits        40359    40365    +6     
  Misses       8672     8672           

| Impacted Files | Coverage Δ |
|---|---|
| arrow/src/compute/kernels/comparison.rs | 93.27% <100.00%> (+0.05%) ⬆️ |
| arrow/src/array/transform/mod.rs | 85.10% <0.00%> (-0.14%) ⬇️ |
| arrow/src/datatypes/datatype.rs | 66.38% <0.00%> (+0.42%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data


@alamb alamb left a comment

Thank you @ovr, this looks good.

I'll leave this PR open for a bit in case anyone else wants to comment.

cc @seddonm1 @jwdeitch

vec!["varchar(255)", "int(255)", "varchar", "int"],
"%(%)%",
like_utf8_scalar,
vec![true, true, false, false]
👍

let re_pattern = right
.replace("%", ".*")
.replace("_", ".")
.replace("(", r#"\("#)
Thinking about this more, I wonder if other special regexp characters need to be escaped too (e.g. . and +)

the list of special characters is here: https://docs.rs/regex/1.5.4/regex/#syntax

Good call. Maybe we should pass right (the LIKE pattern) through this first, then add the wildcards:
https://docs.rs/regex/0.2.2/regex/fn.escape.html

(I'm happy to help test this @ovr )

That is a great call @jwdeitch -- so perhaps something like this?

let re_pattern = regex::escape(right)
    .replace('%', ".*")
    .replace('_', ".");
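
The escape-first idea can be sketched without the regex crate as follows. Here escape_regex_meta is a hypothetical std-only stand-in for regex::escape, and its metacharacter list is an assumption that should be checked against the regex documentation. like_to_regex_escaped and the ^...$ anchoring are likewise illustrative names and choices, not the kernel's code.

```rust
/// Hypothetical std-only stand-in for `regex::escape`. The metacharacter
/// list is an assumption; verify it against the regex crate docs.
fn escape_regex_meta(text: &str) -> String {
    const META: &str = r"\.+*?()|[]{}^$#&-~";
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if META.contains(c) {
            out.push('\\'); // prefix each metacharacter with a backslash
        }
        out.push(c);
    }
    out
}

/// Escape everything first, then turn the LIKE wildcards into regex syntax.
/// This works because `%` and `_` are not in the escaped set.
fn like_to_regex_escaped(like: &str) -> String {
    let body = escape_regex_meta(like)
        .replace('%', ".*")
        .replace('_', ".");
    format!("^{}$", body)
}
```

The order matters: escaping after the wildcard substitution would also escape the generated .* and . sequences. This sketch does not handle SQL LIKE escape sequences such as \%.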

I found the Rust regex crate syntax difficult at the time, so I would suggest adding test cases for every character that needs escaping. Regex is so powerful that it would be easy to miss something.
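
One possible way to build such test cases is to compare the kernel's results against a small reference matcher that never touches regex, so every character other than % and _ is literal by construction. The sketch below is an assumption about how that could look. like_reference and matches_from are hypothetical names, and the matcher ignores LIKE escape sequences.

```rust
/// Hypothetical reference matcher for SQL LIKE: `%` matches any run of
/// characters, `_` matches exactly one, everything else is literal.
fn like_reference(text: &str, pattern: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    matches_from(&t, &p)
}

/// Simple recursive matcher; fine for tests, not tuned for long inputs.
fn matches_from(t: &[char], p: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: match only if the text is exhausted too.
        None => t.is_empty(),
        // `%`: try every possible split point of the remaining text.
        Some((&'%', rest)) => (0..=t.len()).any(|i| matches_from(&t[i..], rest)),
        // `_`: consume exactly one character.
        Some((&'_', rest)) => !t.is_empty() && matches_from(&t[1..], rest),
        // Any other character must match literally.
        Some((&c, rest)) => t.first() == Some(&c) && matches_from(&t[1..], rest),
    }
}
```

A test could then loop over inputs containing each regex metacharacter (for example a.c, a+b, int(255)) and assert that the LIKE kernel agrees with like_reference for every pair.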

alamb commented Dec 20, 2021

Filed #1069 to track additional characters (other than ( and )).

Thanks @ovr and @jwdeitch

@alamb alamb merged commit 0691f80 into apache:master Dec 20, 2021
alamb pushed a commit that referenced this pull request Dec 21, 2021
Signed-off-by: Dmitry Patsura <talk@dmtry.me>
alamb added a commit that referenced this pull request Dec 22, 2021
Signed-off-by: Dmitry Patsura <talk@dmtry.me>

Co-authored-by: Dmitry Patsura <talk@dmtry.me>
@ovr ovr deleted the arrow-like-escape-parenthesis branch September 4, 2023 16:58