feat: support more advanced suite of LIKE expressions #5013

Merged
agavra merged 3 commits into confluentinc:master from like_exps on Apr 9, 2020

@agavra agavra commented Apr 6, 2020

Fixes #2054

Description

Adds support for more advanced LIKE functionality by translating a LIKE predicate into a regular expression and evaluating that. I looked for existing Java implementations, but none seemed much better than writing it myself, and doing so avoids a heavyweight third-party dependency.

This change also lets you reference a column in the LIKE predicate.
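To illustrate the translate-to-regex approach described above, here is a minimal Java sketch. It is not the code from this PR: the class and method names (`LikeSketch`, `toRegex`, `matches`) are hypothetical, and the sketch handles only `%` and `_` (no escape character and no `[]` ranges). Runs of literal characters are wrapped with `Pattern.quote` so that regex metacharacters in the LIKE pattern, such as `.` or `(`, are still matched literally.

```java
import java.util.regex.Pattern;

public final class LikeSketch {

  // Hypothetical helper: translates a SQL LIKE pattern into a java.util.regex
  // pattern. '%' matches any sequence (including empty), '_' matches exactly
  // one character, and everything else is literal.
  public static String toRegex(final String likePattern) {
    final StringBuilder regex = new StringBuilder();
    final StringBuilder literal = new StringBuilder();
    for (final char c : likePattern.toCharArray()) {
      if (c == '%' || c == '_') {
        flush(literal, regex);
        regex.append(c == '%' ? ".*" : ".");
      } else {
        literal.append(c);
      }
    }
    flush(literal, regex);
    return regex.toString();
  }

  // Quote accumulated literal characters so regex metacharacters stay literal.
  private static void flush(final StringBuilder literal, final StringBuilder regex) {
    if (literal.length() > 0) {
      regex.append(Pattern.quote(literal.toString()));
      literal.setLength(0);
    }
  }

  // LIKE must match the whole value, so use matches() rather than find().
  // DOTALL lets '%' and '_' also match line terminators.
  public static boolean matches(final String val, final String pattern) {
    return Pattern.compile(toRegex(pattern), Pattern.DOTALL).matcher(val).matches();
  }

  public static void main(final String[] args) {
    System.out.println(matches("barfoobaz", "%foo%")); // true
    System.out.println(matches("barfoobar", "%foo"));  // false
    System.out.println(matches("a.c", "a.c"));         // true
    System.out.println(matches("abc", "a.c"));         // false: '.' is literal
    System.out.println(matches("cat", "c_t"));         // true
  }
}
```

Because the translation happens at evaluation time, the same path works when the pattern comes from a column rather than a literal; caching the compiled `Pattern` per distinct pattern string would be a natural optimization in that case.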

Testing done

  • Extensive unit tests
  • QTT tests

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@agavra agavra requested a review from a team as a code owner April 6, 2020 22:46

@big-andy-coates big-andy-coates left a comment

Thanks @agavra

LGTM, except: where does this character-range syntax (e.g. [a-c]) come from? I haven't seen that in SQL before. Is it in the spec? AFAIK, SQL LIKE gives special meaning only to _ and %...

(I'm only marking this as blocking because I think we should be careful about adding non-standard SQL syntax, if that's indeed what this is.)

*/
public static boolean matches(final String val, final String pattern) {
// note that we do not yet support escape characters in the pattern
// see issue
Contributor

missing issue link?

Contributor Author

#5021 will add it

{"percents", "barfoobaz", "%foo%"},
{"percents [X]", "barbarbar", "%foo%"},
{"percents one side", "barfoo", "%foo"},
{"percents one side [X]", "barbarbar", "%foo"},
Contributor

Suggested change:
- {"percents one side [X]", "barbarbar", "%foo"},
+ {"percents one side [X]", "barfoobar", "%foo"},

Contributor

Also worth adding cases for:

  • % on the other side
  • % in the middle
  • multiple %s
  • escaped %s? Or do we not support them yet (hard to add?)
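The requested cases could be written in the same `{name, value, pattern}` shape as the existing test rows, with a `[X]` suffix marking values that must not match. In the sketch below, every row except the corrected "percents one side [X]" row is a hypothetical example, not taken from the PR. To keep it self-contained, the rows are checked against a small backtracking wildcard matcher used as an independent reference. This is a different technique from the PR's regex translation and is shown only as a cross-check.

```java
public final class LikeCaseTable {

  // Independent reference matcher (not the PR's regex-based code): classic
  // greedy wildcard matching that backtracks to the most recent '%'.
  public static boolean likeMatches(final String s, final String p) {
    int i = 0;              // position in s
    int j = 0;              // position in p
    int lastPercent = -1;   // index in p of the most recent '%'
    int resume = 0;         // position in s to retry from after that '%'
    while (i < s.length()) {
      if (j < p.length() && p.charAt(j) == '%') {
        lastPercent = j++;
        resume = i;
      } else if (j < p.length() && (p.charAt(j) == '_' || p.charAt(j) == s.charAt(i))) {
        i++;
        j++;
      } else if (lastPercent != -1) {
        // Let the last '%' absorb one more character and retry.
        j = lastPercent + 1;
        i = ++resume;
      } else {
        return false;
      }
    }
    // Trailing '%'s can match the empty string.
    while (j < p.length() && p.charAt(j) == '%') {
      j++;
    }
    return j == p.length();
  }

  // {name, value, pattern}; a name containing "[X]" must NOT match.
  public static final String[][] CASES = {
      {"percents one side [X]", "barfoobar", "%foo"},
      {"percents other side", "foobar", "foo%"},
      {"percents other side [X]", "barfoobar", "foo%"},
      {"percents in middle", "foobazbar", "foo%bar"},
      {"percents in middle [X]", "foobazbaz", "foo%bar"},
      {"multiple percents", "xfooybarz", "%foo%bar%"},
      {"multiple percents [X]", "xbaryfooz", "%foo%bar%"},
  };

  public static void main(final String[] args) {
    for (final String[] c : CASES) {
      final boolean expected = !c[0].contains("[X]");
      final boolean actual = likeMatches(c[1], c[2]);
      System.out.println((expected == actual ? "ok   " : "FAIL ") + c[0]);
    }
  }
}
```

The "multiple percents [X]" row checks that order matters: both `foo` and `bar` appear in the value, but not in the order the pattern requires.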

Contributor Author

The middle two are covered by c[a-c]t%[a-e][^abc]%m__w. I'll add tests for the first and last (in Microsoft SQL Server, at least, a literal % is escaped as [%], and that should be supported here as well).

agavra commented Apr 7, 2020

Huh - I looked at https://docs.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver15 since they tend to have the best docs, and just assumed it was standard. But Postgres doesn't support the same syntax... I'm happy to go either way, but that was the hardest part to code, so I'd low-key be bummed about removing it 😂

EDIT: looks like Oracle SQL doesn't have it either

@blueedgenick
Contributor

@agavra, can I chime in with another "looks good, but it's in the wrong function" observation here? I definitely like getting more search functionality, but that SQL Server implementation is, unfortunately, not a very representative one to model from. The more "usual" implementation is to support only '_' and '%' in the LIKE function, perhaps with an optional escape character argument, and then have a REGEXP_LIKE to handle the fancier stuff. See examples here (https://prestodb.io/docs/current/functions/regexp.html) and (https://docs.oracle.com/en/database/oracle/oracle-database/20/sqlrf/Pattern-matching-Conditions.html#GUID-0779657B-06A8-441F-90C5-044B47862A0A)
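A small Java sketch of the split suggested above, under stated assumptions. The names `like` and `regexpLike` are hypothetical. The `regexpLike` sketch follows Presto's documented `regexp_like` behavior, where the pattern only needs to occur somewhere in the value, so it uses `find()`, while `like` must match the whole value and treats `[` and `]` as ordinary characters.

```java
import java.util.regex.Pattern;

public final class LikeVsRegexpLike {

  // LIKE in the suggested style: only '%' and '_' are special; every other
  // character, including '[' and ']', is matched literally, and the whole
  // value must match.
  public static boolean like(final String value, final String pattern) {
    final StringBuilder regex = new StringBuilder();
    for (final char c : pattern.toCharArray()) {
      if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
  }

  // REGEXP_LIKE-style: full Java regex syntax, and the pattern only needs to
  // occur somewhere in the value, hence find() instead of matches().
  public static boolean regexpLike(final String value, final String regex) {
    return Pattern.compile(regex).matcher(value).find();
  }

  public static void main(final String[] args) {
    System.out.println(like("cat", "c[a-c]t"));                 // false: brackets are literal
    System.out.println(like("c[a-c]t", "c[a-c]t"));             // true
    System.out.println(regexpLike("the cat sat", "c[a-c]t"));   // true: class + substring match
    System.out.println(regexpLike("the cat sat", "^c[a-c]t$")); // false: anchored
  }
}
```

Keeping the two functions separate means a user who writes `[a-c]` in a LIKE pattern gets standard literal behavior, and anyone who wants character classes opts in explicitly through the regex function.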

agavra commented Apr 8, 2020

Well, it seems I've learned my lesson - I'll check docs other than Microsoft SQL Server before implementing difficult functionality 😂 I'll remove the [] stuff

@big-andy-coates
Contributor

Can you add the escape char too?

agavra commented Apr 8, 2020

If you give a mouse a cookie 🍪 .... haha sure I'll add it too
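A minimal sketch of what escape-character support could look like, assuming SQL's `LIKE pattern ESCAPE 'c'` form, where the character after the escape is matched literally. The class and method names are hypothetical, and rejecting a pattern that ends with the escape character is an assumption about the desired behavior, not something this thread specifies.

```java
import java.util.Optional;
import java.util.regex.Pattern;

public final class LikeEscapeSketch {

  // Hypothetical: translate a LIKE pattern with an optional escape character
  // into a regex. The escape is checked first, so even '%' or '_' can serve
  // as the escape character.
  public static String toRegex(final String pattern, final Optional<Character> escape) {
    final StringBuilder regex = new StringBuilder();
    for (int i = 0; i < pattern.length(); i++) {
      final char c = pattern.charAt(i);
      if (escape.isPresent() && c == escape.get()) {
        if (i + 1 >= pattern.length()) {
          // Assumption: a dangling escape is a user error.
          throw new IllegalArgumentException(
              "LIKE pattern must not end with the escape character: " + pattern);
        }
        // The character after the escape is always matched literally.
        regex.append(Pattern.quote(String.valueOf(pattern.charAt(++i))));
      } else if (c == '%') {
        regex.append(".*");
      } else if (c == '_') {
        regex.append('.');
      } else {
        regex.append(Pattern.quote(String.valueOf(c)));
      }
    }
    return regex.toString();
  }

  public static boolean matches(
      final String val, final String pattern, final Optional<Character> escape) {
    return Pattern.compile(toRegex(pattern, escape), Pattern.DOTALL).matcher(val).matches();
  }

  public static void main(final String[] args) {
    System.out.println(matches("50%", "50!%", Optional.of('!'))); // true
    System.out.println(matches("500", "50!%", Optional.of('!'))); // false
    System.out.println(matches("500", "50%", Optional.empty()));  // true
  }
}
```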

@agavra agavra force-pushed the like_exps branch 4 times, most recently from 2a89d7b to 72e60e3 on April 8, 2020 20:11

@big-andy-coates big-andy-coates left a comment

Thanks @agavra

LGTM, 'cept for the weird exception / error message you'll be showing to users if they enter a multi-char escape
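One way to avoid a confusing downstream error is to validate the ESCAPE value before any code is generated. The sketch below assumes the parser hands over the escape as an `Optional<String>` (an assumption; the helper name `validateEscape` is hypothetical) and rejects anything that is not exactly one character, with a message that names the offending value.

```java
import java.util.Optional;

public final class LikeEscapeValidation {

  // Hypothetical helper: turns the optional ESCAPE string from a parsed LIKE
  // expression into a single char, rejecting anything else up front.
  public static Optional<Character> validateEscape(final Optional<String> escape) {
    return escape.map(e -> {
      // length() counts UTF-16 code units, so a supplementary character
      // (e.g. most emoji) is rejected as well.
      if (e.length() != 1) {
        throw new IllegalArgumentException(
            "LIKE ESCAPE must be exactly one character, but got '" + e + "'");
      }
      return Character.valueOf(e.charAt(0));
    });
  }

  public static void main(final String[] args) {
    System.out.println(validateEscape(Optional.of("\\")));  // Optional[\]
    System.out.println(validateEscape(Optional.empty()));   // Optional.empty
  }
}
```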


if (patternString.endsWith("%")) {
if (node.getEscape().isPresent()) {
Contributor

nit: can we not remove this if/else by having LikeEvaluator.matches() take an Optional<Character> for the last param?

@agavra agavra Apr 9, 2020

somewhat - I'd still need to do:

LikeEvaluator.matches(... + "Optional.ofNullable(" + escape.map(c -> "'" + c + "'").orElse(null) + ")")

And to be honest I don't think that's any cleaner.

(This needs to be generated code, so I can't pass the Optional itself, only a source string that constructs it.)
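To make the trade-off concrete, here is a hedged Java sketch of the code-generation side. It assumes the generated source calls a `LikeEvaluator.matches` overload that takes an `Optional<Character>` as its last argument. `LikeEvaluator` is the name used in this thread, but the overload and the helper names below are hypothetical. Instead of `Optional.ofNullable(null)`, the sketch emits `Optional.empty()` or `Optional.of(...)`, and it renders the char as a numeric cast so that an escape of `'` or `\` cannot produce an invalid char literal.

```java
import java.util.Optional;

public final class LikeCodegenSketch {

  // Renders the Optional<Character> escape as Java source for generated code.
  // A numeric cast such as (char) 92 needs no quoting, unlike a raw '\'' or
  // '\\' literal. Fully qualified so the generated code needs no import.
  public static String escapeArgSource(final Optional<Character> escape) {
    return escape
        .map(c -> "java.util.Optional.of((char) " + (int) c.charValue() + ")")
        .orElse("java.util.Optional.empty()");
  }

  // Hypothetical shape of the generated call; the real generator and the
  // LikeEvaluator.matches signature may differ.
  public static String matchesCallSource(
      final String valueExpr, final String patternExpr, final Optional<Character> escape) {
    return "LikeEvaluator.matches(" + valueExpr + ", " + patternExpr + ", "
        + escapeArgSource(escape) + ")";
  }

  public static void main(final String[] args) {
    // LikeEvaluator.matches(v, p, java.util.Optional.of((char) 92))
    System.out.println(matchesCallSource("v", "p", Optional.of('\\')));
    // LikeEvaluator.matches(v, p, java.util.Optional.empty())
    System.out.println(matchesCallSource("v", "p", Optional.empty()));
  }
}
```

With this approach, the generated call has the same shape whether or not an ESCAPE clause was given, at the cost of slightly less readable generated source.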

@agavra agavra merged commit 67cd9d9 into confluentinc:master Apr 9, 2020
@agavra agavra deleted the like_exps branch April 9, 2020 17:50
@apurvam apurvam added this to the 0.9.0 milestone Apr 13, 2020
Successfully merging this pull request may close these issues.

Support more general use of '%' and '_' wildcards in LIKE expressions
4 participants