Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30230][SQL] Like ESCAPE syntax can not use '_' and '%' #26860

Closed
wants to merge 2 commits into from

Conversation

ulysses-you
Copy link
Contributor

What changes were proposed in this pull request?

Since 25001, spark support like escape syntax.
But '%' and '_' is the reserve char in Like expression. We can not use them as escape char.

Why are the changes needed?

Avoid some unexpect problem when using like escape syntax.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add UT.

@ulysses-you
Copy link
Contributor Author

if (Set('%', '_').contains(escapeChar)) {
throw new ParseException("Invalid escape string." +
"Escape string can not be '%', '_'.", ctx)
}
Copy link
Member

@maropu maropu Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot avoid this case in the antlr side (SqlBase.g4)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, the antlr side just mach a string literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like this ?

ESCAPE_STRING
    : '\'' ( ~('\''|'\\') | ('\\' .) | ~('%' | '_') )* '\''
    | '\'' ( ~('\''|'\\') | ('\\' .) | ~('%' | '_') )* '\''
    ;

NOT? kind=LIKE pattern=valueExpression (ESCAPE escapeChar=ESCAPE_STRING)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems much easier to do at scala side...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with better error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

@cloud-fan
Copy link
Contributor

ok to test

1 similar comment
@cloud-fan
Copy link
Contributor

ok to test

@cloud-fan
Copy link
Contributor

add to whitelist

@ulysses-you
Copy link
Contributor Author

Sorry for a mistake.

@@ -1393,6 +1393,10 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
}
str.charAt(0)
}.getOrElse('\\')
if (Set('%', '_').contains(escapeChar)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't use Set for just 2 elements. We can just do a = % || a = _

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the @cloud-fan comment.

@SparkQA
Copy link

SparkQA commented Dec 12, 2019

Test build #115224 has finished for PR 26860 at commit 20532d2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 12, 2019

Test build #115236 has finished for PR 26860 at commit 3def8ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member

Thanks, merging to master

@ulysses-you
Copy link
Contributor Author

Thanks.

@beliefer
Copy link
Contributor

@ulysses-you @maropu @cloud-fan @gengliangwang
I checked postgreSQL, it supports as below:
SELECT 'abC' LIKE 'ab%C' ESCAPE '%';

@gengliangwang
Copy link
Member

gengliangwang commented Dec 13, 2019

@beliefer you are right. I run the following in PostgreSQL:

> SELECT 'abcd' LIKE 'ab%_d';
true
> SELECT 'abcd' LIKE 'ab%_d' escape '%';
false

I should not merge it without checking on PostgreSQL. I am going to revert this one.
Any thoughts? @ulysses-you @maropu @cloud-fan

@cloud-fan
Copy link
Contributor

Can we check more systems? it does look weird to support % and _ as escape chars.

@ulysses-you
Copy link
Contributor Author

ulysses-you commented Dec 13, 2019

If '%' and '_' as escape, how to use the real pattern.
Like this ?

SELECT 'abcd' LIKE 'ab%%' escape '%';

But actually StringUtils.escapeLikeRegex will do this check. It can not run correctly.

while (in.hasNext) {
      in.next match {
        case c1 if c1 == escapeChar && in.hasNext =>
          val c = in.next
          c match {
            case '_' | '%' => out ++= Pattern.quote(Character.toString(c))
            case c if c == escapeChar => out ++= Pattern.quote(Character.toString(c))
            case _ => fail(s"the escape character is not allowed to precede '$c'")
          }
        case c if c == escapeChar => fail("it is not allowed to end with the escape character")
        // we will never run these code path
        case '_' => out ++= "."
        case '%' => out ++= ".*"
        case c => out ++= Pattern.quote(Character.toString(c))
      }
    }

@beliefer
Copy link
Contributor

beliefer commented Dec 13, 2019

>SELECT 'abcd' LIKE 'ab%%';
true
>SELECT 'abcd' LIKE 'ab%%' ESCAPE '%';
false

Because '%' is escaped.

@cloud-fan
Copy link
Contributor

@gengliangwang let's revert this.

@maropu
Copy link
Member

maropu commented Dec 13, 2019

+1 for the revert, thanks for taking care, @gengliangwang !

@ulysses-you
Copy link
Contributor Author

OK, I see. Thanks for reverting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
8 participants