Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators #27477

Closed
wants to merge 10 commits into from
Closed

[SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators #27477

wants to merge 10 commits into from

Conversation

wangyum
Copy link
Member

@wangyum wangyum commented Feb 6, 2020

What changes were proposed in this pull request?

LIKE ANY/SOME and LIKE ALL operators are mostly used when we are matching a text field with numbers of patterns. For example:

Teradata / Hive 3.0 / Snowflake:

--like any
select 'foo' LIKE ANY ('%foo%','%bar%');

--like all
select 'foo' LIKE ALL ('%foo%','%bar%');

PostgreSQL:

-- like any
select 'foo' LIKE ANY (array['%foo%','%bar%']);

-- like all
select 'foo' LIKE ALL (array['%foo%','%bar%']);

This PR add support these two operators.

More details:
https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/4~AyrPNmDN0Xk4SALLo6aQ
https://issues.apache.org/jira/browse/HIVE-15229
https://docs.snowflake.net/manuals/sql-reference/functions/like_any.html

Why are the changes needed?

To smoothly migrate SQLs to Spark SQL.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented Feb 6, 2020

Test build #117992 has finished for PR 27477 at commit 4548b0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 6, 2020

Test build #117994 has finished for PR 27477 at commit 5475675.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -748,6 +748,7 @@ predicate
| NOT? kind=IN '(' expression (',' expression)* ')'
| NOT? kind=IN '(' query ')'
| NOT? kind=RLIKE pattern=valueExpression
| NOT? kind=LIKE quantifier=(ANY | ALL) '(' expression (',' expression)* ')'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we don't need to support ESCAPE. Did I understand correctly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it~

}.getOrElse('\\')
invertIfNotDefined(Like(e, expression(ctx.pattern), Literal(escapeChar)))
Option(ctx.quantifier).map(_.getType) match {
case Some(SqlBaseParser.ANY) if !ctx.expression.isEmpty =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this ctx.expression.isEmpty? It seems that the parser rule guarantee at least one expression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.

@@ -1375,6 +1375,14 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
case other => Seq(other)
}

def getLikeQuantifierExps(expressions: java.util.List[ExpressionContext]): Seq[Expression] = {
if (expressions.isEmpty) {
throw new ParseException("Syntax error: expected something between '(' and ')'.", ctx)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teradata's exception:
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think should remove Syntax error: , because ParseException could replace it.

@@ -748,6 +748,7 @@ predicate
| NOT? kind=IN '(' expression (',' expression)* ')'
| NOT? kind=IN '(' query ')'
| NOT? kind=RLIKE pattern=valueExpression
| NOT? kind=LIKE quantifier=(ANY | ALL) ('('')' | '(' expression (',' expression)* ')')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened previously when we didn't have '('')' | here? I guessed that it was also a Parse Exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise it will throw AnalysisException:

-- !query
select company from like_any_table where company like any ()
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
Invalid number of arguments for function any. Expected: 1; Found: 0; line 1 pos 54

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's considered as function. I got it.

@SparkQA
Copy link

SparkQA commented Feb 7, 2020

Test build #118015 has finished for PR 27477 at commit de7c398.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 7, 2020

Test build #118025 has finished for PR 27477 at commit caea82a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 7, 2020

Test build #118038 has finished for PR 27477 at commit 88ca4c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@SparkQA
Copy link

SparkQA commented Feb 12, 2020

Test build #118302 has finished for PR 27477 at commit 1c3d98c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Copy link
Member Author

wangyum commented Feb 13, 2020

cc @cloud-fan

@gatorsmile
Copy link
Member

This would be good to have since both Teradata and Snowflake support it.

cc @maryannxue @hvanhovell @cloud-fan

@SparkQA
Copy link

SparkQA commented Apr 11, 2020

Test build #121127 has finished for PR 27477 at commit 0656b05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@HyukjinKwon
Copy link
Member

Looks fine to me

@@ -766,6 +766,7 @@ predicate
| NOT? kind=IN '(' expression (',' expression)* ')'
| NOT? kind=IN '(' query ')'
| NOT? kind=RLIKE pattern=valueExpression
| NOT? kind=LIKE quantifier=(ANY | ALL) ('('')' | '(' expression (',' expression)* ')')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we support SOME as well? The BoolOr agg func can be called with any and some.

val escapeChar = Option(ctx.escapeChar).map(string).map { str =>
if (str.length != 1) {
throw new ParseException("Invalid escape string." +
"Escape string must contains only one character.", ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: contains -> contain ?

case Some(SqlBaseParser.ANY) =>
getLikeQuantifierExps(ctx.expression).reduceLeft(Or)
case Some(SqlBaseParser.ALL) =>
getLikeQuantifierExps(ctx.expression).reduceLeft(And)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: getLikeQuantifierExps -> getLikeQuantifierExprs ?

@SparkQA
Copy link

SparkQA commented Apr 22, 2020

Test build #121622 has finished for PR 27477 at commit 0656b05.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assertEqual("not (a like any ('foo%', 'bar%'))", !(('a like "foo%") || ('a like "bar%")))
assertEqual("a like all ('foo%', 'bar%')", ('a like "foo%") && ('a like "bar%"))
assertEqual("a not like all ('foo%', 'bar%')", !('a like "foo%") && !('a like "bar%"))
assertEqual("not (a like all ('foo%', 'bar%'))", !(('a like "foo%") && ('a like "bar%")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add two more tests for error handling for L1396 and L1422 in AstBuilder.scala?

-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 14


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I've checked that the output is the same with PostgreSQL output: https://gist.github.com/maropu/fa4bd6491e21751d6bbc44c545390b0c

@maropu
Copy link
Member

maropu commented Apr 23, 2020

Looks fine except for the existing comments.

@SparkQA
Copy link

SparkQA commented Apr 23, 2020

Test build #121684 has finished for PR 27477 at commit cf4666f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 24, 2020

Test build #121713 has finished for PR 27477 at commit f9af131.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 24, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Apr 24, 2020

Test build #121733 has finished for PR 27477 at commit f9af131.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Apr 24, 2020

@wangyum btw, we need to update the SQL document for this new syntax. Follow-up PR is alright, though. cc: @huaxingao

@maropu maropu closed this in b10263b Apr 24, 2020
@maropu
Copy link
Member

maropu commented Apr 24, 2020

Thanks, all! Merged to master.

@wangyum wangyum deleted the SPARK-30724 branch April 24, 2020 13:31
@huaxingao
Copy link
Contributor

@maropu Since this is for 3.1, I will not include this new syntax in 3.0 sql ref.

@maropu
Copy link
Member

maropu commented Apr 24, 2020

Yea, we need to update it only in master.

SELECT company FROM like_all_table WHERE company NOT LIKE ALL (NULL, NULL);

-- negative case
SELECT company FROM like_any_table WHERE company LIKE ALL ();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using of non-existing table intentional? I guess the purpose was to check LIKE ALL ().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a typo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.