[SPARK-30724][SQL] Support 'LIKE ANY' and 'LIKE ALL' operators #27477
Test build #117992 has finished for PR 27477 at commit
|
Test build #117994 has finished for PR 27477 at commit
|
@@ -748,6 +748,7 @@ predicate
| NOT? kind=IN '(' expression (',' expression)* ')'
| NOT? kind=IN '(' query ')'
| NOT? kind=RLIKE pattern=valueExpression
| NOT? kind=LIKE quantifier=(ANY | ALL) '(' expression (',' expression)* ')'
So, we don't need to support ESCAPE. Did I understand correctly?
Got it~
}.getOrElse('\\')
invertIfNotDefined(Like(e, expression(ctx.pattern), Literal(escapeChar)))
Option(ctx.quantifier).map(_.getType) match {
case Some(SqlBaseParser.ANY) if !ctx.expression.isEmpty =>
Do we need this ctx.expression.isEmpty check? It seems that the parser rule guarantees at least one expression.
Fixed it.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1375,6 +1375,14 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
case other => Seq(other)
}

def getLikeQuantifierExps(expressions: java.util.List[ExpressionContext]): Seq[Expression] = {
if (expressions.isEmpty) {
throw new ParseException("Syntax error: expected something between '(' and ')'.", ctx)
I think we should remove the "Syntax error:" prefix, because ParseException already conveys it.
@@ -748,6 +748,7 @@ predicate
| NOT? kind=IN '(' expression (',' expression)* ')'
| NOT? kind=IN '(' query ')'
| NOT? kind=RLIKE pattern=valueExpression
| NOT? kind=LIKE quantifier=(ANY | ALL) ('('')' | '(' expression (',' expression)* ')')
What happened previously, when we didn't have the '('')' | alternative here? I guessed that it was also a ParseException.
Otherwise it will throw AnalysisException:
-- !query
select company from like_any_table where company like any ()
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.AnalysisException
Invalid number of arguments for function any. Expected: 1; Found: 0; line 1 pos 54
Oh, it's considered a function. I got it.
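To illustrate the point of this thread: with the explicit '('')' alternative in the grammar, an empty pattern list reaches the builder and can be rejected there with a parse-time error, instead of `any ()` being resolved as a function call and failing later in analysis. The following is only a minimal sketch of that check. `SketchParseException`, `quantifierPatterns`, and the message wording are hypothetical stand-ins, not Spark's `ParseException` or the PR's actual helper.

```scala
object EmptyQuantifierListSketch {
  // Hypothetical stand-in for a parser-level exception (not Spark's ParseException).
  final class SketchParseException(message: String) extends RuntimeException(message)

  // Rejects an empty pattern list up front, so the failure is reported as a
  // parse error rather than surfacing later as an analysis error.
  def quantifierPatterns(patterns: Seq[String]): Seq[String] = {
    if (patterns.isEmpty) {
      throw new SketchParseException("Expected something between '(' and ')'.")
    }
    patterns
  }
}
```

A non-empty list is returned unchanged; only the `LIKE ANY ()` / `LIKE ALL ()` shape triggers the exception.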
Test build #118015 has finished for PR 27477 at commit
|
Test build #118025 has finished for PR 27477 at commit
|
Test build #118038 has finished for PR 27477 at commit
|
# Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Test build #118302 has finished for PR 27477 at commit
|
cc @cloud-fan
This would be good to have since both Teradata and Snowflake support it.
Test build #121127 has finished for PR 27477 at commit
|
retest this please
Looks fine to me
@@ -766,6 +766,7 @@ predicate
| NOT? kind=IN '(' expression (',' expression)* ')'
| NOT? kind=IN '(' query ')'
| NOT? kind=RLIKE pattern=valueExpression
| NOT? kind=LIKE quantifier=(ANY | ALL) ('('')' | '(' expression (',' expression)* ')')
Shall we support SOME as well? The BoolOr agg func can be called with any and some.
val escapeChar = Option(ctx.escapeChar).map(string).map { str =>
if (str.length != 1) {
throw new ParseException("Invalid escape string." +
"Escape string must contains only one character.", ctx)
Nit: contains -> contain ?
case Some(SqlBaseParser.ANY) =>
getLikeQuantifierExps(ctx.expression).reduceLeft(Or)
case Some(SqlBaseParser.ALL) =>
getLikeQuantifierExps(ctx.expression).reduceLeft(And)
Nit: getLikeQuantifierExps -> getLikeQuantifierExprs ?
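For readers following the hunk above, here is a rough, self-contained sketch of the rewrite it performs: each pattern becomes its own LIKE predicate, and the predicates are folded with OR for ANY and with AND for ALL. The expression classes below (`Col`, `Like`, `Or`, `And`), the `Quantifier` type, and `desugar` are hypothetical miniatures written for this illustration; they are not Catalyst's classes, and escape handling is left out.

```scala
object LikeQuantifierSketch {
  // Hypothetical mini expression tree; these are NOT Spark's Catalyst classes.
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Like(left: Expr, pattern: String) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  sealed trait Quantifier
  case object AnyQ extends Quantifier
  case object AllQ extends Quantifier

  // Expands `e LIKE ANY (p1, p2, ...)` into `e LIKE p1 OR e LIKE p2 OR ...`
  // and `e LIKE ALL (p1, p2, ...)` into `e LIKE p1 AND e LIKE p2 AND ...`.
  def desugar(e: Expr, q: Quantifier, patterns: Seq[String]): Expr = {
    require(patterns.nonEmpty, "expected at least one pattern")
    val likes: Seq[Expr] = patterns.map(p => Like(e, p))
    q match {
      case AnyQ => likes.reduceLeft[Expr]((acc, next) => Or(acc, next))
      case AllQ => likes.reduceLeft[Expr]((acc, next) => And(acc, next))
    }
  }
}
```

Because the fold is a `reduceLeft`, three or more patterns nest to the left, e.g. `Or(Or(l1, l2), l3)`, and a single pattern yields the plain `Like` with no connective.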
Test build #121622 has finished for PR 27477 at commit
|
assertEqual("not (a like any ('foo%', 'bar%'))", !(('a like "foo%") || ('a like "bar%")))
assertEqual("a like all ('foo%', 'bar%')", ('a like "foo%") && ('a like "bar%"))
assertEqual("a not like all ('foo%', 'bar%')", !('a like "foo%") && !('a like "bar%"))
assertEqual("not (a like all ('foo%', 'bar%'))", !(('a like "foo%") && ('a like "bar%")))
Could you add two more tests for error handling for L1396 and L1422 in AstBuilder.scala?
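The assertions above distinguish `a NOT LIKE ALL (...)` (every pattern fails to match) from `NOT (a LIKE ALL (...))` (at least one pattern fails to match). A small sketch may make the difference concrete. It evaluates both forms over plain strings with a deliberately simplified LIKE matcher; `like`, `notLikeAll`, and `notOfLikeAll` are hypothetical names for this illustration, and escape characters and NULL semantics are ignored.

```scala
import java.util.regex.Pattern

object NotLikeAllSketch {
  // Simplified LIKE: '%' matches any run of characters, '_' matches exactly one.
  // Escape characters and NULL handling are deliberately left out.
  def like(value: String, pattern: String): Boolean = {
    val regex = pattern.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString
    value.matches(regex)
  }

  // a NOT LIKE ALL (p1, p2, ...)  ==  NOT (a LIKE p1) AND NOT (a LIKE p2) AND ...
  def notLikeAll(value: String, patterns: Seq[String]): Boolean =
    patterns.forall(p => !like(value, p))

  // NOT (a LIKE ALL (p1, p2, ...))  ==  NOT (a LIKE p1 AND a LIKE p2 AND ...)
  def notOfLikeAll(value: String, patterns: Seq[String]): Boolean =
    !patterns.forall(p => like(value, p))
}
```

For the value `foobar` and the patterns `foo%` and `bar%`, the first form is false (one pattern matches) while the second is true (not all patterns match), which is why the two parse to different expressions.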
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 14

note: I've checked that the output is the same with PostgreSQL output: https://gist.github.com/maropu/fa4bd6491e21751d6bbc44c545390b0c
Looks fine except for the existing comments.
Test build #121684 has finished for PR 27477 at commit
|
Test build #121713 has finished for PR 27477 at commit
|
retest this please
Test build #121733 has finished for PR 27477 at commit
|
@wangyum btw, we need to update the SQL document for this new syntax. A follow-up PR is alright, though. cc: @huaxingao
Thanks, all! Merged to master.
@maropu Since this is for 3.1, I will not include this new syntax in the 3.0 SQL ref.
Yea, we need to update it only in master.
SELECT company FROM like_all_table WHERE company NOT LIKE ALL (NULL, NULL);

-- negative case
SELECT company FROM like_any_table WHERE company LIKE ALL ();
Is the use of a non-existing table intentional? I guess the purpose was to check LIKE ALL ().
I think it's a typo
What changes were proposed in this pull request?
LIKE ANY/SOME and LIKE ALL operators are mostly used when matching a text field against a number of patterns. For example:
Teradata / Hive 3.0 / Snowflake:
PostgreSQL:
This PR adds support for these two operators.
More details:
https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/4~AyrPNmDN0Xk4SALLo6aQ
https://issues.apache.org/jira/browse/HIVE-15229
https://docs.snowflake.net/manuals/sql-reference/functions/like_any.html
Why are the changes needed?
To smoothly migrate SQL queries to Spark SQL.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test.