[SPARK-28083][SQL] Support LIKE ... ESCAPE syntax #25001

Closed
wants to merge 62 commits into from

Conversation

beliefer
Contributor

@beliefer beliefer commented Jun 28, 2019

What changes were proposed in this pull request?

The 'LIKE predicate: ESCAPE clause' syntax is part of ANSI SQL.
For example:

select 'abcSpark_13sd' LIKE '%Spark\\_%';             //true
select 'abcSpark_13sd' LIKE '%Spark/_%';              //false
select 'abcSpark_13sd' LIKE '%Spark"_%';              //false
select 'abcSpark_13sd' LIKE '%Spark/_%' ESCAPE '/';   //true
select 'abcSpark_13sd' LIKE '%Spark"_%' ESCAPE '"';   //true
select 'abcSpark%13sd' LIKE '%Spark\\%%';             //true
select 'abcSpark%13sd' LIKE '%Spark/%%';              //false
select 'abcSpark%13sd' LIKE '%Spark"%%';              //false
select 'abcSpark%13sd' LIKE '%Spark/%%' ESCAPE '/';   //true
select 'abcSpark%13sd' LIKE '%Spark"%%' ESCAPE '"';   //true
select 'abcSpark\\13sd' LIKE '%Spark\\\\_%';          //true
select 'abcSpark/13sd' LIKE '%Spark//_%';             //false
select 'abcSpark"13sd' LIKE '%Spark""_%';             //false
select 'abcSpark/13sd' LIKE '%Spark//_%' ESCAPE '/';  //true
select 'abcSpark"13sd' LIKE '%Spark""_%' ESCAPE '"';  //true

But Spark SQL only supports 'LIKE predicate'.

Note: If the input string or pattern string is null, then the result is null too.

Several mainstream databases support this syntax.

PostgreSQL:
https://www.postgresql.org/docs/11/functions-matching.html

Vertica:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Predicates/LIKE-predicate.htm?zoom_highlight=like%20escape

MySQL:
https://dev.mysql.com/doc/refman/5.6/en/string-comparison-functions.html

Oracle:
https://docs.oracle.com/en/database/oracle/oracle-database/19/jjdbc/JDBC-reference-information.html#GUID-5D371A5B-D7F6-42EB-8C0D-D317F3C53708
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-0779657B-06A8-441F-90C5-044B47862A0A

Teradata
https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/ZP3CE_cR~e7V50zVkzzeVQ

Snowflake
https://docs.snowflake.net/manuals/sql-reference/functions/like.html

How was this patch tested?

Existing UTs and new UTs.

I merged this PR into my production environment and ran the SQL above:

spark-sql> select 'abcSpark_13sd' LIKE '%Spark\\_%';
true
Time taken: 0.119 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark/_%';
false
Time taken: 0.103 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark"_%';
false
Time taken: 0.096 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark/_%' ESCAPE '/';
true
Time taken: 0.096 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark_13sd' LIKE '%Spark"_%' ESCAPE '"';
true
Time taken: 0.092 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark\\%%';
true
Time taken: 0.109 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark/%%';
false
Time taken: 0.1 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark"%%';
false
Time taken: 0.081 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark/%%' ESCAPE '/';
true
Time taken: 0.095 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark%13sd' LIKE '%Spark"%%' ESCAPE '"';
true
Time taken: 0.113 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark\\13sd' LIKE '%Spark\\\\_%';
true
Time taken: 0.078 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark/13sd' LIKE '%Spark//_%';
false
Time taken: 0.067 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark"13sd' LIKE '%Spark""_%';
false
Time taken: 0.084 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark/13sd' LIKE '%Spark//_%' ESCAPE '/';
true
Time taken: 0.091 seconds, Fetched 1 row(s)
spark-sql> select 'abcSpark"13sd' LIKE '%Spark""_%' ESCAPE '"';
true
Time taken: 0.091 seconds, Fetched 1 row(s)

I created a table, and its schema is:

spark-sql> desc formatted gja_test;
key     string  NULL
value   string  NULL
other   string  NULL

# Detailed Table Information
Database        test
Table   gja_test
Owner   test
Created Time    Wed Apr 10 11:06:15 CST 2019
Last Access     Thu Jan 01 08:00:00 CST 1970
Created By      Spark 2.4.1-SNAPSHOT
Type    MANAGED
Provider        hive
Table Properties        [transient_lastDdlTime=1563443838]
Statistics      26 bytes
Location        hdfs://namenode.xxx:9000/home/test/hive/warehouse/test.db/gja_test
Serde Library   org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat     org.apache.hadoop.mapred.TextInputFormat
OutputFormat    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties      [field.delim=   , serialization.format= ]
Partition Provider      Catalog
Time taken: 0.642 seconds, Fetched 21 row(s)

Table gja_test contains three rows of data.

spark-sql> select * from gja_test;
a       A       ao
b       B       bo
"__     """__   "
Time taken: 0.665 seconds, Fetched 3 row(s)

Finally, I tested this feature:

spark-sql> select * from gja_test where key like value escape '"';
"__     """__   "
Time taken: 0.687 seconds, Fetched 1 row(s)

@SparkQA

SparkQA commented Jun 28, 2019

Test build #106996 has finished for PR 25001 at commit 4f5016a.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class Like(

@SparkQA

SparkQA commented Jun 28, 2019

Test build #106999 has finished for PR 25001 at commit 8609a46.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 28, 2019

Test build #107001 has finished for PR 25001 at commit 8992b5a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 28, 2019

Test build #107012 has finished for PR 25001 at commit c9d7dfe.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 28, 2019

Test build #107013 has finished for PR 25001 at commit b5be74a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 2, 2019

Also, you need to update docs/sql-keywords.md

@beliefer
Contributor Author

beliefer commented Jul 2, 2019

@maropu Thanks for your reminder. I have added the keyword.

@@ -103,6 +103,7 @@ Below is a list of all the keywords in Spark SQL.
<tr><td>DROP</td><td>non-reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>ELSE</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>END</td><td>reserved</td><td>non-reserved</td><td>reserved</td></tr>
<tr><td>ESCAPE</td><td>non-reserved</td><td>non-reserved</td><td>non-reserved</td></tr>
Member

@maropu maropu Jul 2, 2019

Contributor Author

Thanks for your reminder, I will change it.

@@ -65,6 +65,59 @@ abstract class StringRegexExpression extends BinaryExpression
override def sql: String = s"${left.sql} ${prettyName.toUpperCase(Locale.ROOT)} ${right.sql}"
}

abstract class StringRegexV2Expression extends TernaryExpression
Member

Do we need this abstract class? I think we could make the fix simpler by just tweaking StringUtils.escapeLikeRegex. Anyway, the simpler, the better.
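For the record, a minimal sketch of that idea, using a simplified, hypothetical helper (likePatternToRegex is not the real StringUtils.escapeLikeRegex): translate a LIKE pattern into a Java regex while honoring a caller-supplied escape character.

import java.util.regex.Pattern

// Hypothetical, simplified helper: convert a LIKE pattern to a Java regex,
// treating the given escape character as the ESCAPE clause would.
def likePatternToRegex(pattern: String, escapeChar: Char = '\\'): String = {
  val sb = new StringBuilder("(?s)") // let '.' match line terminators too
  var i = 0
  while (i < pattern.length) {
    pattern.charAt(i) match {
      case c if c == escapeChar && i + 1 < pattern.length =>
        sb.append(Pattern.quote(pattern.charAt(i + 1).toString)) // escaped char matches literally
        i += 1
      case '%' => sb.append(".*") // '%' matches any sequence of characters
      case '_' => sb.append(".")  // '_' matches exactly one character
      case c   => sb.append(Pattern.quote(c.toString))
    }
    i += 1
  }
  sb.toString
}

For example, with escapeChar = '/', the pattern "%Spark/_%" compiles to a regex that matches any string containing the literal "Spark_".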

Contributor Author

Because StringRegexExpression extends BinaryExpression, which allows only two input parameters, I made StringRegexV2Expression extend TernaryExpression.

Member

How about just making StringRegexExpression ternary?

Contributor Author

@beliefer beliefer Jul 2, 2019

Because RLIKE also extends StringRegexExpression and only needs two input parameters.

Member

Have you tried this?

case class Like(
  inputExpr: Expression,
  patternExpr: Expression,
  escapeExpr: Option[String] = None) extends StringRegexExpression

Member

like this? master...maropu:SPARK-28083

/* 066 */           String filter_rightStr_0 = scan_value_1.toString();
/* 067 */           java.util.regex.Pattern filter_pattern_0 = java.util.regex.Pattern.compile(org.apache.spark.sql.catalyst.util.StringUtils.escapeLikeRegex(filter_rightStr_0, "\"));

Contributor Author

OK, let me give it a try!

Contributor Author

@maropu I gave it a try. It did not pass the tests in RegexpExpressionsSuite.
The failure info:

- LIKE Pattern *** FAILED ***
  Code generation of null LIKE input[0, string, true] ESCAPE \ failed:

  java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 0: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 0: Line break in literal not allowed

The code generated:

/* 033 */   public java.lang.Object apply(java.lang.Object _i) {
/* 034 */     InternalRow i = (InternalRow) _i;
/* 035 */
/* 036 */
/* 037 */     boolean isNull_0 = true;
/* 038 */     boolean value_0 = false;
/* 039 */
/* 040 */     if (!true) {
/* 041 */       boolean isNull_2 = i.isNullAt(0);
/* 042 */       UTF8String value_2 = isNull_2 ?
/* 043 */       null : (i.getUTF8String(0));
/* 044 */       if (!isNull_2) {
/* 045 */
/* 046 */         isNull_0 = false; // resultCode could change nullability.
/* 047 */
/* 048 */         String rightStr_0 = value_2.toString();
/* 049 */         java.util.regex.Pattern pattern_0 = java.util.regex.Pattern.compile(org.apache.spark.sql.catalyst.util.StringUtils.escapeLikeRegex(rightStr_0, "\"));
/* 050 */         value_0 = pattern_0.matcher(((UTF8String)null).toString()).matches();
/* 051 */
/* 052 */
/* 053 */       }
/* 054 */
/* 055 */     }
/* 056 */     isNull_3 = isNull_0;
/* 057 */     value_3 = value_0;
/* 058 */
/* 059 */     // copy all the results into MutableRow
/* 060 */
/* 061 */     if (!isNull_3) {
/* 062 */       mutableRow.setBoolean(0, value_3);
/* 063 */     } else {
/* 064 */       mutableRow.setNullAt(0);
/* 065 */     }
/* 066 */
/* 067 */     return mutableRow;
/* 068 */   }
/* 069 */
/* 070 */

Contributor Author

@beliefer beliefer Jul 4, 2019

      val pattern = ctx.freshName("pattern")
      val rightStr = ctx.freshName("rightStr")
      val escapeChar = escapeCharOpt.getOrElse("\\\\")
      nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
        s"""
          String $rightStr = $eval2.toString();
          $patternClass $pattern = $patternClass.compile($escapeFunc($rightStr, "$escapeChar"));
          ${ev.value} = $pattern.matcher($eval1.toString()).matches();
        """
      })

Contributor Author

I changed
val escapeChar = escapeCharOpt.getOrElse("\\\\")
to
val escapeChar = escapeCharOpt.getOrElse("\\\\\\\\")
The latter is OK.
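For context, a small self-contained sketch (illustrative only, not the actual codegen; BackslashLayers is a made-up name) of why the extra level of escaping is needed: the escape character passes through two string-literal layers, first the Scala source literal and then the generated Java literal, and each layer consumes one round of backslashes.

object BackslashLayers {
  def main(args: Array[String]): Unit = {
    // In Scala source, "\\\\" denotes a runtime string of two backslash characters.
    val spliced = "\\\\"
    assert(spliced.length == 2)

    // Splicing those characters between the quotes of generated Java source
    // yields the literal "\\", which javac compiles to a single backslash.
    val generatedJavaLiteral = "\"" + spliced + "\""
    println(generatedJavaLiteral) // prints "\\"

    // A single runtime backslash would instead yield "\", where the backslash
    // escapes the closing quote and the literal runs onto the next line: that is
    // the "Line break in literal not allowed" failure shown above.
  }
}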

@SparkQA

SparkQA commented Jul 2, 2019

Test build #107105 has finished for PR 25001 at commit 6ac3f21.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangyum
Member

wangyum commented Jul 2, 2019

@beliefer Please re-generate golden files.

org.scalatest.exceptions.TestFailedException: Expected "...(1 AS STRING) LIKE %[]:boolean>", but got "...(1 AS STRING) LIKE %[ \]:boolean>" Schema did not match for query #39 select 1 like '%' FROM t: QueryOutput(select 1 like '%' FROM t,struct<CAST(1 AS STRING) LIKE % \:boolean>,true)

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107105/testReport/org.apache.spark.sql/SQLQueryTestSuite/sql/

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 3, 2019

Please use the following command to rebuild the golden answer files, @beliefer .

SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite"

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-28083][SQL] Enhance ANSI SQL: LIKE predicate: ESCAPE clause. [SPARK-28083][SQL] Support LIKE ... ESCAPE syntax Jul 3, 2019
@SparkQA

SparkQA commented Jul 3, 2019

Test build #107172 has finished for PR 25001 at commit 5303564.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

beliefer commented Jul 3, 2019

@wangyum @dongjoon-hyun Thanks for all your help and review.

docs/sql-keywords.md
@SparkQA

SparkQA commented Jul 4, 2019

Test build #107226 has finished for PR 25001 at commit a0ceae1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -484,7 +484,7 @@ object LikeSimplification extends Rule[LogicalPlan] {
private val equalTo = "([^_%]*)".r

def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
case Like(input, Literal(pattern, StringType)) =>
case Like(input, Literal(pattern, StringType), opt) =>
Member

opt => escapeChar

Contributor Author

OK
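As an aside, a hedged sketch of the idea behind guarding this rule on the escape character (simplifyStartsWith is a hypothetical stand-in, not the actual LikeSimplification code): only rewrite prefix patterns when the default escape character is in effect and does not occur in the pattern.

// Hypothetical, simplified stand-in for the prefix case of LikeSimplification.
def simplifyStartsWith(pattern: String, escapeChar: Char): Option[String => Boolean] = {
  val startsWith = "([^_%]+)%".r
  if (escapeChar != '\\' || pattern.contains(escapeChar)) {
    None // keep the regex-based Like evaluation for non-trivial escapes
  } else pattern match {
    case startsWith(prefix) => Some((s: String) => s.startsWith(prefix))
    case _ => None
  }
}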

@gengliangwang
Member

@beliefer Thanks for changing the parameter data type. The code looks simpler now :)

@SparkQA

SparkQA commented Dec 5, 2019

Test build #114892 has finished for PR 25001 at commit 64e49b7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 5, 2019

Test build #114894 has finished for PR 25001 at commit 9feb25d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Dec 5, 2019

Test build #114898 has finished for PR 25001 at commit 9feb25d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


override def matches(regex: Pattern, str: String): Boolean = regex.matcher(str).matches()

override def toString: String = s"$left LIKE $right"
override def toString: String = s"$left LIKE $right ESCAPE '$escapeChar'"
Contributor

nit: we can skip printing ESCAPE '$escapeChar' if escapeChar == '\\'
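Something along these lines, as a sketch of that nit (assuming escapeChar is a Char field on the expression):

override def toString: String =
  if (escapeChar == '\\') s"$left LIKE $right"
  else s"$left LIKE $right ESCAPE '$escapeChar'"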

Contributor Author

OK

Contributor

@cloud-fan cloud-fan left a comment

LGTM except one comment

Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order
to match "\abc", the pattern should be "\\abc".

When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks
to Spark 1.6 behavior regarding string literal parsing. For example, if the config is
enabled, the pattern to match "\abc" should be "\abc".
* escape - a string added since Spark 3.0. The default escape character is '\'.
Member

nit: string or character?

Contributor Author

OK.

@SparkQA

SparkQA commented Dec 6, 2019

Test build #114925 has finished for PR 25001 at commit 4cc9e0a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Member

Thanks, merging to master

@beliefer
Contributor Author

beliefer commented Dec 6, 2019

@maropu @cloud-fan @gengliangwang @gatorsmile @HyukjinKwon @dongjoon-hyun @Ngone51
Thanks for all your work.

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Dec 6, 2019
Closes apache#25001 from beliefer/ansi-sql-like.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
@maropu
Member

maropu commented Dec 7, 2019

Thanks, all.

gengliangwang pushed a commit that referenced this pull request Dec 12, 2019
### What changes were proposed in this pull request?

Since [25001](#25001), Spark supports the LIKE ... ESCAPE syntax.
But '%' and '_' are reserved characters in the `Like` expression, so we cannot use them as the escape character.

### Why are the changes needed?

Avoid unexpected problems when using the LIKE ... ESCAPE syntax.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Add UT.

Closes #26860 from ulysses-you/SPARK-30230.

Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
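A hedged sketch of the constraint described in that follow-up (validateEscapeChar is a hypothetical check, not the actual parser code):

// '%' and '_' are the LIKE wildcards, so they cannot double as the escape character.
def validateEscapeChar(escapeChar: Char): Unit = {
  require(escapeChar != '%' && escapeChar != '_',
    s"'$escapeChar' is a LIKE wildcard and cannot be used as the escape character")
}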
dongjoon-hyun pushed a commit that referenced this pull request Dec 19, 2019
…capeChar

Since [25001](#25001), Spark supports the LIKE ... ESCAPE syntax.

We should also sync the escape character used by `LikeSimplification`.

Avoid optimization failures.

No.

Add UT.

Closes #26880 from ulysses-you/SPARK-30254.

Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Dec 21, 2019
### What changes were proposed in this pull request?
This PR is a follow-up to #25001

### Why are the changes needed?
No

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Passed Jenkins with the newly updated test files.

Closes #26949 from beliefer/uncomment-like-escape-tests.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>