[SPARK-33597][SQL] Support REGEXP_LIKE for consistent with mainstream databases #30543

Closed
wants to merge 41 commits into from

beliefer
Contributor

@beliefer beliefer commented Nov 30, 2020

What changes were proposed in this pull request?

Many mainstream databases support the regex function REGEXP_LIKE.
Spark already supports RLike, so we only need to add a new alias, REGEXP_LIKE, for it.
Oracle
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-D2124F3A-C6E4-4CCA-A40E-2FFCABFD8E19
Presto
https://prestodb.io/docs/current/functions/regexp.html
Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/REGEXP_LIKE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CRegular%20Expression%20Functions%7C_____5
Snowflake
https://docs.snowflake.com/en/sql-reference/functions/regexp_like.html
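
To make the idea of an alias concrete, here is a minimal sketch in Python rather than Spark's own code. It assumes that RLike returns true when the pattern matches any part of the string and returns NULL when either argument is NULL; Python's `re.search` stands in for the Java regex engine, whose syntax differs in some details. The names `rlike` and `FUNCTIONS` are hypothetical and exist only for this illustration.

```python
import re
from typing import Callable, Dict, Optional


def rlike(text: Optional[str], pattern: Optional[str]) -> Optional[bool]:
    """Stand-in for RLike: True if `pattern` matches anywhere in `text`."""
    if text is None or pattern is None:
        # Assumption: NULL input yields NULL output.
        return None
    return re.search(pattern, text) is not None


# An alias is a second name bound to the same implementation,
# so both names always behave identically.
FUNCTIONS: Dict[str, Callable[[Optional[str], Optional[str]], Optional[bool]]] = {
    "rlike": rlike,
    "regexp_like": rlike,
}

print(FUNCTIONS["rlike"]("Spark SQL", "S.*L"))
print(FUNCTIONS["regexp_like"]("Spark SQL", "S.*L"))
```

Because both registry entries point at one function, any future fix to the matching logic applies to both names at once.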

Additional modifications

  1. The test case named "check outputs of expression examples" in ExpressionInfoSuite executes the example SQL of each built-in function, so the following SQL would be executed:
    SELECT '%SystemDrive%\Users\John' regexp_like '%SystemDrive%\\Users.*'
    But Spark SQL does not support this syntax yet.
  2. Another reason: SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*'; is SQL syntax, not a use case for the function RLike.
    For the reasons above, this PR changes the example SQL of RLike.
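
The problem described above can be sketched as follows, assuming the example text is rendered by substituting the function name for the `_FUNC_` placeholder. `INFIX_TEMPLATE` is the example quoted in this description; `CALL_TEMPLATE` and `render_example` are hypothetical and written only for illustration, not taken from the patch.

```python
# Example template quoted in the PR description (infix form).
INFIX_TEMPLATE = r"SELECT '%SystemDrive%\Users\John' _FUNC_ '%SystemDrive%\\Users.*'"

# Hypothetical function-call template, for illustration only.
CALL_TEMPLATE = r"SELECT _FUNC_('%SystemDrive%\Users\John', '%SystemDrive%\\Users.*')"


def render_example(template: str, function_name: str) -> str:
    """Substitute the placeholder with a concrete function name."""
    return template.replace("_FUNC_", function_name)


for name in ("rlike", "regexp_like"):
    # The infix rendering only works if `name` is also an infix keyword.
    print(render_example(INFIX_TEMPLATE, name))
    # The call rendering has the same shape for every registered alias.
    print(render_example(CALL_TEMPLATE, name))
```

With the infix template, the `regexp_like` rendering is exactly the statement that item 1 says Spark SQL does not support, which is why an example shared by several aliases is safer in function-call form.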

Why are the changes needed?

No

Does this PR introduce any user-facing change?

Make the behavior of Spark SQL consistent with mainstream databases.

How was this patch tested?

Jenkins test

beliefer and others added 30 commits June 19, 2020 10:36

SparkQA commented Dec 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37089/


SparkQA commented Dec 9, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37089/


SparkQA commented Dec 9, 2020

Test build #132487 has finished for PR 30543 at commit dc4f946.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

retest this please


SparkQA commented Dec 10, 2020

Test build #132528 has finished for PR 30543 at commit dc4f946.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37130/


SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37130/

@beliefer
Contributor Author

retest this please


SparkQA commented Dec 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37138/


SparkQA commented Dec 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37138/


SparkQA commented Dec 10, 2020

Test build #132534 has finished for PR 30543 at commit dc4f946.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

cc @cloud-fan @HyukjinKwon


SparkQA commented Dec 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37525/


SparkQA commented Dec 17, 2020

Test build #132923 has finished for PR 30543 at commit e1cd719.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Dec 17, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37525/

@beliefer
Contributor Author

retest this please


SparkQA commented Dec 17, 2020

Test build #132932 has finished for PR 30543 at commit e1cd719.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Contributor Author

retest this please


SparkQA commented Dec 17, 2020

Test build #132946 has finished for PR 30543 at commit e1cd719.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class SparkPod(pod: Pod, container: Container)
      • trait KubernetesFeatureConfigStep
      • public class Distributions
      • trait CheckAnalysis extends PredicateHelper with LookupCatalog
      • case class UnresolvedView(
      • case class Decode(params: Seq[Expression], child: Expression) extends RuntimeReplaceable
      • case class StringDecode(bin: Expression, charset: Expression)
      • case class NoopCommand(
      • case class ShowTableExtended(
      • case class AlterTableRecoverPartitions(child: LogicalPlan) extends Command
      • case class DropView(
      • case class RepairTable(child: LogicalPlan) extends Command
      • case class AlterViewAs(
      • case class AlterViewSetProperties(
      • case class AlterViewUnsetProperties(
      • case class CacheTable(
      • case class CacheTableAsSelect(
      • case class UncacheTable(
      • case class SubqueryExec(name: String, child: SparkPlan, maxNumRows: Option[Int] = None)
      • trait BaseCacheTableExec extends V2CommandExec
      • case class CacheTableExec(
      • case class CacheTableAsSelectExec(
      • case class UncacheTableExec(

@beliefer
Contributor Author

retest this please

Member

@HyukjinKwon HyukjinKwon left a comment

I think it's okay, seems fine. cc @cloud-fan and @gatorsmile FYI


SparkQA commented Dec 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37584/


SparkQA commented Dec 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37584/


SparkQA commented Dec 18, 2020

Test build #132984 has finished for PR 30543 at commit e1cd719.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • case class SparkPod(pod: Pod, container: Container)
      • trait KubernetesFeatureConfigStep
      • public class Distributions
      • trait CheckAnalysis extends PredicateHelper with LookupCatalog
      • case class UnresolvedView(
      • case class Decode(params: Seq[Expression], child: Expression) extends RuntimeReplaceable
      • case class StringDecode(bin: Expression, charset: Expression)
      • case class NoopCommand(
      • case class ShowTableExtended(
      • case class AlterTableRecoverPartitions(child: LogicalPlan) extends Command
      • case class DropView(
      • case class RepairTable(child: LogicalPlan) extends Command
      • case class AlterViewAs(
      • case class AlterViewSetProperties(
      • case class AlterViewUnsetProperties(
      • case class CacheTable(
      • case class CacheTableAsSelect(
      • case class UncacheTable(
      • case class SubqueryExec(name: String, child: SparkPlan, maxNumRows: Option[Int] = None)
      • trait BaseCacheTableExec extends V2CommandExec
      • case class CacheTableExec(
      • case class CacheTableAsSelectExec(
      • case class UncacheTableExec(

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f239128 Dec 18, 2020