[SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval" #40253

jiang13021 · 2023-03-02T12:43:22Z

What changes were proposed in this pull request?

This PR aims to ensure "at least one time unit should be given for interval literal" by modifying SqlBaseParser.

This is a backport of #40195

Why are the changes needed?

INTERVAL is a Non-Reserved keyword in Spark. But when I run

scala> spark.sql("select interval from mytable")

I get

org.apache.spark.sql.catalyst.parser.ParseException:
at least one time unit should be given for interval literal(line 1, pos 7)

== SQL ==
select interval from mytable
-------^^^

  at org.apache.spark.sql.errors.QueryParsingErrors$.invalidIntervalLiteralError(QueryParsingErrors.scala:196)
......

This is a bug: "Non-Reserved keywords" have a special meaning only in particular contexts and can be used as identifiers in other contexts. So, by design, INTERVAL can be used as a column name.

Currently the interval's grammar is

interval
    : INTERVAL (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)?
    ;

There is no need to make the time unit optional. We can ensure "at least one time unit should be given for interval literal" in the grammar itself if the interval rule is

interval
    : INTERVAL (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)
    ;
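The effect of making the unit mandatory can be checked directly against the parser. The following is a minimal sketch, assuming a spark-shell (or any classpath with Spark Catalyst) built with this change; the value names are illustrative and not taken from the patch. On a build without the fix, the first `parseExpression` call is expected to throw the ParseException shown above.

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// A bare `interval` no longer matches the interval-literal rule,
// so it should be parsed as an ordinary column reference.
val bare = CatalystSqlParser.parseExpression("interval")
val isColumnRef = bare match {
  case UnresolvedAttribute(Seq("interval")) => true
  case _ => false
}

// An interval followed by a value and a unit should still be parsed as a literal.
val withUnit = CatalystSqlParser.parseExpression("interval 1 day")
val isLiteral = withUnit.isInstanceOf[Literal]
```

Both `isColumnRef` and `isLiteral` should be `true` if the grammar behaves as described.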

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test: PlanParserSuite."SPARK-42553: NonReserved keyword 'interval' can be column name"

Local test

scala> val myDF = spark.sparkContext.makeRDD(1 to 5).toDF("interval")
myDF: org.apache.spark.sql.DataFrame = [interval: int]

scala> myDF.createOrReplaceTempView("mytable")

scala> spark.sql("select interval from mytable;").show()
+--------+
|interval|
+--------+
|       1|
|       2|
|       3|
|       4|
|       5|
+--------+
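A further check one could run in the same session, as a sketch only and not part of the reported test: select the `interval` column together with an interval literal, to confirm that both usages coexist in one query. It assumes the `mytable` view created above; the alias `one_day` is a hypothetical name chosen for illustration.

```scala
// Assumes a spark-shell session where the `mytable` temp view above exists.
// `interval` is read as a column name, `interval 1 day` as an interval literal.
val df = spark.sql("select interval, interval 1 day as one_day from mytable")

// Inspect the resulting column names and types.
df.printSchema()
```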

MaxGekk · 2023-03-02T13:00:32Z

@jiang13021 Thank you for the backport. Could you add the following, please:

The tag [3.3] to PR's title.
This is a backport of https://github.com/apache/spark/pull/40195 to PR's description.

jiang13021 · 2023-03-02T13:23:07Z

> @jiang13021 Thank you for the backport. Could you add the following, please:
>
> The tag [3.3] to PR's title.
> This is a backport of https://github.com/apache/spark/pull/40195 to PR's description.

Done

MaxGekk

Waiting for CI.

MaxGekk · 2023-03-02T15:23:51Z

+1, LGTM. All GAs passed. Merging to 3.3.
Thank you, @jiang13021.

Closes #40253 from jiang13021/branch-3.3-42553.

Authored-by: jiangyzanze <jiangyanze.jyz@alibaba-inc.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>

[SPARK-42553][SQL] Ensure at least one time unit after "interval"

abf0bc8

github-actions bot added the SQL label Mar 2, 2023

jiang13021 mentioned this pull request Mar 2, 2023

[SPARK-42553][SQL] Ensure at least one time unit after "interval" #40195

Closed

jiang13021 changed the title ~~[SPARK-42553][SQL] Ensure at least one time unit after "interval"~~ [SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval" Mar 2, 2023

MaxGekk approved these changes Mar 2, 2023

View reviewed changes

MaxGekk closed this Mar 2, 2023
