[SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval" #40253

Closed
wants to merge 1 commit into from



@jiang13021 jiang13021 commented Mar 2, 2023

What changes were proposed in this pull request?

This PR aims to ensure "at least one time unit should be given for interval literal" by modifying SqlBaseParser.

This is a backport of #40195

Why are the changes needed?

INTERVAL is a non-reserved keyword in Spark, but when I run

scala> spark.sql("select interval from mytable")

I get

org.apache.spark.sql.catalyst.parser.ParseException:
at least one time unit should be given for interval literal(line 1, pos 7)

== SQL ==
select interval from mytable
-------^^^

  at org.apache.spark.sql.errors.QueryParsingErrors$.invalidIntervalLiteralError(QueryParsingErrors.scala:196)
......

This is a bug: non-reserved keywords have a special meaning only in particular contexts and can be used as identifiers everywhere else. By design, INTERVAL should therefore be usable as a column name.

Currently, the grammar rule for interval is

interval
    : INTERVAL (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)?
    ;

There is no need to make the time units optional. Dropping the trailing ? makes the grammar itself enforce "at least one time unit should be given for interval literal", so the rule becomes

interval
    : INTERVAL (errorCapturingMultiUnitsInterval | errorCapturingUnitToUnitInterval)
    ;
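
To make the effect of dropping the ? concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's ANTLR-generated parser: the names IntervalGrammarSketch, parseSelectItem, and the unitsOptional flag are hypothetical and exist only for illustration. With unitsOptional = true the bare keyword is accepted by the literal rule and rejected afterwards; with unitsOptional = false the literal rule cannot match, so the token falls back to being an identifier.

```scala
// Toy model only: this is NOT Spark's parser, just the shape of the ambiguity.
sealed trait Expr
final case class IntervalLiteral(units: List[(String, String)]) extends Expr
final case class ColumnRef(name: String) extends Expr

object IntervalGrammarSketch {
  // A small, illustrative subset of unit names (assumption, not Spark's full list).
  private val timeUnits = Set("year", "month", "day", "hour", "minute", "second")

  // Consume leading (value, unit) pairs such as "1 day 2 hour".
  private def parseUnits(tokens: List[String]): List[(String, String)] = tokens match {
    case value :: unit :: rest
        if value.nonEmpty && value.forall(_.isDigit) && timeUnits(unit.toLowerCase) =>
      (value, unit) :: parseUnits(rest)
    case _ => Nil
  }

  // unitsOptional = true models the old rule  INTERVAL (...)?
  // unitsOptional = false models the new rule INTERVAL (...)
  def parseSelectItem(tokens: List[String], unitsOptional: Boolean): Either[String, Expr] =
    tokens match {
      case kw :: rest if kw.equalsIgnoreCase("interval") =>
        val units = parseUnits(rest)
        if (rest.isEmpty) {
          if (unitsOptional)
            // Old rule: the bare keyword already matches the literal rule,
            // so the failure only shows up as a later validation error.
            Left("at least one time unit should be given for interval literal")
          else
            // New rule: the literal rule cannot match, so treat it as an identifier.
            Right(ColumnRef(kw))
        } else if (units.size * 2 == rest.size) {
          Right(IntervalLiteral(units))
        } else {
          Left(s"cannot parse interval units: ${rest.mkString(" ")}")
        }
      case name :: Nil => Right(ColumnRef(name))
      case other       => Left(s"unsupported select item: ${other.mkString(" ")}")
    }
}
```

In this sketch, parseSelectItem(List("interval"), unitsOptional = false) yields a ColumnRef, while a literal such as List("interval", "1", "day") still yields an IntervalLiteral under either setting.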

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test: PlanParserSuite."SPARK-42553: NonReserved keyword 'interval' can be column name"

Local test

scala> val myDF = spark.sparkContext.makeRDD(1 to 5).toDF("interval")
myDF: org.apache.spark.sql.DataFrame = [interval: int]

scala> myDF.createOrReplaceTempView("mytable")

scala> spark.sql("select interval from mytable;").show()
+--------+
|interval|
+--------+
|       1|
|       2|
|       3|
|       4|
|       5|
+--------+

@MaxGekk
Member

MaxGekk commented Mar 2, 2023

@jiang13021 Thank you for the backport. Could you add the following, please:

  1. The tag [3.3] to the PR's title.
  2. "This is a backport of https://github.com/apache/spark/pull/40195" to the PR's description.

@jiang13021 jiang13021 changed the title from "[SPARK-42553][SQL] Ensure at least one time unit after "interval"" to "[SPARK-42553][SQL][3.3] Ensure at least one time unit after "interval"" Mar 2, 2023
@jiang13021
Author

@jiang13021 Thank you for the backport. Could you add the following, please:

  1. The tag [3.3] to the PR's title.
  2. "This is a backport of https://github.com/apache/spark/pull/40195" to the PR's description.

Done

Member

@MaxGekk MaxGekk left a comment


Waiting for CI.

@MaxGekk
Member

MaxGekk commented Mar 2, 2023

+1, LGTM. All GAs passed. Merging to 3.3.
Thank you, @jiang13021.

MaxGekk pushed a commit that referenced this pull request Mar 2, 2023
Closes #40253 from jiang13021/branch-3.3-42553.

Authored-by: jiangyzanze <jiangyanze.jyz@alibaba-inc.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@MaxGekk MaxGekk closed this Mar 2, 2023