-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26215][SQL] Define reserved/non-reserved keywords based on the ANSI SQL standard #23259
Conversation
To discuss this topic smoothly, I made this pr. |
@@ -769,7 +774,7 @@ nonReserved | |||
| REVOKE | GRANT | LOCK | UNLOCK | MSCK | REPAIR | RECOVER | EXPORT | IMPORT | LOAD | VALUES | COMMENT | ROLE | |||
| ROLES | COMPACTIONS | PRINCIPALS | TRANSACTIONS | INDEX | INDEXES | LOCKS | OPTION | LOCAL | INPATH | |||
| ASC | DESC | LIMIT | RENAME | SETS | |||
| AT | NULLS | OVERWRITE | ALL | ANY | ALTER | AS | BETWEEN | BY | CREATE | DELETE | |||
| AT | NULLS | OVERWRITE | ANY | ALTER | AS | BETWEEN | BY | CREATE | DELETE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't ANY
move to reserved
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, thanks. you're right.
Test build #99854 has finished for PR 23259 at commit
|
Retest this please. |
cc @hvanhovell , @mgaido91 , too. |
Test build #99880 has finished for PR 23259 at commit
|
thanks @maropu for starting it!
I think SQL 2011 is good, but if we can't find a public version, maybe it's also OK to follow postgres
I think it should be
I think the new files you created in this PR is a good place to document it |
+1 for SQL 2011. I downloaded the standard but I couldn't find any section dedicated to, In postgres doc, though, they are stating that they are not following the standard strictly: https://www.postgresql.org/docs/11/sql-keywords-appendix.html. Shall we follow that list and follow the standard as it is mentioned there? |
ping @maropu Anything is blocking this PR? |
I'm working on this now, so I'll update in a few days. |
01bc383
to
46029b2
Compare
46029b2
to
bbd5990
Compare
| SERDEPROPERTIES | SET | SETS | SHOW | SKEWED | SORT | SORTED | START | STATISTICS | STORED | STRATIFY | ||
| STRUCT | TABLE | TABLES | TABLESAMPLE | TBLPROPERTIES | TEMPORARY | TERMINATED | THEN | TO | TOUCH | ||
| TRAILING | TRANSACTION | TRANSACTIONS | TRANSFORM | TRUE | TRUNCATE | UNARCHIVE | UNBOUNDED | UNCACHE | ||
| UNLOCK | UNSET | USE | VALUES | VIEW | WHEN | WHERE | WINDOW | WITH | ||
; |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
| DATABASE | SELECT | FROM | WHERE | HAVING | TO | TABLE | WITH | NOT | ||
| DIRECTORY | ||
| BOTH | LEADING | TRAILING | ||
: ADD | AFTER | ALL | ALTER | ANALYZE | AND | ANY | ARCHIVE | ARRAY | AS | ASC | AT | BETWEEN | BOTH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To compare to the reserved
entries easily, I just sorted the nonReserved
ones in an alphabetical order.
bbd5990
to
71455d8
Compare
Not finished yet, so I need more time to brush up code (some code is still wrong...) |
Test build #101889 has finished for PR 23259 at commit
|
Test build #101890 has finished for PR 23259 at commit
|
Test build #101892 has finished for PR 23259 at commit
|
15c2046
to
42740b8
Compare
Test build #101985 has finished for PR 23259 at commit
|
42740b8
to
80005cb
Compare
Test build #101986 has finished for PR 23259 at commit
|
80005cb
to
f6bf2e0
Compare
Test build #102009 has finished for PR 23259 at commit
|
ok, ready to review, anyone could do this? |
retest this please |
Test build #102599 has finished for PR 23259 at commit
|
retest this please |
Test build #102602 has finished for PR 23259 at commit
|
retest this please |
Test build #102605 has finished for PR 23259 at commit
|
I'm fixing the test failure... |
Test build #102607 has finished for PR 23259 at commit
|
displayTitle: SQL Reserved/Non-Reserved Keywords | ||
--- | ||
|
||
When `spark.sql.parser.ansi.enabled` is set to true (false by default), some keywords are reserved for Spark SQL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Spark SQL there are 2 kinds of keywords: non-reserved and reserved. Non-reserved keywords have a
special meaning only in particular contexts and can be used as identifiers (e.g., table names, view names,
column names, column aliases, table aliases) in other contexts. Reserved keywords can't be used as
table alias, but can be used as other identifiers.
The list of reserved and non-reserved keywords can change according to the config
`spark.sql.parser.ansi.enabled`, which is false by default.
| ANTI | FULL | INNER | LEFT | SEMI | RIGHT | NATURAL | JOIN | CROSS | ON | ||
| UNION | INTERSECT | EXCEPT | SETMINUS | ||
| {ansi}? ansiReserved | ||
| {!ansi}? reserved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe defaultReserved
// Let's say you add a new token `NEWTOKEN` and this is not reserved regardless of a `spark.sql.parser.ansi.enabled` | ||
// value. In this case, you must add a token `NEWTOKEN` in both `ansiNonReserved` and `nonReserved`. | ||
|
||
// A list of the reserved keywords below in Spark SQL. These keywords cannot be used for identifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The list of the reserved keywords when `spark.sql.parser.ansi.enabled` is true. Currently, we only reserve
// the ANSI keywords that almost all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011,
// and SQL-2016) and PostgreSQL reserve.
; | ||
|
||
// When `spark.sql.parser.ansi.enabled` = true, the `ansiNonReserved` keywords can be used for identifiers. | ||
// Otherwise (`spark.sql.parser.ansi.enabled` = false), we follow the existing Spark SQL behaviour until v3.0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list of the non-reserved keywords when `spark.sql.parser.ansi.enabled` is true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
val keyword = ctx.getText | ||
if (ctx.ansiReserved() != null) { | ||
throw new ParseException( | ||
s"'$keyword' is reserved and you cannot use this keyword as an identifier.", ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, do we need to do it in this PR? I think this PR just changes the list of non-reserved/reserved keywords according to the config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah..., ok. But, we have no behaivour change between ansi=true and ansi=false now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I reverted the unrelated stuffs.
@@ -53,10 +55,96 @@ class TableIdentifierParserSuite extends SparkFunSuite { | |||
"bigint", "binary", "boolean", "current_date", "current_timestamp", "date", "double", "float", | |||
"int", "smallint", "timestamp", "at", "position", "both", "leading", "trailing", "extract") | |||
|
|||
val hiveStrictNonReservedKeyword = Seq("anti", "full", "inner", "left", "semi", "right", | |||
val hiveStrictNonReservedKeywords = Seq("anti", "full", "inner", "left", "semi", "right", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, seems we don't have reserved keywords at all before this PR. We only have non-reserved keywords and strict-non-reserved keywords...
LGTM. This PR has only a small behavior change when ansi mode is on: some keywords can't be used in table aliases. In the followup we can forbid reserved keywords as identifiers, when ansi mode is on. |
Thanks for the active review! |
Test build #102615 has finished for PR 23259 at commit
|
Test build #102627 has finished for PR 23259 at commit
|
retest this please |
Test build #102629 has finished for PR 23259 at commit
|
Thanks! Merged to master. |
I'll created a new jira for the followup: https://issues.apache.org/jira/browse/SPARK-26976 |
… ANSI SQL standard ## What changes were proposed in this pull request? This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`. NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities. References: - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes #23259 from maropu/SPARK-26215-WIP. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
…mode is on ## What changes were proposed in this pull request? This pr added code to forbid reserved keywords as identifiers when ANSI mode is on. This is a follow-up of SPARK-26215(#23259). ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes #23880 from maropu/SPARK-26976. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
… ANSI SQL standard ## What changes were proposed in this pull request? This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in `docs/sql-reserved-and-non-reserved-key-words.md`. NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities. References: - The reserved/non-reserved SQL keywords in the ANSI SQL standards: https://developer.mimer.com/wp-content/uploads/2018/05/Standard-SQL-Reserved-Words-Summary.pdf - SQL Key Words in PostgreSQL: https://www.postgresql.org/docs/current/sql-keywords-appendix.html ## How was this patch tested? Added tests in `TableIdentifierParserSuite`. Closes apache#23259 from maropu/SPARK-26215-WIP. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?
This pr targeted to define reserved/non-reserved keywords for Spark SQL based on the ANSI SQL standards and the other database-like systems (e.g., PostgreSQL). We assume that they basically follow the ANSI SQL-2011 standard, but it is slightly different between each other. Therefore, this pr documented all the keywords in
docs/sql-reserved-and-non-reserved-key-words.md
.NOTE: This pr only added a small set of keywords as reserved ones and these keywords are reserved in all the ANSI SQL standards (SQL-92, SQL-99, SQL-2003, SQL-2008, SQL-2011, and SQL-2016) and PostgreSQL. This is because there is room to discuss which keyword should be reserved or not, .e.g., interval units (day, hour, minute, second, ...) are reserved in the ANSI SQL standards though, they are not reserved in PostgreSQL. Therefore, we need more researches about the other database-like systems (e.g., Oracle Databases, DB2, SQL server) in follow-up activities.
References:
How was this patch tested?
Added tests in
TableIdentifierParserSuite
.