[SPARK-27998][SQL] Column alias should support quote string #24840
zhulipeng wants to merge 1 commit into apache:master from zhulipeng:SPARK-27998
Conversation
wangyum
left a comment
@lipzhu Could you add some test cases?
|
ok to test |
|
Test build #106390 has finished for PR 24840 at commit
|
@lipzhu ANSI SQL uses double quotes to quote identifiers with special characters. Basically, double quotes in ANSI SQL are equivalent to backquotes in Spark SQL, while single quotes in ANSI SQL are for string literals. That said, I don't think ANSI SQL ever allows string literals to be used as aliases; an alias has to be a (possibly quoted) identifier. The SQL '03 grammar rule also suggests that. In the future, if we have something like a "SQL dialect profile" that allows users to switch between dialects and use double quotes for quoting identifiers and single quotes for quoting strings, then we can have this syntax. But right now, the syntax change proposed in this PR is inconsistent and violates ANSI SQL. |
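To make the three quoting styles concrete, here is a minimal sketch. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in, since it happens to accept both double-quoted and backquoted identifiers; it is not Spark SQL and says nothing about how Spark's parser behaves. The helper name `alias_of` is made up for this illustration.

```python
import sqlite3

# Assumption: SQLite is only a convenient stand-in here; it is not Spark SQL.
conn = sqlite3.connect(":memory:")


def alias_of(sql):
    """Return the name of the first result column of a query (hypothetical helper)."""
    cur = conn.execute(sql)
    return cur.description[0][0]


# ANSI style: double quotes delimit an identifier, so this names the column.
ansi_alias = alias_of('SELECT 1 AS "my col"')

# MySQL / Spark SQL style: backquotes delimit an identifier.
backquote_alias = alias_of("SELECT 1 AS `my col`")

# Single quotes delimit a string literal, which is a value rather than a name.
literal_value = conn.execute("SELECT 'my col'").fetchone()[0]

print(ansi_alias, backquote_alias, literal_value, sep=" | ")
```

In this sketch the first two queries produce a column named `my col`, while the third returns the text `my col` as data, which is the identifier-versus-literal distinction described above.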
|
@liancheng , thanks for your comments on this PR.
Only MySQL supports backquotes for column aliases. The other database engines (MySQL included) all support double quotes. SparkSQL throws a parser exception instead, which is not a good user experience.
Regarding the "SQL dialect profile": the current SparkSQL grammar seems to follow the MySQL style. Do you think it is reasonable to move SparkSQL's grammar closer to the DW databases (Teradata/Redshift/Vertica)? Similar SQL would reduce a lot of effort for users to switch between |
|
Can one of the admins verify this patch? |
|
I think we should leave it unfixed for now and decide which DBMS we will follow (or allow it via a configuration, like PostgreSQL). Let's not fix it for now. |
|
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it! |
|
@lipzhu Your change will make name alias support both |
What changes were proposed in this pull request?
According to the ANSI SQL standard, a column alias can be a double-quoted string, but SparkSQL only supports backquoted strings.
However, SparkSQL's syntax is different from other DB engines.
How was this patch tested?
Passes Jenkins with the updated test cases.