Skip to content

[SPARK-27998][SQL] Column alias should support quote string#24840

Closed
zhulipeng wants to merge 1 commit intoapache:masterfrom
zhulipeng:SPARK-27998
Closed

[SPARK-27998][SQL] Column alias should support quote string#24840
zhulipeng wants to merge 1 commit intoapache:masterfrom
zhulipeng:SPARK-27998

Conversation

@zhulipeng
Copy link
Contributor

What changes were proposed in this pull request?

According to the ANSI SQL standard, column alias can be double quote string but SparkSQL only support backquote string.

SELECT au_fname AS "First name", 
au_lname AS 'Last name', 
city AS City, 
state, 
zip 'Postal code' FROM authors;

However, SparkSQL's syntax is different from others DB engines.

  • MySQL support Backquote(``), Single quote (''), Double quote("").
  • SQL Server support Single quote (''), Double quote("").
  • Teradata, Oracle, PostgreSQL only support Double quote("").

How was this patch tested?

Pass the Jenkins with the updated test cases.

Copy link
Member

@wangyum wangyum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lipzhu Could you add some test cases?

@dongjoon-hyun
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Jun 11, 2019

Test build #106390 has finished for PR 24840 at commit dfd79f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

liancheng commented Jun 11, 2019

@lipzhu ANSI SQL uses double quotes to quote identifiers with special characters. Basically, double-quotes in ANSI SQL is equivalent to backquotes in Spark SQL while in ANSI SQL single-quotes are for string literals.

That being said, I don't think ANSI SQL ever allows string literals to be used as aliases. It has to be a possibly quoted identifier. The SQL '03 grammar rule also suggests that.

In the future, if we have something like "SQL dialect profile" that allows users to switch between different dialects and use double-quotes for quoting identifiers and single-quotes for quoting strings, then we can have this syntax. But right now, the syntax change proposed in this PR is inconsistent and violates ANSI SQL.

@zhulipeng
Copy link
Contributor Author

@liancheng , thanks for your comments on this PR.

ANSI SQL uses double quotes to quote identifiers with special characters. Basically, double-quotes in ANSI SQL is equivalent to backquotes in Spark SQL while in ANSI SQL single-quotes are for string literals

Only MySQL support backquote as column alias. The other database engines(Include MySQL) all support double quote. SparkSQL will throw Parser exception, this is a not good user experience.

In the future, if we have something like "SQL dialect profile" that allows users to switch between different dialects and use double-quotes for quoting identifiers and single-quotes for quoting strings, then we can have this syntax

Referring to the "SQL dialect profile", seems current SparkSQL grammar is like MySQL style. Do you think is it reasonable to move SparkSQL's grammar closer to the DW databases(Teradata/Redshift/Vertica)? SQL similar will reduce a lot of effort for user to switch between
SparkSQL and DW db engines.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon
Copy link
Member

I think we should leave it not fixed for now and decide which DBMS we will follow (or allow via a configuration like PostgreSQL). Let's don't fix it for now.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just
a way of keeping the PR queue manageable.

If you'd like to revive this PR, please reopen it!

@github-actions github-actions bot added the Stale label Dec 28, 2019
@github-actions github-actions bot closed this Dec 29, 2019
@AngersZhuuuu
Copy link
Contributor

@lipzhu Your change will make name alias support both '' and "" not only "".
In presto and posgresSQL, '' as STRING can't as identifier, "" can be a identifier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants