Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28880][SQL] Support ANSI nested bracketed comments #27495

Closed
wants to merge 28 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Feb 8, 2020

What changes were proposed in this pull request?

Spark SQL support single comments and bracketed comments now. This PR will support nested bracketed comments.

There are some mainstream database support the syntax.
PostgreSQL:
https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS

Vertica:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Expressions/Comments.htm?zoom_highlight=comments

Note: Because Spark SQL not exists UT for single comments and bracketed comments, so I add some UT for them.

Why are the changes needed?

nested bracketed comments is ANSI standard.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New UT

@gatorsmile
Copy link
Member

cc @maropu @gengliangwang @cloud-fan

@beliefer
Copy link
Contributor Author

beliefer commented Feb 8, 2020

@gatorsmile Thank you.

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118054 has finished for PR 27495 at commit 464cd03.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Feb 8, 2020

I personally think that these supported comment syntax are documented in the Spark SQL Guide (along with the PostgreSQL/Vertica docs above). WDYT? cc: @dilipbiswal


test("bracketed comment case one") {
val plan = table("a").select(star())
assertEqual("/* This is an example of SQL which should not execute:\n" +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you follow a format like this?

    assertEqual(
      """
        |/*
        |  XXX
        | */
        |SELECT ...
      """.stripMargin,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Thanks.

@@ -1794,7 +1794,7 @@ BRACKETED_EMPTY_COMMENT
;

BRACKETED_COMMENT
: '/*' ~[+] .*? '*/' -> channel(HIDDEN)
: '/*' ~[+] ( ~'/' | ~'*' '/' ~'*' )*? BRACKETED_COMMENT? ( ~'/' | ~'*' '/' ~'*' )*? '*/' -> channel(HIDDEN)
Copy link
Member

@maropu maropu Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

: '/*' ~[+] (BRACKETED_COMMENT|.)*? '*/' -> channel(HIDDEN) ?

Copy link
Contributor Author

@beliefer beliefer Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry. Maybe I lost some information.
I try to use this again and found it works well.

@maropu
Copy link
Member

maropu commented Feb 8, 2020

Can you remove the jira numbers below?

-- [SPARK-28880] ANSI SQL: Bracketed comments

-- [SPARK-28880] ANSI SQL: Bracketed comments

@beliefer
Copy link
Contributor Author

beliefer commented Feb 8, 2020

Can you remove the jira numbers below?

-- [SPARK-28880] ANSI SQL: Bracketed comments

-- [SPARK-28880] ANSI SQL: Bracketed comments

I want remove the jira numbers with #27481. Golden files can't generated correctly.

@maropu
Copy link
Member

maropu commented Feb 8, 2020

If the golden file is not correct, we should fix this in this pr because SPARK-28880 is a ticket to support nested bracketed comments and this means the fix makes the file correct. To resolve SPARK-28880, I think we need to check if the golden file is correct.

@beliefer
Copy link
Contributor Author

beliefer commented Feb 8, 2020

If the golden file is not correct, we should fix this in this pr because SPARK-28880 is a ticket to support nested bracketed comments and this means the fix makes the file correct. To resolve SPARK-28880, I think we need to check if the golden file is correct.

Could I create another ticket and update golden file with the new ticket id?

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118063 has finished for PR 27495 at commit c782bac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Feb 8, 2020

Since that's a minor fix, plz include it in this pr.

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118066 has finished for PR 27495 at commit 0a21b5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118069 has finished for PR 27495 at commit 1b9d1da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118070 has finished for PR 27495 at commit 3d03688.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 8, 2020

Test build #118071 has finished for PR 27495 at commit 596d905.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

beliefer commented Feb 8, 2020

Since that's a minor fix, plz include it in this pr.

I have no idea that SQLQueryTestSuite treats nested bracketed comments well now.
#27481 treats bracketed comments well now.
I don't want the work for SQLQueryTestSuite blocks support for nested bracketed comments.
And I think these are two things.

@maropu
Copy link
Member

maropu commented Feb 9, 2020

Ah, I got it now. The reason not to generate the output correctly is an issue in the testing logic itself in SQLQueryTestSuite, right? (I thought that's an issue in how-to-write postgreSQL/comments.sql). If so, they're different issues as you said above.

@beliefer
Copy link
Contributor Author

beliefer commented Feb 9, 2020

Ah, I got it now. The reason not to generate the output correctly is an issue in the testing logic itself in SQLQueryTestSuite, right? (I thought that's an issue in how-to-write postgreSQL/comments.sql). If so, they're different issues as you said above.

@maropu Yes, thank you. I will throw exception in golden files until we can fix it.

@SparkQA
Copy link

SparkQA commented Feb 9, 2020

Test build #118095 has finished for PR 27495 at commit 7760f91.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

@cloud-fan This is a very very good idea. I learned it and will make a try.

@beliefer
Copy link
Contributor Author

@cloud-fan
'/*' (BRACKETED_COMMENT|.)*? '*/' {isHint()}? -> channel(HIDDEN)
exists a issue that antlr matches bracketed comments or hint first and then call method isHint, so the matches will throw parse exception first.

@beliefer
Copy link
Contributor Author

@cloud-fan @gengliangwang
I improved the code show below and it works well.

public boolean isHint() {
   int firstChar = _input.LA(1);
   if (firstChar == '+') {
     return true;
   } else {
     return false;
   }
}
'/*' {!isHint()}? (BRACKETED_COMMENT|.)*? '*/' -> channel(HIDDEN)

* Returns true if the first character is '+'.
*/
public boolean isHint() {
int firstChar = _input.LA(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_input.LA(1) returns next char, shall we name it nextChar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@@ -61,6 +61,19 @@ grammar SqlBase;
* When true, the behavior of keywords follows ANSI SQL standard.
*/
public boolean SQL_standard_keyword_behavior = false;

/**
* Verify whether current token is a valid hint token (which follows '/*' and is '+').
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to enrich this comment:

This is called when we see '/*' and try to match it as a comment. If the next char is '+', this should
be parsed as hint later and we can't match it as a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better. Thanks.

@@ -1797,11 +1810,11 @@ SIMPLE_COMMENT
;

BRACKETED_EMPTY_COMMENT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(BRACKETED_COMMENT|.)*? should work for empty comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need.

@SparkQA
Copy link

SparkQA commented Feb 21, 2020

Test build #118785 has finished for PR 27495 at commit 660cd34.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.


/**
* This method will be called when we see '/*' and try to match it as a bracketed comment.
* If the next character is '+', it should be parsed as hint later, otherwise we cannot match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise -> and

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

* If the next character is '+', it should be parsed as hint later, otherwise we cannot match
* it as a bracketed comment.
*
* Returns true if the first character is '+'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next character

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault.

@SparkQA
Copy link

SparkQA commented Feb 21, 2020

Test build #118788 has finished for PR 27495 at commit 48814c3.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 21, 2020

Test build #118789 has finished for PR 27495 at commit 48814c3.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 21, 2020

Test build #118790 has finished for PR 27495 at commit 48814c3.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@beliefer
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Feb 21, 2020

Test build #118786 has finished for PR 27495 at commit 9c856b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 21, 2020

Test build #118798 has finished for PR 27495 at commit 48814c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@gengliangwang gengliangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the work!

@gengliangwang
Copy link
Member

Merging to master.

@beliefer
Copy link
Contributor Author

@cloud-fan @maropu @gengliangwang @dongjoon-hyun @gatorsmile Thanks for all your help.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?
Although Spark SQL support bracketed comments, but `SQLQueryTestSuite` can't treat bracketed comments well and lead to generated golden files can't display bracketed comments well.
This PR will improve the treatment of bracketed comments and add three test case in `PlanParserSuite`.
Spark SQL can't support nested bracketed comments and apache#27495 used to support it.

### Why are the changes needed?
Golden files can't display well.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
New UT.

Closes apache#27481 from beliefer/ansi-brancket-comments.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?
Spark SQL support single comments and bracketed comments now. This PR will support nested bracketed comments.

There are some mainstream database support the syntax.
**PostgreSQL:**
https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS

**Vertica:**
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Expressions/Comments.htm?zoom_highlight=comments

Note: Because Spark SQL not exists UT for single comments and bracketed comments, so I add some UT for them.

### Why are the changes needed?
nested bracketed comments is ANSI standard.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
New UT

Closes apache#27495 from beliefer/nested-brancket-comments.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
@beliefer beliefer deleted the nested-brancket-comments branch April 23, 2024 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants