[SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql #29982

turboFei · 2020-10-09T07:17:08Z

What changes were proposed in this pull request?

Currently, spark-sql fails to parse SQL statements whose bracketed comments contain a semicolon.
For example, the SQL statements:

/* SELECT 'test'; */
SELECT 'test';

would be split into two statements:
The first one: /* SELECT 'test'
The second one: */ SELECT 'test'

spark-sql then throws an exception because the first statement is invalid.
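To make the failure concrete, here is a minimal sketch of a splitter that cuts on every semicolon without looking at comments. `NaiveSplit` is a hypothetical name used only for illustration; it is not the code in SparkSQLCLIDriver.scala.

```scala
object NaiveSplit {
  // Hypothetical illustration only: split on every ';' with no awareness
  // of comments or quotes, then drop empty fragments.
  def split(sql: String): List[String] =
    sql.split(";").map(_.trim).filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    val input = "/* SELECT 'test'; */\nSELECT 'test';"
    // Prints each fragment between markers so the cut point is visible.
    split(input).foreach(fragment => println(s"[$fragment]"))
  }
}
```

With the input from the description, the cut lands inside the comment, so the first fragment is an unterminated comment and the second one starts with the comment terminator.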
In this PR, we ignore the content of bracketed comments while splitting SQL statements.
In addition, we ignore comments without any content.
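The idea can be sketched roughly as follows. This is a simplified, hypothetical helper (`CommentAwareSplit` is an assumed name), not the actual change in this PR: it does not handle quotes, `--` line comments, or nested bracketed comments, and it reads "comment without any content" as a fragment that holds nothing but comments and whitespace.

```scala
import scala.collection.mutable.ListBuffer

object CommentAwareSplit {
  // Hypothetical sketch: split on ';' only outside bracketed comments and
  // drop fragments that contain no code (only comments and whitespace).
  def split(sql: String): List[String] = {
    val statements = ListBuffer.empty[String]
    val current = new StringBuilder
    var inComment = false // currently inside a bracketed comment
    var hasCode = false   // saw a non-whitespace character outside comments
    var i = 0
    while (i < sql.length) {
      if (inComment) {
        if (sql.startsWith("*/", i)) {
          current.append("*/")
          inComment = false
          i += 2
        } else {
          // Everything inside the comment, including ';', is copied as-is.
          current.append(sql.charAt(i))
          i += 1
        }
      } else if (sql.startsWith("/*", i)) {
        current.append("/*")
        inComment = true
        i += 2
      } else {
        val c = sql.charAt(i)
        if (c == ';') {
          if (hasCode) statements += current.toString.trim
          current.clear()
          hasCode = false
        } else {
          if (!c.isWhitespace) hasCode = true
          current.append(c)
        }
        i += 1
      }
    }
    // Keep a trailing fragment that has code but no terminating ';'.
    if (hasCode) statements += current.toString.trim
    statements.toList
  }
}
```

For the example above, the sketch returns a single statement that still carries the leading comment, and it leaves that comment for the SQL parser to skip.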

Why are the changes needed?

spark-sql may split statements at semicolons inside bracketed comments, which is incorrect.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UT.

maropu · 2020-10-09T13:31:55Z

If a comment /* ... */ includes ;, it will throw an exception? The comment style is already supported, so the title and the description look confusing. I think this is just a bug.

maropu · 2020-10-09T13:32:04Z

ok to test

SparkQA · 2020-10-09T14:08:01Z

Test build #129585 has finished for PR 29982 at commit 084fc37.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-09T14:18:57Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34190/

SparkQA · 2020-10-09T14:43:59Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34190/

turboFei · 2020-10-09T16:01:02Z

@maropu Thanks for your comments, I have modified the title.

SparkQA · 2020-10-10T06:11:54Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34219/

SparkQA · 2020-10-10T06:13:14Z

Test build #129618 has finished for PR 29982 at commit 23437dd.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-10T06:13:15Z

Test build #129616 has finished for PR 29982 at commit 6b225ec.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-10T06:18:29Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34222/

SparkQA · 2020-10-10T06:34:21Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34219/

SparkQA · 2020-10-10T06:36:49Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34222/

SparkQA · 2020-10-10T09:05:59Z

Test build #129621 has finished for PR 29982 at commit bc1bb37.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-10-10T09:13:31Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34225/

SparkQA · 2020-10-10T09:37:29Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34225/

...e-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala

sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala

maropu · 2021-01-04T12:55:56Z

Thanks for fixing this, @turboFei. Looks fine cc: @HyukjinKwon @yaooqinn @wangyum

SparkQA · 2021-01-05T02:00:06Z

Test build #133631 has finished for PR 29982 at commit f7f8030.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-01-05T06:00:09Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38220/

SparkQA · 2021-01-05T06:33:32Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38220/

…park-sql

### What changes were proposed in this pull request?

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:

```
/* SELECT 'test'; */
SELECT 'test';
```

Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

### Why are the changes needed?

Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit a071826)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

maropu · 2021-01-05T06:57:08Z

Many thanks, @turboFei and @yaooqinn ! Merged to master/3.1. FYI: @dongjoon-hyun @HyukjinKwon

maropu · 2021-01-05T06:57:43Z

@turboFei Could you open a PR to fix it for branch-3.0/2.4?

turboFei · 2021-01-05T08:00:05Z

> @turboFei Could you open a PR to fix it for branch-3.0/2.4?

sure

maropu · 2021-01-05T23:03:01Z

@turboFei I found some GA flakiness caused by this commit, e.g.,
https://github.com/apache/spark/pull/31045/checks?check_run_id=1652972350
https://github.com/apache/spark/runs/1652975825?check_suite_focus=true

Could you check/fix it? FYI: @dongjoon-hyun @HyukjinKwon

turboFei · 2021-01-06T00:21:41Z

Oh, I will fix it today.

Should we ignore comments between two semicolons?

For example,

/* comment*/; select * from test;

should be transformed to select * from test;

Or should I just remove test cases like that?

maropu · 2021-01-06T00:26:57Z

What's the root cause of the flakiness? It depends on the cause, I think.

turboFei · 2021-01-06T03:30:44Z

These two statements return only one result.

Perhaps the first statement contains an invalid statement, /* SELECT 'test';*/, and does not return a result.

gatorsmile · 2021-01-06T04:01:50Z

CC @bogdanghit

turboFei · 2021-01-06T04:05:36Z

There is a bug in the statementBegin method.
For /* SELECT 'test';*/, the last character / is treated as the beginning of a statement.
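One way such a bug can arise is sketched below. This is an assumption made for illustration, not the actual statementBegin code: `StatementBeginSketch`, `hasStatement`, and the `consumeWholeCloser` flag are hypothetical names. When the scanner leaves the comment state one character too early, the closing / is examined as if it were outside the comment and counts as the start of a statement.

```scala
object StatementBeginSketch {
  // Hypothetical sketch: reports whether a fragment contains any
  // non-whitespace character outside bracketed comments.
  // consumeWholeCloser = false mimics the reported behavior: the comment
  // state is left at '*', so the closing '/' is scanned as ordinary text.
  def hasStatement(fragment: String, consumeWholeCloser: Boolean): Boolean = {
    var inComment = false
    var found = false
    var i = 0
    while (i < fragment.length) {
      if (inComment) {
        if (fragment.startsWith("*/", i)) {
          inComment = false
          // Skipping both closing characters keeps '/' inside the comment.
          i += (if (consumeWholeCloser) 2 else 1)
        } else {
          i += 1
        }
      } else if (fragment.startsWith("/*", i)) {
        inComment = true
        i += 2
      } else {
        if (!fragment.charAt(i).isWhitespace) found = true
        i += 1
      }
    }
    found
  }
}
```

With `consumeWholeCloser = false`, the comment-only fragment from the failing test is reported as containing a statement; with `consumeWholeCloser = true`, it is not.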

turboFei · 2021-01-06T04:32:23Z

Created #31054 to fix this issue.

… in spark-sql

### What changes were proposed in this pull request?

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:

```
/* SELECT 'test'; */
SELECT 'test';
```

Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

NOTE: This backport comes from #29982

### Why are the changes needed?

Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #31033 from turboFei/SPARK-33100.

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

…kend

### What changes were proposed in this pull request?

In current spark-sql cli interface, if the end SQL is not a close comment, the SQL won't be passed to backend engine and just ignored. This caused a problem that if user write a SQL with wrong comment. It's just ignored and won't throw exception.

For example:

```
spark-sql> /* This is a comment without end symbol SELECT 1;
spark-sql>
```

After this pr:

```
spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* This is a comment without end symbol SELECT 1;
^^^
```

In SPARK-33100 add this change #29982
Hive related code https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490

### Why are the changes needed?

Exact exceptions are thrown for wrong statements, which is convenient for users to troubleshoot.

### Does this PR introduce _any_ user-facing change?

Yes, if user write a wrong comment in sql/sql file or query in the end. Before it's just ignored since it's not a statement. Now it will be passed to backend engine and if the statement is not correct, it will throw SQL exception.

### How was this patch tested?

added UT and test by handle.

```
spark-sql> /* SELECT /*+ HINT() 4; */;
Error in query:
mismatched input ';' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26)

== SQL ==
/* SELECT /*+ HINT() 4; */;
--------------------------^^^

spark-sql> /* SELECT /*+ HINT() 4; */
         > SELECT 1;
1
Time taken: 3.16 seconds, Fetched 1 row(s)
spark-sql> /* SELECT /*+ HINT() */ 4; */;
spark-sql>
         > ;
spark-sql>
         > /* SELECT /*+ HINT() 4\\;
         > SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* SELECT /*+ HINT() 4\\;
^^^
SELECT 1;

spark-sql>
```

Closes #34815 from AngersZhuuuu/SPARK-37555.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

turboFei changed the title ~~[SPARK-33110][SQL] Support parse the sql statements with c-style comments~~ [SPARK-33100][SQL] Support parse the sql statements with c-style comments Oct 9, 2020

[SPARK-33100][SQL] Support parse the sql statements with C-style comments

084fc37

turboFei changed the title ~~[SPARK-33100][SQL] Support parse the sql statements with c-style comments~~ [SPARK-33100][SQL] Support parse the sql statements with C-style comments Oct 9, 2020

turboFei force-pushed the SPARK-33110 branch from a64e107 to 084fc37 Compare October 9, 2020 07:20

turboFei changed the title ~~[SPARK-33100][SQL] Support parse the sql statements with C-style comments~~ [SPARK-33100][SQL] Fix issue when parsing the sql statements with C-style comments Oct 9, 2020

turboFei changed the title ~~[SPARK-33100][SQL] Fix issue when parsing the sql statements with C-style comments~~ [SPARK-33100][SQL] Fix the issue when parsing the sql statements with C-style comments Oct 9, 2020

turboFei changed the title ~~[SPARK-33100][SQL] Fix the issue when parsing the sql statements with C-style comments~~ [SPARK-33100][SQL] Fix the issue when parsing sql statements with C-style comments Oct 9, 2020

ignore the comment without content

23437dd

turboFei force-pushed the SPARK-33110 branch from 6b225ec to 23437dd Compare October 10, 2020 05:27

fix ut

bc1bb37