Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql #29982

Closed
wants to merge 31 commits into from

Conversation

turboFei
Copy link
Member

@turboFei turboFei commented Oct 9, 2020

What changes were proposed in this pull request?

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:

/* SELECT 'test'; */
SELECT 'test';

Would be split to two statements:
The first one: /* SELECT 'test'
The second one: */ SELECT 'test'

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

Why are the changes needed?

Spark-sql might split the statements inside bracketed comments and it is not correct.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UT.

@turboFei turboFei changed the title [SPARK-33110][SQL] Support parse the sql statements with c-style comments [SPARK-33100][SQL] Support parse the sql statements with c-style comments Oct 9, 2020
@turboFei turboFei changed the title [SPARK-33100][SQL] Support parse the sql statements with c-style comments [SPARK-33100][SQL] Support parse the sql statements with C-style comments Oct 9, 2020
@maropu
Copy link
Member

maropu commented Oct 9, 2020

If a comment /* ... */ includes ;. it will throw an exception? The comment style is already supported, so the title and the description look confusing. I think this is just a bug.

@maropu
Copy link
Member

maropu commented Oct 9, 2020

ok to test

@SparkQA
Copy link

SparkQA commented Oct 9, 2020

Test build #129585 has finished for PR 29982 at commit 084fc37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 9, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34190/

@SparkQA
Copy link

SparkQA commented Oct 9, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34190/

@turboFei turboFei changed the title [SPARK-33100][SQL] Support parse the sql statements with C-style comments [SPARK-33100][SQL] Fix issue when parsing the sql statements with C-style comments Oct 9, 2020
@turboFei turboFei changed the title [SPARK-33100][SQL] Fix issue when parsing the sql statements with C-style comments [SPARK-33100][SQL] Fix the issue when parsing the sql statements with C-style comments Oct 9, 2020
@turboFei
Copy link
Member Author

turboFei commented Oct 9, 2020

@maropu Thanks for your comments, I have modified the title.

@turboFei turboFei changed the title [SPARK-33100][SQL] Fix the issue when parsing the sql statements with C-style comments [SPARK-33100][SQL] Fix the issue when parsing sql statements with C-style comments Oct 9, 2020
@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34219/

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Test build #129618 has finished for PR 29982 at commit 23437dd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Test build #129616 has finished for PR 29982 at commit 6b225ec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34222/

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34219/

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34222/

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Test build #129621 has finished for PR 29982 at commit bc1bb37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34225/

@SparkQA
Copy link

SparkQA commented Oct 10, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34225/

@maropu
Copy link
Member

maropu commented Jan 4, 2021

Thanks for fixing this, @turboFei. Looks fine cc: @HyukjinKwon @yaooqinn @wangyum

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Test build #133631 has finished for PR 29982 at commit f7f8030.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38220/

@SparkQA
Copy link

SparkQA commented Jan 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38220/

@maropu maropu closed this in a071826 Jan 5, 2021
maropu pushed a commit that referenced this pull request Jan 5, 2021
…park-sql

### What changes were proposed in this pull request?
Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

### Why are the changes needed?
Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit a071826)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu
Copy link
Member

maropu commented Jan 5, 2021

Many thanks, @turboFei and @yaooqinn ! Merged to master/3.1. FYI: @dongjoon-hyun @HyukjinKwon

@maropu
Copy link
Member

maropu commented Jan 5, 2021

@turboFei Could you open a PR to fix it for branch-3.0/2.4?

@turboFei
Copy link
Member Author

turboFei commented Jan 5, 2021

@turboFei Could you open a PR to fix it for branch-3.0/2.4?

sure

@turboFei turboFei deleted the SPARK-33110 branch January 5, 2021 08:12
turboFei added a commit to turboFei/spark that referenced this pull request Jan 5, 2021
…park-sql

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

Spark-sql might split the statements inside bracketed comments and it is not correct.

No.

Added UT.

Closes apache#29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
turboFei added a commit to turboFei/spark that referenced this pull request Jan 5, 2021
…park-sql

Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

Spark-sql might split the statements inside bracketed comments and it is not correct.

No.

Added UT.

Closes apache#29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@maropu
Copy link
Member

maropu commented Jan 5, 2021

@turboFei I found some GA flakiness caused by this commit, e.g.,
https://github.com/apache/spark/pull/31045/checks?check_run_id=1652972350
https://github.com/apache/spark/runs/1652975825?check_suite_focus=true

Could you check/fix it? FYI: @dongjoon-hyun @HyukjinKwon

@turboFei
Copy link
Member Author

turboFei commented Jan 6, 2021

oh, i will fix it today.

Should we ignore the comments during two ;?

for example,

/* comment*/; select * from test;

should be transfered asselect * from test;

Or I just remove the test case like that?

@maropu
Copy link
Member

maropu commented Jan 6, 2021

What's a root cause of the flakiness? It depends on the cause, I think.

@turboFei
Copy link
Member Author

turboFei commented Jan 6, 2021

These two statements only return one result.

Might the first statement contains an invalid statement /* SELECT 'test';*/ and does not return result.

image

@gatorsmile
Copy link
Member

CC @bogdanghit

@turboFei
Copy link
Member Author

turboFei commented Jan 6, 2021

there is a bug for statementBegin method.
For /* SELECT 'test';*/, the last character / would be treated as beginning of statements

@turboFei
Copy link
Member Author

turboFei commented Jan 6, 2021

create #31054 to fix this issue

maropu pushed a commit that referenced this pull request Jan 8, 2021
… in spark-sql

### What changes were proposed in this pull request?
Now the spark-sql does not support parse the sql statements with bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
Would be split to two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then it would throw an exception because the first one is illegal.
In this PR, we ignore the content in bracketed comments while splitting the sql statements.
Besides, we ignore the comment without any content.

NOTE: This backport comes from #29982

### Why are the changes needed?
Spark-sql might split the statements inside bracketed comments and it is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #31033 from turboFei/SPARK-33100.

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
cloud-fan pushed a commit that referenced this pull request Dec 7, 2021
…kend

### What changes were proposed in this pull request?
In current spark-sql cli interface, if the end SQL is  not a close comment, the SQL won't be passed to backend engine and just ignored. This caused a problem that if user write a SQL with wrong comment. It's just ignored and won't throw exception.
For example:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
spark-sql>
```

After this pr:
```
spark-sql> /* This is a comment without end symbol SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

 == SQL ==
 /* This is a comment without end symbol SELECT 1;
 ^^^
```

In SPARK-33100 add this change #29982

Hive related code
https://github.com/apache/hive/blob/1090c93b1a02d480bdee2af2cecf503f8a54efc6/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L488-L490

### Why are the changes needed?
Exact exceptions are thrown for wrong statements, which is convenient for users to troubleshoot.

### Does this PR introduce _any_ user-facing change?
Yes, if user write a wrong comment in sql/sql file or query in the end.
Before it's just ignored since it's not a statement. Now it will be passed to backend engine and if the statement is not correct, it will throw SQL exception.

### How was this patch tested?
added UT and  test by handle.

```
spark-sql> /* SELECT /*+ HINT() 4; */;
Error in query:
mismatched input ';' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 26)

== SQL ==
/* SELECT /*+ HINT() 4; */;
--------------------------^^^

spark-sql> /* SELECT /*+ HINT() 4; */
         >         SELECT 1;
1
Time taken: 3.16 seconds, Fetched 1 row(s)
spark-sql> /* SELECT /*+ HINT() */ 4; */;
spark-sql>
         > ;
spark-sql>
         > /* SELECT /*+ HINT() 4\\;
         >         SELECT 1;
Error in query:
Unclosed bracketed comment(line 1, pos 0)

== SQL ==
/* SELECT /*+ HINT() 4\\;
^^^
        SELECT 1;

spark-sql>
```

Closes #34815 from AngersZhuuuu/SPARK-37555.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants