Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30049][SQL] SQL fails to parse when comment contains an unmatched quote character. #27321

Closed
wants to merge 4 commits into from

Conversation

javierivanov
Copy link
Contributor

What changes were proposed in this pull request?

A SQL statement that contains a comment with an unmatched quote character can lead to a parse error:

  • Added a insideComment flag in the splitter method to avoid checking single and double quotes within a comment:
spark-sql> SELECT 1 -- someone's comment here
         > ;
Error in query: 
extraneous input ';' expecting <EOF>(line 2, pos 0)

== SQL ==
SELECT 1 -- someone's comment here
;
^^^

Why are the changes needed?

This misbehaviour was not present on previous spark versions.

Does this PR introduce any user-facing change?

  • No

How was this patch tested?

  • New tests were added.

@javierivanov javierivanov changed the title SQL fails to parse when comment contains an unmatched quote character. [SPARK-30049][SQL] SQL fails to parse when comment contains an unmatched quote character. Jan 22, 2020
@dongjoon-hyun
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Jan 22, 2020

Test build #117249 has finished for PR 27321 at commit 6695496.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

cc @wangyum FYI

@tgravescs
Copy link
Contributor

I haven't had time to test your patch but did you test this with just the spark-shell and spark.sql(""" """) syntax as well? I didn't think that went through the SparkSQLCliDriver and that case also fails.

@kiszk
Copy link
Member

kiszk commented Feb 3, 2020

ping @javierivanov

@javierivanov
Copy link
Contributor Author

Sorry for my delay here...
spark.sql will parse only 1 SQL statement, hence the ; is not accepted by SQL function.

scala> sql("""SELECT 1;""")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input ';' expecting <EOF>(line 1, pos 8)

== SQL ==
SELECT 1;
--------^^^

In this case the ; is being catch in the comment.

scala> sql("""SELECT 1 -- someone's comment here;""")
res1: org.apache.spark.sql.DataFrame = [1: int]

The SQLDriver is parsing SQLStatements into a list, and foreach calls the sql context to interpret each.

This was failing because the parser was still detecting quotes inside of comments and passed the sql statement like this:

scala> sql("""SELECT 1 -- someone's comment here
     | ;""")
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input ';' expecting <EOF>(line 2, pos 0)

== SQL ==
SELECT 1 -- someone's comment here
;
^^^

Let me know for any comments 👍

@maropu
Copy link
Member

maropu commented Feb 5, 2020

yea, right. Spark doesnt support multi-line queries in spark.sql: (related jira: https://issues.apache.org/jira/browse/SPARK-24260)

|;""".stripMargin -> "testcomment"
)
}

Copy link
Member

@maropu maropu Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you avoid this unnecessary blank?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu Not sure, which blank?

@tgravescs
Copy link
Contributor

yea, right. Spark doesnt support multi-line queries in spark.sql: (related jira: https://issues.apache.org/jira/browse/SPARK-24260)

ah ok, that makes sense, thanks

@SparkQA
Copy link

SparkQA commented Feb 6, 2020

Test build #117997 has finished for PR 27321 at commit 074a943.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Feb 7, 2020

ping @wangyum

@SparkQA
Copy link

SparkQA commented Feb 26, 2020

Test build #118967 has finished for PR 27321 at commit c4924a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

@wangyum any comments?

@asfgit asfgit closed this in 3ff2135 Mar 3, 2020
asfgit pushed a commit that referenced this pull request Mar 3, 2020
…hed quote character

### What changes were proposed in this pull request?

A SQL statement that contains a comment with an unmatched quote character can lead to a parse error:
- Added a insideComment flag in the splitter method to avoid checking single and double quotes within a comment:
```
spark-sql> SELECT 1 -- someone's comment here
         > ;
Error in query:
extraneous input ';' expecting <EOF>(line 2, pos 0)

== SQL ==
SELECT 1 -- someone's comment here
;
^^^
```

### Why are the changes needed?

This misbehaviour was not present on previous spark versions.

### Does this PR introduce any user-facing change?

- No

### How was this patch tested?

- New tests were added.

Closes #27321 from javierivanov/SPARK-30049B.

Lead-authored-by: Javier <jfuentes@hortonworks.com>
Co-authored-by: Javier Fuentes <j.fuentes.m@icloud.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(cherry picked from commit 3ff2135)
Signed-off-by: Thomas Graves <tgraves@apache.org>
@tgravescs
Copy link
Contributor

merged to master and branch-3.0, thanks @javierivanov

@gatorsmile
Copy link
Member

cc @juliuszsompolski

@dongjoon-hyun
Copy link
Member

Hi, All.
It seems that https://issues.apache.org/jira/browse/SPARK-31102 is reported due to this by @wangyum . Could you take a look?

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…hed quote character

### What changes were proposed in this pull request?

A SQL statement that contains a comment with an unmatched quote character can lead to a parse error:
- Added a insideComment flag in the splitter method to avoid checking single and double quotes within a comment:
```
spark-sql> SELECT 1 -- someone's comment here
         > ;
Error in query:
extraneous input ';' expecting <EOF>(line 2, pos 0)

== SQL ==
SELECT 1 -- someone's comment here
;
^^^
```

### Why are the changes needed?

This misbehaviour was not present on previous spark versions.

### Does this PR introduce any user-facing change?

- No

### How was this patch tested?

- New tests were added.

Closes apache#27321 from javierivanov/SPARK-30049B.

Lead-authored-by: Javier <jfuentes@hortonworks.com>
Co-authored-by: Javier Fuentes <j.fuentes.m@icloud.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
turboFei pushed a commit to turboFei/spark that referenced this pull request Jan 5, 2021
…hed quote character

A SQL statement that contains a comment with an unmatched quote character can lead to a parse error:
- Added a insideComment flag in the splitter method to avoid checking single and double quotes within a comment:
```
spark-sql> SELECT 1 -- someone's comment here
         > ;
Error in query:
extraneous input ';' expecting <EOF>(line 2, pos 0)

== SQL ==
SELECT 1 -- someone's comment here
;
^^^
```

This misbehaviour was not present on previous spark versions.

- No

- New tests were added.

Closes apache#27321 from javierivanov/SPARK-30049B.

Lead-authored-by: Javier <jfuentes@hortonworks.com>
Co-authored-by: Javier Fuentes <j.fuentes.m@icloud.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants