[SPARK-30822][SQL] Remove semicolon at the end of a sql query#27567

Closed
samredai wants to merge 1 commit into apache:master from samredai:semicolon

Conversation


@samredai commented Feb 13, 2020

What changes were proposed in this pull request?

This change proposes ignoring a terminating semicolon from queries submitted by the user (if included) instead of raising a parse exception.

Why are the changes needed?

When a user submits a directly executable SQL statement terminated with a semicolon, they receive an `org.apache.spark.sql.catalyst.parser.ParseException` of `extraneous input ';' expecting <EOF>`. SQL-92 describes a direct SQL statement as having the format of `<directly executable statement> <semicolon>`, and the majority of SQL implementations either require the semicolon as a statement terminator or make it optional (meaning no exception is raised when it's included, seemingly in recognition that it's a common behavior).

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test added to `PlanParserSuite`

```
sbt> project catalyst
sbt> testOnly *PlanParserSuite
[info] - case insensitive (565 milliseconds)
[info] - explain (9 milliseconds)
[info] - set operations (41 milliseconds)
[info] - common table expressions (31 milliseconds)
[info] - simple select query (47 milliseconds)
[info] - hive-style single-FROM statement (11 milliseconds)
[info] - multi select query (32 milliseconds)
[info] - query organization (41 milliseconds)
[info] - insert into (12 milliseconds)
[info] - aggregation (24 milliseconds)
[info] - limit (11 milliseconds)
[info] - window spec (11 milliseconds)
[info] - lateral view (17 milliseconds)
[info] - joins (62 milliseconds)
[info] - sampled relations (11 milliseconds)
[info] - sub-query (11 milliseconds)
[info] - scalar sub-query (9 milliseconds)
[info] - table reference (2 milliseconds)
[info] - table valued function (8 milliseconds)
[info] - SPARK-20311 range(N) as alias (2 milliseconds)
[info] - SPARK-20841 Support table column aliases in FROM clause (3 milliseconds)
[info] - SPARK-20962 Support subquery column aliases in FROM clause (4 milliseconds)
[info] - SPARK-20963 Support aliases for join relations in FROM clause (3 milliseconds)
[info] - inline table (23 milliseconds)
[info] - simple select query with !> and !< (5 milliseconds)
[info] - select hint syntax (34 milliseconds)
[info] - SPARK-20854: select hint syntax with expressions (12 milliseconds)
[info] - SPARK-20854: multiple hints (4 milliseconds)
[info] - TRIM function (16 milliseconds)
[info] - OVERLAY function (16 milliseconds)
[info] - precedence of set operations (18 milliseconds)
[info] - create/alter view as insert into table (4 milliseconds)
[info] - Invalid insert constructs in the query (10 milliseconds)
[info] - relation in v2 catalog (3 milliseconds)
[info] - CTE with column alias (2 milliseconds)
[info] - statement containing terminal semicolons (3 milliseconds)
[info] ScalaTest
[info] Run completed in 3 seconds, 129 milliseconds.
[info] Total number of tests run: 36
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 36, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 36, Failed 0, Errors 0, Passed 36
```
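
For reference, here is a minimal sketch of the equivalence the new test is checking. This is hypothetical illustration code that calls `CatalystSqlParser` directly, not the exact test added to the suite:

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// With the fix, trailing semicolons are dropped by the parser, so all of
// these statements should parse to the same logical plan.
val plan = CatalystSqlParser.parsePlan("select 1")
assert(CatalystSqlParser.parsePlan("select 1;") == plan)
assert(CatalystSqlParser.parsePlan("select 1;;  ;") == plan)
```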

### Current behavior:

#### scala
```scala
scala> val df = sql("select 1")
// df: org.apache.spark.sql.DataFrame = [1: int]
scala> df.show()
// +---+
// |  1|
// +---+
// |  1|
// +---+

scala> val df = sql("select 1;")
// org.apache.spark.sql.catalyst.parser.ParseException:
// extraneous input ';' expecting <EOF>(line 1, pos 8)

// == SQL ==
// select 1;
// --------^^^

//   at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
//   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
//   at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
//   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:76)
//   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:605)
//   at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
//   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:605)
//   ... 47 elided
```

#### pyspark
```python
df = spark.sql('select 1')
df.show()
#+---+
#|  1|
#+---+
#|  1|
#+---+

df = spark.sql('select 1;')
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   File "/Users/ssetegne/spark/python/pyspark/sql/session.py", line 646, in sql
#     return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
#   File "/Users/ssetegne/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in # __call__
#   File "/Users/ssetegne/spark/python/pyspark/sql/utils.py", line 102, in deco
#     raise converted
# pyspark.sql.utils.ParseException: 
# extraneous input ';' expecting <EOF>(line 1, pos 8)

# == SQL ==
# select 1;
# --------^^^
```

### Behavior after proposed fix:

#### scala
```scala
scala> val df = sql("select 1")
// df: org.apache.spark.sql.DataFrame = [1: int]
scala> df.show()
// +---+
// |  1|
// +---+
// |  1|
// +---+

scala> val df = sql("select 1;")
// df: org.apache.spark.sql.DataFrame = [1: int]
scala> df.show()
// +---+
// |  1|
// +---+
// |  1|
// +---+
```

#### pyspark
```python
df = spark.sql('select 1')
df.show()
#+---+
#|  1|
#+---+
#|  1|
#+---+

df = spark.sql('select 1;')
df.show()
#+---+
#|  1|
#+---+
#|  1|
#+---+
```

@dongjoon-hyun changed the title from "[SPARK-30822] Remove semicolon at the end of a sql query" to "[SPARK-30822][SQL] Remove semicolon at the end of a sql query" on Feb 20, 2020
@dongjoon-hyun (Member) left a comment

Thank you for making a PR, @samsetegne.
BTW, if we want to do this, we need to be consistent in the other environments: Scala/Java/R.
Could you try to do this on the Scala side instead? Then, Java/Python/R will follow it consistently.

@samredai (Author)

@dongjoon-hyun sure thing! I'll update the PR.

@maropu (Member) commented Feb 23, 2020

In database systems, a semicolon is the standard way to separate multiple statements. Given that spark.sql does not currently support multiple statements (https://issues.apache.org/jira/browse/SPARK-24260), is this support still useful?

On the other hand, in bin/spark-sql, we can use semicolons to input multiple queries:

```
spark-sql> select 1; select 1; select 1;
1
Time taken: 0.219 seconds, Fetched 1 row(s)
1
Time taken: 0.129 seconds, Fetched 1 row(s)
1
Time taken: 0.151 seconds, Fetched 1 row(s)
```

@samredai (Author)

@maropu I think the two issues are related but very much separate at the same time. There is the question of whether spark.sql should accept multiple statements separated by a semicolon, which is the discussion posed by [SPARK-24260]. However, even if the spark.sql API were never to accept multiple statements, the issue presented here would still remain: should spark.sql fail when a single valid SQL statement is provided with a terminal semicolon?

As examples, here are two popular MySQL clients for two different languages that do not accept multiple SQL statements yet do not fail when a single statement is provided with a terminal semicolon:
https://github.com/sidorares/node-mysql2
https://github.com/PyMySQL/PyMySQL

Even when using java.sql.DriverManager in Scala, including the semicolon still makes for a successful query:

```scala
// (url, username, and password defined elsewhere)
import java.sql.DriverManager

val connection = DriverManager.getConnection(url, username, password)
val statement = connection.createStatement()
val results = statement.executeQuery("select 'Bojack' as horseman;")
```

@maropu (Member) commented Feb 23, 2020

Ah, I see. Do most JDBC clients ignore a semicolon at the end? Could you check the other JDBC clients' behaviours, too?

@samredai (Author)

> Ah, I see. Do most JDBC clients ignore a semicolon at the end? Could you check the other JDBC clients' behaviours, too?

The acceptance of the terminating semicolon happens at the database layer, so most (if not all) JDBC clients ship the query to the database with the semicolon included. Even if the client does not support multiple statements, the database does, and so I imagine that a statement that looks like `<statement>;` would just be split by the DBMS into `<statement>` and an empty string that's ignored.

Here's an example in the Hive CLI source code where a statement is split at the semicolon and each statement has a terminating semicolon appended to it. There's also some logic to prevent confusing ";" (quoted text containing a semicolon) for a statement-terminating semicolon.
https://github.com/apache/hive/blob/master/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java#L405
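
For illustration, here is a simplified sketch of that kind of splitting in Scala. This is hypothetical code, not the actual Hive CLI logic; it splits on semicolons while treating semicolons inside quoted text as ordinary characters:

```scala
// Hypothetical sketch of quote-aware statement splitting (not Hive's code).
def splitStatements(command: String): Seq[String] = {
  val statements = scala.collection.mutable.ArrayBuffer.empty[String]
  val current = new StringBuilder
  var quote: Option[Char] = None // the quote character we are inside, if any
  for (c <- command) c match {
    case '\'' | '"' if quote.isEmpty => quote = Some(c); current += c // opening quote
    case q if quote.contains(q)      => quote = None; current += c   // closing quote
    case ';' if quote.isEmpty        =>                               // statement terminator
      statements += current.toString.trim
      current.clear()
    case _                           => current += c
  }
  statements += current.toString.trim
  statements.toSeq.filter(_.nonEmpty) // empty fragments (e.g. from ";;") are dropped
}

// splitStatements("select 'a;b'; select 2;;") == Seq("select 'a;b'", "select 2")
```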

Terminating semicolons are also commonly accepted by NoSQL DBMSs, e.g. Cassandra, which defines a CQL statement as: `cql_statement ::= statement [ ';' ]`
https://cassandra.apache.org/doc/latest/cql/definitions.html#statements

@maropu (Member) commented Feb 24, 2020

Ok, so I'll accept this after some tests are added in this PR.

@samredai (Author)

@maropu Thanks! I added a unit test to SparkSqlParserSuite and updated the PR comment.

@dongjoon-hyun (Member)

ok to test

@SparkQA commented Feb 26, 2020

Test build #118934 has finished for PR 27567 at commit 8798af0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 26, 2020

Test build #118936 has finished for PR 27567 at commit be1cc43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member)

Can we fix this on the SqlBase.g4 side? Then, add tests in PlanParserSuite?

@samredai (Author) commented Feb 26, 2020

Yeah, that would actually be much cleaner and is the more appropriate place for the fix. What do you think about adding the quantifier as `(';')*` to allow for any number of semicolons? e.g.

```scala
sql("select name from people;; ;; ; ")
```

It goes beyond handling the accidental additional semicolon, but I don't see why any number of semicolons at the end would make a query ambiguous.

@samredai (Author)

@maropu I updated the PR as described, including some tests; let me know what you think. To expand on the example in my earlier reply:

```scala
// The query below is understood to be a single statement
sql("select 1, 2, 3;; ;; ; ")
// res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 1 more field]

// The multi-statement query below appropriately fails
sql("select 1, 2, 3;; ;; select 4, 5, 6; ")
// org.apache.spark.sql.catalyst.parser.ParseException:
// extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 20)
//
// == SQL ==
// select 1, 2, 3;; ;; select 4, 5, 6
// --------------------^^^
```

Member

How about this new behaviour? It looks fine to me, though. @cloud-fan @dongjoon-hyun @HyukjinKwon

@SparkQA commented Feb 26, 2020

Test build #118991 has finished for PR 27567 at commit 29eb531.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 27, 2020

Test build #119004 has finished for PR 27567 at commit a67a310.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Feb 27, 2020

Test build #119013 has finished for PR 27567 at commit 4bca772.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@samredai (Author)

All of the unit tests passed for this, but it looks like there was an intermittent connection issue while installing R.

@maropu (Member) commented Feb 27, 2020

retest this please

@SparkQA commented Feb 27, 2020

Test build #119033 has finished for PR 27567 at commit 4bca772.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu (Member) commented Feb 27, 2020

retest this please

@SparkQA commented Feb 28, 2020

Test build #119055 has finished for PR 27567 at commit 4bca772.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)

Should a single statement support multiple `;`?

@samredai (Author)

@cloud-fan I would expect the most common scenario to be a user unintentionally submitting an extra semicolon. Do you think the query should fail in that case, or is the intention so obvious that it's essentially explicit? In cases where multiple statements are allowed, e.g. spark-sql, would the additional semicolons just equate to empty statements that are ignored, or could something else happen that affects performance?

@cloud-fan (Contributor)

Seems fine.

BTW, should `';'*` be good enough? `*` means 0 or more occurrences.

@samredai (Author)

Yes @cloud-fan, you're right. I updated this and also squashed all the commits.
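
For context, this is roughly what the resulting rule in SqlBase.g4 would look like with that quantifier applied (a sketch based on this discussion, with the rest of the grammar elided):

```
// Sketch of the singleStatement rule in SqlBase.g4: zero or more
// trailing semicolons are now accepted before end-of-input.
singleStatement
    : statement ';'* EOF
    ;
```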

…icolons in SQL statements

When a user submits a SQL query that is terminated with a semicolon, currently they are met with an `org.apache.spark.sql.catalyst.parser.ParseException` of `extraneous input ';' expecting <EOF>`. This PR fixes that by updating the ANTLR grammar to allow any number of consecutive terminating semicolons for a SQL statement.

Added tests to `PlanParserSuite` for terminal semicolons. For `describe-query.sql`, `grouping_set.sql`, `interval.sql`, and `DDLParserSuite`, the changes to the grammar rules for `singleStatement` require the portion of the exception message that reads "...expecting [<EOF>]..." to be updated to "...expecting [{<EOF>, ';'}]...".

@SparkQA commented Mar 24, 2020

Test build #120220 has finished for PR 27567 at commit 22d9ee4.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan (Contributor)
retest this please

@SparkQA commented Mar 24, 2020

Test build #120268 has finished for PR 27567 at commit 22d9ee4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan closed this in 44431d4 on Mar 25, 2020
@cloud-fan (Contributor)

thanks, merging to master/3.0!

@dongjoon-hyun (Member)

Late LGTM. Thank you all.

cloud-fan pushed a commit that referenced this pull request Mar 25, 2020
Closes #27567 from samsetegne/semicolon.

Authored-by: samsetegne <samuelsetegne@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 44431d4)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
Closes apache#27567 from samsetegne/semicolon.

Authored-by: samsetegne <samuelsetegne@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>