[SPARK-30822][SQL] Remove semicolon at the end of a SQL query #27567
samredai wants to merge 1 commit into apache:master from samredai:semicolon
Conversation
@dongjoon-hyun sure thing! I'll update the PR.
In database systems, a semicolon is the standard way to separate multiple statements. On the other hand, in
@maropu I think the two issues are related but very much separate at the same time. There is the question of whether spark.sql should accept multiple statements separated by a semicolon, which is the discussion posed by [SPARK-24260]. However, even if the spark.sql API were never to accept multiple statements, the issue presented here would still remain: should spark.sql fail when a single valid SQL statement is provided with a terminal semicolon? As examples, here are two popular MySQL clients for two different languages that do not accept multiple SQL statements yet do not fail when a single statement is provided with a terminal semicolon. Even when using JDBC directly:

```scala
val connection = DriverManager.getConnection(url, username, password)
val statement = connection.createStatement()
val results = statement.executeQuery("select 'Bojack' as horseman;")
```
Ah, I see. Do most JDBC clients ignore a semicolon at the end? Could you check the other JDBC client behaviours, too?
The acceptance of the terminating semicolon happens at the database layer, so most (if not all) JDBC clients ship the query to the database with the semicolon included. Even if the client does not support multiple statements, the database does. Here's an example in the Hive CLI source code where a statement is split at the semicolon and each statement has a terminating semicolon appended to it. There's also some logic to prevent confusion. Terminating semicolons are also commonly accepted by NoSQL DBMSs, e.g. Cassandra, which treats the semicolon as the terminator of a CQL statement.
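The Hive CLI behaviour described above — split on semicolons, drop the empty pieces, and re-append a terminator to each statement — can be sketched roughly as follows (a simplified illustration, not the actual Hive CLI code; `split_statements` is a made-up name, and the sketch ignores semicolons inside quoted strings):

```python
def split_statements(source: str) -> list[str]:
    # Split a multi-statement string on ';', drop the empty pieces that
    # extra semicolons produce, and ship each statement with its own
    # terminating semicolon, as the Hive CLI splitting described above does.
    pieces = (p.strip() for p in source.split(";"))
    return [p + ";" for p in pieces if p]
```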
ok, so I'll accept this change after some tests are added in this PR.

@maropu Thanks! I added a unit test to
ok to test
Test build #118934 has finished for PR 27567 at commit

Test build #118936 has finished for PR 27567 at commit
Can we fix this on the SqlBase.g4 side? Then, add tests in PlanParserSuite?
There was a problem hiding this comment.
Yeah, that would actually be much cleaner and is the more appropriate place for the fix. What do you think about adding the quantifier as (';')*? to allow for any number of semicolons, e.g. `sql("select name from people;; ;; ; ")`. It goes beyond handling the accidental extra semicolon, but I don't see why any number of semicolons at the end would make a query ambiguous.
@maropu I updated the PR as described, including some tests; let me know what you think. To expand on the example in my earlier reply:

```scala
// Below query is understood to be a single statement
sql("select 1, 2, 3;; ;; ; ")
// res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 1 more field]

// Below multi-statement query appropriately fails
sql("select 1, 2, 3;; ;; select 4, 5, 6; ")
// org.apache.spark.sql.catalyst.parser.ParseException:
// extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 20)
// == SQL ==
// select 1, 2, 3;; ;; select 4, 5, 6
// --------------------^^^
```
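The intended terminal-semicolon handling can be mimicked outside the parser with a short sketch (a hypothetical illustration, not Spark code; `strip_terminal_semicolons` and `is_single_statement` are made-up names, and the check naively ignores semicolons inside string literals):

```python
import re

def strip_terminal_semicolons(query: str) -> str:
    # Drop any trailing run of semicolons and whitespace, mirroring the
    # grammar quantifier discussed above: "select 1;; ;; ; " -> "select 1".
    return re.sub(r"[;\s]+$", "", query)

def is_single_statement(query: str) -> bool:
    # After stripping terminal semicolons, any remaining ';' means the
    # string still holds more than one statement, which the parser rejects.
    return ";" not in strip_terminal_semicolons(query)
```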
How about this new behaviour? That looks fine to me though. @cloud-fan @dongjoon-hyun @HyukjinKwon
Test build #118991 has finished for PR 27567 at commit

Test build #119004 has finished for PR 27567 at commit

Test build #119013 has finished for PR 27567 at commit
All of the unit tests passed for this, but it looks like there was some intermittent connection issue while installing R.

retest this please
Test build #119033 has finished for PR 27567 at commit

retest this please

Test build #119055 has finished for PR 27567 at commit
should a single statement support multiple ;?
@cloud-fan I would expect the most common scenario to be that the user unintentionally submitted an extra semicolon. Do you think the query should fail in that case, or is the intention so obvious that it's essentially explicit? In cases where multiple statements are allowed, e.g. in spark-sql, would the additional semicolons just equate to empty statements that are ignored, or could something else happen that affects performance?
seems fine.
BTW, should ';'* be good enough? * means 0 or more occurrences.
Yes @cloud-fan you're right, I updated this and also squashed all the commits.
…icolons in SQL statements
When a user submits a SQL query that is terminated with a semicolon, they are currently met with an `org.apache.spark.sql.catalyst.parser.ParseException` of `extraneous input ';' expecting <EOF>`. This change fixes that by updating the ANTLR grammar to allow any number of consecutive terminating semicolons for a SQL statement.
Added tests to PlanParserSuite for terminal semicolons. For `describe-query.sql`, `grouping_set.sql`, `interval.sql`, and DDLParserSuite, the changes to the grammar rule for `singleStatement` require the portion of the exception message that reads "...expecting [<EOF>]..." to be updated to "...expecting [{<EOF>, ';'}]...".
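Concretely, the grammar change amounts to something like the following rule in SqlBase.g4 (a sketch inferred from the `';'*` quantifier discussed in this thread, not a verbatim diff of the PR):

```
singleStatement
    : statement ';'* EOF
    ;
```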
Test build #120220 has finished for PR 27567 at commit

retest this please

Test build #120268 has finished for PR 27567 at commit
thanks, merging to master/3.0!

Late LGTM. Thank you all.
(cherry picked from commit 44431d4)
# What changes were proposed in this pull request?
This change proposes ignoring a terminating semicolon from queries submitted by the user (if included) instead of raising a parse exception.
# Why are the changes needed?
When a user submits a directly executable SQL statement terminated with a semicolon, they receive an `org.apache.spark.sql.catalyst.parser.ParseException` of `extraneous input ';' expecting <EOF>`. SQL-92 describes a direct SQL statement as having the format of `<directly executable statement> <semicolon>` and the majority of SQL implementations either require the semicolon as a statement terminator, or make it optional (meaning not raising an exception when it's included, seemingly in recognition that it's a common behavior).
# Does this PR introduce any user-facing change?
No
# How was this patch tested?
Unit test added to `PlanParserSuite`
```
sbt> project catalyst
sbt> testOnly *PlanParserSuite
[info] - case insensitive (565 milliseconds)
[info] - explain (9 milliseconds)
[info] - set operations (41 milliseconds)
[info] - common table expressions (31 milliseconds)
[info] - simple select query (47 milliseconds)
[info] - hive-style single-FROM statement (11 milliseconds)
[info] - multi select query (32 milliseconds)
[info] - query organization (41 milliseconds)
[info] - insert into (12 milliseconds)
[info] - aggregation (24 milliseconds)
[info] - limit (11 milliseconds)
[info] - window spec (11 milliseconds)
[info] - lateral view (17 milliseconds)
[info] - joins (62 milliseconds)
[info] - sampled relations (11 milliseconds)
[info] - sub-query (11 milliseconds)
[info] - scalar sub-query (9 milliseconds)
[info] - table reference (2 milliseconds)
[info] - table valued function (8 milliseconds)
[info] - SPARK-20311 range(N) as alias (2 milliseconds)
[info] - SPARK-20841 Support table column aliases in FROM clause (3 milliseconds)
[info] - SPARK-20962 Support subquery column aliases in FROM clause (4 milliseconds)
[info] - SPARK-20963 Support aliases for join relations in FROM clause (3 milliseconds)
[info] - inline table (23 milliseconds)
[info] - simple select query with !> and !< (5 milliseconds)
[info] - select hint syntax (34 milliseconds)
[info] - SPARK-20854: select hint syntax with expressions (12 milliseconds)
[info] - SPARK-20854: multiple hints (4 milliseconds)
[info] - TRIM function (16 milliseconds)
[info] - OVERLAY function (16 milliseconds)
[info] - precedence of set operations (18 milliseconds)
[info] - create/alter view as insert into table (4 milliseconds)
[info] - Invalid insert constructs in the query (10 milliseconds)
[info] - relation in v2 catalog (3 milliseconds)
[info] - CTE with column alias (2 milliseconds)
[info] - statement containing terminal semicolons (3 milliseconds)
[info] ScalaTest
[info] Run completed in 3 seconds, 129 milliseconds.
[info] Total number of tests run: 36
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 36, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 36, Failed 0, Errors 0, Passed 36
```
### Current behavior:
#### scala
```scala
scala> val df = sql("select 1")
// df: org.apache.spark.sql.DataFrame = [1: int]
scala> df.show()
// +---+
// | 1|
// +---+
// | 1|
// +---+
scala> val df = sql("select 1;")
// org.apache.spark.sql.catalyst.parser.ParseException:
// extraneous input ';' expecting <EOF>(line 1, pos 8)
// == SQL ==
// select 1;
// --------^^^
// at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
// at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
// at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:52)
// at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:76)
// at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:605)
// at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
// at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:605)
// ... 47 elided
```
#### pyspark
```python
df = spark.sql('select 1')
df.show()
#+---+
#| 1|
#+---+
#| 1|
#+---+
df = spark.sql('select 1;')
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/ssetegne/spark/python/pyspark/sql/session.py", line 646, in sql
# return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
# File "/Users/ssetegne/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1286, in __call__
# File "/Users/ssetegne/spark/python/pyspark/sql/utils.py", line 102, in deco
# raise converted
# pyspark.sql.utils.ParseException:
# extraneous input ';' expecting <EOF>(line 1, pos 8)
# == SQL ==
# select 1;
# --------^^^
```
### Behavior after proposed fix:
#### scala
```scala
scala> val df = sql("select 1")
// df: org.apache.spark.sql.DataFrame = [1: int]
scala> df.show()
// +---+
// | 1|
// +---+
// | 1|
// +---+
scala> val df = sql("select 1;")
// df: org.apache.spark.sql.DataFrame = [1: int]
scala> df.show()
// +---+
// | 1|
// +---+
// | 1|
// +---+
```
#### pyspark
```python
df = spark.sql('select 1')
df.show()
#+---+
#| 1|
#+---+
#| 1|
#+---+
df = spark.sql('select 1;')
df.show()
#+---+
#| 1|
#+---+
#| 1|
#+---+
```
Closes apache#27567 from samsetegne/semicolon.
Authored-by: samsetegne <samuelsetegne@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>