[SPARK-12689][SQL] Migrate DDL parsing to the newly absorbed parser #10723
Test build #49237 has finished for PR 10723 at commit
cc @hvanhovell @viirya can you rename the title to "migrate describe table parsing to ..."?
Because SQLContext still uses DDLParser, it looks like I can't simply remove it.
Test build #49317 has finished for PR 10723 at commit
@cloud-fan Can you also take a look? It is related to the work of adding DDL support for creating bucketed tables.
@@ -52,26 +57,30 @@ private[sql] class SparkQl(conf: ParserConf = SimpleParserConf()) extends Cataly
        nodeToDescribeFallback(node)
      } else {
        tableType match {
-         case Token("TOK_TABTYPE", Token("TOK_TABNAME", nameParts :: Nil) :: Nil) =>
+         case Token("TOK_TABTYPE", Token("TOK_TABNAME", nameParts) :: Nil) =>
Why change this? You didn't touch the describe stuff in SparkSqlParser.g, right?
Yes. I think it was incorrect from the beginning, but it was never caught because we didn't reach this code before. I've tested it locally. Once all three commands are migrated, we can see this pass the tests.
If we parse the following SQL using the parse driver, org.apache.spark.sql.catalyst.parser.ParseDriver.parsePlan("DESCRIBE EXTENDED tbl.a", null), we would end up with the following AST:
TOK_DESCTABLE 1, 0, 6, 18
:- TOK_TABTYPE 1, 4, 6, 18
: +- TOK_TABNAME 1, 4, 6, 18
: :- tbl 1, 4, 4, 18
: +- a 1, 6, 6, 22
+- EXTENDED 1, 2, 2, 9
This change picks it up, and the old code didn't (I am sure I tested this, though :S). You can disable this in the DDL parser to see if it works now.
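The difference between the two patterns can be sketched as follows (in Python for brevity; the real code is Scala pattern matching over Catalyst AST nodes, and the function names here are illustrative):

```python
# A qualified name like "tbl.a" gives TOK_TABNAME *two* children,
# so a pattern that insists on exactly one child never matches it.

def name_parts_old(tabname_children):
    # old: case Token("TOK_TABNAME", nameParts :: Nil) -- one child only
    if len(tabname_children) == 1:
        return list(tabname_children)
    return None  # qualified names fell through to the fallback

def name_parts_new(tabname_children):
    # new: case Token("TOK_TABNAME", nameParts) -- any number of name parts
    return list(tabname_children)

print(name_parts_old(["tbl", "a"]))  # None: the old pattern misses it
print(name_parts_new(["tbl", "a"]))  # ['tbl', 'a']
```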
Could we add a test for this? The Hive test suite apparently misses this one. I could also address it in a different PR.
Actually, we have a test for the describe table command in HiveQuerySuite. Do we need another test?
Conflicts: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlLexer.g
Test build #49585 has finished for PR 10723 at commit
          // It is describing a column with the format like "describe db.table column".
          nodeToDescribeFallback(node)
-       case tableName =>
+       case tableName :: Nil =>
          // It is describing a table with the format like "describe table".
          datasources.DescribeCommand(
            UnresolvedRelation(TableIdentifier(tableName.text), None),
cleanIdentifier?
@@ -316,7 +316,7 @@ class HiveContext private[hive](
  }

  protected[sql] override def parseSql(sql: String): LogicalPlan = {
-   super.parseSql(substitutor.substitute(hiveconf, sql))
+   sqlParser.parsePlan(substitutor.substitute(hiveconf, sql))
How about gradually moving functionality from the DDL parser to SparkQl? That would allow us to test this in the meantime.
DDLParser is still used in SQLContext. Do we want to completely remove it? I have already migrated three commands, so I think we can test them all together.
        case Token("TOK_TABLEOPTIONS", options) =>
          options.map {
            case Token("TOK_TABLEOPTION", Token(key, _) :: Token(value, _) :: Nil) =>
              (key, value.replaceAll("^\'|^\"|\"$|\'$", ""))
Why not use unquoteString? It does the same and is easier to read.
I didn't know there was an unquoteString. Thanks.
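The two approaches behave the same on well-formed input. A quick sketch (in Python; unquote_string is a hypothetical stand-in for Catalyst's unquoteString helper, not its actual implementation):

```python
import re

# The quote-stripping regex from the diff, ported to Python.
def strip_quotes_regex(s):
    return re.sub(r"^'|^\"|\"$|'$", "", s)

# Hypothetical stand-in for Catalyst's unquoteString helper:
# strip one matching pair of surrounding quotes, if present.
def unquote_string(s):
    if len(s) >= 2 and s[0] == s[-1] and s[0] in ("'", '"'):
        return s[1:-1]
    return s

print(strip_quotes_regex("'path'"))  # path
print(unquote_string('"path"'))      # path
```

One subtle difference worth noting: the regex also strips mismatched quotes (a leading `'` with a trailing `"`), while a paired-quote helper like the one sketched here would leave those alone.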
Test build #49591 has finished for PR 10723 at commit
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DDLParser.scala
looseNonReserved
    : nonReserved | KW_FROM | KW_TO
    ;
We are allowed to use From and To in the CreateTableUsing command's options (actually, it seems we can use any string as an option key). But we can't simply add them to nonReserved, because doing so would break other existing rules. So we create a looseIdentifier and a looseNonReserved here.
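For instance, an OPTIONS clause whose keys collide with the FROM/TO keywords should still parse. A hypothetical example (the provider and option names are illustrative, not from the PR):

```sql
-- `from` and `to` appear here as option keys, not keywords, so the
-- parser must accept them where an identifier is expected.
CREATE TABLE t
USING org.apache.spark.sql.sources.SomeProvider
OPTIONS (from '1', to '100')
```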
Why not add this to the option rule directly?
Because I don't know whether we will add other reserved words later; if so, the option rule might get too long. I haven't counted whether any keywords are missing from nonReserved.
Both (current approach or adding it to the option rule) are okay for me.
Could you add your comment above as a comment in the code?
Thanks for the reminder. I've added it.
Test build #49673 has finished for PR 10723 at commit
Test build #49669 has finished for PR 10723 at commit
Test build #49675 has finished for PR 10723 at commit
          case Token(k, Nil) => k
        }.mkString(".")
        val value = unquoteString(keysAndValue.last.text)
        (key, unquoteString(value))
Unquoting twice?
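A sketch of the issue the reviewer points out (in Python; unquote_string is a hypothetical stand-in for Catalyst's unquoteString): the diff applies the helper once when building `value` and again when building the pair, so the second call is at best redundant, and a value whose quotes should survive one level of unquoting loses them.

```python
# Hypothetical stand-in for Catalyst's unquoteString helper.
def unquote_string(s):
    if len(s) >= 2 and s[0] == s[-1] and s[0] in ("'", '"'):
        return s[1:-1]
    return s

raw = "'\"nested\"'"            # source text: '"nested"'
once = unquote_string(raw)      # inner quotes kept, as intended
twice = unquote_string(once)    # inner quotes stripped too -- data loss

print(once)   # "nested"
print(twice)  # nested
```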
@viirya I have done another round. Most things are minor, but I would like to know why you want to change the treatment of quoted identifiers.
…Parser commands to new Parser This PR moves all the functionality provided by the SparkSQLParser/ExtendedHiveQlParser to the new Parser hierarchy (SparkQl/HiveQl). This also improves the current SET command parsing: the current implementation swallows ```set role ...``` and ```set autocommit ...``` commands, this PR respects these commands (and passes them on to Hive). This PR and #10723 end the use of Parser-Combinator parsers for SQL parsing. As a result we can also remove the ```AbstractSQLParser``` in Catalyst. The PR is marked WIP as long as it doesn't pass all tests. cc rxin viirya winningsix (this touches #10144) Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10905 from hvanhovell/SPARK-12866.
Conflicts: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@hvanhovell Thanks for reviewing this. I've updated the PR to address your comments. Please see if it looks good to you.
Test build #50258 has finished for PR 10723 at commit
Test build #50276 has finished for PR 10723 at commit
LGTM
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala
Test build #50355 has finished for PR 10723 at commit
retest this please.
Test build #50366 has finished for PR 10723 at commit
It's weird.
retest this please.
It is; I can't make sense of this either. Are the tests passing locally?
Yeah, I think so. And I haven't updated the code since the last successful test.
Let's see how another round of tests goes.
Many unrelated failures, like not being able to find the Hive jar file.
Test build #50377 has finished for PR 10723 at commit
ping @rxin
@viirya I am gonna trigger another test to make sure things keep working.
retest this please
@hvanhovell ok, thanks.
Test build #50382 has finished for PR 10723 at commit
cc @rxin
Thanks - merging this into master.
JIRA: https://issues.apache.org/jira/browse/SPARK-12689
DDLParser processes three commands: createTable, describeTable and refreshTable.
This patch migrates the three commands to the newly absorbed parser.
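Illustrative forms of the three statements (sketches only; the exact syntax the parser accepts may differ slightly):

```sql
-- createTable (data source table)
CREATE TABLE t USING parquet OPTIONS (path '/tmp/t');
-- describeTable
DESCRIBE EXTENDED db.t;
-- refreshTable
REFRESH TABLE db.t;
```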