
[SPARK-13139][SQL] Create native DDL commands #11048

Closed
wants to merge 20 commits

Conversation


@viirya viirya commented Feb 3, 2016

JIRA: https://issues.apache.org/jira/browse/SPARK-13139

From JIRA: We currently delegate most DDLs directly to Hive, through NativePlaceholder in HiveQl.scala. In Spark 2.0, we want to provide native implementations for DDLs for both SQLContext and HiveContext.

This PR takes the first step: it parses DDL commands and creates logical commands that encapsulate them. The actual implementations still delegate to HiveNativeCommand for now.
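The two-phase approach described above (parse into a logical command now, keep native execution for later) can be illustrated with a minimal, self-contained sketch. All class and function names below are hypothetical stand-ins, not the actual Spark classes:

```scala
// Hypothetical, simplified model of the approach: recognized DDL text is parsed
// into a typed logical command, while anything unrecognized falls back to a raw
// pass-through (playing the role of HiveNativeCommand). Not real Spark classes.
sealed trait LogicalCommand
case class CreateDatabase(name: String, ifNotExists: Boolean) extends LogicalCommand
case class NativeCommand(sql: String) extends LogicalCommand // raw delegate to Hive

def parseDdl(sql: String): LogicalCommand = {
  val CreateDb = """(?i)CREATE\s+DATABASE\s+(IF\s+NOT\s+EXISTS\s+)?(\w+)""".r
  sql.trim match {
    case CreateDb(ifNotExists, name) => CreateDatabase(name, ifNotExists != null)
    case other                       => NativeCommand(other) // still delegated as text
  }
}
```

Even with execution still delegated, having typed commands lets the analyzer and catalog inspect the statement instead of treating it as an opaque string.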


SparkQA commented Feb 3, 2016

Test build #50658 has finished for PR 11048 at commit 77252af.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CreateDataBase(
    • case class CreateFunction(


SparkQA commented Feb 3, 2016

Test build #50660 has finished for PR 11048 at commit 2b38d11.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


rxin commented Feb 3, 2016

Thanks - this looks pretty good as a start. We will need to add many other DDLs, including alter/drop table, etc.


viirya commented Feb 8, 2016

@rxin I've added alter table command support. As this command and the corresponding change are big, I think this PR should cover only these three commands and leave the other commands to follow-up PRs. What do you think?


SparkQA commented Feb 8, 2016

Test build #50919 has finished for PR 11048 at commit f9c1397.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 8, 2016

Test build #50922 has finished for PR 11048 at commit eb1fab7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 9, 2016

Test build #50965 has finished for PR 11048 at commit 3db2e1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


rxin commented Feb 10, 2016

@viirya yes we can do this incrementally. Let's just create subtasks under https://issues.apache.org/jira/browse/SPARK-13139

@@ -62,6 +66,458 @@ private[sql] class SparkQl(conf: ParserConf = SimpleParserConf()) extends Cataly
val tableIdent = extractTableIdent(nameParts)
RefreshTable(tableIdent)

case Token("TOK_CREATEDATABASE", Token(databaseName, Nil) :: createDatabaseArgs) =>

cc @hvanhovell

any suggestions on how we can make this file/function more modular? It is getting too long and we are about to add a lot more statements to it.


The main problem is LogicalPlan parsing, which we can split up into command/DDL/query parsing. We could use partial functions (or something similar) to implement the different parts of the parsing logic.

For instance:

abstract class BaseParser(val conf: ParserConf) extends ParserInterface {
  val planParsers: Seq[PlanParser]

  lazy val planParser = planParsers.reduce(_.orElse(_))

  def nodeToPlan(node: ASTNode): LogicalPlan = {
    planParser.applyOrElse(node, (n: ASTNode) => throw new NotImplementedError(n.text))
  }
}

abstract class PlanParser extends PartialFunction[ASTNode, LogicalPlan]

case class ExplainCommandParser(base: BaseParser) extends PlanParser {

  override def isDefinedAt(node: ASTNode): Boolean = node.text == "TOK_EXPLAIN"

  override def apply(v1: ASTNode): LogicalPlan = v1.children match {
    case (crtTbl @ Token("TOK_CREATETABLE" | "TOK_QUERY", _)) :: rest =>
      val extended = rest.exists(_.text.toUpperCase == "EXTENDED")
      ExplainCommand(base.nodeToPlan(crtTbl), extended)
  }
}

class SomeParser(conf: ParserConf) extends BaseParser(conf) {
  val planParsers: Seq[PlanParser] = Seq(
    ExplainCommandParser(this))
}
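Under stated assumptions (plain strings standing in for ASTNode and LogicalPlan, with illustrative parser names), the composition idea above can be exercised end to end with a self-contained toy version:

```scala
// Toy, self-contained version of the PlanParser composition idea: partial
// functions are chained with orElse, and unhandled nodes raise an error.
// Strings stand in for ASTNode/LogicalPlan; all names are illustrative only.
type Node = String

abstract class PlanParser extends PartialFunction[Node, String]

object ExplainParser extends PlanParser {
  override def isDefinedAt(node: Node): Boolean = node.startsWith("EXPLAIN ")
  override def apply(node: Node): String = s"Explain(${node.stripPrefix("EXPLAIN ")})"
}

object QueryParser extends PlanParser {
  override def isDefinedAt(node: Node): Boolean = node.startsWith("SELECT")
  override def apply(node: Node): String = s"Query($node)"
}

// Chain the parsers; order in the Seq determines precedence.
val planParser: PartialFunction[Node, String] =
  Seq[PlanParser](ExplainParser, QueryParser).reduce[PartialFunction[Node, String]](_ orElse _)

def nodeToPlan(node: Node): String =
  planParser.applyOrElse(node, (n: Node) => throw new NotImplementedError(n))
```

Adding support for a new statement then means adding one small parser object to the Seq, rather than growing a single giant match expression.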


rxin commented Feb 11, 2016

btw @viirya can we create an execution.commands package for this?


import org.apache.spark.sql.catalyst.plans.PlanTest

class SparkQlSuite extends PlanTest {

We really should test the resulting plans here, and not wait for an AnalysisException to be thrown. I know this is a PITA, but it will save us a lot of headaches in the future.

@hvanhovell
Copy link
Contributor

@viirya I have made an initial pass. This PR is large enough as it is; let's not add more commands to it.

@viirya viirya changed the title [SPARK-13139][SQL][WIP] Create native DDL commands [SPARK-13139][SQL] Create native DDL commands Feb 16, 2016

SparkQA commented Feb 16, 2016

Test build #51356 has finished for PR 11048 at commit 170bd77.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class BaseParser(val conf: ParserConf) extends ParserInterface with ParserBase
    • trait ParserBase
    • abstract class NativeDDLCommands(val sql: String) extends RunnableCommand
    • case class AlterTableSetProperties(
    • case class AlterTableDropProperties(
    • case class AlterTableCommandParser(base: CatalystQl) extends PlanParser


abstract class NativeDDLCommands(val sql: String) extends RunnableCommand {
  override def run(sqlContext: SQLContext): Seq[Row] = {
    sqlContext.catalog.runNativeCommand(sql)
  }
}

I see, it's because we want the NativeDDLCommands to be in the sql package, where it shouldn't know anything about Hive. In the future these commands will no longer be passed to Hive directly as a string so they shouldn't be "native" anymore. For now, I would create a temporary method in SQLContext:

// In SQLContext.scala
// TODO: remove this once we call specific operations in the catalog instead
protected[sql] def runDDLCommand(text: String): Seq[Row] = {
  throw new UnsupportedOperationException
}

// In HiveContext.scala
protected[sql] override def runDDLCommand(text: String): Seq[Row] = {
  runHiveSql(text).map(Row(_))
}

Even though it's temporary, I still think it's cleaner than doing it in the catalog.
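The stub-and-override layering suggested above can be sketched with simplified stand-in types; the class names are hypothetical and `Seq[String]` plays the role of `Seq[Row]`:

```scala
// Simplified sketch of the suggested layering: the base context exposes a stub
// that throws, and the Hive-backed context overrides it. Stand-in types only.
class SQLContextLike {
  // TODO (per the suggestion): remove once specific catalog operations are called
  protected def runDDLCommand(text: String): Seq[String] =
    throw new UnsupportedOperationException("native DDL requires a Hive-backed context")

  def sql(text: String): Seq[String] = runDDLCommand(text)
}

class HiveContextLike extends SQLContextLike {
  override protected def runDDLCommand(text: String): Seq[String] =
    Seq(s"hive: $text") // stands in for runHiveSql(text).map(Row(_))
}
```

This keeps the Hive dependency out of the base context while giving the DDL commands a single hook to call, which can later be replaced by real catalog operations.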

@andrewor14

@viirya Thanks for working on this. The overall approach is very reasonable but a few things can be improved:

  • many files need to be moved to the new packages
  • many unused imports in all the files
  • many fields and methods can be marked as private, but are not
  • the splitting of commands into 2 files is kind of arbitrary. IIUC all native commands we currently pass to Hive directly today are grouped under ddl.scala, and the rest are grouped under commands.scala. In the future, however, we would like to handle all the DDLs ourselves, so this division won't make sense anymore.
  • general lack of comments in this area of the code makes things difficult to follow (not your fault)

Additionally, I think this patch is so big mainly because it moves around a lot of files. The moves themselves can be done in a separate PR, which would reduce the diff significantly and make the patch easier to review.

Addressing all the outstanding comments will likely take a long time. Would you mind if I take this over? I'll be sure to give you credit in the final patch.


viirya commented Mar 3, 2016

@andrewor14 Thanks for reviewing. I don't mind if you want to take this over. Thanks for the credit!

asfgit pushed a commit that referenced this pull request Mar 3, 2016
## What changes were proposed in this pull request?

This patch simply moves things to a new package in an effort to reduce the size of the diff in #11048. Currently the new package only has one file, but in the future we'll add many new commands in SPARK-13139.

## How was this patch tested?

Jenkins.

Author: Andrew Or <andrew@databricks.com>

Closes #11482 from andrewor14/commands-package.
asfgit pushed a commit that referenced this pull request Mar 4, 2016
## What changes were proposed in this pull request?

This patch simply moves things to existing package `o.a.s.sql.catalyst.parser` in an effort to reduce the size of the diff in #11048. This is conceptually the same as a recently merged patch #11482.

## How was this patch tested?

Jenkins.

Author: Andrew Or <andrew@databricks.com>

Closes #11506 from andrewor14/parser-package.
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Mar 8, 2016
asfgit pushed a commit that referenced this pull request Mar 8, 2016
## What changes were proposed in this pull request?

When we add more DDL parsing logic in the future, SparkQl will become very big. To keep it smaller, we'll introduce helper "parser objects", e.g. one to parse alter table commands. However, these parser objects will need to access some helper methods that exist in CatalystQl. The proposal is to move those methods to an isolated ParserUtils object.

This is based on viirya's changes in #11048. It prefaces the bigger fix for SPARK-13139 to make the diff of that patch smaller.

## How was this patch tested?

No change in functionality, so just Jenkins.

Author: Andrew Or <andrew@databricks.com>

Closes #11529 from andrewor14/parser-utils.
asfgit pushed a commit that referenced this pull request Mar 11, 2016
## What changes were proposed in this pull request?

This patch is ported over from viirya's changes in #11048. Currently for most DDLs we just pass the query text directly to Hive. Instead, we should parse these commands ourselves and in the future (not part of this patch) use the `HiveCatalog` to process these DDLs. This is a pretext to merging `SQLContext` and `HiveContext`.

Note: As of this patch we still pass the query text to Hive. The difference is that we now parse the commands ourselves so in the future we can just use our own catalog.

## How was this patch tested?

Jenkins, new `DDLCommandSuite`, which comprises about 40% of the changes here.

Author: Andrew Or <andrew@databricks.com>

Closes #11573 from andrewor14/parser-plus-plus.
@viirya viirya closed this Mar 11, 2016
@viirya viirya deleted the native-ddl branch December 27, 2023 18:33