
[SPARK-19115] [SQL] Supporting Create External Table Like Location #16638

Closed
wants to merge 7 commits into from

Conversation

@ouyangxiaochen commented Jan 19, 2017

What changes were proposed in this pull request?

Support the CREATE [EXTERNAL] TABLE ... LIKE ... LOCATION ... syntax for Hive tables.
In this PR, we follow Spark SQL's design rules:

  1. Support creating an external table like a view, a physical table, or a temporary view, with a location.
  2. Creating an external table without a location throws an OperationNotAllowed exception.
  3. Creating a managed table with a location makes the table external rather than managed.

How was this patch tested?

Add new test cases and update existing test cases

@gatorsmile (Member)

Could you follow the title requirement in http://spark.apache.org/contributing.html?

@@ -58,6 +58,7 @@ import org.apache.spark.util.Utils
case class CreateTableLikeCommand(
Member

Please update the comment of this class.

Author

OK, I will update it later. Thanks!

@gatorsmile (Member)

gatorsmile commented Jan 19, 2017

We have a few test cases you can follow. Please create test cases. Thanks!

@gatorsmile (Member)

In the PR, you might need to consider more scenarios. For example, let me ask a question. How does Hive behave when the specified location is not empty?

@ouyangxiaochen (Author)

Here are the differences between Hive and Spark 2.x:

1. Hive
create table test(id int); --> MANAGED_TABLE
create table test(id int) location '/warehouse/test'; --> MANAGED_TABLE
create external table test(id int) location '/warehouse/test'; --> EXTERNAL_TABLE
create external table test(id int); --> EXTERNAL_TABLE

2. Spark 2.x:
create table test(id int); --> MANAGED_TABLE
create table test(id int) location '/warehouse/test'; --> EXTERNAL_TABLE
create external table test(id int) location '/warehouse/test'; --> EXTERNAL_TABLE
create external table test(id int); --> operationNotAllowed("CREATE EXTERNAL TABLE must be accompanied by LOCATION", ctx)

So, this PR follows Spark 2.x's design rules. Thanks!
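The comparison above can be encoded as a toy sketch. This is illustrative only, not actual Hive or Spark code; the names `hiveTableType` and `sparkTableType` are hypothetical:

```scala
// Toy model of the table-type rules compared above; not real Hive/Spark code.
sealed trait TableType
case object Managed extends TableType
case object External extends TableType

// Hive: the EXTERNAL keyword alone decides the table type.
def hiveTableType(externalKeyword: Boolean): TableType =
  if (externalKeyword) External else Managed

// Spark 2.x: a LOCATION also forces an external table, and EXTERNAL
// without a LOCATION is rejected at parse time.
def sparkTableType(
    externalKeyword: Boolean,
    hasLocation: Boolean): Either[String, TableType] =
  (externalKeyword, hasLocation) match {
    case (true, false)  => Left("CREATE EXTERNAL TABLE must be accompanied by LOCATION")
    case (false, false) => Right(Managed)
    case _              => Right(External)
  }
```

The key divergence is the second Spark rule: a plain CREATE TABLE with a LOCATION becomes external in Spark 2.x but stays managed in Hive.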

@gatorsmile (Member)

gatorsmile commented Jan 23, 2017

First, please change the PR title to [SPARK-19115][SQL]Supporting Create External Table Like Location

@gatorsmile (Member)

Let me rephrase it: if the directory specified in the LOCATION spec contains other files, how does Hive behave?

@gatorsmile (Member)

Please keep updating your PR description. For example, this PR is not relying on manual tests. In addition, you also need to summarize what this PR did. List more details to help reviewers understand your changes and impacts. Thanks!

@ouyangxiaochen ouyangxiaochen changed the title spark-19115 [SPARK-19115] [SQL] Supporting Create External Table Like Location Jan 23, 2017
@ouyangxiaochen (Author)

I am sorry that I didn't grasp the key point of your question. In Hive, if there are data files under the specified path when an external table is created, Hive identifies those files as the table's data files.
In many Spark applications, external table data is generated under the external table path by other applications. So Hive does nothing with the directory specified in the LOCATION.
Thank you for your patience and guidance. @gatorsmile

* }}}
*/
override def visitCreateTableLike(ctx: CreateTableLikeContext): LogicalPlan = withOrigin(ctx) {
val targetTable = visitTableIdentifier(ctx.target)
val sourceTable = visitTableIdentifier(ctx.source)
CreateTableLikeCommand(targetTable, sourceTable, ctx.EXISTS != null)
val location = Option(ctx.locationSpec).map(visitLocationSpec)
if (ctx.EXTERNAL != null && location.isEmpty) {
Member

Add a comment above this line:

    // If we are creating an EXTERNAL table, then the LOCATION field is required

Author

OK, I'll do it later, Thanks!

val location = Option(ctx.locationSpec).map(visitLocationSpec)
if (ctx.EXTERNAL != null && location.isEmpty) {
operationNotAllowed("CREATE EXTERNAL TABLE LIKE must be accompanied by LOCATION", ctx)
}
Member

To the other reviewers, we are following what we did in visitCreateHiveTable

@gatorsmile (Member)

@ouyangxiaochen Please do not duplicate the test cases. Try to combine them.

@cloud-fan @yhuai Could you please check whether such a DDL support is desirable?

@gatorsmile (Member)

ok to test.

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71887 has finished for PR 16638 at commit 713ca97.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…ala file.

2. Fix the errors in the test cases in HiveDDLSuite.scala; the SQL statements
   were missing a pair of single quotes.
@ouyangxiaochen (Author)

I have fixed the test-case errors and they now run successfully, so please run the tests again. Thanks a lot! @SparkQA

@@ -81,8 +81,8 @@ statement
rowFormat? createFileFormat? locationSpec?
(TBLPROPERTIES tablePropertyList)?
(AS? query)? #createHiveTable
| CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier
LIKE source=tableIdentifier #createTableLike
| CREATE EXTERNAL? TABLE (IF NOT EXISTS)? target=tableIdentifier
Contributor

Since Spark 2.2, we want to hide the managed/external concept from users. It looks reasonable to add a LOCATION clause to CREATE TABLE LIKE, but do we really need the EXTERNAL keyword? We don't need to be exactly the same as Hive.

Member

I am fine

Contributor

OK, then let's simplify the logic: if a location is specified, we create an external table internally; otherwise, we create a managed table.
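The simplified rule could be sketched like this (a hypothetical helper to illustrate the idea, not the actual Spark implementation):

```scala
// Minimal sketch of the simplified rule: the presence of a LOCATION,
// not the EXTERNAL keyword, determines the resulting table type.
sealed trait TableType
case object Managed extends TableType
case object External extends TableType

def tableTypeFor(location: Option[String]): TableType =
  if (location.isDefined) External else Managed
```

Under this rule, CREATE TABLE t LIKE s LOCATION '/path' yields an external table, while CREATE TABLE t LIKE s yields a managed one, so the EXTERNAL keyword becomes unnecessary.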

@SparkQA

SparkQA commented Jan 24, 2017

Test build #71904 has finished for PR 16638 at commit b80f8e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile (Member)

ping @ouyangxiaochen : )

@ouyangxiaochen (Author)

Happy Chinese New Year! @gatorsmile
The Spring Festival holiday just ended and I returned to work today. What do I need to do next?

@ouyangxiaochen (Author)

ping @gatorsmile

@cloud-fan (Contributor)

please address #16638 (comment)

2. Simplify the logic: if a location is specified, create an external table internally; otherwise, create a managed table.
3. Update test cases.
@@ -51,13 +51,14 @@ import org.apache.spark.util.Utils
*
* The syntax of using this command in SQL is:
* {{{
* CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
* LIKE [other_db_name.]existing_table_name
* CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
Contributor

no EXTERNAL

@@ -518,8 +518,8 @@ class HiveDDLCommandSuite extends PlanTest with SQLTestUtils with TestHiveSingle

test("create table like") {
val v1 = "CREATE TABLE table1 LIKE table2"
val (target, source, exists) = parser.parsePlan(v1).collect {
case CreateTableLikeCommand(t, s, allowExisting) => (t, s, allowExisting)
val (target, source, location, exists) = parser.parsePlan(v1).collect {
Contributor

add an assert to check location is empty

@@ -528,8 +528,8 @@ class HiveDDLCommandSuite extends PlanTest with SQLTestUtils with TestHiveSingle
assert(source.table == "table2")

val v2 = "CREATE TABLE IF NOT EXISTS table1 LIKE table2"
Contributor

add one more test case to check CREATE TABLE LIKE with location

TableIdentifier(targetTabName, Some("default")))

checkCreateTableLike(sourceTable, targetTable)
test("CREATE TABLE LIKE a temporary view [LOCATION]...") {
Contributor

actually we don't need to change the test name

checkCreateTableLike(sourceTable, targetTable)
test("CREATE TABLE LIKE a temporary view [LOCATION]...") {
var createdTableType = "MANAGED"
for ( i <- 0 to 1 ) {
Contributor

you can create a method with parameter location: Option[String], instead of writing a for loop with 2 iterations...

Author

I wrote it this way to reuse this piece of common code, because the basic logic of the two scenarios is almost the same.

Contributor

Creating a method and wrapping this piece of code in it also lets you reuse the code.

private def checkCreateTableLike(
sourceTable: CatalogTable,
targetTable: CatalogTable,
tableType: String): Unit = {
Contributor

Why not pass in a CatalogTableType instead of a String?
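A hypothetical shape for the helper after the reviewers' two suggestions. The `CatalogTable` and `CatalogTableType` definitions below are simplified stand-ins for the real Spark catalog classes, shown only to illustrate the signature change:

```scala
// Simplified stand-ins for Spark's catalog classes, for illustration only.
sealed trait CatalogTableType
case object MANAGED extends CatalogTableType
case object EXTERNAL extends CatalogTableType

case class CatalogTable(name: String, tableType: CatalogTableType)

// Takes the expected type as a typed value rather than a raw String,
// so callers cannot pass a misspelled type name.
def checkCreateTableLike(
    sourceTable: CatalogTable,
    targetTable: CatalogTable,
    expectedType: CatalogTableType): Unit = {
  // The real suite compares many more catalog fields; only the
  // table-type check is modeled here.
  assert(targetTable.tableType == expectedType)
}
```

One helper then covers both scenarios without a two-iteration loop: pass EXTERNAL when a LOCATION was given, MANAGED otherwise.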

@cloud-fan (Contributor)

please resolve the conflict too, thanks!

@SparkQA

SparkQA commented Feb 8, 2017

Test build #72550 has finished for PR 16638 at commit 71f1d12.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@ouyangxiaochen (Author)

I ran into some trouble while resolving the conflict, so could you give me some guidance? Thanks a lot! @cloud-fan

@cloud-fan (Contributor)

you can start with a new branch and apply the changes manually, e.g. copy code from this PR to the new branch.

@SparkQA

SparkQA commented Feb 8, 2017

Test build #72576 has finished for PR 16638 at commit 5dd21b2.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@ouyangxiaochen (Author)

ouyangxiaochen commented Feb 9, 2017

Should I delete my remote master repository first, and then fork a new one? @cloud-fan

@gatorsmile (Member)

Your master is clean (i.e., exactly identical to the upstream/master), right?

@ouyangxiaochen (Author)

My master branch was not synchronized with the Apache master. I did a pull, but it was still not synchronized, and finally I removed my remote repository.
But now I do not know how to associate a new branch with this PR; I think I made a mistake. @gatorsmile

@gatorsmile (Member)

You might not be familiar with GitHub/Git. How about submitting a new PR? : )

@ouyangxiaochen (Author)

Here's how I create a PR:
1. Fork the Apache master.
2. Create a new branch from my master branch.
3. Select the new branch in the branch menu and create a new PR.
4. Edit the code on the new branch.
5. Commit and push.
Can you point out any missing or mistaken steps for me? Thank you for your guidance! @gatorsmile

@gatorsmile (Member)

You do not need to do step 1 every time. You might have missed the following two steps when resolving your conflicts:

git fetch upstream
git merge upstream/master

@ouyangxiaochen (Author)

Oh, I see, I missed the step 'git remote add upstream ...'.
But now I have deleted the repository from my profile, so this PR can't know which repository it should be associated with. Do you have a way to help me work around this problem?

@gatorsmile (Member)

No worries, open/submit a new PR. : )

@gatorsmile (Member)

gatorsmile commented Feb 9, 2017

You might be able to make it by forcefully pushing the new changes by git push -f origin NEW_BRANCH:REMOTE_BRANCH

@ouyangxiaochen (Author)

OK, I'll try it immediately. Thank you very much!

@ouyangxiaochen (Author)

I have created a PR at https://github.com/apache/spark/pull/16868, please review it, Thanks! @gatorsmile @cloud-fan

@maropu maropu mentioned this pull request Apr 23, 2017
maropu added a commit to maropu/spark that referenced this pull request Apr 23, 2017
@asfgit asfgit closed this in e9f9715 Apr 24, 2017
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
This PR proposes to close stale PRs. Currently we have 400+ open PRs, and some are stale: their JIRA tickets have already been closed, or their JIRA tickets do not exist (and they do not seem to be minor issues).

// Open PRs whose JIRA tickets have been already closed
Closes apache#11785
Closes apache#13027
Closes apache#13614
Closes apache#13761
Closes apache#15197
Closes apache#14006
Closes apache#12576
Closes apache#15447
Closes apache#13259
Closes apache#15616
Closes apache#14473
Closes apache#16638
Closes apache#16146
Closes apache#17269
Closes apache#17313
Closes apache#17418
Closes apache#17485
Closes apache#17551
Closes apache#17463
Closes apache#17625

// Open PRs whose JIRA tickets do not exist and which are not minor issues
Closes apache#10739
Closes apache#15193
Closes apache#15344
Closes apache#14804
Closes apache#16993
Closes apache#17040
Closes apache#15180
Closes apache#17238

N/A

Author: Takeshi Yamamuro <yamamuro@apache.org>

Closes apache#17734 from maropu/resolved_pr.

Change-Id: Id2e590aa7283fe5ac01424d30a40df06da6098b5