Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19115] [SQL] Supporting Create Table Like Location #16868

Closed
wants to merge 5 commits into from

Conversation

ouyangxiaochen
Copy link

@ouyangxiaochen ouyangxiaochen commented Feb 9, 2017

What changes were proposed in this pull request?

Support CREATE [EXTERNAL] TABLE LIKE LOCATION... syntax for Hive serde and datasource tables.
In this PR,we follow SparkSQL design rules :

supporting create table like view or physical table or temporary view with location.
creating a table with location,this table will be an external table other than managed table.

How was this patch tested?

Add new test cases and update existing test cases

@ouyangxiaochen
Copy link
Author

I have created a new PR. Please review it, Thanks! @gatorsmile @cloud-fan

@cloud-fan
Copy link
Contributor

OK to test

val sourceViewName = "tab1"
val targetTabName = "tab2"
var createdTableType = CatalogTableType.MANAGED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can make it more explicitly:

val tableType = if (location.isDefiend) EXTERNAL else MANAGED

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, I'll update it.

@gatorsmile
Copy link
Member

ok to test

@@ -682,8 +682,8 @@ class HiveDDLSuite
))
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this back?

sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName")
} else {
sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName" +
s" LOCATION '${location.getOrElse("")}'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above logics can be simplified to

val locationClause = if (location.nonEmpty) "LOCATION '${location.getOrElse("")}'" else ""
sql(s"CREATE TABLE $targetTabName LIKE $sourceTabName $locationClause")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same to the other test cases

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@ouyangxiaochen
Copy link
Author

I have run test cases successfully. Please run the test cases again.Thanks a lot! @SparkQA

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Feb 12, 2017

Test build #72745 has finished for PR 16868 at commit 81feda4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -70,12 +71,19 @@ case class CreateTableLikeCommand(
sourceTableDesc.provider
}

// If location is specified, we create an external table internally.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> If the

@@ -70,12 +71,19 @@ case class CreateTableLikeCommand(
sourceTableDesc.provider
}

// If location is specified, we create an external table internally.
// Else create managed table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, create a

CatalogTableType.MANAGED
} else {
CatalogTableType.EXTERNAL
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shorten it to one line?

  val tblType = if (location.isEmpty) CatalogTableType.MANAGED else CatalogTableType.EXTERNAL

@@ -833,54 +833,95 @@ class HiveDDLSuite
}

test("CREATE TABLE LIKE a temporary view") {
// CREATE TABLE LIKE a temporary view.
withCreateTableLikeTempView(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withCreateTableLikeTempView(None) -> withCreateTableLikeTempView(location = None)

withCreateTableLikeTempView(None)

// CREATE TABLE LIKE a temporary view location ...
withTempDir {tmpDir =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{tmpDir -> { tmpDir

}
}
}

test("CREATE TABLE LIKE a data source table") {
// CREATE TABLE LIKE a data source table.
withCreateTableLikeDSTable(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here.

}
}

test("CREATE TABLE LIKE an external data source table") {
// CREATE TABLE LIKE an external data source table.
withCreateTableLikeExtDSTable(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here

|LOCATION '$basePath'
""".stripMargin)
|LOCATION '$basePath1'
""".stripMargin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert it back

for (ds <- Seq("2008-04-08", "2008-04-09"); hr <- Seq("11", "12")) {
sql(
s"""
|INSERT OVERWRITE TABLE $sourceTabName
|partition (ds='$ds',hr='$hr')
|SELECT 1, 'a'
""".stripMargin)
""".stripMargin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert it back

withTempDir { tmpDir =>
val basePath = tmpDir.toURI
val basePath1 = tmpDir.toURI
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert it back?

}
}
}

test("CREATE TABLE LIKE a view") {
// CREATE TABLE LIKE a view.
withCreateTableLikeView(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here.

// The created table should be a MANAGED table or EXTERNAL table with empty view text
// and original text.
assert(targetTable.tableType == tableType,
s"the created table must be a Hive ${tableType.name} table")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a Hive table? It could be a data source table too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply.Ok, i will follow your suggestion.

@gatorsmile
Copy link
Member

Please update the PR description. This statement can be used for data source tables too.

@tejasapatil
Copy link
Contributor

You have good coverage for DDL statements. How about a test case for reading from external table creating using the command this PR enables ?

@tejasapatil
Copy link
Contributor

BTW: Hive can allow location of external table to be same as the source table. Can you please try that ? It should work. In the test case validations I see this check which will fail in such case:

assert(sourceTable.storage.locationUri != targetTable.storage.locationUri, ....

@gatorsmile
Copy link
Member

Please add a test case based on what @tejasapatil suggested. Thanks!

@ouyangxiaochen
Copy link
Author

I think @tejasapatil's suggestion is reasonable, because the location is specified by users, So the sourceTable.storage.locationUri and targetTable.storage.locationUri can be same or different, Whether we need to be exactly the same as Hive?

@ouyangxiaochen
Copy link
Author

In @tejasapatil's comment, Whether we need to be exactly the same as Hive? @gatorsmile

@tejasapatil
Copy link
Contributor

tejasapatil commented Feb 13, 2017

There are two main uses of EXTERNAL tables I am aware of:

  1. Ingest data from non-hive locations into Hive tables. This can be covered by adding test case for reading from external table creating using the command this PR enables

  2. Create a logical "pointer" to an existing hive table / partition (without creating multiple copies of the underlying data). Testing if the destination table can have the same location as of the source table will cover this.

I don't think Spark's interpretation of external tables is different from Hive's so its OK to support both.

BTW: If you are supporting 1st use case, one can mimic to get behavior of 2nd use case by creating external table with a fake location and later issuing a ALTER TABLE SET LOCATION command to make it point to an existing table's location. There is really no mechanism to guard against having EXTERNAL tables to not point to an existing table / partition in Spark. So, both use cases were already possible in Spark

@ouyangxiaochen
Copy link
Author

Do you mean that we don't need to do check whether the targetTable.storage.locationUri is the same with sourceTable.storage.locationUri or not ? @tejasapatil

@tejasapatil
Copy link
Contributor

we don't need to do check whether the targetTable.storage.locationUri is the same with sourceTable.storage.locationUri

We should not do that check for external tables. But continue doing that for other types of tables.

@ouyangxiaochen
Copy link
Author

Very thoughtful consideration. Thanks for your explanation and suggestion! @tejasapatil what do you think? @gatorsmile @cloud-fan

ouyangxiaochen added 2 commits February 13, 2017 16:18
2. ignore the validation of target table's location when we creating an external table.
@ouyangxiaochen ouyangxiaochen changed the title [SPARK-19115] [SQL] Supporting Create External Table Like Location [SPARK-19115] [SQL] Supporting Create Table Like Location Feb 13, 2017
@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

Yes, the location can be the same or different from the original table.

LGTM pending test

@SparkQA
Copy link

SparkQA commented Feb 14, 2017

Test build #72830 has finished for PR 16868 at commit 71f4f1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 6e45b54 Feb 14, 2017
@ouyangxiaochen ouyangxiaochen deleted the spark19115 branch February 14, 2017 06:27
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
What changes were proposed in this pull request?

Support CREATE [EXTERNAL] TABLE LIKE LOCATION... syntax for Hive serde and datasource tables.
In this PR,we follow SparkSQL design rules :

    supporting create table like view or physical table or temporary view with location.
    creating a table with location,this table will be an external table other than managed table.

How was this patch tested?

Add new test cases and update existing test cases

Author: ouyangxiaochen <ou.yangxiaochen@zte.com.cn>

Closes apache#16868 from ouyangxiaochen/spark19115.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants