[SPARK-29563][SQL] CREATE TABLE LIKE should look up catalog/table like v2 commands #26219

Closed

dilipbiswal
Contributor

What changes were proposed in this pull request?

Make the CREATE TABLE LIKE statement go through the same catalog/table resolution framework as the v2 commands.

Why are the changes needed?

It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users.

Does this PR introduce any user-facing change?

Yes. Attempting to execute CREATE TABLE LIKE on a v2 catalog results in an error.

How was this patch tested?

Added unit tests.

@dilipbiswal dilipbiswal changed the title from "[SPARK-29563] CREATE TABLE LIKE should look up catalog/table like v2 commands" to "[SPARK-29563][SQL] CREATE TABLE LIKE should look up catalog/table like v2 commands" Oct 23, 2019
@SparkQA

SparkQA commented Oct 23, 2019

Test build #112497 has finished for PR 26219 at commit c4ba743.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • case class CreateTableLikeStatement(

Member

@viirya viirya left a comment

Using current TableCatalog APIs such as loadTable and createTable, I think we can implement a v2 command for CREATE TABLE LIKE. cc @cloud-fan @rdblue

@cloud-fan
Contributor

Yea we can have a v2 CREATE TABLE LIKE command.

@dilipbiswal
Contributor Author

dilipbiswal commented Oct 23, 2019

@cloud-fan @viirya
As I understand from the code (please correct me if I'm wrong), "create table like" does not copy the data. If a location spec is specified, it is simply carried over to the target table.

So I am not sure what a V2 implementation would look like. Are we allowed to define new semantics for it? I am asking because "load" was mentioned in the comment.

@SparkQA

SparkQA commented Oct 23, 2019

Test build #112504 has finished for PR 26219 at commit ea8ddbc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member

viirya commented Oct 23, 2019

I think "create table like" just creates the target table with the same definition (storage, schema, partition columns, etc.) as the source table. You can check out the current CreateTableLikeCommand.

#26183 is an example of implementing a v2 command. Basically, we need to create a logical plan and a physical plan for the command, then update the ResolveCatalogs rule to convert the statement to the logical plan, and convert the logical plan to the physical plan in DataSourceV2Strategy.
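
For illustration, the statement, logical plan, and physical plan flow described in this comment might be sketched as below. This is a minimal, dependency-free model and not Spark code: every type here is a simplified stand-in (catalogs are plain strings), and the `resolve` and `plan` functions only mimic the roles that the comment assigns to ResolveCatalogs and DataSourceV2Strategy.

```scala
// Hypothetical stand-ins only; none of these are Spark's real classes.
object V2CommandPipelineSketch {
  // 1. What the parser produces: unresolved multi-part names.
  final case class CreateTableLikeStatement(
      target: Seq[String],
      source: Seq[String],
      ifNotExists: Boolean)

  // 2. Logical plan: the catalog has been split off the target name.
  final case class CreateTableLike(
      catalog: String,
      target: Seq[String],
      source: Seq[String],
      ifNotExists: Boolean)

  // 3. Physical plan: the node that would eventually call the catalog.
  final case class CreateTableLikeExec(
      catalog: String,
      target: Seq[String],
      source: Seq[String],
      ifNotExists: Boolean)

  // Analyzer-rule stand-in: convert the statement only when the first
  // name part is a registered (non-session) catalog.
  def resolve(
      stmt: CreateTableLikeStatement,
      knownCatalogs: Set[String]): Option[CreateTableLike] =
    stmt.target match {
      case head +: rest if rest.nonEmpty && knownCatalogs.contains(head) =>
        Some(CreateTableLike(head, rest, stmt.source, stmt.ifNotExists))
      case _ =>
        None // left untouched for the session-catalog path
    }

  // Planner stand-in: map the logical plan to its physical counterpart.
  def plan(logical: CreateTableLike): CreateTableLikeExec =
    CreateTableLikeExec(
      logical.catalog, logical.target, logical.source, logical.ifNotExists)
}
```

Returning `None` from `resolve` models the case where the rule does not fire and the statement is left for another rule to handle.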

@dilipbiswal
Contributor Author

@viirya Is it possible to create a V2 table with a specific "storage location" and "bucket spec"?

@viirya
Member

viirya commented Oct 23, 2019

I think "bucket spec" is represented by Transform in V2. You can check V2SessionCatalog.convertTransforms.
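
As a rough illustration of what "represented by Transform" could mean, the sketch below models partition columns and an optional bucket spec as a flat list of transforms. The `Transform`, `Identity`, `Bucket`, and `BucketSpec` types are hypothetical, simplified stand-ins and not Spark's connector API; the real conversion logic should be read from V2SessionCatalog.convertTransforms.

```scala
// Hypothetical, simplified model; not Spark's connector expressions API.
object BucketTransformSketch {
  // Simplified v1-style bucket spec (assumed fields for illustration).
  final case class BucketSpec(numBuckets: Int, bucketColumnNames: Seq[String])

  // v2-style partitioning is expressed as a list of transforms.
  sealed trait Transform
  final case class Identity(column: String) extends Transform
  final case class Bucket(numBuckets: Int, columns: Seq[String]) extends Transform

  // Partition columns become identity transforms; the bucket spec,
  // if present, becomes a single bucket transform.
  def toTransforms(
      partitionColumns: Seq[String],
      bucketSpec: Option[BucketSpec]): Seq[Transform] = {
    val partitioning: Seq[Transform] = partitionColumns.map(c => Identity(c))
    val bucketing: Seq[Transform] =
      bucketSpec.map(b => Bucket(b.numBuckets, b.bucketColumnNames)).toList
    partitioning ++ bucketing
  }
}
```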

@dilipbiswal
Contributor Author

@viirya Thanks for the pointer. I will check it. In the meantime, I quickly checked the v1 behaviour for a couple of cases. When no location is specified, we seem to just create an empty table.

spark-sql> select * from s1;
1	2
1	2

spark-sql> create table s4 like s1;
Time taken: 0.291 seconds
spark-sql> select * from s4;
Time taken: 0.094 seconds

spark-sql> create table s5 like s1 location 'file:/Users/dilipbiswal/mygit/apache/spark/spark-warehouse/s1';

spark-sql> select * from s5;
1	2
1	2
Time taken: 0.1 seconds, Fetched 2 row(s)

In V2, will we just have the "create table" behaviour, i.e. error out when a location is specified? Let's please decide on the behaviour, and I will try to implement it by following the example PR you mentioned.

@dongjoon-hyun
Member

Thank you for the update, @dilipbiswal. Please resolve the conflicts, too.

@SparkQA

SparkQA commented Oct 25, 2019

Test build #112629 has finished for PR 26219 at commit d4cf00f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class CreateTableLikeStatement(

@SparkQA

SparkQA commented Oct 25, 2019

Test build #112627 has finished for PR 26219 at commit bac4fc6.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • case class CreateTableLikeStatement(

test("CREATE TABLE LIKE") {
  val targetTable = "testcat.ns1.ns2.tbl1"
  val sourceTable = "testcat.ns1.ns2.tbl2"
  withTable(targetTable, sourceTable) {
Contributor

Is withTable needed since tables are not created?

@SparkQA

SparkQA commented Oct 25, 2019

Test build #112632 has finished for PR 26219 at commit bff1751.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 25, 2019

Test build #112647 has finished for PR 26219 at commit c3fe33f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@SparkQA

SparkQA commented Oct 25, 2019

Test build #112654 has finished for PR 26219 at commit c3fe33f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Hi, @cloud-fan. Is this PR aiming at the following too? Or do we have a separate JIRA issue for that?

Yea we can have a v2 CREATE TABLE LIKE command.

@SparkQA

SparkQA commented Oct 25, 2019

Test build #112685 has finished for PR 26219 at commit 01be1e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

dilipbiswal commented Nov 5, 2019

@viirya @cloud-fan @dongjoon-hyun

I am assuming we do need a V2 implementation for this command, so I have made an attempt. Since I am not entirely familiar with the V2 framework, I may be missing something. Below are some details:

  • The V2 implementation is constrained by what createTable supports, so things like location and storage specs are not considered.
  • The V1 implementation can optionally create a table pointing to a user-specified location, which means it can create a table that is already populated with data. The V2 implementation creates an empty table with just the definition from the source table.

Please let me know what you think.

@cloud-fan
Contributor

cloud-fan commented Nov 5, 2019

This command is tricky as it involves 2 tables: source and target. When we need to create the target table in the session catalog, shall we create a v1 or v2 table?

For CREATE TABLE, the rule is simple: we check the provider and create v2 tables if it's a v2 provider.

For this case, it's complicated as the source table may be a v2 table and doesn't have a provider. My proposal: if the source table doesn't have a provider, use the default provider (set by spark.sql.sources.default). Then follow CREATE TABLE and create v2 table if provider is v2.

In summary:

  1. resolve the source table first.
  2. if target table needs to be created in a non-session catalog, use v2 CREATE TABLE LIKE
  3. if target table needs to be created in the session catalog, check the provider of the source table or use the default provider. If it's a v2 provider, use v2 CREATE TABLE LIKE, otherwise use the v1 command.

v2 CREATE TABLE LIKE implementation: get schema and table properties from the source table, and create the target table.
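
The routing rule summarized above might be sketched as a single decision function. This is only an illustration under stated assumptions: the `Command` values are hypothetical markers, `isV2Provider` is a hypothetical predicate supplied by the caller, and the source table is assumed to have been resolved already (step 1), so only its optional provider is passed in.

```scala
// Hypothetical sketch of the proposed routing; not Spark code.
object CreateTableLikeRoutingSketch {
  sealed trait Command
  case object V1CreateTableLike extends Command
  case object V2CreateTableLike extends Command

  /**
   * @param targetInSessionCatalog whether the target resolves to the session catalog
   * @param sourceProvider provider of the already-resolved source table, if it has one
   * @param defaultProvider stand-in for the value of spark.sql.sources.default
   * @param isV2Provider hypothetical predicate that tells v2 providers apart
   */
  def choose(
      targetInSessionCatalog: Boolean,
      sourceProvider: Option[String],
      defaultProvider: String,
      isV2Provider: String => Boolean): Command =
    if (!targetInSessionCatalog) {
      // Step 2: a non-session catalog always takes the v2 command.
      V2CreateTableLike
    } else {
      // Step 3: fall back to the default provider when the source has none,
      // then follow the same rule as CREATE TABLE.
      val provider = sourceProvider.getOrElse(defaultProvider)
      if (isV2Provider(provider)) V2CreateTableLike else V1CreateTableLike
    }
}
```

For example, a session-catalog target whose source has no provider is routed by the default provider alone.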

@dilipbiswal
Contributor Author

dilipbiswal commented Nov 5, 2019

@cloud-fan Thanks! I did wonder about cross-catalog operations as well. I have a question.
Below is the description of CreateTableLikeCommand:

/**
 * A command to create a table with the same definition of the given existing table.
 * In the target table definition, the table comment is always empty but the column comments
 * are identical to the ones defined in the source table.
 *
 * The CatalogTable attributes copied from the source table are storage(inputFormat, outputFormat,
 * serde, compressed, properties), schema, provider, partitionColumnNames, bucketSpec.
 *

So in the case where the source is a V1 table and the target is a V2 table, we are not able to copy some of the V1 metadata such as serde, compressed, inputFormat, and outputFormat, right? Is that okay? Because of this confusion, I thought maybe we should restrict it to V1->V1 and V2->V2.

@cloud-fan
Contributor

I think it's OK. It reminds me that we should not blindly copy all table properties. We should get the partition columns and bucket spec from the source table and convert them to table properties.

@SparkQA

SparkQA commented Nov 5, 2019

Test build #113252 has finished for PR 26219 at commit a02868b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

@cloud-fan Thank you. I will try to implement it per your suggestion.

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113554 has finished for PR 26219 at commit 4c772e5.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 11, 2019

Test build #113563 has finished for PR 26219 at commit 11ea11b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def validateLocation(loc: Option[String]) = {
  if (loc.isDefined) {
    throw new AnalysisException("Location clause not supported for " +
      "CREATE TABLE LIKE statement when tables are of V2 type")
Contributor

We can support the LOCATION clause. See CatalogV2Utils.convertTableProperties: we can store the location in a special table property, location.
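
A minimal sketch of that idea, assuming the property key is literally `location` as the comment says: instead of rejecting the clause, the optional location is folded into the properties map handed to the target table. The helper name `withLocation` is hypothetical, and this does not reproduce what CatalogV2Utils.convertTableProperties actually does.

```scala
// Hypothetical helper for illustration; not Spark's CatalogV2Utils.
object LocationPropertySketch {
  // Key taken from the review comment; treated here as an assumption.
  val LocationKey: String = "location"

  // Add the location to the table properties only when one was given.
  def withLocation(
      properties: Map[String, String],
      location: Option[String]): Map[String, String] =
    location.fold(properties)(loc => properties + (LocationKey -> loc))
}
```

With no LOCATION clause the properties pass through unchanged, so the same code path serves both cases.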

Contributor Author

@cloud-fan Thanks a lot. I was not aware of this. I will check.

Some(sCatalog.asTableCatalog),
s,
ifNotExists)
case (NonSessionCatalog(tCatalog, t), SessionCatalog(sCatalog, s)) =>
Contributor

If we need to catch the session catalog, we should move the case to ResolveSessionCatalog. It's OK to create a v2 command in ResolveSessionCatalog.

Contributor Author

@cloud-fan NonSessionCatalog is not available in ResolveSessionCatalog, right? This case catches both NonSessionCatalog and SessionCatalog?

Contributor

OK, let's leave it.

validateLocation(loc)
CreateTableLike(tCatalog.asTableCatalog,
t,
None,
Contributor

Shall we pass sCatalog in?

targetCatalog: TableCatalog,
targetTableName: Seq[String],
sourceCatalog: Option[TableCatalog],
sourceTableName: Seq[String],
Contributor

@cloud-fan cloud-fan Nov 12, 2019

For the source table, what we really care about is the table itself, not which catalog it comes from. I think it's better to define the plan as

case class CreateTableLike(
    targetCatalog: TableCatalog,
    targetTableName: Seq[String],
    sourceTable: NamedRelation,
    location: Option[String],
    provider: Option[String],
    ifNotExists: Boolean)

In the planner, we match CreateTableLike(..., r: DataSourceV2Relation, ..) and create the physical plan with the source table r.table.

Contributor Author

@cloud-fan I am a bit confused. The source can be either V1 or V2, right? So how can we expect a DataSourceV2Relation as the source?

Contributor

The v1 table has a v2 adapter called V1Table. In ResolveCatalogs, we can look up the table from the session catalog, create a DataSourceV2Relation, and pass it into CreateTableLike.
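
To make the adapter idea concrete, here is a small, dependency-free sketch of a plan that carries the source as a relation, so that the planner never needs to know whether the source started out as v1 or v2. All types are hypothetical stand-ins: `V1TableAdapter` plays the role the thread attributes to V1Table, `V2Relation` stands in for DataSourceV2Relation, and the catalog is a plain string rather than a TableCatalog.

```scala
// Hypothetical, simplified model of the proposal; not Spark code.
object SourceRelationSketch {
  // Minimal stand-in for a v2 table.
  trait Table {
    def name: String
    def schema: Seq[(String, String)] // (column name, type name)
    def properties: Map[String, String]
  }

  // Stand-in for v1 catalog metadata.
  final case class V1Metadata(
      identifier: String,
      columns: Seq[(String, String)],
      provider: Option[String])

  // Adapter that exposes v1 metadata through the v2-style interface.
  final case class V1TableAdapter(v1: V1Metadata) extends Table {
    override def name: String = v1.identifier
    override def schema: Seq[(String, String)] = v1.columns
    override def properties: Map[String, String] =
      v1.provider.fold(Map.empty[String, String])(p => Map("provider" -> p))
  }

  // Stand-in for a relation that carries its resolved table.
  final case class V2Relation(table: Table)

  // Logical plan shaped like the proposal in this thread.
  final case class CreateTableLike(
      targetCatalog: String,
      targetTableName: Seq[String],
      sourceTable: V2Relation,
      location: Option[String],
      provider: Option[String],
      ifNotExists: Boolean)

  final case class CreateTableLikeExec(
      targetCatalog: String,
      targetTableName: Seq[String],
      source: Table,
      location: Option[String],
      provider: Option[String],
      ifNotExists: Boolean)

  // Planner stand-in: unwrap the relation and pass its table to the exec node.
  def plan(logical: CreateTableLike): CreateTableLikeExec = logical match {
    case CreateTableLike(catalog, name, V2Relation(table), loc, prov, ifNotExists) =>
      CreateTableLikeExec(catalog, name, table, loc, prov, ifNotExists)
  }
}
```

Because the exec node only sees the `Table` interface, a v1 source wrapped in the adapter and a native v2 source take the same path.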

@SparkQA

SparkQA commented Feb 17, 2020

Test build #118598 has finished for PR 26219 at commit 11ea11b.

  • This patch fails build dependency tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 26, 2020
@github-actions github-actions bot closed this Jul 27, 2020