New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29563][SQL] CREATE TABLE LIKE should look up catalog/table like v2 commands #26219
[SPARK-29563][SQL] CREATE TABLE LIKE should look up catalog/table like v2 commands #26219
Conversation
Test build #112497 has finished for PR 26219 at commit
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
c4ba743
to
ea8ddbc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By using current APIs like loadTable and createTable in TableCatalog, I think we can implement v2 commend for CREATE TABLE LIKE? cc @cloud-fan @rdblue
Yea we can have a v2 CREATE TABLE LIKE command. |
@cloud-fan @viirya So i am not sure what V2 impl would look like ? Are we allowed to define a new semantics for it ? I am asking since "load" was mentioned in the comment. |
Test build #112504 has finished for PR 26219 at commit
|
I think "create table like" just to create target table with same definition like storage, schema, partition columns, etc., as source table. You can check out current CreateTableLikeCommand. #26183 is an example implementing v2 command. Basically we need to create a logical plan and physical plan for the command. Then update ResolveCatalogs rule to convert the statement to logical plan. And convert logical plan to physical plan in DataSourceV2Strategy. |
@viirya Is it possible to create a V2 table with a specific "storage location" and "bucket spec" ? |
I think "bucket spec" is represented by Transform in V2. You can check V2SessionCatalog.convertTransforms. |
@viirya Thanks for the pointer. I will check it. In the meantime, i just quickly checked the v1 behaviour for a couple of cases. When location is not specified, we just seem to create an empty table.
In V2, we will just have the "create table" behaviour ? i.e error out when location is specified ? Lets please decide on the behaviour and i will try to implement it by following the example PR you have mentioned. |
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
Outdated
Show resolved
Hide resolved
ea8ddbc
to
bac4fc6
Compare
Thank you for update, @dilipbiswal . Please resolve the conflicts, too. |
bac4fc6
to
d4cf00f
Compare
Test build #112629 has finished for PR 26219 at commit
|
Test build #112627 has finished for PR 26219 at commit
|
test("CREATE TABLE LIKE") { | ||
val targetTable = "testcat.ns1.ns2.tbl1" | ||
val sourceTable = "testcat.ns1.ns2.tbl2" | ||
withTable(targetTable, sourceTable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is withTable
needed since tables are not created?
sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
Outdated
Show resolved
Hide resolved
Test build #112632 has finished for PR 26219 at commit
|
Test build #112647 has finished for PR 26219 at commit
|
retest this please |
Test build #112654 has finished for PR 26219 at commit
|
c3fe33f
to
01be1e7
Compare
Hi, @cloud-fan . Is this PR aiming the following too? Or, do we have a separate JIRA issue for that?
|
Test build #112685 has finished for PR 26219 at commit
|
01be1e7
to
a02868b
Compare
@viirya @cloud-fan @dongjoon-hyun I am assuming we do need a V2 implementation for this command. I have made an attempt. Since i am not entirely familiar with the V2 frame work, i may be missing something. Below are some details:
Please let me know what you guys think. |
This command is tricky as it involves 2 tables: source and target. When we need to create the target table in the session catalog, shall we create a v1 or v2 table? For CREATE TABLE, the rule is simple: we check the provider and create v2 tables if it's a v2 provider. For this case, it's complicated as the source table may be a v2 table and doesn't have a provider. My proposal: if the source table doesn't have a provider, use the default provider (set by In summary:
v2 CREATE TABLE LIKE implementation: get schema and table properties from the source table, and create the target table. |
@cloud-fan Thanks !! I did wonder about cross catalog operation as well. I have a question -
So in the case where source is a V1 table and target is a V2 table, we are not able to copy some of the V1 metadata like serde, compressed, inputformat, outputformat etc. Right ? Is that okay ? I guess because of this confusion i thought may be we should restrict it to V1->V1 and V2->V2. |
I think it's OK. It reminds me that we should not blindly copy all table properties. We should get partition column, bucket spec from the source table and convert them to table properties. |
Test build #113252 has finished for PR 26219 at commit
|
@cloud-fan Thank you. I will try to implement per ur suggestion. |
a02868b
to
4c772e5
Compare
Test build #113554 has finished for PR 26219 at commit
|
Test build #113563 has finished for PR 26219 at commit
|
def validateLocation(loc: Option[String]) = { | ||
if (loc.isDefined) { | ||
throw new AnalysisException("Location clause not supported for " + | ||
"CREATE TABLE LIKE statement when tables are of V2 type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can support the LOCATION clause. See CatalogV2Utils.convertTableProperties
, we can store the location in a special table property location
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan Thanks a lot. I was not aware of this. I will check.
Some(sCatalog.asTableCatalog), | ||
s, | ||
ifNotExists) | ||
case (NonSessionCatalog(tCatalog, t), SessionCatalog(sCatalog, s)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to catch session catalog, we should move the case to ResolveSessionCatalog
. It's OK to create v2 command in ResolveSessionCatalog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan NonSessionCatalog is not available in ResolveSessionCatalog, right ? this case catches both NonSessionCatalog and SessionCatalog ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok let's leave it.
validateLocation(loc) | ||
CreateTableLike(tCatalog.asTableCatalog, | ||
t, | ||
None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we pass sCatalog
in ?
targetCatalog: TableCatalog, | ||
targetTableName: Seq[String], | ||
sourceCatalog: Option[TableCatalog], | ||
sourceTableName: Seq[String], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the source table, what we really care is the table itself, not which catalog it comes from. I think it's better to define the plan as
case class CreateTableLike(
targetCatalog: TableCatalog,
targetTableName: Seq[String],
sourceTable: NamedRelation,
location: Option[String],
provider: Option[String],
ifNotExists: Boolean)
In the planner, we match CreateTableLike(..., r: DataSourceV2Relation, ..)
, and create the physical plan with source table r.table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I am a bit confused. The source can be both V1 or V2, right ? So how can we expect a DataSourceV2Relation as source ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the v1 table has a v2 adapter called V1Table
. In ResolveCatalogs
, we can lookup table from session catalog, create a DataSourceV2Relation
, and pass it into CreateTableLike
.
Test build #118598 has finished for PR 26219 at commit
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Change to make sure
CREATE TABLES LIKE
statement go through the same catalog/table resolution framework of v2 commands.Why are the changes needed?
It's important to make all the commands have the same table resolution behavior, to avoid confusing end-users.
Does this PR introduce any user-facing change?
Yes. Attempting to execute
CREATE TABLE LIKE
on v2 catalog results in an error.How was this patch tested?
Added unit tests.