[SPARK-19305][SQL] partitioned table should always put partition columns at the end of table schema #16655
Conversation
Test build #71705 has started for PR 16655 at commit
retest this please
Test build #71714 has finished for PR 16655 at commit
val columnNames = if (sparkSession.sessionState.conf.caseSensitiveAnalysis) {
  schema.map(_.name)
c.copy(tableDesc = normalizedTable, query = Some(reorderedQuery))
How about adding one more check above this line here?
assert(normalizedTable.schema.isEmpty,
  "Schema may not be specified in a Create Table As Select (CTAS) statement")
this should be guaranteed by the parser, but we can check it again here.
LGTM pending test
LGTM, after this is merged, I will continue the work in #16593, thanks~
Test build #71754 has finished for PR 16655 at commit
thanks for the review, merging to master!
What changes were proposed in this pull request?
For data source tables, we always reorder the specified table schema, or the output schema of the CTAS query, to put partition columns at the end. e.g.
`CREATE TABLE t(a int, b int, c int, d int) USING parquet PARTITIONED BY (d, b)`
will create a table with schema `<a, c, d, b>`.
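For illustration, a minimal spark-shell sketch of the resulting column order (it assumes a `SparkSession` named `spark` and simply restates the example above; it is not code from this PR):

```scala
// Create the partitioned table from the example and inspect its column order.
spark.sql("CREATE TABLE t(a int, b int, c int, d int) USING parquet PARTITIONED BY (d, b)")

// Partition columns d and b end up at the end, in PARTITIONED BY order.
spark.table("t").schema.map(_.name)  // Seq("a", "c", "d", "b")
```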
Hive serde tables didn't have this problem before, because their CREATE TABLE syntax specifies the data schema and the partition schema separately.
However, after we unified the CREATE TABLE syntax, Hive serde tables also need this reordering. This PR puts the reordering logic in an analyzer rule, which works with both data source tables and Hive serde tables.
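As a rough sketch of the reordering idea only (the helper name and signature below are illustrative and not the actual rule added by this PR):

```scala
import org.apache.spark.sql.types.StructType

// Illustrative helper: keep non-partition columns in their original order and append the
// partition columns at the end, in the order they appear in PARTITIONED BY.
// Simplification: name matching here is case-sensitive, unlike the real rule, which
// honors the case-sensitivity setting of the analyzer.
def reorderSchema(schema: StructType, partitionColumns: Seq[String]): StructType = {
  val dataFields = schema.filterNot(f => partitionColumns.contains(f.name))
  val partitionFields = partitionColumns.flatMap(name => schema.find(_.name == name))
  StructType(dataFields ++ partitionFields)
}
```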
How was this patch tested?
new regression test
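For reference, a hedged sketch of what such a regression test could look like (the test name and the `withTable`/`sql` helpers assume Spark's SQLTestUtils-style test harness; this is not necessarily the exact test added here):

```scala
// Hypothetical regression test sketch, not the exact test added by this PR.
test("SPARK-19305: partition columns go to the end of the table schema") {
  withTable("t") {
    sql("CREATE TABLE t(a int, b int, c int, d int) USING parquet PARTITIONED BY (d, b)")
    assert(spark.table("t").schema.map(_.name) == Seq("a", "c", "d", "b"))
  }
}
```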