[SPARK-19246][SQL] CatalogTable's partitionSchema order and existence check #16606
Test build #71451 has finished for PR 16606 at commit
Test build #71453 has finished for PR 16606 at commit
Test build #71483 has started for PR 16606 at commit
 * keep the schema order with partitionColumnNames
 */
def partitionSchema: StructType = StructType(partitionColumnNames.flatMap {
  p => schema.filter(_.name == p)
nit: code style
xxx.map { p =>
xxx
}
  c => partitionColumnNames.contains(c.name)
/**
 * schema of this table's partition columns
 * keep the schema order with partitionColumnNames
Let's keep the previous doc comment; I think it's clear enough.
@@ -481,4 +481,27 @@ class PartitionProviderCompatibilitySuite
assert(spark.sql("show partitions test").count() == 5)
}
}

test("saveAsTable with inconsistent columns order" +
Could you move it to PartitionedWriteSuite?
retest this please
Test build #71494 has finished for PR 16606 at commit
@@ -184,8 +184,8 @@ case class CatalogTable(
import CatalogTable._

/** schema of this table's partition columns */
def partitionSchema: StructType = StructType(schema.filter {
  c => partitionColumnNames.contains(c.name)
def partitionSchema: StructType = StructType(partitionColumnNames.flatMap { p =>
please use
xxx.map { p =>
schema.find(_.name == p).getOrElse {
throw ...
}
}
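A minimal sketch of the pattern suggested above, written without a Spark dependency so it can be run on its own. `Field` is a hypothetical stand-in for Spark's `StructField`, `PartitionSchemaLookup` is an illustrative name, and the exception type is an assumption because the review comment leaves the `throw` elided.

```scala
object PartitionSchemaLookup {
  // Hypothetical stand-in for Spark's StructField; only `name` matters here.
  final case class Field(name: String, dataType: String)

  // Resolve partition columns in the order of `partitionColumnNames`,
  // failing fast when a name has no matching field in `schema`.
  def partitionFields(schema: Seq[Field], partitionColumnNames: Seq[String]): Seq[Field] =
    partitionColumnNames.map { p =>
      schema.find(_.name == p).getOrElse {
        // Assumption: the exception type is not given in the review comment.
        throw new IllegalArgumentException(
          s"Partition column '$p' not found in schema [${schema.map(_.name).mkString(", ")}]")
      }
    }
}
```

Compared with `flatMap` over `schema.filter`, which silently drops a partition column that has no matching field, this version surfaces the mismatch as an error.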
Test build #71497 has finished for PR 16606 at commit
Test build #71519 has finished for PR 16606 at commit
Test build #71552 has finished for PR 16606 at commit
Test build #71554 has finished for PR 16606 at commit
@@ -138,6 +138,7 @@ case class CreateDataSourceTableAsSelectCommand(
val tableIdentWithDB = table.identifier.copy(database = Some(db))
val tableName = tableIdentWithDB.unquotedString

var tableWithSchema = table.copy(schema = query.output.toStructType)
Shall we set the schema in AnalyzeCreateTable?
@@ -1374,4 +1377,47 @@ class HiveDDLSuite
assert(e2.message.contains("Hive data source can only be used with tables"))
}
}

test("table partition schema should be ordered") {
table partition schema should respect the order of partition columns
test("table partition schema should be ordered") {
  withTable("t", "t1") {
    val path = Utils.createTempDir(namePrefix = "t")
use withTempDir
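For context, the suggestion refers to the loan pattern: the helper creates the directory, lends it to the test body, and cleans it up afterwards, so the test needs no explicit `try`/`finally`. The block below is a generic sketch of that pattern using only `java.nio.file`. It is not Spark's own `withTempDir` helper, and `TempDirLoan` and `deleteRecursively` are hypothetical names.

```scala
import java.io.File
import java.nio.file.Files

object TempDirLoan {
  // Create a temporary directory, lend it to `f`, and always delete it afterwards.
  def withTempDir[A](f: File => A): A = {
    val dir = Files.createTempDirectory("t").toFile
    try f(dir) finally deleteRecursively(dir)
  }

  // Delete children first, then the file or directory itself.
  private def deleteRecursively(file: File): Unit = {
    // listFiles() returns null for non-directories, hence the Option wrapper.
    Option(file.listFiles()).foreach(_.foreach(deleteRecursively))
    file.delete()
  }
}
```

A test body would then receive the directory as a parameter, for example `TempDirLoan.withTempDir { dir => /* create the table under dir */ }`.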
val path = Utils.createTempDir(namePrefix = "t")
val path1 = Utils.createTempDir(namePrefix = "t1")
try {
  spark.sql(s"""
nit: code style, please follow existing code: https://github.com/apache/spark/pull/16606/files#diff-b7094baa12601424a5d19cb930e3402fR1255
|create table t (id long, P1 int, P2 int)
|using parquet
|options (path "$path")
|partitioned by (P1, P2)""".stripMargin)
This test passes even without your changes, right? I think we can keep just the one below.
@@ -92,6 +92,30 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext {
}
}

test("saveAsTable with inconsistent columns order" +
does this test improve the test coverage?
Test build #71580 has started for PR 16606 at commit
5e60f14 to 206b232
How about we just add an assert? i.e.
Test build #71853 has finished for PR 16606 at commit
StructType(schema.filter { c =>
  partitionColumnNames.contains(c.name)
})
This can be shortened to a single line.
Test build #71889 has finished for PR 16606 at commit
LGTM pending test
 * schema of this table's partition columns
 */
def partitionSchema: StructType = {
  assert(schema.takeRight(partitionColumnNames.length).map(_.name) == partitionColumnNames)
how about
val partitionFields = schema.takeRight(partitionColumnNames.length)
assert(partitionFields.map(_.name) == partitionColumnNames)
StructType(partitionFields)
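The suggestion above depends on the partition columns being the trailing fields of the schema, in the same order as `partitionColumnNames`. A self-contained sketch of that check follows. `Field` is a hypothetical stand-in for Spark's `StructField`, `PartitionSchemaTail` is an illustrative name, and the assertion message is an addition that the review comment does not include.

```scala
object PartitionSchemaTail {
  // Hypothetical stand-in for Spark's StructField; only `name` matters here.
  final case class Field(name: String, dataType: String)

  // Take the trailing fields of the schema and assert that their names
  // equal the partition column names, in the same order.
  def partitionFields(schema: Seq[Field], partitionColumnNames: Seq[String]): Seq[Field] = {
    val fields = schema.takeRight(partitionColumnNames.length)
    assert(
      fields.map(_.name) == partitionColumnNames,
      s"Expected trailing columns [${partitionColumnNames.mkString(", ")}] " +
        s"but found [${fields.map(_.name).mkString(", ")}]")
    fields
  }
}
```

Binding the trailing fields to a value once avoids computing `takeRight` twice, once for the assertion and once for the result.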
Test build #71902 has finished for PR 16606 at commit
Test build #71912 has started for PR 16606 at commit
LGTM, pending tests
retest this please
Test build #71922 has finished for PR 16606 at commit
thanks, merging to master! |
## What changes were proposed in this pull request?

CatalogTable's partitionSchema should check that each column name in partitionColumnNames matches exactly one field in schema, and throw an exception if it does not. CatalogTable's partitionSchema should also keep the same order as partitionColumnNames.

## How was this patch tested?

N/A

Author: windpiger <songjun@outlook.com>

Closes apache#16606 from windpiger/checkPartionColNameWithSchema.
What changes were proposed in this pull request?
CatalogTable's partitionSchema should check that each column name in partitionColumnNames matches exactly one field in schema, and throw an exception if it does not.
CatalogTable's partitionSchema should also keep the same order as partitionColumnNames.
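The two requirements in this description (exactly one matching field per partition column, and output ordered by partitionColumnNames) can be illustrated with a small standalone sketch. It is not the code that was merged. `Field` is a hypothetical stand-in for Spark's `StructField`, `PartitionSchemaStrict` is an illustrative name, and the exception type and messages are assumptions.

```scala
object PartitionSchemaStrict {
  // Hypothetical stand-in for Spark's StructField; only `name` matters here.
  final case class Field(name: String, dataType: String)

  // For each partition column, require exactly one matching schema field.
  // The result follows the order of `partitionColumnNames`, not the schema order.
  def partitionFields(schema: Seq[Field], partitionColumnNames: Seq[String]): Seq[Field] =
    partitionColumnNames.map { p =>
      schema.filter(_.name == p) match {
        case Seq(single) => single
        case Seq() =>
          throw new IllegalArgumentException(s"Partition column '$p' matches no field in schema")
        case matches =>
          throw new IllegalArgumentException(
            s"Partition column '$p' matches ${matches.length} fields in schema")
      }
    }
}
```

By contrast, `schema.filter(f => partitionColumnNames.contains(f.name))` returns fields in schema order and does not report missing or duplicated names.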
How was this patch tested?
N/A