[SPARK-37194][SQL] Avoid unnecessary sort in v1 write if it's not dynamic partition #37290
ulysses-you wants to merge 3 commits into apache:master
c21 left a comment
Mostly LGTM from my side with minor comments. Thanks @ulysses-you
      options: Map[String, String],
      numStaticPartitions: Int = 0)
    : Set[String] = {
    assert(partitionColumns.size >= numStaticPartitions)
nit: would require() be better?
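For context on this nit, here is a minimal sketch in plain Scala of how the two checks differ: `require` reports a bad argument as an `IllegalArgumentException`, while `assert` raises an `AssertionError` and can be elided at compile time with `-Xdisable-assertions`. The object and method names are hypothetical and are not part of the PR.

```scala
object PreconditionSketch {
  // Hypothetical helper: validates a caller-supplied argument with require().
  // A failure surfaces as IllegalArgumentException("requirement failed: ...").
  def checkWithRequire(numPartitionColumns: Int, numStaticPartitions: Int): Unit =
    require(
      numPartitionColumns >= numStaticPartitions,
      s"numStaticPartitions ($numStaticPartitions) exceeds " +
        s"the number of partition columns ($numPartitionColumns)")

  // The same check with assert(). A failure surfaces as java.lang.AssertionError,
  // and the check disappears entirely if assertions are elided at compile time.
  def checkWithAssert(numPartitionColumns: Int, numStaticPartitions: Int): Unit =
    assert(numPartitionColumns >= numStaticPartitions)
}
```

Whether `require` fits better depends on whether the value is treated as caller input or as an internal invariant.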
-     options: Map[String, String]): Seq[SortOrder] = {
+     options: Map[String, String],
+     numStaticPartitions: Int = 0): Seq[SortOrder] = {
+   assert(partitionColumns.size >= numStaticPartitions)
   */
  private[sql] var outputOrderingMatched: Boolean = false

  // scalastyle:off argcount
nit: we can pass in a wrapper class PartitionSpec(partitionColumns: Seq[Attribute], numStaticPartitions: Int) to avoid this.
I used a new parameter with a default value to stay compatible with downstream projects as far as possible. Does that make sense to you?
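The two options discussed in this thread can be sketched side by side. This is only an illustration in plain Scala: column names stand in for Spark's `Attribute`, and `PartitionSpec`, `writeWithSpec` and `write` are hypothetical names, not the PR's code.

```scala
object WriteSignatureSketch {
  // Option suggested in review: bundle the two related arguments in one wrapper.
  final case class PartitionSpec(partitionColumns: Seq[String], numStaticPartitions: Int) {
    require(partitionColumns.size >= numStaticPartitions)
    // Assumption: static partition columns come first in the list.
    def dynamicPartitionColumns: Seq[String] = partitionColumns.drop(numStaticPartitions)
  }

  // Hypothetical entry point taking the wrapper; existing callers would have to change.
  def writeWithSpec(spec: PartitionSpec, options: Map[String, String]): Seq[String] =
    spec.dynamicPartitionColumns

  // Option taken in the PR: a trailing parameter with a default value,
  // so existing call sites that pass two arguments still compile.
  def write(
      partitionColumns: Seq[String],
      options: Map[String, String],
      numStaticPartitions: Int = 0): Seq[String] = {
    require(partitionColumns.size >= numStaticPartitions)
    partitionColumns.drop(numStaticPartitions)
  }
}
```

With the default value, a call such as `write(Seq("dt", "hour"), Map.empty)` behaves as before, while a new caller can pass `numStaticPartitions = 1`. Note that a default parameter keeps source compatibility only; downstream code still has to be recompiled.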
          |""".stripMargin)
      }
    }
  }
would it be good to have one more unit test for no static columns?
The existing tests contain a similar case, but I'd like to add it.
      statsTrackers: Seq[WriteJobStatsTracker],
-     options: Map[String, String])
+     options: Map[String, String],
+     numStaticPartitions: Int = 0)
viirya left a comment
Looks good to me.
For v1 write, InsertIntoHadoopFsRelationCommand is the only case which adds a local sort even if the partition column is static.
Do you mean that InsertIntoHadoopFsRelationCommand is the only use case for numStaticPartitions for now?
@viirya yes, the default of The reason is:
thanks, merging to master!
thank you all
What changes were proposed in this pull request?
This is a rework of #34468, now that the required ordering for v1 write has been pulled out.
This PR adds a new parameter numStaticPartitions to v1write and FileFormatWriter so that we can skip the unnecessary local sort for static partition writes.
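As a rough sketch of the idea (not Spark's actual implementation), the required ordering can be computed from the dynamic partition columns only. Column names stand in for Spark's `Attribute` and `SortOrder`, `requiredOrdering` is a hypothetical helper, and the sketch assumes the static partition columns come first in `partitionColumns`.

```scala
object RequiredOrderingSketch {
  /** Columns the writer needs its input sorted by (hypothetical helper). */
  def requiredOrdering(
      partitionColumns: Seq[String],
      bucketColumn: Option[String],
      sortColumns: Seq[String],
      numStaticPartitions: Int = 0): Seq[String] = {
    require(partitionColumns.size >= numStaticPartitions)
    // A static partition has one constant value for the whole write,
    // so only the dynamic partition columns need to be clustered.
    val dynamicPartitionColumns = partitionColumns.drop(numStaticPartitions)
    dynamicPartitionColumns ++ bucketColumn.toSeq ++ sortColumns
  }
}
```

When every partition column is static and there is no bucketing or sort column, the result is empty, which is the case where the local sort can be skipped.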
Why are the changes needed?
The v1 write requires an ordering on the dynamic partition columns, the bucket expression and the sort columns during writing. The reason is that DynamicPartitionDataSingleWriter and DynamicPartitionDataConcurrentWriter assume that rows with the same partition and bucket values arrive contiguously. So if the partition columns are static, the local sort is unnecessary.
For v1 write, InsertIntoHadoopFsRelationCommand is the only case which adds a local sort even if the partition column is static.
Does this PR introduce any user-facing change?
No, it only improves performance.
How was this patch tested?
Added tests.
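To illustrate the assumption described under "Why are the changes needed?", here is a toy model of a writer that keeps one output file open at a time and switches files whenever the partition key changes. It is a simplification written for this illustration, not the logic of DynamicPartitionDataSingleWriter, and `filesOpened` is a hypothetical name.

```scala
object SingleWriterSketch {
  /** Counts how many times a one-file-at-a-time writer opens a file. */
  def filesOpened(partitionKeys: Seq[String]): Int = {
    var current: Option[String] = None
    var opened = 0
    for (key <- partitionKeys) {
      if (!current.contains(key)) {
        // The key changed: close the current file and open one for the new key.
        opened += 1
        current = Some(key)
      }
    }
    opened
  }
}
```

In this model, input sorted by partition key opens one file per distinct key, unsorted input may reopen the same partition several times, and a static partition (a single constant key) opens exactly one file with no sort at all.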