[SPARK-32970][SQL][TEST] Reduce the runtime of an UT for SPARK-32019 #29842
Conversation
@ulysses-you, you were the original author, could you check whether the test coverage stays the same?
add to whitelist
))
assert(table.rdd.partitions.length == 1)
}
withSQLConf(SQLConf.FILES_MAX_PARTITION_BYTES.key -> "2MB") {
Can we add the config spark.sql.files.openCostInBytes? The result depends on it, even though we don't change the default value.
Done
Test build #128997 has finished for PR 29842 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Test build #129120 has finished for PR 29842 at commit
Could you describe the root cause of the slow tests and how it is fixed in the PR description?
I tried to explain it a bit better.
SQLConf.FILES_MAX_PARTITION_BYTES.key -> "2MB",
SQLConf.FILES_OPEN_COST_IN_BYTES.key -> String.valueOf(4 * 1024 * 1024)) {
withSQLConf(SQLConf.FILES_MIN_PARTITION_NUM.key -> "1") {
I think it is okay to update only the two slow tests in this PR.
Done
val partitions = (1 to 800).map(i => s"file$i" -> 4 * 1024 * 1024)
val table = createTable(files = partitions)
assert(table.rdd.partitions.length == 50)
withSQLConf(SQLConf.FILES_MIN_PARTITION_NUM.key -> "8") {
8 -> 16 for keeping the original test context?
Sure
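For background on why these assertions hold: the partition counts the tests check fall out of Spark's file-splitting logic (FilePartition.maxSplitBytes followed by greedy bin-packing of splits). The following Python sketch is only an approximation of that Scala logic for intuition — the exact behavior may differ across Spark versions, and the helper names here are my own:

```python
# Rough approximation (assumption) of Spark's FilePartition.maxSplitBytes
# and the greedy packing of splits into partitions. Not the actual Scala
# implementation; for intuition only.

def target_split_bytes(file_sizes, max_partition_bytes, open_cost, min_partition_num):
    # Each file is charged its size plus a fixed "open cost".
    total = sum(size + open_cost for size in file_sizes)
    bytes_per_core = total // min_partition_num
    return min(max_partition_bytes, max(open_cost, bytes_per_core))

def count_partitions(file_sizes, max_partition_bytes, open_cost, min_partition_num):
    target = target_split_bytes(file_sizes, max_partition_bytes,
                                open_cost, min_partition_num)
    # Files larger than the target are cut into target-sized chunks.
    splits = []
    for size in sorted(file_sizes, reverse=True):
        while size > target:
            splits.append(target)
            size -= target
        if size > 0:
            splits.append(size)
    # Greedily pack splits; close a partition when the next split overflows it.
    partitions, current = 0, 0
    for s in splits:
        if current + s > target and current > 0:
            partitions += 1
            current = 0
        current += s + open_cost
    if current > 0:
        partitions += 1
    return partitions

# Example: 8 files of 4MB, 16MB max partition size, 4MB open cost,
# minimum of 2 partitions -> each partition fits two files.
mb = 1024 * 1024
print(count_partitions([4 * mb] * 8, 16 * mb, 4 * mb, 2))  # -> 4
```

This makes visible why the review asks to pin spark.sql.files.openCostInBytes: the target split size and the packing both depend on it, so the asserted counts are only stable if it is fixed explicitly.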
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Kubernetes integration test status success
Test build #129188 has finished for PR 29842 at commit
Test build #129189 has finished for PR 29842 at commit
Merged to master.
What changes were proposed in this pull request?
The UT for SPARK-32019 (#28853) tries to write about 16GB of data to the disk. We must change the value of
spark.sql.files.maxPartitionBytes
to a smaller value to check the correct behavior with less data. By default it is 128MB.
The other parameters in this UT are also changed to smaller values to keep the behavior the same.
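The reason shrinking all the parameters together preserves the behavior is that the split-size formula is homogeneous in bytes: scaling the file sizes and every byte-valued config by the same factor leaves the resulting partition counts unchanged. A hypothetical Python check, using my own simplified approximation of FilePartition.maxSplitBytes (not the actual Scala code):

```python
# Sketch (assumption): a simplified stand-in for FilePartition.maxSplitBytes.
# Scaling total bytes, maxPartitionBytes, and openCostInBytes by the same
# factor scales the target split size by that factor, so partition counts
# are preserved -- which is why the test can use MBs instead of ~16GB.

def target_split_bytes(total_bytes, max_partition_bytes, open_cost, min_partition_num):
    bytes_per_core = total_bytes // min_partition_num
    return min(max_partition_bytes, max(open_cost, bytes_per_core))

gb, mb, kb = 1024**3, 1024**2, 1024

# Original scale: ~16GB of data, 128MB maxPartitionBytes, 4MB open cost.
full = target_split_bytes(16 * gb, 128 * mb, 4 * mb, 2)

# Everything divided by 64: 256MB of data, 2MB maxPartitionBytes, 64KB open cost.
scaled = target_split_bytes(16 * gb // 64, 128 * mb // 64, 4 * mb // 64, 2)

assert full == scaled * 64  # the target split size scales linearly
```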
Why are the changes needed?
The runtime of this one UT can be over 7 minutes on Jenkins. After the change it is a few seconds.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing UT