[SPARK-32970][SQL][TEST] Reduce the runtime of an UT for SPARK-32019 #29842

tanelk · 2020-09-22T19:17:36Z

What changes were proposed in this pull request?

The UT for SPARK-32019 (#28853) tries to write about 16GB of data to the disk. We must change the value of spark.sql.files.maxPartitionBytes to a smaller value to check the correct behavior with less data. By default it is 128MB.
The other parameters in this UT are also changed to smaller values to keep the behavior the same.
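
As a rough illustration of why a smaller spark.sql.files.maxPartitionBytes lets the test reach the same limit with far less data, here is a minimal sketch of the split-size arithmetic. It is a simplified model, loosely based on how Spark appears to choose the target split size for file scans, and is not Spark's actual code. The names `SplitSizeSketch` and `maxSplitBytes`, and all sample inputs, are assumptions made for illustration.

```scala
// Simplified, standalone model (an assumption, not Spark's code) of how the
// target split size could be derived from the three configs in this PR.
object SplitSizeSketch {
  val MB: Long = 1024L * 1024L

  def maxSplitBytes(
      fileSizes: Seq[Long],     // sizes of the input files, in bytes
      maxPartitionBytes: Long,  // models spark.sql.files.maxPartitionBytes
      openCostInBytes: Long,    // models spark.sql.files.openCostInBytes
      minPartitionNum: Int      // models spark.sql.files.minPartitionNum
  ): Long = {
    // Every file is charged its length plus a fixed cost for opening it.
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    // Spread the total over the requested minimum number of partitions.
    val bytesPerCore = totalBytes / minPartitionNum
    // Never go below the open cost, never exceed the configured maximum.
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
  }
}
```

In this model, 800 files of 4MB with a minimum of 16 partitions give a split size of 128MB under the default 128MB cap and 4MB open cost, which is why gigabytes of data are needed to reach the cap. With the cap lowered to 2MB and the open cost left at 4MB, three 1-byte files are already enough to reach it.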

Why are the changes needed?

The runtime of this one UT can be over 7 minutes on Jenkins. After the change it takes a few seconds.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UT

tanelk · 2020-09-22T19:19:39Z

@ulysses-you, you were the original author; could you check whether the test coverage is the same?
@dongjoon-hyun and @cloud-fan, you requested the extra tests in the original PR.

HyukjinKwon · 2020-09-22T23:28:23Z

add to whitelist

ulysses-you · 2020-09-23T00:22:19Z

...core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala

-        ))
-      assert(table.rdd.partitions.length == 1)
-    }
+    withSQLConf(SQLConf.FILES_MAX_PARTITION_BYTES.key -> "2MB") {


Can we add the config spark.sql.files.openCostInBytes? The result depends on it, although we don't change the default value.
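
To make the dependence on the open cost concrete, here is a minimal sketch of greedy file packing. It is a simplified model, loosely based on how Spark appears to group files into partitions, and is not Spark's actual code. It assumes no file is larger than the split size, so file splitting is left out. The names `PackingSketch` and `countPartitions` are hypothetical.

```scala
// Simplified, standalone model (an assumption, not Spark's code) of packing
// files into partitions, where each file also pays a fixed open cost.
object PackingSketch {
  val MB: Long = 1024L * 1024L

  def countPartitions(
      fileSizes: Seq[Long],   // file sizes in bytes, each <= maxSplitBytes
      maxSplitBytes: Long,    // target split size
      openCostInBytes: Long   // models spark.sql.files.openCostInBytes
  ): Int = {
    var partitions = 0
    var currentSize = 0L
    var currentCount = 0
    // Consider the largest files first.
    fileSizes.sorted(Ordering[Long].reverse).foreach { len =>
      // Close the current partition if this file would overflow it.
      if (currentCount > 0 && currentSize + len > maxSplitBytes) {
        partitions += 1
        currentSize = 0L
        currentCount = 0
      }
      // The file is charged its length plus the open cost.
      currentSize += len + openCostInBytes
      currentCount += 1
    }
    // Count the last, still open partition.
    if (currentCount > 0) partitions += 1
    partitions
  }
}
```

In this model, 800 files of 4MB with a 128MB split size pack into 50 partitions when the open cost is 4MB (16 files each), but into 25 partitions when the open cost is 0 (32 files each). The expected partition count therefore changes with the open cost, even if the test never sets it.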

SparkQA · 2020-09-23T04:23:34Z

Test build #128997 has finished for PR 29842 at commit 45dee3f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-09-25T18:05:02Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33739/

SparkQA · 2020-09-25T18:21:54Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33739/

SparkQA · 2020-09-25T21:46:34Z

Test build #129120 has finished for PR 29842 at commit d71ad45.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-09-28T07:05:13Z

Could you describe the root cause of the slow tests and how to fix it in the PR description?

tanelk · 2020-09-28T08:40:34Z

> Could you describe the root cause of the slow tests and how to fix it in the PR description?

I tried to explain it a bit better.

maropu · 2020-09-28T11:45:56Z

...core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala

+      SQLConf.FILES_MAX_PARTITION_BYTES.key -> "2MB",
+      SQLConf.FILES_OPEN_COST_IN_BYTES.key -> String.valueOf(4 * 1024 * 1024)) {
+
+      withSQLConf(SQLConf.FILES_MIN_PARTITION_NUM.key -> "1") {


I think it is okay to update only the two slow tests in this PR.

maropu · 2020-09-28T14:04:46Z

...core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala

-      val partitions = (1 to 800).map(i => s"file$i" -> 4 * 1024 * 1024)
-      val table = createTable(files = partitions)
-      assert(table.rdd.partitions.length == 50)
+      withSQLConf(SQLConf.FILES_MIN_PARTITION_NUM.key -> "8") {


8 -> 16, to keep the original test context?

SparkQA · 2020-09-28T14:07:54Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33801/

SparkQA · 2020-09-28T14:25:54Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33801/

SparkQA · 2020-09-28T15:25:17Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33804/

SparkQA · 2020-09-28T15:49:40Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33804/

SparkQA · 2020-09-28T18:04:56Z

Test build #129188 has finished for PR 29842 at commit 4f386f4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-09-28T19:06:18Z

Test build #129189 has finished for PR 29842 at commit 42020ee.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu

cc: @HyukjinKwon @dongjoon-hyun

HyukjinKwon · 2020-09-29T07:51:28Z

Merged to master.

Reduce the runtime of an UT

45dee3f

probot-autolabeler bot added the SQL label Sep 22, 2020

ulysses-you reviewed Sep 23, 2020

Add FILES_OPEN_COST_IN_BYTES conf to UT

d71ad45

maropu reviewed Sep 28, 2020

maropu changed the title ~~[SPARK-32970][TEST] Reduce the runtime of an UT for SPARK-32019~~ [SPARK-32970][SQL][TEST] Reduce the runtime of an UT for SPARK-32019 Sep 28, 2020

Move config closer to slow tests

4f386f4

maropu reviewed Sep 28, 2020

8 -> 16

42020ee

maropu approved these changes Sep 29, 2020

HyukjinKwon approved these changes Sep 29, 2020

HyukjinKwon closed this in 90e86f6 Sep 29, 2020
