[SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results #32454

maropu · 2021-05-06T13:57:17Z

What changes were proposed in this pull request?

This PR proposes to filter out TPCDS v1.4 q6 and q75 in TPCDSQueryTestSuite.

I saw TPCDSQueryTestSuite fail nondeterministically because the output row order differed from that in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the tpcds/q6.sql query output rows were sorted only by cnt:

spark/sql/core/src/test/resources/tpcds/q6.sql

Line 20 in a0c76a8

ORDER BY cnt

Actually, tpcds/q6.sql and tpcds-v2.7.0/q6.sql are almost the same; the only difference is that tpcds-v2.7.0/q6.sql sorts by both cnt and a.ca_state:

spark/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql

Line 22 in a0c76a8

order by cnt, a.ca_state

So, I think it's okay just to test tpcds-v2.7.0/q6.sql in this case (q75 has the same issue).
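
To illustrate why sorting on cnt alone is not enough, here is a minimal sketch in plain Scala (no Spark involved). The StateCount case class, the sample rows, and the TieBreakSketch object are all made up for illustration and are not taken from q6 or its golden file. Scala's sortBy is stable, so rows with equal cnt keep their arrival order, which stands in here for whatever order the engine happens to produce for ties.

```scala
// Hypothetical rows standing in for q6 output: a state and its count.
final case class StateCount(state: String, cnt: Long)

object TieBreakSketch {
  // The same three rows arriving in two different orders, as could happen
  // when the rows reach the final sort in a different order between runs.
  val runA: Seq[StateCount] =
    Seq(StateCount("CA", 10L), StateCount("TX", 10L), StateCount("NY", 12L))
  val runB: Seq[StateCount] =
    Seq(StateCount("TX", 10L), StateCount("CA", 10L), StateCount("NY", 12L))

  // Mirrors ORDER BY cnt: rows with an equal cnt keep their arrival order.
  def byCnt(rows: Seq[StateCount]): Seq[StateCount] =
    rows.sortBy(_.cnt)

  // Mirrors ORDER BY cnt, a.ca_state: the second key breaks the tie.
  def byCntAndState(rows: Seq[StateCount]): Seq[StateCount] =
    rows.sortBy(r => (r.cnt, r.state))
}
```

In this sketch byCnt(runA) and byCnt(runB) differ in the position of the CA and TX rows, while byCntAndState gives the same sequence for both inputs, which is the property a golden-file comparison needs.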

Why are the changes needed?

For stable testing.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

GA passed.

SparkQA · 2021-05-06T14:46:54Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42734/

SparkQA · 2021-05-06T14:51:05Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42734/

dongjoon-hyun

Sorry, but I'm not sure why we need this merging, @maropu. Are we dropping TPCDS v1.4 gradually?

SparkQA · 2021-05-06T18:08:09Z

Test build #138212 has finished for PR 32454 at commit 03f731c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2021-05-07T00:42:16Z

Sorry, but I'm not sure why we need this merging, @maropu. Are we dropping TPCDS v1.4 gradually?

Ah, on second thought, it is okay just to filter out these queries in TPCDSQueryTestSuite. How about the current one?
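
A rough sketch of what filtering in the suite could look like, assuming the query list is exposed as an overridable def (as the val-to-def change to tpcdsQueries in TPCDSBase suggests). QueryListSketch, FilteredSuiteSketch, unstableQueries, and the shortened query list are hypothetical stand-ins, not the code in this PR, and the sketch deliberately has no Spark dependency.

```scala
// Stand-in for TPCDSBase; the real trait also mixes in SharedSparkSession
// and lists many more queries than this shortened, illustrative list.
trait QueryListSketch {
  // A def rather than a val, so a subclass can override it and still
  // build on the parent's list through super.
  def tpcdsQueries: Seq[String] =
    Seq("q1", "q2", "q3", "q4", "q5", "q6", "q7", "q75")
}

// Hypothetical suite that skips the queries whose ORDER BY leaves ties
// unresolved, so their output cannot be compared with a golden file.
class FilteredSuiteSketch extends QueryListSketch {
  private val unstableQueries: Set[String] = Set("q6", "q75")

  override def tpcdsQueries: Seq[String] =
    super.tpcdsQueries.filterNot(unstableQueries.contains)
}
```

With this shape, other suites that extend the base trait keep the full list, and only the golden-file suite drops q6 and q75.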

SparkQA · 2021-05-07T01:09:45Z

Test build #138222 has finished for PR 32454 at commit 18c6875.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2021-05-07T01:15:59Z

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42744/

SparkQA · 2021-05-07T02:48:46Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42748/

SparkQA · 2021-05-07T02:48:48Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42748/

SparkQA · 2021-05-07T06:22:27Z

Test build #138226 has finished for PR 32454 at commit 386d666.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2021-05-08T04:39:59Z

cc: @HyukjinKwon

dongjoon-hyun

I agree with the purpose of this PR. +1.

cc @gatorsmile and @cloud-fan too since this has been here for a long time.

maropu · 2021-05-08T12:43:57Z

Thank you, @dongjoon-hyun ~ Merged to master.

…se flaky test results

This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row order differed from that in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were sorted only by `cnt`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20

Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; the only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22

So, I think it's okay just to test `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).

For stable testing.

No, dev-only.

GA passed.

Closes apache#32454 from maropu/CleanUpTpcdsQueries.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

cloud-fan · 2021-05-10T06:43:52Z

sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala

@@ -24,7 +24,7 @@ import org.apache.spark.sql.test.SharedSparkSession
 trait TPCDSBase extends SharedSparkSession with TPCDSSchema {

  // The TPCDS queries below are based on v1.4
-  val tpcdsQueries = Seq(
+  def tpcdsQueries: Seq[String] = Seq(
    "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11",


shall we remove q6 from here for all the tests, if the only difference is an extra order by column?

okay, I'll check it and make a PR to fix it.

github-actions bot added the SQL label May 6, 2021

dongjoon-hyun reviewed May 6, 2021

maropu changed the title ~~[SPARK-35327][SQL][TESTS] Merge similar v1.4/v2.7 TPCDS queries~~ [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results May 7, 2021

maropu added 3 commits May 7, 2021 10:22

Fix

b384bba

Revert "Fix"

92493c8

This reverts commit 03f731ce0d658bfb8f9506af73ff2d60b2e85917.

Fix

386d666

maropu force-pushed the CleanUpTpcdsQueries branch from 18c6875 to 386d666 Compare May 7, 2021 01:29

dongjoon-hyun approved these changes May 8, 2021

maropu closed this in 06c4009 May 8, 2021

cloud-fan reviewed May 10, 2021
