Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results #32454

Closed
wants to merge 3 commits into from

Conversation

maropu
Copy link
Member

@maropu maropu commented May 6, 2021

What changes were proposed in this pull request?

This PR proposes to filter out TPCDS v1.4 q6 and q75 in TPCDSQueryTestSuite.

I sawTPCDSQueryTestSuite failed nondeterministically because output row orders were different with those in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the tpcds/q6.sql query output rows were only sorted by cnt:


Actually, tpcds/q6.sql and tpcds-v2.7.0/q6.sql are almost the same and the only difference is that tpcds-v2.7.0/q6.sql sorts both cnt and a.ca_state:

So, I think it's okay just to test tpcds-v2.7.0/q6.sql in this case (q75 has the same issue).

Why are the changes needed?

For stable testing.

Does this PR introduce any user-facing change?

No, dev-only.

How was this patch tested?

GA passed.

@github-actions github-actions bot added the SQL label May 6, 2021
@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42734/

@SparkQA
Copy link

SparkQA commented May 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42734/

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I'm not sure why do we need this merging, @maropu . Are we dropping TPCDS v1.4 gradually?

@SparkQA
Copy link

SparkQA commented May 6, 2021

Test build #138212 has finished for PR 32454 at commit 03f731c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 7, 2021

Sorry but I'm not sure why do we need this merging, @maropu . Are we dropping TPCDS v1.4 gradually?

Ah, on second thought, it is okay just to filter out these queries in TPCDSQueryTestSuite. How about the current one?

@maropu maropu changed the title [SPARK-35327][SQL][TESTS] Merge similar v1.4/v2.7 TPCDS queries [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results May 7, 2021
@SparkQA
Copy link

SparkQA commented May 7, 2021

Test build #138222 has finished for PR 32454 at commit 18c6875.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42744/

maropu added 3 commits May 7, 2021 10:22
This reverts commit 03f731ce0d658bfb8f9506af73ff2d60b2e85917.
@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42748/

@SparkQA
Copy link

SparkQA commented May 7, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42748/

@SparkQA
Copy link

SparkQA commented May 7, 2021

Test build #138226 has finished for PR 32454 at commit 386d666.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented May 8, 2021

cc: @HyukjinKwon

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the purpose of this PR. +1.

cc @gatorsmile and @cloud-fan too since this has been here for a long time.

@maropu maropu closed this in 06c4009 May 8, 2021
@maropu
Copy link
Member Author

maropu commented May 8, 2021

Thank you, @dongjoon-hyun ~ Merged to master.

maropu added a commit to maropu/spark that referenced this pull request May 8, 2021
…se flaky test results

This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.

I saw`TPCDSQueryTestSuite` failed nondeterministically because output row orders were different with those in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were only sorted by `cnt`:

https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
Actually, `tpcds/q6.sql`  and `tpcds-v2.7.0/q6.sql` are almost the same and the only difference is that `tpcds-v2.7.0/q6.sql` sorts both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
So, I think it's okay just to test `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).

For stable testing.

No, dev-only.

GA passed.

Closes apache#32454 from maropu/CleanUpTpcdsQueries.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
maropu added a commit to maropu/spark that referenced this pull request May 8, 2021
…se flaky test results

This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.

I saw`TPCDSQueryTestSuite` failed nondeterministically because output row orders were different with those in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were only sorted by `cnt`:

https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
Actually, `tpcds/q6.sql`  and `tpcds-v2.7.0/q6.sql` are almost the same and the only difference is that `tpcds-v2.7.0/q6.sql` sorts both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
So, I think it's okay just to test `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).

For stable testing.

No, dev-only.

GA passed.

Closes apache#32454 from maropu/CleanUpTpcdsQueries.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@@ -24,7 +24,7 @@ import org.apache.spark.sql.test.SharedSparkSession
trait TPCDSBase extends SharedSparkSession with TPCDSSchema {

// The TPCDS queries below are based on v1.4
val tpcdsQueries = Seq(
def tpcdsQueries: Seq[String] = Seq(
"q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we remove q6 from here for all the tests, if the only difference is an extra order by column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I'll check it and make a PR to fix it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #32520

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants