[SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results #32454
Kubernetes integration test starting |
Kubernetes integration test status failure |
Sorry, but I'm not sure why we need this merging, @maropu. Are we dropping TPCDS v1.4 gradually?
Test build #138212 has finished for PR 32454 at commit
|
Ah, on second thought, it is okay just to filter out these queries in |
Test build #138222 has finished for PR 32454 at commit
|
Kubernetes integration test unable to build dist. exiting with code: 1 |
18c6875 to 386d666
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138226 has finished for PR 32454 at commit
|
cc: @HyukjinKwon |
I agree with the purpose of this PR. +1.
cc @gatorsmile and @cloud-fan too since this has been here for a long time.
Thank you, @dongjoon-hyun ~ Merged to master. |
…se flaky test results This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`. I saw`TPCDSQueryTestSuite` failed nondeterministically because output row orders were different with those in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were only sorted by `cnt`: https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20 Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same and the only difference is that `tpcds-v2.7.0/q6.sql` sorts both `cnt` and `a.ca_state`: https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22 So, I think it's okay just to test `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue). For stable testing. No, dev-only. GA passed. Closes apache#32454 from maropu/CleanUpTpcdsQueries. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
@@ -24,7 +24,7 @@ import org.apache.spark.sql.test.SharedSparkSession
 trait TPCDSBase extends SharedSparkSession with TPCDSSchema {

   // The TPCDS queries below are based on v1.4
-  val tpcdsQueries = Seq(
+  def tpcdsQueries: Seq[String] = Seq(
     "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8", "q9", "q10", "q11",
shall we remove q6 from here for all the tests, if the only difference is an extra order by column?
okay, I'll check it and make a PR to fix it.
See #32520
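For readers following the diff in this thread: a minimal sketch of why changing `tpcdsQueries` from a `val` to a `def` helps when one suite needs to skip a few queries. In Scala, `super` cannot be used on a `val` member, so a `def` is what lets a subclass build its list from the parent's. The names `QueryListBase`, `FilteredQuerySuite`, and `FilterDemo`, and the shortened query list, are hypothetical and are not Spark's actual test code.

```scala
// Hypothetical names; an illustration only, not the code merged in this PR.
trait QueryListBase {
  // Declared as a `def` so that subclasses can refer to `super.tpcdsQueries`.
  def tpcdsQueries: Seq[String] =
    Seq("q1", "q2", "q3", "q4", "q5", "q6", "q7", "q75")
}

class FilteredQuerySuite extends QueryListBase {
  // Drop the v1.4 queries whose ORDER BY leaves tied rows in an unspecified order.
  override def tpcdsQueries: Seq[String] =
    super.tpcdsQueries.filterNot(Set("q6", "q75").contains)
}

object FilterDemo {
  def main(args: Array[String]): Unit =
    println(new FilteredQuerySuite().tpcdsQueries.mkString(", "))
}
```

Other suites that extend the base trait keep the full list, which is what the review question above about removing q6 for all tests is getting at.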
What changes were proposed in this pull request?
This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.
I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row order differed from the order in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were only sorted by `cnt`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; the only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
So, I think it's okay just to test `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).
Why are the changes needed?
For stable testing.
Does this PR introduce any user-facing change?
No, dev-only.
How was this patch tested?
GA passed.
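To illustrate the flakiness described above, here is a small sketch in plain Scala collections (not Spark) of how sorting only by `cnt` can yield different row orders when rows tie, while adding the state as a second sort key makes the result independent of arrival order. The `StateCount` type, the `OrderingDemo` object, and the sample rows are made up for illustration; only the sort keys `cnt` and the state column come from the queries discussed here.

```scala
// Made-up sample data; the two sequences hold the same rows in different arrival orders.
final case class StateCount(state: String, cnt: Int)

object OrderingDemo {
  val runA: Seq[StateCount] =
    Seq(StateCount("TX", 10), StateCount("CA", 10), StateCount("NY", 12))
  val runB: Seq[StateCount] =
    Seq(StateCount("CA", 10), StateCount("TX", 10), StateCount("NY", 12))

  // Like ordering by cnt alone: rows with equal cnt keep their arrival order.
  def sortByCnt(rows: Seq[StateCount]): Seq[StateCount] =
    rows.sortBy(_.cnt)

  // Like ordering by cnt and then state: ties on cnt are broken by state.
  def sortByCntThenState(rows: Seq[StateCount]): Seq[StateCount] =
    rows.sortBy(r => (r.cnt, r.state))

  def main(args: Array[String]): Unit = {
    println(sortByCnt(runA) == sortByCnt(runB))
    println(sortByCntThenState(runA) == sortByCntThenState(runB))
  }
}
```

A golden-file comparison can only be stable in the second case, since the expected output then no longer depends on the order in which tied rows arrive.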