[SPARK-44045][SQL][TESTS] Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest` by dongjoon-hyun · Pull Request #41579 · apache/spark

dongjoon-hyun · 2023-06-13T19:55:58Z

What changes were proposed in this pull request?

This PR moves `WholeStageCodegenSparkSubmitSuite` to the `sql - slow` pipeline by marking it as `ExtendedSQLTest`, to mitigate the recent flakiness of the `sql - others` pipeline.

Why are the changes needed?

`WholeStageCodegenSparkSubmitSuite` is the only test suite in the `sql` module that uses `SparkSubmitTestUtils`.

$ git grep 'SparkSubmitTestUtils' | grep sql/core
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala:import org.apache.spark.deploy.SparkSubmitTestUtils
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala:class WholeStageCodegenSparkSubmitSuite extends SparkSubmitTestUtils

As the following log shows, this test suite contributes to the flakiness.

https://github.com/wangyum/spark/actions/runs/5253058423/jobs/9489919333

2023-06-13T11:05:31.3387316Z [info] WholeStageCodegenSparkSubmitSuite:
2023-06-13T11:05:36.6680896Z 2023-06-13 04:05:36.667 - stderr> 23/06/13 11:05:36 INFO SparkContext: Running Spark version 3.5.0-SNAPSHOT
...
2023-06-13T11:06:47.4402222Z 2023-06-13 04:06:47.408 - stderr> 23/06/13 11:06:47 INFO TaskSetManager: Finished task 52.0 in stage 2.0 (TID 63) in 148 ms on 127.0.0.1 (executor 0) (60/200)
2023-06-13T11:06:48.1484169Z 
2023-06-13T11:06:48.8633864Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2023-06-13T11:06:48.9660849Z Session terminated, killing shell...
2023-06-13T11:06:49.2756183Z ##[error]The operation was canceled.
2023-06-13T11:06:49.4597252Z Cleaning up orphan processes
2023-06-13T11:06:49.6684941Z Terminate orphan process: pid (4061) (java)
2023-06-13T11:06:49.7698091Z Terminate orphan process: pid (661115) (java)
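The reading of the log above (the suite header appears, then the runner receives a shutdown signal) can be automated for triage. The following is a minimal sketch, not part of this PR: the function name `suite_running_at_shutdown` and the regular expressions are assumptions made for illustration, and it only handles the log shape shown above, where raw GitHub Actions logs may still carry ANSI color codes around `[info]`.

```python
import re

# ANSI color sequences such as ESC[0m or ESC[32m that wrap "[info]" in raw logs.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
# A suite header as it appears in the log above: "[info] SomeSuite:" at end of line.
SUITE_RE = re.compile(r"\[info\]\s+(\w+Suite):\s*$")
# The message GitHub Actions prints when the runner is shut down.
SHUTDOWN_MARKER = "The runner has received a shutdown signal"


def suite_running_at_shutdown(log_lines):
    """Return the last suite header seen before the shutdown message.

    Returns None if the log contains no shutdown message, or if no suite
    header precedes it.
    """
    last_suite = None
    for raw in log_lines:
        line = ANSI_RE.sub("", raw)
        if SHUTDOWN_MARKER in line:
            return last_suite
        match = SUITE_RE.search(line)
        if match:
            last_suite = match.group(1)
    return None


# Abbreviated lines from the log above, with the color codes restored.
sample_log = [
    "2023-06-13T11:05:31.3387316Z \x1b[0m[\x1b[0m\x1b[0minfo\x1b[0m] "
    "\x1b[0m\x1b[0m\x1b[32mWholeStageCodegenSparkSubmitSuite:\x1b[0m\x1b[0m",
    "2023-06-13T11:05:36.6680896Z 2023-06-13 04:05:36.667 - stderr> "
    "23/06/13 11:05:36 INFO SparkContext: Running Spark version 3.5.0-SNAPSHOT",
    "2023-06-13T11:06:48.8633864Z ##[error]The runner has received a shutdown signal.",
]
print(suite_running_at_shutdown(sample_log))  # WholeStageCodegenSparkSubmitSuite
```

This is only a heuristic: the last suite header before the shutdown is not necessarily the cause, since other processes on the runner may be responsible for the load.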

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

dongjoon-hyun · 2023-06-13T20:01:10Z

cc @HyukjinKwon, @LuciferYang, @viirya

viirya

I'm wondering why moving it to the `sql - slow` pipeline can mitigate the flakiness.

dongjoon-hyun · 2023-06-13T20:09:20Z

Thank you for the review. The pipeline is flaky due to `The runner has received a shutdown signal.` errors. Lately, we have been suspecting the spiky tests. A SparkSubmit-based test is one of the known heavy tests.

dongjoon-hyun · 2023-06-13T20:11:53Z

#41533 is one example of what we tried, and a broader approach is in #41552.

dongjoon-hyun · 2023-06-13T20:19:38Z

BTW, I must say that this is not the only reason why the `sql - others` pipeline is flaky. So, I wrote 'mitigate' to describe this as a best-effort approach.

viirya

Got it. Okay, I think we can try this and see if it can mitigate the issue.

dongjoon-hyun · 2023-06-13T20:31:14Z

Thank you!

dongjoon-hyun · 2023-06-13T21:57:08Z

I verified that it has moved into `sql - slow` correctly.

https://github.com/dongjoon-hyun/spark/actions/runs/5259819811/jobs/9505917572

2023-06-13T21:43:35.3930154Z [info] WholeStageCodegenSparkSubmitSuite:
2023-06-13T21:43:40.9666468Z 2023-06-13 14:43:40.965 - stderr> 23/06/13 21:43:40 INFO SparkContext: Running Spark version 3.5.0-SNAPSHOT
2023-06-13T21:43:41.0897608Z 2023-06-13 14:43:41.089 - stderr> 23/06/13 21:43:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

dongjoon-hyun · 2023-06-13T21:58:00Z

Merged to master.

LuciferYang · 2023-06-14T00:33:32Z

late LGTM

HyukjinKwon · 2023-06-14T00:39:31Z

LGTM2

dongjoon-hyun · 2023-06-14T01:12:25Z

Thank you, @LuciferYang and @HyukjinKwon!

panbingkun · 2023-06-14T02:36:08Z

late LGTM

dongjoon-hyun · 2023-06-14T04:43:04Z

Now, it seems to be much better, although it's still too early to say. Two commits (SPARK-44045 and SPARK-44021) passed the tests without flakiness.

Closes apache#41579 from dongjoon-hyun/SPARK-44045.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

[SPARK-44045][SQL][TESTS] Mark `WholeStageCodegenSparkSubmitSuite` as `ExtendedSQLTest` (commit 4bef3ae)

github-actions bot added the SQL label Jun 13, 2023

viirya reviewed Jun 13, 2023

viirya approved these changes Jun 13, 2023

dongjoon-hyun closed this in e8aa23a Jun 13, 2023

dongjoon-hyun deleted the SPARK-44045 branch June 14, 2023 01:12

panbingkun mentioned this pull request Jun 14, 2023

[SPARK-44039][CONNECT][TESTS] Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite #41572

Closed
