Skip to content

[SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length#37989

Closed
HyukjinKwon wants to merge 1 commit intoapache:masterfrom
HyukjinKwon:SPARK-40096-followup
Closed

[SPARK-40096][CORE][TESTS][FOLLOW-UP] Explicitly check the element and length#37989
HyukjinKwon wants to merge 1 commit intoapache:masterfrom
HyukjinKwon:SPARK-40096-followup

Conversation

@HyukjinKwon
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

This PR is a followup of #37533 that works around the test failure by explicitly checking the element and length in the test.

Why are the changes needed?

For an unknown reason the test added in is flaky even though both ArrayBuffer and List are Sep and the test should pass up to my best knowledge. See https://github.com/apache/spark/actions/runs/3109851954/jobs/5040465291

[info] - SPARK-40096: Send finalize events even if shuffle merger blocks indefinitely with registerMergeResults is false *** FAILED *** (90 milliseconds)
[info]   ArrayBuffer("hostB") did not equal List("hostB") (DAGSchedulerSuite.scala:4498)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$286(DAGSchedulerSuite.scala:4498)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:207)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
[info]   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:66)
[info]   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
[info]   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
[info]   at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:66)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)

Does this PR introduce any user-facing change?

No, test-only.

How was this patch tested?

CI in this PR should verify that.

@github-actions github-actions Bot added the CORE label Sep 25, 2022
@HyukjinKwon
Copy link
Copy Markdown
Member Author

cc @mridulm FYI

@srowen
Copy link
Copy Markdown
Member

srowen commented Sep 25, 2022

Merged to master

@srowen srowen closed this in ff1b57d Sep 25, 2022
@mridulm
Copy link
Copy Markdown
Contributor

mridulm commented Sep 26, 2022

This is very interesting behavior !
Thanks for fixing this @HyukjinKwon

.finalizeShuffleMerge(any(), any(), any(), any(), any())
assert(sentHosts === Seq("hostB"))
assert(sentHosts.nonEmpty)
assert(sentHosts.head === "hostB" && sentHosts.length == 1)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay. Seems like this is flaky because of the timeout:

[info] - SPARK-40096: Send finalize events even if shuffle merger blocks indefinitely with registerMergeResults is false *** FAILED *** (106 milliseconds)
[info]   ArrayBuffer("hostB") was empty (DAGSchedulerSuite.scala:4498)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$286(DAGSchedulerSuite.scala:4498)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)

https://github.com/apache/spark/actions/runs/3129263557/jobs/5078150518

I think it was empty when the condition is checked. Later, I think the array is filled at the time when the exception is actually printed out.

Should we increase the timeout, @mridulm and @wankunde?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might a consequence of using MyDAGScheduler - scheduleShuffleMergeFinalize is overridden and runs within the calling thread, instead of async - which this test is expecting.

+CC @otterc, @venkata91

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm .. this is actually more flaky than I thought:

Would be great if @wankunde has a chance to take a look ..

asfgit pushed a commit that referenced this pull request Nov 8, 2022
### What changes were proposed in this pull request?

This PR is a followup of #37533 that fix the flaky test case.

### Why are the changes needed?

The test case is flaky, and will failure due to some unexpected error.

#37989
https://github.com/apache/spark/actions/runs/3145115911/jobs/5112006948
https://github.com/apache/spark/actions/runs/3146198025/jobs/5114387367

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI in this PR should verify that.

Closes #38091 from wankunde/SPARK-40096-2.

Authored-by: Kun Wan <wankun@apache.org>
Signed-off-by: Mridul <mridul<at>gmail.com>
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

This PR is a followup of apache#37533 that fix the flaky test case.

### Why are the changes needed?

The test case is flaky, and will failure due to some unexpected error.

apache#37989
https://github.com/apache/spark/actions/runs/3145115911/jobs/5112006948
https://github.com/apache/spark/actions/runs/3146198025/jobs/5114387367

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

CI in this PR should verify that.

Closes apache#38091 from wankunde/SPARK-40096-2.

Authored-by: Kun Wan <wankun@apache.org>
Signed-off-by: Mridul <mridul<at>gmail.com>
@HyukjinKwon HyukjinKwon deleted the SPARK-40096-followup branch January 15, 2024 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants