
[SPARK-33928][TEST][CORE] Fix flaky o.a.s.ExecutorAllocationManagerSuite - "SPARK-23365 Don't update target num executors when killing idle executors" #30956

Closed
wants to merge 1 commit into from


Ngone51
Member

@Ngone51 Ngone51 commented Dec 29, 2020

What changes were proposed in this pull request?

Run the test in testing mode to fix the flakiness.

Why are the changes needed?

The test is flaky:

[info] - SPARK-23365 Don't update target num executors when killing idle executors *** FAILED *** (126 milliseconds)
[info] 1 did not equal 2 (ExecutorAllocationManagerSuite.scala:1615)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
[info] at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
[info] at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
[info] at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
[info] at org.apache.spark.ExecutorAllocationManagerSuite.$anonfun$new$84(ExecutorAllocationManagerSuite.scala:1617)
...

The root cause is likely the same as in #29773, since the test runs in non-testing mode.
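To illustrate why non-testing mode can make a manual-clock test flaky, here is a minimal Scala sketch. It rests on an assumption not stated in this PR: that in non-testing mode the allocation manager starts its own periodic thread that calls schedule(), while the test also calls schedule(manager) itself. ToyAllocationManager, scheduleCalls and intervalMillis are hypothetical names; this is not Spark's implementation.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical, heavily simplified stand-in for a dynamic-allocation manager. It models only
// *who* drives schedule(): a background thread (non-testing mode) or the test (testing mode).
class ToyAllocationManager(testing: Boolean, intervalMillis: Long = 10L) {
  // How many times schedule() has run, from any thread.
  val scheduleCalls = new AtomicInteger(0)

  // Daemon thread so a forgotten stop() cannot keep the JVM alive.
  private val executor: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "toy-allocation-manager")
        t.setDaemon(true)
        t
      }
    })

  // In a real manager this would recompute targets and possibly remove idle executors.
  def schedule(): Unit = scheduleCalls.incrementAndGet()

  def start(): Unit = {
    // Only non-testing mode starts the periodic task. It runs concurrently with any
    // schedule() calls the test makes, so the test cannot control the interleaving.
    if (!testing) {
      executor.scheduleWithFixedDelay(new Runnable {
        override def run(): Unit = schedule()
      }, 0L, intervalMillis, TimeUnit.MILLISECONDS)
    }
  }

  def stop(): Unit = executor.shutdownNow()
}
```

With testing = true, scheduleCalls changes only when the test calls schedule() itself, which is what a test built on clock.advance(...) followed by assertions relies on. With testing = false, a background tick can land between the test's own steps.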

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Checked manually: after this fix, the test no longer failed when run hundreds of times.
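The repeat-and-count check described above can be sketched as a small helper. FlakeCheck and countFailures are hypothetical names, not part of Spark or ScalaTest; in practice the loop would wrap the suite's test body or the test would be rerun through the build tool.

```scala
import scala.util.control.NonFatal

object FlakeCheck {
  // Runs `body` `times` times and returns how many runs threw a non-fatal exception.
  // AssertionError and ScalaTest's TestFailedException are both non-fatal, so both count.
  def countFailures(times: Int)(body: => Unit): Int =
    (1 to times).count { _ =>
      try { body; false }
      catch { case NonFatal(_) => true }
    }
}
```

Zero failures over a few hundred runs does not prove a race is gone; it only makes a frequent failure mode unlikely.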

@github-actions github-actions bot added the CORE label Dec 29, 2020
@Ngone51
Member Author

Ngone51 commented Dec 29, 2020

cc @tgravescs @cloud-fan Please take a look, thanks!

```scala
clock.advance(3000)
schedule(manager)
assert(maxNumExecutorsNeededPerResourceProfile(manager, defaultProfile) === 1)
assert(numExecutorsTargetForDefaultProfileId(manager) === 1)
// here's the important verify -- we did kill the executors, but did not adjust the target count
verify(client).killExecutors(Seq("executor-1"), false, false, false)
assert(manager.executorMonitor.executorsPendingToRemove() === Set("executor-1"))
```
Member Author


In non-testing mode, client.killExecutors is invoked, but in testing mode it no longer is. So we now use executorMonitor.executorsPendingToRemove() to perform the same check.
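The check described in this comment can be sketched with hypothetical stand-ins: ToyKillClient, RecordingKillClient, ToyExecutorMonitor and ToyRemover are illustrative names, and Spark's real removal path has more parameters and branches. The sketch assumes, as the new executorsPendingToRemove() assertion implies, that the monitor records removed executors in both modes, while only non-testing mode reaches the client.

```scala
import scala.collection.mutable

// Hypothetical client interface (Spark's real kill API takes more parameters).
trait ToyKillClient {
  def killExecutors(ids: Seq[String]): Seq[String]
}

// Records every call so a test can check it, much like Mockito's verify(...).
class RecordingKillClient extends ToyKillClient {
  val calls: mutable.ArrayBuffer[Seq[String]] = mutable.ArrayBuffer.empty
  override def killExecutors(ids: Seq[String]): Seq[String] = {
    calls += ids
    ids // pretend the cluster manager accepted every request
  }
}

// Hypothetical monitor that tracks executors already asked to go away.
class ToyExecutorMonitor {
  private val pending = mutable.Set.empty[String]
  def executorsKilled(ids: Seq[String]): Unit = pending ++= ids
  def executorsPendingToRemove(): Set[String] = pending.toSet
}

class ToyRemover(testing: Boolean, client: ToyKillClient, val monitor: ToyExecutorMonitor) {
  def removeExecutors(ids: Seq[String]): Seq[String] = {
    val removed =
      if (testing) ids               // testing mode: never call the client
      else client.killExecutors(ids) // non-testing mode: ask the cluster manager
    // In both modes the monitor records the removal, so a test can assert on it.
    monitor.executorsKilled(removed)
    removed
  }
}
```

A verify on the client only works in non-testing mode, whereas an assertion on executorsPendingToRemove() holds in both, which is why the test switches to the latter.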

@SparkQA

SparkQA commented Dec 29, 2020

Test build #133464 has finished for PR 30956 at commit ba0a4bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.1!

@cloud-fan cloud-fan closed this in 1ef7ddd Dec 29, 2020
cloud-fan pushed a commit that referenced this pull request Dec 29, 2020
…tionManagerSuite - " Don't update target num executors when killing idle executors"

Closes #30956 from Ngone51/fix-flaky-SPARK-23365.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1ef7ddd)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>