Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35832][CORE][ML][K8S][TESTS] Add LocalRootDirsTest trait #32986

Closed
wants to merge 2 commits into from
Closed

[SPARK-35832][CORE][ML][K8S][TESTS] Add LocalRootDirsTest trait #32986

wants to merge 2 commits into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Jun 20, 2021

What changes were proposed in this pull request?

To make the test suite more robust, this PR aims to add a new trait, LocalRootDirsTest, by refactoring SortShuffleSuite's helper functions and applying it to the following:

  • ShuffleNettySuite
  • ShuffleOldFetchProtocolSuite
  • ExternalShuffleServiceSuite
  • KubernetesLocalDiskShuffleDataIOSuite
  • LocalDirsSuite
  • RDDCleanerSuite
  • ALSCleanerSuite

In addition, this fixes a UT in KubernetesLocalDiskShuffleDataIOSuite.

Why are the changes needed?

ShuffleSuite is extended by four classes but only SortShuffleSuite does the clean-up correctly.

ShuffleSuite
- SortShuffleSuite
- ShuffleNettySuite
- ShuffleOldFetchProtocolSuite
- ExternalShuffleServiceSuite

Since KubernetesLocalDiskShuffleDataIOSuite is looking for the other storage directory, the leftover of ShuffleSuite causes flakiness.

org.apache.spark.SparkException: Job aborted due to stage failure: task 0.0 in stage 1.0 (TID 3) had a not serializable result: org.apache.spark.ShuffleSuite$NonJavaSerializableClass
...
org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIOSuite.$anonfun$new$2(KubernetesLocalDiskShuffleDataIOSuite.scala:52)

For the other suites, the clean-up implementation is used but not complete. So, they are refactored to use new trait.

Does this PR introduce any user-facing change?

No, this is a test-only change.

How was this patch tested?

Pass the CIs.

@dongjoon-hyun
Copy link
Member Author

Could you review this when you have some time, @HyukjinKwon and @viirya ?

import org.apache.spark.util.Utils


trait LocalRootDirsTest extends SparkFunSuite with LocalSparkContext {
Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Jun 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality came from SortShuffleSuite which had the most complete clean-up logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some comments for this trait? So others in the future can know quickly what this trait is for.

@github-actions github-actions bot added the ML label Jun 20, 2021
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-35832][CORE][K8S][TESTS] Add LocalRootDirsTest trait [SPARK-35832][CORE][ML][K8S][TESTS] Add LocalRootDirsTest trait Jun 20, 2021
@dongjoon-hyun
Copy link
Member Author

This is also a patch to fix flaky tests, @gengliangwang .

@SparkQA
Copy link

SparkQA commented Jun 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44569/

@SparkQA
Copy link

SparkQA commented Jun 20, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44571/

@SparkQA
Copy link

SparkQA commented Jun 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44569/

@SparkQA
Copy link

SparkQA commented Jun 20, 2021

Test build #140042 has finished for PR 32986 at commit 6cd8df9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait LocalRootDirsTest extends SparkFunSuite with LocalSparkContext
  • abstract class ShuffleSuite extends SparkFunSuite with Matchers with LocalRootDirsTest
  • class KubernetesLocalDiskShuffleDataIOSuite extends SparkFunSuite with LocalRootDirsTest

@SparkQA
Copy link

SparkQA commented Jun 20, 2021

Test build #140044 has finished for PR 32986 at commit d282968.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class RDDCleanerSuite extends SparkFunSuite with LocalRootDirsTest
  • class LocalDirsSuite extends SparkFunSuite with LocalRootDirsTest
  • class ALSCleanerSuite extends SparkFunSuite with LocalRootDirsTest

@@ -192,7 +195,7 @@ class KubernetesLocalDiskShuffleDataIOSuite extends SparkFunSuite with LocalSpar
val rdd3 = rdd2.reduceByKey(_ + _)
val rdd4 = rdd3.sortByKey()

assert(rdd4.count() === 3)
assert(rdd4.count() === 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this count changes?

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I have a question about a change looks unrelated to the cleaning up.

@dongjoon-hyun
Copy link
Member Author

Thank you, @viirya . Yes, that was a hidden bug and uncovered by this. I mentioned in the PR description like the following.

In addition, this fixes a UT in KubernetesLocalDiskShuffleDataIOSuite.

Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-35832 branch June 20, 2021 17:54
@viirya
Copy link
Member

viirya commented Jun 20, 2021

Thank you @dongjoon-hyun

@HyukjinKwon
Copy link
Member

lgtm2

domybest11 pushed a commit to domybest11/spark that referenced this pull request Jun 15, 2022
### What changes were proposed in this pull request?

To make the test suite more robust, this PR aims to add a new trait, `LocalRootDirsTest`, by refactoring `SortShuffleSuite`'s helper functions and applying it to the following:
- ShuffleNettySuite
- ShuffleOldFetchProtocolSuite
- ExternalShuffleServiceSuite
- KubernetesLocalDiskShuffleDataIOSuite
- LocalDirsSuite
- RDDCleanerSuite
- ALSCleanerSuite

In addition, this fixes a UT in `KubernetesLocalDiskShuffleDataIOSuite`.

### Why are the changes needed?

`ShuffleSuite` is extended by four classes but only `SortShuffleSuite` does the clean-up correctly.
```
ShuffleSuite
- SortShuffleSuite
- ShuffleNettySuite
- ShuffleOldFetchProtocolSuite
- ExternalShuffleServiceSuite
```

Since `KubernetesLocalDiskShuffleDataIOSuite` is looking for the other storage directory, the leftover of `ShuffleSuite` causes flakiness.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/2649/testReport/junit/org.apache.spark.shuffle/KubernetesLocalDiskShuffleDataIOSuite/recompute_is_not_blocked_by_the_recovery/
```
org.apache.spark.SparkException: Job aborted due to stage failure: task 0.0 in stage 1.0 (TID 3) had a not serializable result: org.apache.spark.ShuffleSuite$NonJavaSerializableClass
...
org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIOSuite.$anonfun$new$2(KubernetesLocalDiskShuffleDataIOSuite.scala:52)
```

For the other suites, the clean-up implementation is used but not complete. So, they are refactored to use new trait.

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Pass the CIs.

Closes apache#32986 from dongjoon-hyun/SPARK-35832.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants