[SPARK-44049][K8S][TESTS] Fix KubernetesSuite to use inNamespace for validating driver pod cleanup#41586

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-44049

@dongjoon-hyun dongjoon-hyun commented Jun 14, 2023

What changes were proposed in this pull request?

This PR fixes KubernetesSuite to use the inNamespace API when validating driver pod cleanup.

Why are the changes needed?

This is a tricky bug for the following three reasons.

  • Although all test cases pass, the K8s integration tests currently run extremely slowly.
  • The running time reported for each individual test case looks normal.
  • The slowdown happens during the transition from one test to the next.

The root cause is that the K8s test hits a namespace not specified error after each test passes. This blocks every test case at the driver pod cleanup and validation stage for up to 3 minutes (the maximum timeout).

[info]   The code passed to eventually never returned normally. Attempted 190 times over 3.011156453483333 minutes.
Last failure message: namespace not specified for an operation requiring one and no default was found in the Config.. (KubernetesSuite.scala:612)
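
For illustration only, here is a minimal, dependency-free sketch of why a missing namespace costs the full timeout. `retryUntil` is a simplified stand-in for ScalaTest's `Eventually.eventually`, and `StubPods` is a hypothetical in-memory model of a client with no default namespace. Neither is the real ScalaTest or Kubernetes client API, and the short timeouts are assumptions chosen so the sketch runs quickly.

```scala
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object NamespaceCleanupSketch {

  // Simplified stand-in for Eventually.eventually: retry `check` until it
  // stops throwing or `timeout` elapses. Returns the number of attempts.
  def retryUntil(timeout: FiniteDuration, interval: FiniteDuration)(check: => Unit): Int = {
    val deadline = timeout.fromNow

    @annotation.tailrec
    def loop(attempts: Int): Int = Try(check) match {
      case Success(_) => attempts
      case Failure(e) if deadline.isOverdue() =>
        throw new RuntimeException(
          s"Attempted $attempts times. Last failure message: ${e.getMessage}", e)
      case Failure(_) =>
        Thread.sleep(interval.toMillis)
        loop(attempts + 1)
    }

    loop(1)
  }

  // Hypothetical model: pods are (namespace, name) pairs, and a lookup
  // without a namespace throws, like a client with no default namespace.
  final class StubPods(pods: Set[(String, String)], namespace: Option[String]) {
    def inNamespace(ns: String): StubPods = new StubPods(pods, Some(ns))

    def exists(name: String): Boolean = namespace match {
      case Some(ns) => pods.contains((ns, name))
      case None =>
        throw new IllegalStateException(
          "namespace not specified for an operation requiring one")
    }
  }

  def main(args: Array[String]): Unit = {
    // The driver pod is already gone, so a correct check should pass at once.
    val pods = new StubPods(Set.empty, None)

    // With a namespace: the assertion can actually be evaluated.
    val attempts = retryUntil(300.millis, 10.millis) {
      assert(!pods.inNamespace("spark-test").exists("driver"))
    }
    println(s"with namespace: passed after $attempts attempt(s)")

    // Without a namespace: every attempt throws, so the retry loop can only
    // end by running out the whole timeout.
    val start = System.nanoTime()
    val result = Try(retryUntil(300.millis, 10.millis) {
      assert(!pods.exists("driver"))
    })
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"without namespace: failed=${result.isFailure} after about $elapsedMs ms")
  }
}
```

In the sketch, the un-namespaced check fails for a reason unrelated to the pod's state, so retrying can never help. This matches the symptom above, where the failure only surfaces after the full timeout.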

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

I also verified manually that the suite now completes in about 13 minutes. Previously, it took over 1 hour.

[info] YuniKornSuite:
[info] - SPARK-42190: Run SparkPi with local[*] (17 seconds, 144 milliseconds)
[info] - Run SparkPi with no resources (20 seconds, 406 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (15 seconds, 531 milliseconds)
...
[info] Run completed in 13 minutes, 46 seconds.
[info] Total number of tests run: 27
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 842 s (14:02), completed Jun 13, 2023, 9:33:02 PM

Eventually.eventually(TIMEOUT, INTERVAL) {
  assert(kubernetesTestComponents.kubernetesClient
    .pods()
    .inNamespace(kubernetesTestComponents.namespace)
@dongjoon-hyun dongjoon-hyun Jun 14, 2023

This is the same code pattern as line 610.

@dongjoon-hyun

Could you review this when you have some time, @viirya?

@viirya viirya left a comment

Good catch!

@dongjoon-hyun

Thank you, @pan3793 and @viirya! Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-44049 branch June 14, 2023 07:21
czxm pushed a commit to czxm/spark that referenced this pull request Jun 19, 2023
…r validating driver pod cleanup

Closes apache#41586 from dongjoon-hyun/SPARK-44049.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

3 participants