Try to kill pod with label if no ApplicationInfo found to prevent pod leak #5206

turboFei · 2023-08-28T09:52:17Z

Why are the changes needed?

Currently, KubernetesApplicationOperation relies on the appInfoStore.

For the batch REST API, if the closeBatch request cannot be sent to the Kyuubi instance that created the batch, the current Kyuubi instance will try to kill the batch.

In this case, I suspect the ApplicationInfo might not be found in the appInfoStore.

It is better to make a best-effort attempt to kill the pod by its kyuubi-unique-tag label to prevent a pod leak.
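A minimal sketch of the fallback described above, assuming simplified stand-ins: `AppInfo`, `PodClient`, `InMemoryPodClient`, and `KillSketch` are hypothetical names used only for illustration and are not Kyuubi's or the Kubernetes client's actual API. Only the `kyuubi-unique-tag` label name comes from this PR.

```scala
// Hypothetical, simplified model; not the actual KubernetesApplicationOperation code.
final case class AppInfo(podName: String)

// Assumed minimal client abstraction; each method returns true if at least one pod was deleted.
trait PodClient {
  def deleteByName(name: String): Boolean
  def deleteByLabel(key: String, value: String): Boolean
}

object KillSketch {
  val TagLabel = "kyuubi-unique-tag"

  // Delete by pod name when the store knows the app; otherwise fall back to the label.
  def kill(tag: String, store: scala.collection.Map[String, AppInfo], client: PodClient): Boolean =
    store.get(tag) match {
      case Some(info) => client.deleteByName(info.podName)
      case None => client.deleteByLabel(TagLabel, tag)
    }
}

// In-memory fake (pod name -> labels) so the sketch can run without a cluster.
final class InMemoryPodClient(initial: Map[String, Map[String, String]]) extends PodClient {
  private var pods = initial

  def podNames: Set[String] = pods.keySet

  override def deleteByName(name: String): Boolean = {
    val existed = pods.contains(name)
    pods -= name
    existed
  }

  override def deleteByLabel(key: String, value: String): Boolean = {
    val (matched, rest) = pods.partition { case (_, labels) => labels.get(key).contains(value) }
    pods = rest
    matched.nonEmpty
  }
}
```

With an empty store, `KillSketch.kill("batch-1", Map.empty[String, AppInfo], client)` still removes a pod labelled `kyuubi-unique-tag=batch-1`, which is the leak this change is meant to prevent.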

How was this patch tested?

Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run tests locally before making a pull request

Was this patch authored or co-authored using generative AI tooling?

codecov-commenter · 2023-08-28T11:28:46Z

Codecov Report

Merging #5206 (630bcd0) into master (37f2c98) will not change coverage.
The diff coverage is 0.00%.

@@          Coverage Diff           @@
##           master   #5206   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         589     589           
  Lines       33238   33246    +8     
  Branches     4387    4390    +3     
======================================
- Misses      33238   33246    +8

Files Changed	Coverage Δ
...kyuubi/engine/KubernetesApplicationOperation.scala	`0.00% <0.00%> (ø)`


pan3793 · 2023-08-28T12:45:20Z

I think there is hardly any chance that the appInfoStore does not contain the appInfo; in most cases, the appInfoStore should contain all appInfos of one K8s cluster. But anyway, this change does no harm.

pan3793 · 2023-08-28T12:47:51Z

Also cc @zwangsheng, as you are investigating issues about deleting a Pod multiple times; this could be a new case.

turboFei · 2023-08-29T05:26:57Z

thanks, merged to master

zwangsheng · 2023-08-29T08:31:08Z

Thanks for your info @pan3793

IMO, the pod name is a unique identifier (within a namespace), and deleting by pod name is less expensive than deleting by label. The API server is known to become unstable when there are too many concurrent requests.

~~We can delete with pod name first; if pod name deletion fails (due to NOT_FOUND), we can try to delete with label.~~ Already did.

As for your concern, @turboFei: the appInfoStore gets the app information from the Kubernetes cluster, not at the time a Kyuubi instance creates the engine, so in theory all Kyuubi instances will be consistent (with perhaps a little latency, depending on the informer).

But LGTM: we should delete by label if no appInfo is found when the user calls close batch.
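To illustrate the consistency point above, here is a rough sketch of a store that is filled from cluster events rather than at engine creation. `PodEvent`, `PodUpserted`, `PodDeleted`, `TrackedPod`, and `EventBackedStore` are hypothetical names and do not reflect Kyuubi's actual appInfoStore or informer wiring.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical event model standing in for informer callbacks.
sealed trait PodEvent
final case class PodUpserted(tag: String, podName: String, phase: String) extends PodEvent
final case class PodDeleted(tag: String) extends PodEvent

final case class TrackedPod(podName: String, phase: String)

// Each instance builds its view only from the events it observes,
// so any instance can look up an app it did not create.
final class EventBackedStore {
  private val apps = TrieMap.empty[String, TrackedPod]

  def onEvent(event: PodEvent): Unit = {
    event match {
      case PodUpserted(tag, podName, phase) => apps.put(tag, TrackedPod(podName, phase))
      case PodDeleted(tag) => apps.remove(tag)
    }
    ()
  }

  def lookup(tag: String): Option[TrackedPod] = apps.get(tag)
}
```

Two instances that observe the same events end up with the same view. A lookup can still miss in the window before an event arrives, which is the case the label-based delete covers.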

delete with label

a9c22e0

github-actions bot added the module:server label Aug 28, 2023

turboFei changed the title ~~Try to kill pod with label if no ApplicationInfo found in appInfoStore~~ Trying to kill pod with label if no ApplicationInfo found to prevent pod leak Aug 28, 2023

turboFei self-assigned this Aug 28, 2023

turboFei added this to the v1.8.0 milestone Aug 28, 2023

warning

630bcd0

turboFei requested a review from pan3793 August 28, 2023 10:01

pan3793 changed the title ~~Trying to kill pod with label if no ApplicationInfo found to prevent pod leak~~ Try to kill pod with label if no ApplicationInfo found to prevent pod leak Aug 28, 2023

pan3793 approved these changes Aug 28, 2023


cxzl25 approved these changes Aug 28, 2023


turboFei closed this in be48b94 Aug 29, 2023

turboFei deleted the k8s_pod_kill branch August 29, 2023 05:26
