Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to kill pod with label if no ApplicationInfo found to prevent pod leak #5206

Closed
wants to merge 2 commits into from

Conversation

turboFei
Copy link
Member

@turboFei turboFei commented Aug 28, 2023

Why are the changes needed?

Now for KubernetesApplicationOperation, it rely on the appInfoStore.

For batch rest api, if the closeBatch request can not be send to the kyuubiInstance that created the batch, the current kyuubiInstance will try to kill the batch.

I wonder that, in this case, the applicationInfo might can not be found in the appInfoStore.

It is better to try the best to kill the pod with kyuubi-unique-tag label to prevent pod leak.

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before make a pull request

Was this patch authored or co-authored using generative AI tooling?

@turboFei turboFei changed the title Try to kill pod with label if no ApplicationInfo found in appInfoStore Trying to kill pod with label if no ApplicationInfo found to prevent pod leak Aug 28, 2023
@turboFei turboFei self-assigned this Aug 28, 2023
@turboFei turboFei added this to the v1.8.0 milestone Aug 28, 2023
@turboFei turboFei requested a review from pan3793 August 28, 2023 10:01
@codecov-commenter
Copy link

Codecov Report

Merging #5206 (630bcd0) into master (37f2c98) will not change coverage.
The diff coverage is 0.00%.

@@          Coverage Diff           @@
##           master   #5206   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         589     589           
  Lines       33238   33246    +8     
  Branches     4387    4390    +3     
======================================
- Misses      33238   33246    +8     
Files Changed Coverage Δ
...kyuubi/engine/KubernetesApplicationOperation.scala 0.00% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@pan3793 pan3793 changed the title Trying to kill pod with label if no ApplicationInfo found to prevent pod leak Try to kill pod with label if no ApplicationInfo found to prevent pod leak Aug 28, 2023
@pan3793
Copy link
Member

pan3793 commented Aug 28, 2023

I think there is hardly an opportunity that appInfoStore does not contain the appInfo, in most cases, the appInfoStore should contain all appInfos of one K8s cluster, but anyway, this change does no harm.

@pan3793
Copy link
Member

pan3793 commented Aug 28, 2023

Also cc @zwangsheng, as you are investigating issues about deleting Pod multiple times, this could be a new case.

@turboFei turboFei closed this in be48b94 Aug 29, 2023
@turboFei turboFei deleted the k8s_pod_kill branch August 29, 2023 05:26
@turboFei
Copy link
Member Author

thanks, merged to master

@zwangsheng
Copy link
Contributor

zwangsheng commented Aug 29, 2023

Thanks for your info @pan3793

IMO, pod name as a unique representation (under namespace), and deleting with pod name is less expensive than deleting with label. The api server is known to be unstable if there are too many concurrent requests.

We can delete with pod name first, if pod name deletion fails(due to NOT_FOUND), we can try to delete with labelAlready did.

As for your concern @turboFei , appInfoStore gets the app information from the kubernetes cluster, not when kyuubi instance creates the engine, so in theory, all kyuubi instance will be consistent (with maybe a little latency, depend on informer).

But LGTM, we should delete with label if no appInfo found, when user call to close batch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants