Try to kill pod with label if no ApplicationInfo found to prevent pod leak #5206
Codecov Report
@@ Coverage Diff @@
## master #5206 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 589 589
Lines 33238 33246 +8
Branches 4387 4390 +3
======================================
- Misses 33238 33246 +8
I think it is unlikely that appInfoStore would be missing the appInfo; in most cases, appInfoStore should contain all appInfos of a K8s cluster. Either way, this change does no harm.
Also cc @zwangsheng: since you are investigating issues where a Pod is deleted multiple times, this could be a new case.
Thanks, merged to master.
Thanks for your info @pan3793
As for your concern, @turboFei: appInfoStore gets the application information from the Kubernetes cluster itself, not from the Kyuubi instance at the moment it creates the engine, so in theory all Kyuubi instances will be consistent (possibly with a little latency, depending on the informer). But LGTM: when a user calls to close a batch and no appInfo is found, we should delete by label.
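The fallback discussed in this thread can be sketched roughly as follows. This is a minimal in-memory illustration, not Kyuubi's actual code: `FakeCluster`, `Pod`, `kill_application`, and the dict-based store are hypothetical names invented for the example, and only the `kyuubi-unique-tag` label name comes from the PR description.

```python
from dataclasses import dataclass, field

# Label name taken from the PR description; everything else here is hypothetical.
LABEL_UNIQUE_KEY = "kyuubi-unique-tag"


@dataclass
class Pod:
    name: str
    labels: dict


@dataclass
class FakeCluster:
    """In-memory stand-in for a Kubernetes cluster (assumption, for illustration only)."""
    pods: list = field(default_factory=list)

    def delete_by_name(self, name):
        # Remove the pod with this exact name; report whether anything was deleted.
        before = len(self.pods)
        self.pods = [p for p in self.pods if p.name != name]
        return len(self.pods) < before

    def delete_by_label(self, key, value):
        # Remove every pod whose label `key` equals `value`.
        before = len(self.pods)
        self.pods = [p for p in self.pods if p.labels.get(key) != value]
        return len(self.pods) < before


def kill_application(cluster, app_info_store, tag):
    """Kill the pod for `tag`, falling back to a label-based delete.

    `app_info_store` is modeled as a plain dict mapping tag -> {"pod_name": ...}.
    """
    info = app_info_store.get(tag)
    if info is not None:
        # Normal path: the store knows the application, so delete by pod name.
        return cluster.delete_by_name(info["pod_name"])
    # Fallback path: no ApplicationInfo found, so make a best effort by label
    # to avoid leaking the pod.
    return cluster.delete_by_label(LABEL_UNIQUE_KEY, tag)


if __name__ == "__main__":
    cluster = FakeCluster([Pod("driver-1", {LABEL_UNIQUE_KEY: "batch-1"})])
    # The store is empty, as it might be on an instance that did not create the batch.
    print(kill_application(cluster, {}, "batch-1"))
    print([p.name for p in cluster.pods])
```

The point of the sketch is only the branch order: the store lookup stays the primary path, and the label-based delete runs only when the lookup comes back empty.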
Why are the changes needed?
Currently, `KubernetesApplicationOperation` relies on the appInfoStore.
For the batch REST API, if the closeBatch request cannot be sent to the Kyuubi instance that created the batch, the current Kyuubi instance will try to kill the batch itself.
I suspect that, in this case, the ApplicationInfo might not be found in the appInfoStore.
It is better to make a best effort to kill the pod by its `kyuubi-unique-tag` label to prevent a pod leak.
How was this patch tested?
Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run tests locally before making a pull request
Was this patch authored or co-authored using generative AI tooling?