[Question] Why sometimes the node-termination is not able to delete all the pods #38

santinoncs · 2020-12-02T14:25:56Z

Hi,

I have preemptible nodes with more than 40 pods each.
For some reason, node-termination-handler is not able to delete all the pods. It starts, and after deleting around 20 pods it stops. There are no further logs after that point.
I also tried deleting the pods myself while the pod listing at

eviction.go:66

was taking place, but without success either.

Thanks for your help


toms049 · 2020-12-17T13:10:04Z

Hi,
I have the same issue.

I tried testing it and watching the output logs, but no luck at all.
It deletes just 6 pods, in the order they are listed.
There are no further logs after that.

Thanks for any ideas.

toms049 · 2020-12-18T21:10:44Z

I did some testing. It looks like it does the job, but only if there are fewer than 11 pods on the node. In that case it removes all of them. Otherwise it gets stuck: it processes just a few of the pods and then ends suddenly, with no further logs. The rest of the pods keep running until the node hardware shuts down, so it takes a long time for k8s to notice and reschedule them.
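One possible reading of this behaviour, which nobody in the thread has confirmed, is that pods are removed one after another and the handler is killed once the preemption window runs out. A minimal Python sketch of the alternative, issuing all deletions concurrently under a single deadline, could look like this. `delete_pod` is a hypothetical callable standing in for whatever API call removes a pod, and the 30-second default and worker count are assumptions, not values taken from node-termination-handler.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def delete_all(pods, delete_pod, deadline_s=30.0, workers=16):
    """Call delete_pod(pod) for every pod concurrently, under one deadline.

    Returns (done, pending): pods whose call returned without raising
    before the deadline, and pods whose call failed or did not finish.
    delete_pod is a hypothetical callable supplied by the caller.
    Requires Python 3.9+ for cancel_futures.
    """
    pool = ThreadPoolExecutor(max_workers=workers)
    futures = {pool.submit(delete_pod, pod): pod for pod in pods}
    finished, _ = wait(futures, timeout=deadline_s)
    # Do not block on stragglers; drop calls that have not started yet.
    pool.shutdown(wait=False, cancel_futures=True)

    done, pending = [], []
    for fut, pod in futures.items():
        ok = fut in finished and fut.exception() is None
        (done if ok else pending).append(pod)
    return done, pending
```

Logging the `pending` list before exiting would at least show which pods were left behind, which is the information missing from the reports above.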

laxmiprasanna-gunna · 2021-01-21T10:10:42Z

Hi, I am facing the same issue. I see from the Google docs that a preempted node gets 30 seconds before it is deleted:
"This value is set to TRUE as soon as the instance is marked to be preempted but there might be some delay between the G2 signal and the instance metadata value query receiving a response with value 'TRUE'. In essence after the preempted value is set to 'TRUE', the instance would be preempted within 30 seconds."
But when I run node-termination-handler, I don't think it is capturing the right signal, because it doesn't seem to get the full 30 seconds to delete all the pods present on the node.
It was able to delete only some of the pods and then exited without any further logs.
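To check when the signal actually arrives, the `preempted` metadata value quoted above can be watched directly. The following is a rough Python sketch, not the handler's own code: it does one immediate read and then uses the metadata server's `wait_for_change=true` hanging GET. The function names and the injectable `opener` parameter are illustrative assumptions; it only works from inside a GCE instance.

```python
import urllib.request

# GCE metadata endpoint for the preemption flag.
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def build_request(wait_for_change=True):
    """Build the metadata request; wait_for_change makes it a hanging GET."""
    url = PREEMPTED_URL + ("?wait_for_change=true" if wait_for_change else "")
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def is_preempted(body):
    """The endpoint answers with the plain text TRUE or FALSE."""
    return body.strip().upper() == "TRUE"


def wait_for_preemption(on_preempted, opener=urllib.request.urlopen):
    """Block until the flag reads TRUE, then call on_preempted() once."""
    wait = False  # first read returns immediately, in case the flag is already set
    while True:
        with opener(build_request(wait_for_change=wait)) as resp:
            if is_preempted(resp.read().decode()):
                on_preempted()
                return
        wait = True
```

Recording a timestamp inside `on_preempted` and comparing it with the time of the last log line would show how much of the 30-second window was really available.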

santinoncs · 2021-01-25T11:21:09Z

I followed the GCP article

https://cloud.google.com/solutions/running-web-applications-on-gke-using-cost-optimized-pvms-and-traffic-director#post-preemption_validations

and applied its recommendations, including the DaemonSet that creates a systemd service that blocks the shutdown of the kubelet process.

I also delegate the deletion of all pods to an external service, running in another pod in another namespace, outside the machine that is being deleted/preempted. With this solution, the deletion of pods is always done from outside the affected node.

But with no success.
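For readers trying a similar off-node approach, here is a small Python sketch of the two Kubernetes API requests such a service would need: listing the pods on one node and posting an Eviction for each. It only builds paths and bodies; authentication and the HTTP client are left out. The helper names are hypothetical, and `policy/v1` is an assumption that fits clusters from 1.22 on (older clusters use `policy/v1beta1`).

```python
import json
from urllib.parse import quote


def pods_on_node_path(node_name):
    """GET path listing every pod scheduled on node_name, in all namespaces."""
    selector = quote("spec.nodeName=" + node_name, safe="")
    return "/api/v1/pods?fieldSelector=" + selector


def eviction_request(namespace, name, grace_seconds=None):
    """Return (path, JSON body) for a POST to the pod's eviction subresource."""
    body = {
        "apiVersion": "policy/v1",  # assumption: use policy/v1beta1 on older clusters
        "kind": "Eviction",
        "metadata": {"name": name, "namespace": namespace},
    }
    if grace_seconds is not None:
        # A short grace period helps pods finish inside the preemption window.
        body["deleteOptions"] = {"gracePeriodSeconds": grace_seconds}
    path = f"/api/v1/namespaces/{namespace}/pods/{name}/eviction"
    return path, json.dumps(body)
```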

santinoncs · 2021-04-23T10:34:43Z

I am seeing these events from Kubernetes when node-termination-handler tries to delete the pods:

TaintManagerEviction | Cancelling deletion of Pod yyy/xx

Do you know what this means?
