Pause container always runs ping.exe #1576

Open
marosset opened this issue Nov 28, 2022 · 9 comments

Comments

@marosset
Member

createPod() always runs cmd /c ping -t 127.0.0.1 > nul as the pause container command for Windows process-isolated pods.
https://github.com/microsoft/hcsshim/blob/main/cmd/containerd-shim-runhcs-v1/pod.go#L240-L245

Kubernetes maintains a Windows pause image that is supposed to run pause.exe instead of ping.exe.
ping.exe creates unnecessary I/O which can cause performance issues on machines running large numbers of pods/containers.

@dims
Contributor

dims commented Dec 2, 2022

xref: containerd/containerd#7752

@jterry75
Contributor

jterry75 commented Dec 6, 2022

Oh for argon. Yea this is a bug for sure

@kiashok
Contributor

kiashok commented Jan 24, 2023

The fix being considered for this issue is as follows:

Check whether the image's entrypoint is the same as the default entrypoint for nanoserver and servercore images, which is "c:\windows\system32\cmd.exe".
If the entrypoint is not the same as the default entrypoint that nanoserver and servercore images come with, then we honor the entrypoint and do not override it with ping.exe.
If they are the same, then we override with ping.exe and suggest switching to a pause image with an explicit cmd set.

Please note that with this approach we override with ping.exe ONLY if the image's entrypoint is the same as the default for nanoserver and servercore images, which is "c:\windows\system32\cmd.exe". If a new pause image is built with nanoserver/servercore as the base image and a different entrypoint such as "cmd.exe" or "windows/system32/cmd.exe" is used, then this fix could be breaking, as we will no longer override with ping.exe.
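
To make the decision rule above concrete, here is a rough sketch in Go. It is not the code from the PR: the function name choosePauseCommand, the constant name, and the exact (case-sensitive, single-element) string comparison are assumptions made for illustration. Only the default entrypoint string and the ping command line come from this thread.

```go
package main

import "fmt"

// defaultBaseEntrypoint is the default entrypoint of nanoserver and
// servercore images, as quoted in this thread.
const defaultBaseEntrypoint = `c:\windows\system32\cmd.exe`

// choosePauseCommand is a hypothetical helper (not an hcsshim API).
// It returns the command to run in the pause container and whether the
// image entrypoint was overridden with the ping fallback.
// Assumption: the comparison is an exact string match on a one-element
// entrypoint; the real fix may compare differently.
func choosePauseCommand(imageEntrypoint []string) ([]string, bool) {
	if len(imageEntrypoint) == 1 && imageEntrypoint[0] == defaultBaseEntrypoint {
		// Same as the base image default: keep the old behaviour.
		// A caller could also log a suggestion to switch to a pause
		// image with an explicit command here.
		return []string{"cmd", "/c", "ping", "-t", "127.0.0.1", ">", "nul"}, true
	}
	// Anything else is treated as an explicit choice and is honored.
	return imageEntrypoint, false
}

func main() {
	for _, ep := range [][]string{
		{`c:\windows\system32\cmd.exe`},
		{"/pause.exe"},
		{"cmd.exe"},
	} {
		cmd, overridden := choosePauseCommand(ep)
		fmt.Printf("entrypoint=%q command=%q overridden=%v\n", ep, cmd, overridden)
	}
}
```

The third case shows the caveat from the note above: an image whose entrypoint is spelled "cmd.exe" does not match the default string, so it is no longer overridden with ping.exe.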

PR for this: #1615

@Jamie0

Jamie0 commented Jun 1, 2023

Has anyone found a way of working around this on AKS? Or, I guess more accurately, what version includes that PR?

We're finding that PING.EXE launched inside the pause container gradually leaks memory, which after a few weeks causes kubelet to restart suddenly, breaking our disruption budgets.

@Jamie0

Jamie0 commented Jun 1, 2023

Unfortunately it seems it's not possible to override the pause container image on Azure Kubernetes (even if you manually tag over the pause images and the local kubletwin/pause image on the host, hcsshim/containerd will use the baked-in pause container).

Indeed, until this issue is fixed, long-running containers on Windows Server 2022 will eventually be killed (when PING.EXE is OOM-killed).

Upon inspecting the contents of the pause container, I can see a copy of pause.exe located in the root of C:. Unlike the virtual pause.exe inside DOS/cmd.exe, this executable doesn't exit if there's no STDIN/console. Would this not be suitable as an entrypoint rather than cmd.exe /c ping -t 127.0.0.1?

@kiashok
Contributor

kiashok commented Jun 1, 2023

> Has anyone found a way of working around this on AKS? Or, I guess more accurately, what version includes that PR?
> We're finding that PING.EXE launched inside the pause container gradually leaks memory, which after a few weeks causes kubelet to restart suddenly, breaking our disruption budgets.

This issue was fixed in #1634 to use pause.exe by default. This fix should be available on AKS soon.

cc @AbelHu

@AbelHu

AbelHu commented Jun 20, 2023

AKS has picked up Windows containerd v1.6.21, which contains this fix: https://github.com/microsoft/hcsshim/pull/1634/files
