Scale-in activity delays when admin container is enabled #3812

Open
rodrigobersa opened this issue Mar 7, 2024 · 2 comments
Labels
area/core Issues core to the OS (variant independent) type/bug Something isn't working

Comments

@rodrigobersa

Image I'm using:
bottlerocket-aws-k8s-1.28-x86_64-v1.19.1-c325a08b

What I expected to happen:
Scale-in activities should take roughly the same amount of time whether the admin container is enabled or disabled.

What actually happened:
Scale-in activities take more than 5 minutes when the admin container is enabled.
If it is not enabled, the scale-in process takes less than 2 minutes.

Apparently, once SIGTERM hits containerd, systemd starts repeatedly trying to deactivate the mount for what seems to be the admin host container, without success.
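
To watch this from the node itself, a minimal sketch (assuming a root shell on the host, e.g. via sudo sheltie from the admin container):

```sh
# Follow the journal and filter for the runc mount units that systemd
# keeps deactivating during scale-in; the container ID appears in each line.
journalctl -f | grep -E 'runc\..*\.mount'
```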

How to reproduce the problem:
Spin up a Managed Node Group or Karpenter NodePool with a Bottlerocket-family AMI.
Enable the admin container (see the sketch after these steps).
Scale out to any number of replicas.
Scale in.
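
A minimal sketch of enabling the admin host container on a running node through the Bottlerocket API; the same setting can also be baked into the instance user data as TOML:

```sh
# From a shell on the node (the control container also ships an
# enable-admin-container helper), turn the admin host container on:
apiclient set host-containers.admin.enabled=true

# Equivalent user-data TOML for a Managed Node Group launch template or a
# Karpenter EC2NodeClass:
#   [settings.host-containers.admin]
#   enabled = true
```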

```
Feb 15 10:31:03 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.cNhkWt.mount: Deactivated successfully.
Feb 15 10:31:13 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.V9Sqd1.mount: Deactivated successfully.
Feb 15 10:31:33 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.fb2IM1.mount: Deactivated successfully.
Feb 15 10:31:43 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.8yARBs.mount: Deactivated successfully.
Feb 15 10:31:53 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.nNPatX.mount: Deactivated successfully.
Feb 15 10:32:23 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.wNYNZV.mount: Deactivated successfully.
Feb 15 10:32:43 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.adSYp4.mount: Deactivated successfully.
Feb 15 10:32:53 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.KXK3eY.mount: Deactivated successfully.
Feb 15 10:33:03 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.8na3Hj.mount: Deactivated successfully.
Feb 15 10:33:03 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.Q2oofj.mount: Deactivated successfully.
Feb 15 10:33:23 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.rzEq2c.mount: Deactivated successfully.
Feb 15 10:33:33 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.hIHHGm.mount: Deactivated successfully.
Feb 15 10:33:43 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.cl4hiM.mount: Deactivated successfully.
Feb 15 10:34:03 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.V8Ow0G.mount: Deactivated successfully.
Feb 15 10:34:13 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.7Ys1Dd.mount: Deactivated successfully.
Feb 15 10:34:13 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.DEGKUp.mount: Deactivated successfully.
Feb 15 10:34:43 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.PvgYkQ.mount: Deactivated successfully.
Feb 15 10:34:43 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.DWMUA7.mount: Deactivated successfully.
Feb 15 10:34:53 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.2BYjLl.mount: Deactivated successfully.
Feb 15 10:35:43 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.tlckbN.mount: Deactivated successfully.
Feb 15 10:35:53 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.1par7Q.mount: Deactivated successfully.
Feb 15 10:35:53 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.HdZbur.mount: Deactivated successfully.
Feb 15 10:36:03 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.9QlOtj.mount: Deactivated successfully.
Feb 15 10:36:13 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.Eg7exB.mount: Deactivated successfully.
Feb 15 10:36:23 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.ralHiZ.mount: Deactivated successfully.
Feb 15 10:36:26 ip-192-168-66-3.us-west-2.compute.internal apiserver[971]: 10:36:26 [INFO] Received exec request to localhost:/exec
Feb 15 10:36:26 ip-192-168-66-3.us-west-2.compute.internal apiserver[971]: 10:36:26 [INFO] exec process returned 0
Feb 15 10:36:26 ip-192-168-66-3.us-west-2.compute.internal apiserver[971]: 10:36:26 [INFO] Closing exec connection; message: "0"
Feb 15 10:36:26 ip-192-168-66-3.us-west-2.compute.internal apiserver[971]: 10:36:26 [INFO] Received exec request to localhost:/exec
Feb 15 10:36:33 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.hwUmTa.mount: Deactivated successfully.
Feb 15 10:36:33 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.E7VdUM.mount: Deactivated successfully.
Feb 15 10:36:37 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: Configuration file /etc/systemd/system/kubelet.service.d/exec-start.conf is marked world-inaccessible. This has no effect as configuration data is accessible via APIs without restrictions. Proceeding anyway.
Feb 15 10:36:43 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.087Jfc.mount: Deactivated successfully.
Feb 15 10:37:13 ip-192-168-66-3.us-west-2.compute.internal systemd[1]: run-containerd-runc-k8s.io-b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197-runc.zRVIty.mount: Deactivated successfully.
```
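
For reference, the long hex string in those mount units is a containerd container ID in the k8s.io namespace; one way to resolve it (a sketch, assuming a root shell on the host) is:

```sh
# Inspect the container behind the repeated mount deactivations; the ID is
# copied verbatim from the journal lines above.
ctr --namespace k8s.io containers info \
  b3cd8f645b9345f01fb9a5976473d691862beeff1a60207bc28f7d36a0d4a197
```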
@rodrigobersa rodrigobersa added status/needs-triage Pending triage or re-evaluation type/bug Something isn't working labels Mar 7, 2024
@yeazelm
Contributor

yeazelm commented Mar 8, 2024

Hello @rodrigobersa, I'll do some testing myself to see if I can reproduce this issue and get back to you.

@yeazelm yeazelm added area/core Issues core to the OS (variant independent) and removed status/needs-triage Pending triage or re-evaluation labels Mar 8, 2024
@webern
Member

webern commented Mar 18, 2024

One thing we noticed is that the container that seems to be problematic is in the k8s.io namespace, which means it is not the admin container. I don't think I see anything related to the admin container (though we can't rule out some interaction there).

Can you list the containers running on a host that is in this state?

Run enter-admin-container and use sudo sheltie, then ctr --namespace k8s.io containers ls (ctr --namespace k8s.io images ls will also show the images).
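
Spelled out, that sequence looks roughly like this (a sketch: enter-admin-container is run from the control container, and sheltie drops into a root shell in the host's namespaces):

```sh
# From the control container (e.g. an SSM session), step into the admin
# container, then into a root shell on the host:
enter-admin-container
sudo sheltie

# List containers (and images) in the k8s.io containerd namespace:
ctr --namespace k8s.io containers ls
ctr --namespace k8s.io images ls
```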
