Pods that have UnexpectedAdmissionError are not automatically removed. #124934

Closed
warlock2k opened this issue May 17, 2024 · 6 comments
Labels: needs-triage, sig/node

Comments

@warlock2k

Background

I'm using the Smarter Device Manager to give Kubernetes (K8s) containers access to devices available on nodes (for example, /dev/kvm). However, after a node restarts, the Smarter Device Manager takes a while to initialize. As a result, pods scheduled on that node that require access to /dev/kvm fail with the reason UnexpectedAdmissionError and the following message:

...
Annotations:  <none>
Status:  Failed
Reason:  UnexpectedAdmissionError
Message:  Pod was rejected: Allocate failed due to no healthy devices present; cannot allocate unhealthy devices smarter-devices/kvm, which is unexpected

Although a new pod for the job is scheduled after a while, the old pod remains in the failed state and is never removed.

Kubernetes version

Client Version: v1.26.9
Kustomize Version: v4.5.7
Server Version: v1.26.11

What did I expect to happen?

I expect that pods with status UnexpectedAdmissionError are removed automatically by Kubernetes.

Steps to reproduce

  • Create a job that deploys a pod needing access to device drivers (say /dev/kvm through smarter-device-manager).
  • Restart the node on which this job is deployed.
  • You will see that the job's pod fails with UnexpectedAdmissionError instead of starting, and the failed pod is not removed.

OS

PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Linux 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
@k8s-ci-robot added the needs-sig and needs-triage labels May 17, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@neolit123
Member

> I expect that pods with status UnexpectedAdmissionError are removed automatically by Kubernetes.

/sig node
for triage

@k8s-ci-robot added the sig/node label and removed the needs-sig label May 18, 2024
@ffromani
Contributor

Pods in the Failed state, including those that failed with UnexpectedAdmissionError, aren't cleaned up automatically so that users can inspect the failure conditions (e.g. logs). To enable safe automatic cleanup, we need clear, well-known errors, which will probably require a (minor) RFE.

@warlock2k
Author

@ffromani pods in the UnexpectedAdmissionError state actually don't have any logs, only the status that can be seen with the kubectl describe command. I do believe that enabling an automatic safe cleanup is a useful feature to have and would greatly help us. Do you suggest I raise an RFE for this?

@ffromani
Contributor

> @ffromani pods in the UnexpectedAdmissionError state actually don't have any logs, only the status that can be seen with the kubectl describe command. I do believe that enabling an automatic safe cleanup is a useful feature to have and would greatly help us. Do you suggest I raise an RFE for this?

The key here is the Failed state, of which UnexpectedAdmissionError is one possible reason. I think the first step is to have a clearly identifiable error, which can enable automatic cleanup later on. I'm not sure the base system should do this cleanup, but a cleanup controller would then be fairly trivial to create.
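As a rough illustration of the selection logic such a cleanup controller might use (this is not an existing Kubernetes component), here is a minimal Python sketch. It assumes the controller can read each pod's `status.phase` and `status.reason`. The `PodInfo` type, the `failed_at` timestamp, `pods_to_clean_up`, and the 10-minute grace period are all hypothetical; the grace period is there to keep the window for inspecting failures that the comment about Failed pods describes. Actually deleting the selected pods through the Kubernetes API is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


# Hypothetical stand-in for the Pod fields a cleanup controller would read.
# phase/reason mirror status.phase and status.reason in the Pod API;
# failed_at is an assumed timestamp used only to model a grace period.
@dataclass
class PodInfo:
    namespace: str
    name: str
    phase: str
    reason: str
    failed_at: datetime


def pods_to_clean_up(pods, now, grace=timedelta(minutes=10),
                     reasons=("UnexpectedAdmissionError",)):
    """Return (namespace, name) for Failed pods whose reason is in `reasons`
    and that have been failed for at least `grace`, so users still get a
    window to inspect them before they are removed."""
    selected = []
    for pod in pods:
        if pod.phase != "Failed":
            continue  # only terminal Failed pods are candidates
        if pod.reason not in reasons:
            continue  # only clearly identifiable, well-known failure reasons
        if now - pod.failed_at < grace:
            continue  # too recent: keep it around for inspection
        selected.append((pod.namespace, pod.name))
    return selected


# Example usage with made-up pods.
now = datetime(2024, 5, 17, 12, 0, tzinfo=timezone.utc)
pods = [
    PodInfo("default", "job-abc-1", "Failed", "UnexpectedAdmissionError",
            now - timedelta(minutes=30)),
    PodInfo("default", "job-abc-2", "Running", "", now),
    PodInfo("default", "job-xyz-1", "Failed", "Evicted",
            now - timedelta(hours=1)),
    PodInfo("default", "job-abc-3", "Failed", "UnexpectedAdmissionError",
            now - timedelta(minutes=2)),
]
print(pods_to_clean_up(pods, now))  # [('default', 'job-abc-1')]
```

Filtering on an explicit allow-list of reasons, rather than on every Failed pod, reflects the point that automatic cleanup is only safe for clearly identifiable errors.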

@warlock2k
Author

Alright, thanks!

Projects
None yet
Development

No branches or pull requests

4 participants