Description
In a Kubernetes cluster, kubelet tries to create a pod, and the request sometimes times out after 4 minutes. When kubelet retries creating the pod, the retry fails with "failed to reserve sandbox name".
kubelet and containerd logs:
2024-02-07T02:45:40.739274+00:00 NodeA containerd 22938 - - time="2024-02-07T02:45:40.738965836Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:podX-b181ca13-z7d7p,Uid:f4fc0242-4fb4-4015-a7eb-3e4a9cd73293,Namespace:default,Attempt:0,}
2024-02-07T02:49:40.739405+00:00 NodeA kubelet 24578 - - E0207 02:49:40.738973 24578 remote_runtime.go:176] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
2024-02-07T02:49:40.739537+00:00 NodeA kubelet 24578 - - E0207 02:49:40.739041 24578 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" pod="default/podX-b181ca13-z7d7p"
2024-02-07T02:49:52.035922+00:00 NodeA containerd 22938 - - time="2024-02-07T02:49:52.035460535Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:podX-b181ca13-z7d7p,Uid:f4fc0242-4fb4-4015-a7eb-3e4a9cd73293,Namespace:default,Attempt:0,}"
2024-02-07T02:49:52.036144+00:00 NodeA containerd 22938 - - time="2024-02-07T02:49:52.035550152Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:podX-b181ca13-z7d7p,Uid:f4fc0242-4fb4-4015-a7eb-3e4a9cd73293,Namespace:default,Attempt:0,} failed, error" error="failed to reserve sandbox name "podX-b181ca13-z7d7p_cm_f4fc0242-4fb4-4015-a7eb-3e4a9cd73293_0": name "podX-b181ca13-z7d7p_cm_f4fc0242-4fb4-4015-a7eb-3e4a9cd73293_0" is reserved for "3897d260724a7061999dbc3b03243ae672a6bd7d287784009ddfad9428029b92""
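When this error appears, it can help to confirm whether containerd still holds state for the sandbox ID named in the message. A minimal check from the node, assuming crictl is pointed at the containerd socket and using the IDs from the log lines above (in containerd 1.6/1.7 the CRI plugin backs each sandbox with a containerd container in the k8s.io namespace):
# Is a sandbox with this pod name still listed by the CRI plugin?
crictl pods --name podX-b181ca13-z7d7p
# Inspect the sandbox ID that the name is reported as reserved for
crictl inspectp 3897d260724a7061999dbc3b03243ae672a6bd7d287784009ddfad9428029b92
# Check containerd's own metadata for the same ID
ctr -n k8s.io containers ls | grep 3897d260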
The same issue was reproduced more frequently using a 'crictl' test script. The log snippet below is from a test run where "failed to reserve sandbox name" was observed; containerd did NOT RECOVER from this state and kept returning the same error for every subsequent pod creation request with the SAME POD NAME. It looks like containerd is not cleaning up the stale pod entries from its database.
Log snippet from Iteration: 17, Test Instance ID: 2
Sandbox Pod ID: fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017
Creating container inside pod fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017
b3c019d108f1800a7d2552d6c053ca07690d7d886db2b4131eee38cb16688922
Starting container b3c019d108f1800a7d2552d6c053ca07690d7d886db2b4131eee38cb16688922 inside pod fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017
Stopped sandbox fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017
# Remove-pod command timed out
E0307 21:36:57.596856 69228 remote_runtime.go:274] "RemovePodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" podSandboxID="fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017"
removing the pod sandbox "fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017": rpc error: code = DeadlineExceeded desc = context deadline exceeded
# On retry with the same sandbox pod name, create-pod fails with the error below. Containerd has not cleaned up the stale pod entries after remove-pod timed out.
E0307 21:36:57.940424 69750 remote_runtime.go:201] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to reserve sandbox name "bb-sandbox-2b_default_2b_1": name "bb-sandbox-2b_default_2b_1" is reserved for "fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017""
time="2024-03-07T21:36:57Z" level=fatal msg="run pod sandbox: rpc error: code = Unknown desc = failed to reserve sandbox name "bb-sandbox-2b_default_2b_1": name "bb-sandbox-2b_default_2b_1" is reserved for "fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017""
In this example, remove-pod timed out after the pod was stopped. Similarly, create-pod can get stuck or time out, and the next create-pod attempt then fails with the "failed to reserve sandbox name" error.
It was also observed that adding a 10-second sleep between the 'crictl' commands reduces the error rate significantly, but does not eliminate it.
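A possible manual cleanup, assuming a crictl version that supports forced sandbox removal (rmp -f), is to retry removal of the old sandbox ID reported in the error and fall back to restarting containerd:
# Retry stopping and force-removing the sandbox that still holds the name reservation
crictl stopp fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017
crictl rmp -f fbb5843175d06d7c84d66ad3f5a07c4327910585c1157e8b71bdaa4dd0247017
# If removal keeps timing out, restarting containerd rebuilds its in-memory state;
# this may or may not release the reservation, depending on whether the stale
# sandbox record was persisted in the metadata store
systemctl restart containerd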
Steps to reproduce the issue
Prerequisites for running the crictl commands:
(a) In /etc/containerd/config.toml, comment out the "SystemdCgroup = ..." line
(b) systemctl stop kubelet
Five instances of the test script run in parallel; each instance repeats the following steps for 1000 iterations (a minimal sketch of the loop follows the list):
Create sandbox pod
Create container inside the sandbox pod
Start the container
Stop the sandbox pod
Remove the sandbox pod
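A minimal sketch of one test instance, assuming hypothetical pod-config.json and container-config.json CRI spec files (the actual scripts are in the attached TestScripts.zip):
#!/bin/bash
# Hypothetical reproduction loop for a single test instance; pod-config.json and
# container-config.json are placeholder CRI sandbox/container specs.
for i in $(seq 1 1000); do
    POD_ID=$(crictl runp pod-config.json)                                      # Create sandbox pod
    CTR_ID=$(crictl create "$POD_ID" container-config.json pod-config.json)    # Create container inside the sandbox pod
    crictl start "$CTR_ID"                                                     # Start the container
    crictl stopp "$POD_ID"                                                     # Stop the sandbox pod
    crictl rmp "$POD_ID"                                                       # Remove the sandbox pod
    # sleep 10   # optional delay that reduces, but does not eliminate, the failures
done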
Describe the results you received and expected
Expected:
Successful creation and deletion of pods and containers.
Received:
The issue was reproduced on containerd v1.6.20, v1.7.0, and v1.7.3.
Observed "failed to reserve sandbox name" after crictl commands timed out.
See the attached containerd backtrace, captured when the sandbox name error was observed.
What version of containerd are you using?
1.6.20, 1.7.0, 1.7.3
Any other relevant information
crictl version
Version: 0.1.0
RuntimeName: containerd
RuntimeVersion: v1.6.20
RuntimeApiVersion: v1
runc -v
runc version 1.1.5
commit: v1.1.5-0-gf19387a6
spec: 1.0.2-dev
go: go1.19.7
libseccomp: 2.5.1
uname -a
Linux smdev-esx1 5.16.0-0.bpo.4-amd64 #1 SMP PREEMPT Debian 5.16.12-1~bpo11+1 (2022-03-08) x86_64 GNU/Linux
crictl.txt
TestScripts.zip
containerd_backtrace.txt
Show configuration if it is related to CRI plugin.
config_toml.txt