failed to start or create containerd task #4068
Comments
@rtheis this smells like the one @liggitt fixed in opencontainers/runc#2183. Please use containerd 1.3.3, which uses runc v1.0.0-rc10, or just drop that version of runc directly into your environment and retry the tests. Thanks,
@dims Thanks for the pointer. Unfortunately, our latest failure from today used containerd 1.3.3.
@rtheis i see it in upstream CI too - https://storage.googleapis.com/k8s-gubernator/triage/index.html?pr=1&text=opening%20w%2Fo%20fifo cc @liggitt
I see that prior to our recent containerd/runc bump, so it appears to be an independent, pre-existing issue: https://storage.googleapis.com/k8s-gubernator/triage/index.html?date=2020-01-01&pr=1&text=opening%20w%2Fo%20fifo
@dims Sorry, I don't have an environment running containerd 1.2.7 at this time.
tracking on the Kubernetes side in kubernetes/kubernetes#89064
comes from containerd/pkg/process/exec.go Lines 217 to 230 in c6851ac
it's unclear from the error whether the deadline that was exceeded was the 30 second one or a deadline inherited from the wrapped context |
Will take a look at this. Thanks for reporting it!
In my vagrant box with 4cpu4GB and high load with
@fuweid I don't have any I/O metrics data for these failures.
In #4595 we stopped failing integration tests whenever a pod restarted just once, which is being caused by containerd/containerd#4068. But we forgot to remove the warning event corresponding to that containerd failure, and such unexpected event continues to fail the tests. So this change adds that event to the list of expected ones.
We still see this in Kubernetes sometimes. Oddly, it seems to happen more often with certain specific test cases.
I also encountered "failed to create containerd task: failed to start io pipe copy: unable to copy pipes: containerd-shim: opening w/o fifo ...: context deadline exceeded" in the CI tests for the PMEM-CSI driver. FWIW, I only saw it after updating to containerd 1.3.7 from 1.2.13.
containerd/containerd#4068 caused a container start to fail and get retried, which then broke tests because of our "no container restart" check. By treating this particular failure as non-fatal we get our tests to run reliably again.
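The workaround described in the two referenced changes (tolerating this specific warning instead of failing the run) might look roughly like the following Go sketch. The message fragments come from the errors quoted in this issue. The names `expectedWarningSubstrings`, `isExpectedWarning`, and `unexpectedWarnings` are hypothetical and are not taken from either project's test code.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedWarningSubstrings lists message fragments a test harness could
// tolerate. The fragments are the general errors quoted in this issue;
// the list and its name are assumptions for illustration.
var expectedWarningSubstrings = []string{
	"failed to create containerd task",
	"failed to start containerd task",
}

// isExpectedWarning reports whether msg contains any tolerated fragment.
func isExpectedWarning(msg string) bool {
	for _, s := range expectedWarningSubstrings {
		if strings.Contains(msg, s) {
			return true
		}
	}
	return false
}

// unexpectedWarnings returns only the messages that should still fail a run.
func unexpectedWarnings(msgs []string) []string {
	var out []string
	for _, m := range msgs {
		if !isExpectedWarning(m) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	msgs := []string{
		"failed to create containerd task: failed to start io pipe copy: unable to copy pipes: containerd-shim: opening w/o fifo ...: context deadline exceeded",
		"some other warning", // placeholder for an unrelated event
	}
	fmt.Println(unexpectedWarnings(msgs))
}
```

Matching on a narrow substring keeps the allowlist from hiding unrelated container start failures.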
It's been about 1.5 years since anyone commented on or noted this issue. I also don't see any mentions of 1.4.x or later releases, which are the only ones in support. Do we have any data on this happening with containerd 1.4.x or above? If we don't, we may as well close this out.
@estesp things look good with containerd 1.4, 1.5 and 1.6.
Kubernetes is currently using 1.5.9 in KIND and 1.6.0 in most other places; as far as I can find, we are not seeing this anymore.
Thanks for the feedback! Closing.
containerd 1.6.1 also has this problem: 8:00" level=error msg="collecting metrics for 00f39eb23e6de53f353385bf3adfc55c4e101f8cf5ace562a04dae02de867d02" error="ttrpc: closed: unknown"
I deployed Kubernetes via microk8s (Canonical and snap project) and got the error. It seems similar to the error mentioned here and related to containerd: kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: can't copy bootstrap data to pipe: write init-p: broken pipe: unknown. Is there any solution for it?
I'm on GKE on version:
This is a cronjob and it just fails with the Any suggestions why this is happening? |
Hello
Pod's status:
Description
Running Kubernetes conformance testing against a cluster with containerd runtime sometimes fails due to a pod not starting during one of the test cases. The general error is `failed to start containerd task` or `failed to create containerd task`. More detailed errors include the following:
- `ttrpc: closed: unknown`
- `read: connection reset by peer: unknown`
- `failed to start io pipe copy: unable to copy pipes: containerd-shim: opening w/o fifo ... failed: context deadline exceeded`
Steps to reproduce the issue:
Option 1: Follow https://github.com/cncf/k8s-conformance/blob/master/instructions.md#running to run Kubernetes conformance testing via `sonobuoy`.
Option 2: Follow https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md#running-conformance-tests to run Kubernetes conformance testing via `kubetest`.
More load on the cluster (e.g. running conformance tests in parallel) makes the problem easier to reproduce. However, the problem is generally difficult to reproduce because the failure rate is low. For example, re-running the conformance tests after a failure usually succeeds.
Describe the results you received:
See description.
Describe the results you expected:
Kubernetes conformance test passes because containerd retries the failed task.
Output of `containerd --version`:
We've seen this on various containerd 1.2.x and 1.3.x versions.
Any other relevant information:
We’ve noticed and have been monitoring these failures since October 2019, although they could have started long before that.