Intermittent pod attach issues with bottlerocket-aws-k8s-1.24-x86_64-v1.19.3-f097c617 AMI #3866
Comments
FYI I'll also be looping in GitLab support in case we can come up with more details.
Hi @aaronborden-rivian thanks for reporting this. I'll work on reproducing it. In the meantime, any additional logs or error messages you could provide would be helpful. Thanks!
A number of folks are reporting this in the gitlab-runner issue https://gitlab.com/gitlab-org/gitlab-runner/-/issues/37446, and not all of them are using Bottlerocket. It might be a component included in recent versions that is causing the issue.
Maybe this is caused by containerd/containerd#10036. There is already an upstream containerd fix with backports for 1.6.* and 1.7.* in preparation, see containerd/containerd#10036 (comment).
Many thanks to @vyaghras, @bcressey and @rpkelly for #3869. Now we need a new Amazon EKS optimized Bottlerocket AMI release. Will this be triggered automatically?
@alex-berger it is not automatic, but the release process for 1.19.4 with the fix is getting under way now.
With the release of Bottlerocket v1.19.4 the containerd issue should be addressed!
Image I'm using:
AWS AMI bottlerocket-aws-k8s-1.24-x86_64-v1.19.3-f097c617 (ami-08ab333430f1465ce)
kubelet version reported: v1.24.17-eks-bd4e8bf
Karpenter 0.30.0
We're using Bottlerocket to run our GitLab Runner CI jobs. After the 1.19.3 AMI was released, we started seeing jobs hanging after pod start and eventually timing out on nodes deployed with the latest Bottlerocket image.
What I expected to happen:
Nodes running Bottlerocket 1.19.3 to perform similarly to Bottlerocket 1.19.2 with respect to kubectl attach stability.
What actually happened:
We saw an increase in the number of failed jobs with error message.
Note that this did not affect all jobs. The error was rare but frequent enough for us to notice: roughly 0.7% of jobs were affected. We saw about 175 jobs fail with this error during an 8-hour period.
After reverting to Bottlerocket 1.19.2, the error went away.
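For a rough sense of scale, the figures above can be turned into an implied job volume and an hourly failure count. This is only back-of-the-envelope arithmetic on the approximate numbers quoted in this report (about 175 failures, about 0.7%, 8 hours); the total job count is derived, not measured, and the variable names are made up for illustration.

```python
# Approximate figures quoted in the report (not exact measurements).
failed_jobs = 175        # jobs that failed with the attach error
failure_rate = 0.007     # roughly 0.7% of jobs affected
window_hours = 8         # observation window

# Derived values: these are estimates implied by the figures above.
implied_total_jobs = round(failed_jobs / failure_rate)
failures_per_hour = failed_jobs / window_hours

print(f"implied total jobs in window: ~{implied_total_jobs}")
print(f"failures per hour: ~{failures_per_hour:.1f}")
```

At that rate, a reproduction attempt would likely need hundreds of attach attempts before hitting a single failure.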
How to reproduce the problem:
Unfortunately, I don't have a great method for reproduction. The gitlab-runner uses attach to run scripts within the build container, so perhaps you could simplify the repro by doing kubectl attach in a loop.
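A minimal sketch of what such a loop might look like, in case it helps with reproduction. This is not what gitlab-runner does internally. It assumes `kubectl` is on the PATH and already pointed at the cluster, and that the target is a short-lived container that exits on its own, so a healthy attach returns promptly and a timeout can be counted as a hang. The pod name, container name, namespace, and timeout are all placeholders.

```python
import subprocess
from typing import Callable


def attach_once(pod: str, container: str, namespace: str = "default",
                timeout_s: float = 30.0) -> bool:
    """Run one `kubectl attach` and report whether it completed cleanly.

    Assumption: the container exits by itself, so a timeout here is
    treated as a hung attach rather than a long-running workload.
    """
    cmd = ["kubectl", "attach", pod, "-c", container, "-n", namespace]
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def run_attach_loop(attempt: Callable[[], bool], iterations: int) -> dict:
    """Call `attempt` repeatedly and tally how many attempts failed."""
    failures = 0
    for _ in range(iterations):
        if not attempt():
            failures += 1
    rate = failures / iterations if iterations else 0.0
    return {"attempts": iterations, "failures": failures, "failure_rate": rate}


# Hypothetical usage (pod and container names are placeholders):
#   stats = run_attach_loop(lambda: attach_once("repro-pod", "build"), 1000)
#   print(stats)
```

Since the error only showed up on a small fraction of jobs, a loop like this would probably need a large iteration count, and a fresh pod per attempt, to be a fair comparison between 1.19.2 and 1.19.3 nodes.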