
kubelet crash loop panic with SIGSEGV #124871

Open · garymm opened this issue May 14, 2024 · 6 comments

Labels: kind/bug · needs-triage · priority/important-soon · sig/node


garymm commented May 14, 2024

What happened?

I was running some pods on my node as usual.
At some point (23:41:35.127989 in the log), the kubelet crashed and then went into a crash loop.

Log showing the first crash and several thereafter: kubelet-crash.txt

The errors I see shortly before the crash are:

projected.go:292] Couldn't get configMap default/kube-root-ca.crt: object "default"/"kube-root-ca.crt" not registered
projected.go:198] Error preparing data for projected volume kube-api-access-lwp5v for pod default/run-fluid-wtw5n-99kwf: object "default"/"kube-root-ca.crt" not registered
nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/e6cfc866-a5b2-4e16-9b83-884de8552d45-kube-api-access-lwp5v podName:e6cfc866-a5b2-4e16-9b83-884de8552d45 nodeName:}" failed. No retries permitted until 2024-05-13 23:41:38.8756138 +0000 UTC m=+9176.959811707 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "kube-api-access-lwp5v" (UniqueName: "kubernetes.io/projected/e6cfc866-a5b2-4e16-9b83-884de8552d45-kube-api-access-lwp5v") pod "run-fluid-wtw5n-99kwf" (UID: "e6cfc866-a5b2-4e16-9b83-884de8552d45") : object "default"/"kube-root-ca.crt" not registered

Stack trace from the first crash (later stack traces are a bit different):

goroutine 401 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3d005c0?, 0x6d89410})
        vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x480e338?})
        vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x3d005c0, 0x6d89410})
        /usr/local/go/src/runtime/panic.go:884 +0x213
errors.As({0x12, 0x0}, {0x3eac4c0?, 0xc0014e6590?})
        /usr/local/go/src/errors/wrap.go:109 +0x215
k8s.io/apimachinery/pkg/util/net.IsConnectionReset(...)
        vendor/k8s.io/apimachinery/pkg/util/net/util.go:45
k8s.io/client-go/rest.(*Request).request.func2(0x6d2e8a0?, {0x12, 0x0})
        vendor/k8s.io/client-go/rest/request.go:1007 +0x79
k8s.io/client-go/rest.IsRetryableErrorFunc.IsErrorRetryable(...)
        vendor/k8s.io/client-go/rest/with_retry.go:43
k8s.io/client-go/rest.(*withRetry).IsNextRetry(0xc001587f40, {0x0?, 0x0?}, 0x0?, 0xc001ef5f00, 0xc001bb1560, {0x12, 0x0}, 0x480e328)
        vendor/k8s.io/client-go/rest/with_retry.go:169 +0x170
k8s.io/client-go/rest.(*Request).request.func3(0xc001bb1560, 0xc002913af8, {0x4c44840?, 0xc001587f40?}, 0x0?, 0x0?, 0x39efc40?, {0x12?, 0x0?}, 0x480e328)
        vendor/k8s.io/client-go/rest/request.go:1042 +0xba
k8s.io/client-go/rest.(*Request).request(0xc00199f200, {0x4c42e00, 0xc00227b0e0}, 0x2?)
        vendor/k8s.io/client-go/rest/request.go:1048 +0x4e5
k8s.io/client-go/rest.(*Request).Do(0xc00199f200, {0x4c42dc8, 0xc000196010})
        vendor/k8s.io/client-go/rest/request.go:1063 +0xc9
k8s.io/client-go/kubernetes/typed/core/v1.(*nodes).Get(0xc001db6740, {0x4c42dc8, 0xc000196010}, {0x7ffe6a355c42, 0x1d}, {{{0x0, 0x0}, {0x0, 0x0}}, {0x4c03920, ...}})
        vendor/k8s.io/client-go/kubernetes/typed/core/v1/node.go:77 +0x145
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).tryUpdateNodeStatus(0xc000289400, {0x4c42dc8, 0xc000196010}, 0x44ead4?)
        pkg/kubelet/kubelet_node_status.go:561 +0xf6
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).updateNodeStatus(0xc000289400, {0x4c42dc8, 0xc000196010})
        pkg/kubelet/kubelet_node_status.go:536 +0xfc
k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncNodeStatus(0xc000289400)
        pkg/kubelet/kubelet_node_status.go:526 +0x105
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc002913f28?)
        vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x4c19ae0, 0xc0008f2000}, 0x1, 0xc000180360)
        vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x2540be400, 0x3fa47ae147ae147b, 0x0?, 0x0?)
        vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
created by k8s.io/kubernetes/pkg/kubelet.(*Kubelet).Run
        pkg/kubelet/kubelet.go:1606 +0x58a
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1a pc=0x47a935]
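
For orientation, the middle frames (request.go:1007 → with_retry.go → util.go:45) correspond roughly to the following condensed sketch of client-go's retry predicate, paraphrased from the vendored sources of that era (not line-exact):

package sketch

import (
    "net/http"

    utilnet "k8s.io/apimachinery/pkg/util/net"
)

// isErrRetryable paraphrases the predicate that rest.(*Request).request
// installs (request.go:1007 in the trace): only GET requests are retried,
// and only on transient connection errors. with_retry.go's IsNextRetry
// invokes it via IsErrorRetryable with the request and the transport error.
func isErrRetryable(req *http.Request, err error) bool {
    if req.Method != "GET" {
        return false
    }
    // utilnet.IsConnectionReset contains the errors.As call that
    // faults in the trace above (util.go:45).
    return utilnet.IsConnectionReset(err) || utilnet.IsProbableEOF(err)
}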

What did you expect to happen?

Ideally no crash, but at least a useful error message.

How can we reproduce it (as minimally and precisely as possible)?

Really not sure.

Anything else we need to know?

No response

Kubernetes version

Client Version: v1.28.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.6

Cloud provider

Bare metal.

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

kubespray

Container runtime (CRI) and version (if applicable)

containerd

containerd github.com/containerd/containerd v1.7.13 7c3aca7a610df76212171d200ca3811ff6096eb8

Related plugins (CNI, CSI, ...) and versions (if applicable)

CNI = calico v3.26.4

garymm added the kind/bug label on May 14, 2024
k8s-ci-robot added the needs-sig and needs-triage labels on May 14, 2024
k8s-ci-robot (Contributor) commented

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

garymm (Author) commented May 14, 2024

/sig node

k8s-ci-robot added the sig/node label and removed the needs-sig label on May 14, 2024
saschagrunert (Member) commented

Per the Go documentation, errors.As panics if target is not a non-nil pointer to either a type that implements error or to any interface type; As returns false if err is nil. But syscall.Errno clearly implements error, so the target here is valid.

// vendor/k8s.io/apimachinery/pkg/util/net/util.go
func IsConnectionReset(err error) bool {
    var errno syscall.Errno
    if errors.As(err, &errno) {
        return errno == syscall.ECONNRESET
    }
    return false
}

If we assume that you're using golang 1.20.13 to build this version, then it would panic there: https://github.com/golang/go/blob/a95136a88cb8a51ede3ec2cdca4cfa3962dcfacd/src/errors/wrap.go#L109
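
As a sanity check, here is a minimal standalone program (illustrative, not kubelet code) showing that a *syscall.Errno target is perfectly valid for errors.As, and that the documented panics concern the target argument, not err:

package main

import (
    "errors"
    "fmt"
    "syscall"
)

func main() {
    err := fmt.Errorf("read tcp: %w", syscall.ECONNRESET)

    // syscall.Errno implements error, so *syscall.Errno is a valid target.
    var errno syscall.Errno
    fmt.Println(errors.As(err, &errno))      // true
    fmt.Println(errno == syscall.ECONNRESET) // true

    // The documented panic cases are about target, e.g. a nil target.
    defer func() { fmt.Println("recovered:", recover()) }()
    errors.As(err, nil) // panics: "errors: target cannot be nil"
}

That the failure here is a SIGSEGV rather than one of those documented panics, and that err is printed as {0x12, 0x0} in the trace, suggests the error interface value itself was corrupted: 0x12 is not a plausible type-word pointer, so reflecting over err inside As would fault regardless of the target.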

Which golang version did you use to build this Kubernetes version?

garymm (Author) commented May 15, 2024

According to the changelog, Kubernetes v1.28.6 (what I'm using) is built with Go 1.21.7.
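
If you still have access to the node, the toolchain can also be read straight off the binary with the go tool (the kubelet path below is an example; adjust for your install):

go version /usr/local/bin/kubelet
# reports the Go version the executable was built with, e.g. go1.21.7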

SergeyKanzhelev added this to Triage in SIG Node Bugs on May 22, 2024
haircommander (Contributor) commented

/assign @rphillips
/priority important-soon

@garymm do you have a more succinct reproducer to make it easier to find the issue?

k8s-ci-robot added the priority/important-soon label on May 22, 2024
haircommander moved this from Triage to High Priority in SIG Node Bugs on May 22, 2024
garymm (Author) commented May 22, 2024

Not really, sorry. The only unusual thing that may be relevant: this happened on a control plane node, and I don't taint my control plane nodes, so it was running normal workload pods alongside the control plane pods.
