calico-kube-controller crash with concurrent map read and map write #8705

Open
zamog opened this issue Apr 9, 2024 · 0 comments

Expected Behavior

calico-kube-controller should not crash.

Current Behavior

calico-kube-controller exits every few hours with "fatal error: concurrent map read and map write".
Running the controller with debug logging reduces the frequency of the race condition to about once a day.

Possible Solution

Add a mutex lock to the cache Set function.
Proposed fix: PR 8706
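
For illustration, the idea could be sketched roughly as follows. This is a minimal standalone example, not the code from PR 8706 or from kube-controllers: the lockedCache type, its fields, and the newLockedCache helper are hypothetical names. It assumes the comparison and the write both happen inside Set, so a single lock covers both.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// lockedCache is a hypothetical stand-in for the controller cache.
// It is NOT the calicoCache type from kube-controllers.
type lockedCache struct {
	mu    sync.Mutex
	items map[string]interface{}
}

func newLockedCache() *lockedCache {
	return &lockedCache{items: make(map[string]interface{})}
}

// Set stores value under key and reports whether the cache changed.
// The DeepEqual comparison and the write run under the same lock, so
// two concurrent Set calls cannot interleave on the internal map.
func (c *lockedCache) Set(key string, value interface{}) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if existing, ok := c.items[key]; ok && reflect.DeepEqual(existing, value) {
		return false
	}
	c.items[key] = value
	return true
}

// Get returns the value stored under key, if any.
func (c *lockedCache) Get(key string) (interface{}, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, ok
}

func main() {
	c := newLockedCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// A fresh map per call: no label map is shared between goroutines.
				labels := map[string]string{"rev": fmt.Sprintf("%d-%d", n, j%2)}
				c.Set("pod", labels)
			}
		}(i)
	}
	wg.Wait()
	_, ok := c.Get("pod")
	fmt.Println("entry present:", ok)
}
```

Note that a lock like this only serializes callers of Set and Get. If a stored value contains a map (such as Labels) that other code keeps mutating after handing it to the cache, the DeepEqual inside Set could still race with that writer, so the stored value may also need to be a private copy.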

Steps to Reproduce (for bugs)

Run Calico in a production environment with a high rate of pod and node state changes.
(We have not been able to reproduce this behavior on a non-production cluster.)

Context

2024-03-21 16:16:16.233 [DEBUG][1] cache.go 133: converter.WorkloadEndpointData{PodName:"daemonset-daemonset-extended-1711037756-1711037762-vgqms", Namespace:"kuberhealthy", Labels:map[string]string{"app":"daemonset-daemonset-extended-1711037756-1711037762", "checkRunTime":"1711037762", "controller-revision-hash":"7f95f86fd5", "creatingInstance":"daemonset-extended-1711037756", "khcheck":"daemonset", "pod-template-generation":"1", "projectcalico.org/namespace":"kuberhealthy", "projectcalico.org/orchestrator":"k8s", "projectcalico.org/serviceaccount":"default", "source":"kuberhealthy"}, ServiceAccount:"default"} already exists in cache - comparing. type=converter.WorkloadEndpointData
2024-03-21 16:16:16.233 [DEBUG][1] workload_endpoint_default.go 56: Using prefix to create a WorkloadEndpoint veth name prefix="cali"
fatal error: concurrent map read and map write

goroutine 230 [running]:
reflect.mapaccess_faststr(0x1999240?, 0x18?, {0xc0165e0f20?, 0x1999240?})
        /usr/local/go/src/runtime/map.go:1343 +0x1e
reflect.Value.MapIndex({0x1a5bca0?, 0xc006cb6620?, 0x4b078a?}, {0x1999240, 0xc00c0abc90, 0x98})
        /usr/local/go/src/reflect/value.go:1664 +0xc5
reflect.deepValueEqual({0x1a5bca0?, 0xc010371ee0?, 0xc0017a1820?}, {0x1a5bca0?, 0xc006cb6620?, 0xc0017a1860?}, 0x80000000000?)
        /usr/local/go/src/reflect/deepequal.go:147 +0x149e
reflect.deepValueEqual({0x1bb2440?, 0xc010371ec0?, 0xc000240660?}, {0x1bb2440?, 0xc006cb6600?, 0xc00069a330?}, 0x1cac260?)
        /usr/local/go/src/reflect/deepequal.go:130 +0x1296
reflect.DeepEqual({0x1bb2440?, 0xc010371ec0?}, {0x1bb2440?, 0xc006cb6600?})
        /usr/local/go/src/reflect/deepequal.go:237 +0x2c5
github.com/projectcalico/calico/kube-controllers/pkg/cache.(*calicoCache).Set(0xc0007c0140, {0xc0159d60a0, 0x45}, {0x1bb2440, 0xc006cb6600})
        /go/src/github.com/projectcalico/calico/kube-controllers/pkg/cache/cache.go:134 +0x205
github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod.NewPodController.func3({0x406818?, 0xc00072e0c0?}, {0x1cf28a0?, 0xc012d05400?})
        /go/src/github.com/projectcalico/calico/kube-controllers/pkg/controllers/pod/pod_controller.go:170 +0x55e
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
        /go/pkg/mod/k8s.io/client-go@v0.25.12/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /go/pkg/mod/k8s.io/client-go@v0.25.12/tools/cache/shared_informer.go:814 +0xf7
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x0?)
        /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00014bf38?, {0x200eae0, 0xc0004b9f50}, 0x1, 0xc00013af00)
        /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
        /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
        /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/wait/wait.go:92
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000f8a000?)
        /go/pkg/mod/k8s.io/client-go@v0.25.12/tools/cache/shared_informer.go:810 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
        /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/wait/wait.go:75 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        /go/pkg/mod/k8s.io/apimachinery@v0.25.12/pkg/util/wait/wait.go:73 +0x85

Your Environment

  • Calico version 3.26.5
  • Orchestrator version (e.g. kubernetes, mesos, rkt): k8s 1.25.16
  • Operating System and version: Rocky 8
  • Datastore: ETCD
  • Link to your project (optional):