pkg/k8s/watcher: fix deadlock with service event handler & CES watcher.
There is a deadlock that can occur when a k8s service update and a policy
update happen at the same time. In practice, this can occur in the
following situation:

1. The CiliumEndpointSlice k8s watcher performs an update due to a new
   watcher event. The handler logic first takes a lock on the IPCache and
   then triggers an endpoint regeneration via the endpoint manager.
   Note: this code path waits for endpoint regeneration to complete via a
   passed WaitGroup. To complete regeneration, the endpoint manager
   attempts to lock the policyRepository. Effectively, this means the CES
   handler has locking dependencies on both the IPCache lock and the
   policyRepo lock (transitively, by waiting on endpoint regeneration in
   the endpoint manager). It will not release the IPCache lock until
   endpoint regeneration is done, and is therefore waiting on the
   policyRepo lock.

2. The k8sServiceHandler control loop performs an update due to a
   kube-apiserver service record change (this is common on EKS, where the
   control plane IPs change often). A new policyRepository.Translator is
   constructed with k8s.RuleTranslator{} with AllocatedPrefixes enabled.
   This implementation of the Translator holds a reference to the ipcache
   and uses it to make the necessary prefix updates to the ipcache during
   translation. The translator is passed to the policyRepository to
   perform policy rule translation; the repository locks itself before
   calling translator.Translate(...) on its state. The k8sServiceHandler
   therefore holds nested locks in the order policyRepo -> ipcache.

At this point, suppose codepath 1 is holding the ipcache lock and waiting
on the policyRepo lock (nested ipcache -> policyRepo), while codepath 2
(the k8sServiceHandler) has just taken the policyRepo lock and is waiting
for the ipcache lock. Codepath 2 (which holds policyRepo) needs the
ipcache lock to be released, but it is held by codepath 1, which in turn
is waiting for policyRepo to be released.
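This is the classic ABBA lock-ordering inversion. As a minimal Go sketch
of the pattern (illustrative only; ipcacheMu and policyRepoMu are
stand-ins for the two locks involved, not the real Cilium types):

    package main

    import "sync"

    // Stand-ins for the IPCache lock and the policy repository lock.
    var (
        ipcacheMu    sync.Mutex
        policyRepoMu sync.Mutex
    )

    func main() {
        var wg sync.WaitGroup
        wg.Add(2)

        // Codepath 1 (CES watcher analogue): takes ipcache, then needs
        // policyRepo via the endpoint regeneration it waits for.
        go func() {
            defer wg.Done()
            ipcacheMu.Lock()
            defer ipcacheMu.Unlock()
            policyRepoMu.Lock() // blocks if codepath 2 already holds policyRepo
            policyRepoMu.Unlock()
        }()

        // Codepath 2 (k8sServiceHandler analogue): takes policyRepo, then
        // the translator reaches into ipcache.
        go func() {
            defer wg.Done()
            policyRepoMu.Lock()
            defer policyRepoMu.Unlock()
            ipcacheMu.Lock() // blocks if codepath 1 already holds ipcache
            ipcacheMu.Unlock()
        }()

        // With unlucky scheduling both goroutines reach their second Lock
        // and neither can proceed, so this Wait never returns.
        wg.Wait()
    }

Once both goroutines reach their second Lock call, neither can make
progress, which is exactly what the stack traces below show.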
The following stack traces show such a case occurring:

101 occurrences. Sample stack trace:

6 occurrences. Sample stack trace:

sync.runtime_SemacquireMutex(0xc0018f0e08?, 0x20?, 0xc000c12740?)
        /usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*RWMutex).RLock(...)
        /usr/local/go/src/sync/rwmutex.go:63
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regeneratePolicy(0xc0010c7c00)
        /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:198 +0x11a
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(0xc0010c7c00, 0xc0005be400)
        /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:814 +0x2c5
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc0010c7c00, 0xc0005be400)
        /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:584 +0x189
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc0010c7c00, 0xc0005be400)
        /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:398 +0x7a5
github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc0099405b0, 0x2a27540?)
        /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:53 +0x325
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1()
        /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:245 +0x13b
sync.(*Once).doSlow(0x2f14d01?, 0x4422a5?)
        /usr/local/go/src/sync/once.go:68 +0xc2
sync.(*Once).Do(...)
        /usr/local/go/src/sync/once.go:59
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run(0x0?)
        /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:233 +0x45
created by github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run
        /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:229 +0x76

1 occurrence. Sample stack trace:

sync.runtime_Semacquire(0xc0003f44d0?)
        /usr/local/go/src/runtime/sema.go:56 +0x25
sync.(*WaitGroup).Wait(0xc0003f5420?)
        /usr/local/go/src/sync/waitgroup.go:136 +0x52
github.com/cilium/cilium/pkg/ipcache.(*IPCache).UpdatePolicyMaps(0xc001003580, {0x3468338, 0xc00007e038}, 0xa?, 0xc008c15e60)
        /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:235 +0xc7
github.com/cilium/cilium/pkg/ipcache.(*IPCache).removeLabelsFromIPs(0xc001003580, 0xc005d73778?, {0x2f35b2b, 0xf})
        /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:414 +0x7c5
github.com/cilium/cilium/pkg/ipcache.(*IPCache).RemoveLabelsExcluded(0xc001003580, 0xc0000e3110, 0xc001506dd8?, {0x2f35b2b, 0xf})
        /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:328 +0x1ab
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).handleKubeAPIServerServiceEPChanges(0xc001586d80, 0xc003ec89b0?)
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint.go:135 +0x5b
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).addKubeAPIServerServiceEPSliceV1(0xf3c386?, 0xc001ab7d40)
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:205 +0x452
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).updateK8sEndpointSliceV1(0xc001586d80, 0xc001ab7d40?, 0xc001ab7d40?)
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:178 +0x69
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).endpointSlicesInit.func2({0x2ec7ea0?, 0xc00294c410?}, {0x2ec7ea0, 0xc001ab7d40})
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:71 +0x125
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:239
github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1({0x2a4b9c0?, 0xc00057d1e8?})
        /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:103 +0x2fe
k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc001b805a0, 0xc000927940)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:554 +0x566
k8s.io/client-go/tools/cache.(*controller).processLoop(0xc001bda1b0)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:184 +0x36
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40d6a5?)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xed53e5?, {0x343e1c0, 0xc000d50450}, 0x1, 0xc000929980)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001bda218?, 0x3b9aca00, 0x0, 0x30?, 0x7f587b87fd30?)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/cache.(*controller).Run(0xc001bda1b0, 0xc000929980)
        /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:155 +0x2c5
created by github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).endpointSlicesInit
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:156 +0x759

1 occurrence. Sample stack trace:

sync.runtime_SemacquireMutex(0xc000880000?, 0x20?, 0x21?)
        /usr/local/go/src/runtime/sema.go:71 +0x25
sync.(*RWMutex).RLock(...)
        /usr/local/go/src/sync/rwmutex.go:63
github.com/cilium/cilium/pkg/ipcache.(*metadata).get(0xc00104f770?, {0xc0069e9160?, 0x9?})
        /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:90 +0x66
github.com/cilium/cilium/pkg/ipcache.(*IPCache).GetIDMetadataByIP(...)
        /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:86
github.com/cilium/cilium/pkg/ipcache.(*IPCache).AllocateCIDRs(0xc001003580, {0xc008680cf0, 0x2, 0x0?}, {0x0, 0x0, 0x0?}, 0x0)
        /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:57 +0x22b
github.com/cilium/cilium/pkg/k8s.RuleTranslator.generateToCidrFromEndpoint({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001f667e0, ...)
        /go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:124 +0xb3
github.com/cilium/cilium/pkg/k8s.RuleTranslator.populateEgress({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001f667e0, ...)
        /go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:62 +0x172
github.com/cilium/cilium/pkg/k8s.RuleTranslator.TranslateEgress({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001f667e0, ...)
        /go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:51 +0x18e
github.com/cilium/cilium/pkg/k8s.RuleTranslator.Translate({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001c66750, ...)
        /go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:33 +0x117
github.com/cilium/cilium/pkg/policy.(*Repository).TranslateRules(0xc0003f5490, {0x3440260, 0xc0025fd280})
        /go/src/github.com/cilium/cilium/pkg/policy/repository.go:627 +0x10b
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).k8sServiceHandler.func1({0x0, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, 0xc0015f0c80, 0x0, 0xc003165f50, 0xc001bc9c80})
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:586 +0xc9e
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).k8sServiceHandler(0xc001586d80)
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:623 +0x9f
created by github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).RunK8sServiceHandler
        /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:629 +0x56

This commit solves this situation by moving the IPCache allocation out of
the k8s.RuleTranslator Translator implementation, so the responsibility
for updating the IPCache no longer lives in the translator. This removes
the nested policyRepo -> ipcache locking in the translator: in situations
like the one described, translation no longer has a dependency on the
ipcache, so codepath 2 can complete, releasing the policyRepo lock and
allowing codepath 1 to proceed.

Note: rule translation prefixes are not used by the other callers of
k8s.RuleTranslator invoked from the endpoint watcher handler, so we do
not have to add the same ipcache logic there as in k8sServiceHandler.
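The rough shape of the change, as a sketch rather than the actual diff
(the type and function names below, such as AllocatePrefixes and
handleServiceEvent, are illustrative, not the real Cilium API): the
translator reports the prefixes it introduces, and the caller updates the
ipcache only after the policy repository lock has been released.

    package sketch

    import (
        "net"
        "sync"
    )

    // Rule is a placeholder for a policy rule; the real types live in
    // pkg/policy and pkg/k8s.
    type Rule struct{}

    // Translator is the hypothetical post-fix shape of the translation
    // hook: it rewrites rules and reports which CIDR prefixes the rewrite
    // introduced, instead of allocating them in the ipcache itself.
    type Translator interface {
        Translate(rules []*Rule) (prefixes []*net.IPNet, err error)
    }

    // Repository stands in for the policy repository.
    type Repository struct {
        mu    sync.Mutex
        rules []*Rule
    }

    // TranslateRules only ever holds the repository lock; the translator
    // no longer reaches into the ipcache while this lock is held.
    func (r *Repository) TranslateRules(t Translator) ([]*net.IPNet, error) {
        r.mu.Lock()
        defer r.mu.Unlock()
        return t.Translate(r.rules)
    }

    // IPCache stands in for pkg/ipcache.
    type IPCache struct {
        mu sync.Mutex
    }

    // AllocatePrefixes is a hypothetical caller-side helper performing the
    // ipcache updates the translator used to do internally.
    func (c *IPCache) AllocatePrefixes(prefixes []*net.IPNet) {
        c.mu.Lock()
        defer c.mu.Unlock()
        // ... allocate identities / update metadata for the prefixes ...
    }

    // handleServiceEvent shows the resulting lock ordering: the policyRepo
    // lock is taken and released inside TranslateRules, and only afterwards
    // is the ipcache lock taken, so the two locks are never nested.
    func handleServiceEvent(repo *Repository, ipc *IPCache, t Translator) error {
        prefixes, err := repo.TranslateRules(t)
        if err != nil {
            return err
        }
        ipc.AllocatePrefixes(prefixes)
        return nil
    }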
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Reported-by: Michi Mutsuzaki <michi@isovalent.com>