Commit
eventqueue: Forcefully drain to prevent deadlock
This commit ensures that the EventQueue is fully drained, even when it is not running its loop. When endpoints are being restored, their EventQueue is initialized but left in a non-running state (not processing events). It is the job of the endpoint manager to kick off the event loop by calling Expose() on the endpoint.

This commit fixes the following commits, which cause Cilium to be stuck waiting for the EventQueue to drain (WaitToBeDrained()):

290d9e9 ("daemon: Init endpoint queue during validation")
79bf425 ("endpoint: Add function to initialize event queue")

Cilium becoming stuck is described in the following flow:

- Endpoints begin restoration
- Endpoint's EventQueue is initialized (but never run)
- Endpoint's metadata resolver controller is kicked off
- Visibility and bandwidth policy events are enqueued
- Endpoint fails restoration due to some issue (e.g. interface not found)
- Endpoint is queued for deletion because it failed restoration
- As part of endpoint deletion, the EventQueue is stopped and drained
- Cilium deadlocks trying to drain, because the EventQueue run loop, which would pop events off the `events` channel and close the `eventsClosed` channel, was never run

This commit fixes this deadlock by forcefully running the event loop to drain the queue. After the `events` channel is closed (from Stop()), the loop will terminate and the `eventsClosed` channel will close, thereby unblocking WaitToBeDrained().

Stacktrace from `gops`:

```
goroutine 632 [chan receive, 1 minutes]:
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).WaitToBeDrained(0xc00013c960)
	/go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:322 +0x1ad
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).Delete(0xc000ad6900, 0x27faee0, 0xc00062ac40, 0x27fba60, 0xc00099a120, 0x2877280, 0xc0005d2340, 0x430101, 0x0, 0x0, ...)
	/go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:2194 +0x91
github.com/cilium/cilium/daemon/cmd.(*Daemon).deleteEndpointQuiet(...)
	/go/src/github.com/cilium/cilium/daemon/cmd/endpoint.go:674
github.com/cilium/cilium/daemon/cmd.(*Daemon).regenerateRestoredEndpoints.func2(0xc00062ac40, 0xc000b63214, 0xc000ad6900)
	/go/src/github.com/cilium/cilium/daemon/cmd/state.go:302 +0x7c
created by github.com/cilium/cilium/daemon/cmd.(*Daemon).regenerateRestoredEndpoints
	/go/src/github.com/cilium/cilium/daemon/cmd/state.go:296 +0x8a0
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>