Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
endpoint: lock the selector cache for reading
As in 52ace8e, there have been reports of a runtime crash with with following error: fatal error: concurrent map read and map write (This commit message is heavily inspired by the above commit in the hopes of achieving a similar level of clarity.) The path for this issue is printed also in the logs, with the following call stack: github.com/cilium/cilium/pkg/policy.(*SelectorCache).GetNetsLocked(...) github.com/cilium/cilium/pkg/policy.getNets(...) github.com/cilium/cilium/pkg/policy.identityIsSupersetOf(...) github.com/cilium/cilium/pkg/policy.(*mapState).DenyPreferredInsertWithChanges(...) Given the abovementioned commit has fixed the issue on main, there must be a different call stack present in 1.12. Indeed, as shown below, there's the flow involving `addNewRedirects` which isn't covered by locking in `DistillPolicy`. INCOMING CALLS GetNetsLocked github.com/cilium/cilium/pkg/policy • selectorcache.go getNets github.com/cilium/cilium/pkg/policy • mapstate.go identityIsSupersetOf github.com/cilium/cilium/pkg/policy • mapstate.go DenyPreferredInsertWithChanges github.com/cilium/cilium/pkg/policy • mapstate.go addNewRedirectsFromDesiredPolicy github.com/cilium/cilium/pkg/policy • bpf.go addNewRedirects github.com/cilium/cilium/pkg/endpoint • bpf.go <--- No SelectorCache lock DenyPreferredInsert github.com/cilium/cilium/pkg/policy • mapstate.go ToMapState github.com/cilium/cilium/pkg/policy • l4.go addNewRedirectsFromDesiredPolicy github.com/cilium/cilium/pkg/policy • bpf.go addNewRedirects github.com/cilium/cilium/pkg/endpoint • bpf.go <--- No SelectorCache lock computeDirectionL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go computeDesiredL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.RLock() DetermineAllowLocalhostIngress github.com/cilium/cilium/pkg/policy • mapstate.go DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.RLock() computeDirectionL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go computeDesiredL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.RLock() consumeMapChanges github.com/cilium/cilium/pkg/policy • mapstate.go ConsumeMapChanges github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.Lock() Read the above tree as "GetNetsLocked() is called by getNets()", "getNets() is called by entryIdentityIsSupersetOf()", and so on. Siblings at the same level of indent represent alternate callers of the function that is one level of indentation less in the tree, ie addNewRedirectsFromDesiredPolicy and DenyPreferredInsert() both call DenyPreferredInsertWithChanges(). As annotated above, we see that calls through addNewRedirects() do not grab the SelectorCache lock. Given that consumeMapChanges() grabs the SelectorCache lock, we cannot introduce a new lock acquisition in any descendent function, otherwise it would introduce a deadlock in goroutines that follow that call path. This provides us the option to lock at some point from the sibling of consumeMapChanges() or higher in the call stack. As addNewRedirectsFromDesiredPolicy calls DenyPreferredInsertWithChanges both directly and indirectly (via ToMapState), it represents the lower bound of the call stack range where we can lock. The further up we traverse in the call stack, the greater the risk of a deadlock becomes, as more and more code now runs under the SelectorCache lock. Hence, we opt for locking at the lower bound, in addNewRedirectsFromDesiredPolicy. Unfortunately, this function lives in pkg endpoint, not policy, and we thus need to allow pkg-external access to the SelectorCache mutex, which is not great. The only silver lining is that this hack has an expiration time, as it is not present on main. Fixes: d68efba (endpoint: Remove GetLabelsLocked) Co-authored-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
- Loading branch information