-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.13] Address selectorcache concurrent read/write #29186
Merged
nathanjsweet
merged 1 commit into
v1.13
from
pr/tklauser/v1.13-fix-selectorcache-locking
Nov 16, 2023
Merged
[1.13] Address selectorcache concurrent read/write #29186
nathanjsweet
merged 1 commit into
v1.13
from
pr/tklauser/v1.13-fix-selectorcache-locking
Nov 16, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
tklauser
added
kind/bug
This is a bug in the Cilium logic.
sig/policy
Impacts whether traffic is allowed or denied based on user-defined policies.
kind/backports
This PR provides functionality previously merged into master.
sig/agent
Cilium agent related.
backport/1.13
This PR represents a backport for Cilium 1.13.x of a PR that was merged to main.
labels
Nov 15, 2023
/test-backport-1.13 |
As in 52ace8e, there have been reports of a runtime crash with with following error: fatal error: concurrent map read and map write (This commit message is heavily inspired by the above commit in the hopes of achieving a similar level of clarity.) The path for this issue is printed also in the logs, with the following call stack: github.com/cilium/cilium/pkg/policy.(*SelectorCache).GetNetsLocked(...) github.com/cilium/cilium/pkg/policy.getNets(...) github.com/cilium/cilium/pkg/policy.identityIsSupersetOf(...) github.com/cilium/cilium/pkg/policy.(*mapState).DenyPreferredInsertWithChanges(...) Given the abovementioned commit has fixed the issue on main, there must be a different call stack present in 1.13. Indeed, as shown below, there's the flow involving `addNewRedirects` which isn't covered by locking in `DistillPolicy`. INCOMING CALLS GetNetsLocked github.com/cilium/cilium/pkg/policy • selectorcache.go getNets github.com/cilium/cilium/pkg/policy • mapstate.go identityIsSupersetOf github.com/cilium/cilium/pkg/policy • mapstate.go DenyPreferredInsertWithChanges github.com/cilium/cilium/pkg/policy • mapstate.go addNewRedirectsFromDesiredPolicy github.com/cilium/cilium/pkg/policy • bpf.go addNewRedirects github.com/cilium/cilium/pkg/endpoint • bpf.go <--- No SelectorCache lock DenyPreferredInsert github.com/cilium/cilium/pkg/policy • mapstate.go ToMapState github.com/cilium/cilium/pkg/policy • l4.go addNewRedirectsFromDesiredPolicy github.com/cilium/cilium/pkg/policy • bpf.go addNewRedirects github.com/cilium/cilium/pkg/endpoint • bpf.go <--- No SelectorCache lock computeDirectionL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go computeDesiredL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.RLock() DetermineAllowLocalhostIngress github.com/cilium/cilium/pkg/policy • mapstate.go DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.RLock() computeDirectionL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go computeDesiredL4PolicyMapEntries github.com/cilium/cilium/pkg/policy • resolve.go DistillPolicy github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.RLock() consumeMapChanges github.com/cilium/cilium/pkg/policy • mapstate.go ConsumeMapChanges github.com/cilium/cilium/pkg/policy • resolve.go <--- SelectorCache.Lock() Read the above tree as "GetNetsLocked() is called by getNets()", "getNets() is called by entryIdentityIsSupersetOf()", and so on. Siblings at the same level of indent represent alternate callers of the function that is one level of indentation less in the tree, ie addNewRedirectsFromDesiredPolicy and DenyPreferredInsert() both call DenyPreferredInsertWithChanges(). As annotated above, we see that calls through addNewRedirects() do not grab the SelectorCache lock. Given that consumeMapChanges() grabs the SelectorCache lock, we cannot introduce a new lock acquisition in any descendent function, otherwise it would introduce a deadlock in goroutines that follow that call path. This provides us the option to lock at some point from the sibling of consumeMapChanges() or higher in the call stack. As addNewRedirectsFromDesiredPolicy calls DenyPreferredInsertWithChanges both directly and indirectly (via ToMapState), it represents the lower bound of the call stack range where we can lock. The further up we traverse in the call stack, the greater the risk of a deadlock becomes, as more and more code now runs under the SelectorCache lock. Hence, we opt for locking at the lower bound, in addNewRedirectsFromDesiredPolicy. Unfortunately, this function lives in pkg endpoint, not policy, and we thus need to allow pkg-external access to the SelectorCache mutex, which is not great. The only silver lining is that this hack has an expiration time, as it is not present on main. Fixes: 90f8859 ("endpoint: Remove GetLabelsLocked") Co-authored-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
tklauser
force-pushed
the
pr/tklauser/v1.13-fix-selectorcache-locking
branch
from
November 15, 2023 14:04
c2d0cb4
to
2dba4dd
Compare
/test-backport-1.13 Reworded commit message and added a comment, following #29167 (comment) |
bimmlerd
approved these changes
Nov 16, 2023
julianwiedmann
approved these changes
Nov 16, 2023
Blocking merge until @cilium/sig-policy reviewed. |
nathanjsweet
approved these changes
Nov 16, 2023
maintainer-s-little-helper
bot
added
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
labels
Nov 16, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.13
This PR represents a backport for Cilium 1.13.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
kind/bug
This is a bug in the Cilium logic.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
sig/agent
Cilium agent related.
sig/policy
Impacts whether traffic is allowed or denied based on user-defined policies.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See commit message for details.
v1.13 version of #29167