v1.7 backports 2020-03-10 #10530
Conversation
[ upstream commit 5bf90eb ]

UpdateGenerateDNS updated the cache and generated selector updates in two separate lock-holding sections. When updates happened concurrently, via DNS lookups, DNS GC, or endpoint restores, the snapshot of data that eventually updated the selectors could be out-of-order. This created an A-B-A race for selectors that needed an update and those that needed to be cleared. These races have been observed on agent starts with pending pods, but may also manifest between multiple pods making DNS updates, or between the DNS garbage collector controller and any pod.

The code now holds the NameManager lock for the entirety of UpdateGenerateDNS and ForceRegenerateDNS. This forces the update to complete in its entirety before allowing another to modify the FQDN cache or registered selectors.

The locking order between the NameManager lock and the SelectorCache lock has been accidental so far. When removing a selector user from the cache, the SelectorCache lock is taken, and if an FQDN selector has no more users, the NameManager is notified of this, which internally takes the NameManager lock. Reverse this by explicitly locking the NameManager for the duration of selector cache operations. From now on, if the two locks are to be held at the same time, the NameManager mutex MUST be taken first and the SelectorCache mutex second, as the name manager is a higher-level construct than the selector cache. Exposing 'Lock()' and 'Unlock()' operations in an interface is not ideal, but it helps make the locking order explicit, so it also serves as documentation of the required locking order.

NameManager now returns IDs even if the selector has already been registered. This should help avoid getting into a state where FQDNs are not passed to the selector cache in case of a race condition.

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@@ -71,6 +71,9 @@ func allocateCIDRs(prefixes []*net.IPNet) ([]*identity.Identity, error) {
	allocateCtx, cancel := context.WithTimeout(context.Background(), option.Config.IPAllocationTimeout)
	defer cancel()

	if IdentityAllocator == nil {
		return nil, fmt.Errorf("IdentityAllocator not initialized!")
error strings should not be capitalized or end with punctuation or a newline
	delete(d.selectors, selector)
}

func (d *DummyIdentityNotifier) InjectIdentitiesForSelector(fqdnSel api.FQDNSelector, ids []identity.NumericIdentity) {
exported method DummyIdentityNotifier.InjectIdentitiesForSelector should have comment or be unexported
	selectors map[api.FQDNSelector][]identity.NumericIdentity
}

func NewDummyIdentityNotifier() *DummyIdentityNotifier {
exported function NewDummyIdentityNotifier should have comment or be unexported
	"github.com/cilium/cilium/pkg/policy/api"
)

type DummyIdentityNotifier struct {
exported type DummyIdentityNotifier should have comment or be unexported
test-me-please