Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.7 backports 2020-03-10 #10530

Merged
merged 1 commit into from
Mar 10, 2020
Merged

v1.7 backports 2020-03-10 #10530

merged 1 commit into from
Mar 10, 2020

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Mar 10, 2020

v1.7 backports 2020-03-10

Once this PR is merged, you can update the PR labels via:

$ for pr in 10501; do contrib/backporting/set-labels.py $pr done 1.7; done

This change is Reviewable

[ upstream commit 5bf90eb ]

UpdateGenerateDNS updated the cache and generate selector updates as two
separate lock-holding sections. When updates happened concurrently, via
DNS lookups, DNS GC, or endpoint restores, the snapshot of data that
eventually updated the selectors could be out-of-order. This created an
A-B-A race for selectors that needed an update and those that needed to
be cleared.

These races have been observed on agent starts with pending pods but may
manifest between multiple pods making DNS updates or the DNS garbage
collector controller and any pod.

The code now holds the NameManager lock for the entirety of
UpdateGenerateDNS and ForceRegenerateDNS. This forces the update to
complete in its entirety before allowing another to modify the FQDN
cache or registered selectors.

The locking order between the NameManager lock and the SelectorCache
lock has been accidental so far. When removing a selector user from
the cache, the SelectorCache lock is taken, and if an FQDN selector
has no more users, the NameManager is notified of this, which
internally takes the NameManager lock. Reverse this by explicitly
locking NamaManager for the duration of selector cache operations.

From now on, if the two locks are to be held at the same time, the
NameManager mutex MUST be taken first, the SelectorCache mutex second,
as the name manager is a higher level construct than the selector
cache.

Exposing 'Lock()' and 'Unlock()' operations in an interface is not
ideal, but it helps make the lockinng order explicit, so it also
serves as documentation on the required locking order.

NameManager now return IDs also if the selector has already been
registered. This should help not get into a state where FQDNs are not
passed to the selector cache in case of a race condition.

Signed-off-by: Ray Bejjani <ray@isovalent.com>
Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
@jrajahalme jrajahalme added kind/backports This PR provides functionality previously merged into master. release-note/misc This PR makes changes that have no direct user impact. backport/1.7 labels Mar 10, 2020
@jrajahalme jrajahalme requested a review from a team as a code owner March 10, 2020 14:57
@@ -71,6 +71,9 @@ func allocateCIDRs(prefixes []*net.IPNet) ([]*identity.Identity, error) {
allocateCtx, cancel := context.WithTimeout(context.Background(), option.Config.IPAllocationTimeout)
defer cancel()

if IdentityAllocator == nil {
return nil, fmt.Errorf("IdentityAllocator not initialized!")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error strings should not be capitalized or end with punctuation or a newline

delete(d.selectors, selector)
}

func (d *DummyIdentityNotifier) InjectIdentitiesForSelector(fqdnSel api.FQDNSelector, ids []identity.NumericIdentity) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method DummyIdentityNotifier.InjectIdentitiesForSelector should have comment or be unexported

selectors map[api.FQDNSelector][]identity.NumericIdentity
}

func NewDummyIdentityNotifier() *DummyIdentityNotifier {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function NewDummyIdentityNotifier should have comment or be unexported

"github.com/cilium/cilium/pkg/policy/api"
)

type DummyIdentityNotifier struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type DummyIdentityNotifier should have comment or be unexported

@jrajahalme
Copy link
Member Author

test-me-please

@joestringer joestringer merged commit d19e888 into v1.7 Mar 10, 2020
@joestringer joestringer deleted the pr/v1.7-backport-2020-03-10 branch March 10, 2020 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants