Fix identity leak via FQDN selectors #17699
Conversation
/test
/test
pkg/ipcache/cidr.go (outdated diff)

```go
releaseCtx, cancel := context.WithTimeout(context.Background(), option.Config.KVstoreConnectivityTimeout)
defer cancel()
		identities = append(identities, id)
	} else {
		log.Errorf("Unable to find identity of previously used CIDR %s", prefix.String())
	}
}

released, err := IdentityAllocator.Release(releaseCtx, id)
if err != nil {
	log.WithError(err).Warningf("Unable to release identity for CIDR %s. Ignoring error. Identity may be leaked", prefix.String())
}
releaseCIDRIdentities(context.Background(), identities)
```
Why don't we need the previous context.WithTimeout bounded by the kvstore connectivity timeout anymore?
Yeah, that's a little awkward, but basically CIDR identities never require kvstore/k8s allocation, so technically it doesn't matter whether we pass a context here or not. As far as I understand, the allocation also can't fail.

We had initially relied on clusterwide coordination of these identities for CIDRs, but at some point we switched them to local-only allocation, and we haven't gone back through and established clear patterns for how to handle this allocation interface.
Ah I see, that makes sense, thanks. It would be nice in the future to split regular identity allocation from CIDR identity allocation via a different call. That separation would also be useful for documentation purposes.
I think we resisted this in the past because, abstractly, you just need an identity for a set of labels. The notion that CIDR labels are different and don't require clusterwide coordination is more of an optimization.
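To illustrate the "identity is just a handle for a set of labels" view discussed above, here is a minimal, hypothetical sketch (not the actual Cilium allocator API): equal label sets map to the same identity regardless of label order, and whether allocation needs clusterwide coordination is a property of the labels, not of the interface.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// key canonicalizes a label set so that equal sets map to the same identity.
func key(labels []string) string {
	s := append([]string(nil), labels...)
	sort.Strings(s)
	return strings.Join(s, ",")
}

type allocator struct {
	next int
	ids  map[string]int
}

// allocate returns a stable numeric identity for the given label set,
// allocating purely locally. A real implementation would coordinate via
// the kvstore for non-CIDR labels; this sketch omits that.
func (a *allocator) allocate(labels []string) int {
	k := key(labels)
	if id, ok := a.ids[k]; ok {
		return id
	}
	a.next++
	a.ids[k] = a.next
	return a.next
}

func main() {
	a := &allocator{ids: map[string]int{}}
	id1 := a.allocate([]string{"cidr:10.0.0.0/24", "reserved:world"})
	id2 := a.allocate([]string{"reserved:world", "cidr:10.0.0.0/24"})
	fmt.Println(id1 == id2) // same label set, same identity
}
```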
```go
AllocateCIDRsForIPs(ips []net.IP, newlyAllocatedIdentities map[string]*identity.Identity) ([]*identity.Identity, error)

// ReleaseCIDRIdentitiesByID() is a wrapper for ReleaseSlice() that
// also handles ipcache entries.
ReleaseCIDRIdentitiesByID(context.Context, []identity.NumericIdentity)
```
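As a rough sketch of how a caller is expected to balance these two calls, here is a simplified stand-in (the types and method names below are illustrative, not the real Cilium signatures): every identity handed out by the allocate call must eventually be passed back to the release call.

```go
package main

import "fmt"

type numericIdentity int

// fakeAllocator is a toy reference-counting allocator keyed by IP string.
type fakeAllocator struct {
	refs map[numericIdentity]int
	byIP map[string]numericIdentity
	next numericIdentity
}

// allocateForIPs hands out one identity per IP, bumping its refcount.
func (f *fakeAllocator) allocateForIPs(ips []string) []numericIdentity {
	var out []numericIdentity
	for _, ip := range ips {
		id, ok := f.byIP[ip]
		if !ok {
			f.next++
			id = f.next
			f.byIP[ip] = id
		}
		f.refs[id]++
		out = append(out, id)
	}
	return out
}

// releaseByID drops one reference per identity in the slice.
func (f *fakeAllocator) releaseByID(ids []numericIdentity) {
	for _, id := range ids {
		f.refs[id]--
	}
}

func main() {
	f := &fakeAllocator{refs: map[numericIdentity]int{}, byIP: map[string]numericIdentity{}}
	ids := f.allocateForIPs([]string{"10.0.0.1", "10.0.0.2"})
	f.releaseByID(ids) // every allocation is matched by a release
	fmt.Println(f.refs)
}
```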
Adding these functions to the pkg/identity/cache:IdentityAllocator interface here is a bit awkward, given that the intent is for these calls to go via ipcache in the real implementation. I think we need to come up with a clearer ownership structure and relationship between the daemon, fqdn, policy, ipcache and identity subsystems in order to tidy this up properly for the longer term. I'm not planning to address that in this bugfix.

I think this is also why the 'newlyAllocatedIdentities' parameter is necessary; if we make the relationship clearer, that parameter should be able to go away.
/test
/test-runtime
Thanks for the very thorough explanation and commit messages. Great to see the code docs getting updated with what the caller should expect after making the call 👍.
LGTM, thanks for splitting up the changes into bite-size commits :-)
/ci-eks
From further testing, I found that this is not quite enough to cover all of the cases, but these steps are correct and necessary to partially resolve the original bug report. I think we should merge this as-is; I can follow up separately with another PR to cover the remaining case.
The mock identity notifier and identity allocator make sense together under pkg/testutils/identity, and refactoring like this will be required by upcoming commits in order to avoid circular imports. Signed-off-by: Joe Stringer <joe@cilium.io>
This constructor will simplify upcoming commits by performing all of the necessary initialization for the FakeIdentityAllocator in all tests that make use of this structure. Signed-off-by: Joe Stringer <joe@cilium.io>
Upcoming commits will rely on the daemon passing the identityAllocator module down to the fqdn subsystem via the policy repository. For now, adjust the function signatures to pass this through, but avoid actually using the new parameter yet. This commit is a functional no-op but reduces the code churn from upcoming commits. Signed-off-by: Joe Stringer <joe@cilium.io>
In order to test an upcoming bugfix, the internal FQDN handling functions that are triggered from the unit tests must be able to refer to an abstract identity allocator, which can then be mocked out by the tests. This commit extends the existing cache.IdentityAllocator interface to support allocating identities for CIDRs. A future commit will also add identity release functions to the same abstraction.

This also paves the way for some improvements to code structure: some of the FQDN logic under `daemon/cmd/fqdn.go` may be able to be pushed down into pkg/fqdn now that it doesn't directly reference the globals from the ipcache package.

There is another remaining open question around code structure: how the IPcache and identity allocation packages should handle the pattern of allocating CIDR identities vs. injecting those identities into the underlying IPcache maps (e.g. in eBPF). This commit shies away from solving that problem right now in order to keep the changes minimal, as this code is likely to be proposed for backport.

Signed-off-by: Joe Stringer <joe@cilium.io>
An upcoming commit will introduce a new variation on the release function, so just rename this one here to simplify the upcoming changes. Signed-off-by: Joe Stringer <joe@cilium.io>
Everything that goes up must eventually come down. And so it is also with everything that is allocated; it must be deallocated. Extend the IdentityAllocator interface to also support CIDR release functions and plumb through an initial implementation. This is currently not hooked into the agent itself, this will come with an upcoming bugfix instead. Signed-off-by: Joe Stringer <joe@cilium.io>
Users recently reported an issue where pods could not be deployed into their Cilium cluster, because the datapath regeneration for new Cilium endpoints would fail. The reason for the failure was that while attempting to initially provision the datapath for the new endpoint, Cilium would attempt to populate the endpoint's PolicyMap, but reach the limit of the total number of entries that the PolicyMap could contain. As such, in an abundance of caution, Cilium backed off on attempting to provision the network for that pod, returned an error back up to the Kubelet, and prevented subsequent pod provisioning.

The reason for reaching the PolicyMap size limit was partially the node containing a large number of Identities, and partially an overly permissive network policy that programmed the new endpoints to allow traffic from all of these identities individually. Typically this sort of problem is not present if the user specifies a `toEntities: all` policy to explicitly allow everything, but it can occasionally occur if the policy is very permissive but more complex than this explicit allow-all policy. One such example is a `toFQDNs` policy with a MatchPattern on all domains.

One of the curious aspects that we discovered while investigating this issue was that if the cilium-agent pods were restarted, the issue would disappear. This is indeed one potential mitigation for the issue. Furthermore, when comparing the `cilium policy selectors` output between sysdump output of a node where the problem is present vs. where the problem is not present, we discovered that there were CIDR identities derived from `toFQDNs` policy even when `toFQDNs` policy was not present.
Upon further code inspection along with Chris Tarazi, we observed that the NameManager is responsible for handling allocation of the Identities to associate with CIDR policies; the NameManager then hands those identities over to the SelectorCache to be associated with individual toFQDNs policy selectors, and finally passes a complete list of identities up to the Daemon in order to plumb the underlying IPCache map. However, none of these layers was taking responsibility for releasing references to these Identities upon expiry of the entry that triggered the CIDR Identity allocation.

Allocation path:

    Daemon.notifyOnDNSMsg()
      -> NameManager.UpdateGenerateDNS()
      -> NameManager.Config.UpdateSelectors
      -> Daemon.updateSelectors()
      -> identitiesForFQDNSelectorIPs()   (in daemon/cmd/fqdn.go)

This leaves us with a couple of options for releasing these references:

* Associate each identity with the NameManager and have the NameManager release the references when the entry expires.
* Associate each identity with the FQDN selectors that are actively using it, and release the references when the FQDN selectors are released.

Although there is an argument for associating these references with the NameManager itself, since it tracks the usage of each IP for FQDN policy and could therefore tie each reference to the lifetime of the IP's usage in the agent overall, so far the code has implicitly tied the allocation of each identity to its usage in each selector. As such, this commit simply acknowledges this reality and extends the update/release of the selectors in the NameManager to also release the references that each selector holds.
Practically speaking, this means passing the allocated references from identitiesForFQDNSelectorIPs() through to the sibling function updateSelectorCacheFQDNs():

    Daemon.notifyOnDNSMsg()
      -> NameManager.UpdateGenerateDNS()
      -> NameManager.Config.UpdateSelectors
      -> Daemon.updateSelectors()
      -> identitiesForFQDNSelectorIPs()   (allocation)
      -> updateSelectorCacheFQDNs()
      -> SelectorCache.{U,u}pdateFQDNSelector()

Here we take the new complete set of identities to be associated with the selector and merge it into the existing selector in the cache. As such, we need to handle each case where an identity has a duplicate reference between the existing cache and the new list, or where an identity is removed from the selector, and collect those identities for subsequent release.

Conveniently for the above approach, the release of the selectors is handled through the same core function, reached from the caller SelectorCache.RemoveIdentitiesFQDNSelectors(). This means we can contain the modifications necessary to resolve this bug to the selectorcache code.

Signed-off-by: Joe Stringer <joe@cilium.io>
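The merge described in the commit message above can be sketched as a set difference. This is an illustrative stand-in, not the real selectorcache code: given the selector's current identity set and the new complete set, it computes which identities gained a reference (added) and which references must be released, either because the identity was dropped from the selector or because the new list duplicates a reference the cache already holds.

```go
package main

import "fmt"

// mergeSelectorIdentities merges the new complete identity set for a
// selector into its current set. It returns the identities newly added
// to the selector and the references that must be released: duplicate
// references (present in both old and new lists) and identities removed
// from the selector entirely.
func mergeSelectorIdentities(current, updated []int) (added, released []int) {
	cur := map[int]bool{}
	for _, id := range current {
		cur[id] = true
	}
	next := map[int]bool{}
	for _, id := range updated {
		next[id] = true
		if !cur[id] {
			// New to this selector: the selector keeps this reference.
			added = append(added, id)
		} else {
			// Duplicate reference between the existing cache and the
			// new list: the extra reference must be released.
			released = append(released, id)
		}
	}
	for _, id := range current {
		if !next[id] {
			// Dropped from the selector: release its reference.
			released = append(released, id)
		}
	}
	return added, released
}

func main() {
	added, released := mergeSelectorIdentities([]int{1, 2, 3}, []int{2, 3, 4})
	fmt.Println(added, released)
}
```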
Add a new unit test that tests the integration between the daemon DNS event handling logic and the core FQDN and policy (selector) subsystems. The core idea here is to validate that when DNS events are handled that associate FQDNs to IPs, ensure that the reference counting on CIDR identities remains sane. To this end, we put a mock identity reference counter into the Daemon (using pkg/counter), then trigger DNS events to occur. By exercising this core Daemon / FQDN / selector code with our own "known-good" reference counting identity allocator backend, we can check that the intermediate code is appropriately balancing the identity allocation and release. The resulting test ensures that even if we trigger DNS events multiple times, the core agent code should not leak identity references. Signed-off-by: Joe Stringer <joe@cilium.io>
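The balance check the test above performs can be sketched like this (a toy model of the idea, not the actual pkg/counter API): wrap identity acquisition in a reference counter, replay the same event multiple times with matching releases, and assert that every count returns to zero.

```go
package main

import "fmt"

// refCounter tracks outstanding references per identity key.
type refCounter map[string]int

func (c refCounter) acquire(id string) { c[id]++ }
func (c refCounter) release(id string) { c[id]-- }

// balanced reports whether every acquire has been matched by a release.
func (c refCounter) balanced() bool {
	for _, n := range c {
		if n != 0 {
			return false
		}
	}
	return true
}

func main() {
	c := refCounter{}
	// Replay the same "DNS event" twice; each pass releases what it
	// allocated, so the counts must return to zero.
	for i := 0; i < 2; i++ {
		c.acquire("cidr:10.0.0.1/32")
		c.release("cidr:10.0.0.1/32")
	}
	fmt.Println(c.balanced())
}
```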
I rebased the PR against a minor conflict in the tree (go includes ordering). Should be good to merge as long as the smoke tests pass.
Review commit by commit.
Main bugfix commit message included here for review convenience: