-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v1.14] Fix bug where restored CIDR identities would be incorrectly released, leading to packet loss #27305
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The first use case for this is truncating a slice of netip.Prefix for logging purposes. Ideally it'd be sorted, but netip.Prefix isn't yet comparable. There's upstream work going on for that, so in the mean time let's just pick any random keys from the map and return N of them. Signed-off-by: Joe Stringer <joe@cilium.io>
Some of the lines of this function will be shared by an upcoming commit. Refactor this separately to minimize diff. No functional changes. Signed-off-by: Joe Stringer <joe@cilium.io>
On startup the agent restores CIDR identities from the ipcache map and allocates references to those Security Identities temporarily, until the identity-restore-grace-period is complete. At the end of this period, the CIDR Identities may be released. Although many of these CIDR identities will be referenced from the FQDN selectors associated with the DNS Name Manager, none of the references will be held by the name manager or its FQDN selectors due to the way that internal reference counting works for Identities in such rules. See commit de10e82 ("Fix identity leak via FQDN selectors") for more history on the structure regarding identity reference counts in the tree today. The unfortunate consequence of this structure is that once the identity-restore-grace-period completes, these identities will be forgotten about until the next time a DNS lookup occurs for the names associated with these IP addresses. This can cause connectivity disruption for applications following an upgrade or restart of the agent, if that connectivity is enabled by ToFQDNs policy. One might be tempted to think, well what if we just don't release those CIDR Identity references? If we don't release them, then Cilium won't forget how to properly enforce policy for those Identities, and traffic will continue to flow. However, in this scenario, regardless of whether applications continue to leverage the CIDR identities, Cilium will keep these Identities alive persistently for the lifetime of the current agent, and even persist them on to the next time the agent starts up. These identities are generated based on DNS lookup activity during the agent lifetime, and there is currently no limit on how many of these Identities there may be. If these are not appropriately garbage collected, then endpoint policymaps may fill more easily, leading to incorrect policy enforcement or even rejecting the scheduling of new endpoints. Well, then why don't we identify exactly which IP addresses are being used by the FQDN selectors and allocate additional references to the corresponding CIDR Identities on startup? The current code structure doesn't have any sort of bootstrap phase where these restored CIDRs are identified with the context of the corresponding FQDNs. Additionally, any particular restored Identity for an IP could be selected by multiple selectors, and the name manager expects to retain a reference to the identity for each selector. So it's not enough just to reference count the identities once per CIDR. Furthermore, this means that the policy engine must have done a full policy calculation first in order to initialize the FQDN selectors and attach them to the name manager, so that we can determine how many references there may be to each CIDR Identity! At some point we should probably take a longer harder look at this and maybe revisit some of the design decisions from commit de10e82 ("Fix identity leak via FQDN selectors") around reference counting. In the interests of preparing a safe, small, backportable fix, this commit explores alternate short term options. With that in mind, this commit proposes the following solution: wait until the identity-restore-grace-period is completed and the identities have their references released, then attempt to re-allocate the identity references used by the name manager at that time. Any identities that are already referenced will have referenced allocated and then removed again by this new logic, resulting in a no-op. Any newly allocated identities will be reinserted into the ipcache. This may cause brief connectivity disruption to workloads impacted by the latter category of identities, but this is better than having persistent disruption. Signed-off-by: Joe Stringer <joe@cilium.io>
joestringer
added
the
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
label
Aug 6, 2023
maintainer-s-little-helper
bot
added
the
dont-merge/needs-release-note-label
The author needs to describe the release impact of these changes.
label
Aug 6, 2023
joestringer
had a problem deploying
to
release-base-images
August 6, 2023 18:27
— with
GitHub Actions
Error
joestringer
added
kind/backports
This PR provides functionality previously merged into master.
and removed
dont-merge/needs-release-note-label
The author needs to describe the release impact of these changes.
labels
Aug 6, 2023
/test-backport-1.14 |
Boris, Jason and Kim-Eirik report that ten minutes after startup, Cilium releases CIDR identities that are allowing Cilium to implement network policy, which leads to connectivity outages. Boris traced this down to commit c96b9d8 ("ipcache: Remove superfluous if condition") which attempted to solve a reference leak issue in the core of ipcache. Upon deeper inspection, the agent's CIDR Identity restore logic was not playing well with network policy allocations of CIDR Identity in the ipcache, due to a misalignment of expectations of ownership over Identity and IPcache entry structures between the two agent modules. This commit resolves the misalignment by moving the CIDR Identity restore logic over to the newer IPCache APIs, thereby balancing the reference counting ownership between the different code paths. Previously the daemon on startup would: 1. Read previously-used CIDRs from the bpf ipcache map 2. Allocate the old numeric identities for the corresponding CIDRs 3. Directly inject entries into the new bpf ipcachemap for these CIDR to Identity mappings, to be used by an upcoming datapath reinitialization. 4. Wait for the identity-restore-grace-period (default 10m) 5. Release its references to the identities 6. If these are the final references on the identities, attempt to remove the corresponding prefix->identity mappings from the ipcache. The goal is to ensure that after Cilium starts up, it tries to retain the same identity numbers for the same prefixes in order to minimize the diff between policymaps as endpoints are progressively migrated from the datapath configured by the old Cilium agent towards the new one understood by this newly started agent. Step (3) above uses the old ipcache APIs that delegate responsibility to callers to properly reference count the Identities. However, nothing was stopping other parts of Cilium from configuring network policies for the same IP prefixes into the ipcache using the new ipcache APIs, where the ipcache package itself takes care of reference counting. If another subsystem like the policy engine requests the ipcache to allocate & inject an identity + IP mapping into the datapath during step (4) above, it would hit the inner loop of InjectLabels(). That logic attempts to (re-)allocate an identity for the CIDR, which would coincide with the identity allocated in Step (2) above. Subsequently, it checks whether the Identity is already pushed into the datapath. If yes, then it will release the (re-)allocated identity reference for the CIDR, thinking that it had previously grabbed a reference on the Identity. This is actually incorrect, the ipcache never acquired a reference on the Identity for itself! In the short term connectivity is kept alive because the CIDR retains a reference from the startup logic, but when the identity restore grace period runs out, this triggers persistent drops on the datapath until a user re-inserts their network policies or otherwise triggers re-evaluation of the policy paths that inject mappings into the ipcache. In this scenario, when the agent gets to step (5), it will release the final reference to each such CIDR. This releases the identity internally across the agent so that the policy engine no longer has any knowledge of it, which triggers policymap entry deletion on the datapath, leading to policy drops. As it turns out, due to the way that the ipcache entry is inserted into the datapath with a higher priority source, the logic in step (6) actually fails to clean up the datapath's ipcache entries since the entry is not really owned by the daemon restore logic, but by the ipcache at this stage. The result is that the datapath's ipcache maps certain CIDRs to identities that do not exist in the agent. In order to resolve this conflict, this commit moves steps (3) and (6) for the identity restore logic over to the new ipcache APIs, so that the ipcache can correctly allocate its own references for the identities and keep them alive. The daemon startup logic for identity restore still separately allocates and releases its own references to the underlying identities in order to ensure that the identities retain the prefix correspondence across upgrades, but this reference is no longer tied up with the old ipcache interfaces that caused the reference count ambiguity. As a matter of process to clearly delineate the codepaths for step (5), this commit also moves that code into the daemon so that there is no ambiguity over whether the daemon's identity references are tied to the ipcache or not. Fixes: c96b9d8 ("ipcache: Remove superfluous if condition") Reported-by: Boris Petrovic <carnerito.b@gmail.com> Reported-by: Kim-Eirik Karlsen <kim.eirik@gmail.com> Reported-by: Jason Witkowski <jason@witkow.ski> Signed-off-by: Joe Stringer <joe@cilium.io>
joestringer
force-pushed
the
hf/v1.14/cidr-identity-refcnt-fix
branch
from
August 6, 2023 18:39
001538c
to
703e07a
Compare
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
/test-backport-1.14 |
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
joestringer
had a problem deploying
to
release-developer-images
August 6, 2023 18:39
— with
GitHub Actions
Error
2 tasks
Superseded by an alternative proposal (#27327). |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a v1.14 branch backport for #27255 -- Fix bug where restored CIDR identities would be incorrectly released, leading to packet loss.
Once this PR is merged, you can update the PR labels via: