Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipcache: Fix refcounting with mix of APIs #27327

Merged
merged 1 commit into from Aug 10, 2023
Merged

Conversation

joestringer
Copy link
Member

@joestringer joestringer commented Aug 7, 2023

Commit c96b9d8 ("ipcache: Remove superfluous if condition")
introduced a double-free for cases where the ipcache allocates a new
(reference to a) security identity and there is already an identity
corresponding to those labels which another subsystem injected directly
into the ipcache via Upsert() and friends. This commit fixes the issue.

IPCache has two sets of APIs:

  • Older APIs like Upsert() / UpsertGeneratedIdentities which require the
    caller to retain reference counts against each Identity and balance
    allocate/release calls.
  • Newer APIs like UpsertMetadata() / UpsertLabels() / UpsertPrefixes()
    which delegate the reference counting responsibility to the ipcache
    itself, which will hold one reference to the identity for all callers
    of the APIs.

InjectLabels() is responsible for reference counting the identities
associated with ipcache entries that it manages. In order to ensure
balanced allocate and release, this function allocates and retains one
reference to the identity for itself internally within this function.
Under normal operation it will typically release this reference again
once it has performed the ipcache update operations. It may be triggered
multiple times for the same prefix, in which case it should only ever
acquire one reference to the identity, otherwise the references could
leak and the identity would not be freed.

There is one exception to the "always allocate" + "always" release
behaviour in InjectLabels(): Given that the ipcache needs to hold
exactly one reference to the identity over time to ensure it remains
live while in use by the ipcache, the very first time that
InjectLabels() allocates the identity reference, it must hold onto the
reference and not immediately release it. In particular, if another
subsystem injected an ipcache entry via older APIs, then the first time
that InjectLabels() runs, it also needs to acquire and hold a reference
to the identity, even though the underlying ipcache already contains an
CIDR -> Identity mapping that was injected via the older Upsert() API.

In commit c96b9d8 ("ipcache: Remove superfluous if condition"), the
check which was ensuring that ipcache entries that transition from being
managed by older APIs towards newer APIs would not trigger allocation of
a new identity reference to be held by the ipcache itself. The result
would be that the resting number of references to the identity would be
one less than what is necessary to keep the identity alive. If the other
subsystem ever released its reference to the identity, then that would
be the last reference and the identity would be freed, despite still
being in use.

This scenario leads to policy recalculation that removes any datapath
allow rules for the corresponding CIDRs, ultimately resulting in packet
loss for the impacted CIDRs. One such example involves CIDR identity
restore startup logic in the daemon. That path allocates identities then
injects them into the ipcache using older APIs. If any such CIDRs are
used by network policies, then the network policies subsystem will
insert the CIDR into the ipcache using newer ipcache APIs. Given this
mix of API usage and the bug introduced in the identity reference
counting internally in ipcache, this triggers the bug.

This commit fixes it by checking whether InjectLabels() already assumed
ownership over the underlying ipcache entry, which implies that the
ipcache already holds a reference to the identity on behalf of all newer
API users. During subsequent calls to InjectLabels(), the logic checks
that the ipcache entry was installed by this function and accounts for
the fact that a prior iteration had allocated an identity reference. It
can then safely release references that were acquired during the current
execution of the function.

Fixes: c96b9d8 ("ipcache: Remove superfluous if condition")
Reported-by: Boris Petrovic (@carnerito)
Reported-by: Kim-Eirik Karlsen (@kimma)
Reported-by: Jason Witkowski (@jwitko)

Related: #20116
Related: #21667
Fixes: #21667
Fixes: #27169
Fixes: #27176
Fixes: #27210

Fix bug where startup CIDR restore logic would mishandle reference counting, leading to persistent packet loss to those CIDRs

@joestringer joestringer added release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Aug 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Aug 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.1 Aug 7, 2023
@joestringer
Copy link
Member Author

joestringer commented Aug 7, 2023

/ci-ginkgo

EDIT: I triggered the workflow before images were available, leading to the entire workflow failing. Trying again.

@joestringer
Copy link
Member Author

joestringer commented Aug 7, 2023

/ci-ginkgo

Passed: https://github.com/cilium/cilium/actions/runs/5790494483

@joestringer
Copy link
Member Author

joestringer commented Aug 7, 2023

/ci-ginkgo

Passed: https://github.com/cilium/cilium/actions/runs/5790980060

@joestringer
Copy link
Member Author

joestringer commented Aug 8, 2023

/ci-ginkgo

Passed except for one VM infra issue: https://github.com/cilium/cilium/actions/runs/5791347291

@joestringer
Copy link
Member Author

I've run this three times against #27253 which was able to reproduce two related other issues during development of these fixes last week. That gives me fair confidence that the fix is addressing the problem it intends to, without introducing any obvious regressions.

@joestringer joestringer changed the base branch from pr/joe/messing-with-time to main August 8, 2023 01:17
@joestringer
Copy link
Member Author

/test

@joestringer joestringer marked this pull request as ready for review August 8, 2023 02:33
@joestringer joestringer requested a review from a team as a code owner August 8, 2023 02:33
@joestringer joestringer added the release-blocker/1.14 This issue will prevent the release of the next version of Cilium. label Aug 8, 2023
Copy link
Contributor

@squeed squeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a bit of thinking, but now I get why this works.

InjectLabels always does a new := Allocate() followed by a Release(existing). Most of the time, the identity for a set of labels never changes, so new == existing and the reference count is unchanged. In the case where the ID has changed, the existing ID will be properly released and cleaned up.

The issue is when legacy callers do allocation. At that point in time, there is an entry in the ipcache that did not have an ipcache-owned Allocate(). Thus, it is not safe for the ipcache to call Release(existing) since it didn't balance it with its own Allocate()

We also cannot just change legacy callers to never Release(), since the ipcache will issue at most 0 or 1 spurious Release() calls -- and there could be more than one legacy allocation.

Copy link
Member

@aditighag aditighag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comments. However, I lack context about the "new" internal API, so while I can make some sense of the issue based on your and Casey's comments, I don't fully appreciate the fix. Perhaps if there were a diagram explaining the lifecycle management involving allocation/release of identities, that might've helped. Two other reviewers with more context have already approved the change though, so I mainly have comments about testing and need a clarification.

I can see that couple of reporters have tested the fix. 👍 At the same time, it would be nice to have a test to ensure there are no regressions going forward. Maybe a test case can be crafted based on the sequence of events mentioned in Casey's comment?

pkg/ipcache/metadata.go Show resolved Hide resolved
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Aug 8, 2023
@joestringer
Copy link
Member Author

joestringer commented Aug 8, 2023

Thanks for the detailed comments. However, I lack context about the "new" internal API, so while I can make some sense of the issue based on your and Casey's comments, I don't fully appreciate the fix. Perhaps if there were a diagram explaining the lifecycle management involving allocation/release of identities, that might've helped. Two other reviewers with more context have already approved the change though, so I mainly have comments about testing and need a clarification.

I like the diagram idea, do you have some ideas about how to express lifecycle management as a diagram? Is there a canonical representation?

Broadly speaking the way it works is that there are a range of ipcache users who directly call ipcache.Upsert() or ipcache.UpsertGeneratedIdentities() with an identity. This means that the caller already discovered the identity and took a reference for that identity to ensure that it stays alive as long as it's used. In that case, the ipcache itself does not take responsibility for reference-counting the identities. Any of the other Upsert() variations like UpsertMetadata(), UpsertPrefixes(), UpsertLabels() do not directly reference the identity from other packages. The ipcache API just provides a way for the ipcache itself to allocate and assign an identity to an IP. For these cases, the ipcache is responsible for reference-counting the identities correctly. This PR deals with the interaction between the two cases, which is internally resolved in pkg/ipcache. One of my todo list items is to add comments to deprecate the older APIs to discourage usage and clarify which ones should/shouldn't be used.

I can see that couple of reporters have tested the fix. +1 At the same time, it would be nice to have a test to ensure there are no regressions going forward. Maybe a test case can be crafted based on the sequence of events mentioned in Casey's comment?

I agree about the testing. I have a PR that I'm working on that was able to point out the problem here: #27253 . Given that it's a pretty generic change, I'd like to explore how to expand it to cover a range of similar timing-related issues. That will take a bit more time, so I'll continue the work independent of this PR.

@joestringer joestringer added dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Aug 8, 2023
@squeed
Copy link
Contributor

squeed commented Aug 9, 2023

I'm wondering if there's an issue where InjectLabels() is called for a given prefix again. We will again call Allocate(), which will return the same ID, but will not call Release(). In fact, every time we call InjectLabels() we will leak another reference, right?

@christarazi
Copy link
Member

@squeed Previously, you mentioned

InjectLabels always does a new := Allocate() followed by a Release(existing). Most of the time, the identity for a set of labels never changes, so new == existing and the reference count is unchanged. In the case where the ID has changed, the existing ID will be properly released and cleaned up.

which is my understanding as well.

If we are not confident that's the case, then this is a pretty easy thing to unit test.

@joestringer
Copy link
Member Author

I'm wondering if there's an issue where InjectLabels() is called for a given prefix again. We will again call Allocate(), which will return the same ID, but will not call Release(). In fact, every time we call InjectLabels() we will leak another reference, right?

The new condition says "if the current entry in ipcache was not created by InjectLabels() and it's being replaced by one created by InjectLabels(), then we retain the reference". The inverse means that if the entry was already created by InjectLabels(), for instance in the case of repeated InjectLabels() calls, or if InjectLabels will not replace the entry, then the newly allocated reference will be released.

Commit c96b9d8 ("ipcache: Remove superfluous if condition")
introduced a double-free for cases where the ipcache allocates a new
(reference to a) security identity and there is already an identity
corresponding to those labels which another subsystem injected directly
into the ipcache via Upsert() and friends. This commit fixes the issue.

IPCache has two sets of APIs:
- Older APIs like Upsert() / UpsertGeneratedIdentities which require the
  caller to retain reference counts against each Identity and balance
  allocate/release calls.
- Newer APIs like UpsertMetadata() / UpsertLabels() / UpsertPrefixes()
  which delegate the reference counting responsibility to the ipcache
  itself, which will hold one reference to the identity for all callers
  of the APIs.

InjectLabels() is responsible for reference counting the identities
associated with ipcache entries that it manages. In order to ensure
balanced allocate and release, this function allocates and retains one
reference to the identity for itself internally within this function.
Under normal operation it will typically release this reference again
once it has performed the ipcache update operations. It may be triggered
multiple times for the same prefix, in which case it should only ever
acquire one reference to the identity, otherwise the references could
leak and the identity would not be freed.

There is one exception to the "always allocate" + "always" release
behaviour in InjectLabels(): Given that the ipcache needs to hold
exactly one reference to the identity over time to ensure it remains
live while in use by the ipcache, the very first time that
InjectLabels() allocates the identity reference, it must hold onto the
reference and not immediately release it. In particular, if another
subsystem injected an ipcache entry via older APIs, then the first time
that InjectLabels() runs, it also needs to acquire and hold a reference
to the identity, even though the underlying ipcache already contains an
CIDR -> Identity mapping that was injected via the older Upsert() API.

In commit c96b9d8 ("ipcache: Remove superfluous if condition"), the
check which was ensuring that ipcache entries that transition from being
managed by older APIs towards newer APIs would not trigger allocation of
a new identity reference to be held by the ipcache itself. The result
would be that the resting number of references to the identity would be
one less than what is necessary to keep the identity alive. If the other
subsystem ever released its reference to the identity, then that would
be the last reference and the identity would be freed, despite still
being in use.

This scenario leads to policy recalculation that removes any datapath
allow rules for the corresponding CIDRs, ultimately resulting in packet
loss for the impacted CIDRs. One such example involves CIDR identity
restore startup logic in the daemon. That path allocates identities then
injects them into the ipcache using older APIs. If any such CIDRs are
used by network policies, then the network policies subsystem will
insert the CIDR into the ipcache using newer ipcache APIs. Given this
mix of API usage and the bug introduced in the identity reference
counting internally in ipcache, this triggers the bug.

This commit fixes it by checking whether InjectLabels() already assumed
ownership over the underlying ipcache entry, which implies that the
ipcache already holds a reference to the identity on behalf of all newer
API users. During subsequent calls to InjectLabels(), the logic checks
that the ipcache entry was installed by this function and accounts for
the fact that a prior iteration had allocated an identity reference. It
can then safely release references that were acquired during the current
execution of the function.

Fixes: c96b9d8 ("ipcache: Remove superfluous if condition")
Reported-by: Boris Petrovic <carnerito.b@gmail.com>
Reported-by: Kim-Eirik Karlsen <kim.eirik@gmail.com>
Reported-by: Jason Witkowski <jason@witkow.ski>
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer
Copy link
Member Author

joestringer commented Aug 9, 2023

@aditighag @christarazi @squeed I took Chris' proposal and wrote a unit test for this ipcache API call pattern. I didn't need to change the actual implementation, the new test fails on main and passes with the fix. While I was at it, I spent a bit more time rewriting the explanation for the fix. PTAL.

EDIT: Also, for context, the new test tests both that (a) the reported bug doesn't show up, ie mixed API usage does not lead to releasing too many references as well as (b) the concern above around allocating too many references doesn't come up, so repeated use of InjectLabels() should not cause reference leaks on the identities.

@joestringer joestringer removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Aug 9, 2023
@squeed
Copy link
Contributor

squeed commented Aug 9, 2023

and it's being replaced by one created by InjectLabels(), then we retain the reference

Aha, that's the key. Should be OK then.

@joestringer
Copy link
Member Author

/test

@nebril nebril merged commit 74530b1 into main Aug 10, 2023
196 checks passed
@nebril nebril deleted the pr/joe/lost-identity-v2 branch August 10, 2023 09:33
@nebril nebril mentioned this pull request Aug 10, 2023
1 task
@nebril nebril added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Aug 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.1 Aug 10, 2023
@joestringer joestringer added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Aug 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.1 Aug 10, 2023
@aditighag
Copy link
Member

aditighag commented Aug 10, 2023

The PR was merged before I could take a follow-up look, but it looks like you addressed the main review comments. The updated PR description looks great. 🚀

I like the diagram idea, do you have some ideas about how to express lifecycle management as a diagram? Is there a canonical representation?

Something for follow-ups: Listing usages of the old and new APIs annotated with subsystems that are using the old API v/s the new, and the corresponding allocate/release calls can be helpful for future PRs as well knowledge sharing. Happy to iterate over it in case you need a fresh pair of eyes. :)

@joestringer
Copy link
Member Author

@aditighag good suggestion. Following up here: #21667

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
No open projects
1.14.1
Backport done to v1.14
5 participants