New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipcache: Fix refcounting with mix of APIs #27327
Conversation
/ci-ginkgo EDIT: I triggered the workflow before images were available, leading to the entire workflow failing. Trying again. |
/ci-ginkgo Passed: https://github.com/cilium/cilium/actions/runs/5790494483 |
/ci-ginkgo Passed: https://github.com/cilium/cilium/actions/runs/5790980060 |
/ci-ginkgo Passed except for one VM infra issue: https://github.com/cilium/cilium/actions/runs/5791347291 |
I've run this three times against #27253 which was able to reproduce two related other issues during development of these fixes last week. That gives me fair confidence that the fix is addressing the problem it intends to, without introducing any obvious regressions. |
cdcba08
to
971b8c0
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a bit of thinking, but now I get why this works.
InjectLabels always does a new := Allocate()
followed by a Release(existing)
. Most of the time, the identity for a set of labels never changes, so new == existing
and the reference count is unchanged. In the case where the ID has changed, the existing ID will be properly released and cleaned up.
The issue is when legacy callers do allocation. At that point in time, there is an entry in the ipcache that did not have an ipcache-owned Allocate()
. Thus, it is not safe for the ipcache to call Release(existing)
since it didn't balance it with its own Allocate()
We also cannot just change legacy callers to never Release()
, since the ipcache will issue at most 0 or 1 spurious Release()
calls -- and there could be more than one legacy allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed comments. However, I lack context about the "new" internal API, so while I can make some sense of the issue based on your and Casey's comments, I don't fully appreciate the fix. Perhaps if there were a diagram explaining the lifecycle management involving allocation/release of identities, that might've helped. Two other reviewers with more context have already approved the change though, so I mainly have comments about testing and need a clarification.
I can see that couple of reporters have tested the fix. 👍 At the same time, it would be nice to have a test to ensure there are no regressions going forward. Maybe a test case can be crafted based on the sequence of events mentioned in Casey's comment?
I like the diagram idea, do you have some ideas about how to express lifecycle management as a diagram? Is there a canonical representation? Broadly speaking the way it works is that there are a range of ipcache users who directly call
I agree about the testing. I have a PR that I'm working on that was able to point out the problem here: #27253 . Given that it's a pretty generic change, I'd like to explore how to expand it to cover a range of similar timing-related issues. That will take a bit more time, so I'll continue the work independent of this PR. |
I'm wondering if there's an issue where InjectLabels() is called for a given prefix again. We will again call |
@squeed Previously, you mentioned
which is my understanding as well. If we are not confident that's the case, then this is a pretty easy thing to unit test. |
The new condition says "if the current entry in ipcache was not created by InjectLabels() and it's being replaced by one created by InjectLabels(), then we retain the reference". The inverse means that if the entry was already created by InjectLabels(), for instance in the case of repeated InjectLabels() calls, or if InjectLabels will not replace the entry, then the newly allocated reference will be released. |
Commit c96b9d8 ("ipcache: Remove superfluous if condition") introduced a double-free for cases where the ipcache allocates a new (reference to a) security identity and there is already an identity corresponding to those labels which another subsystem injected directly into the ipcache via Upsert() and friends. This commit fixes the issue. IPCache has two sets of APIs: - Older APIs like Upsert() / UpsertGeneratedIdentities which require the caller to retain reference counts against each Identity and balance allocate/release calls. - Newer APIs like UpsertMetadata() / UpsertLabels() / UpsertPrefixes() which delegate the reference counting responsibility to the ipcache itself, which will hold one reference to the identity for all callers of the APIs. InjectLabels() is responsible for reference counting the identities associated with ipcache entries that it manages. In order to ensure balanced allocate and release, this function allocates and retains one reference to the identity for itself internally within this function. Under normal operation it will typically release this reference again once it has performed the ipcache update operations. It may be triggered multiple times for the same prefix, in which case it should only ever acquire one reference to the identity, otherwise the references could leak and the identity would not be freed. There is one exception to the "always allocate" + "always" release behaviour in InjectLabels(): Given that the ipcache needs to hold exactly one reference to the identity over time to ensure it remains live while in use by the ipcache, the very first time that InjectLabels() allocates the identity reference, it must hold onto the reference and not immediately release it. In particular, if another subsystem injected an ipcache entry via older APIs, then the first time that InjectLabels() runs, it also needs to acquire and hold a reference to the identity, even though the underlying ipcache already contains an CIDR -> Identity mapping that was injected via the older Upsert() API. In commit c96b9d8 ("ipcache: Remove superfluous if condition"), the check which was ensuring that ipcache entries that transition from being managed by older APIs towards newer APIs would not trigger allocation of a new identity reference to be held by the ipcache itself. The result would be that the resting number of references to the identity would be one less than what is necessary to keep the identity alive. If the other subsystem ever released its reference to the identity, then that would be the last reference and the identity would be freed, despite still being in use. This scenario leads to policy recalculation that removes any datapath allow rules for the corresponding CIDRs, ultimately resulting in packet loss for the impacted CIDRs. One such example involves CIDR identity restore startup logic in the daemon. That path allocates identities then injects them into the ipcache using older APIs. If any such CIDRs are used by network policies, then the network policies subsystem will insert the CIDR into the ipcache using newer ipcache APIs. Given this mix of API usage and the bug introduced in the identity reference counting internally in ipcache, this triggers the bug. This commit fixes it by checking whether InjectLabels() already assumed ownership over the underlying ipcache entry, which implies that the ipcache already holds a reference to the identity on behalf of all newer API users. During subsequent calls to InjectLabels(), the logic checks that the ipcache entry was installed by this function and accounts for the fact that a prior iteration had allocated an identity reference. It can then safely release references that were acquired during the current execution of the function. Fixes: c96b9d8 ("ipcache: Remove superfluous if condition") Reported-by: Boris Petrovic <carnerito.b@gmail.com> Reported-by: Kim-Eirik Karlsen <kim.eirik@gmail.com> Reported-by: Jason Witkowski <jason@witkow.ski> Signed-off-by: Joe Stringer <joe@cilium.io>
971b8c0
to
5408c26
Compare
@aditighag @christarazi @squeed I took Chris' proposal and wrote a unit test for this ipcache API call pattern. I didn't need to change the actual implementation, the new test fails on EDIT: Also, for context, the new test tests both that (a) the reported bug doesn't show up, ie mixed API usage does not lead to releasing too many references as well as (b) the concern above around allocating too many references doesn't come up, so repeated use of |
Aha, that's the key. Should be OK then. |
/test |
The PR was merged before I could take a follow-up look, but it looks like you addressed the main review comments. The updated PR description looks great. 🚀
Something for follow-ups: Listing usages of the old and new APIs annotated with subsystems that are using the old API v/s the new, and the corresponding allocate/release calls can be helpful for future PRs as well knowledge sharing. Happy to iterate over it in case you need a fresh pair of eyes. :) |
@aditighag good suggestion. Following up here: #21667 |
Commit c96b9d8 ("ipcache: Remove superfluous if condition")
introduced a double-free for cases where the ipcache allocates a new
(reference to a) security identity and there is already an identity
corresponding to those labels which another subsystem injected directly
into the ipcache via Upsert() and friends. This commit fixes the issue.
IPCache has two sets of APIs:
caller to retain reference counts against each Identity and balance
allocate/release calls.
which delegate the reference counting responsibility to the ipcache
itself, which will hold one reference to the identity for all callers
of the APIs.
InjectLabels() is responsible for reference counting the identities
associated with ipcache entries that it manages. In order to ensure
balanced allocate and release, this function allocates and retains one
reference to the identity for itself internally within this function.
Under normal operation it will typically release this reference again
once it has performed the ipcache update operations. It may be triggered
multiple times for the same prefix, in which case it should only ever
acquire one reference to the identity, otherwise the references could
leak and the identity would not be freed.
There is one exception to the "always allocate" + "always" release
behaviour in InjectLabels(): Given that the ipcache needs to hold
exactly one reference to the identity over time to ensure it remains
live while in use by the ipcache, the very first time that
InjectLabels() allocates the identity reference, it must hold onto the
reference and not immediately release it. In particular, if another
subsystem injected an ipcache entry via older APIs, then the first time
that InjectLabels() runs, it also needs to acquire and hold a reference
to the identity, even though the underlying ipcache already contains an
CIDR -> Identity mapping that was injected via the older Upsert() API.
In commit c96b9d8 ("ipcache: Remove superfluous if condition"), the
check which was ensuring that ipcache entries that transition from being
managed by older APIs towards newer APIs would not trigger allocation of
a new identity reference to be held by the ipcache itself. The result
would be that the resting number of references to the identity would be
one less than what is necessary to keep the identity alive. If the other
subsystem ever released its reference to the identity, then that would
be the last reference and the identity would be freed, despite still
being in use.
This scenario leads to policy recalculation that removes any datapath
allow rules for the corresponding CIDRs, ultimately resulting in packet
loss for the impacted CIDRs. One such example involves CIDR identity
restore startup logic in the daemon. That path allocates identities then
injects them into the ipcache using older APIs. If any such CIDRs are
used by network policies, then the network policies subsystem will
insert the CIDR into the ipcache using newer ipcache APIs. Given this
mix of API usage and the bug introduced in the identity reference
counting internally in ipcache, this triggers the bug.
This commit fixes it by checking whether InjectLabels() already assumed
ownership over the underlying ipcache entry, which implies that the
ipcache already holds a reference to the identity on behalf of all newer
API users. During subsequent calls to InjectLabels(), the logic checks
that the ipcache entry was installed by this function and accounts for
the fact that a prior iteration had allocated an identity reference. It
can then safely release references that were acquired during the current
execution of the function.
Fixes: c96b9d8 ("ipcache: Remove superfluous if condition")
Reported-by: Boris Petrovic (@carnerito)
Reported-by: Kim-Eirik Karlsen (@kimma)
Reported-by: Jason Witkowski (@jwitko)
Related: #20116
Related: #21667
Fixes: #21667
Fixes: #27169
Fixes: #27176
Fixes: #27210