New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipcache: Add ability to override identity via UpsertMetadata #21667
ipcache: Add ability to override identity via UpsertMetadata #21667
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to not have to complicate that core loop further by adding another fork in the logic, but at the same time I'm not sure whether we really have another good option.
Right now, the logic is looking correct to me. It wouldn't hurt to stew it over another day and take another fresh look but overall this PR seems correct as-is, modulo one minor locking quirk.
I also posted #21675 for a comparison of a similar approach, but I think that your PR's handling of the identity reference counting is more mature than #21675. Please take a look either way, in case there's any useful insights or approaches there that could improve this PR.
I added a few discussion threads below but having spent a chunk of time with this one I'd be OK with dropping most of my one + rebase any other remaining bits on top of this.
pkg/ipcache/metadata.go
Outdated
// overriddenIdentities tracks which prefixes have an override identity | ||
// assigned to them and therefore do not need its identity released when | ||
// updated or removed. | ||
// overriddenIdentitiesMu protects overriddenIdentities and must be taken | ||
// both before the metdata.RWMutex or the IPCache.Lock | ||
overriddenIdentitiesMu lock.Mutex | ||
overriddenIdentities map[netip.Prefix]struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡
Yeah the reference tracking is the trickiest part of this API. Broadly we have two alternatives I can think of, and it looks like we've explored both in our two PRs looking at solving this problem :-)
Here I recognize the more explicit form of having a separate map to track the prefix to avoid adding/removing reference counts for the identities that are provided explicitly. This has a couple of nice properties: it's less "magic", and there's no additional messing around with the reference counts.
The alternative is that the core logic takes in an identity, then internally adds its own reference counts to the identity. I had explored that option briefly here: 2d2bee8 , but based on the TODO comments I have in that commit it seems like I was already leaning towards an approach more like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unfortunately not super familiar with identity allocation. I chose the separate map because it more closely mirrors the existing behavior. If I understand your implementation correctly, it's main idea is to determine the identity based on the labels instead of the numeric identity, correct?
My main worry here was that for CiliumNode objects (which can get assigned an override identity via CEW), we wouldn't have the labels available that lead to the identity being assigned to the node. But re-reading the code, it seems that the labels from the CEW are written into the CiliumNode object together with the assigned identity, so that should work too.
The other open question I have is around the policy map update. Here I'm out of my depth if it makes sense to update the policy maps for "override identities", which I assume are already added to the policy map via policy and if this is something we need to track here, or if it is something that is anyway already tracked by the selector cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah pretty much as you say. The idea is that we'll pull the labels out from the identity and then allocate the identity for that set of labels. The idea behind that is that the identity allocator would figure out that this is an existing identity and increment the refcount on that identity + return the identity that was already existing.1
Technically the "override identities" do not need to trigger policy update in this logic, correct. This is where the role of IPCache in the overall architecture is changing a bit in general - The new model says that IPCache itself is responsible for generating new identities (in the other case), and hence it must notify the policy engine for those new identities so that the policymaps are appropriately plumbed to handle this logic.
For "override identities", the corresponding logic for policy integration is currently elsewhere. We could potentially decide that actually, rather than having that handled elsewhere, instead we rely on this notification from the ipcache. At a high level, this is saying that when some watcher or other agent logic learns that "IP X has identity Y", it informs the userspace IP informationbase ("IPCache") about this mapping, then IPCache is responsible for ensuring that all relevant calculation occurs, and occurs in the right order - that is to say, inform the policy engine + update bpf policy maps and then update the bpf ipcache map. I think this is a reasonable design and could potentially simplify some other logic.
That said, for now it's not necessary. I think your question is coming from the perspective "does it need to notify the policy engine, and if not, should we skip it?" which is quite a reasonable question to ask.. technically we could skip it for now. I think that'll make the logic slightly more complex. Would it hurt to notify the policy engine twice for the same identity updates? Well, that's a bit more tricky. 🤔 At a glance it looks like SelectorCache.UpdateIdentities()
is idempotent so in a sense it should be OK. However, if an identity is added + deleted within a quick succession, and there are two different pieces of agent logic notifying the selectorcache at different times outside of a critical section then we could end up with inconsistent state in the selectorcache. So from that perspective it should probably be the responsibility of exactly one piece of code to notify the selectorcache about the identity update. We can ensure consistency by queuing events in order as they come in (which we already do).
^^ Part of the idea behind having a dedicated goroutine handling the ipcache updates like this is to force serialization of such events so that we don't end up in some intermediate inconsistent state in the selectorcache.
Footnotes
-
Then we still have the question of how that reference is eventually removed, which I covered in a bit more detail in one of the other discussion threads below. The principle is that this refcount would stay live as long as the corresponding ipcache entry stays live; if the ipcache entry already exists and we re-allocate the same identity then we need to make sure that this doesn't cause an identity reference leak. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to keep the code simpler, I decided to not treat override identities special when it comes to notifying the policy engine and have InjectLabels do the notification for all identities. Let's see how that works out.
pkg/ipcache/metadata.go
Outdated
if !override { | ||
// We allocated a new identity, which means we want to add it | ||
// to the policy map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll need to document the inverse of this in the UpsertIdentity() API: Callers from policy context must arrange for policymaps to be updated themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely agree. At the moment, there is no separate method to inject override identities, but it might make sense to have one even just for documentation purposes.
Thanks for the review Joe! FWIW, I fully agree with this sentiment. I'm not a fan of the complex loop logic either and I would love to find a simpler approach too. I'm especially worried that any changes to it might cause regressions again. |
f92a94a
to
1e80655
Compare
This is ready for review again. A few points:
|
9b33233
to
b95b857
Compare
Why doesn't each ipcache entry retain a reference to the identity in this case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Worth looking a bit closer at some of the questions I have below around semantics for the callers. These are not critical right now since there's no users of the new interface yet, but it won't be long before we'd need to grapple with some of these questions.
// This will trigger asynchronous calculation of any local identity changes | ||
// that must occur to associate the specified labels with the prefix, and push | ||
// any datapath updates necessary to implement the logic associated with the | ||
// metadata currently associated with the 'prefix'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the caller presumably holds a reference to the target (numeric) identity, but then this interface will asynchronously perform the updates itself, does the caller need to retain a reference until the underlying async logic runs? Or will this implementation grab a preliminary reference on the corresponding identity up until the async logic is run?
The concern I have is something like, the caller does:
- Allocate identity
- UpsertIdentity(...)
- Release reference identity
Then, since UpsertIdentity(...)
internally defers the insert logic, that could happen after the caller releases its reference on the identity, and then it becomes more likely that the ipcache will (re-)allocate a new numeric identity for these labels. This is not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, I think that the current code does handle this case gracefully, it will just log an extraneous warning that the user can't do much about. And I guess at least right now, there are no callers of this interface so there's no way to trigger this condition so it's not exactly a big deal. But it would be nice to make this expectation super clear for future callers.
If we used this from the endpoint logic, then the endpoint itself would already retain a reference to the identity so I think that case would be fine. Not sure about other users that directly allocate an identity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was designing this interface originally with NodeManager
in mind, which actually does not even locally allocate the identity it's passing in here, it just blindly trusts the remote node that it holds a reference to the identity it is claiming to have. So the code as I wrote it doesn't really care if the identity already exists or not, the numeric identity right now is only there to provide a warning, it doesn't have any functional purpose.
I'm myself a bit partial to the requirement that the caller of UpsertIdentity
needs to provide a numeric identity (which implies prior allocation). Maybe it would be cleaner for UpsertIdentity
to just require the labels as your PR originally did (and maybe have the numeric identity as an optional argument just for logging/troubleshooting purposes)
In the end, I guess we do assume that if the caller of UpsertIdentity
needs to somehow know the numeric identity of a prefix, then we also assume that someone else is already holding the reference to it.
I guess in theory it could happen that code allocates an identity, schedules UpsertIdentity
, then drops the identity again before InjectLabels
(executed asynchronously) had a chance to run. That would re-assign the numeric identity. If that code assumes UpsertIdentity
instantly keeps the reference/numeric identity alive, then yeah, that would likely be buggy behavior. On the other hand, decreasing an identity ref count while assuming the numeric identity remains valid is dangerous regardless of this API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the numeric identity. That way, the interface is agnostic to the fact if the identity was already allocated or not.
I've also renamed the method to OverrideIdentity
to distinguish it a bit more clearly from UpsertLabels
.
pkg/ipcache/types.go
Outdated
func (s prefixInfo) IdentityOverride() (types.IdentityOverride, labels.Labels) { | ||
for _, v := range s { | ||
if v.identityOverride.IsValid() { | ||
return v.identityOverride, v.labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably OK in terms of resolving the conflict. IIRC range s
won't guarantee the same item is picked if there are conflicts and the function is run multiple times. A more stable behaviour would be to pick out the conflicts, sort them, and pick an identity based on some stable heuristic.... but ideally the user will never hit conflicts because we recognize the potential for conflict ahead of time and put guard rails in place. So probably fine to just leave this for now, especially since there are warnings when conflicts are inserted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I mentioned somewhere in the doc to UpsertIdentity
that a random one will be chosen upon conflict, which is due to this range s
here being non-deterministic.
Looking at this again, I am afraid that having a random pick here this will likely just cause headaches if we ever need to troubleshoot conflicting entries. I think I'll change this to use a deterministic iteration order (since follow-up PRs for hostIP and encryptKey will also rely on the the same conflict resolution logic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added basic deterministic conflict resolution by sorting alphabetically. I'm honestly not sure what heuristic would be more useful in practice? Prefer CIDR identities over global ones over reserved ones? For two identities in the same category, which one should we pick?
Because I couldn't come up with a good rationale for anything, I decided to keep it simple for now and just sort alphabetically.
It does. Maybe I'm misunderstanding your question. To rephrase my above sentence: The purpose of this newly added tracking is to ensure the IPCache entry gets deleted regardless of the identity ref count. If a prefix has no more metadata, then the corresponding IPCache entry needs to be deleted even if the identity refcount of the prefix is still potentially >0. With CIDR identites, one could assume that each IPCache Now that we have global/reserved identities handled by the logic as well, each IPCache entry still retains a reference to exactly one identity, but an identity might now be retained by multiple entries. Therefore, an IPCache entry might now need to be deleted even if its identity is still alive (due to it being retained by other entries). |
See cilium#21667 (comment) for an in-depth discussion about this change. For local identities, this condition never evaluated to false, since identities are different for each prefix and thus no other prefix could have added a map entry for the previously allocated identity. A subsequent commit will cause InjectLabels to also allocate global identities. In that case, the condition is actively harmful, since it could cause an identity leak. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
efc9baa
to
368c21c
Compare
Made the following changes (all in the last commit, besides a rebase on master). The overall logic of an "override" remains the same:
|
368c21c
to
2b4a051
Compare
/test |
This renames the `id` variable to `oldID`, to better distinguish it from the `newID` variable and indicate that `oldID` is the one about to be replaced or deleted. This commit contains no functional changes. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
See cilium#21667 (comment) for an in-depth discussion about this change. For local identities, this condition never evaluated to false, since identities are different for each prefix and thus no other prefix could have added a map entry for the previously allocated identity. A subsequent commit will cause InjectLabels to also allocate global identities. In that case, the condition is actively harmful, since it could cause an identity leak. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The new asynchronous IPCache API tries to determine the identity of a prefix based on the provided labels. However, in some cases, the sources providing the IPCache metadata (e.g. CiliumNode, CiliumEndpoint) already know the numeric identity of the prefix. In that case, we want to give that predefined global identity preference over any local identity we might derive from the labels. All other metadata is still tracked. If the override is ever removed from the prefix, the metadata will be used to determine the new prefix identity. Prefixes added via UpsertIdentity otherwise are treated similarly to ones upserted via UpsertMetadata: The policy maps are still updated for any newly added identity, and the identity is removed from policy maps if RemoveIdentity caused the identity to be deallocated. Co-authored-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
2b4a051
to
80504f0
Compare
@christarazi fixed the remaining issues with CIDR identity restoration. Thanks, Chris! |
/test Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed: Click to show.Test Name
Failure Output
Edit: Hit #21519. |
Travis hit #23311. |
@tklauser Just looking for @cilium/cli review here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for @cilium/cli
/ci-datapath |
/ci-gke |
GKE failed again with the encryption tests: https://github.com/cilium/cilium/actions/runs/4005134938/jobs/6875144115 They have been disabled in the meantime. Restarting |
/ci-gke |
See #21667 (comment) for an in-depth discussion about this change. For local identities, this condition never evaluated to false, since identities are different for each prefix and thus no other prefix could have added a map entry for the previously allocated identity. A subsequent commit will cause InjectLabels to also allocate global identities. In that case, the condition is actively harmful, since it could cause an identity leak. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The new asynchronous IPCache API tries to determine the identity of a
prefix based on the provided labels. However, in some cases, the sources
providing the IPCache metadata (e.g. CiliumNode, CiliumEndpoint) already
know the numeric identity of the prefix. In that case, we want to give
that predefined global identity preference over any local identity we
might derive from the labels.
All other metadata is still tracked. If the override is ever removed
from the prefix, the metadata will be used to determine the new prefix
identity.
Currently, the override identity is always accepted, regardless whether
the numeric identity is known e.g. to the identity allocator. This
behavior mirrors the current semantics of the legacy IPCache.Upsert
method. In the future, we could consider blocking overrides until they
are actually known to the identity allocator.
Identities added via override do not trigger a policy map update. It is
assumed that the policy map update for those identities is handled
elsewhere. This again mirrors the current behavior of IPCache.Upsert.