ipcache: Add ability to override identity via UpsertMetadata #21667

gandro · 2022-10-11T14:49:06Z

The new asynchronous IPCache API tries to determine the identity of a
prefix based on the provided labels. However, in some cases, the sources
providing the IPCache metadata (e.g. CiliumNode, CiliumEndpoint) already
know the numeric identity of the prefix. In that case, we want to give
that predefined global identity preference over any local identity we
might derive from the labels.

All other metadata is still tracked. If the override is ever removed
from the prefix, the metadata will be used to determine the new prefix
identity.

Currently, the override identity is always accepted, regardless whether
the numeric identity is known e.g. to the identity allocator. This
behavior mirrors the current semantics of the legacy IPCache.Upsert
method. In the future, we could consider blocking overrides until they
are actually known to the identity allocator.

Identities added via override do not trigger a policy map update. It is
assumed that the policy map update for those identities is handled
elsewhere. This again mirrors the current behavior of IPCache.Upsert.

pkg/ipcache/metadata.go

joestringer

I would like to not have to complicate that core loop further by adding another fork in the logic, but at the same time I'm not sure whether we really have another good option.

Right now, the logic is looking correct to me. It wouldn't hurt to stew it over another day and take another fresh look but overall this PR seems correct as-is, modulo one minor locking quirk.

I also posted #21675 for a comparison of a similar approach, but I think that your PR's handling of the identity reference counting is more mature than #21675. Please take a look either way, in case there's any useful insights or approaches there that could improve this PR.

I added a few discussion threads below but having spent a chunk of time with this one I'd be OK with dropping most of my one + rebase any other remaining bits on top of this.

joestringer · 2022-10-11T23:43:08Z

pkg/ipcache/metadata.go

+	// overriddenIdentities tracks which prefixes have an override identity
+	// assigned to them and therefore do not need its identity released when
+	// updated or removed.
+	// overriddenIdentitiesMu protects overriddenIdentities and must be taken
+	// both before the metdata.RWMutex or the IPCache.Lock
+	overriddenIdentitiesMu lock.Mutex
+	overriddenIdentities   map[netip.Prefix]struct{}


💡

Yeah the reference tracking is the trickiest part of this API. Broadly we have two alternatives I can think of, and it looks like we've explored both in our two PRs looking at solving this problem :-)

Here I recognize the more explicit form of having a separate map to track the prefix to avoid adding/removing reference counts for the identities that are provided explicitly. This has a couple of nice properties: it's less "magic", and there's no additional messing around with the reference counts.

The alternative is that the core logic takes in an identity, then internally adds its own reference counts to the identity. I had explored that option briefly here: 2d2bee8 , but based on the TODO comments I have in that commit it seems like I was already leaning towards an approach more like this.

I am unfortunately not super familiar with identity allocation. I chose the separate map because it more closely mirrors the existing behavior. If I understand your implementation correctly, it's main idea is to determine the identity based on the labels instead of the numeric identity, correct?

My main worry here was that for CiliumNode objects (which can get assigned an override identity via CEW), we wouldn't have the labels available that lead to the identity being assigned to the node. But re-reading the code, it seems that the labels from the CEW are written into the CiliumNode object together with the assigned identity, so that should work too.

The other open question I have is around the policy map update. Here I'm out of my depth if it makes sense to update the policy maps for "override identities", which I assume are already added to the policy map via policy and if this is something we need to track here, or if it is something that is anyway already tracked by the selector cache.

Yeah pretty much as you say. The idea is that we'll pull the labels out from the identity and then allocate the identity for that set of labels. The idea behind that is that the identity allocator would figure out that this is an existing identity and increment the refcount on that identity + return the identity that was already existing.¹

Technically the "override identities" do not need to trigger policy update in this logic, correct. This is where the role of IPCache in the overall architecture is changing a bit in general - The new model says that IPCache itself is responsible for generating new identities (in the other case), and hence it must notify the policy engine for those new identities so that the policymaps are appropriately plumbed to handle this logic.

For "override identities", the corresponding logic for policy integration is currently elsewhere. We could potentially decide that actually, rather than having that handled elsewhere, instead we rely on this notification from the ipcache. At a high level, this is saying that when some watcher or other agent logic learns that "IP X has identity Y", it informs the userspace IP informationbase ("IPCache") about this mapping, then IPCache is responsible for ensuring that all relevant calculation occurs, and occurs in the right order - that is to say, inform the policy engine + update bpf policy maps and then update the bpf ipcache map. I think this is a reasonable design and could potentially simplify some other logic.

That said, for now it's not necessary. I think your question is coming from the perspective "does it need to notify the policy engine, and if not, should we skip it?" which is quite a reasonable question to ask.. technically we could skip it for now. I think that'll make the logic slightly more complex. Would it hurt to notify the policy engine twice for the same identity updates? Well, that's a bit more tricky. 🤔 At a glance it looks like SelectorCache.UpdateIdentities() is idempotent so in a sense it should be OK. However, if an identity is added + deleted within a quick succession, and there are two different pieces of agent logic notifying the selectorcache at different times outside of a critical section then we could end up with inconsistent state in the selectorcache. So from that perspective it should probably be the responsibility of exactly one piece of code to notify the selectorcache about the identity update. We can ensure consistency by queuing events in order as they come in (which we already do).

^^ Part of the idea behind having a dedicated goroutine handling the ipcache updates like this is to force serialization of such events so that we don't end up in some intermediate inconsistent state in the selectorcache.

Footnotes

Then we still have the question of how that reference is eventually removed, which I covered in a bit more detail in one of the other discussion threads below. The principle is that this refcount would stay live as long as the corresponding ipcache entry stays live; if the ipcache entry already exists and we re-allocate the same identity then we need to make sure that this doesn't cause an identity reference leak. ↩

In order to keep the code simpler, I decided to not treat override identities special when it comes to notifying the policy engine and have InjectLabels do the notification for all identities. Let's see how that works out.

pkg/ipcache/types.go

pkg/ipcache/metadata.go

joestringer · 2022-10-11T23:54:32Z

pkg/ipcache/metadata.go

+			if !override {
+				// We allocated a new identity, which means we want to add it
+				// to the policy map


I guess we'll need to document the inverse of this in the UpsertIdentity() API: Callers from policy context must arrange for policymaps to be updated themselves.

Absolutely agree. At the moment, there is no separate method to inject override identities, but it might make sense to have one even just for documentation purposes.

pkg/ipcache/metadata.go

gandro · 2022-10-12T08:01:47Z

I would like to not have to complicate that core loop further by adding another fork in the logic, but at the same time I'm not sure whether we really have another good option.

Thanks for the review Joe! FWIW, I fully agree with this sentiment. I'm not a fan of the complex loop logic either and I would love to find a simpler approach too. I'm especially worried that any changes to it might cause regressions again.

gandro · 2022-11-03T13:24:24Z

This is ready for review again. A few points:

I have included some of the restructurings by Joe from ipcache: Add support to store identities internally #21675
I decided to route everything through the identity allocator. There are two caveats to this though:
i. I require the user to provide a numeric identity when overriding an identity. It is actually only used for logging purposes, but I think that should be helpful to uncover any inconsistencies.
ii. Relying on the identity reference count to remove the IPCache entry does not work if multiple IPCache entries share the same identity (which they do for reserved or global identities). Therefore, I have added a flag to the IPCache entry itself that tracks if it should be cleaned up by the metadata logic or if we expect a manual Delete call.
The mock allocator behaved in a manner that I did not expect. I changed it (see commit 312b0b8), but I would prefer to have an extra pair of eyes on that, since I'm not sure what the actual expected behavior should be.

joestringer · 2022-11-03T21:03:21Z

ii. Relying on the identity reference count to remove the IPCache entry does not work if multiple IPCache entries share the same identity (which they do for reserved or global identities). Therefore, I have added a flag to the IPCache entry itself that tracks if it should be cleaned up by the metadata logic or if we expect a manual Delete call.

Why doesn't each ipcache entry retain a reference to the identity in this case?

joestringer

LGTM. Worth looking a bit closer at some of the questions I have below around semantics for the callers. These are not critical right now since there's no users of the new interface yet, but it won't be long before we'd need to grapple with some of these questions.

pkg/testutils/identity/allocator.go

joestringer · 2022-11-03T21:33:02Z

pkg/ipcache/ipcache.go

+// This will trigger asynchronous calculation of any local identity changes
+// that must occur to associate the specified labels with the prefix, and push
+// any datapath updates necessary to implement the logic associated with the
+// metadata currently associated with the 'prefix'.


Given that the caller presumably holds a reference to the target (numeric) identity, but then this interface will asynchronously perform the updates itself, does the caller need to retain a reference until the underlying async logic runs? Or will this implementation grab a preliminary reference on the corresponding identity up until the async logic is run?

The concern I have is something like, the caller does:

Allocate identity

UpsertIdentity(...)

Release reference identity

Then, since UpsertIdentity(...) internally defers the insert logic, that could happen after the caller releases its reference on the identity, and then it becomes more likely that the ipcache will (re-)allocate a new numeric identity for these labels. This is not ideal.

For what it's worth, I think that the current code does handle this case gracefully, it will just log an extraneous warning that the user can't do much about. And I guess at least right now, there are no callers of this interface so there's no way to trigger this condition so it's not exactly a big deal. But it would be nice to make this expectation super clear for future callers.

If we used this from the endpoint logic, then the endpoint itself would already retain a reference to the identity so I think that case would be fine. Not sure about other users that directly allocate an identity.

Yeah, I was designing this interface originally with NodeManager in mind, which actually does not even locally allocate the identity it's passing in here, it just blindly trusts the remote node that it holds a reference to the identity it is claiming to have. So the code as I wrote it doesn't really care if the identity already exists or not, the numeric identity right now is only there to provide a warning, it doesn't have any functional purpose.

I'm myself a bit partial to the requirement that the caller of UpsertIdentity needs to provide a numeric identity (which implies prior allocation). Maybe it would be cleaner for UpsertIdentity to just require the labels as your PR originally did (and maybe have the numeric identity as an optional argument just for logging/troubleshooting purposes)

In the end, I guess we do assume that if the caller of UpsertIdentity needs to somehow know the numeric identity of a prefix, then we also assume that someone else is already holding the reference to it.

I guess in theory it could happen that code allocates an identity, schedules UpsertIdentity, then drops the identity again before InjectLabels (executed asynchronously) had a chance to run. That would re-assign the numeric identity. If that code assumes UpsertIdentity instantly keeps the reference/numeric identity alive, then yeah, that would likely be buggy behavior. On the other hand, decreasing an identity ref count while assuming the numeric identity remains valid is dangerous regardless of this API.

I've removed the numeric identity. That way, the interface is agnostic to the fact if the identity was already allocated or not.

I've also renamed the method to OverrideIdentity to distinguish it a bit more clearly from UpsertLabels.

pkg/ipcache/types/types.go

pkg/ipcache/types.go

joestringer · 2022-11-03T22:11:31Z

pkg/ipcache/types.go

+func (s prefixInfo) IdentityOverride() (types.IdentityOverride, labels.Labels) {
+	for _, v := range s {
+		if v.identityOverride.IsValid() {
+			return v.identityOverride, v.labels


This is probably OK in terms of resolving the conflict. IIRC range s won't guarantee the same item is picked if there are conflicts and the function is run multiple times. A more stable behaviour would be to pick out the conflicts, sort them, and pick an identity based on some stable heuristic.... but ideally the user will never hit conflicts because we recognize the potential for conflict ahead of time and put guard rails in place. So probably fine to just leave this for now, especially since there are warnings when conflicts are inserted.

Yeah, I think I mentioned somewhere in the doc to UpsertIdentity that a random one will be chosen upon conflict, which is due to this range s here being non-deterministic.

Looking at this again, I am afraid that having a random pick here this will likely just cause headaches if we ever need to troubleshoot conflicting entries. I think I'll change this to use a deterministic iteration order (since follow-up PRs for hostIP and encryptKey will also rely on the the same conflict resolution logic)

I added basic deterministic conflict resolution by sorting alphabetically. I'm honestly not sure what heuristic would be more useful in practice? Prefer CIDR identities over global ones over reserved ones? For two identities in the same category, which one should we pick?

Because I couldn't come up with a good rationale for anything, I decided to keep it simple for now and just sort alphabetically.

gandro · 2022-11-07T16:11:46Z

ii. Relying on the identity reference count to remove the IPCache entry does not work if multiple IPCache entries share the same identity (which they do for reserved or global identities). Therefore, I have added a flag to the IPCache entry itself that tracks if it should be cleaned up by the metadata logic or if we expect a manual Delete call.

Why doesn't each ipcache entry retain a reference to the identity in this case?

It does. Maybe I'm misunderstanding your question.

To rephrase my above sentence: The purpose of this newly added tracking is to ensure the IPCache entry gets deleted regardless of the identity ref count. If a prefix has no more metadata, then the corresponding IPCache entry needs to be deleted even if the identity refcount of the prefix is still potentially >0.

With CIDR identites, one could assume that each IPCache prefix->identity mapping retained exactly one identity, and that every CIDR identity was retained by exactly one identity (ignoring other subsystems).

Now that we have global/reserved identities handled by the logic as well, each IPCache entry still retains a reference to exactly one identity, but an identity might now be retained by multiple entries. Therefore, an IPCache entry might now need to be deleted even if its identity is still alive (due to it being retained by other entries).

See cilium#21667 (comment) for an in-depth discussion about this change. For local identities, this condition never evaluated to false, since identities are different for each prefix and thus no other prefix could have added a map entry for the previously allocated identity. A subsequent commit will cause InjectLabels to also allocate global identities. In that case, the condition is actively harmful, since it could cause an identity leak. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2022-11-10T13:15:40Z

Made the following changes (all in the last commit, besides a rebase on master). The overall logic of an "override" remains the same:

Renamed UpsertIdentity to OverrideIdentity to make it a bit more clear what it does
Removed the requirement to specify a numeric identity (which was only used for logging purposes). The callers now just provides labels.
Made conflict resolution more deterministic when multiple overrides are present (by sorting the labels)
Unit tests now check the identity reference count explicitly

gandro · 2022-11-15T09:51:06Z

/test

This renames the `id` variable to `oldID`, to better distinguish it from the `newID` variable and indicate that `oldID` is the one about to be replaced or deleted. This commit contains no functional changes. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

See cilium#21667 (comment) for an in-depth discussion about this change. For local identities, this condition never evaluated to false, since identities are different for each prefix and thus no other prefix could have added a map entry for the previously allocated identity. A subsequent commit will cause InjectLabels to also allocate global identities. In that case, the condition is actively harmful, since it could cause an identity leak. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

The new asynchronous IPCache API tries to determine the identity of a prefix based on the provided labels. However, in some cases, the sources providing the IPCache metadata (e.g. CiliumNode, CiliumEndpoint) already know the numeric identity of the prefix. In that case, we want to give that predefined global identity preference over any local identity we might derive from the labels. All other metadata is still tracked. If the override is ever removed from the prefix, the metadata will be used to determine the new prefix identity. Prefixes added via UpsertIdentity otherwise are treated similarly to ones upserted via UpsertMetadata: The policy maps are still updated for any newly added identity, and the identity is removed from policy maps if RemoveIdentity caused the identity to be deallocated. Co-authored-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro · 2023-01-24T17:18:52Z

@christarazi fixed the remaining issues with CIDR identity restoration. Thanks, Chris!

gandro · 2023-01-24T17:19:09Z

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 192.168.56.12:80 from testclient-272wn

Edit: Hit #21519.

christarazi · 2023-01-24T20:56:34Z

Travis hit #23311.

christarazi · 2023-01-25T07:21:48Z

@tklauser Just looking for @cilium/cli review here.

tklauser

Looks good for @cilium/cli

gandro · 2023-01-25T10:45:34Z

/ci-datapath

gandro · 2023-01-25T10:46:02Z

/ci-gke

gandro · 2023-01-26T12:59:01Z

GKE failed again with the encryption tests: https://github.com/cilium/cilium/actions/runs/4005134938/jobs/6875144115

They have been disabled in the meantime. Restarting

gandro · 2023-01-26T12:59:11Z

/ci-gke

gandro · 2023-01-26T15:37:26Z

CI passed, except two known flakes #21519 & #23311.

Marking ready to merge.

See #21667 (comment) for an in-depth discussion about this change. For local identities, this condition never evaluated to false, since identities are different for each prefix and thus no other prefix could have added a map entry for the previously allocated identity. A subsequent commit will cause InjectLabels to also allocate global identities. In that case, the condition is actively harmful, since it could cause an identity leak. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

gandro added the release-note/misc This PR makes changes that have no direct user impact. label Oct 11, 2022

gandro requested review from joestringer and christarazi October 11, 2022 14:49

gandro commented Oct 11, 2022

View reviewed changes

pkg/ipcache/metadata.go Outdated Show resolved Hide resolved

joestringer reviewed Oct 12, 2022

View reviewed changes

joestringer mentioned this pull request Oct 12, 2022

ipcache: Add support to store identities internally #21675

Closed

gandro mentioned this pull request Oct 24, 2022

Rework ipcache to handle metadata from multiple sources via a dedicated worker goroutine #21142

Open

29 tasks

gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch from f92a94a to 1e80655 Compare November 3, 2022 13:09

gandro marked this pull request as ready for review November 3, 2022 13:19

gandro requested review from a team as code owners November 3, 2022 13:19

gandro requested a review from ldelossa November 3, 2022 13:19

gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch 2 times, most recently from 9b33233 to b95b857 Compare November 3, 2022 14:01

joestringer approved these changes Nov 3, 2022

View reviewed changes

gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch 2 times, most recently from efc9baa to 368c21c Compare November 10, 2022 13:10

gandro mentioned this pull request Nov 10, 2022

node: Use new asynchronous IPCache API for Manager #22086

Closed

gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch from 368c21c to 2b4a051 Compare November 15, 2022 09:51

gandro and others added 3 commits January 24, 2023 17:42

gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch from 2b4a051 to 80504f0 Compare January 24, 2023 17:17

gandro marked this pull request as ready for review January 24, 2023 17:18

christarazi added kind/enhancement This would improve or streamline existing functionality. area/daemon Impacts operation of the Cilium daemon. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Jan 24, 2023

christarazi requested review from tklauser and removed request for ldelossa January 25, 2023 07:18

tklauser approved these changes Jan 25, 2023

View reviewed changes

gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 26, 2023

christarazi self-assigned this Jan 27, 2023

christarazi added this to the 1.14 milestone Jan 27, 2023

aanm merged commit f9173b7 into cilium:master Jan 27, 2023

squeed mentioned this pull request Apr 12, 2023

kube-apiserver IP adresses disappear from bpf ipcache 10 minutes after restarting cilium-agent #24502

Closed

2 tasks

This was referenced Aug 6, 2023

Convert daemon identity restore logic over to newer IPCache APIs #27255

Closed

ipcache: Fix refcounting with mix of APIs #27327

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ipcache: Add ability to override identity via UpsertMetadata #21667

ipcache: Add ability to override identity via UpsertMetadata #21667

gandro commented Oct 11, 2022

joestringer left a comment

joestringer Oct 11, 2022

gandro Oct 12, 2022

joestringer Oct 12, 2022 •

edited

gandro Nov 3, 2022

joestringer Oct 11, 2022

gandro Oct 12, 2022

gandro commented Oct 12, 2022

gandro commented Nov 3, 2022 •

edited

joestringer commented Nov 3, 2022

joestringer left a comment

joestringer Nov 3, 2022

joestringer Nov 3, 2022

gandro Nov 7, 2022

gandro Nov 10, 2022

joestringer Nov 3, 2022

gandro Nov 7, 2022

gandro Nov 10, 2022

gandro commented Nov 7, 2022 •

edited

gandro commented Nov 10, 2022

gandro commented Nov 15, 2022

gandro commented Jan 24, 2023

gandro commented Jan 24, 2023 •

edited by christarazi

Test Name

Failure Output

christarazi commented Jan 24, 2023

christarazi commented Jan 25, 2023

tklauser left a comment

gandro commented Jan 25, 2023

gandro commented Jan 25, 2023

gandro commented Jan 26, 2023

gandro commented Jan 26, 2023

gandro commented Jan 26, 2023

ipcache: Add ability to override identity via UpsertMetadata #21667

ipcache: Add ability to override identity via UpsertMetadata #21667

Conversation

gandro commented Oct 11, 2022

joestringer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joestringer Oct 12, 2022 • edited

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gandro commented Oct 12, 2022

gandro commented Nov 3, 2022 • edited

joestringer commented Nov 3, 2022

joestringer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gandro commented Nov 7, 2022 • edited

gandro commented Nov 10, 2022

gandro commented Nov 15, 2022

gandro commented Jan 24, 2023

gandro commented Jan 24, 2023 • edited by christarazi

Test Name

Failure Output

christarazi commented Jan 24, 2023

christarazi commented Jan 25, 2023

tklauser left a comment

Choose a reason for hiding this comment

gandro commented Jan 25, 2023

gandro commented Jan 25, 2023

gandro commented Jan 26, 2023

gandro commented Jan 26, 2023

gandro commented Jan 26, 2023

joestringer Oct 12, 2022 •

edited

gandro commented Nov 3, 2022 •

edited

gandro commented Nov 7, 2022 •

edited

gandro commented Jan 24, 2023 •

edited by christarazi