Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ipcache: Add ability to override identity via UpsertMetadata #21667

Conversation

gandro
Copy link
Member

@gandro gandro commented Oct 11, 2022

The new asynchronous IPCache API tries to determine the identity of a
prefix based on the provided labels. However, in some cases, the sources
providing the IPCache metadata (e.g. CiliumNode, CiliumEndpoint) already
know the numeric identity of the prefix. In that case, we want to give
that predefined global identity preference over any local identity we
might derive from the labels.

All other metadata is still tracked. If the override is ever removed
from the prefix, the metadata will be used to determine the new prefix
identity.

Currently, the override identity is always accepted, regardless whether
the numeric identity is known e.g. to the identity allocator. This
behavior mirrors the current semantics of the legacy IPCache.Upsert
method. In the future, we could consider blocking overrides until they
are actually known to the identity allocator.

Identities added via override do not trigger a policy map update. It is
assumed that the policy map update for those identities is handled
elsewhere. This again mirrors the current behavior of IPCache.Upsert.

@gandro gandro added the release-note/misc This PR makes changes that have no direct user impact. label Oct 11, 2022
pkg/ipcache/metadata.go Outdated Show resolved Hide resolved
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to not have to complicate that core loop further by adding another fork in the logic, but at the same time I'm not sure whether we really have another good option.

Right now, the logic is looking correct to me. It wouldn't hurt to stew it over another day and take another fresh look but overall this PR seems correct as-is, modulo one minor locking quirk.

I also posted #21675 for a comparison of a similar approach, but I think that your PR's handling of the identity reference counting is more mature than #21675. Please take a look either way, in case there's any useful insights or approaches there that could improve this PR.

I added a few discussion threads below but having spent a chunk of time with this one I'd be OK with dropping most of my one + rebase any other remaining bits on top of this.

Comment on lines 70 to 76
// overriddenIdentities tracks which prefixes have an override identity
// assigned to them and therefore do not need its identity released when
// updated or removed.
// overriddenIdentitiesMu protects overriddenIdentities and must be taken
// both before the metdata.RWMutex or the IPCache.Lock
overriddenIdentitiesMu lock.Mutex
overriddenIdentities map[netip.Prefix]struct{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡

Yeah the reference tracking is the trickiest part of this API. Broadly we have two alternatives I can think of, and it looks like we've explored both in our two PRs looking at solving this problem :-)

Here I recognize the more explicit form of having a separate map to track the prefix to avoid adding/removing reference counts for the identities that are provided explicitly. This has a couple of nice properties: it's less "magic", and there's no additional messing around with the reference counts.

The alternative is that the core logic takes in an identity, then internally adds its own reference counts to the identity. I had explored that option briefly here: 2d2bee8 , but based on the TODO comments I have in that commit it seems like I was already leaning towards an approach more like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfortunately not super familiar with identity allocation. I chose the separate map because it more closely mirrors the existing behavior. If I understand your implementation correctly, it's main idea is to determine the identity based on the labels instead of the numeric identity, correct?

My main worry here was that for CiliumNode objects (which can get assigned an override identity via CEW), we wouldn't have the labels available that lead to the identity being assigned to the node. But re-reading the code, it seems that the labels from the CEW are written into the CiliumNode object together with the assigned identity, so that should work too.

The other open question I have is around the policy map update. Here I'm out of my depth if it makes sense to update the policy maps for "override identities", which I assume are already added to the policy map via policy and if this is something we need to track here, or if it is something that is anyway already tracked by the selector cache.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah pretty much as you say. The idea is that we'll pull the labels out from the identity and then allocate the identity for that set of labels. The idea behind that is that the identity allocator would figure out that this is an existing identity and increment the refcount on that identity + return the identity that was already existing.1

Technically the "override identities" do not need to trigger policy update in this logic, correct. This is where the role of IPCache in the overall architecture is changing a bit in general - The new model says that IPCache itself is responsible for generating new identities (in the other case), and hence it must notify the policy engine for those new identities so that the policymaps are appropriately plumbed to handle this logic.

For "override identities", the corresponding logic for policy integration is currently elsewhere. We could potentially decide that actually, rather than having that handled elsewhere, instead we rely on this notification from the ipcache. At a high level, this is saying that when some watcher or other agent logic learns that "IP X has identity Y", it informs the userspace IP informationbase ("IPCache") about this mapping, then IPCache is responsible for ensuring that all relevant calculation occurs, and occurs in the right order - that is to say, inform the policy engine + update bpf policy maps and then update the bpf ipcache map. I think this is a reasonable design and could potentially simplify some other logic.

That said, for now it's not necessary. I think your question is coming from the perspective "does it need to notify the policy engine, and if not, should we skip it?" which is quite a reasonable question to ask.. technically we could skip it for now. I think that'll make the logic slightly more complex. Would it hurt to notify the policy engine twice for the same identity updates? Well, that's a bit more tricky. 🤔 At a glance it looks like SelectorCache.UpdateIdentities() is idempotent so in a sense it should be OK. However, if an identity is added + deleted within a quick succession, and there are two different pieces of agent logic notifying the selectorcache at different times outside of a critical section then we could end up with inconsistent state in the selectorcache. So from that perspective it should probably be the responsibility of exactly one piece of code to notify the selectorcache about the identity update. We can ensure consistency by queuing events in order as they come in (which we already do).

^^ Part of the idea behind having a dedicated goroutine handling the ipcache updates like this is to force serialization of such events so that we don't end up in some intermediate inconsistent state in the selectorcache.

Footnotes

  1. Then we still have the question of how that reference is eventually removed, which I covered in a bit more detail in one of the other discussion threads below. The principle is that this refcount would stay live as long as the corresponding ipcache entry stays live; if the ipcache entry already exists and we re-allocate the same identity then we need to make sure that this doesn't cause an identity reference leak.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to keep the code simpler, I decided to not treat override identities special when it comes to notifying the policy engine and have InjectLabels do the notification for all identities. Let's see how that works out.

pkg/ipcache/types.go Outdated Show resolved Hide resolved
pkg/ipcache/metadata.go Outdated Show resolved Hide resolved
Comment on lines 274 to 276
if !override {
// We allocated a new identity, which means we want to add it
// to the policy map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll need to document the inverse of this in the UpsertIdentity() API: Callers from policy context must arrange for policymaps to be updated themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely agree. At the moment, there is no separate method to inject override identities, but it might make sense to have one even just for documentation purposes.

pkg/ipcache/metadata.go Outdated Show resolved Hide resolved
@gandro
Copy link
Member Author

gandro commented Oct 12, 2022

I would like to not have to complicate that core loop further by adding another fork in the logic, but at the same time I'm not sure whether we really have another good option.

Thanks for the review Joe! FWIW, I fully agree with this sentiment. I'm not a fan of the complex loop logic either and I would love to find a simpler approach too. I'm especially worried that any changes to it might cause regressions again.

@gandro
Copy link
Member Author

gandro commented Nov 3, 2022

This is ready for review again. A few points:

  1. I have included some of the restructurings by Joe from ipcache: Add support to store identities internally #21675
  2. I decided to route everything through the identity allocator. There are two caveats to this though:
    i. I require the user to provide a numeric identity when overriding an identity. It is actually only used for logging purposes, but I think that should be helpful to uncover any inconsistencies.
    ii. Relying on the identity reference count to remove the IPCache entry does not work if multiple IPCache entries share the same identity (which they do for reserved or global identities). Therefore, I have added a flag to the IPCache entry itself that tracks if it should be cleaned up by the metadata logic or if we expect a manual Delete call.
  3. The mock allocator behaved in a manner that I did not expect. I changed it (see commit 312b0b8), but I would prefer to have an extra pair of eyes on that, since I'm not sure what the actual expected behavior should be.

@gandro gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch 2 times, most recently from 9b33233 to b95b857 Compare November 3, 2022 14:01
@joestringer
Copy link
Member

ii. Relying on the identity reference count to remove the IPCache entry does not work if multiple IPCache entries share the same identity (which they do for reserved or global identities). Therefore, I have added a flag to the IPCache entry itself that tracks if it should be cleaned up by the metadata logic or if we expect a manual Delete call.

Why doesn't each ipcache entry retain a reference to the identity in this case?

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Worth looking a bit closer at some of the questions I have below around semantics for the callers. These are not critical right now since there's no users of the new interface yet, but it won't be long before we'd need to grapple with some of these questions.

pkg/testutils/identity/allocator.go Show resolved Hide resolved
Comment on lines +522 to +518
// This will trigger asynchronous calculation of any local identity changes
// that must occur to associate the specified labels with the prefix, and push
// any datapath updates necessary to implement the logic associated with the
// metadata currently associated with the 'prefix'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the caller presumably holds a reference to the target (numeric) identity, but then this interface will asynchronously perform the updates itself, does the caller need to retain a reference until the underlying async logic runs? Or will this implementation grab a preliminary reference on the corresponding identity up until the async logic is run?

The concern I have is something like, the caller does:

  • Allocate identity
  • UpsertIdentity(...)
  • Release reference identity

Then, since UpsertIdentity(...) internally defers the insert logic, that could happen after the caller releases its reference on the identity, and then it becomes more likely that the ipcache will (re-)allocate a new numeric identity for these labels. This is not ideal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I think that the current code does handle this case gracefully, it will just log an extraneous warning that the user can't do much about. And I guess at least right now, there are no callers of this interface so there's no way to trigger this condition so it's not exactly a big deal. But it would be nice to make this expectation super clear for future callers.

If we used this from the endpoint logic, then the endpoint itself would already retain a reference to the identity so I think that case would be fine. Not sure about other users that directly allocate an identity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was designing this interface originally with NodeManager in mind, which actually does not even locally allocate the identity it's passing in here, it just blindly trusts the remote node that it holds a reference to the identity it is claiming to have. So the code as I wrote it doesn't really care if the identity already exists or not, the numeric identity right now is only there to provide a warning, it doesn't have any functional purpose.

I'm myself a bit partial to the requirement that the caller of UpsertIdentity needs to provide a numeric identity (which implies prior allocation). Maybe it would be cleaner for UpsertIdentity to just require the labels as your PR originally did (and maybe have the numeric identity as an optional argument just for logging/troubleshooting purposes)


In the end, I guess we do assume that if the caller of UpsertIdentity needs to somehow know the numeric identity of a prefix, then we also assume that someone else is already holding the reference to it.

I guess in theory it could happen that code allocates an identity, schedules UpsertIdentity, then drops the identity again before InjectLabels (executed asynchronously) had a chance to run. That would re-assign the numeric identity. If that code assumes UpsertIdentity instantly keeps the reference/numeric identity alive, then yeah, that would likely be buggy behavior. On the other hand, decreasing an identity ref count while assuming the numeric identity remains valid is dangerous regardless of this API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the numeric identity. That way, the interface is agnostic to the fact if the identity was already allocated or not.

I've also renamed the method to OverrideIdentity to distinguish it a bit more clearly from UpsertLabels.

pkg/ipcache/types/types.go Outdated Show resolved Hide resolved
pkg/ipcache/types.go Outdated Show resolved Hide resolved
func (s prefixInfo) IdentityOverride() (types.IdentityOverride, labels.Labels) {
for _, v := range s {
if v.identityOverride.IsValid() {
return v.identityOverride, v.labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably OK in terms of resolving the conflict. IIRC range s won't guarantee the same item is picked if there are conflicts and the function is run multiple times. A more stable behaviour would be to pick out the conflicts, sort them, and pick an identity based on some stable heuristic.... but ideally the user will never hit conflicts because we recognize the potential for conflict ahead of time and put guard rails in place. So probably fine to just leave this for now, especially since there are warnings when conflicts are inserted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think I mentioned somewhere in the doc to UpsertIdentity that a random one will be chosen upon conflict, which is due to this range s here being non-deterministic.

Looking at this again, I am afraid that having a random pick here this will likely just cause headaches if we ever need to troubleshoot conflicting entries. I think I'll change this to use a deterministic iteration order (since follow-up PRs for hostIP and encryptKey will also rely on the the same conflict resolution logic)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added basic deterministic conflict resolution by sorting alphabetically. I'm honestly not sure what heuristic would be more useful in practice? Prefer CIDR identities over global ones over reserved ones? For two identities in the same category, which one should we pick?

Because I couldn't come up with a good rationale for anything, I decided to keep it simple for now and just sort alphabetically.

@gandro
Copy link
Member Author

gandro commented Nov 7, 2022

ii. Relying on the identity reference count to remove the IPCache entry does not work if multiple IPCache entries share the same identity (which they do for reserved or global identities). Therefore, I have added a flag to the IPCache entry itself that tracks if it should be cleaned up by the metadata logic or if we expect a manual Delete call.

Why doesn't each ipcache entry retain a reference to the identity in this case?

It does. Maybe I'm misunderstanding your question.

To rephrase my above sentence: The purpose of this newly added tracking is to ensure the IPCache entry gets deleted regardless of the identity ref count. If a prefix has no more metadata, then the corresponding IPCache entry needs to be deleted even if the identity refcount of the prefix is still potentially >0.

With CIDR identites, one could assume that each IPCache prefix->identity mapping retained exactly one identity, and that every CIDR identity was retained by exactly one identity (ignoring other subsystems).

Now that we have global/reserved identities handled by the logic as well, each IPCache entry still retains a reference to exactly one identity, but an identity might now be retained by multiple entries. Therefore, an IPCache entry might now need to be deleted even if its identity is still alive (due to it being retained by other entries).

gandro added a commit to gandro/cilium that referenced this pull request Nov 10, 2022
See cilium#21667 (comment)
for an in-depth discussion about this change. For local identities, this
condition never evaluated to false, since identities are different for
each prefix and thus no other prefix could have added a map entry for
the previously allocated identity.

A subsequent commit will cause InjectLabels to also allocate global
identities. In that case, the condition is actively harmful, since it
could cause an identity leak.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch 2 times, most recently from efc9baa to 368c21c Compare November 10, 2022 13:10
@gandro
Copy link
Member Author

gandro commented Nov 10, 2022

Made the following changes (all in the last commit, besides a rebase on master). The overall logic of an "override" remains the same:

  • Renamed UpsertIdentity to OverrideIdentity to make it a bit more clear what it does
  • Removed the requirement to specify a numeric identity (which was only used for logging purposes). The callers now just provides labels.
  • Made conflict resolution more deterministic when multiple overrides are present (by sorting the labels)
  • Unit tests now check the identity reference count explicitly

@gandro gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch from 368c21c to 2b4a051 Compare November 15, 2022 09:51
@gandro
Copy link
Member Author

gandro commented Nov 15, 2022

/test

gandro and others added 3 commits January 24, 2023 17:42
This renames the `id` variable to `oldID`, to better distinguish it from
the `newID` variable and indicate that `oldID` is the one about to be
replaced or deleted.

This commit contains no functional changes.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
See cilium#21667 (comment)
for an in-depth discussion about this change. For local identities, this
condition never evaluated to false, since identities are different for
each prefix and thus no other prefix could have added a map entry for
the previously allocated identity.

A subsequent commit will cause InjectLabels to also allocate global
identities. In that case, the condition is actively harmful, since it
could cause an identity leak.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The new asynchronous IPCache API tries to determine the identity of a
prefix based on the provided labels. However, in some cases, the sources
providing the IPCache metadata (e.g. CiliumNode, CiliumEndpoint) already
know the numeric identity of the prefix. In that case, we want to give
that predefined global identity preference over any local identity we
might derive from the labels.

All other metadata is still tracked. If the override is ever removed
from the prefix, the metadata will be used to determine the new prefix
identity.

Prefixes added via UpsertIdentity otherwise are treated similarly to
ones upserted via UpsertMetadata: The policy maps are still updated for
any newly added identity, and the identity is removed from policy maps
if RemoveIdentity caused the identity to be deallocated.

Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/ipcache-override-identity-via-upsertmetadata branch from 2b4a051 to 80504f0 Compare January 24, 2023 17:17
@gandro
Copy link
Member Author

gandro commented Jan 24, 2023

@christarazi fixed the remaining issues with CIDR identity restoration. Thanks, Chris!

@gandro gandro marked this pull request as ready for review January 24, 2023 17:18
@gandro
Copy link
Member Author

gandro commented Jan 24, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 192.168.56.12:80 from testclient-272wn

Edit: Hit #21519.

@christarazi christarazi added kind/enhancement This would improve or streamline existing functionality. area/daemon Impacts operation of the Cilium daemon. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Jan 24, 2023
@christarazi
Copy link
Member

Travis hit #23311.

@christarazi christarazi requested review from tklauser and removed request for ldelossa January 25, 2023 07:18
@christarazi
Copy link
Member

@tklauser Just looking for @cilium/cli review here.

Copy link
Member

@tklauser tklauser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for @cilium/cli

@gandro
Copy link
Member Author

gandro commented Jan 25, 2023

/ci-datapath

@gandro
Copy link
Member Author

gandro commented Jan 25, 2023

/ci-gke

@gandro
Copy link
Member Author

gandro commented Jan 26, 2023

GKE failed again with the encryption tests: https://github.com/cilium/cilium/actions/runs/4005134938/jobs/6875144115

They have been disabled in the meantime. Restarting

@gandro
Copy link
Member Author

gandro commented Jan 26, 2023

/ci-gke

@gandro
Copy link
Member Author

gandro commented Jan 26, 2023

CI passed, except two known flakes #21519 & #23311.

Marking ready to merge.

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 26, 2023
@christarazi christarazi self-assigned this Jan 27, 2023
@christarazi christarazi added this to the 1.14 milestone Jan 27, 2023
@aanm aanm merged commit f9173b7 into cilium:master Jan 27, 2023
aanm pushed a commit that referenced this pull request Jan 27, 2023
See #21667 (comment)
for an in-depth discussion about this change. For local identities, this
condition never evaluated to false, since identities are different for
each prefix and thus no other prefix could have added a map entry for
the previously allocated identity.

A subsequent commit will cause InjectLabels to also allocate global
identities. In that case, the condition is actively harmful, since it
could cause an identity leak.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants