Support policy matching against kube-apiserver #17823
The KubeAPIServer source will be used only when an IP is associated with the kube-apiserver. The primary user of this source will be ipcache.InjectLabels() (introduced in a subsequent commit), which will perform this check and overwrite the already passed-in source with the KubeAPIServer source if there's a kube-apiserver label.

The aforementioned overwrite can occur as follows:

* CN watcher associates the local node IPs with the corresponding reserved:host label and upserts them into the ipcache with the Local source.
* EP watcher associates a kube-apiserver backend IP with the corresponding reserved:kube-apiserver label and upserts it into the ipcache with the CustomResource source (as the default source).
* InjectLabels() sees that there has been an upsert with the reserved:kube-apiserver label, so it upgrades the source of that request to the KubeAPIServer source.

Because of the above, this source is now the strongest source throughout Cilium (greater than Local).

Signed-off-by: Chris Tarazi <chris@isovalent.com>
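The source-upgrade behavior described in this commit message can be sketched as follows. The `Source` type, its numeric ordering, and the `upgradeSource` helper are illustrative assumptions for this sketch, not Cilium's actual `source` package:

```go
package main

import "fmt"

// Source models a trust level for ipcache entries. The ordering below is an
// illustrative assumption that matches the commit message: KubeAPIServer
// outranks Local, which outranks CustomResource.
type Source int

const (
	CustomResource Source = iota + 1
	Local
	KubeAPIServer // strongest source, per the commit message
)

// upgradeSource mimics the InjectLabels() behavior described above: if the
// labels for an IP include "reserved:kube-apiserver", the passed-in source
// is overwritten with KubeAPIServer; otherwise it is left untouched.
func upgradeSource(src Source, labels map[string]struct{}) Source {
	if _, ok := labels["reserved:kube-apiserver"]; ok {
		return KubeAPIServer
	}
	return src
}

func main() {
	lbls := map[string]struct{}{
		"reserved:host":           {},
		"reserved:kube-apiserver": {},
	}
	// A Local upsert carrying the kube-apiserver label is upgraded.
	fmt.Println(upgradeSource(Local, lbls) == KubeAPIServer)
}
```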
This will be used in the subsequent commit to ensure ipcache.InjectLabels() can retry upon failure to allocate CIDR identities if the allocator is not yet initialized. Signed-off-by: Chris Tarazi <chris@isovalent.com>
Any update to the IdentityMetadata (IDMD) map must trigger an update to the ipcache. Since the IdentityMetadata map associates IPs / CIDRs with labels, if an update was made, we must check whether that update was related to the labels. If so, it might affect identities. Hence, we must allocate a new identity, trigger policy updating, and push it into the ipcache. This commit provides the mechanism to do so (InjectLabels()). In future commits, this will be triggered by other subsystems as they encounter IPs / CIDRs with which labels need to be associated.

Corresponding functions to find and delete IPs / CIDRs with specific labels are also provided, which include releasing any allocated identities and deleting from the ipcache.

Here's a diagram of how this works for the kube-apiserver policy matching:

```
+------------+  (1)        (1)  +------------+
| EP Watcher +-----+      +-----+ CN Watcher |
+-----+------+   W |      | W   +------+-----+
      |            |      |            |
      |            v      v            |
      |           +--------+           |
      |           |  IDMD  |           |
      |           +--------+           |
      |               ^                |
      |               |                |
      |            (3)|R               |
      | (2)  +--------+-------+  (2)   |
      +----->| Label Injector |<-------+
    Trigger  +--------+-------+  Trigger
                  (4) |W
                      v
                    +---+
                    |IPC|
                    +---+
```

legend:

* W means write
* R means read

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
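A minimal sketch of the IDMD idea described in this commit: a map from IP/CIDR prefix to the set of labels learned about it, where a change to the label set is the condition for triggering label injection. The `upsertMetadata` name and the bare global map are assumptions for illustration; the real map lives in the ipcache package behind a mutex:

```go
package main

import "fmt"

// labelSet is a set of label strings associated with a prefix.
type labelSet map[string]struct{}

// idMD sketches the IdentityMetadata map: prefix -> labels.
var idMD = map[string]labelSet{}

// upsertMetadata merges labels into the set for a prefix and reports whether
// the set changed. A "true" result is what would trigger label injection
// (identity allocation, policy update, ipcache push) in the real code.
func upsertMetadata(prefix string, labels ...string) bool {
	set, ok := idMD[prefix]
	if !ok {
		set = labelSet{}
		idMD[prefix] = set
	}
	changed := false
	for _, l := range labels {
		if _, exists := set[l]; !exists {
			set[l] = struct{}{}
			changed = true
		}
	}
	return changed
}

func main() {
	// First upsert introduces a new label: injection would be triggered.
	fmt.Println(upsertMetadata("192.0.2.1/32", "reserved:kube-apiserver"))
	// Re-upserting the same label is a no-op: nothing to inject.
	fmt.Println(upsertMetadata("192.0.2.1/32", "reserved:kube-apiserver"))
}
```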
By leveraging the special "kubernetes" Service / Endpoint object, we can learn which IPs the kube-apiserver is behind. These IPs will be upserted into the ipcache with an identity that is to-be-allocated. In order to allocate an identity for these IPs, the kube-apiserver-associated IPs are inserted into the IDMD map with the "reserved:kube-apiserver" label. Then the watcher will trigger the label injection via ipcache.TriggerLabelInjection().

We do not make a distinction on where the kube-apiserver is running, whether it's on the local node, on a remote node in the cluster, or outside of the cluster. All kube-apiserver-associated IPs will have the "reserved:kube-apiserver" label.

Note that we are not hooking into the delete event; we only subscribe to the add / update events. This is because we perform a diff between the kube-apiserver IPs currently associated inside the IDMD and the IPs the add / update event contains (the desired IPs), then remove the current IPs that are not in the desired set. This operation is equivalent to subscribing to the delete event and removing the "desired" IPs from the IDMD. However, we don't need to subscribe to the delete event because this special Service is automatically recreated / reconciled by the apiserver itself if the object is removed or edited.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
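The current-vs-desired diff that makes a delete handler unnecessary can be sketched as follows; `diffIPs` is a hypothetical helper name, not the actual watcher code:

```go
package main

import "fmt"

// diffIPs returns the currently-known kube-apiserver IPs that are absent
// from the desired set carried by an add/update event. These stale IPs are
// the ones to remove from the IDMD, which is why no separate delete event
// subscription is needed, per the commit message.
func diffIPs(current, desired []string) []string {
	want := make(map[string]struct{}, len(desired))
	for _, ip := range desired {
		want[ip] = struct{}{}
	}
	var stale []string
	for _, ip := range current {
		if _, ok := want[ip]; !ok {
			stale = append(stale, ip)
		}
	}
	return stale
}

func main() {
	// 10.0.0.1 is no longer a kube-apiserver backend: it must be removed.
	fmt.Println(diffIPs([]string{"10.0.0.1", "10.0.0.2"}, []string{"10.0.0.2"}))
}
```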
It is common for the kube-apiserver to run as host-network on a node in the cluster. In order to correctly classify traffic to the kube-apiserver with the kube-apiserver identity, we need to know which node IPs belong to the local node or to remote nodes. These node IPs will be associated with the "reserved:host" or "reserved:remote-node" label, respectively, in the IDMD map. Once the node manager has made these associations, we need to trigger label injection via ipcache.TriggerLabelInjection() in order to figure out if a new identity must be allocated and whether we need to trigger policy calculation.

After this, if

* the local host has the kube-apiserver running, it will have an IP with the labels "reserved:host,reserved:kube-apiserver". This traffic will be classified as "host" traffic with ID 1. This does not trigger a new identity allocation, but ID 1 will now contain the "reserved:kube-apiserver" label. This requires updating the selector cache, as ID 1 itself has changed, which is handled in ipcache.TriggerLabelInjection().
* a remote node has the kube-apiserver running, it will have an IP with the labels "reserved:remote-node,reserved:kube-apiserver". This traffic will be classified as "kube-apiserver" traffic with the newly created ID of 7. No further policy recalculation is needed in this case, as ID 7 already exists.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
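The label associations described in this commit can be sketched as a simple merge; `mergeNodeLabels` is an illustrative name, and only the label strings themselves are taken from the commit message:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mergeNodeLabels sketches the association described above: a node IP gets
// "reserved:host" (local node) or "reserved:remote-node" (remote node), plus
// "reserved:kube-apiserver" when the apiserver runs on that node.
func mergeNodeLabels(isLocal, hasAPIServer bool) string {
	var labels []string
	if isLocal {
		labels = append(labels, "reserved:host")
	} else {
		labels = append(labels, "reserved:remote-node")
	}
	if hasAPIServer {
		labels = append(labels, "reserved:kube-apiserver")
	}
	sort.Strings(labels)
	return strings.Join(labels, ",")
}

func main() {
	// Local node running the apiserver: still classified as host (ID 1).
	fmt.Println(mergeNodeLabels(true, true))
	// Remote node running the apiserver: kube-apiserver identity (ID 7).
	fmt.Println(mergeNodeLabels(false, true))
}
```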
Create dedicated functions for checking whether an identity represents any node in the cluster or a remote node in the cluster. This will be useful for an upcoming commit where a remote node may have the REMOTE_NODE_ID identity or alternatively another hardcoded identity. Signed-off-by: Joe Stringer <joe@cilium.io>
This identity will be used to identify remote nodes which also have the kube-apiserver co-located, and allows policy at the higher layer to differentiate nodes with this identity vs. other nodes in the cluster. Signed-off-by: Joe Stringer <joe@cilium.io>
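The helper functions from these two commits might look roughly like this. The numeric values match Cilium's reserved identity numbering as referenced in the surrounding commit messages (host = 1, kube-apiserver = 7; remote-node = 6 is an assumption here), but the function names and bodies are a sketch:

```go
package main

import "fmt"

// Reserved numeric identities; 7 (kube-apiserver) is the one introduced by
// this PR for remote nodes co-located with the kube-apiserver.
const (
	identityHost          = 1
	identityRemoteNode    = 6
	identityKubeAPIServer = 7
)

// isRemoteNodeIdentity reports whether an identity represents a remote node
// in the cluster, covering both plain remote nodes and remote nodes that
// also host the kube-apiserver -- the distinction these commits introduce.
func isRemoteNodeIdentity(id int) bool {
	return id == identityRemoteNode || id == identityKubeAPIServer
}

// isNodeIdentity reports whether an identity represents any node in the
// cluster, local or remote.
func isNodeIdentity(id int) bool {
	return id == identityHost || isRemoteNodeIdentity(id)
}

func main() {
	fmt.Println(isNodeIdentity(identityHost), isRemoteNodeIdentity(identityHost))
	fmt.Println(isNodeIdentity(identityKubeAPIServer), isRemoteNodeIdentity(identityKubeAPIServer))
}
```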
Now that the kube-apiserver policy matching has been implemented, expose the user-facing policy API by allowing the user to express: ``` egress: toEntities: - kube-apiserver ``` Signed-off-by: Chris Tarazi <chris@isovalent.com>
This is to minimize the likelihood of confusion between 'prg' and 'prog'. Signed-off-by: Joe Stringer <joe@cilium.io>
This should not ultimately be a global, but until issue 17849 is resolved it needs to be. At least ensure that it's not accessed outside the package, since it must be handled while holding the 'idMDMU' lock immediately above. Signed-off-by: Joe Stringer <joe@cilium.io>
Add a short blurb to describe the relationship between the IdentityMetadata map and the rest of the Cilium agent. For a more detailed design, see earlier commits, in particular: "ipcache, policy: Inject labels from identity metadata". Signed-off-by: Joe Stringer <joe@cilium.io>
/test

Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed and has not been observed before, so may be related to your PR.
```go
		if l := RemoveLabels(prefix, lbls, src); l != nil {
			la = l.LabelArray()
		}
		idsToDelete[id.ID] = la
		if len(lbls) > 0 {
			// If for example kube-apiserver label is removed from
			// a remote-node, then RemoveLabels() will return a
			// non-empty set representing the new full set of
			// labels to associate with the node. In order to
			// propagate the new identity, we must emit a delete
			// event for the old identity and then an add event for
			// the new identity.
			idsToAdd[id.ID] = la
		}
	}
	if len(idsToDelete) > 0 {
		var wg sync.WaitGroup
		// SelectorCache.UpdateIdentities() asks for callers to avoid
		// handing the same identity in both 'adds' and 'deletes'
		// parameters here, so make two calls. These changes will not
		// be propagated to the datapath until later.
		updater.UpdateIdentities(nil, idsToDelete, &wg)
		updater.UpdateIdentities(idsToAdd, nil, &wg)
```
Looking back through my git stash, I think there's a corner case here that is maybe not currently possible, but could hit someone really weirdly in future:

`id` above this block is some identity allocated based on a series of labels, like `reserved:kube-apiserver,cidr:w.x.y.z/n,...`. `RemoveLabels` will potentially then subtract one or more of those labels out of the list, so `l` ends up with, say, only `cidr:w.x.y.z/n`. There's no guarantee that there is any identity associated with that remaining set of labels at this point. Least of which, the identity `id` does not correspond to the new `l` set of labels.

From a quick peruse of the code, I think that `SelectorCache.UpdateIdentities()` will warn in this case, but I can't tell what the behaviour would actually be. Maybe one case to consider is if you had the apiserver on a remote node in the cluster, then you removed the kube-apiserver from that node (e.g. shifted it to the current node). What happens to the selector cache in that case? Do we suddenly change the cached labels for this identity?

cc @christarazi
The K8s cache is also not fully synced when k8sSyncedChecker is still nil. Found during code inspection; not sure whether this can manifest as a bug at runtime.

Fixes: cilium#17823

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
See commit msgs, implementation by @christarazi.
This PR implements support for automated policy matching against the
kube-apiserver. It does so by detecting where the kube-apiserver is running,
whether it's inside the cluster or not.
Tasks:
To follow up:

* `cluster` entity meaning (to include kube-apiserver)
* Support multiple apiserver backends (should be mostly just implementing the delete watcher function)

Based on PR #17714. I reviewed that PR and addressed all important comments that I had directly on the PR.
Fixes: #14724