High-Scale IPcache: Chapter 1 #25148
Conversation
fccd9c2 to dbe6ee1
@pchaigno Excellent code comments. ✨ LGTM from a docs perspective
Helm changes LGTM
CLI changes LGTM
pkg/k8s/watchers/pod.go
Outdated
		logger.Debug("Pod is using host networking")
		return nil
	}

	if option.Config.EnableHighScaleIPcache && nodeTypes.GetName() != newPod.Spec.NodeName {
		logger.Debug("Pod is not local; skipping ipcache upsert")
nit: Pod watcher should only be listening for local pods anyway right? 🤔
If you look at the callers of (k *K8sWatcher) createPodController, there are 2:
1. Only listens for pods on the local node
2. Listens for all pods on all nodes
(2) occurs when disable-endpoint-crd: true and presumably with kvstore mode. So I think we need this code (conceptually at least) to cover (2).
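For illustration, a minimal sketch of how case (1) is usually achieved with client-go: scope the pod watch to the local node with a field selector, while case (2) simply omits the selector. The function and package names here are assumptions for the sketch, not Cilium's actual createPodController code.

// Sketch only: a pod informer scoped to the local node (case 1) or
// watching all pods (case 2). Not Cilium's actual watcher wiring.
package watchersketch

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func newPodInformer(client kubernetes.Interface, nodeName string, localOnly bool) cache.SharedIndexInformer {
	opts := []informers.SharedInformerOption{}
	if localOnly {
		// Case (1): only pods scheduled on this node are delivered.
		opts = append(opts, informers.WithTweakListOptions(func(o *metav1.ListOptions) {
			o.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", nodeName).String()
		}))
	}
	// Case (2): with no field selector, every pod on every node is watched.
	factory := informers.NewSharedInformerFactoryWithOptions(client, 30*time.Second, opts...)
	return factory.Core().V1().Pods().Informer()
}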
On the other hand, we could say that kvstore mode (and disable-endpoint-crd: true) is not a supported configuration when high-scale ipcache mode is enabled, and completely drop this change here, so that only case (1) is relevant.
LGTM, follow-ups are being tracked at #25370.
This agent flag will be used in follow-up commits. This new mode doesn't support the egress gateway for now because the egress gateway relies on identities from the ipcache to know if a destination is outside the cluster and whether egress policies should therefore apply. IPsec is also not supported, as the key SPIs are currently stored in the ipcache. This new mode requires IPv6 to be disabled (our tunnels are IPv4-only). Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
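For illustration only, a rough sketch of the kind of config-validation guard these compatibility constraints imply. All field and function names below are hypothetical (EnableHighScaleIPcache appears in the diff above, but the rest is not Cilium's actual DaemonConfig).

// Sketch only: illustrates the compatibility checks described in the
// commit message. Field names are hypothetical, not Cilium's options.
package configsketch

import "fmt"

type Config struct {
	EnableHighScaleIPcache bool
	EnableIPv6             bool
	EnableIPsec            bool
	EnableEgressGateway    bool
}

// validateHighScaleIPcache rejects configurations the high-scale ipcache
// mode does not support: IPv6 (tunnels are IPv4-only), IPsec (key SPIs
// live in the ipcache), and the egress gateway (it needs ipcache
// identities to detect destinations outside the cluster).
func validateHighScaleIPcache(c *Config) error {
	if !c.EnableHighScaleIPcache {
		return nil
	}
	switch {
	case c.EnableIPv6:
		return fmt.Errorf("high-scale ipcache requires IPv6 to be disabled")
	case c.EnableIPsec:
		return fmt.Errorf("high-scale ipcache is not compatible with IPsec")
	case c.EnableEgressGateway:
		return fmt.Errorf("high-scale ipcache is not compatible with the egress gateway")
	}
	return nil
}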
This helper will be used in a subsequent commit to add identities of remote pods in the ipcache only if they are well-known identities (e.g., kube-dns backend pods). This is necessary to allow FQDN policies to be resolved. Signed-off-by: Paul Chaignon <paul@cilium.io>
When using the high-scale IPcache mode, only reserved and well-known identities are inserted in the ipcache. This is to avoid overflowing the ipcache in very large clusters. We only really care about well-known identities being in the ipcache so that we allow connectivity to well-known pods (e.g., DNS pods). To enforce ingress policies, identities will instead be carried over tunnel metadata. On egress, policies will be enforced through FQDN rules. These two changes will be implemented in subsequent commits. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
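To make the gating concrete, here is a minimal sketch of upserting an IP-to-identity mapping only when the identity is reserved or well-known. The identity values and the map-based ipcache are illustrative assumptions, not Cilium's actual identity allocator or ipcache implementation.

// Sketch only: keep only reserved/well-known identities in the ipcache.
// Identity values here are illustrative, not Cilium's numbering.
package ipcachesketch

type NumericIdentity uint32

// Hypothetical example values; the real set is defined by Cilium.
const (
	identityHost    NumericIdentity = 1
	identityWorld   NumericIdentity = 2
	identityKubeDNS NumericIdentity = 42
)

// reservedOrWellKnown is the small, fixed set of identities we still keep
// in the ipcache, e.g. DNS backend pods, so connectivity to them works.
var reservedOrWellKnown = map[NumericIdentity]struct{}{
	identityHost:    {},
	identityWorld:   {},
	identityKubeDNS: {},
}

// maybeUpsert skips all other pod identities: on ingress they are carried
// in tunnel metadata, and on egress FQDN rules enforce policy instead.
func maybeUpsert(cache map[string]NumericIdentity, ip string, id NumericIdentity) {
	if _, ok := reservedOrWellKnown[id]; !ok {
		return
	}
	cache[ip] = id
}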
Non-functional commit. Rename this function so that the name is clearer in intent. Signed-off-by: Chris Tarazi <chris@isovalent.com>
endpointMetadataFetcher knows how to fetch Kubernetes metadata for endpoints. Currently, there are two implementations: 1) Fetching from the K8s watcher store (cached) 2) Fetching directly from K8s itself (uncached) (1) was the default method before this commit. With this commit, (2) is possible when high-scale ipcache mode is enabled. This is because with the aforementioned option enabled, the pod watcher is disabled, meaning the store will be empty. In other words, there's no pod metadata that's been cached, so we must fetch it directly during endpoint restoration. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>
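To picture the cached variant, here is a rough sketch of a fetcher interface backed by the watcher's store. The interface and type names are hypothetical; only the client-go store API is real.

// Sketch only: a hypothetical pod-metadata fetcher with a store-backed
// (cached) implementation. Requires the pod watcher to be running.
package fetchersketch

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/cache"
)

// podFetcher abstracts how pod metadata is looked up for an endpoint.
type podFetcher interface {
	Fetch(namespace, name string) (*v1.Pod, error)
}

// cachedFetcher reads from the pod watcher's store; it only works when
// the watcher is running and has populated the store.
type cachedFetcher struct {
	store cache.Store
}

func (f *cachedFetcher) Fetch(namespace, name string) (*v1.Pod, error) {
	obj, exists, err := f.store.GetByKey(namespace + "/" + name)
	if err != nil {
		return nil, err
	}
	if !exists {
		return nil, fmt.Errorf("pod %s/%s not found in store", namespace, name)
	}
	pod, ok := obj.(*v1.Pod)
	if !ok {
		return nil, fmt.Errorf("unexpected object type for %s/%s", namespace, name)
	}
	return pod, nil
}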
This is a preparatory commit for the following commit to add another method for fetching pods. Signed-off-by: Chris Tarazi <chris@isovalent.com>
When high-scale ipcache mode is enabled, we cannot rely on the pod store because the pod watcher is disabled. Instead, validate endpoints during restoration by fetching the pod metadata directly from the kube-apiserver. Signed-off-by: Chris Tarazi <chris@isovalent.com>
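And the matching sketch of the uncached path used when the pod watcher is disabled: fetch the pod straight from the kube-apiserver with client-go. As above, the surrounding names are assumptions for the sketch.

// Sketch only: an uncached fetcher that queries the kube-apiserver
// directly, for use when the pod watcher (and thus the store) is off.
package fetchersketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type uncachedFetcher struct {
	client kubernetes.Interface
}

func (f *uncachedFetcher) Fetch(namespace, name string) (*v1.Pod, error) {
	// One GET per restored endpoint: acceptable at restoration time, but
	// more expensive than a local store lookup.
	return f.client.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
}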
We don't need the pod watcher at all, because we don't need to insert any pod IPs when in high-scale ipcache mode. We are already inserting the necessary IPs into the ipcache from the CiliumEndpoint watcher. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
dbe6ee1 to 40fa030
/test
Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:
Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/2123/
If it is a flake and a GitHub issue doesn't already exist to track it, comment. Then please upload the Jenkins artifacts to that issue.
Due to the GitHub outage today, 3 CI jobs passed but couldn't set their status.
The changes were reviewed. There are some follow-ups we need to address (cf. #25370), but they are not so critical that we need to block this. This PR is also blocking 5 other PRs, so let's address the follow-ups as follow-ups. Marking ready to merge.
This new feature has a CFP at cilium/design-cfps#7. The goal of this first PR is to define the new flag and remove all pod entries (see exceptions in CFP) from the ipcache when that flag is set.
Updates: #25243.