
High-Scale IPcache: Chapter 1 #25148

Merged
merged 9 commits into from May 12, 2023

Conversation

pchaigno
Member

@pchaigno pchaigno commented Apr 26, 2023

This new feature has a CFP at cilium/design-cfps#7. The goal of this first PR is to define the new flag and remove all pod entries (see exceptions in CFP) from the ipcache when that flag is set.

Updates: #25243.

New high-scale ipcache mode to support clustermeshes with millions of pods.

@pchaigno pchaigno added the release-note/major This PR introduces major new functionality to Cilium. label Apr 26, 2023
@pchaigno pchaigno marked this pull request as ready for review April 27, 2023 07:33
@pchaigno pchaigno requested review from a team as code owners April 27, 2023 07:33
Contributor

@zacharysarah zacharysarah left a comment

@pchaigno Excellent code comments. ✨ LGTM from a docs perspective

Member

@kaworu kaworu left a comment

Helm changes LGTM

Member

@asauber asauber left a comment

CLI changes LGTM

@christarazi christarazi added area/daemon Impacts operation of the Cilium daemon. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Apr 28, 2023
pkg/k8s/watchers/cilium_endpoint.go (resolved review thread)
logger.Debug("Pod is using host networking")
return nil
}

if option.Config.EnableHighScaleIPcache && nodeTypes.GetName() != newPod.Spec.NodeName {
logger.Debug("Pod is not local; skipping ipcache upsert")
Member

nit: Pod watcher should only be listening for local pods anyway right? 🤔

Member

If you look at the callers of (k *K8sWatcher) createPodController, there are 2.

  1. Only listens for pods on the local node
  2. Listens for all pods on all nodes

(2) occurs when disable-endpoint-crd: true and presumably with kvstore mode. So I think we need this code (conceptually at least) to cover for (2).
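For illustration, a minimal client-go sketch of the difference between the two callers, using a generic shared-informer setup rather than Cilium's actual createPodController signature: case (1) scopes the watch to the local node with a spec.nodeName field selector, case (2) watches every pod.

```go
// Sketch only: scoping a pod informer to the local node with a field
// selector (case 1) vs. watching all pods (case 2). Generic client-go
// usage, not Cilium's createPodController.
package watchersketch

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

func newPodInformerFactory(cs kubernetes.Interface, localNodeName string, localOnly bool) informers.SharedInformerFactory {
	if !localOnly {
		// Case (2): pods on all nodes, e.g. with disable-endpoint-crd: true.
		return informers.NewSharedInformerFactory(cs, 30*time.Second)
	}
	// Case (1): only pods scheduled on this node.
	return informers.NewSharedInformerFactoryWithOptions(cs, 30*time.Second,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", localNodeName).String()
		}),
	)
}
```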

Member

On the other hand, we could declare kvstore mode (and disable-endpoint-crd: true) an unsupported configuration when high-scale ipcache mode is enabled, and drop this change here entirely, so that only case (1) is relevant.

daemon/cmd/state.go (resolved review threads)
@pchaigno pchaigno requested a review from christarazi May 2, 2023 10:06
Member

@christarazi christarazi left a comment

LGTM, followups are being tracked at: #25370

pchaigno and others added 9 commits May 11, 2023 14:57
This agent flag will be used in follow-up commits.

This new mode doesn't support the egress gateway for now because the
egress gateway relies on identities from the ipcache to know if a
destination is outside the cluster and if egress policies should
therefore apply. IPsec is also not supported as the key SPIs are
currently stored in the ipcache.

This new mode requires IPv6 to be disabled (our tunnels are only on
IPv4).

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
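As a rough illustration of what this commit describes, the sketch below registers such a flag and rejects the incompatible options. Apart from EnableHighScaleIPcache (visible in the diff excerpt above), the flag string, package, and field names are assumptions, not Cilium's actual option code.

```go
// Illustrative only: registering the flag and rejecting the incompatible
// options listed above.
package optionsketch

import (
	"fmt"

	"github.com/spf13/pflag"
)

// Assumed flag name for the sketch.
const EnableHighScaleIPcacheName = "enable-high-scale-ipcache"

type DaemonConfig struct {
	EnableHighScaleIPcache bool
	EnableIPv6             bool
	EnableIPSec            bool
	EnableEgressGateway    bool
}

func RegisterFlags(flags *pflag.FlagSet) {
	flags.Bool(EnableHighScaleIPcacheName, false,
		"Enable the high-scale ipcache mode for very large clustermeshes")
}

// Validate enforces the restrictions from the commit message: no egress
// gateway, no IPsec, and IPv6 must be disabled.
func (c *DaemonConfig) Validate() error {
	if !c.EnableHighScaleIPcache {
		return nil
	}
	switch {
	case c.EnableEgressGateway:
		return fmt.Errorf("%s does not support the egress gateway", EnableHighScaleIPcacheName)
	case c.EnableIPSec:
		return fmt.Errorf("%s does not support IPsec", EnableHighScaleIPcacheName)
	case c.EnableIPv6:
		return fmt.Errorf("%s requires IPv6 to be disabled", EnableHighScaleIPcacheName)
	}
	return nil
}
```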
This helper will be used in a subsequent commit to add identities of
remote pods in the ipcache only if they are well-known identities (e.g.,
kube-dns backend pods). This is necessary to allow FQDN policies to be
resolved.

Signed-off-by: Paul Chaignon <paul@cilium.io>
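A hypothetical sketch of the kind of helper this describes: given a numeric identity, decide whether it is well-known (e.g. the kube-dns backends) and therefore still worth inserting into the ipcache. The type, example values, and function name are illustrative only, not Cilium's identity package.

```go
// Hypothetical helper of the kind described above.
package identitysketch

type NumericIdentity uint32

// wellKnown maps identities of cluster-critical workloads (e.g. the
// kube-dns backends) that must stay in the ipcache so FQDN policies can
// still be resolved. Values here are placeholders.
var wellKnown = map[NumericIdentity]string{
	1000: "kube-dns", // placeholder numeric value
	1001: "core-dns", // placeholder numeric value
}

// IsWellKnown reports whether id belongs to a well-known workload.
func IsWellKnown(id NumericIdentity) bool {
	_, ok := wellKnown[id]
	return ok
}
```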
When using the high-scale IPcache mode, only reserved and well-known
identities are inserted in the ipcache. This is to avoid overflowing
the ipcache in very large clusters. We only really care about well-known
identities being in the ipcache so that we allow connectivity to
well-known pods (e.g., DNS pods).

To enforce ingress policies, identities will instead be carried over
tunnel metadata. On egress, policies will be enforced through FQDN rules.
These two changes will be implemented in subsequent commits.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
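Continuing the sketch above (same illustrative package), the upsert gate this commit describes could look roughly like this; the reserved-identity check is a placeholder, not Cilium's actual implementation.

```go
// shouldUpsertIntoIPCache gates ipcache insertion in high-scale mode.
func shouldUpsertIntoIPCache(highScaleMode bool, id NumericIdentity) bool {
	if !highScaleMode {
		return true // normal mode: every pod entry is mirrored into the ipcache
	}
	// High-scale mode: only reserved identities (host, world, remote-node, ...)
	// and well-known ones (e.g. the DNS pods) are admitted, to keep the
	// ipcache small in very large clustermeshes.
	return isReserved(id) || IsWellKnown(id)
}

func isReserved(id NumericIdentity) bool {
	// Placeholder bound; illustrative only.
	return id < 256
}
```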
Non-functional commit. Rename this function so that the name is clearer
in intent.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
endpointMetadataFetcher knows how to fetch Kubernetes metadata for
endpoints. Currently, there are two implementations:

1) Fetching from the K8s watcher store (cached)
2) Fetching directly from K8s itself (uncached)

(1) was the default method before this commit. With this commit, (2) is
possible when high-scale ipcache mode is enabled. This is because with
the aforementioned option enabled, the pod watcher is disabled, meaning
the store will be empty. In other words, there's no pod metadata that's
been cached, so we must fetch it directly during endpoint restoration.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
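An illustrative Go sketch of the two-implementation pattern described here: one fetcher backed by the watcher's store, one calling the kube-apiserver directly. The interface and type names are assumptions, not the actual endpointMetadataFetcher API.

```go
// Illustrative sketch of the cached vs. uncached fetcher pattern.
package metadatasketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

type podMetadataFetcher interface {
	Fetch(ctx context.Context, namespace, name string) (*corev1.Pod, error)
}

// cachedFetcher reads from the pod watcher's store; only usable when the
// pod watcher is running, i.e. not in high-scale ipcache mode.
type cachedFetcher struct{ store cache.Store }

func (f *cachedFetcher) Fetch(_ context.Context, namespace, name string) (*corev1.Pod, error) {
	obj, exists, err := f.store.GetByKey(namespace + "/" + name)
	if err != nil {
		return nil, err
	}
	if !exists {
		return nil, fmt.Errorf("pod %s/%s not found in store", namespace, name)
	}
	return obj.(*corev1.Pod), nil
}

// uncachedFetcher asks the kube-apiserver directly, which is what endpoint
// restoration must do when the pod watcher (and hence the store) is disabled.
type uncachedFetcher struct{ client kubernetes.Interface }

func (f *uncachedFetcher) Fetch(ctx context.Context, namespace, name string) (*corev1.Pod, error) {
	return f.client.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
}
```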
This is a preparatory commit for the following commit to add another
method for fetching pods.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
When high-scale ipcache mode is enabled, we cannot rely on the pod store
because the pod watcher is disabled. Instead, validate endpoints during
restoration by fetching the pod metadata directly from the
kube-apiserver.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
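A minimal sketch of that direct fetch with plain client-go: a NotFound from the kube-apiserver marks the restored endpoint as stale. The surrounding restoration logic and the function name are assumed.

```go
// Minimal sketch: validate a restored endpoint by looking up its pod.
package restoresketch

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// validateRestoredEndpoint returns false when the backing pod no longer
// exists, so the caller can drop the stale endpoint during restoration.
func validateRestoredEndpoint(ctx context.Context, cs kubernetes.Interface, namespace, podName string) (bool, error) {
	_, err := cs.CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return false, nil // pod is gone: endpoint should be cleaned up
	}
	if err != nil {
		return false, fmt.Errorf("fetching pod %s/%s: %w", namespace, podName, err)
	}
	return true, nil
}
```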
We don't need the pod watcher at all, because we don't need to insert
any pod IPs when in high-scale ipcache mode. We are already inserting
the necessary IPs into the ipcache from the CiliumEndpoint watcher.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno
Member Author

pchaigno commented May 11, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Test Name

K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

Failure Output

FAIL: Unexpected missed tail call

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/2123/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@pchaigno
Member Author

Due to the GitHub outage today, 3 CI jobs passed but couldn't set their status to success: ConformanceEKS, ConformanceAWS-CNI, and runtime. k8s-1.25-kernel-4.19 failed with known flake #24514.

The changes were reviewed. There are some followups we need to address (cf. #25370), but they are not so critical that we need to block this. This PR is also blocking 5 other PRs so let's address the followups as followups. Marking ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 11, 2023
@aanm aanm merged commit 37d5264 into cilium:main May 12, 2023
55 of 57 checks passed
@pchaigno pchaigno deleted the high-scale-ipcache branch May 12, 2023 13:29
@pchaigno pchaigno added the feature/high-scale-ipcache Relates to the high-scale ipcache feature. label May 16, 2023