Get CEP from k8s cache during initialization. #24340
Conversation
/test
Getting the CEP from the k8s cache is much cheaper than reading it from etcd. Now, during initialization, we try to get it from the cache first. If that fails, the next read goes to etcd. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Force-pushed 265a9f6 to b91758c
if firstTry {
    // First we try getting CEP from the API server cache, as it's cheaper.
    // If it fails we get it from etcd to be sure to have fresh data.
    localCEP, err = ciliumClient.CiliumEndpoints(namespace).Get(ctx, podName, meta_v1.GetOptions{ResourceVersion: "0"})
Why does this make the Get come from the cache?
Because of `ResourceVersion: "0"`:
https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-get-and-list
This is also what the informer does by default in its ListAndWatch code. It is a quite standard pattern, take a look here: https://github.com/kubernetes/kubernetes/blob/4557c694ef42da6c63f42ac4c81bea5716b471b7/pkg/controller/controller_utils.go#L1036
Normally, without a resourceVersion specified, the request goes directly to etcd and through raft consensus. With resourceVersion=0 it is answered from the Kubernetes apiserver's watch cache instead. This smallish change can make a significant difference when cilium-agent is restarting and all agents are trying to get their CEP from k8s at the same time, spamming the apiserver with requests.
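To make the control flow concrete, here is a minimal, self-contained sketch of the cache-first/etcd-fallback pattern the diff implements. The types and names (`GetOptions`, `fakeClient`, `getCEP`) are hypothetical stand-ins for the client-go/Cilium equivalents so the snippet runs on its own; only the `ResourceVersion: "0"` semantics mirror the real API.

```go
package main

import (
	"errors"
	"fmt"
)

// GetOptions is a stand-in for meta_v1.GetOptions (hypothetical,
// so the sketch is self-contained).
type GetOptions struct{ ResourceVersion string }

// fakeClient simulates an apiserver whose watch cache may be stale.
type fakeClient struct{ cacheStale bool }

func (c *fakeClient) Get(name string, opts GetOptions) (string, error) {
	if opts.ResourceVersion == "0" {
		// Served from the apiserver watch cache: cheap, possibly stale.
		if c.cacheStale {
			return "", errors.New("not found in cache")
		}
		return "cep-from-cache", nil
	}
	// No resourceVersion: quorum read through etcd, always fresh.
	return "cep-from-etcd", nil
}

// getCEP tries the cache first, then falls back to a quorum read,
// mirroring the firstTry logic in the PR.
func getCEP(c *fakeClient) string {
	for firstTry := true; ; firstTry = false {
		opts := GetOptions{}
		if firstTry {
			opts.ResourceVersion = "0" // answered by the apiserver cache
		}
		cep, err := c.Get("pod-1", opts)
		if err == nil {
			return cep
		}
		if !firstTry {
			return "" // give up: the etcd read also failed
		}
	}
}

func main() {
	fmt.Println(getCEP(&fakeClient{cacheStale: false})) // cache hit
	fmt.Println(getCEP(&fakeClient{cacheStale: true}))  // falls back to etcd
}
```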
Makes sense to me.
I am wondering whether there is any specified consistency behaviour for separate resources of the same name (i.e. different UIDs). The endpoint sync depends on CiliumEndpoint.UID to prevent certain types of race conditions related to ambiguous "ownership" of the agent/controller-instance -> CEP mapping. There might be some weird cases if it's possible for the server to return resources that no longer actually exist.
I couldn't find any specification on that in the docs.
For example:
- The agent restores an endpoint without a CEP UID reference. The CEP resource no longer exists, but is still returned by the cache, and it is a "local" CEP (i.e. referencing an endpoint on the same node).
- The sync code may try to backfill the CEP UID ref of the restored endpoint. In this case, the endpoint will pick up the old CEP's UID, which will be written out to disk in the endpoint restore header.
- Patch updates to the CEP status are checked to ensure they have the same UID. In this case, the error will be k8serrors.NotFound (I think??).
- I think the EPS controller might just loop, with the wrong CEP UID, forever?

edit: I think the same thing might happen if you do a restore from an old endpoint header (without a CEP UID) and the returned CEP is both local and an old version; then it might also get stuck failing to patch its CEP.
This bit of code is particularly annoying; it is only necessary for upgrade paths.
Currently the only version of Cilium guaranteed to write the CEP UID into the endpoint restore header state is v1.13. So I think in v1.14 we can remove this code, which will also remove these annoying edge cases.
I think if we just remove that code going forward we should be fine; that should clear up the issues I mentioned.
I think every other scenario just ends with the controller retrying until it is eventually consistent.
(I'll have to double check our upgrade path practices.)
If the patch fails, wouldn't we just zero out localCEP and retry fetching it again?
localCEP, err = ciliumClient.CiliumEndpoints(namespace).Patch(
Yeah right, good point. I forgot this falls back through to the needsInit block in that case.
/test