pkg/k8s: fallback on retrieving CiliumNode from kube-apiserver #22298

Merged: @aanm merged 1 commit into cilium:master from pr/fix-cn-cache-retrieval on Nov 22, 2022

@aanm (Member) commented Nov 21, 2022

pkg/k8s: fallback on retrieving CiliumNode from kube-apiserver

Retrieving objects from caches is useful to avoid unnecessary
requests to kube-apiserver. In the unlikely event that the object
doesn't exist in the local cache, Cilium can try to retrieve it from
kube-apiserver directly. In this particular case, with CiliumNode, the
cache miss was causing Cilium to exit with a fatal error because it was
unable to retrieve the CiliumNode from the cache due to subsystem
initialization issues. Thus, we now fall back on retrieving the object
directly from kube-apiserver.

In this case, the subsystem initialization issue happened because the
CiliumNode watcher's event handler is blocked by the
egressGatewayManager [1], which in turn is blocked by the initialization
of the identity allocator [2]. Unfortunately, the identity allocator is
only initialized at a later stage, which prevents the CiliumNode cache
from being populated with all of its nodes.
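The blocking chain can be illustrated with the sketch below, under simplifying assumptions: a hypothetical `watcher` adds each node to its cache and then runs its handler synchronously, and a channel (`identitiesReady`) stands in for the identity allocator finishing its initialization. It illustrates the ordering problem only and is not Cilium's watcher implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// watcher is a hypothetical, simplified model of a CiliumNode watcher:
// it adds each node to its cache and then runs the event handler
// synchronously, so a blocked handler stalls every later event.
type watcher struct {
	mu      sync.Mutex
	cache   map[string]bool
	handler func(name string)
}

// has reports whether a node has been added to the cache.
func (w *watcher) has(name string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.cache[name]
}

// run processes events one at a time and closes done when finished.
func (w *watcher) run(events <-chan string, done chan<- struct{}) {
	for name := range events {
		w.mu.Lock()
		w.cache[name] = true
		w.mu.Unlock()
		w.handler(name) // the next event waits until this returns
	}
	close(done)
}

func main() {
	// Stand-in for the identity allocator; closed only "at a later stage".
	identitiesReady := make(chan struct{})

	w := &watcher{
		cache: map[string]bool{},
		// Stand-in for an egress gateway handler waiting on identities.
		handler: func(string) { <-identitiesReady },
	}

	events := make(chan string, 2)
	events <- "node-1"
	events <- "node-2"
	close(events)

	done := make(chan struct{})
	go w.run(events, done)

	time.Sleep(50 * time.Millisecond)
	// node-2 cannot be cached yet: the handler for node-1 is still blocked.
	fmt.Println("node-2 cached before identities ready:", w.has("node-2"))

	close(identitiesReady) // the later initialization stage completes
	<-done
	fmt.Println("node-2 cached after identities ready:", w.has("node-2"))
}
```

Any lookup for node-2 made while the handler is blocked misses the cache, which is the situation the kube-apiserver fallback is meant to cover.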

[1] `k.egressGatewayManager.OnUpdateNode(n)` (pkg/k8s/watchers/cilium_node.go)

[2] `if err := manager.identityAllocator.WaitForInitialGlobalIdentities(identityCtx); err != nil {` (pkg/egressgateway/manager.go)

Fixes: 69e4c69 ("k8s: optimize API calls made to kube-apiserver")

Fix Cilium fatal error "Could not create or update CiliumNode resource, despite retries" in environments where `enable-ipv4-egress-gateway` is set.

@aanm added the kind/bug, release-note/bug, kind/regression, and needs-backport/1.12 labels Nov 21, 2022
@aanm requested a review from a team as a code owner November 21, 2022 20:25
@maintainer-s-little-helper (bot) added this to Needs backport from master in 1.12.5 Nov 21, 2022
@christarazi added the sig/k8s label Nov 21, 2022
[1] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/k8s/watchers/cilium_node.go#L56
[2] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/egressgateway/manager.go#L83

Signed-off-by: André Martins <andre@cilium.io>
@aanm (Member Author) commented Nov 21, 2022

/test

@aanm added the feature/egress-gateway label Nov 21, 2022
@michi-covalent (Contributor) left a comment

a magical one-line fix

@aanm (Member Author) commented Nov 22, 2022

Travis hit #21730

@aanm merged commit 3a5e985 into cilium:master Nov 22, 2022
@maintainer-s-little-helper (bot) moved this from Needs backport from master to Backport pending to v1.12 in 1.12.5 Nov 22, 2022
@jrajahalme added the backport-done/1.12 label and removed the backport-pending/1.12 label Nov 24, 2022
@maintainer-s-little-helper (bot) moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.5 Nov 24, 2022
@aanm deleted the pr/fix-cn-cache-retrieval branch November 24, 2022 19:46
Labels
- backport-done/1.12: The backport for Cilium 1.12.x for this PR is done.
- feature/egress-gateway: Impacts the egress IP gateway feature.
- kind/bug: This is a bug in the Cilium logic.
- kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
- release-note/bug: This PR fixes an issue in a previous release of Cilium.
- sig/k8s: Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
Projects
1.12.5: Backport done to v1.12

7 participants