
[v1.8] Delete Cilium Endpoints for no longer running Pods #13592

Merged (2 commits) Oct 20, 2020

Conversation

aanm (Member) commented Oct 15, 2020

Since the backport #13263 was not enough, we need to backport a couple more commits from the original PR #13220.

These commits are also required because a leftover Pod might no longer have a local Cilium endpoint available. This means that, on restore, the leftover Cilium Endpoint will never be removed by the Cilium agent. Thus, garbage collecting these Cilium Endpoints becomes the job of the Cilium Operator.

Delete Cilium Endpoints for no longer running Pods

Fixes #13240

@aanm aanm added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. backport/1.8 labels Oct 15, 2020
@aanm aanm requested a review from a team as a code owner October 15, 2020 17:19
@maintainer-s-little-helper maintainer-s-little-helper bot added the kind/backports This PR provides functionality previously merged into master. label Oct 15, 2020
aanm (Member, Author) commented Oct 15, 2020

test-backport-1.8

aanm (Member, Author) commented Oct 16, 2020

retest-net-next

[ upstream commit 2988760 ]

This field is essential to understand whether the pod is still running.

Signed-off-by: André Martins <andre@cilium.io>
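The commit above hinges on the Pod's phase field (Pod.Status.Phase in the Kubernetes API). A minimal sketch of that check, with a hypothetical helper name (podIsRunning is illustrative, not Cilium's actual function):

```go
package main

import "fmt"

// PodPhase mirrors the Kubernetes corev1.PodPhase string type.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// podIsRunning reports whether a pod with the given phase should still
// hold a Cilium Endpoint. "Succeeded" and "Failed" are terminal phases,
// so such pods no longer need one.
func podIsRunning(phase PodPhase) bool {
	return phase != PodSucceeded && phase != PodFailed
}

func main() {
	for _, p := range []PodPhase{PodPending, PodRunning, PodSucceeded, PodFailed} {
		fmt.Printf("phase=%s running=%v\n", p, podIsRunning(p))
	}
}
```
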
[ upstream commit b3adc4d ]

In Kubernetes, a Job creates a pod which will complete with either
the "Succeeded" or "Failed" PodPhase. Kubernetes will leave these
Pods around until the Job is deleted by the operator. As soon as the
pod enters either one of the previously described PodPhases, Kubelet
will send a CNI delete event to the Cilium agent, which will then
release the allocated IP addresses of that pod, making the IP
addresses available again.

If not disabled, Cilium will create a Cilium Endpoint for each Pod in
the cluster that has its network managed by Cilium.

The Cilium agent populates the ipcache with the information retrieved
from Pod and Cilium Endpoint events; in case of duplicated
information, the ipcache keeps the state from Cilium Endpoints.

In the unlikely case that the Cilium agent is not running when the Pod
enters the "Succeeded" state, the Cilium agent will not be available
to delete the Cilium Endpoint created for that Pod.

To complement this fix, Cilium agents will also prune, on startup,
Cilium Endpoints of pods that are no longer running.

Signed-off-by: André Martins <andre@cilium.io>
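The garbage collection described in this commit can be sketched as follows. This is a simplified model, not Cilium's actual code: the real operator uses client-go listers and the CiliumEndpoint CRD client, and gcStaleEndpoints is a hypothetical name.

```go
package main

import "fmt"

// Pod is a minimal stand-in for the Kubernetes Pod object; only the
// fields relevant to the GC decision are modeled here.
type Pod struct {
	Name  string
	Phase string // "Pending", "Running", "Succeeded", or "Failed"
}

// gcStaleEndpoints returns the names of Cilium Endpoints that should
// be deleted: those whose pod is gone entirely, or whose pod has
// reached a terminal PodPhase ("Succeeded" or "Failed").
func gcStaleEndpoints(endpoints []string, pods map[string]Pod) []string {
	var stale []string
	for _, ep := range endpoints {
		pod, ok := pods[ep] // a Cilium Endpoint shares its pod's name
		if !ok || pod.Phase == "Succeeded" || pod.Phase == "Failed" {
			stale = append(stale, ep)
		}
	}
	return stale
}

func main() {
	pods := map[string]Pod{
		"web-0":   {Name: "web-0", Phase: "Running"},
		"job-xyz": {Name: "job-xyz", Phase: "Succeeded"},
	}
	endpoints := []string{"web-0", "job-xyz", "deleted-pod"}
	// job-xyz completed and deleted-pod no longer exists, so both
	// of their endpoints are stale; web-0 is kept.
	fmt.Println(gcStaleEndpoints(endpoints, pods)) // [job-xyz deleted-pod]
}
```

The same predicate covers both failure modes from the commit message: the agent missing the CNI delete (pod gone, endpoint left behind) and the pod lingering in a terminal phase after a Job completes.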
aanm (Member, Author) commented Oct 20, 2020

test-backport-1.8

nebril (Member) commented Oct 20, 2020

test-missed-k8s

@aanm aanm merged commit 5d5219d into cilium:v1.8 Oct 20, 2020
@aanm aanm deleted the pr/backport-13220 branch October 21, 2020 08:17
@joestringer joestringer added this to Backport done to v1.8 in 1.8.5 Oct 31, 2020