Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator: remove pod list of an entire cluster #11376

Merged
merged 1 commit into from May 13, 2020
Merged

Conversation

aanm
Copy link
Member

@aanm aanm commented May 6, 2020

Doing a pod list every 30 minutes can be brutal for kube-apiserver, specially if it is not running with a limit of the number of pods that should be returned. Having a watcher that receives notifications for the pods running as well as the CEPs, is enough to provide a GC for Cilium Identities and CEPs.

Since we have a k8s watcher and the CEPs have an owner set to themselves the CEP GC interval was decreased to 5 minutes.

Also, given that we don't depend on the CEP GC to perform Cilium IdentityGC we can set the identity heartbeat timeout to 2* the KVStore Lease TTL.

Signed-off-by: André Martins andre@cilium.io

Fixes #11472

@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 May 6, 2020
@aanm aanm added the release-note/bug This PR fixes an issue in a previous release of Cilium. label May 6, 2020
@aanm aanm force-pushed the pr/avoid-pod-list branch 2 times, most recently from 1eee453 to 3371814 Compare May 7, 2020 20:15
@aanm aanm changed the base branch from pr/use-slimmer-protobuf-definitions-others to master May 7, 2020 20:15
@aanm aanm added area/operator Impacts the cilium-operator component release-note/misc This PR makes changes that have no direct user impact. and removed release-note/bug This PR fixes an issue in a previous release of Cilium. labels May 7, 2020
@aanm
Copy link
Member Author

aanm commented May 7, 2020

test-me-please

@aanm aanm marked this pull request as ready for review May 7, 2020 20:15
@aanm aanm requested a review from a team May 7, 2020 20:20
@coveralls
Copy link

coveralls commented May 7, 2020

Coverage Status

Coverage decreased (-0.07%) to 37.791% when pulling 29f49b0 on pr/avoid-pod-list into f8972f8 on master.

@aanm aanm requested a review from tgraf May 8, 2020 15:00
@aanm
Copy link
Member Author

aanm commented May 11, 2020

test-me-please

// there is more data, continue
continue perCEPFetch
if !exists {
// FIXME: this is fragile has we might have received the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a few typos in this comment...

@aanm
Copy link
Member Author

aanm commented May 11, 2020

test-me-please

@aanm aanm requested a review from vadorovsky May 11, 2020 20:14
@aanm aanm requested a review from a team May 11, 2020 20:16
@aanm
Copy link
Member Author

aanm commented May 11, 2020

test-me-please

@aanm aanm requested a review from errordeveloper May 11, 2020 20:58
@aanm
Copy link
Member Author

aanm commented May 12, 2020

retest-net-next

@aanm aanm force-pushed the pr/avoid-pod-list branch 2 times, most recently from ad9813c to 842782d Compare May 12, 2020 09:10
@aanm
Copy link
Member Author

aanm commented May 12, 2020

test-me-please

operator/k8s_identity.go Outdated Show resolved Hide resolved
@@ -57,6 +58,11 @@ func (i *IdentityHeartbeatStore) IsAlive(identity string) bool {
i.mutex.RLock()
defer i.mutex.RUnlock()

// The identity is definitely alive if there's a CE using it.
if watchers.HasCEWithIdentity(identity) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looked very expensive at first but the index makes it bearable it seems.

@aanm
Copy link
Member Author

aanm commented May 13, 2020

test-me-please

@aanm aanm force-pushed the pr/avoid-pod-list branch 2 times, most recently from 5f1e79b to 199c64c Compare May 13, 2020 10:39
@aanm
Copy link
Member Author

aanm commented May 13, 2020

test-me-please

Doing a pod list every 30 minutes can be brutal for kube-apiserver,
specially if it is not running with a limit of the number of pods that
should be returned. Having a watcher that receives notifications for the
pods running as well as the CEPs, is enough to provide a GC for
Cilium Identities and CEPs.

Since we have a k8s watcher and the CEPs have an owner set to
themselves the CEP GC interval was decreased to 5 minutes.

Also, given that we don't depend on the CEP GC to perform Cilium Identity
GC we can set the identity heartbeat timeout to 2* the KVStore Lease TTL.

Signed-off-by: André Martins <andre@cilium.io>
@aanm
Copy link
Member Author

aanm commented May 13, 2020

test-me-please

@aanm
Copy link
Member Author

aanm commented May 13, 2020

hit #11512

@aanm aanm merged commit 37139c3 into master May 13, 2020
1.8.0 automation moved this from In progress to Merged May 13, 2020
@tgraf tgraf deleted the pr/avoid-pod-list branch May 13, 2020 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operator Impacts the cilium-operator component release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
1.8.0
  
Merged
Development

Successfully merging this pull request may close these issues.

operator: identityHeartbeat is nil in k8s_cep_gc.go
6 participants