k8s: CEP GC runs in steps #6029

Merged: 2 commits merged into cilium:master from raybejjani:cep-smaller-gc on Nov 15, 2018

Conversation

raybejjani
Contributor

@raybejjani raybejjani commented Oct 24, 2018

On large clusters the CEP GC would fetch the entire list of CEPs. These
can grow to significant sizes (100kB in one instance) and this causes
extreme degradation in kube-apiserver. We now iterate 10 CEPs at a
time, thus lowering the load we put on the cluster.

I also return the cache.Store object from our k8s helper factories. No functionality has changed there.

I began by trying to watch for CEPs in the controller but the code became a little complex. This was partly because only the node that should do GC needs to watch for CEPs, but that meant creating a non-shared informer (shared informers cannot stop once they begin watching for a specific type). In the end, limiting the list to a small number seemed like a reasonable compromise, especially since the GC isn't a critical operation here.
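
To make the chunked-list approach concrete, here is a rough sketch only. It assumes the generated CiliumV2 clientset's CiliumEndpoints(namespace).List(opts) call shown in the review excerpt further down, plus a hypothetical maybeGarbageCollect helper, and its error handling is deliberately simplified rather than being the PR's exact logic:

// Page through CiliumEndpoints in small chunks instead of one large List.
// listOpts.Limit caps each response; the apiserver hands back a Continue
// token until the full set has been paged through.
// meta_v1 is k8s.io/apimachinery/pkg/apis/meta/v1.
listOpts := meta_v1.ListOptions{Limit: 10}
for {
	ceps, err := ciliumClient.CiliumEndpoints(meta_v1.NamespaceAll).List(listOpts)
	if err != nil {
		// Simplified: give up on this pass and let the next periodic GC run retry.
		return err
	}
	for _, cep := range ceps.Items {
		maybeGarbageCollect(cep) // hypothetical: delete the CEP if it is an orphan
	}
	if ceps.Continue == "" {
		break // last page
	}
	listOpts.Continue = ceps.Continue
}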

Relates to: #5913



@raybejjani raybejjani requested a review from a team as a code owner October 24, 2018 09:31
@raybejjani raybejjani requested a review from a team October 24, 2018 09:31
@raybejjani
Contributor Author

test-me-please

@raybejjani raybejjani added kind/enhancement This would improve or streamline existing functionality. pending-review area/daemon Impacts operation of the Cilium daemon. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. labels Oct 24, 2018
@raybejjani
Contributor Author

test-me-please

1 similar comment
@raybejjani
Contributor Author

test-me-please

ceps, err := ciliumClient.CiliumEndpoints(meta_v1.NamespaceAll).List(listOpts)
switch {
case ceps.Continue != "" && err != nil:
// this is ok, it means we saw a 410 ResourceExpired error but we
Member

The comment is helpful, but is this documented somewhere? I'd suggest putting a link in a comment here to said documentation.

Contributor Author

Good point. I was, in fact, checking this incorrectly. I've fixed it.

@ianvernon
Member

@raybejjani do you have any benchmarks from manual testing that you can post here to validate the fix?

@raybejjani
Contributor Author

raybejjani commented Oct 25, 2018

@ianvernon I added 10k CEPs, all of which would be GCed. That took 12
The no-op GC took

Oct 25 13:54:30 k8s1 cilium-agent[29443]: level=info msg="Time taken 48.722672ms" controller="sync-to-k8s-ciliumendpoint-gc (k8s1)" subsys=endpoint

The real one took

Oct 25 14:14:14 k8s1 cilium-agent[32403]: level=info msg="Time taken 6m33.807662252s" controller="sync-to-k8s-ciliumendpoint-gc (k8s1)" endpointID=0 k8sPodName=default/xwing-2316 subsys=endpoint

Oddly, listing the CEPs after this still shows them but the count is decreasing... so I think there is a delay on the delete.

I'll run this again with no fetch limit... I'll be a few minutes

@raybejjani
Contributor Author

Ok, maybe not... I stopped the agent and the count stopped dropping. I think another GC run had started... although that shouldn't have happened (the interval is 30 minutes)

@raybejjani
Contributor Author

test-me-please

@raybejjani
Contributor Author

Running this again but with 1000 CEPs (10k was taking too long)

No CEPs to delete

Oct 25 14:38:35 k8s1 cilium-agent[15383]: level=info msg="Time taken 54.671552ms" controller="sync-to-k8s-ciliumendpoint-gc (k8s1)" subsys=endpoint

1000 CEPs to delete

Oct 25 14:51:04 k8s1 cilium-agent[20124]: level=info msg="Time taken 4m27.606818717s, orphans: 1002" controller="sync-to-k8s-ciliumendpoint-gc (k8s1)" endpointID=0 k8sPodName=default/xwing-999 subsys=endpoint

1000 CEPs to delete without a fetch limit

Oct 25 15:00:00 k8s1 cilium-agent[26276]: level=info msg="Time taken 4m2.605554694s, orphans: 1000" controller="sync-to-k8s-ciliumendpoint-gc (k8s1)" endpointID=0 k8sPodName=default/xwing-999 subsys=endpoint

1000 CEPs to delete with Propagation: Background

Oct 25 15:19:18 k8s1 cilium-agent[5325]: level=info msg="Time taken 4m3.37785256s, orphans: 1000" controller="sync-to-k8s-ciliumendpoint-gc (k8s1)" endpointID=0 k8sPodName=default/xwing-999 subsys=endpoint

I don't think we can extrapolate to much larger numbers, necessarily, since there may be some cliff somewhere. In any case, the goal was to lower the likelihood of overloading the apiserver (or the etcd behind it) when fetching these.
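
As a side note on the "Propagation: Background" run above, here is a minimal sketch of a delete with that policy, assuming the pre-1.17 clientset signature Delete(name, *DeleteOptions), the usual apimachinery types, and a logrus-style log variable; this is not necessarily the PR's exact call site:

// Delete one orphaned CEP and let the apiserver return immediately,
// garbage-collecting any dependents in the background.
// k8serrors is k8s.io/apimachinery/pkg/api/errors.
propagation := meta_v1.DeletePropagationBackground
err := ciliumClient.CiliumEndpoints(cep.Namespace).Delete(
	cep.Name,
	&meta_v1.DeleteOptions{PropagationPolicy: &propagation},
)
if err != nil && !k8serrors.IsNotFound(err) {
	// A CEP that is already gone is fine; other errors are worth logging
	// but should not abort the rest of the GC pass.
	log.WithError(err).Warn("Cannot delete orphaned CEP")
}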

@raybejjani
Contributor Author

test-me-please

pkg/endpoint/endpoint.go: 4 review threads (outdated, resolved)
@raybejjani raybejjani left a comment
Contributor Author

I'll run some more benchmarks but I updated the code a little.

pkg/endpoint/endpoint.go: 3 review threads (outdated, resolved)
@raybejjani
Contributor Author

Ok, I finally ran with different limit values. It doesn't seem to make a difference:

  • No-op reference. Time taken 1.043820863s, CEPs: 0
  • Limit 10. Time taken 4m4.20950814s, CEPs: 1000
  • Limit 25. Time taken 4m4.236100466s, CEPs: 1000
  • Limit 50. Time taken 4m4.291099082s, CEPs: 1000
  • No limit. Time taken 4m4.207404591s, CEPs: 1000

@raybejjani
Contributor Author

test-me-please

@coveralls

coveralls commented Nov 3, 2018

Coverage Status

Coverage decreased (-0.1%) to 44.072% when pulling e41505c on raybejjani:cep-smaller-gc into e2b43dd on cilium:master.

@tgraf tgraf added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Nov 7, 2018
@tgraf tgraf added this to the 1.4-bugfix milestone Nov 7, 2018
@raybejjani
Contributor Author

test-me-please

1 similar comment
@raybejjani
Contributor Author

test-me-please

aanm and others added 2 commits November 14, 2018 17:40
The watcher also keeps a local cache of objects. We can use this for
lookups in places like garbage-collection.

Signed-off-by: Ray Bejjani <ray@covalent.io>
On large clusters the CEP GC would fetch the entire list of CEPs. These
can grow to significant sizes (100kB in one instance) and this causes
extreme degradation in kube-apiserver. We now iterate 10 CEPs at a
time, thus lowering the load we put on the cluster.

Signed-off-by: Ray Bejjani <ray@covalent.io>
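
As a rough illustration of the first commit above (the helper-factory signatures themselves aren't shown in this thread), the watcher's cache.Store lets the GC answer lookups locally instead of making another apiserver round trip. The podStore variable, the key scheme, and the markOrphan helper below are assumptions, with keys following client-go's default MetaNamespaceKeyFunc "namespace/name" convention:

// podStore is assumed to be the cache.Store returned by a pod watcher factory.
_, exists, err := podStore.GetByKey(cep.Namespace + "/" + cep.Name)
if err != nil {
	return err
}
if !exists {
	// No pod backs this CEP in the local cache, so treat it as a GC
	// candidate, without an extra round trip to the apiserver.
	markOrphan(cep) // hypothetical helper
}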
@raybejjani
Contributor Author

test-me-please

@raybejjani raybejjani merged commit 0a17787 into cilium:master Nov 15, 2018
@raybejjani raybejjani deleted the cep-smaller-gc branch November 15, 2018 12:30
@WingkaiHo

Will this patch be merged into the 1.2 version?

@tgraf
Member

tgraf commented Mar 1, 2019

@WingkaiHo I'm assuming you ran into scale issues with CEP. We recommend upgrading to 1.4 to use CEP at scale. Backporting only this commit will not be sufficient to resolve all the scale issues present in 1.2 and 1.3.
