ipam: Add exponential backoff when pool maintanance fails #21473

gandro · 2022-09-28T11:40:42Z

When pool maintenance fails, the pool maintenance trigger fires again so
that the logic is re-executed. However, if maintenance fails for
external reasons, the default retry interval of 10 milliseconds is far
too short. In particular, if the cloud provider API is overloaded,
multiple nodes can get stuck in a 10 millisecond retry loop, which
makes the situation even worse.

Therefore, this commit introduces an exponential backoff that applies
when the pool maintenance function fails with an error. The minimum
trigger interval remains 10 milliseconds, so that other trigger reasons
(e.g. a resync) are not delayed as long as the node is healthy.

gandro · 2022-09-29T09:42:50Z

/test

gandro · 2022-09-29T13:48:29Z

Failing CI pipelines are not required and the failures are unrelated (infra issue in both cases). Marking ready-to-merge.

gandro added 2 commits September 28, 2022 13:22

trigger: Add ShutdownFunc callback

3bd9a34

This adds a new optional callback to the trigger mechanism, which is called when a trigger is stopped via Trigger.Shutdown.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
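
To make the shutdown callback idea concrete, here is a small self-contained sketch. `miniTrigger` and its fields are hypothetical and much simpler than Cilium's `pkg/trigger`; the only point is to show an optional function that runs once when the trigger is stopped, for example to release resources that a backoff or a node manager holds.

```go
package main

import (
	"fmt"
	"sync"
)

// miniTrigger is a hypothetical, simplified trigger used only to
// illustrate an optional shutdown callback.
type miniTrigger struct {
	triggerFunc  func()        // work to run when triggered
	shutdownFunc func()        // optional; may be nil
	wakeup       chan struct{} // buffered so repeated triggers fold into one
	stop         chan struct{}
	done         chan struct{}
	once         sync.Once
}

func newMiniTrigger(triggerFunc, shutdownFunc func()) *miniTrigger {
	t := &miniTrigger{
		triggerFunc:  triggerFunc,
		shutdownFunc: shutdownFunc,
		wakeup:       make(chan struct{}, 1),
		stop:         make(chan struct{}),
		done:         make(chan struct{}),
	}
	go t.loop()
	return t
}

func (t *miniTrigger) loop() {
	defer close(t.done)
	for {
		select {
		case <-t.stop:
			// Run the callback exactly once, on the worker goroutine.
			if t.shutdownFunc != nil {
				t.shutdownFunc()
			}
			return
		case <-t.wakeup:
			t.triggerFunc()
		}
	}
}

// Trigger requests a run; a request that is already pending is folded.
func (t *miniTrigger) Trigger() {
	select {
	case t.wakeup <- struct{}{}:
	default:
	}
}

// Shutdown stops the trigger and waits until the callback has returned.
// A still-pending trigger request may be dropped.
func (t *miniTrigger) Shutdown() {
	t.once.Do(func() { close(t.stop) })
	<-t.done
}

func main() {
	tr := newMiniTrigger(
		func() { fmt.Println("maintenance run") },
		func() { fmt.Println("trigger stopped, cleaning up") },
	)
	tr.Trigger()
	tr.Shutdown()
}
```

Blocking in `Shutdown` until the callback has finished is a design choice of this sketch, not something the commit message states.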

backoff: Extract ClusterSizeDependantInterval helper

7a9aff7

This commit extracts the main ClusterSizeDependantInterval computation into a helper so that different node managers can use it. A subsequent commit makes use of it.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Sep 28, 2022

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Sep 28, 2022

gandro marked this pull request as ready for review September 28, 2022 14:08

gandro requested review from a team as code owners September 28, 2022 14:08

gandro requested a review from jibi September 28, 2022 14:08

gandro force-pushed the pr/gandro/ipam-add-maintenance-backoff branch from 2a1065b to f6d8d26 Compare September 28, 2022 14:11

christarazi approved these changes Sep 29, 2022

jibi approved these changes Sep 29, 2022

gandro added the ready-to-merge label (This PR has passed all tests and received consensus from code owners to merge.) Sep 29, 2022

ti-mo approved these changes Sep 30, 2022

ti-mo merged commit 9674d71 into cilium:master Sep 30, 2022

wu0407 mentioned this pull request Oct 31, 2022

CFP: need reset backoff period in pkg/backoff #21936

Closed

bimmlerd mentioned this pull request Jan 16, 2023

[v1.12] - ipam: Add exponential backoff when pool maintanance fails bimmlerd/cilium#29

Closed

gandro mentioned this pull request Sep 26, 2023

Introduce backoff on cloud IPAM failures #28273

Closed
