Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.14] ipam: Fix bug where IP lease did not expire #29652

Merged
merged 2 commits into from Dec 6, 2023

Conversation

gandro
Copy link
Member

@gandro gandro commented Dec 6, 2023

Once this PR is merged, a GitHub action will update the labels of these PRs:

 29443

[ upstream commit be9e853 ]

This commit addresses two problems with the IPAM expiration timer:

1. Before this commit, each timer consisted of a Go routine calling
   `time.Sleep` to wait for expiration to occur. The default expiration
   timeout is 10 minutes. This meant, that for every IP allocated via
   CNI ADD, we had a Go routine unconditionally sleeping for 10 minutes,
   only to (in most cases) wake up and learn that the expiration timer was stopped.
   This commit improves that situation by having the expiration Go
   routine wake up and exit early if it was stopped (either via IP Release
   or `StopExpirationTimer`).

2. In CI, we set the hidden `max-internal-timer-delay` option to 5
   seconds (see cilium#27253). This meant that the `time.Sleep`
   expiration timer would effectively be 5 seconds instead of 10 minutes. 5
   seconds however is not enough for an endpoint to be created via CNI ADD
   and complete its first endpoint regeneration. This therefore led to
   endpoint IPs being released while the endpoint was still being created.
   Due to another bug (fixed in the next commit) the expiration timer
   failed to actually  release the IP, which is why this bug was not
   discovered earlier when we introduced the 5 second limit.  This commit
   addresses this issue by adding an escape hatch to `pkg/time`, allowing
   the creation of a timer which is not subject to the
  `max-internal-timer-delay`.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit cf76f89 ]

The `AllocateNextWithExpiration`[1] function is used to allocate an IP
via API from the CNI plugin. Such IPs are always allocated with an
expiration timer, which means that if the CNI ADD fails later on and is
never retried, the IP is automatically released. Only once an endpoint
is created, we then stop the expiration timer during the endpoint
creation request [2], making the allocation of the IP permanent until it
is explicitly freed.

The current expiration implementation however has a bug: Instead of
releasing the IP back into the IPAM pool from where the IP was actually
allocated from, we forwarded the desired pool, which can be empty and is
later overwritten with the actual pool.

Because we passed in an empty pool into `StartExpirationTimer`, this led
to IP expiration being broken in almost all cases:

```
2023-11-24T06:24:37.089657953Z level=warning msg="Unable to release IP after expiration" error="no IPAM pool provided for IP release of 10.0.1.41" ip=10.0.1.41 subsys=ipam uuid=2320c5c1-b4c0-4a2e-8f3d-2b906330ab55
```

This commit fixes that by using the realized pool (from the result)
instead of the desired pool from the request. In addition, the unit
tests are also adapted to cover this case to ensure we don't regress.

[1] https://github.com/cilium/cilium/blob/0fcd1c86e347b2701880c9034e7ea3a74cd6b13e/daemon/cmd/ipam.go#L46
[2] https://github.com/cilium/cilium/blob/95a7d1288d5a13a5a216dcdb09383f1f483e5ac1/daemon/cmd/endpoint.go#L536

Reported-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro added kind/backports This PR provides functionality previously merged into master. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Dec 6, 2023
@gandro gandro marked this pull request as ready for review December 6, 2023 09:14
@gandro gandro requested a review from a team as a code owner December 6, 2023 09:14
@gandro gandro changed the title v1.14 Backports 2023-12-06 [v1.14] ipam: Fix bug where IP lease did not expire Dec 6, 2023
@gandro
Copy link
Member Author

gandro commented Dec 6, 2023

/test-backport-1.14

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 6, 2023
@gandro gandro merged commit 16ee6d9 into cilium:v1.14 Dec 6, 2023
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants