New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v1.14] ipam: Fix bug where IP lease did not expire #29652
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit be9e853 ] This commit addresses two problems with the IPAM expiration timer: 1. Before this commit, each timer consisted of a Go routine calling `time.Sleep` to wait for expiration to occur. The default expiration timeout is 10 minutes. This meant, that for every IP allocated via CNI ADD, we had a Go routine unconditionally sleeping for 10 minutes, only to (in most cases) wake up and learn that the expiration timer was stopped. This commit improves that situation by having the expiration Go routine wake up and exit early if it was stopped (either via IP Release or `StopExpirationTimer`). 2. In CI, we set the hidden `max-internal-timer-delay` option to 5 seconds (see cilium#27253). This meant that the `time.Sleep` expiration timer would effectively be 5 seconds instead of 10 minutes. 5 seconds however is not enough for an endpoint to be created via CNI ADD and complete its first endpoint regeneration. This therefore led to endpoint IPs being released while the endpoint was still being created. Due to another bug (fixed in the next commit) the expiration timer failed to actually release the IP, which is why this bug was not discovered earlier when we introduced the 5 second limit. This commit addresses this issue by adding an escape hatch to `pkg/time`, allowing the creation of a timer which is not subject to the `max-internal-timer-delay`. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit cf76f89 ] The `AllocateNextWithExpiration`[1] function is used to allocate an IP via API from the CNI plugin. Such IPs are always allocated with an expiration timer, which means that if the CNI ADD fails later on and is never retried, the IP is automatically released. Only once an endpoint is created, we then stop the expiration timer during the endpoint creation request [2], making the allocation of the IP permanent until it is explicitly freed. The current expiration implementation however has a bug: Instead of releasing the IP back into the IPAM pool from where the IP was actually allocated from, we forwarded the desired pool, which can be empty and is later overwritten with the actual pool. Because we passed in an empty pool into `StartExpirationTimer`, this led to IP expiration being broken in almost all cases: ``` 2023-11-24T06:24:37.089657953Z level=warning msg="Unable to release IP after expiration" error="no IPAM pool provided for IP release of 10.0.1.41" ip=10.0.1.41 subsys=ipam uuid=2320c5c1-b4c0-4a2e-8f3d-2b906330ab55 ``` This commit fixes that by using the realized pool (from the result) instead of the desired pool from the request. In addition, the unit tests are also adapted to cover this case to ensure we don't regress. [1] https://github.com/cilium/cilium/blob/0fcd1c86e347b2701880c9034e7ea3a74cd6b13e/daemon/cmd/ipam.go#L46 [2] https://github.com/cilium/cilium/blob/95a7d1288d5a13a5a216dcdb09383f1f483e5ac1/daemon/cmd/endpoint.go#L536 Reported-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro
added
kind/backports
This PR provides functionality previously merged into master.
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
labels
Dec 6, 2023
gandro
changed the title
v1.14 Backports 2023-12-06
[v1.14] ipam: Fix bug where IP lease did not expire
Dec 6, 2023
ti-mo
approved these changes
Dec 6, 2023
maintainer-s-little-helper
bot
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Dec 6, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Once this PR is merged, a GitHub action will update the labels of these PRs: