[v1.9] Backport of #17492 #17493
Merged: joestringer merged 2 commits into cilium:v1.9 from christarazi:pr/christarazi/v1.9-backport-17492 on Oct 1, 2021
christarazi added the backport/1.9 and kind/backports (This PR provides functionality previously merged into master.) labels on Sep 29, 2021
/test-backport-1.9
christarazi force-pushed the pr/christarazi/v1.9-backport-17492 branch from 006ecef to d1268ca on September 29, 2021 23:20
/test-backport-1.9
[ upstream commit c88e06f ]

This will be used in the subsequent commit to check for a specific error when allocating an IP.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit baa9e9a ]

In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to have been assigned "unavailable" IPs. Upon upgrade to a version which includes [1], endpoint restoration will fail with [2].

To work around this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which tells Cilium to continue if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors are not bypassed, in order to limit the scope of this stop-gap solution.

With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool.

This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is the change described in [1], where the primary ENI IP was removed from the IPAM pool. If the flag is enabled in any other mode, the user is warned that the flag is not intended for that mode and will have no effect.

This patch is intended to be reverted in the future, because this stop-gap solution will no longer be required once users of Cilium no longer upgrade from versions prior to [1]. I propose that we revert this in the release following the one in which this patch lands (N+1).

How was this tested?

1) Deployed a Cilium version that doesn't include [1] on an EKS cluster
2) Created a Deployment object and scaled it to max out the ENI IPs, such that at least one pod was assigned an "unavailable" IP
3) Upgraded Cilium to a version which does include [1] and observed [2] failures
4) Reset the cluster back to the state from (2)
5) Upgraded Cilium to the version that contains this commit
6) Observed the log messages from this commit and endpoint restoration succeeding
7) Scaled the Deployment to 0 and back up, to restart all pods
8) Observed that they all got fresh IPs and none of the "unavailable" IPs

[1]: cilium#15453
[2]:

```
{
  "time":"2021-09-20T16:57:00.400086481Z",
  "level":"WARN",
  "origin":"cilium.io/agent",
  "message":"Unable to restore endpoint, ignoring",
  "params":{
    "endpointID":"992",
    "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available",
    "k8sPodName":"default/pod-1",
    "subsys":"daemon"
  }
}
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
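For readers skimming the thread, the bypass behaviour described in the commit message above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the names `ErrIPNotAvailable`, `allocator`, `restoreEndpointIP`, and the three `alloc*` stand-ins are hypothetical and chosen only for this example. The error strings mirror the log in [2]. The point it shows is that only the one specific "IP is not available" error is swallowed when the flag is set, while every other allocation error still fails the restore.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrIPNotAvailable is a hypothetical typed error standing in for the
// specific "IP is not available" failure mentioned in the commit message.
type ErrIPNotAvailable struct {
	IP string
}

func (e *ErrIPNotAvailable) Error() string {
	return fmt.Sprintf("IP %s is not available", e.IP)
}

// allocator is a hypothetical stand-in for the IPAM reallocation call.
type allocator func(ip string) error

// Hypothetical allocators used to exercise the three interesting cases.
func allocOK(ip string) error           { return nil }
func allocNotAvailable(ip string) error { return &ErrIPNotAvailable{IP: ip} }
func allocOtherFailure(ip string) error { return errors.New("allocator is not ready") }

// restoreEndpointIP tries to reallocate the IP of a restored endpoint.
// When bypass is true (the assumed effect of the hidden flag), the specific
// "not available" error is logged and ignored so the endpoint keeps its IP.
// Any other error is returned unchanged in meaning, wrapped with context.
func restoreEndpointIP(alloc allocator, ip string, bypass bool) error {
	err := alloc(ip)
	if err == nil {
		return nil
	}

	var notAvailable *ErrIPNotAvailable
	if bypass && errors.As(err, &notAvailable) {
		fmt.Printf("bypassing error %q: restored endpoint keeps %s\n", err.Error(), ip)
		return nil
	}

	return fmt.Errorf("unable to reallocate %s IPv4 address: %w", ip, err)
}

func main() {
	const ip = "10.0.133.193"

	// Flag unset: restoration fails as in [2].
	fmt.Println(restoreEndpointIP(allocNotAvailable, ip, false))
	// Flag set: the specific error is bypassed.
	fmt.Println(restoreEndpointIP(allocNotAvailable, ip, true))
	// Flag set, but a different error: still reported.
	fmt.Println(restoreEndpointIP(allocOtherFailure, ip, true))
}
```

Matching on a typed error with `errors.As` rather than comparing message strings is one way to keep the bypass narrow, which is presumably why the first commit in this backport prepares a specific error to check for.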
christarazi force-pushed the pr/christarazi/v1.9-backport-17492 branch from d1268ca to 020cc98 on September 30, 2021 21:35
/test-backport-1.9

Edit: GKE images failed to pull: https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/6507/
/test-gke

Edit: failed again; it turns out to be #17282.
joestringer approved these changes on Oct 1, 2021