Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eni: Fix Cilium overallocating network interfaces #15911

Merged
merged 2 commits into from May 4, 2021

Conversation

gandro
Copy link
Member

@gandro gandro commented Apr 28, 2021

This fixes a bug where Cilium wrongly overestimated the amount of available
ENI IP addresses. This bug was introduced when we removed the primary
ENI IP address from the IPAM pool, but forgot to adjust the number of
addresses used to compare with the AWS instance limits.

This led to the operator overestimating the number of available IP
addresses by one. This in turn could lead to the operator first failing
to allocate more IPs (because it exceeded the limit) and then
unnecessarily creating a new ENI to fulfill the allocation request.

Fixes: 7c1bb35 ("aws/ec2: Exclude primary ENI IP from IPAM pool")
Fixes: #15877

@gandro gandro added release-note/bug This PR fixes an issue in a previous release of Cilium. area/eni Impacts ENI based IPAM. sig/ipam IP address management, including cloud IPAM needs-backport/1.8 labels Apr 28, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.0-rc2 Apr 28, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.10 Apr 28, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.7 Apr 28, 2021
@gandro gandro force-pushed the pr/gandro/fix-eni-creation-loop branch 2 times, most recently from 9563b6d to 1eb7545 Compare April 28, 2021 13:09
This commit modifies the CreateNetworkInterface mock API to mirror what
the actual EC2 API implementation does for the `toAllocate` value, i.e.
only use this number for secondary IP addresses, as a primary address is
always implicitly allocated.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This fixes a bug where Cilium wrongly overestimated the amount of available
ENI IP addresses. This bug was introduced when we removed the primary
ENI IP address from the IPAM pool, but forgot to adjust the number of
addresses used to compare with the AWS instance limits.

This led to the operator overestimating the number of available IP
addresses by one. This in turn could lead to the operator first failing
to allocate more IPs (because it exceeded the limit) and then
unnecessarily creating a new ENI to fulfill the allocation request.

Fixes: 7c1bb35 ("aws/ec2: Exclude primary ENI IP from IPAM pool")
Fixes: cilium#15877

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/fix-eni-creation-loop branch from 1eb7545 to 2f60008 Compare April 28, 2021 18:13
@gandro gandro marked this pull request as ready for review April 28, 2021 18:13
@gandro gandro requested a review from a team as a code owner April 28, 2021 18:13
@gandro gandro requested review from a team and tgraf April 28, 2021 18:13
@gandro
Copy link
Member Author

gandro commented Apr 28, 2021

test-me-please

1 similar comment
@gandro
Copy link
Member Author

gandro commented Apr 28, 2021

test-me-please

@gandro
Copy link
Member Author

gandro commented Apr 29, 2021

retest-1.16-netnext

Edit: Was https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/406/ #15455

@gandro
Copy link
Member Author

gandro commented Apr 29, 2021

retest-gke

Edit: Cilium was sigkilled, might have been a issue with the cluster - retesting https://jenkins.cilium.io/job/Cilium-PR-K8s-GKE/5331/

@gandro
Copy link
Member Author

gandro commented May 3, 2021

retest-1.16-netnext

Seems like it hit a network fluke: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-net-next/408/

@gandro
Copy link
Member Author

gandro commented May 3, 2021

test-1.19-5.4

@gandro
Copy link
Member Author

gandro commented May 3, 2021

test-1.16-netnext

@gandro gandro added the kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. label May 3, 2021
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this! LGTM 🚀

@gandro
Copy link
Member Author

gandro commented May 4, 2021

Marking this ready to merge, the CI failure is in the datapath which is not affected by this very ENI-specific PR. Created an issue for it #15998

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 4, 2021
@ti-mo ti-mo merged commit 119cd19 into cilium:master May 4, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.10 May 4, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.7 May 6, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.10 May 6, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.10 May 6, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.7 May 6, 2021
@brb brb mentioned this pull request May 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.10.0-rc2 May 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.10 in 1.10.0-rc2 May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/eni Impacts ENI based IPAM. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/ipam IP address management, including cloud IPAM
Projects
No open projects
1.10.0-rc2
Backport done to v1.10
1.8.10
Backport done to v1.8
1.9.7
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

Cilium on EKS continually creates network interfaces
7 participants