Fix possible IP leak in case ENIs are not present in the CN yet #18352
Conversation
FYI, this was found/fixed/tested on the v1.11 branch. Backporting to at least 1.11 is required, I assume.
Some additional info on how I encountered this issue and maybe how to reproduce it.

We have a cluster running that does automatic downscaling of all deployments at night and then relies on cluster-autoscaler to also shut down nodes. Next morning, all deployments are upscaled again, causing cluster-autoscaler to also start many nodes at once. This causes many nodes to appear in k8s at the same time, all being `NotReady` at the beginning. Cilium agents are then started on each node. When cilium agents start to get ready, the nodes are also marked `Ready`, causing the k8s scheduler to immediately schedule dozens of PODs onto the `Ready` nodes, long before cilium-operator had a chance to attach new ENIs and IPs to the fresh nodes. This means that all PODs scheduled to the fresh nodes run into a temporary state where the CNI plugin reports that there are no more IPs available. All this is expected and normal up to this point.

After a few seconds, cilium-operator finishes attaching new ENIs to the fresh nodes and then tries to update the CN. The update to spec.pool seems to be successful then, causing the agent to allocate the IP. But as the update to the status seems to fail, the agent then bails out with the IP being marked as used, thus causing the leak. This only happens with very high load on the apiserver. At the same time, I can observe errors like these happening in cilium-operator:

```
level=warning msg="Failed to update CiliumNode" attempt=1 error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"ip-100-66-62-168.eu-central-1.compute.internal\": the object has been modified; please apply your changes to the latest version and try again" instanceID=i-009466ca3d82a1ec0 name=ip-100-66-62-168.eu-central-1.compute.internal subsys=ipam updateStatus=true
```

Please note the `attempt=1` in the log line; it indicates that the first attempt also failed and that no further attempt is done (looking at the many `for retry := 0; retry < 2; retry++` loops found in the code). I assume (without 100% knowing) that this is the reason for the inconsistency in spec vs status.
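For reference, a minimal Go sketch of the bounded-retry pattern described above (the `updateCiliumNode` helper and the error value are hypothetical stand-ins, not the actual operator code): with `retry < 2` there are two attempts in total, so a conflict on `attempt=1` is final and the status update is dropped.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the apiserver's optimistic-concurrency error
// seen in the log line above.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// updateCiliumNode is a hypothetical stand-in for the operator's CN update
// call; under heavy apiserver load every attempt may hit a conflict.
func updateCiliumNode() error {
	return errConflict
}

func main() {
	var err error
	// Mirrors the `for retry := 0; retry < 2; retry++` loops mentioned
	// above: two attempts total, i.e. a single retry.
	for retry := 0; retry < 2; retry++ {
		if err = updateCiliumNode(); err == nil {
			break
		}
		fmt.Printf("Failed to update CiliumNode attempt=%d error=%q\n", retry, err)
	}
	if err != nil {
		// If attempt=1 also fails, the status update is silently lost
		// and spec/status can drift apart.
		fmt.Println("giving up after 2 attempts")
	}
}
```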
Thanks for investigating and submitting this PR, great work!
Just one comment below on error / log handling.
Additionally from #18352 (comment):
Pinging @jaffcheng, who worked on this code last, for a quick look.
@christarazi Thanks for taking a look at the PR. I've applied your suggestion regarding the error message and also added the repro/description/analysis to the commit message. I'll file the issue about the retries tomorrow.
Thanks!
Here is the issue for the retries: #18366
buildAllocationResult may return an error in case of inconsistencies found in the local CN's status. For example, there are situations where an IP is already part of spec.ipam.pool (including the resource/ENI where the IP comes from), while the corresponding ENI is not part of status.eni.enis yet.

If that is the case, the IP would be allocated (e.g. by allocateNext) and then marked as allocated (via a.markAllocated). Shortly after that, a.buildAllocationResult() would fail and then NOT undo the changes done by a.markAllocated(). This then results in the IP never being freed up again. At the same time, kubelet keeps scheduling PODs onto the same node without knowing that IPs have run out, causing new PODs to never get an IP.

Why exactly this inconsistency between the spec and the status arises is a different topic and should maybe be investigated further.

This commit/PR fixes the issue by simply moving a.markAllocated() after the a.buildAllocationResult() call, so that the function bails out early enough.

Some additional info on how I encountered this issue and maybe how to reproduce it. We have a cluster running that does automatic downscaling of all deployments at night and then relies on cluster-autoscaler to also shut down nodes. Next morning, all deployments are upscaled again, causing cluster-autoscaler to also start many nodes at once. This causes many nodes to appear in k8s at the same time, all being `NotReady` at the beginning. Cilium agents are then started on each node. When cilium agents start to get ready, the nodes are also marked `Ready`, causing the k8s scheduler to immediately schedule dozens of PODs onto the `Ready` nodes, long before cilium-operator had a chance to attach new ENIs and IPs to the fresh nodes.

This means that all PODs scheduled to the fresh nodes run into a temporary state where the CNI plugin reports that there are no more IPs available. All this is expected and normal up to this point.

After a few seconds, cilium-operator finishes attaching new ENIs to the fresh nodes and then tries to update the CN. The update to spec.pool seems to be successful then, causing the agent to allocate the IP. But as the update to the status seems to fail, the agent then bails out with the IP being marked as used, thus causing the leak. This only happens with very high load on the apiserver. At the same time, I can observe errors like these happening in cilium-operator:

```
level=warning msg="Failed to update CiliumNode" attempt=1 error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"ip-100-66-62-168.eu-central-1.compute.internal\": the object has been modified; please apply your changes to the latest version and try again" instanceID=i-009466ca3d82a1ec0 name=ip-100-66-62-168.eu-central-1.compute.internal subsys=ipam updateStatus=true
```

Please note the `attempt=1` in the log line; it indicates that the first attempt also failed and that no further attempt is done (looking at the many `for retry := 0; retry < 2; retry++` loops found in the code). I assume (without 100% knowing) that this is the reason for the inconsistency in spec vs status.

Signed-off-by: Alexander Block <ablock84@gmail.com>
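To make the fix concrete, here is a minimal Go sketch of the reordering described in the commit message. The types and method signatures are illustrative stand-ins, not the actual code from Cilium's IPAM package:

```go
package ipam

import "fmt"

// AllocationResult and allocator are simplified stand-ins for the real
// types in Cilium's IPAM code; the method bodies are illustrative only.
type AllocationResult struct{ IP string }

type allocator struct {
	allocated map[string]bool // IPs currently marked as in use
}

// buildAllocationResult fails when the ENI backing the IP is present in
// spec.ipam.pool but missing from status.eni.enis.
func (a *allocator) buildAllocationResult(ip string) (*AllocationResult, error) {
	return nil, fmt.Errorf("unable to find ENI for IP %s in CN status", ip)
}

func (a *allocator) markAllocated(ip string) {
	a.allocated[ip] = true
}

// allocateNext shows the fixed ordering: the result is built first, and
// the error path returns before any allocator state is mutated. Before
// the fix, markAllocated ran first, so a buildAllocationResult failure
// left the IP permanently marked as used (the leak).
func (a *allocator) allocateNext(ip string) (*AllocationResult, error) {
	result, err := a.buildAllocationResult(ip)
	if err != nil {
		// Bail out early: nothing to undo, the IP stays free.
		return nil, err
	}
	a.markAllocated(ip)
	return result, nil
}
```

The key property is that `markAllocated` only runs on the success path, so an inconsistent CN status leaves no state to roll back.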
Very nice, thank you :)
/test

Job 'Cilium-PR-K8s-GKE' hit: #17270 (93.27% similarity)
Marked ready to merge as the CI failures are unrelated flakes. The L4LB test has been broken for some time due to infra issues.
CC @twpayne, this seems like it should be marked as a "bug" in the release notes rather than "minor". "Minor" would be anything the user should know about that isn't a bug.