In CRD IPAM mode, a pod may allocate an IP that has already been released #32531

Open
2 of 3 tasks
weiwei02 opened this issue May 14, 2024 · 1 comment
Labels
  • area/ipam: Impacts IP address management functionality.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • sig/agent: Cilium agent related.

Comments

weiwei02 commented May 14, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When using IPAM in CRD mode, I hit a case where an IP address already assigned to a pod was marked as released. After analysis, this appears to be a race condition in the current version of Cilium.

The steps to reproduce the issue, with the related code, are as follows:

  1. The operator triggers `PrepareIPRelease()` and marks an IP in `CiliumNode.Status.IPAM.ReleaseIPs` as 'marked-for-release'. node.go:L789
  2. The agent receives a new `CiliumNode` update event, which triggers the `updateLocalNodeResource()` function. allocator.go#L196
  3. Concurrently with step 2, a new pod is created and is allocated that same IP.
  4. Finally, the IP newly allocated in step 3 is marked as 'ready-for-release' by step 2, and the release of the IP address completes. The IP address of a pod that is already running has been released.
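
The interleaving in the steps above can be replayed deterministically with a toy model. This is only a sketch: `allocator`, `nodeStore`, `handleMarkedForRelease`, the `window` hook, and the IP `10.0.0.7` are hypothetical stand-ins, not Cilium's actual types, and the status strings are assumed to match the values Cilium uses.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical, simplified stand-ins for the agent's CRD allocator and node store.
type allocator struct {
	mutex     sync.Mutex
	allocated map[string]struct{}
}

type nodeStore struct {
	mutex      sync.Mutex
	releaseIPs map[string]string // IP -> release handshake status
}

// allocate models a pod being assigned ip (step 3).
func (a *allocator) allocate(ip string) {
	a.mutex.Lock()
	defer a.mutex.Unlock()
	a.allocated[ip] = struct{}{}
}

// handleMarkedForRelease models the check-then-write sequence of step 2.
// window runs at the point where neither lock is held, standing in for
// whatever another goroutine might do there.
func (n *nodeStore) handleMarkedForRelease(a *allocator, ip string, window func()) {
	n.mutex.Lock()
	defer n.mutex.Unlock()

	n.mutex.Unlock()
	a.mutex.Lock()
	_, inUse := a.allocated[ip] // check
	a.mutex.Unlock()
	window() // no lock held: a concurrent allocation can land here
	n.mutex.Lock()

	if inUse {
		n.releaseIPs[ip] = "do-not-release" // assumed status string
	} else {
		n.releaseIPs[ip] = "ready-for-release" // write based on a stale check
	}
}

// simulateRace replays steps 1 to 4 and reports whether the IP is in use
// and which release status it ended up with.
func simulateRace() (bool, string) {
	const ip = "10.0.0.7"
	a := &allocator{allocated: map[string]struct{}{}}
	n := &nodeStore{releaseIPs: map[string]string{ip: "marked-for-release"}}

	n.handleMarkedForRelease(a, ip, func() { a.allocate(ip) })

	_, inUse := a.allocated[ip]
	return inUse, n.releaseIPs[ip]
}

func main() {
	inUse, status := simulateRace()
	fmt.Println(inUse, status) // prints "true ready-for-release"
}
```

In this model the IP ends up both allocated to a pod and flagged as ready for release, which is the state described in step 4.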

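The code involved is shown below. The lookup in `allocator.allocated` and the later write to `ReleaseIPs` happen under different locks, so an allocation can slip in between them: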

cilium/pkg/ipam/crd.go

Lines 428 to 443 in 73b3704

// Some functions like crdAllocator.Allocate() acquire lock on allocator first and then on nodeStore.
// So release nodestore lock before acquiring allocator lock to avoid potential deadlocks from inconsistent
// lock ordering.
n.mutex.Unlock()
allocator.mutex.Lock()
_, ok := allocator.allocated[ip]
allocator.mutex.Unlock()
n.mutex.Lock()
if ok {
	// IP still in use, update the operator to stop releasing the IP.
	n.ownNode.Status.IPAM.ReleaseIPs[ip] = ipamOption.IPAMDoNotRelease
} else {
	n.ownNode.Status.IPAM.ReleaseIPs[ip] = ipamOption.IPAMReadyForRelease
}
releaseUpstreamSyncNeeded = true
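
One possible direction, sketched here under assumptions and not taken from any Cilium patch, is to make the check and the status write a single critical section, and to have allocation refuse an IP that has already been answered as ready for release. The `ipam` type, its two mutexes, and the method names are hypothetical; the real code would also need restructuring, because the caller already holds the node store lock when the snippet above runs.

```go
package main

import (
	"fmt"
	"sync"
)

const (
	readyForRelease = "ready-for-release" // assumed status strings
	doNotRelease    = "do-not-release"
)

// ipam is a hypothetical merged view of the allocator and node store state.
// Lock order is fixed: allocMu first, then storeMu.
type ipam struct {
	allocMu    sync.Mutex
	storeMu    sync.Mutex
	allocated  map[string]struct{}
	releaseIPs map[string]string
}

func newIPAM() *ipam {
	return &ipam{allocated: map[string]struct{}{}, releaseIPs: map[string]string{}}
}

// allocate refuses an IP that was already handed back to the operator.
func (m *ipam) allocate(ip string) bool {
	m.allocMu.Lock()
	defer m.allocMu.Unlock()
	m.storeMu.Lock()
	defer m.storeMu.Unlock()
	if m.releaseIPs[ip] == readyForRelease {
		return false
	}
	m.allocated[ip] = struct{}{}
	return true
}

// answerRelease checks usage and writes the status while holding both locks,
// so no allocation can happen between the check and the write.
func (m *ipam) answerRelease(ip string) string {
	m.allocMu.Lock()
	defer m.allocMu.Unlock()
	m.storeMu.Lock()
	defer m.storeMu.Unlock()
	if _, inUse := m.allocated[ip]; inUse {
		m.releaseIPs[ip] = doNotRelease
	} else {
		m.releaseIPs[ip] = readyForRelease
	}
	return m.releaseIPs[ip]
}

// invariantHolds races allocate against answerRelease and reports whether
// an IP ever ends up both allocated and ready for release.
func invariantHolds(rounds int) bool {
	const ip = "10.0.0.7"
	for i := 0; i < rounds; i++ {
		m := newIPAM()
		var wg sync.WaitGroup
		wg.Add(2)
		go func() { defer wg.Done(); m.allocate(ip) }()
		go func() { defer wg.Done(); m.answerRelease(ip) }()
		wg.Wait()
		_, inUse := m.allocated[ip]
		if inUse && m.releaseIPs[ip] == readyForRelease {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(invariantHolds(1000)) // prints "true"
}
```

Whichever goroutine wins, the outcome is consistent: either the allocation lands first and the answer is 'do-not-release', or the release answer lands first and the allocation is refused.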

Cilium Version

1.14.6

Kernel Version

5.15.54-051554-generic

Kubernetes Version

1.24.4

Regression

No response

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

weiwei02 added the kind/bug, kind/community-report, and needs/triage labels on May 14, 2024

squeed (Contributor) commented May 16, 2024

Thanks for the comment; this certainly looks like an issue.

squeed added the sig/agent, area/ipam, and needs/triage labels and removed the needs/triage label on May 16, 2024