azure/ipam: panic in Node.ResyncInterfacesAndIPs #26222
Labels
area/azure
Impacts Azure based IPAM.
area/ipam
Impacts IP address management functionality.
kind/bug/CI
This is a bug in the testing code.
release-blocker/1.14
This issue will prevent the release of the next version of Cilium.
sig/agent
Cilium agent related.
Comments
lmb added the kind/bug/CI, area/azure, and area/ipam labels on Jun 14, 2023
joestringer added the sig/agent and release-blocker/1.14 labels on Jun 15, 2023
Does this need to be a release blocker? This seems to be a flake that has existed since 2020: #11785 (comment)
Context: the bug in that original code was supposedly fixed, and it was on a different line (something to do with the logger). See #11786.
tommyp1ckles added a commit to tommyp1ckles/cilium that referenced this issue on Jul 18, 2023
A race condition between when all the resync triggers are set up for an upserted CiliumNode and when the k8s node object is emplaced can cause a crash. Specifically, this can arise between when the ipam nodemanager lock is released and when the call to UpdatedResources occurs.

In CI, this was likely caused by the "ipam-node-interval-refresh" invoking a resync, which resulted in a panic in the ResyncInterfacesAndIPs function. While testing this I was able to cause other, similar panics by delaying the UpdatedResource call, so this should fix a class of potential crashes.

Fixes: cilium#26222
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
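The window described in this commit message can be illustrated with a minimal Go sketch. This is not the Cilium code or the actual patch: the `node` and `ciliumNode` types and the `updatedResource`, `resyncUnguarded`, `resyncGuarded`, and `panics` names are hypothetical stand-ins. The sketch assumes the crash is a nil dereference of a resource that a resync reads before it has been emplaced, and it shows one possible guard.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ciliumNode is a hypothetical stand-in for the k8s CiliumNode resource.
type ciliumNode struct {
	Name     string
	PoolSize int
}

// node is a hypothetical stand-in for an IPAM node entry.
type node struct {
	mu       sync.RWMutex
	resource *ciliumNode // nil until updatedResource is called
}

var errNoResource = errors.New("node resource not yet set")

// updatedResource emplaces the k8s object on the node.
func (n *node) updatedResource(r *ciliumNode) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.resource = r
}

// resyncUnguarded mimics a resync that assumes the resource is present.
// It panics with a nil pointer dereference if it runs too early.
func (n *node) resyncUnguarded() int {
	n.mu.RLock()
	defer n.mu.RUnlock()
	return n.resource.PoolSize
}

// resyncGuarded tolerates running before the resource is emplaced.
func (n *node) resyncGuarded() (int, error) {
	n.mu.RLock()
	defer n.mu.RUnlock()
	if n.resource == nil {
		return 0, errNoResource
	}
	return n.resource.PoolSize, nil
}

// panics reports whether f panicked.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	// The node exists and a resync may already fire, but the
	// resource has not been emplaced yet.
	n := &node{}
	fmt.Println("unguarded resync panics:", panics(func() { n.resyncUnguarded() }))

	_, err := n.resyncGuarded()
	fmt.Println("guarded resync error:", err)

	n.updatedResource(&ciliumNode{Name: "node-a", PoolSize: 8})
	size, err := n.resyncGuarded()
	fmt.Println("after update:", size, err)
}
```

A nil check is only one way to close such a window. Emplacing the resource before any trigger can fire is another, and the sketch makes no claim about which approach the referenced commit takes.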
aditighag pushed a commit that referenced this issue on Jul 19, 2023
nbusseneau pushed a commit that referenced this issue on Jul 24, 2023
[ upstream commit 3521998 ]

A race condition between when all the resync triggers are set up for an upserted CiliumNode and when the k8s node object is emplaced can cause a crash. Specifically, this can arise between when the ipam nodemanager lock is released and when the call to UpdatedResources occurs.

In CI, this was likely caused by the "ipam-node-interval-refresh" invoking a resync, which resulted in a panic in the ResyncInterfacesAndIPs function. While testing this I was able to cause other, similar panics by delaying the UpdatedResource call, so this should fix a class of potential crashes.

Fixes: #26222
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
aanm pushed a commit that referenced this issue on Jul 25, 2023
ldelossa pushed a commit to ldelossa/cilium that referenced this issue on Sep 27, 2023
gandro pushed a commit to gandro/cilium that referenced this issue on Dec 7, 2023
https://github.com/cilium/cilium/actions/runs/5265497133/jobs/9518338900?pr=25684
Originally posted by @lmb in #11785 (comment)