Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/azure: fix potential deadlocks #13517

Merged
merged 2 commits into from
Oct 13, 2020
Merged

pkg/azure: fix potential deadlocks #13517

merged 2 commits into from
Oct 13, 2020

Conversation

aanm
Copy link
Member

@aanm aanm commented Oct 12, 2020

Read per commit, this fixes some potential deadlocks found

Fixes #13262

Fix 1 potential deadlock in Azure IPAM and 1 other in ENI and Azure IPAM

In case of an error the node manager mutex was never unlocked in case
of an error which would them cause a deadlock for this structure.

Fixes: 2eb51a3 ("eni: Refactor ENI IPAM into generic ipam.NodeManager")
Signed-off-by: André Martins <andre@cilium.io>
Fixes the following potential deadlock

```
POTENTIAL DEADLOCK: Inconsistent locking. saw this ordering in one goroutine:
happened before
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/azure/ipam/node.go:162 ipam.(*Node).ResyncInterfacesAndIPs { n.manager.mutex.RLock() }
pkg/ipam/node.go:358 ipam.(*Node).recalculate { a, err := n.ops.ResyncInterfacesAndIPs(context.TODO(), scopedLog) }
pkg/ipam/node.go:334 ipam.(*Node).UpdatedResource { n.recalculate() }
pkg/ipam/node_manager.go:315 ipam.(*NodeManager).Update { return node.UpdatedResource(resource) }
pkg/azure/ipam/ipam_test.go:345 ipam.(*IPAMSuite).TestIpamManyNodes { mngr.Update(state[i].cn) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:475 reflect.Value.call { call(frametype, fn, args, uint32(frametype.size), uint32(retOffset)) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 reflect.Value.Call { return v.call("Call", in) }
pkg/../vendor/gopkg.in/check.v1/check.go:781 check%2ev1.(*suiteRunner).forkTest.func1 { c.method.Call([]reflect.Value{reflect.ValueOf(c)}) }
pkg/../vendor/gopkg.in/check.v1/check.go:675 check%2ev1.(*suiteRunner).forkCall.func1 { dispatcher(c) }
happened after
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/ipam/types/types.go:340 types.(*InstanceMap).ForeachAddress { m.mutex.RLock() }
pkg/azure/ipam/node.go:164 ipam.(*Node).ResyncInterfacesAndIPs { n.manager.instances.ForeachAddress(n.node.InstanceID(), func(instanceID, interfaceID, ip, poolID string, addressObj ipamTypes.Address) error { }
pkg/ipam/node.go:358 ipam.(*Node).recalculate { a, err := n.ops.ResyncInterfacesAndIPs(context.TODO(), scopedLog) }
pkg/ipam/node.go:334 ipam.(*Node).UpdatedResource { n.recalculate() }
pkg/ipam/node_manager.go:315 ipam.(*NodeManager).Update { return node.UpdatedResource(resource) }
pkg/azure/ipam/ipam_test.go:345 ipam.(*IPAMSuite).TestIpamManyNodes { mngr.Update(state[i].cn) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:475 reflect.Value.call { call(frametype, fn, args, uint32(frametype.size), uint32(retOffset)) }
~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 reflect.Value.Call { return v.call("Call", in) }
pkg/../vendor/gopkg.in/check.v1/check.go:781 check%2ev1.(*suiteRunner).forkTest.func1 { c.method.Call([]reflect.Value{reflect.ValueOf(c)}) }
pkg/../vendor/gopkg.in/check.v1/check.go:675 check%2ev1.(*suiteRunner).forkCall.func1 { dispatcher(c) }
in another goroutine: happened before
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/ipam/types/types.go:381 types.(*InstanceMap).ForeachInterface { m.mutex.RLock() }
pkg/azure/ipam/node.go:74 ipam.(*Node).PrepareIPAllocation { err = n.manager.instances.ForeachInterface(n.node.InstanceID(), func(instanceID, interfaceID string, interfaceObj ipamTypes.InterfaceRevision) error { }
pkg/ipam/node.go:548 ipam.(*Node).determineMaintenanceAction { a.allocation, err = n.ops.PrepareIPAllocation(scopedLog) }
pkg/ipam/node.go:584 ipam.(*Node).maintainIPPool { a, err := n.determineMaintenanceAction() }
pkg/ipam/node.go:678 ipam.(*Node).MaintainIPPool { err := n.maintainIPPool(ctx) }
pkg/ipam/node_manager.go:272 ipam.(*NodeManager).Update.func1 { if err := node.MaintainIPPool(context.TODO()); err != nil { }
pkg/trigger/trigger.go:206 trigger.(*Trigger).waiter { t.params.TriggerFunc(reasons) }
happened after
pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<<
pkg/azure/ipam/instances.go:75 ipam.(*InstancesManager).FindSubnetForAllocation { m.mutex.RLock() }
pkg/azure/ipam/node.go:115 ipam.(*Node).PrepareIPAllocation.func1 { poolID, available := n.manager.FindSubnetForAllocation(preferredPoolIDs) }
pkg/ipam/types/types.go:364 types.foreachInterface { if err := fn(instanceID, rev.Resource.InterfaceID(), rev); err != nil { }
pkg/ipam/types/types.go:386 types.(*InstanceMap).ForeachInterface { return foreachInterface(instanceID, instance, fn) }
pkg/azure/ipam/node.go:74 ipam.(*Node).PrepareIPAllocation { err = n.manager.instances.ForeachInterface(n.node.InstanceID(), func(instanceID, interfaceID string, interfaceObj ipamTypes.InterfaceRevision) error { }
pkg/ipam/node.go:548 ipam.(*Node).determineMaintenanceAction { a.allocation, err = n.ops.PrepareIPAllocation(scopedLog) }
pkg/ipam/node.go:584 ipam.(*Node).maintainIPPool { a, err := n.determineMaintenanceAction() }
pkg/ipam/node.go:678 ipam.(*Node).MaintainIPPool { err := n.maintainIPPool(ctx) }
pkg/ipam/node_manager.go:272 ipam.(*NodeManager).Update.func1 { if err := node.MaintainIPPool(context.TODO()); err != nil { }
pkg/trigger/trigger.go:206 trigger.(*Trigger).waiter { t.params.TriggerFunc(reasons) }
```

Fixes: 24cb061 ("azure: Calculate available addresses based on subnet resource")
Signed-off-by: André Martins <andre@cilium.io>
@aanm aanm added area/eni Impacts ENI based IPAM. area/azure Impacts Azure based IPAM. kind/bug/race-detector labels Oct 12, 2020
@aanm aanm requested a review from a team as a code owner October 12, 2020 19:50
@aanm aanm requested a review from a team October 12, 2020 19:50
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 12, 2020
@aanm aanm added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Oct 12, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 12, 2020
@aanm aanm removed the release-note/bug This PR fixes an issue in a previous release of Cilium. label Oct 12, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 12, 2020
@aanm aanm added release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.8 and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 12, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.5 Oct 12, 2020
@aanm
Copy link
Member Author

aanm commented Oct 12, 2020

test-me-please

@christarazi christarazi added sig/ipam IP address management, including cloud IPAM integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. labels Oct 12, 2020
@aanm aanm merged commit 1662955 into cilium:master Oct 13, 2020
@aanm aanm deleted the pr/fix-13262 branch October 13, 2020 07:40
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.5 Oct 14, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.5 Oct 15, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.5 Oct 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/azure Impacts Azure based IPAM. area/eni Impacts ENI based IPAM. integration/cloud Related to integration with cloud environments such as AKS, EKS, GKE, etc. kind/bug/race-detector kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/ipam IP address management, including cloud IPAM
Projects
No open projects
1.8.5
Backport done to v1.8
Development

Successfully merging this pull request may close these issues.

Potential deadlock: Inconsistent locking in pkg/ipam/node
3 participants