v1.7 Backport ENI data race fixes #11766
[ upstream commit 699a8a5 ]

This fixes data races found when running the unit-tests with `-race`. The mutex is to protect access to `n.enis`.

```
WARNING: DATA RACE
Write at 0x00c0007b40e0 by goroutine 135:
  github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs()
      /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:430 +0xfa
  github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:342 +0x7a
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:389 +0x88

Previous read at 0x00c0007b40e0 by goroutine 46:
  github.com/cilium/cilium/pkg/aws/eni.(*Node).findNextIndex()
      /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:292 +0x86
  github.com/cilium/cilium/pkg/aws/eni.(*Node).CreateInterface()
      /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:339 +0x584
  github.com/cilium/cilium/pkg/ipam.(*Node).createInterface()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:435 +0x290
  github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:628 +0x85d
  github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9

Goroutine 135 (running) created at:
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:388 +0x2c5
  github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9

Goroutine 46 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523
  github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
      /home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```

```
WARNING: DATA RACE
Write at 0x00c000110060 by goroutine 94:
  github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs()
      /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:439 +0xfa
  github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:342 +0x7a
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:389 +0x88

Previous read at 0x00c000110060 by goroutine 92:
  github.com/cilium/cilium/pkg/aws/eni.(*Node).PrepareIPAllocation()
      /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:211 +0xefc
  github.com/cilium/cilium/pkg/ipam.(*Node).determineMaintenanceAction()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:542 +0x184
  github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:578 +0x53
  github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9

Goroutine 94 (running) created at:
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:388 +0x2c5
  github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9

Goroutine 92 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523
  github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
      /home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
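For readers unfamiliar with the pattern, here is a minimal sketch of guarding a map like `n.enis` with a mutex so that a resync (writer) and an index lookup (reader) can run on different goroutines. The `node` and `eniInfo` types and the method bodies are simplified, hypothetical stand-ins, not the actual code from `pkg/aws/eni`.

```go
package main

import (
	"fmt"
	"sync"
)

// eniInfo is a hypothetical stand-in for the per-ENI data kept by a node.
type eniInfo struct {
	Number int
}

// node is a hypothetical, simplified struct holding a map of ENIs.
type node struct {
	mutex sync.RWMutex
	enis  map[string]eniInfo // guarded by mutex
}

// resync swaps in a fresh map while holding the write lock (writer path).
func (n *node) resync(fresh map[string]eniInfo) {
	n.mutex.Lock()
	defer n.mutex.Unlock()
	n.enis = fresh
}

// findNextIndex returns the first index >= index that no ENI uses.
// It holds the read lock for the whole scan (reader path).
func (n *node) findNextIndex(index int) int {
	n.mutex.RLock()
	defer n.mutex.RUnlock()
	for {
		inUse := false
		for _, e := range n.enis {
			if e.Number == index {
				inUse = true
				break
			}
		}
		if !inUse {
			return index
		}
		index++
	}
}

func main() {
	n := &node{}
	var wg sync.WaitGroup
	// Run writers and readers concurrently, as the resync and
	// pool-maintenance triggers do in the traces above.
	for i := 0; i < 4; i++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			n.resync(map[string]eniInfo{"eni-a": {Number: 0}, "eni-b": {Number: 1}})
		}()
		go func() {
			defer wg.Done()
			_ = n.findNextIndex(0)
		}()
	}
	wg.Wait()
	fmt.Println("next free index:", n.findNextIndex(0))
}
```

Running this sketch with `go run -race` should report no race; removing the lock calls from either method should make the detector flag the same write/read pair shape as in the traces.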
[ upstream commit 9168268 ]

In this test suite, `metricsapi` is a global variable which is shared among all the `Test*` functions. It is possible that it becomes polluted over time during test execution.

This is an attempt to resolve the following:

```
FAIL: node_manager_test.go:563: ENISuite.TestNodeManagerManyNodes

node_manager_test.go:602:
    c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs)
... Error: Node node53 allocation mismatch. expected: 10 allocated: 18

node_manager_test.go:602:
    c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs)
... Error: Node node59 allocation mismatch. expected: 10 allocated: 18

node_manager_test.go:617:
    c.Assert(metricsapi.AllocatedIPs("available"), check.Equals, numNodes*minAllocate)
... obtained int = 1016
... expected int = 1000

OOPS: 17 passed, 1 FAILED
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
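To illustrate how a shared global mock can become polluted, here is a small sketch. The `mockMetrics` type, its methods, and `simulateTest` are hypothetical and only mimic the shape of the problem; they are not the real `metricsapi` mock, and this is not necessarily how the commit resolves it.

```go
package main

import "fmt"

// mockMetrics is a hypothetical metrics mock that accumulates IP counts by state.
type mockMetrics struct {
	allocatedIPs map[string]int
}

func newMockMetrics() *mockMetrics {
	return &mockMetrics{allocatedIPs: map[string]int{}}
}

func (m *mockMetrics) addAllocatedIPs(state string, n int) {
	m.allocatedIPs[state] += n
}

func (m *mockMetrics) allocated(state string) int {
	return m.allocatedIPs[state]
}

// shared mimics a package-level mock reused by every Test* function.
var shared = newMockMetrics()

// simulateTest stands in for one test that allocates 10 IPs and then
// reads back what the mock reports.
func simulateTest(m *mockMetrics) int {
	m.addAllocatedIPs("available", 10)
	return m.allocated("available")
}

func main() {
	// Shared mock: the second "test" sees leftovers from the first.
	fmt.Println(simulateTest(shared), simulateTest(shared)) // 10 20

	// Fresh mock per test: each "test" starts from a clean state.
	fmt.Println(simulateTest(newMockMetrics()), simulateTest(newMockMetrics())) // 10 10
}
```

Creating the mock inside each test (or resetting it in a per-test setup hook) keeps assertions like the `AllocatedIPs("available")` check independent of test ordering.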
never-tell-me-the-odds |
never-tell-me-the-odds |
force-pushed from 4d6c50b to a7ceb7e
never-tell-me-the-odds |
Fixes:

```
WARNING: DATA RACE
Write at 0x00c0003e4450 by goroutine 81:
  github.com/cilium/cilium/pkg/aws/eni.(*Node).recalculateLocked()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node.go:235 +0x75
  github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).resyncNode()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:261 +0xeb
  github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Resync.func1()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:309 +0x88

Previous read at 0x00c0003e4450 by goroutine 123:
  github.com/cilium/cilium/pkg/aws/eni.(*Node).maintainIpPool()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node.go:736 +0xa74
  github.com/cilium/cilium/pkg/aws/eni.(*Node).MaintainIpPool()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node.go:776 +0x90
  github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Update.func1()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:154 +0x8b
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium-backports/pkg/trigger/trigger.go:210 +0x4b9

Goroutine 81 (running) created at:
  github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Resync()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:308 +0x27c
  github.com/cilium/cilium/pkg/aws/eni.NewNodeManager.func1()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:111 +0x101
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium-backports/pkg/trigger/trigger.go:210 +0x4b9

Goroutine 123 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/chris/code/cilium/cilium-backports/pkg/trigger/trigger.go:133 +0x23d
  github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Update()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:149 +0x482
  github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes()
      /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager_test.go:579 +0x9c8
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/go/pkg/mod/gopkg.in/check.v1@v1.0.0-20180628173108-788fd7840127/check.go:781 +0xa0a
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/go/pkg/mod/gopkg.in/check.v1@v1.0.0-20180628173108-788fd7840127/check.go:675 +0xd9
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit cac8d0d ]

Fixes:

```
WARNING: DATA RACE
Write at 0x00c0005b0750 by goroutine 308:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/map_faststr.go:202 +0x0
  github.com/cilium/cilium/pkg/aws/eni.UpdateLimitsFromUserDefinedMappings()
      /home/vagrant/go/src/github.com/cilium/cilium/pkg/aws/eni/limits.go:269 +0xdf
  github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestUpdateLimitsFromUserDefinedMappings()
      /home/vagrant/go/src/github.com/cilium/cilium/pkg/aws/eni/limits_test.go:47 +0x11d
  runtime.call32()
      /usr/local/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/local/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/vagrant/go/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/vagrant/go/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
This fixes a potential data race as `n.resource` is a live pointer to an object. Fixes: 06bce43 ("aws/eni: Fix race condition leading to overaggressive ENI allocation") Signed-off-by: Chris Tarazi <chris@isovalent.com>
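As an illustration of why a live pointer is risky, here is a sketch of one common remedy: storing a deep copy under the lock instead of the caller's pointer. The `resource` and `node` types, their fields, and `deepCopy` are hypothetical simplifications; the source does not state how the commit itself addresses `n.resource`.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a hypothetical stand-in for the object that n.resource points to.
type resource struct {
	Name        string
	PreAllocate int
	Labels      map[string]string
}

// deepCopy returns a copy that shares no memory with the original,
// including a fresh map for Labels.
func (r *resource) deepCopy() *resource {
	c := *r
	c.Labels = make(map[string]string, len(r.Labels))
	for k, v := range r.Labels {
		c.Labels[k] = v
	}
	return &c
}

// node is a hypothetical, simplified node holding a private resource copy.
type node struct {
	mutex    sync.RWMutex
	resource *resource // guarded by mutex; never aliases caller memory
}

// updateResource stores a private copy, so later writes by the caller
// (for example an informer updating its cached object) cannot race with
// readers of n.resource.
func (n *node) updateResource(r *resource) {
	n.mutex.Lock()
	defer n.mutex.Unlock()
	n.resource = r.deepCopy()
}

// preAllocate reads a field of the stored copy under the read lock.
func (n *node) preAllocate() int {
	n.mutex.RLock()
	defer n.mutex.RUnlock()
	return n.resource.PreAllocate
}

func main() {
	r := &resource{Name: "node1", PreAllocate: 8, Labels: map[string]string{"zone": "a"}}
	n := &node{}
	n.updateResource(r)

	// The caller keeps mutating its own object; the node's copy is unaffected.
	r.PreAllocate = 16
	r.Labels["zone"] = "b"
	fmt.Println(n.preAllocate()) // 8
}
```

Without the copy, the mutex on `node` would not help: the caller could still write through its own pointer without taking that lock.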
force-pushed from a7ceb7e to 43253d0
never-tell-me-the-odds |
test-missed-k8s Edit: hit known flake #10442 |
restart-ginkgo |
restart-ginkgo |
test-missed-k8s Edit: hit known flake #10442 |
test-missed-k8s |
test-focus K8sPolicyTest Basic Test Redirects traffic to proxy when no policy is applied with proxy-visibility annotation Tests DNS proxy visibility without policy |
The focused test passed, which was the same test that failed in |
Changes look good. Given these changes are in the ENI area, CI (other than unit testing) is not likely to provide us a lot of signal. For reference, did you perform any manual validation on an EKS cluster? I'm thinking this is probably good to merge assuming it passes basic validation in a real environment. |
Just deployed a quick sanity check cluster and it seems to be fine. Deployed the connectivity check and it was all good. Weirdly enough, As far as this PR is concerned, I think it's good to go. |
The fixes in this PR do not attempt to fix all data races within the code, as
many of them are not "crucial". These data races come from test code. For
example, the test code may access an internal field within the `Node` struct,
but an equivalent access from within the implementation has a mutex protecting
it.
The fixes landing in this PR are around "crucial" data structures like
`Node.enis`, a map containing all references to ENIs in a node. These fixes
attempt to mitigate any potential upgrade regression from older versions. Note,
the 1.6 tree contains many of the same data races, as the code difference between
this tree (1.7) and 1.6 is not significant. The code difference likely did not
include new data races. This means that these data races have been around
since 1.6, at least.
This PR is a partial backport of the two PRs below:
#11685
#10587
Commits from #11685:
metricsapi
Commits from #10587:
This PR also contains two new commits which are fixes for (potential) data races that only
exist in this 1.7 tree: