Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Azure-related data races #17054

Merged

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Aug 4, 2021

  • azure/api/mock: Add (*API).updateInstancesLocked helper
  • azure/api/mock: Make copy of instance map to avoid data races
  • azure: Set interface ID upfront in a dedicated method

Fixes: #13619

@christarazi christarazi added area/CI Continuous Integration testing issue or flake area/CI-improvement Topic or proposal to improve the Continuous Integration workflow release-note/ci This PR makes changes to the CI. kind/enhancement This would improve or streamline existing functionality. area/azure Impacts Azure based IPAM. labels Aug 4, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Aug 4, 2021
@christarazi christarazi changed the title pr/christarazi/azure unit test race Fix Azure-related data races Aug 4, 2021
@christarazi christarazi force-pushed the pr/christarazi/azure-unit-test-race branch from e6e1390 to 04d6afe Compare August 4, 2021 20:15
@christarazi
Copy link
Member Author

ci-aks

This will be utilized in the next commit, as we need to update the
instances inside (*API).AssignPrivateIpAddressesVMSS().

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This fixes the following data race:
```
WARNING: DATA RACE
Write at 0x00c000280ba0 by goroutine 47:
  github.com/cilium/cilium/pkg/azure/api/mock.(*API).AssignPrivateIpAddressesVMSS.func1()
      /home/chris/code/cilium/cilium/pkg/azure/api/mock/mock.go:214 +0x406
  github.com/cilium/cilium/pkg/ipam/types.foreachInterface()
      /home/chris/code/cilium/cilium/pkg/ipam/types/types.go:356 +0x199
  github.com/cilium/cilium/pkg/ipam/types.(*InstanceMap).ForeachInterface()
      /home/chris/code/cilium/cilium/pkg/ipam/types/types.go:383 +0x33c
  github.com/cilium/cilium/pkg/azure/api/mock.(*API).AssignPrivateIpAddressesVMSS()
      /home/chris/code/cilium/cilium/pkg/azure/api/mock/mock.go:190 +0x267
  github.com/cilium/cilium/pkg/azure/ipam.(*Node).AllocateIPs()
      /home/chris/code/cilium/cilium/pkg/azure/ipam/node.go:139 +0x4b3
  github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:609 +0xa41
  github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:667 +0xd0
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func2()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:271 +0xab
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:198 +0x53a

Previous read at 0x00c000280ba0 by goroutine 86:
  github.com/cilium/cilium/pkg/azure/types.(*AzureInterface).ForeachAddress()
      /home/chris/code/cilium/cilium/pkg/azure/types/types.go:184 +0x5b
  github.com/cilium/cilium/pkg/ipam/types.foreachAddress()
      /home/chris/code/cilium/cilium/pkg/ipam/types/types.go:315 +0x136
  github.com/cilium/cilium/pkg/ipam/types.(*InstanceMap).ForeachAddress()
      /home/chris/code/cilium/cilium/pkg/ipam/types/types.go:337 +0x18e
  github.com/cilium/cilium/pkg/azure/ipam.(*Node).ResyncInterfacesAndIPs()
      /home/chris/code/cilium/cilium/pkg/azure/ipam/node.go:159 +0x2f2
  github.com/cilium/cilium/pkg/ipam.(*Node).recalculate()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:347 +0x102
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:382 +0x8d
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:429 +0x88

Goroutine 47 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:119 +0x264
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:266 +0x630
  github.com/cilium/cilium/pkg/azure/ipam.(*IPAMSuite).TestIpamManyNodes()
      /home/chris/code/cilium/cilium/pkg/azure/ipam/ipam_test.go:338 +0xe25
  runtime.call16()
      /usr/lib/go/src/runtime/asm_amd64.s:550 +0x3d
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:337 +0xd8
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:775 +0xb3b
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:669 +0xe1

Goroutine 86 (running) created at:
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:428 +0x2e4
  github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:162 +0x104
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:198 +0x53a
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This fixes a data race[1].

The reason this data race existed is because we were writing to the
AzureInterface object in methods that were, at face value, getter
methods. Since the methods are pointer receivers, we were modifying
live data. There should be no reason why we write into the object's
fields in a getter method, when we can do that upfront. Specifically,
when the AzureInterface ID field is set, other fields that are
unexported are also populated. Previously, they were populated upon
their respective getters, which caused the data race.

Note, it is worth calling out that whenever a new AzureInterface object
is created, it is important that the caller also calls
(*AzureInterface).SetID() as well to set the other unexported fields.

[1]:
```
WARNING: DATA RACE
Read at 0x00c000489358 by goroutine 62:
  github.com/cilium/cilium/pkg/azure/types.(*AzureInterface).GetVMID()
      /home/chris/code/cilium/cilium/pkg/azure/types/types.go:178 +0x14a
  github.com/cilium/cilium/pkg/azure/ipam.(*Node).AllocateIPs()
      /home/chris/code/cilium/cilium/pkg/azure/ipam/node.go:139 +0x1d7
  github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:609 +0xa41
  github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:667 +0xd0
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func2()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:271 +0xab
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:198 +0x53a

Previous write at 0x00c000489358 by goroutine 46:
  github.com/cilium/cilium/pkg/azure/types.(*AzureInterface).extractIDs()
      /home/chris/code/cilium/cilium/pkg/azure/types/types.go:148 +0x168
  github.com/cilium/cilium/pkg/azure/types.(*AzureInterface).GetVMID()
      /home/chris/code/cilium/cilium/pkg/azure/types/types.go:179 +0x6a4
  github.com/cilium/cilium/pkg/azure/api/mock.(*API).AssignPrivateIpAddressesVMSS.func1()
      /home/chris/code/cilium/cilium/pkg/azure/api/mock/mock.go:196 +0xcb
  github.com/cilium/cilium/pkg/ipam/types.foreachInterface()
      /home/chris/code/cilium/cilium/pkg/ipam/types/types.go:356 +0x199
  github.com/cilium/cilium/pkg/ipam/types.(*InstanceMap).ForeachInterface()
      /home/chris/code/cilium/cilium/pkg/ipam/types/types.go:383 +0x33c
  github.com/cilium/cilium/pkg/azure/api/mock.(*API).AssignPrivateIpAddressesVMSS()
      /home/chris/code/cilium/cilium/pkg/azure/api/mock/mock.go:190 +0x28a
  github.com/cilium/cilium/pkg/azure/ipam.(*Node).AllocateIPs()
      /home/chris/code/cilium/cilium/pkg/azure/ipam/node.go:139 +0x37c
  github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:609 +0xa41
  github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool()
      /home/chris/code/cilium/cilium/pkg/ipam/node.go:667 +0xd0
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func2()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:271 +0xab
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:198 +0x53a

Goroutine 62 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:119 +0x264
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:266 +0x630
  github.com/cilium/cilium/pkg/azure/ipam.(*IPAMSuite).TestIpamManyNodes()
      /home/chris/code/cilium/cilium/pkg/azure/ipam/ipam_test.go:334 +0xd45
  runtime.call16()
      /usr/lib/go/src/runtime/asm_amd64.s:550 +0x3d
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:337 +0xd8
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:775 +0xb3b
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:669 +0xe1

Goroutine 46 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:119 +0x264
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
      /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:266 +0x630
  github.com/cilium/cilium/pkg/azure/ipam.(*IPAMSuite).TestIpamManyNodes()
      /home/chris/code/cilium/cilium/pkg/azure/ipam/ipam_test.go:334 +0xd45
  runtime.call16()
      /usr/lib/go/src/runtime/asm_amd64.s:550 +0x3d
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:337 +0xd8
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:775 +0xb3b
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:669 +0xe1
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/azure-unit-test-race branch from 04d6afe to 0882040 Compare August 9, 2021 05:40
@christarazi
Copy link
Member Author

ci-aks

@tklauser tklauser merged commit d5016ef into cilium:master Aug 19, 2021
@christarazi christarazi deleted the pr/christarazi/azure-unit-test-race branch August 19, 2021 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/azure Impacts Azure based IPAM. area/CI Continuous Integration testing issue or flake area/CI-improvement Topic or proposal to improve the Continuous Integration workflow kind/bug/race-detector kind/enhancement This would improve or streamline existing functionality. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RACE detector] data race in pkg/azure/types.(*AzureInterface).DeepCopyInto()
5 participants