Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/node: fix concurrent access of entry node #14591

Merged
merged 1 commit into from
Jan 14, 2021

Conversation

aanm
Copy link
Member

@aanm aanm commented Jan 12, 2021

Fixes the following data race:

Read at 0x00c00079f200 by goroutine 43:
  github.com/cilium/cilium/pkg/node/manager.(*Manager).StartNeighborRefresh.func1.1()
      /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:581 +0xc4

Previous write at 0x00c00079f200 by goroutine 166:
  github.com/cilium/cilium/pkg/node/manager.(*Manager).NodeUpdated()
      /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:414 +0x824
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit.func2()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:71 +0x2d4
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:238 +0xa5
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
      <autogenerated>:1 +0x2e
  github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1()
      /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:114 +0x389
  k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:507 +0x50d
  k8s.io/client-go/tools/cache.(*controller).processLoop()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x83
  k8s.io/client-go/tools/cache.(*controller).processLoop-fm()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:181 +0x44
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x75
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xba
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x114
  k8s.io/apimachinery/pkg/util/wait.Until()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x507
  k8s.io/client-go/tools/cache.(*controller).Run()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x4a9

Goroutine 43 (running) created at:
  github.com/cilium/cilium/pkg/node/manager.(*Manager).StartNeighborRefresh.func1()
      /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:578 +0x368
  github.com/cilium/cilium/pkg/controller.(*Controller).runController()
      /go/src/github.com/cilium/cilium/pkg/controller/controller.go:208 +0xd10

Goroutine 166 (running) created at:
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:101 +0x184

Fixes: 5ec4d51 ("daemon, node: refresh neighbor by sending arping periodically")
Signed-off-by: André Martins andre@cilium.io

Fixes the following data race:

```
Read at 0x00c00079f200 by goroutine 43:
  github.com/cilium/cilium/pkg/node/manager.(*Manager).StartNeighborRefresh.func1.1()
      /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:581 +0xc4

Previous write at 0x00c00079f200 by goroutine 166:
  github.com/cilium/cilium/pkg/node/manager.(*Manager).NodeUpdated()
      /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:414 +0x824
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit.func2()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:71 +0x2d4
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:238 +0xa5
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
      <autogenerated>:1 +0x2e
  github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1()
      /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:114 +0x389
  k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:507 +0x50d
  k8s.io/client-go/tools/cache.(*controller).processLoop()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x83
  k8s.io/client-go/tools/cache.(*controller).processLoop-fm()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:181 +0x44
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x75
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xba
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x114
  k8s.io/apimachinery/pkg/util/wait.Until()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x507
  k8s.io/client-go/tools/cache.(*controller).Run()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x4a9

Goroutine 43 (running) created at:
  github.com/cilium/cilium/pkg/node/manager.(*Manager).StartNeighborRefresh.func1()
      /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:578 +0x368
  github.com/cilium/cilium/pkg/controller.(*Controller).runController()
      /go/src/github.com/cilium/cilium/pkg/controller/controller.go:208 +0xd10

Goroutine 166 (running) created at:
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:101 +0x184
```

Fixes: 5ec4d51 ("daemon, node: refresh neighbor by sending arping periodically")
Signed-off-by: André Martins <andre@cilium.io>
@aanm aanm added kind/bug This is a bug in the Cilium logic. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.8 kind/bug/race-detector labels Jan 12, 2021
@aanm aanm requested a review from brb January 12, 2021 13:09
@aanm aanm requested a review from a team as a code owner January 12, 2021 13:09
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Jan 12, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.2 Jan 12, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.7 Jan 12, 2021
@aanm
Copy link
Member Author

aanm commented Jan 12, 2021

cc @jaffcheng

@aanm
Copy link
Member Author

aanm commented Jan 12, 2021

test-me-please

@aanm
Copy link
Member Author

aanm commented Jan 13, 2021

retest-gke

@joestringer
Copy link
Member

Hit #14608 , the test is not run in any of the other CI jobs. The test itself is known to fail occasionally but I could not find any evidence of the exact same issue. Re-running CI to see if it is a flake.

@joestringer
Copy link
Member

joestringer commented Jan 13, 2021

retest-gke

EDIT: Node allocation seemed to fail.

15:26:47  ERROR: (gcloud.container.clusters.resize) FAILED_PRECONDITION: Operation operation-1610580406679-d1267395 is currently operating on cluster cilium-ci-21. Please wait and try again once it is done.

@joestringer
Copy link
Member

retest-gke

@aanm aanm merged commit a544c51 into cilium:master Jan 14, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 14, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.2 Jan 14, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.2 Jan 18, 2021
This was referenced Jan 19, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.7 Jan 20, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.7 Jan 21, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.7 Jan 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug/race-detector kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
1.8.7
Backport done to v1.8
1.9.2
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

None yet

5 participants