Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Init k8s watchers after setting agent flags #18770

Merged

Conversation

pchaigno
Copy link
Member

The way our agent startup is structured can lead to a data race on the EnableBPFMasquerade flag. In the main thread, we initialize k8s watchers via goroutines which may end up using flag EnableBPFMasquerade. We then update the value of that flag. So we can have a race between the main thread and a k8s watcher goroutine. The full stack traces for the data race are shown below.

Moving the k8s watcher initialization after the flags are fully updated removes this data race.

Click to show the stack traces.
WARNING: DATA RACE
Write at 0x000005803072 by main goroutine:
  github.com/cilium/cilium/daemon/cmd.NewDaemon()
      /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:710 +0x5587
  github.com/cilium/cilium/daemon/cmd.runDaemon()
      /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1652 +0xa98
  github.com/cilium/cilium/daemon/cmd.glob..func1()
      /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:126 +0x494
  github.com/spf13/cobra.(*Command).execute()
      /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:860 +0xaf3
  github.com/spf13/cobra.(*Command).ExecuteC()
      /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:974 +0x5da
  github.com/spf13/cobra.(*Command).Execute()
      /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:902 +0x65
  github.com/cilium/cilium/daemon/cmd.Execute()
      /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:140 +0x4e
  main.main()
      /go/src/github.com/cilium/cilium/daemon/main.go:16 +0x24
  runtime.main()
      /usr/local/go/src/runtime/proc.go:255 +0x226
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:220 +0x868
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:188 +0x808
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:166 +0x7a8
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:148 +0x748
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:108 +0x6e8
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:92 +0x688
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:76 +0x628
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:59 +0x5c8
  github.com/cilium/ebpf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:41 +0x568
  runtime.doInit()
      /usr/local/go/src/runtime/proc.go:6498 +0x122
  github.com/cilium/ebpf/internal/btf.init()
      /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/btf/btf.go:820 +0x1bc

Previous read at 0x000005803072 by goroutine 179:
  github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingIPv4Enabled()
      /go/src/github.com/cilium/cilium/pkg/option/config.go:2182 +0x17a9
  github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingEnabled()
      /go/src/github.com/cilium/cilium/pkg/option/config.go:2194 +0x17dc
  github.com/cilium/cilium/pkg/node/manager.(*Manager).NodeUpdated()
      /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:414 +0x1778
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit.func1()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:43 +0x374
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:231 +0x79
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd()
      <autogenerated>:1 +0x33
  github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1()
      /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:108 +0x2d8
  k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:554 +0x8c9
  k8s.io/client-go/tools/cache.(*controller).processLoop()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x61
  k8s.io/client-go/tools/cache.(*controller).processLoop-fm()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:181 +0x39
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x81
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xce
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x104
  k8s.io/apimachinery/pkg/util/wait.Until()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x616
  k8s.io/client-go/tools/cache.(*controller).Run()
      /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x5cd
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit·dwrap·10()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x59

Goroutine 179 (running) created at:
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x8d
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).EnableK8sWatcher·dwrap·21()
      /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:438 +0x58

Fixes: #18727.

The way our agent startup is structured can lead to a data race on the
EnableBPFMasquerade flag. In the main thread, we initialize k8s watchers
via goroutines which may end up using flag EnableBPFMasquerade. We then
update the value of that flag. So we can have a race between the main
thread and a k8s watcher goroutine. The full stack traces for the data
race are shown below.

Moving the k8s watcher initialization after the flags are fully updated
removes this data race.

    WARNING: DATA RACE
    Write at 0x000005803072 by main goroutine:
      github.com/cilium/cilium/daemon/cmd.NewDaemon()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:710 +0x5587
      github.com/cilium/cilium/daemon/cmd.runDaemon()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1652 +0xa98
      github.com/cilium/cilium/daemon/cmd.glob..func1()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:126 +0x494
      github.com/spf13/cobra.(*Command).execute()
          /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:860 +0xaf3
      github.com/spf13/cobra.(*Command).ExecuteC()
          /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:974 +0x5da
      github.com/spf13/cobra.(*Command).Execute()
          /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:902 +0x65
      github.com/cilium/cilium/daemon/cmd.Execute()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:140 +0x4e
      main.main()
          /go/src/github.com/cilium/cilium/daemon/main.go:16 +0x24
      runtime.main()
          /usr/local/go/src/runtime/proc.go:255 +0x226
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:220 +0x868
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:188 +0x808
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:166 +0x7a8
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:148 +0x748
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:108 +0x6e8
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:92 +0x688
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:76 +0x628
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:59 +0x5c8
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:41 +0x568
      runtime.doInit()
          /usr/local/go/src/runtime/proc.go:6498 +0x122
      github.com/cilium/ebpf/internal/btf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/btf/btf.go:820 +0x1bc

    Previous read at 0x000005803072 by goroutine 179:
      github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingIPv4Enabled()
          /go/src/github.com/cilium/cilium/pkg/option/config.go:2182 +0x17a9
      github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingEnabled()
          /go/src/github.com/cilium/cilium/pkg/option/config.go:2194 +0x17dc
      github.com/cilium/cilium/pkg/node/manager.(*Manager).NodeUpdated()
          /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:414 +0x1778
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit.func1()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:43 +0x374
      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:231 +0x79
      k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd()
          <autogenerated>:1 +0x33
      github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1()
          /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:108 +0x2d8
      k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:554 +0x8c9
      k8s.io/client-go/tools/cache.(*controller).processLoop()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x61
      k8s.io/client-go/tools/cache.(*controller).processLoop-fm()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:181 +0x39
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x81
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xce
      k8s.io/apimachinery/pkg/util/wait.JitterUntil()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x104
      k8s.io/apimachinery/pkg/util/wait.Until()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x616
      k8s.io/client-go/tools/cache.(*controller).Run()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x5cd
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit·dwrap·10()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x59

    Goroutine 179 (running) created at:
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x8d
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).EnableK8sWatcher·dwrap·21()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:438 +0x58

Signed-off-by: Paul Chaignon <paul@cilium.io>
@pchaigno pchaigno added area/daemon Impacts operation of the Cilium daemon. release-note/misc This PR makes changes that have no direct user impact. kind/bug/race-detector labels Feb 10, 2022
@pchaigno pchaigno requested review from a team and borkmann February 10, 2022 17:41
@pchaigno
Copy link
Member Author

/test

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 10, 2022
@jibi jibi merged commit bd2ac6c into cilium:master Feb 11, 2022
@pchaigno pchaigno deleted the fix-data-race-watcher-bpf-masquerading-flag branch February 11, 2022 09:41
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.11.2 Feb 11, 2022
@pchaigno
Copy link
Member Author

We need to backport this to v1.11 since #17871, the pull request fixes here, will be backported as well.

@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.10 in 1.11.2 Feb 14, 2022
@jibi jibi added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. and removed backport-pending/1.11 labels Feb 16, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.10 to Backport done to v1.11 in 1.11.2 Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. kind/bug/race-detector ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
1.11.2
Backport done to v1.11
Development

Successfully merging this pull request may close these issues.

race: between daemon/cmd.NewDaemon() and pkg/option.(*DaemonConfig).IptablesMasqueradingIPv4Enabled()
3 participants