Fix various issues (data races, flakes) related to DaemonSuite and Endpoint related test code #12195
Conversation
Please set the appropriate release note label.
test-me-please
All green CI on first try 🎉
Force-pushed from 01f059a to f646e4b
test-me-please
Force-pushed from f646e4b to 2896094
test-me-please (Edit: net-next provisioning failure)
retest-net-next
```go
mutex = lock.RWMutex{} // Protects access to `pool` below.
pool  = idpool.NewIDPool(minID, maxID)
```
This will slow down access to the pool, which already has a mutex built in. I know the mutex is protecting the variable `pool`, not the contents of `pool`. This entire mutex is only being added because of `ReallocatePool`, which is only used in unit tests. Since that is the case, we should avoid using a mutex entirely and rewrite the `ReallocatePool` function to do something like:
```go
// ReallocatePool starts over with a new pool.
// This function is only used for tests and its implementation is not
// optimized for production.
func ReallocatePool() {
	// Note: <= so that maxID itself is also released; `i < maxID`
	// would leave the last ID unreleased.
	for i := minID; i <= maxID; i++ {
		pool.Release(i)
	}
}
```
Ah, thanks for the extra insight. I agree that the mutex would slow things down. Good catch on noticing that `ReallocatePool` isn't used in implementation code.
Will add this and remove the mutex.
pkg/policy/repository.go
Outdated
```diff
@@ -32,7 +32,7 @@ import (
 	"github.com/cilium/cilium/pkg/metrics"
 	"github.com/cilium/cilium/pkg/option"
 	"github.com/cilium/cilium/pkg/policy/api"
-	"github.com/cilium/proxy/go/cilium/api"
+	cilium "github.com/cilium/proxy/go/cilium/api"
```
leftover?
Yes, will remove
```diff
@@ -282,7 +282,7 @@ func (m *endpointCreationManager) DebugStatus() (output string) {

 // createEndpoint attempts to create the endpoint corresponding to the change
 // request that was specified.
-func (d *Daemon) createEndpoint(ctx context.Context, epTemplate *models.EndpointChangeRequest) (*endpoint.Endpoint, int, error) {
+func (d *Daemon) createEndpoint(ctx context.Context, owner regeneration.Owner, epTemplate *models.EndpointChangeRequest) (*endpoint.Endpoint, int, error) {
```
I don't understand this change; why can't we keep the old way and still pass `d` to the functions on lines 316, 412, and 497?
This change allows us to pass `DaemonSuite` as the `owner`, because it implements the `regeneration.Owner` interface. This is useful because we can mock out certain calls that happen during the creation of the endpoint. Specifically, to fix the metrics assertion flake (as mentioned in the commit msg and in the PR body), we want to mock out the call to `(*Daemon).QueueEndpointBuild`, because that causes the endpoint to transition states extremely quickly. By mocking this out, we make the test more deterministic, and thus less flaky. See this comment for more info on this.
Does that make sense? Happy to clarify further.
Oh, I missed that we were passing `ds` in the tests and not `ds.d`.
Also, I've removed all needs-backport labels for older branches to reduce the risk of introducing regressions.
For testing purposes, it's useful to pass the owner of the endpoint to `(*Daemon).createEndpoint`. Since an owner must implement the `regeneration.Owner` interface, we can utilize this fact to mock out the enqueuing of endpoint regeneration. In other words, we avoid the state transitions caused by endpoint regeneration because the enqueuing has been "blocked" by mocking it out. This allows us to assert on specific state transitions without worrying about endpoint regeneration kicking in and unintentionally transitioning endpoint state. We are also utilizing the fact that this test suite (DaemonSuite) already implemented the `regeneration.Owner` interface for mocking purposes, but it wasn't used in such a fashion.

This commit also fixes two incorrect assertions in the test:

1) Remove the premature assertion on metrics, because the `TestEndpointAddNoLabels` test is about creating an endpoint without any initial labels set. The endpoint's initial label (`reserved:init`) is assigned within the call to `(*Daemon).createEndpoint`, so it is very unlikely that we catch the endpoint in any other state besides "ready". Therefore, we remove this premature assertion as it doesn't test anything useful.

2) The final state transition in `TestEndpointAddNoLabels` should be "regenerating" -> "ready", because now `(*Daemon).QueueEndpointBuild()` uses the real implementation (not mocked out, as in the rest of the suite).

Fixes: 4a4a59b ("daemon: Add assertions for endpoint state metrics")
Fixes: 19030a7 ("daemon: Fix waiting on metrics in endpoint_test.go")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
The data race stemmed from `ReallocatePool()`, a function that's only used in testing code, not implementation code. Therefore, we don't need a mutex to protect access to the variable itself. The data race is fixed by eliminating the overwrite of `pool` and releasing the IDs instead, thus reusing the same instance of `pool`. Operations on `pool` are already protected by a mutex.

Along with fixing the following data race, this commit also changes the usage of idpool to be a value rather than a pointer. idpool doesn't need to be a pointer, as its only consumer is the endpoint manager, which is essentially used as a singleton during the lifetime of Cilium. Using values rather than pointers helps reduce the load on the Go garbage collector. In this case, the impact is small, but useful nonetheless.

```
WARNING: DATA RACE
Write at 0x0000050336b0 by goroutine 243:
  github.com/cilium/cilium/pkg/endpointmanager/idallocator.ReallocatePool()
      /home/chris/code/cilium/cilium/pkg/endpointmanager/idallocator/allocator.go:34 +0x14e
  github.com/cilium/cilium/pkg/endpointmanager.(*EndpointManager).RemoveAll()
      /home/chris/code/cilium/cilium/pkg/endpointmanager/manager.go:301 +0x80
  github.com/cilium/cilium/daemon/cmd.(*DaemonSuite).TearDownTest()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon_test.go:167 +0x8f
  github.com/cilium/cilium/daemon/cmd.(*DaemonEtcdSuite).TearDownTest()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon_test.go:209 +0x44
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).runFixture.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:730 +0x1c3
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9

Previous read at 0x0000050336b0 by goroutine 196:
  github.com/cilium/cilium/pkg/endpointmanager/idallocator.Release()
      /home/chris/code/cilium/cilium/pkg/endpointmanager/idallocator/allocator.go:72 +0x3e
  github.com/cilium/cilium/pkg/endpointmanager.(*EndpointManager).ReleaseID()
      /home/chris/code/cilium/cilium/pkg/endpointmanager/manager.go:287 +0x50
  github.com/cilium/cilium/pkg/endpoint.(*Endpoint).Unexpose.func1()
      /home/chris/code/cilium/cilium/pkg/endpoint/manager.go:143 +0xc4

Goroutine 243 (running) created at:
  gopkg.in/check%2ev1.(*suiteRunner).forkCall()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:672 +0x44d
  gopkg.in/check%2ev1.(*suiteRunner).runFunc()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:682 +0x83
  gopkg.in/check%2ev1.(*suiteRunner).runFixture()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:726 +0x3d
  gopkg.in/check%2ev1.(*suiteRunner).runFixtureWithPanic()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:744 +0x82
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:782 +0xa0b
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9

Goroutine 196 (finished) created at:
  github.com/cilium/cilium/pkg/endpoint.(*Endpoint).Unexpose()
      /home/chris/code/cilium/cilium/pkg/endpoint/manager.go:132 +0xd1
  github.com/cilium/cilium/pkg/endpoint.(*Endpoint).Delete()
      /home/chris/code/cilium/cilium/pkg/endpoint/endpoint.go:2193 +0x174
  github.com/cilium/cilium/daemon/cmd.(*Daemon).deleteEndpointQuiet()
      /home/chris/code/cilium/cilium/daemon/cmd/endpoint.go:633 +0xe6
  github.com/cilium/cilium/daemon/cmd.(*Daemon).errorDuringCreation()
      /home/chris/code/cilium/cilium/daemon/cmd/endpoint.go:188 +0x4d
  github.com/cilium/cilium/daemon/cmd.(*Daemon).createEndpoint()
      /home/chris/code/cilium/cilium/daemon/cmd/endpoint.go:443 +0x177f
  github.com/cilium/cilium/daemon/cmd.(*DaemonSuite).TestUpdateLabelsFailed()
      /home/chris/code/cilium/cilium/daemon/cmd/endpoint_test.go:149 +0x124
  github.com/cilium/cilium/daemon/cmd.(*DaemonEtcdSuite).TestUpdateLabelsFailed()
      <autogenerated>:1 +0x4d
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit adds optimizations that cut the execution time of IDPoolTestSuite roughly in half, from ~1.6 seconds to ~0.8 seconds, by eliminating superfluous memory allocations. (While shaving off 0.8 seconds from a test suite that takes on the order of minutes does not sound like much, this commit can serve as a template for future optimizations around tests which require a large number of memory allocations.)

These allocations come from the `Assert` function in the Go check package which we use for assertions. This function is costly when many goroutines are running in parallel performing assertions. Each assertion involves a rather large memory allocation when processing the arguments to the function, before actually getting to the assertion itself. This is quite wasteful in this specific test suite where we're simply comparing integer and boolean types.

This commit implements the optimization by replacing the call to `Assert` with a simple equality check. See the CPU and memory profiles below.

Before optimization; cpu profile:

```
$ go test -c ./pkg/idpool && go test ./pkg/idpool -cpuprofile cpu.out -count 1
ok      github.com/cilium/cilium/pkg/idpool     1.612s
$ go tool pprof ./idpool.test cpu.out
File: idpool.test
Type: cpu
Time: Jun 20, 2020 at 4:39pm (PDT)
Duration: 1.60s, Total samples = 1.39s (86.77%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 770ms, 55.40% of 1390ms total
Showing top 10 nodes out of 129
      flat  flat%   sum%        cum   cum%
     320ms 23.02% 23.02%      320ms 23.02%  runtime.procyield
      70ms  5.04% 28.06%      200ms 14.39%  runtime.mallocgc  <--- Nearly 15% of CPU time is spent in memory allocations
      70ms  5.04% 33.09%       70ms  5.04%  runtime.unlock
      60ms  4.32% 37.41%      310ms 22.30%  runtime.findrunnable
      50ms  3.60% 41.01%      310ms 22.30%  runtime.lock
      50ms  3.60% 44.60%       50ms  3.60%  runtime.runqget
      40ms  2.88% 47.48%      270ms 19.42%  gopkg.in/check%2ev1.(*C).internalCheck
      40ms  2.88% 50.36%       60ms  4.32%  runtime.heapBitsSetType
      40ms  2.88% 53.24%      150ms 10.79%  sync.(*Mutex).Unlock
      30ms  2.16% 55.40%       30ms  2.16%  runtime.casgstatus
```

Before optimization; memory profile:

```
$ go test -c ./pkg/idpool && go test ./pkg/idpool -memprofile mem.out -count 1
ok      github.com/cilium/cilium/pkg/idpool     1.303s
$ go tool pprof ./idpool.test mem.out
File: idpool.test
Type: alloc_space
Time: Jun 20, 2020 at 4:41pm (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 341.51MB, 100% of 341.51MB total
      flat  flat%   sum%        cum   cum%
  225.01MB 65.89% 65.89%   315.51MB 92.39%  gopkg.in/check%2ev1.(*C).internalCheck
   90.50MB 26.50% 92.39%    90.50MB 26.50%  gopkg.in/check%2ev1.(*notChecker).Info
      18MB  5.27% 97.66%       18MB  5.27%  gopkg.in/check%2ev1.Not (inline)
       8MB  2.34%   100%   225.01MB 65.89%  github.com/cilium/cilium/pkg/idpool.(*IDPoolTestSuite).TestAllocateID.func1
         0     0%   100%   116.50MB 34.11%  github.com/cilium/cilium/pkg/idpool.(*IDPoolTestSuite).TestAllocateID
         0     0%   100%   315.51MB 92.39%  gopkg.in/check%2ev1.(*C).Assert
         0     0%   100%   116.50MB 34.11%  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1
         0     0%   100%   116.50MB 34.11%  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1
         0     0%   100%   116.50MB 34.11%  reflect.Value.Call
         0     0%   100%   116.50MB 34.11%  reflect.Value.call
(pprof) list internalCheck
Total: 341.51MB
ROUTINE ======================== gopkg.in/check%2ev1.(*C).internalCheck in /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/helpers.go
  225.01MB   315.51MB (flat, cum) 92.39% of Total
         .          .    192:        comment = c
         .          .    193:        args = args[:len(args)-1]
         .          .    194:        }
         .          .    195:    }
         .          .    196:
     135MB      135MB    197:    params := append([]interface{}{obtained}, args...)
         .    90.50MB    198:    info := checker.Info()
         .          .    199:
         .          .    200:    if len(params) != len(info.Params) {
         .          .    201:        names := append([]string{info.Params[0], info.Name}, info.Params[1:]...)
         .          .    202:        c.logCaller(2)
         .          .    203:        c.logString(fmt.Sprintf("%s(%s):", funcName, strings.Join(names, ", ")))
         .          .    204:        c.logString(fmt.Sprintf("Wrong number of parameters for %s: want %d, got %d", info.Name, len(names), len(params)+1))
         .          .    205:        c.logNewLine()
         .          .    206:        c.Fail()
         .          .    207:        return false
         .          .    208:    }
         .          .    209:
         .          .    210:    // Copy since it may be mutated by Check.
      90MB       90MB    211:    names := append([]string{}, info.Params...)
         .          .    212:
         .          .    213:    // Do the actual check.
         .          .    214:    result, error := checker.Check(params, names)
         .          .    215:    if !result || error != "" {
         .          .    216:        c.logCaller(2)
(pprof) list TestAllocateID
Total: 341.51MB
ROUTINE ======================== github.com/cilium/cilium/pkg/idpool.(*IDPoolTestSuite).TestAllocateID in /home/chris/code/cilium/cilium/pkg/idpool/idpool_test.go
         0   116.50MB (flat, cum) 34.11% of Total
         .          .    342:    allocators.Wait()
         .          .    343:    close(allocated)
         .          .    344:    }()
         .          .    345:
         .          .    346:    for id := range allocated {
         .   116.50MB    347:        c.Assert(p.Insert(id), Equals, true)
         .          .    348:    }
         .          .    349:}
ROUTINE ======================== github.com/cilium/cilium/pkg/idpool.(*IDPoolTestSuite).TestAllocateID.func1 in /home/chris/code/cilium/cilium/pkg/idpool/idpool_test.go
       8MB   225.01MB (flat, cum) 65.89% of Total
         .          .    329:    for i := 0; i < 256; i++ {
         .          .    330:        allocators.Add(1)
         .          .    331:        go func() {
         .          .    332:            for i := 1; i <= maxID; i++ {
         .          .    333:                id := p.AllocateID()
       8MB   225.01MB    334:                c.Assert(id, Not(Equals), NoID)
         .          .    335:                allocated <- id
         .          .    336:            }
         .          .    337:            allocators.Done()
         .          .    338:        }()
         .          .    339:    }
```

After optimization; cpu profile:

```
$ go test -c ./pkg/idpool && go test ./pkg/idpool -cpuprofile cpu.out -count 1
ok      github.com/cilium/cilium/pkg/idpool     0.807s  <--- Notice the time compared to the previous test run
$ go tool pprof ./idpool.test cpu.out
File: idpool.test
Type: cpu
Time: Jun 20, 2020 at 4:53pm (PDT)
Duration: 800.83ms, Total samples = 660ms (82.41%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top10
Showing nodes accounting for 430ms, 65.15% of 660ms total
Showing top 10 nodes out of 63
      flat  flat%   sum%        cum   cum%
     160ms 24.24% 24.24%      160ms 24.24%  runtime.procyield
      50ms  7.58% 31.82%      160ms 24.24%  sync.(*Mutex).lockSlow
      40ms  6.06% 37.88%       40ms  6.06%  runtime.casgstatus
      40ms  6.06% 43.94%       40ms  6.06%  runtime.unlock
      40ms  6.06% 50.00%      150ms 22.73%  sync.(*Mutex).Unlock
      20ms  3.03% 53.03%      300ms 45.45%  github.com/cilium/cilium/pkg/idpool.(*IDPool).AllocateID
      20ms  3.03% 56.06%      130ms 19.70%  github.com/cilium/cilium/pkg/idpool.(*IDPool).Insert
      20ms  3.03% 59.09%       20ms  3.03%  runtime.cansemacquire
      20ms  3.03% 62.12%       40ms  6.06%  runtime.findrunnable
      20ms  3.03% 65.15%       20ms  3.03%  runtime.futex
```

By looking at the test run times, you can see it was nearly cut in half. We also no longer see any measurable time spent in `runtime.mallocgc`.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
This allows us to fix the following data race. The data race occurred because the creation of the EndpointManager was racing with the controllers running in the background when the `Daemon` object is created.

```
$ go test ./daemon/cmd -race -v -count 1 -check.f DaemonEtcdSuite.TestEndpointAddNoLabels
=== RUN   Test
==================
WARNING: DATA RACE
Write at 0x00c0007612f8 by goroutine 103:
  github.com/cilium/cilium/daemon/cmd.(*DaemonSuite).SetUpTest()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon_test.go:145 +0x724
  github.com/cilium/cilium/daemon/cmd.(*DaemonEtcdSuite).SetUpTest()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon_test.go:195 +0x44
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).runFixture.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:730 +0x1c3
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9

Previous read at 0x00c0007612f8 by goroutine 122:
  github.com/cilium/cilium/daemon/cmd.(*Daemon).bootstrapFQDN.func1()
      /home/chris/code/cilium/cilium/daemon/cmd/fqdn.go:191 +0x3ea
  github.com/cilium/cilium/pkg/controller.(*Controller).runController()
      /home/chris/code/cilium/cilium/pkg/controller/controller.go:205 +0xc6f

Goroutine 103 (running) created at:
  gopkg.in/check%2ev1.(*suiteRunner).forkCall()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:672 +0x44d
  gopkg.in/check%2ev1.(*suiteRunner).runFunc()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:682 +0x83
  gopkg.in/check%2ev1.(*suiteRunner).runFixture()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:726 +0x3d
  gopkg.in/check%2ev1.(*suiteRunner).runFixtureWithPanic()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:744 +0x82
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:769 +0x28c
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9

Goroutine 122 (running) created at:
  github.com/cilium/cilium/pkg/controller.(*Manager).updateController()
      /home/chris/code/cilium/cilium/pkg/controller/manager.go:120 +0xfb5
  github.com/cilium/cilium/pkg/controller.(*Manager).UpdateController()
      /home/chris/code/cilium/cilium/pkg/controller/manager.go:59 +0xc7d
  github.com/cilium/cilium/daemon/cmd.(*Daemon).bootstrapFQDN()
      /home/chris/code/cilium/cilium/daemon/cmd/fqdn.go:172 +0xa42
  github.com/cilium/cilium/daemon/cmd.NewDaemon()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon.go:556 +0x33b6
  github.com/cilium/cilium/daemon/cmd.(*DaemonSuite).SetUpTest()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon_test.go:134 +0x2de
  github.com/cilium/cilium/daemon/cmd.(*DaemonEtcdSuite).SetUpTest()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon_test.go:195 +0x44
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).runFixture.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:730 +0x1c3
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Force-pushed from fbea8ec to 453ed1e
```diff
@@ -150,7 +150,7 @@ type Allocator struct {
 	initialListDone waitChan

 	// idPool maintains a pool of available ids for allocation.
-	idPool *idpool.IDPool
+	idPool idpool.IDPool
```
I don't think this changes anything WRT go's GC because the variable is only initialized once when the agent starts.
True; in this specific case, the impact is minimal. The general rule I try to follow is to make a variable a pointer only if it must be one. I noticed while looking at this code that it didn't need to be a pointer, and changing it was quite trivial. I think moving forward we should also consider reducing our usage of pointers. Over time, we may find that it improves the performance of Cilium because it reduces the load on the GC.
This fixes a data race on `(*Daemon).policy`; the variable is overwritten in each test run. Each policy repository has event queues running in the background which access the repository. When the variable is overwritten with a new instance of a policy repository, the event queue accesses race with the write to `(*Daemon).policy`. The fix is to first shut down and drain the event queues before proceeding to the next test run.

```
WARNING: DATA RACE
Write at 0x00c0011cef48 by goroutine 329:
  github.com/cilium/cilium/daemon/cmd.(*DaemonSuite).Test_addCiliumNetworkPolicyV2()
      /home/chris/code/cilium/cilium/daemon/cmd/policy_test.go:1236 +0x5e5
  github.com/cilium/cilium/daemon/cmd.(*DaemonConsulSuite).Test_addCiliumNetworkPolicyV2()
      <autogenerated>:1 +0x4d
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9

Previous read at 0x00c0011cef48 by goroutine 210:
  github.com/cilium/cilium/daemon/cmd.(*Daemon).policyAdd()
      /home/chris/code/cilium/cilium/daemon/cmd/policy.go:430 +0x1330
  github.com/cilium/cilium/daemon/cmd.(*PolicyAddEvent).Handle()
      /home/chris/code/cilium/cilium/daemon/cmd/policy.go:217 +0xc9
  github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run.func1()
      /home/chris/code/cilium/cilium/pkg/eventqueue/eventqueue.go:260 +0x25d
  sync.(*Once).doSlow()
      /usr/lib/go/src/sync/once.go:66 +0x103
  sync.(*Once).Do()
      /usr/lib/go/src/sync/once.go:57 +0x68

Goroutine 329 (running) created at:
  gopkg.in/check%2ev1.(*suiteRunner).forkCall()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:672 +0x44d
  gopkg.in/check%2ev1.(*suiteRunner).forkTest()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:763 +0x1b9
  gopkg.in/check%2ev1.(*suiteRunner).runTest()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:818 +0x228
  gopkg.in/check%2ev1.(*suiteRunner).run()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:624 +0x1c7
  gopkg.in/check%2ev1.Run()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:92 +0x5a
  gopkg.in/check%2ev1.RunAll()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:84 +0x127
  gopkg.in/check%2ev1.TestingT()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/run.go:72 +0x5f6
  github.com/cilium/cilium/daemon/cmd.Test()
      /home/chris/code/cilium/cilium/daemon/cmd/daemon_test.go:50 +0x38
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:991 +0x1eb

Goroutine 210 (running) created at:
  github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run()
      /home/chris/code/cilium/cilium/pkg/eventqueue/eventqueue.go:248 +0xe6
  github.com/cilium/cilium/pkg/policy.NewPolicyRepository()
      /home/chris/code/cilium/cilium/pkg/policy/repository.go:144 +0x11d
  github.com/cilium/cilium/daemon/cmd.(*DaemonSuite).Test_addCiliumNetworkPolicyV2.func2()
      /home/chris/code/cilium/cilium/daemon/cmd/policy_test.go:1006 +0xd4
  github.com/cilium/cilium/daemon/cmd.(*DaemonSuite).Test_addCiliumNetworkPolicyV2()
      /home/chris/code/cilium/cilium/daemon/cmd/policy_test.go:1234 +0x574
  github.com/cilium/cilium/daemon/cmd.(*DaemonConsulSuite).Test_addCiliumNetworkPolicyV2()
      <autogenerated>:1 +0x4d
  runtime.call32()
      /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a
  reflect.Value.Call()
      /usr/lib/go/src/reflect/value.go:321 +0xd3
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Force-pushed from 453ed1e to a8191a5
test-me-please
See commit msgs.
Fixes: #11955 (specifically commit 355f037)