deflake endpointmanager tests #31488
Force-pushed from 91fe146 to 6e7f160
/test
CI triage:
Thanks!
While we're in there, would it be possible to fix the spelling of |
Force-pushed from 6e7f160 to 5365276
/test
CI triage:
Force-pushed from 5365276 to ef8b41a
Go's race detector was unhappy with this test due to unserialised concurrent access to the last value of the fake gauge. Use an atomic float value instead, to ensure no weirdness can occur, and placate the race detector.

Race detector warning (mildly edited):

WARNING: DATA RACE
Read at 0x00c000930488 by goroutine 205:
  github.com/cilium/cilium/pkg/endpointmanager.TestPolicyMapPressure.TestPolicyMapPressure.func1.func2()
      cilium/pkg/endpointmanager/policymap_pressure_test.go:27 +0x69
  github.com/stretchr/testify/assert.Eventually.func1()
      cilium/vendor/github.com/stretchr/testify/assert/assertions.go:1902 +0x33

Previous write at 0x00c000930488 by goroutine 203:
  github.com/cilium/cilium/pkg/endpointmanager.(*fakeGague).Set()
      cilium/pkg/endpointmanager/policymap_pressure_test.go:45 +0x30
  github.com/cilium/cilium/pkg/endpointmanager.(*policyMapPressure).update()
      cilium/pkg/endpointmanager/policymap_pressure.go:82 +0x32a
  github.com/cilium/cilium/pkg/endpointmanager.newPolicyMapPressure.func1()
      cilium/pkg/endpointmanager/policymap_pressure.go:57 +0x2e
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      cilium/pkg/trigger/trigger.go:201 +0x771
  github.com/cilium/cilium/pkg/trigger.NewTrigger.gowrap1()
      cilium/pkg/trigger/trigger.go:122 +0x33

Goroutine 205 (running) created at:
  github.com/stretchr/testify/assert.Eventually()
      cilium/vendor/github.com/stretchr/testify/assert/assertions.go:1902 +0x3d5
  github.com/stretchr/testify/assert.(*Assertions).Eventually()
      cilium/vendor/github.com/stretchr/testify/assert/assertion_forward.go:319 +0xc7
  github.com/cilium/cilium/pkg/endpointmanager.TestPolicyMapPressure.func1()
      cilium/pkg/endpointmanager/policymap_pressure_test.go:26 +0x2a8
  github.com/cilium/cilium/pkg/endpointmanager.TestPolicyMapPressure()
      cilium/pkg/endpointmanager/policymap_pressure_test.go:30 +0x205

Goroutine 203 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      cilium/pkg/trigger/trigger.go:122 +0x36d
  github.com/cilium/cilium/pkg/endpointmanager.newPolicyMapPressure()
      cilium/pkg/endpointmanager/policymap_pressure.go:52 +0x287
  github.com/cilium/cilium/pkg/endpointmanager.TestPolicyMapPressure()
      cilium/pkg/endpointmanager/policymap_pressure_test.go:18 +0x84

Fixes: 28ce005 (endpointmanager: fix bpf policy pressure getting stuck.)
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
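For illustration, the kind of fix this commit describes could look roughly like the sketch below. It is not the code from the patch: the `atomicGauge` type and its `Get` method are hypothetical names, and the standard library's `atomic.Uint64` (Go 1.19+) is used here to hold the float's bit pattern, which is one of several ways to get an atomic float in Go.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// atomicGauge is a hypothetical stand-in for the test's fake gauge.
// It keeps the float64 bit pattern in an atomic.Uint64 so that Set and
// Get can run on different goroutines without a data race.
type atomicGauge struct {
	bits atomic.Uint64
}

// Set records v as the gauge's last value.
func (g *atomicGauge) Set(v float64) {
	g.bits.Store(math.Float64bits(v))
}

// Get returns the most recently stored value.
func (g *atomicGauge) Get() float64 {
	return math.Float64frombits(g.bits.Load())
}

func main() {
	var g atomicGauge
	var wg sync.WaitGroup

	// Writer goroutine, playing the role of the trigger's update callback.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 1; i <= 100; i++ {
			g.Set(float64(i) / 100)
		}
	}()

	// Concurrent reader, playing the role of the polling assertion.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 100; i++ {
			_ = g.Get()
		}
	}()

	wg.Wait()
	fmt.Println(g.Get())
}
```

Running a program like this with `go run -race` should produce no race report, whereas a plain `float64` field written and read from two goroutines would.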
IDPool contains a mutex, so passing copies around is a potential footgun. I don't think we ever used it incorrectly, but I don't see a reason for all the copying either.
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
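The concern can be sketched with a simplified pool. The `idPool` type and the `use` and `reserve` helpers below are hypothetical and much smaller than the real IDPool; they only show why a mutex-carrying struct is best handled through a pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// idPool is a hypothetical, simplified ID pool guarded by a mutex.
type idPool struct {
	mu   sync.Mutex
	free map[uint16]struct{}
}

// newIDPool returns a pool in which every ID in [minID, maxID] is free.
func newIDPool(minID, maxID uint16) *idPool {
	p := &idPool{free: make(map[uint16]struct{})}
	for id := int(minID); id <= int(maxID); id++ {
		p.free[uint16(id)] = struct{}{}
	}
	return p
}

// use marks id as taken and reports whether it was available.
func (p *idPool) use(id uint16) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.free[id]; !ok {
		return false
	}
	delete(p.free, id)
	return true
}

// reserve takes the pool by pointer, so every caller locks the same mutex.
// A by-value parameter (p idPool) would copy the mutex along with the
// struct, which is what go vet's copylocks check warns about.
func reserve(p *idPool, ids ...uint16) int {
	n := 0
	for _, id := range ids {
		if p.use(id) {
			n++
		}
	}
	return n
}

func main() {
	p := newIDPool(1, 10)
	fmt.Println(reserve(p, 1, 3, 5, 7)) // prints 4: all four IDs were free
	fmt.Println(reserve(p, 1, 2))       // prints 1: ID 1 is taken, ID 2 is free
}
```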
Transform the test from using checkmate to standard Go tests, as it was not using any of its features anyway.
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
The endpointmanager's idallocator package was using a package-global pool for its identifier allocation. That's fine for running the agent, but causes flakes in testing when multiple tests access the same pool. It's also not idiomatic Go. This patch makes the local endpoint identifier allocator a struct, and the next patch will move it into the endpointmanager package itself, as there is no other consumer. While at it, also ensure that the RemoveAll method is only called from a testing context, by taking a testing.TB as an argument. We cannot simply move the method into the _test.go files, as tests from other packages use it.
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
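A minimal sketch of the shape described here, assuming a much simpler allocator than the real one. The `allocator` type, `newAllocator`, and `Allocate` are hypothetical names; only the idea of `RemoveAll` taking a `testing.TB` comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"testing"
)

// allocator is a hypothetical, simplified endpoint ID allocator. Each
// instance owns its own state instead of sharing a package-global pool.
type allocator struct {
	mu           sync.Mutex
	inUse        map[uint16]struct{}
	minID, maxID uint16
}

// newAllocator returns an allocator handing out IDs in [minID, maxID].
func newAllocator(minID, maxID uint16) *allocator {
	return &allocator{inUse: make(map[uint16]struct{}), minID: minID, maxID: maxID}
}

// Allocate returns the lowest free ID, or an error if none is left.
func (a *allocator) Allocate() (uint16, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for id := int(a.minID); id <= int(a.maxID); id++ {
		if _, taken := a.inUse[uint16(id)]; !taken {
			a.inUse[uint16(id)] = struct{}{}
			return uint16(id), nil
		}
	}
	return 0, errors.New("no free endpoint ID")
}

// RemoveAll releases every ID. Requiring a testing.TB makes the method
// awkward to call from non-test code, while tests in other packages can
// still reach it because it is not hidden in a _test.go file.
func (a *allocator) RemoveAll(tb testing.TB) {
	tb.Helper()
	a.mu.Lock()
	defer a.mu.Unlock()
	a.inUse = make(map[uint16]struct{})
}

func main() {
	a, b := newAllocator(1, 4), newAllocator(1, 4)
	idA, _ := a.Allocate()
	idB, _ := b.Allocate()
	fmt.Println(idA, idB) // prints "1 1": the allocators do not share state
}
```

In a test, the call would look like `alloc.RemoveAll(t)`, with `t` being the `*testing.T` of the running test.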
The endpoint manager assumed it was the only consumer of the idallocator pkg anyway. Having a pkg that only has one consumer is pointless, hence move it, including tests. This also allows unexporting everything, and reducing API surface.
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
/test
Datapath owned files look good to me!
This series of patches deflakes the endpoint manager tests. Four different flakes were observed when running just the `endpointmanager` pkg test in a loop. After this PR I achieved `31m55s: 46344 runs so far, 0 failures`, which implies that any remaining flakes are highly unlikely to occur.

There was a trivial data race in the pressure metrics test, fixed by using an atomic value.

This PR refactors the endpointmanager EP identifier allocation logic away from using a single, global identity pool, as that inherently causes flakes in tests sharing that pool. Specifically, there was a call to `mgr.expose` which did not check the error. The error was that, in the global pool, one of the identities `[1, 3, 5, 7]` had already been allocated, and hence the exposing failed. Fixed by creating a new allocation pool for each new manager.

Further tests also didn't check `expose` errors, but these were inconsequential once the global ID pool was removed. Fixed the checks anyway, for good measure.

In addition, `TestLookup` would fail iff the allocated EP happened to get ID `1234`.

The commits attempt to be somewhat readable in sequence, but the whole change isn't massive either.
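The failure mode with the unchecked `expose` error can be reproduced in miniature. The sketch below is a toy model, not the endpoint manager's code: `pool`, `manager`, and `exposeAll` are hypothetical, the `expose` signature is assumed, and the model is deliberately single-goroutine.

```go
package main

import "fmt"

// pool is a hypothetical set of endpoint IDs that are already in use.
type pool map[uint16]struct{}

// manager is a hypothetical, stripped-down endpoint manager.
type manager struct{ ids pool }

// expose claims id in the manager's pool and reports a collision as an error.
func (m *manager) expose(id uint16) error {
	if _, taken := m.ids[id]; taken {
		return fmt.Errorf("endpoint ID %d already allocated", id)
	}
	m.ids[id] = struct{}{}
	return nil
}

// exposeAll claims every id and returns the first error instead of dropping it.
func exposeAll(m *manager, ids ...uint16) error {
	for _, id := range ids {
		if err := m.expose(id); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Shared pool: the second manager collides with IDs claimed by the first.
	shared := pool{}
	first, second := &manager{ids: shared}, &manager{ids: shared}
	fmt.Println(exposeAll(first, 1, 3, 5, 7))  // prints <nil>
	fmt.Println(exposeAll(second, 1, 3, 5, 7)) // prints endpoint ID 1 already allocated

	// One pool per manager: the same IDs are free again.
	fresh := &manager{ids: pool{}}
	fmt.Println(exposeAll(fresh, 1, 3, 5, 7)) // prints <nil>
}
```

With a shared pool, whether the second call fails depends on which test ran first, which is why ignoring the returned error turned into an intermittent failure rather than a consistent one.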
Fixes: #26630
Fixes: #28878
Fixes: #27837