Skip to content

Commit

Permalink
identitybackend: address race condition in test
Browse files Browse the repository at this point in the history
TestGetIdentity has been unreliable, even withstanding some previous
attempts at deflaking.

The issue lies in the use of the k8s fake infrastructure: the simple
testing object tracker of client-go does _not_ set the ResourceVersion
for resources created. This interacts badly with the logic of the
client-go reflector's ListAndWatch method, which relies on the resource
version to close the racy window between its List and Watch calls. The
real k8s api-server will replay events which occur after the completion
of List and before the establishment of the Watch, thanks to the
ResourceVersion. The object tracker's Watch implementation, however,
does (and can) not do so, as it doesn't have a resource version to
determine which events it would need to replay.

Notably, the HasSynced method of the informer will return true once the
initial List has succeeded. This isn't a guarantee for the Watch to be
established (and indeed, the reflector establishes the Watch _after_ the
list). This is fine for reality, again thanks to the resource version
and the api-server replaying.

The race, hence, is that the creation of the identities can happen
concurrently to the establishment of the watch (HasSynced guarantees
that it happens _after_ the list), and thus we race the creation of the
"RaceFreeWatcher" in the object tracker. If the watcher is late, it
misses the creation of an identity, and we time out waiting on the wait
group.

To fix this, instead of attempting to wait for the Watch
establishment (which doesn't seem easy, on first glance), just create
the resources _before_ list and watch is started, so that they are
returned in the initial list call.

Prior to this patch, the following commandline typically failed quickly:

while true; do go test ./pkg/k8s/identitybackend -run 'TestGetIdentity' -v -count=1 -timeout=10s || break; done

After this patch, it ran thousands of times reliably.

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
  • Loading branch information
2 people authored and tklauser committed Feb 23, 2024
1 parent 6b2d186 commit deb0687
Showing 1 changed file with 14 additions and 7 deletions.
21 changes: 14 additions & 7 deletions pkg/k8s/identitybackend/identity_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,20 @@ func TestGetIdentity(t *testing.T) {
addWaitGroup := sync.WaitGroup{}
addWaitGroup.Add(len(tc.identities))

// To avoid a race, we must create these before we start ListAndWatch, see #30873. There
// is no easy way of knowing when the watch is established. Specifically, 'HasSynced'
// does _not_ guarantee it: the fake object tracker doesn't do resource versioning and
// hence cannot replay events in the reflector's gap between list and watch. Ironically,
// therefore, if we waited for the informer's HasSynced, we'd _increase_ the likelihood
// of the race. Avoid the whole issue by creating the objects before the informer is
// even started, thus guaranteeing the objects are part of the initial list.
for _, identity := range tc.identities {
_, err = client.CiliumV2().CiliumIdentities().Create(ctx, &identity, v1.CreateOptions{})
if err != nil {
t.Fatalf("Can't create identity %s: %s", identity.Name, err)
}
}

go backend.ListAndWatch(ctx, FakeHandler{onListDoneChan: listenerReadyChan, onAddFunc: func() { addWaitGroup.Done() }}, stopChan)

select {
Expand All @@ -230,13 +244,6 @@ func TestGetIdentity(t *testing.T) {
t.Fatalf("Failed to listen for identities within 2 seconds")
}

for _, identity := range tc.identities {
_, err = client.CiliumV2().CiliumIdentities().Create(ctx, &identity, v1.CreateOptions{})
if err != nil {
t.Fatalf("Can't create identity %s: %s", identity.Name, err)
}
}

// Wait for watcher to process the identities in the background
addWaitGroup.Wait()

Expand Down

0 comments on commit deb0687

Please sign in to comment.