identitybackend: address race condition in test

TestGetIdentity has been unreliable, even withstanding some previous attempts at deflaking. The issue lies in the use of the k8s fake infrastructure: the simple testing object tracker of client-go does _not_ set the ResourceVersion for resources created. This interacts badly with the logic of the client-go reflector's ListAndWatch method, which relies on the resource version to close the racy window between its List and Watch calls. The real k8s api-server will replay events which occur after the completion of List and before the establishment of the Watch, thanks to the ResourceVersion. The object tracker's Watch implementation, however, does (and can) not do so, as it doesn't have a resource version to determine which events it would need to replay. Notably, the HasSynced method of the informer will return true once the initial List has succeeded. This isn't a guarantee for the Watch to be established (and indeed, the reflector establishes the Watch _after_ the list). This is fine for reality, again thanks to the resource version and the api-server replaying. The race, hence, is that the creation of the identities can happen concurrently to the establishment of the watch (HasSynced guarantees that it happens _after_ the list), and thus we race the creation of the "RaceFreeWatcher" in the object tracker. If the watcher is late, it misses the creation of an identity, and we time out waiting on the wait group. To fix this, instead of attempting to wait for the Watch establishment (which doesn't seem easy, on first glance), just create the resources _before_ list and watch is started, so that they are returned in the initial list call. Prior to this patch, the following commandline typically failed quickly: while true; do go test ./pkg/k8s/identitybackend -run 'TestGetIdentity' -v -count=1 -timeout=10s || break; done After this patch, it ran thousands of times reliably. Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
cilium · Feb 23, 2024 · deb0687 · deb0687
1 parent 6b2d186
commit deb0687
Showing 1 changed file with 14 additions and 7 deletions.
diff --git a/pkg/k8s/identitybackend/identity_test.go b/pkg/k8s/identitybackend/identity_test.go
@@ -222,6 +222,20 @@ func TestGetIdentity(t *testing.T) {
 			addWaitGroup := sync.WaitGroup{}
 			addWaitGroup.Add(len(tc.identities))
 
+			// To avoid a race, we must create these before we start ListAndWatch, see #30873. There
+			// is no easy way of knowing when the watch is established. Specifically, 'HasSynced'
+			// does _not_ guarantee it: the fake object tracker doesn't do resource versioning and
+			// hence cannot replay events in the reflector's gap between list and watch. Ironically,
+			// therefore, if we waited for the informer's HasSynced, we'd _increase_ the likelihood
+			// of the race. Avoid the whole issue by creating the objects before the informer is
+			// even started, thus guaranteeing the objects are part of the initial list.
+			for _, identity := range tc.identities {
+				_, err = client.CiliumV2().CiliumIdentities().Create(ctx, &identity, v1.CreateOptions{})
+				if err != nil {
+					t.Fatalf("Can't create identity %s: %s", identity.Name, err)
+				}
+			}
+
 			go backend.ListAndWatch(ctx, FakeHandler{onListDoneChan: listenerReadyChan, onAddFunc: func() { addWaitGroup.Done() }}, stopChan)
 
 			select {
@@ -230,13 +244,6 @@ func TestGetIdentity(t *testing.T) {
 				t.Fatalf("Failed to listen for identities within 2 seconds")
 			}
 
-			for _, identity := range tc.identities {
-				_, err = client.CiliumV2().CiliumIdentities().Create(ctx, &identity, v1.CreateOptions{})
-				if err != nil {
-					t.Fatalf("Can't create identity %s: %s", identity.Name, err)
-				}
-			}
-
 			// Wait for watcher to process the identities in the background
 			addWaitGroup.Wait()