Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.15] Author Backport of #30855 (Address race condition in TestGetIdentity) #31541

Merged
merged 2 commits into from Mar 26, 2024

Conversation

bimmlerd
Copy link
Member

Once this PR is merged, a GitHub action will update the labels of these PRs:

 30885

bimmlerd and others added 2 commits March 21, 2024 13:58
[ upstream commit deb0687 ]

[ backporter's note: onadd func was not backported doing it as part of
  this commit ]

TestGetIdentity has been unreliable, even withstanding some previous
attempts at deflaking.

The issue lies in the use of the k8s fake infrastructure: the simple
testing object tracker of client-go does _not_ set the ResourceVersion
for resources created. This interacts badly with the logic of the
client-go reflector's ListAndWatch method, which relies on the resource
version to close the racy window between its List and Watch calls. The
real k8s api-server will replay events which occur after the completion
of List and before the establishment of the Watch, thanks to the
ResourceVersion. The object tracker's Watch implementation, however,
does (and can) not do so, as it doesn't have a resource version to
determine which events it would need to replay.

Notably, the HasSynced method of the informer will return true once the
initial List has succeeded. This isn't a guarantee for the Watch to be
established (and indeed, the reflector establishes the Watch _after_ the
list). This is fine for reality, again thanks to the resource version
and the api-server replaying.

The race, hence, is that the creation of the identities can happen
concurrently to the establishment of the watch (HasSynced guarantees
that it happens _after_ the list), and thus we race the creation of the
"RaceFreeWatcher" in the object tracker. If the watcher is late, it
misses the creation of an identity, and we time out waiting on the wait
group.

To fix this, instead of attempting to wait for the Watch
establishment (which doesn't seem easy, on first glance), just create
the resources _before_ list and watch is started, so that they are
returned in the initial list call.

Prior to this patch, the following commandline typically failed quickly:

while true; do go test ./pkg/k8s/identitybackend -run 'TestGetIdentity' -v -count=1 -timeout=10s || break; done

After this patch, it ran thousands of times reliably.

Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
[ upstream commit 1907334 ]

The previous patch explains and fixes a flake, this patch removes some
of the remaining cruft from earlier attempts at fixing said flake, as
well as running the test in parallel (for efficiency).

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd bimmlerd added kind/backports This PR provides functionality previously merged into master. backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. labels Mar 21, 2024
@bimmlerd bimmlerd changed the title v1.15 Backports 2024-03-21 [v1.15] Author Backport of #30855 (Address race condition in TestGetIdentity) Mar 21, 2024
@bimmlerd bimmlerd requested a review from gandro March 21, 2024 13:01
@bimmlerd bimmlerd marked this pull request as ready for review March 21, 2024 13:11
@bimmlerd bimmlerd requested a review from a team as a code owner March 21, 2024 13:11
@bimmlerd
Copy link
Member Author

/test-backport-1.15

@bimmlerd bimmlerd added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 26, 2024
@squeed squeed merged commit d0a8f78 into v1.15 Mar 26, 2024
219 checks passed
@squeed squeed deleted the pr/v1.15-backport-2024-03-21-01-55 branch March 26, 2024 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This PR represents a backport for Cilium 1.15.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants