Address race condition in TestGetIdentity #30885
d04605f
to
5ae54e1
Compare
/test
CI triage (WIP):
Minor comment nits
nice writeup!
TestGetIdentity has been unreliable, even after some previous attempts at deflaking.

The issue lies in the use of the k8s fake infrastructure: the simple testing object tracker of client-go does _not_ set the ResourceVersion for resources created. This interacts badly with the logic of the client-go reflector's ListAndWatch method, which relies on the resource version to close the racy window between its List and Watch calls. The real k8s api-server will replay events which occur after the completion of List and before the establishment of the Watch, thanks to the ResourceVersion. The object tracker's Watch implementation, however, does not (and cannot) do so, as it doesn't have a resource version to determine which events it would need to replay.

Notably, the HasSynced method of the informer will return true once the initial List has succeeded. This isn't a guarantee that the Watch has been established (and indeed, the reflector establishes the Watch _after_ the List). This is fine in reality, again thanks to the resource version and the api-server replaying events.

The race, hence, is that the creation of the identities can happen concurrently with the establishment of the Watch (HasSynced only guarantees that it happens _after_ the List), and thus we race the creation of the "RaceFreeWatcher" in the object tracker. If the watcher is late, it misses the creation of an identity, and we time out waiting on the wait group.

To fix this, instead of attempting to wait for the Watch to be established (which doesn't seem easy, at first glance), just create the resources _before_ List and Watch are started, so that they are returned in the initial List call.

Prior to this patch, the following command line typically failed quickly:

    while true; do go test ./pkg/k8s/identitybackend -run 'TestGetIdentity' -v -count=1 -timeout=10s || break; done

After this patch, it ran thousands of times reliably.
Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
The previous patch explains and fixes a flake. This patch removes some of the remaining cruft from earlier attempts at fixing said flake, and runs the test in parallel (for efficiency).

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
5ae54e1
to
15ba583
Compare
/test
CI triage ( 😞 )
There are some non-trivial conflicts on the v1.15 branch, and given this is about races I think it's probably better if I don't try to resolve them without context. Marking as "backport/author".
Please read the commit message of the first commit to understand the race - second commit is a bit of cleanup.
Fixes: #30873
Fixes: #30255