New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bgpv1: avoid object tracker vs informer race #31010
Conversation
/test |
Thanks @bimmlerd for thorough explanation. I think we have this pattern repeated across tests in Cilium which uses fake client set. For eg BGP component tests Does it require wider discussion to address all these test cases ? |
I found ControllerRuntime has a Anyways, thanks for working on this David! |
Yep, see also #30906 and #30885. I'll cc you on a slack thread, also.
Nice find. I took a brief peek, and unfortunately it doesn't seem like this will be enough - the |
CI triage:
|
237520b
to
91fe546
Compare
/test (rebased to get the fix for the gateway conformance test) |
The fake k8s testing infrastructure has an unfortunate interaction between real informers and the object tracker used in fake clientsets: the informer uses ListAndWatch to subscribe to the api server, but ListAndWatch relies on the ResourceVersion field to bridge the gap between the List and the Watch. The simple object tracker does not provide the resource versioning, nor does its Watch implementation attempt to replay existing state. The race window, hence, allows for creation of objects _after_ the initial list, but _before_ the establishment of Watch. These objects are not observed by the informer and thus tests are likely to fail. As a workaround, we ensure that the object tracker has registered a watcher before we start the test in earnest. It is only important that we do not create/update/delete objects in the window between starting the hive (and hence running of the informer) and having ensured that the watcher is in place. Creation prior to starting the hive is okay, as well as after ensuring the watcher exists. Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
91fe546
to
4098bbb
Compare
/test |
Fixed a race in my fix, PTAL 😓 Specifically, I was closing the channel before actually calling Quantified using the deflaking tips from https://gist.github.com/liggitt/6a3a2217fa5f846b52519acfc0ffece0#running-unit-tests-to-reproduce-flakes I'm now confident that it's less broken than before :) before: |
Marking as needs-backport to v1.15. I hit this in #31354. |
The fake k8s testing infrastructure has an unfortunate interaction between real informers and the object tracker used in fake clientsets: the informer uses ListAndWatch to subscribe to the api server, but ListAndWatch relies on the ResourceVersion field to bridge the gap between the List and the Watch. The simple object tracker does not provide the resource versioning, nor does its Watch implementation attempt to replay existing state.
The race window, hence, allows for creation of objects after the initial list, but before the establishment of Watch. These objects are not observed by the informer and thus tests are likely to fail.
As a workaround, we ensure that the object tracker has registered a watcher before we start the test in earnest. It is only important that we do not create/update/delete objects in the window between starting the hive (and hence running of the informer) and having ensured that the watcher is in place. Creation prior to starting the hive is okay, as well as after ensuring the watcher exists.
Fixes: #31006