test/controlplane: Disable endpoint GC #26383
I'm not super convinced this will solve the flake as we are waiting 10s for labels to appear. Can you change |
Sure, makes sense. Let's keep the issue open for now 👍 |
The Endpoint GC is currently not needed in controlplane testing. Because the GC must lock the endpoints map in the endpoint manager, it can delay the endpoint manager "node subscriber" callback, which needs to hold the same lock. This in turn may slow down the CiliumNodeUpdater callback, which has to wait for the previous callback to finish before running. The CiliumNodeUpdater callback failing to run in time seems to be a possible culprit of the controlplane CiliumNodes test flake. To mitigate the flake, disable the endpoint GC in controlplane testing.

Related: cilium#26082

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
3ca38cb to a575a13
/test |
Since this is just a change to the controlplane test environment setup, running the full set of CI tests is not needed. Marking ready-to-merge |
Ah yea, I wanted to trigger the |
Maybe we should change this to ensure that when we modify the control plane tests, it will run the integration tests? Otherwise we're only relying on developers locally running the tests to confirm that the PR doesn't break something. (I agree that for this PR the change looks innocent enough that it shouldn't introduce a failure, but this is likely to come up again in the future, perhaps with a more complicated change. I don't want to encourage this pattern where we propose PRs, ignore CI and label the PR ready-to-merge to bypass all of the validation processes. We apply those checks for every other change that goes into Cilium, so they should be passable for PRs like this too.) |
Yes, I agree. I was in the middle of opening a PR for it :) |