Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/controlplane: Disable endpoint GC #26383

Merged

Conversation

pippolo84
Copy link
Member

@pippolo84 pippolo84 commented Jun 20, 2023

The Endpoint GC is currently not needed in controlplane testing.

Since that GC needs to lock the endpoints map in the endpoint manager, it can possibly delay the execution of the endpoint manager "node subscriber" callback, that needs to hold the same lock. This in turn might lead to the slow down of the CiliumNodeUpdater callback that needs to wait for the previous callback before running. The inability of the CiliumNodeUpdater callback to run in time seems to be the possible culprit of the controlplane CiliumNodes test flake.

To mitigate the flake, the endpoint GC is then disabled in controlplane testing.

Related: #26082

@pippolo84 pippolo84 added area/CI Continuous Integration testing issue or flake release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 20, 2023
@pippolo84 pippolo84 requested a review from a team as a code owner June 20, 2023 15:45
@pippolo84 pippolo84 requested a review from nebril June 20, 2023 15:45
@marseel
Copy link
Contributor

marseel commented Jun 21, 2023

I'm not super convinced this will solve the flake as we are waiting 10s for labels to appear. Can you change Fixes to Related and let's keep the issue open for now?

@pippolo84
Copy link
Member Author

I'm not super convinced this will solve the flake as we are waiting 10s for labels to appear. Can you change Fixes to Related and let's keep the issue open for now?

Sure, makes sense. Let's keep the issue open for now 👍

The Endpoint GC is currently not needed in controlplane testing.

Since that GC needs to lock the endpoints map in the endpoint manager,
it can possibly delay the execution of the endpoint manager "node
subscriber" callback, that needs to hold the same lock.  This in turn
might lead to the slow down of the CiliumNodeUpdater callback that needs
to wait for the previous callback before running.  The inability of the
CiliumNodeUpdater callback to run in time seems to be the possible
culprit of the controlplane CiliumNodes test flake.

To mitigate the flake, the endpoint GC is then disabled in controlplane
testing.

Related: cilium#26082

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/controlplane-disable-endpoint-gc branch from 3ca38cb to a575a13 Compare June 21, 2023 12:18
@marseel
Copy link
Contributor

marseel commented Jun 21, 2023

/test

@pippolo84
Copy link
Member Author

Since this is just a change to the controlplane test environment setup, running the full set of CI tests is not needed. Marking ready-to-merge

@pippolo84 pippolo84 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 21, 2023
@marseel
Copy link
Contributor

marseel commented Jun 21, 2023

Ah yea, I wanted to trigger the integration-test, but apparently changes in "test" directory are ignored for these tests.

@joestringer
Copy link
Member

joestringer commented Jun 21, 2023

Ah yea, I wanted to trigger the integration-test, but apparently changes in "test" directory are ignored for these tests.

Maybe we should change this to ensure that when we modify the control plane tsets, it will run the integration tests? Otherwise we're only relying on developers locally running the tests to confirm that the PR doesn't break something.

(I agree that for this PR the change looks innocent enough it shouldn't introduce a failure, but this is likely to come up again in future, perhaps with a more complicated change. I don't want to encourage this pattern where we propose PRs, ignore CI and label the PR ready-to-merge to bypass all of the validation processes. We apply those checks for every other change that goes into Cilium, so they should be passable for PRs like this too.)

@marseel
Copy link
Contributor

marseel commented Jun 21, 2023

Yes, I agree. I was in the middle of opening PR for it :)

@borkmann borkmann merged commit a42b70b into cilium:main Jun 22, 2023
48 of 54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/CI Continuous Integration testing issue or flake ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants