test: Delete DNS pods in AfterAll for datapath tests #16835
LGTM, simple enough. This is a prime example of something that could easily recur because someone isn't aware that this must be done if they configure endpoint-routes mode. However, that seems unlikely to happen in the near future, so we don't have to solve this problem now.
test-me-please
I manually tried this out by just running:
And sure enough, kube-dns was gone at the end along with Cilium. I'm not the biggest fan of how the test suite leaves your cluster in a broken state after it completes (rather than returning to where it started, with whatever apps you previously had deployed), but I think that's already the case, so if we want to fix that we can follow up separately.
Cilium init pods went into CrashLoopBackOff; many of the jobs seem to have the same issue:
I'll try rebasing to see whether I just need to pull in #16815 to resolve the issue.
Force-pushed from f23532b to 43464c4.
test-me-please
This init container change was introduced in #16815. It looks like there is a mismatch between the Cilium DaemonSet used (up to date with master) and the image built from your PR (not up to date). Yes, please rebase your PR.
Also hit #16846.
Evidently not all of the tests ensure that DNS is deployed before running, so the current version of the fix breaks those tests because they're relying on DNS from previous runs. Maybe I'll just do a complete Cilium + DNS forced redeploy in the AfterAll of the relevant test suites (K8sCustomCalls, K8sDatapathConfig). It may take a few minutes longer but should do the trick.
Commit a0e7712 ("test: Redeploy DNS after changing endpointRoutes")
didn't go quite far enough: It ensured that between individual tests in
a given file, the DNS pods would be redeployed during the next run if
there were significant enough datapath changes. However, the way it did
this was by storing state within the 'kubectl' variable, which is
recreated in each test file. So if the last test in one CI run enabled
endpoint routes mode, then the DNS pods would not be redeployed to
disable endpoint routes mode as part of the next test.

Fix it by redeploying DNS after removing Cilium from the cluster.
Kubernetes will remove the current DNS pods and reschedule them, but
they will not launch until the next test deploys a new version of
Cilium.

Reported-by: Chris Tarazi <chris@isovalent.com>
Fixes: 0e77127dcd7 ("test: Redeploy DNS after changing endpointRoutes")
Related: cilium#16717
Signed-off-by: Joe Stringer <joe@cilium.io>
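The failure mode described in the commit message can be illustrated with a small, self-contained Go model. `Kubectl`, `NewKubectl`, and `DeployCilium` here are hypothetical names chosen for illustration, not the Cilium test framework's actual API; the sketch only shows why a flag stored on a per-file helper is lost once the next test file creates a fresh helper.

```go
package main

import "fmt"

// Kubectl is a hypothetical stand-in for the per-file test helper.
// Any state stored on it lives only as long as this instance does.
type Kubectl struct {
	// endpointRoutesEnabled records the datapath mode this instance last deployed.
	endpointRoutesEnabled bool
}

// NewKubectl models how each test file builds a fresh helper.
func NewKubectl() *Kubectl { return &Kubectl{} }

// DeployCilium reports whether this helper thinks DNS must be redeployed,
// i.e. whether endpointRoutes differs from what this instance saw last.
func (k *Kubectl) DeployCilium(endpointRoutes bool) (redeployDNS bool) {
	redeployDNS = k.endpointRoutesEnabled != endpointRoutes
	k.endpointRoutesEnabled = endpointRoutes
	return redeployDNS
}

func main() {
	// Within one test file, the mode change is detected.
	k := NewKubectl()
	k.DeployCilium(true)
	fmt.Println(k.DeployCilium(false)) // true

	// Across test files, the next file starts from a fresh helper, so the
	// endpointRoutes=true state left in the cluster is invisible to it.
	first := NewKubectl()
	first.DeployCilium(true)
	second := NewKubectl()
	fmt.Println(second.DeployCilium(false)) // false
}
```

The fix avoids relying on in-memory state entirely: the cleanup happens in the cluster itself, at the end of each affected test file.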
Force-pushed from 43464c4 to e4a887b.
OK, it turns out Cilium was already getting removed at the end of the tests; we just weren't cleaning up the DNS pods. My previous iteration removed the DNS deployment, and apparently subsequent tests don't ensure that it is available. But if Cilium has been removed, it should be sufficient to simply delete the DNS pods and let them get rescheduled into the cluster (then hopefully the next iteration of Cilium will provision the networking for them).
test-me-please
Triage:
This is addressing a known flake in the tree and is now only hitting other known flakes. Good to merge.
Marking for backport to 1.9 following discussion here: #18031 (comment)
Commit a0e7712 ("test: Redeploy DNS after changing endpointRoutes")
didn't go quite far enough: It ensured that between individual tests in
a given file, the DNS pods would be redeployed during the next run if
there were significant enough datapath changes. However, the way it did
this was by storing state within the 'kubectl' variable, which is
recreated in each test file. So if the last test in one CI run enabled
endpoint routes mode, then the DNS pods would not be redeployed to
disable endpoint routes mode as part of the next test.
Fix it by just deleting the entire kube-dns deployment at the end of
any test files that reconfigure this datapath option. I assume that the
next test will set up DNS again if it's not available.
Reported-by: Chris Tarazi <chris@isovalent.com>
Fixes: 0e77127dcd7 ("test: Redeploy DNS after changing endpointRoutes")
Fixes: #16717