New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Redeploy DNS after endpointRoutes reconfiguration #16767
Conversation
test-me-please EDIT: All green. |
In general up until now, Cilium has expected endpointRoutes mode to be set to exactly one value upon deployment and for that value to stay the same for the remainder of operation. Toggling it can lead to a mix of endpoints in different datapath modes which is not well covered in CI. In Github issue #16717 we observed that if the testsuite toggles this setting then we can end up with kubedns pods remaining in endpoint routes mode, even though the rest of the daemon (and other pods) are not configured in this mode. This can lead to connectivity issues in DNS, and a range of test failures in subsequent tests because DNS is broken. Longer term to resolve this, we could improve on Cilium to ensure that users can successfully toggle this setting on or off at runtime and properly handle this case, or alternatively shift all logic over to endpoint-routes mode by default and disable the other option. Given that CI for the master branch is in a poor state due to this issue today, and that part of the issue is CI reconfiguring the datapath state of Cilium during the test setup in an unsupported manner, this commit proposes to force DNS pod redeployment as part of setup any time a test reconfigures the endpointRoutes mode. This should mitigate the testing side issue while we mull over the right longer-term solution. Signed-off-by: Joe Stringer <joe@cilium.io>
9c83570
to
0e30022
Compare
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version LGTM as well.
On the fence about v1.9 backport, from out-of-band discussion with @pchaigno this flake is likely caused by #16227 which was backported to v1.9. I briefly looked through scheduled CI job runs for v1.9 and didn't observe the issue at all. We'll hold off for now. Now that we know how to detect this problem, it's easy to figure out if we see this particular issue. If we see it on v1.9 we can backport it. |
Marking for backport to 1.9 following discussion here: #18031 (comment) |
In general up until now, Cilium has expected endpointRoutes mode to be
set to exactly one value upon deployment and for that value to stay the
same for the remainder of operation. Toggling it can lead to a mix of
endpoints in different datapath modes which is not well covered in CI.
In Github issue #16717 we observed that if the testsuite toggles this
setting then we can end up with kubedns pods remaining in endpoint
routes mode, even though the rest of the daemon (and other pods) are not
configured in this mode. This can lead to connectivity issues in DNS,
and a range of test failures in subsequent tests because DNS is broken.
Longer term to resolve this, we could improve on Cilium to ensure that
users can successfully toggle this setting on or off at runtime and
properly handle this case, or alternatively shift all logic over to
endpoint-routes mode by default and disable the other option.
Given that CI for the master branch is in a poor state due to this issue
today, and that part of the issue is CI reconfiguring the datapath state
of Cilium during the test setup in an unsupported manner, this commit
proposes to force DNS pod redeployment as part of setup any time a test
reconfigures the endpointRoutes mode. This should mitigate the testing
side issue while we mull over the right longer-term solution.
Fixes: #16717
Fixes: #16227
Related: #14955