CI: external workloads workflow consistently fails in "Verify DNS on VM" step #2070
External workloads on the main Cilium repo are working. Perhaps something was changed there that should also have been changed in the CLI, but was not?
This might be relevant, from the sysdump in https://github.com/cilium/cilium-cli/actions/runs/6725580157 (cilium-sysdump-out.zip.zip):
I ran a few tests, and this error message is also reported with earlier versions (both v1.13.x and v1.14.2), so I'd say it is not strictly related. That said, the cause is [1] (which overrides the identity created by the clustermesh-apiserver with the default one for nodes), but I'm not 100% sure whether it is a real bug (I guess it might impact policy enforcement towards external workloads). I'm still confused about why we are seeing this error only in the cilium/cilium-cli workflows. I've compared the agent parameters with those on cilium/cilium, and there seems to be no difference. I've also tried reproducing this locally, without any luck.
Had a fresh look again, and this looks incredibly suspicious.
v1.14.2:
v1.14.3:
The behavioral change has been introduced in cilium/cilium@381e4ec15334.
Edit: this is the likely reason why we didn't hit it in cilium/cilium:
Ref: #2070 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
cilium/cilium#27841 changed how the routing mode gets set for GKE, and now it always gets set to "native". Use --datapath-mode flag to force the tunnel mode for the external workload test since that's the only configuration that's known to work [^1].
Fixes: #2070
[^1]: https://docs.cilium.io/en/latest/network/external-workloads/
Signed-off-by: renovate[bot] <bot@renovateapp.com>
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
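For anyone hitting the same failure outside CI, here is a rough sketch of what forcing tunnel mode at install time might look like, based only on the commit message above. The variable names are hypothetical, any other flags the workflow passes are left out, and the snippet just prints the command instead of running it against a cluster.

```shell
#!/bin/sh
# Sketch only: build the install command with the datapath mode pinned to
# "tunnel", the mode the commit message says is known to work for
# external workloads. DATAPATH_MODE and INSTALL_CMD are illustrative names.
DATAPATH_MODE="tunnel"
INSTALL_CMD="cilium install --datapath-mode=${DATAPATH_MODE}"

# Print rather than execute, so this is safe to run without a cluster.
echo "${INSTALL_CMD}"
```

To actually apply it, run the printed command against the test cluster, adding whatever cluster-specific options your setup needs.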
The external workloads workflow seems to be failing consistently:
https://github.com/cilium/cilium-cli/actions/runs/6622170504
It looks like nslookup is timing out trying to reach the DNS server:
This seems to have started around 2023-10-21.