CI: K8sServicesTest Checks service across nodes Checks ClusterIP Connectivity: exit status 42 #16237
Comments
Another occurrence: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.9/485 (with PR #16175)
I'm starting to look into this as part of my CI force duty. Since this issue nicely links to all possibly related ones (thanks Tobias!), I'm assigning it to myself.
In both sysdumps, the curl command fails because it receives an
Hubble query for the two backend IPs on k8s1:
$ cat cilium-8gbzn-hubble_observe.json | hubble observe --print-node-name --ip "fd02::18" --ip "fd02::114"
May 20 07:16:30.398 [k8s1]: [fd02::1d3]:59382 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: SYN)
May 20 07:16:30.398 [k8s1]: [fd02::1d3]:59382 <- default/testds-6fklp:80 to-stack FORWARDED (TCP Flags: SYN, ACK)
May 20 07:16:30.398 [k8s1]: default/testds-6fklp:80 <> [fd02::1d3]:59382 Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
May 20 07:16:31.420 [k8s1]: default/testds-6fklp:80 <> [fd02::1d3]:59382 Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
May 20 07:16:31.428 [k8s1]: default/testds-6fklp:80 <> [fd02::1d3]:59382 Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
May 20 07:16:33.437 [k8s1]: default/testds-6fklp:80 <> [fd02::1d3]:59382 Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
May 20 07:16:33.443 [k8s1]: default/testds-6fklp:80 <> [fd02::1d3]:59382 Stale or unroutable IP DROPPED (TCP Flags: SYN, ACK)
May 20 07:16:35.403 [k8s1]: default/testclient-hgkxc:59390 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: SYN)
May 20 07:16:35.403 [k8s1]: default/testds-6fklp:80 <> default/testclient-hgkxc:59390 to-overlay FORWARDED (TCP Flags: SYN, ACK)
May 20 07:16:35.404 [k8s1]: default/testclient-hgkxc:59390 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK)
May 20 07:16:35.404 [k8s1]: default/testclient-hgkxc:59390 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
May 20 07:16:35.404 [k8s1]: default/testds-6fklp:80 <> default/testclient-hgkxc:59390 to-overlay FORWARDED (TCP Flags: ACK, PSH)
May 20 07:16:35.404 [k8s1]: default/testclient-hgkxc:59390 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK, FIN)
May 20 07:16:35.404 [k8s1]: default/testds-6fklp:80 <> default/testclient-hgkxc:59390 to-overlay FORWARDED (TCP Flags: ACK, FIN)
May 20 07:16:35.405 [k8s1]: default/testclient-hgkxc:59390 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK)
[... more successful request with the same pattern and sport != 59382 ...]
May 20 07:16:35.434 [k8s1]: default/testclient-hgkxc:59406 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: SYN)
May 20 07:16:35.434 [k8s1]: default/testds-6fklp:80 <> default/testclient-hgkxc:59406 to-overlay FORWARDED (TCP Flags: SYN, ACK)
May 20 07:16:35.434 [k8s1]: default/testclient-hgkxc:59406 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK)
May 20 07:16:35.434 [k8s1]: default/testclient-hgkxc:59406 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
May 20 07:16:35.434 [k8s1]: default/testds-6fklp:80 <> default/testclient-hgkxc:59406 to-overlay FORWARDED (TCP Flags: ACK, PSH)
May 20 07:16:35.435 [k8s1]: default/testclient-hgkxc:59406 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK, FIN)
May 20 07:16:35.435 [k8s1]: default/testds-6fklp:80 <> default/testclient-hgkxc:59406 to-overlay FORWARDED (TCP Flags: ACK, FIN)
May 20 07:16:35.435 [k8s1]: default/testclient-hgkxc:59406 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: ACK)
May 20 07:16:37.660 [k8s1]: default/testds-6fklp:80 <> default/testclient-hgkxc:59382 to-overlay FORWARDED (TCP Flags: SYN, ACK)
May 20 07:16:37.661 [k8s1]: default/testclient-hgkxc:59382 -> default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: RST)
Last two packets as observed on
$ cat cilium-mvnqc-hubble_observe.json | hubble observe --print-node-name --ip "fd02::18" --ip "fd02::114"
[...]
May 20 07:16:37.715 [k8s2]: default/testclient-hgkxc:59382 <- default/testds-6fklp:80 to-endpoint FORWARDED (TCP Flags: SYN, ACK)
May 20 07:16:37.715 [k8s2]: default/testclient-hgkxc:59382 <> default/testds-6fklp:80 to-overlay FORWARDED (TCP Flags: RST)
Cilium logs for the source IP (
$ grep 'fd02::1d3' pod-kube-system-cilium-8gbzn-cilium-agent.log
4252:2021-05-20T07:16:30.398987047Z level=debug msg="stale identity observed" identity=21966 ipAddr="fd02::1d3" oldIdentity=2 subsys=hubble
4278:2021-05-20T07:16:34.049594404Z level=debug msg="Upserting IP into ipcache layer" identity="{21966 custom-resource false}" ipAddr="fd02::1d3" k8sNamespace=default k8sPodName=testclient-hgkxc key=0 namedPorts="map[]" subsys=ipcache
4279:2021-05-20T07:16:34.049600093Z level=debug msg="Daemon notified of IP-Identity cache state change" identity="{21966 custom-resource false}" ipAddr="{fd02::1d3 ffffffffffffffffffffffffffffffff}" modification=Upsert subsys=datapath-ipcache
In both sysdumps the
The first request fails because the IPCache has not been updated yet at that point, so instead of sending the first response packet through the tunnel (
In summary, timeline on
It seems the
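For reference, the relative timing can be worked out from the timestamps quoted in this comment. The snippet below is only an illustrative sketch in Python, not part of the Cilium test suite; the event labels and the `offsets` helper are my own naming, and the date 2021-05-20 is taken from the agent log lines (the Hubble output omits the year).

```python
from datetime import datetime

# Timestamps copied from the Hubble flows and the agent log quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"
events = {
    "first SYN (sport 59382)": "2021-05-20 07:16:30.398",
    # Agent log timestamp, truncated from nanoseconds to microseconds.
    "ipcache upsert for fd02::1d3": "2021-05-20 07:16:34.049594",
    "next connection (sport 59390)": "2021-05-20 07:16:35.403",
    "late SYN-ACK forwarded to overlay": "2021-05-20 07:16:37.660",
}


def offsets(events):
    """Return seconds elapsed since the earliest event, keyed by event name."""
    parsed = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}
    start = min(parsed.values())
    return {name: (t - start).total_seconds() for name, t in parsed.items()}


if __name__ == "__main__":
    for name, secs in offsets(events).items():
        print(f"{secs:6.3f}s  {name}")
```

By this calculation the IPCache upsert lands about 3.65 seconds after the first SYN, the next connection starts about 5 seconds after it, and the SYN-ACK for the original connection is only forwarded to the overlay after roughly 7.26 seconds.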
Looking at the test code, I think it's pretty clear that the RST is caused by the first curl timing out due to the IPCache propagation delay. The first curl is reporting exit code 28, which is
A simple fix would be to increase the timeout, but it looks like this is not the only CI test suffering from IPCache propagation delays.
Edit: From the logs, it looks like identity allocation for
This increases the curl connection timeout from 5 to 15 seconds to avoid issues with IPCache propagation delay. On Cilium master and 1.10, it seems that IPCache updates in CI can take up to 4-8 seconds.
CI flakes likely caused by the increased IPCache propagation delay:
- cilium#13839
- cilium#14959
- cilium#15103
- cilium#16237
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
I assume this should be fixed by #16381 now.
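To illustrate why a longer connection timeout helps here: the client's SYN keeps being retransmitted while the connect call is pending, so the handshake can still complete once the IPCache entry arrives, as long as the timeout has not expired first. A minimal sketch of a connect with a configurable timeout, in Python rather than the curl invocation the test actually uses; `can_connect` is a hypothetical helper name and the 15-second default simply mirrors the value from the commit message.

```python
import socket


def can_connect(host, port, timeout=15.0):
    """Try to open a TCP connection, giving up after `timeout` seconds.

    Returns True if the handshake completes in time, False on a timeout,
    a refused connection, or any other socket-level error.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and ConnectionRefusedError are both OSError subclasses.
        return False
```

With a 5-second timeout, a propagation delay in the 4-8 second range reported above could make a call like this fail; with 15 seconds it would be expected to succeed once the delayed SYN-ACK gets through.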
Stacktrace:
https://jenkins.cilium.io/job/Cilium-PR-K8s-1.21-kernel-4.9/501
ec0e028f_K8sServicesTest_Checks_service_across_nodes_Checks_ClusterIP_Connectivity.zip
Possibly related to: #15103, #13839, #13011, #12690.