CI: ClusterMesh (ci-multicluster) Setup & Test (4, disabled, ipv6, disabled, none) job failure #25064

Closed
michi-covalent opened this issue Apr 22, 2023 · 3 comments
Labels
area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me!

Comments

@michi-covalent
Contributor

michi-covalent commented Apr 22, 2023

CI failure

https://github.com/cilium/cilium/actions/runs/4767503942/jobs/8475815473

📋 Test Report
❌ 3/28 tests failed (8/174 actions), 5 tests skipped, 7 scenarios skipped:
Test [no-policies]:
  ❌ no-policies/pod-to-service/curl-0: cilium-test/client2-76f4d7c5bc-7zl8k (fd00:10:242:1::7e1d) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-service/curl-2: cilium-test/client-6965d549d5-qrm6g (fd00:10:242:1::ecff) -> cilium-test/echo-other-node (echo-other-node:8080)
Test [no-policies-extra]:
  ❌ no-policies-extra/pod-to-remote-nodeport/curl-0: cilium-test/client-6965d549d5-qrm6g (fd00:10:242:1::ecff) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies-extra/pod-to-remote-nodeport/curl-3: cilium-test/client2-76f4d7c5bc-7zl8k (fd00:10:242:1::7e1d) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies-extra/pod-to-local-nodeport/curl-0: cilium-test/client-6965d549d5-qrm6g (fd00:10:242:1::ecff) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies-extra/pod-to-local-nodeport/curl-2: cilium-test/client2-76f4d7c5bc-7zl8k (fd00:10:242:1::7e1d) -> cilium-test/echo-other-node (echo-other-node:8080)
Test [allow-all-except-world]:
  ❌ allow-all-except-world/pod-to-service/curl-0: cilium-test/client-6965d549d5-qrm6g (fd00:10:242:1::ecff) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ allow-all-except-world/pod-to-service/curl-2: cilium-test/client2-76f4d7c5bc-7zl8k (fd00:10:242:1::7e1d) -> cilium-test/echo-other-node (echo-other-node:8080)

sysdumps:

cilium-sysdump-20230421-175904.zip
cilium-sysdump-20230421-180243.zip
cilium-sysdump-20230421-180318.zip
cilium-sysdump-20230421-175938.zip
cilium-sysdump-20230421-180014.zip
cilium-sysdump-20230421-180047.zip
cilium-sysdump-20230421-180122.zip
cilium-sysdump-20230421-180202.zip
cilium-sysdump-context1-final.zip
cilium-sysdump-context2-final.zip

@michi-covalent michi-covalent added area/CI Continuous Integration testing issue or flake ci/flake This is a known failure that occurs in the tree. Please investigate me! labels Apr 22, 2023
@giorio94
Member

giorio94 commented Jun 1, 2023

Hit the same in a different matrix entry (2, disabled, ipv4, wireguard, none): https://github.com/cilium/cilium/actions/runs/5134229230/jobs/9237943355

❌ 3/44 tests failed (8/289 actions), 12 tests skipped, 0 scenarios skipped:
Test [no-policies]:
  ❌ no-policies/pod-to-service/curl-0: cilium-test/client-6965d549d5-pvfts (10.242.1.46) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies/pod-to-service/curl-2: cilium-test/client2-76f4d7c5bc-cs8l8 (10.242.1.180) -> cilium-test/echo-other-node (echo-other-node:8080)
Test [no-policies-extra]:
  ❌ no-policies-extra/pod-to-remote-nodeport/curl-0: cilium-test/client-6965d549d5-pvfts (10.242.1.46) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies-extra/pod-to-remote-nodeport/curl-2: cilium-test/client2-76f4d7c5bc-cs8l8 (10.242.1.180) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies-extra/pod-to-local-nodeport/curl-0: cilium-test/client-6965d549d5-pvfts (10.242.1.46) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ no-policies-extra/pod-to-local-nodeport/curl-2: cilium-test/client2-76f4d7c5bc-cs8l8 (10.242.1.180) -> cilium-test/echo-other-node (echo-other-node:8080)
Test [allow-all-except-world]:
  ❌ allow-all-except-world/pod-to-service/curl-1: cilium-test/client2-76f4d7c5bc-cs8l8 (10.242.1.180) -> cilium-test/echo-other-node (echo-other-node:8080)
  ❌ allow-all-except-world/pod-to-service/curl-2: cilium-test/client-6965d549d5-pvfts (10.242.1.46) -> cilium-test/echo-other-node (echo-other-node:8080)

Sysdumps:

cilium-sysdump-20230531-144425.zip
cilium-sysdump-20230531-144504.zip
cilium-sysdump-20230531-144544.zip
cilium-sysdump-20230531-144623.zip
cilium-sysdump-20230531-144700.zip
cilium-sysdump-20230531-144734.zip
cilium-sysdump-20230531-144815.zip
cilium-sysdump-20230531-144853.zip
cilium-sysdump-context1-final.zip
cilium-sysdump-context2-final.zip

@giorio94
Member

Hit a similar one: https://github.com/cilium/cilium/actions/runs/5252822823/jobs/9489404395
Matrix entry: (3, disabled, ipv4, ipsec, iptables)

connectivity test failed: 1 tests failed
📋 Test Report
❌ 1/43 tests failed (1/277 actions), 13 tests skipped, 1 scenarios skipped:
Test [no-policies]:
  ❌ no-policies/pod-to-service/curl-0: cilium-test/client-6965d549d5-2f962 (10.242.1.200) -> cilium-test/echo-other-node (echo-other-node:8080)

Looking at the sysdump, this failure appears to be caused by a race between the start of the connectivity test and the propagation of the remote service information. At the time of the failure, cilium service list reported no backends for the echo-other-node service (10.243.85.29):

ID   Frontend              Service Type   Backend                           
...
8    10.243.85.29:8080     ClusterIP                                        

The Hubble flows confirm this, showing the SYN packets dropped because no service backend was found:

Jun 13 08:07:20.712: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) from-endpoint FORWARDED (TCP Flags: SYN)
Jun 13 08:07:20.712: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) Service backend not found DROPPED (TCP Flags: SYN)
Jun 13 08:07:21.713: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) from-endpoint FORWARDED (TCP Flags: SYN)
Jun 13 08:07:21.713: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) Service backend not found DROPPED (TCP Flags: SYN)

All subsequent tests passed, and the service backends are correctly populated in the final sysdumps.
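
To spot this pattern in other runs, the flow lines can be scanned for drops with the "Service backend not found" reason. The following is only a rough sketch: it assumes the one-line text layout shown in the excerpt above, and the function name count_backend_drops and the regular expression are illustrative, not part of Hubble or the Cilium CLI.

```python
import re
from collections import Counter

DROP_REASON = "Service backend not found"

# Assumed layout, taken from the excerpt above:
# "<timestamp>: <src> (ID:<n>) <> <dst> (<label>) <verdict details>"
FLOW_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+): "
    r"(?P<src>\S+) \(ID:\d+\) <> "
    r"(?P<dst>\S+) \((?P<dst_label>[^)]*)\) "
    r"(?P<rest>.*)$"
)


def count_backend_drops(lines):
    """Count DROPPED flows with the 'Service backend not found' reason, per destination."""
    drops = Counter()
    for line in lines:
        match = FLOW_RE.match(line.strip())
        if match is None:
            continue  # Not a flow line in the assumed layout.
        rest = match["rest"]
        if DROP_REASON in rest and "DROPPED" in rest:
            drops[match["dst"]] += 1
    return drops


if __name__ == "__main__":
    # The four flow lines quoted in this comment.
    flows = [
        "Jun 13 08:07:20.712: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) from-endpoint FORWARDED (TCP Flags: SYN)",
        "Jun 13 08:07:20.712: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) Service backend not found DROPPED (TCP Flags: SYN)",
        "Jun 13 08:07:21.713: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) from-endpoint FORWARDED (TCP Flags: SYN)",
        "Jun 13 08:07:21.713: cilium-test/client-6965d549d5-2f962:37474 (ID:74122) <> cilium-test/echo-other-node:8080 (world) Service backend not found DROPPED (TCP Flags: SYN)",
    ]
    for dst, count in count_backend_drops(flows).items():
        print(dst, count)
```

A non-zero count for a service shortly after test start, followed by passing tests later in the same run, would be consistent with the propagation race described above rather than with a persistent datapath problem.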

Sysdumps:
cilium-sysdump-20230613-080722.zip
cilium-sysdump-context1-final.zip
cilium-sysdump-context2-final.zip

@giorio94
Member

giorio94 commented Aug 2, 2023

This issue should be fixed by cilium/cilium-cli#1758, as the CLI now waits for service propagation before starting the actual connectivity tests.
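
For illustration only, the general idea of waiting for propagation can be sketched as a poll-with-timeout loop. This is not the implementation from cilium/cilium-cli#1758 (the CLI is written in Go); the helper names has_backends and wait_for_backends, the fetch callback, and the assumed column layout (ID, Frontend, Service Type, Backend, as in the table quoted earlier in this thread) are all hypothetical.

```python
import time
from typing import Callable


def has_backends(service_list_output: str, frontend: str) -> bool:
    """Return True if the row for `frontend` has a non-empty Backend column.

    Assumes whitespace-separated columns: ID, Frontend, Service Type, Backend,
    and a single-word service type such as ClusterIP.
    """
    for line in service_list_output.splitlines():
        fields = line.split()
        # Row shape: [ID, Frontend, ServiceType, backend tokens...]
        if len(fields) >= 3 and fields[1] == frontend:
            return len(fields) > 3
    return False


def wait_for_backends(
    fetch: Callable[[], str],
    frontend: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `fetch` until `frontend` has backends or `timeout` seconds elapse.

    `fetch` is a caller-supplied function returning the current service list
    text; how that text is obtained is deliberately left out of this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        if has_backends(fetch(), frontend):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Row copied from the excerpt earlier in this thread: no backends yet.
    snapshot = (
        "ID   Frontend              Service Type   Backend\n"
        "8    10.243.85.29:8080     ClusterIP\n"
    )
    print(has_backends(snapshot, "10.243.85.29:8080"))
```

Gating the first test on a check like this trades a slightly longer startup for removing the window in which the service exists locally but has no remote backends yet.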

@giorio94 giorio94 closed this as completed Aug 2, 2023