
test: force restarting of Cilium pods #11613

Merged: 1 commit merged into master from pr/fqdn-delete-cilium on May 28, 2020

Conversation

@nebril (Member) commented May 20, 2020

Fixes race between cilium being restarted and connectivity test.

@nebril nebril requested a review from a team as a code owner May 20, 2020 11:25
@maintainer-s-little-helper (bot):

Please set the appropriate release note label.

@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 May 20, 2020
@nebril nebril marked this pull request as draft May 20, 2020 11:26
@nebril (Member Author) commented May 20, 2020

test-focus K8sFQDNTest.*

@nebril (Member Author) commented May 20, 2020

test-gke K8sFQDNTest.*

@nebril (Member Author) commented May 20, 2020

test-focus K8sFQDNTest.*

@nebril (Member Author) commented May 20, 2020

test-gke K8sFQDNTest.*

@coveralls commented May 20, 2020

Coverage decreased (-0.006%) to 36.874% when pulling b885df8 on pr/fqdn-delete-cilium into 60786b6 on master.

@nebril (Member Author) commented May 20, 2020

test-focus K8sFQDNTest.*

@nebril (Member Author) commented May 21, 2020

test-focus K8sFQDNTest.*

@nebril (Member Author) commented May 21, 2020

test-me-please

@nebril nebril changed the title test: delete Cilium daemonset in fqdn test test: force restarting of Cilium pods May 21, 2020
@nebril (Member Author) commented May 21, 2020

test-me-please

@nebril (Member Author) commented May 21, 2020

test-focus K8sFQDNTest.*

@nebril (Member Author) commented May 21, 2020

test-focus K8sFQDNTest.*

@nebril (Member Author) commented May 22, 2020

test-me-please

@nebril nebril marked this pull request as ready for review May 22, 2020 09:40
@nebril nebril added the release-note/ci This PR makes changes to the CI. label May 22, 2020
@nebril (Member Author) commented May 22, 2020

retest-runtime

@nebril (Member Author) commented May 22, 2020

retest-4.19

Review threads (resolved): test/helpers/kubectl.go, test/k8sT/fqdn.go
@nebril nebril requested a review from tgraf May 22, 2020 14:39
@nebril nebril requested a review from tgraf May 22, 2020 14:39
@nebril (Member Author) commented May 22, 2020

test-me-please

@nebril (Member Author) commented May 25, 2020

test-gke

Review thread (resolved): test/helpers/kubectl.go
This change ensures that Cilium pods are actually restarted in the
"Restart Cilium validate that FQDN is still working" test, by repeatedly
calling `kill 1` in all Cilium pods, which was the fastest way of
restarting a pod I found.

This test has been flaking a lot lately, and the theory is that there was
a race between the connectivity test and restarting the pod.

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
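
The restart-by-signal approach from the commit message can be sketched as a short loop against the cluster (a hedged sketch, assuming the standard `k8s-app=cilium` label and `kube-system` namespace; this is not the PR's actual helper code):

```sh
# Kill PID 1 (cilium-agent) inside every Cilium pod. The container runtime
# restarts the container in place, so the pod is never rescheduled and its
# BPF maps are not cleaned up -- unlike deleting the DaemonSet.
for pod in $(kubectl -n kube-system get pods -l k8s-app=cilium \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n kube-system exec "$pod" -- kill 1
done
```

Because the restart happens inside the running pod, the connectivity test can keep running against stable pod names while the agents come back up.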
@nebril (Member Author) commented May 25, 2020

test-me-please

@errordeveloper (Contributor) commented:

@nebril could you elaborate on why exactly the connectivity check is interfering here? Just trying to understand the context better.

@nebril (Member Author) commented May 27, 2020

@errordeveloper the connectivity check was not interfering; the point of the test is to run the connectivity check while Cilium is recovering, to validate that the DNS cache keeps working while Cilium pods restart.

@errordeveloper (Contributor) commented:

@nebril that sounds like it would add even more racy behaviour; it sounds to me like it would be more reliable to delete the daemonset instead, or to taint and drain the nodes.

@nebril (Member Author) commented May 27, 2020

@errordeveloper AFAIU, if we delete the daemonset, Cilium pods will uninstall cleanly, deleting their BPF maps. If we drain the nodes, the same applies; besides, how would we test a workload running on a drained node?

@errordeveloper (Contributor) commented:

> if we delete the daemonset, Cilium pods will uninstall cleanly, deleting bpf maps

I thought the opposite was actually the case.

> how will we test the workload running on a drained node?

You just need to have the right toleration set. It may be the case that a full drain isn't needed, but rather a taint that Cilium doesn't tolerate, followed by deletion of the Cilium pod(s).
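
The taint-based alternative described here might look roughly like the following (the node name and taint key are illustrative, not from the PR):

```sh
# A NoExecute taint evicts pods that lack a matching toleration --
# including the Cilium agent pod -- and the DaemonSet will not
# reschedule it onto the node while the taint is present.
kubectl taint nodes node-1 cilium-test=restart:NoExecute

# The test workload survives eviction only if its pod spec carries a
# matching toleration, e.g.:
#   tolerations:
#   - key: cilium-test
#     operator: Exists
#     effect: NoExecute
```

Removing the taint afterwards would let the DaemonSet reschedule the Cilium pod, completing the "restart".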

@nebril (Member Author) commented May 27, 2020

@errordeveloper without a Cilium pod scheduled on the node, we end up with a node that doesn't handle networking via our CNI plugin, which is not what we want to test, AFAIU.

@errordeveloper (Contributor) commented:

@nebril I believe a missing pod will have the same effect as a restarting pod in this case. And just to be clear, my view is that ad-hoc commands are exactly what we should stop doing in the tests. If this is a hack that fixes another hack, I get that :)

@aanm aanm merged commit b514a1c into master May 28, 2020
1.8.0 automation moved this from In progress to Merged May 28, 2020
@aanm aanm deleted the pr/fqdn-delete-cilium branch May 28, 2020 13:20