fix kind job with network policy failures #26639

Merged
merged 2 commits into from Jul 11, 2023


aojea
Contributor

@aojea aojea commented Jul 5, 2023

The GitHub runners do not have enough resources to run multiple network policy tests in parallel, which causes the tests to freeze and makes the job flaky. Remove the legacy tests, which are deprecated and removed in Kubernetes 1.28, and run only 5 tests in parallel to avoid problems caused by resource starvation on CI.

Example of resource usage with this PR

https://github.com/cilium/cilium/actions/runs/5518744069?pr=26639

Example of resource usage without it

https://github.com/cilium/cilium/actions/runs/5513171662

xref: kubernetes/kubernetes#118915

Fixes: #26439,#26492
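
The idea of capping parallelism at 5 can be illustrated with a minimal sketch. This is not the change made in this PR, which adjusts the CI job itself; it only shows, under stated assumptions, how a fixed-size worker pool bounds the number of tests in flight. The names `run_with_cap` and `fake_test` are hypothetical and exist only for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5  # the cap mentioned in the PR description


def run_with_cap(tests, max_parallel=MAX_PARALLEL):
    """Run callables with at most max_parallel in flight.

    Returns (results, peak), where peak is the highest number of
    tests observed running at the same time.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def wrapped(test):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return test()
        finally:
            with lock:
                running -= 1

    # The pool size is what enforces the cap: extra tests wait in the queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(wrapped, tests))
    return results, peak


def fake_test(i):
    # Hypothetical stand-in for a single network policy test case.
    def run():
        time.sleep(0.01)
        return i
    return run


results, peak = run_with_cap([fake_test(i) for i in range(20)])
```

With 20 queued tests, `peak` never exceeds `MAX_PARALLEL`, which is the property that keeps a small runner from being starved of CPU and memory.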

@aojea aojea requested review from a team as code owners July 5, 2023 09:23
@aojea aojea requested review from aanm and nebril July 5, 2023 09:23
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 5, 2023
@aojea
Contributor Author

aojea commented Jul 5, 2023

interesting, operator and kube-scheduler are crashlooping

2023-07-05T10:09:43.0786192Z kube-system                 cilium-4cf87                                           1/1     Running            0                41m   172.18.0.3     cilium-testing-worker2         <none>           <none>
2023-07-05T10:09:43.0787770Z kube-system                 cilium-nvdnt                                           1/1     Running            0                41m   172.18.0.2     cilium-testing-worker          <none>           <none>
2023-07-05T10:09:43.0788596Z kube-system                 cilium-operator-b968b995f-64xsl                        0/1     CrashLoopBackOff   11 (6m16s ago)   41m   172.18.0.3     cilium-testing-worker2         <none>           <none>
2023-07-05T10:09:43.0806045Z kube-system                 cilium-qh8tz                                           1/1     Running            0                41m   172.18.0.4     cilium-testing-control-plane   <none>           <none>
2023-07-05T10:09:43.0807266Z kube-system                 coredns-5d78c9869d-8qs5n                               1/1     Running            0                44m   10.244.2.242   cilium-testing-worker          <none>           <none>
2023-07-05T10:09:43.0814981Z kube-system                 coredns-5d78c9869d-tm78j                               1/1     Running            0                44m   10.244.2.169   cilium-testing-worker          <none>           <none>
2023-07-05T10:09:43.0815820Z kube-system                 etcd-cilium-testing-control-plane                      1/1     Running            0                44m   172.18.0.4     cilium-testing-control-plane   <none>           <none>
2023-07-05T10:09:43.0839934Z kube-system                 kube-apiserver-cilium-testing-control-plane            1/1     Running            2 (15m ago)      44m   172.18.0.4     cilium-testing-control-plane   <none>           <none>
2023-07-05T10:09:43.0854489Z kube-system                 kube-controller-manager-cilium-testing-control-plane   1/1     Running            10 (2m45s ago)   44m   172.18.0.4     cilium-testing-control-plane   <none>           <none>
2023-07-05T10:09:43.0859446Z kube-system                 kube-scheduler-cilium-testing-control-plane            0/1     CrashLoopBackOff   7 (3m47s ago)    44m   172.18.0.4     cilium-testing-control-plane   <none>           <none>
2023-07-05T10:09:43.0895976Z local-path-storage          local-path-provisioner-6bc4bddd6b-w2lh4                1/1     Running            0                44m   10.244.2.2     cilium-testing-worker          <none>           <none>
2023-07-05T10:09:43.0926495Z netpol-x-1004               a                                                      1/1     Running            0                40m   10.244.1.117   cilium-testing-worker2         <none>           <none>
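
A small sketch of how the crashlooping pods could be picked out of a listing like the one above. It assumes each line is a CI timestamp followed by the namespace, pod name, ready count and status columns, as in the pasted output; the function name `crashlooping_pods` is hypothetical, and the sample lines are abbreviated copies of two rows from the log.

```python
def crashlooping_pods(lines):
    """Return (namespace, name) for every pod whose STATUS is CrashLoopBackOff.

    Assumes whitespace-separated columns in this order:
    timestamp, namespace, name, ready, status, ...
    """
    found = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a pod row
        _, namespace, name, _ready, status = fields[:5]
        if status == "CrashLoopBackOff":
            found.append((namespace, name))
    return found


sample = [
    "2023-07-05T10:09:43.0787770Z kube-system cilium-nvdnt 1/1 Running 0 41m",
    "2023-07-05T10:09:43.0788596Z kube-system cilium-operator-b968b995f-64xsl 0/1 CrashLoopBackOff 11 (6m16s ago) 41m",
    "2023-07-05T10:09:43.0859446Z kube-system kube-scheduler-cilium-testing-control-plane 0/1 CrashLoopBackOff 7 (3m47s ago) 44m",
]
crashing = crashlooping_pods(sample)
```

The restart column can contain extra tokens such as `(6m16s ago)`, but it comes after the status column, so splitting on whitespace still finds the status in the fifth field.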

@aojea aojea force-pushed the netpol_legacy branch 2 times, most recently from 623f367 to 8c4c425 Compare July 6, 2023 08:58
@aojea aojea changed the title .github: don't run legacy network policy tests [WIP] debug kind job with network policy failures Jul 6, 2023
@aojea
Contributor Author

aojea commented Jul 6, 2023

/test conformance-k8s-kind-network-policies

Member

@aanm aanm left a comment

Should we remove them already given that we are still in k8s 1.27?

@aanm aanm self-requested a review July 6, 2023 15:23
@aanm aanm marked this pull request as draft July 6, 2023 15:23
@aojea
Contributor Author

aojea commented Jul 6, 2023

> Should we remove them already given that we are still in k8s 1.27?

we should, but I'm trying to repro the flake first with debug mode, I hit it without the legacy tests too so

@aojea
Contributor Author

aojea commented Jul 6, 2023

incredible, now 3/3 without failures, one more time

@aojea aojea force-pushed the netpol_legacy branch 2 times, most recently from af21812 to fb3591f Compare July 10, 2023 21:24
@maintainer-s-little-helper

Commit fb3591f62ee4fbcbb53d02203a8f417899eb45fd does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 11, 2023
@aojea aojea marked this pull request as ready for review July 11, 2023 12:51
Signed-off-by: Antonio Ojea <aojea@google.com>
GitHub runners do not have enough resources to deal with the
network policy tests that run multiple pods in parallel, so
the tests may get stuck and start to time out,
causing flakiness on the CI.

Signed-off-by: Antonio Ojea <aojea@google.com>
@aojea aojea changed the title [WIP] debug kind job with network policy failures fix kind job with network policy failures Jul 11, 2023
@aanm aanm added area/CI Continuous Integration testing issue or flake needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Jul 11, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.0 Jul 11, 2023
@aanm
Member

aanm commented Jul 11, 2023

/test

@aanm aanm added the release-note/misc This PR makes changes that have no direct user impact. label Jul 11, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 11, 2023
@aanm aanm merged commit 08196fe into cilium:main Jul 11, 2023
65 checks passed
@jibi jibi mentioned this pull request Jul 13, 2023
13 tasks
@jibi jibi added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Jul 13, 2023
@aanm aanm added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Jul 14, 2023
@aanm aanm moved this from Needs backport from main to Backport done to v1.14 in 1.14.0 Jul 26, 2023
Labels
area/CI Continuous Integration testing issue or flake backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. release-note/misc This PR makes changes that have no direct user impact.
Projects
No open projects
1.14.0
Backport done to v1.14
Development

Successfully merging this pull request may close these issues.

K8sUpstreamNetConformance is hard to debug
3 participants