New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Detect missed tail calls on upgrade/downgrade test #13097
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
pchaigno
added
area/CI
Continuous Integration testing issue or flake
release-note/ci
This PR makes changes to the CI.
ci/flake
This is a known failure that occurs in the tree. Please investigate me!
labels
Sep 4, 2020
pchaigno
force-pushed
the
pr/pchaigno/test-missed-tail-calls-upgrades
branch
from
September 4, 2020 20:51
7c8ee3b
to
a34aa92
Compare
This was referenced Sep 4, 2020
christarazi
approved these changes
Sep 4, 2020
borkmann
approved these changes
Sep 4, 2020
Connectivity disruptions caused by missed tail calls were recently reported at #13015. It was caused by an incorrect handling of a call map rename. We didn't detect it because we don't have code to specifically detect missed tail calls during the upgrade/downgrade test; the test only fails if the connectivity is broken during a long enough period. This commit adds a new function to retrieve the sum of 'Missed tail calls' metrics across all Cilium pods. It is then used in the test after both the upgrade and the subsequent downgrade to check that no drops due to missed tail calls happened. This new test was tested by: - backporting to v1.8 and checking that missed tail calls are indeed detected. - backporting the fixes on the v1.7 (#13052) and v1.8 (#13051) branches and checking that no more tail calls were detected. We need to wait for both #13052 and #13051 to be merged and backported before we can backport this test to v1.7 and v1.8, as it will otherwise fail. Related: #13015, #13051, #13052 Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno
force-pushed
the
pr/pchaigno/test-missed-tail-calls-upgrades
branch
from
September 7, 2020 06:12
a34aa92
to
31a91aa
Compare
test-me-please |
Only the net-next build is failing, with known-flake #13062. All other required builds are passing so marking as ready to merge. |
pchaigno
added
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
needs-backport/1.7
labels
Sep 7, 2020
The v1.7 fix is merged. The v1.8 fix backport is going to be merged with #13100. So we can merge the test in v1.8 during the next round of backports. |
This was referenced Sep 9, 2020
maintainer-s-little-helper
bot
moved this from Needs backport from master
to Backport pending to v1.8
in 1.8.4
Sep 9, 2020
maintainer-s-little-helper
bot
moved this from Backport pending to v1.8
to Backport done to v1.8
in 1.8.4
Sep 9, 2020
maintainer-s-little-helper
bot
moved this from Backport pending to v1.8
to Backport done to v1.8
in 1.8.4
Sep 9, 2020
joestringer
moved this from Needs backport from master
to Backport done to v1.7
in 1.7.10
Sep 30, 2020
This was referenced Sep 30, 2020
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/CI
Continuous Integration testing issue or flake
ci/flake
This is a known failure that occurs in the tree. Please investigate me!
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/ci
This PR makes changes to the CI.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Connectivity disruptions caused by missed tail calls were recently reported at #13015. It was caused by an incorrect handling of a call map rename. We didn't detect it because we don't have code to specifically detect missed tail calls during the upgrade/downgrade test; the test only fails if the connectivity is broken during a long enough period.
This pull request adds a new function to retrieve the sum of 'Missed tail calls' metrics across all Cilium pods. It is then used in the test after both the upgrade and the subsequent downgrade to check that no drops due to missed tail calls happened.
This new test was tested by:
We need to wait for both #13052 and #13051 to be merged and backported before we can backport this test to v1.8, as it will otherwise fail.
Related: #13015, #13051, #13052