v1.14 Backports 2023-09-08 #28021
Merged
michi-covalent
merged 1 commit into
cilium:v1.14
from
margamanterola:pr/v1.14-backport-2023-09-08
Sep 8, 2023
margamanterola
added
kind/backports
This PR provides functionality previously merged into master.
backport/1.14
This PR represents a backport for Cilium 1.14.x of a PR that was merged to main.
labels
Sep 8, 2023
/test-backport-1.14
/ci-ipsec-upgrade
Failures are unrelated to IPSec (currently v1.14 is failing due to a recent commit that broke some tests)
[ upstream commit c9ea7a5 ]

When rolling cilium-agent or doing an upgrade while running a stress test with encryption, a small number of NoStateIn errors are seen. To capture the error state (a cilium_host IP without an xfrm state rule) you need to get into the pod near pod init and get somewhat lucky that init took a bit longer than usual. For example, I ran `ip x s` in a pod about 15 seconds after launch and captured a case with new XfrmInNoErrors, a cilium_host IP assigned, but no xfrm state rule for it. The packets received are dropped.

The conclusion is that remote nodes learn the new router IP before we have the xfrm state rule loaded. The remote nodes then start using that IP as the IPSec tunnel outer IP, resulting in errors when packets reach the local node, which does not have the xfrm rule yet. The errors eventually resolve, but some packets are lost in the meantime.

This happens for two reasons. First, we configure the datapath after we push node object updates. This is wrong because we need to init the IPSec code path before we teach remote nodes about the new IP. Second, the configuration of the datapath does a lookup in the node object's IPAddresses{}, which is only populated from the k8s watcher in the tunnel case. So we only have the fully populated node object after we receive it through the k8s watcher. Again, it's possible that other nodes have already seen the event and started pushing traffic with the new IPs.

To resolve this, move the IPSec init code so that it creates the xfrm rules needed for the new IPs before we publish them to the k8s node object. And instead of pulling the IPs out of the node object, simply pull them directly from the node module. This resolves the XfrmInNoState and XfrmIn*Policy* errors I've seen.

I can consistently reproduce the errors with about 30 nodes, an httpperf test running from a pod on every node, and repeated rollouts of the cilium agent for a while. Running for 2-3 hours almost guarantees that errors pop up. Usually the errors happen much sooner. Initially I saw these errors in upgrade tests, which is another way to reproduce them.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Margarita Manterola <margamanterola@gmail.com>
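The ordering problem described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `localNode`, `installXfrmState`, `publish`, `receive`, `rollout`, and the router IP are hypothetical names invented for illustration, not Cilium's code or API. The model assumes a remote node sends one packet as soon as the router IP is published, and that a packet arriving for an IP without an xfrm state is dropped.

```go
package main

import "fmt"

// localNode is a hypothetical stand-in for the node being restarted.
// It is NOT a Cilium type; it only models the race described above.
type localNode struct {
	xfrmStates map[string]bool // router IPs that have an inbound xfrm state
	published  []string        // router IPs already visible to remote nodes
	dropped    int             // packets that arrived without a matching state
}

func newLocalNode() *localNode {
	return &localNode{xfrmStates: make(map[string]bool)}
}

// installXfrmState models loading the xfrm state rule for a router IP.
func (n *localNode) installXfrmState(ip string) {
	n.xfrmStates[ip] = true
}

// receive models an encrypted packet arriving with outerIP as the tunnel
// outer IP. Without a matching xfrm state the packet is dropped.
func (n *localNode) receive(outerIP string) {
	if !n.xfrmStates[outerIP] {
		n.dropped++
	}
}

// publish models pushing the router IP to the node object. In this toy
// model a remote node reacts immediately and sends one packet.
func (n *localNode) publish(ip string) {
	n.published = append(n.published, ip)
	n.receive(ip)
}

// rollout runs one agent restart with the chosen ordering and returns the
// number of dropped packets.
func rollout(installFirst bool) int {
	const routerIP = "10.0.0.1" // illustrative address only
	n := newLocalNode()
	if installFirst {
		// Ordering after the fix: xfrm state first, then publish.
		n.installXfrmState(routerIP)
		n.publish(routerIP)
	} else {
		// Ordering before the fix: publish first, xfrm state later.
		n.publish(routerIP)
		n.installXfrmState(routerIP)
	}
	return n.dropped
}

func main() {
	fmt.Println("publish first, dropped packets:", rollout(false))
	fmt.Println("install first, dropped packets:", rollout(true))
}
```

In this model the publish-first ordering always loses the packet, whereas on a real cluster the window is short and only occasionally hit, which matches the description of the errors as rare and hard to capture.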
margamanterola
force-pushed
the
pr/v1.14-backport-2023-09-08
branch
from
September 8, 2023 17:29
ac82863
to
baf94bb
Compare
/test-backport-1.14
michi-covalent
approved these changes
Sep 8, 2023
@jrfastab could you also approve this 🙏
/ci-ipsec-upgrade
jrfastab
approved these changes
Sep 8, 2023
maintainer-s-little-helper
bot
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Sep 8, 2023
Once this PR is merged, you can update the PR labels via:
or with