cilium: Ensure xfrm state is initialized for route IP before publish #28012

Merged
merged 1 commit into from Sep 8, 2023

@jrfastab jrfastab commented Sep 8, 2023

When rolling cilium-agent or doing an upgrade while running a stress test with encryption enabled, a small number of XfrmInNoStates errors are seen. To capture the error state (a cilium_host IP without an xfrm state rule) you need to get into the pod near pod init and get somewhat lucky that init took longer than usual. For example, I ran `ip x s` in a pod about 15 seconds after launch and captured a case with new XfrmInNoStates errors and a cilium_host IP assigned, but no xfrm state rule for it. Packets received in that window are dropped.
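One way to spot the window described above is to compare two snapshots of the kernel's xfrm counters. The following is a minimal sketch, assuming the usual `/proc/net/xfrm_stat` layout of one counter name and one value per line; the helper names `parseXfrmStat` and `newInboundErrors` and the sample values are hypothetical and not part of Cilium.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseXfrmStat parses text in the "Name <whitespace> value" layout used by
// /proc/net/xfrm_stat into a map of counter name to value.
// (Hypothetical helper for this sketch, not a Cilium function.)
func parseXfrmStat(text string) (map[string]uint64, error) {
	counters := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or unexpected lines
		}
		v, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("counter %q: %w", fields[0], err)
		}
		counters[fields[0]] = v
	}
	return counters, sc.Err()
}

// newInboundErrors returns the XfrmIn* counters that increased between two
// snapshots, mapped to the size of the increase.
func newInboundErrors(before, after map[string]uint64) map[string]uint64 {
	delta := make(map[string]uint64)
	for name, now := range after {
		if strings.HasPrefix(name, "XfrmIn") && now > before[name] {
			delta[name] = now - before[name]
		}
	}
	return delta
}

func main() {
	// Sample snapshots; the values are invented for illustration.
	before, _ := parseXfrmStat("XfrmInError \t0\nXfrmInNoStates \t0\nXfrmOutError \t0\n")
	after, _ := parseXfrmStat("XfrmInError \t0\nXfrmInNoStates \t12\nXfrmOutError \t3\n")
	fmt.Println(newInboundErrors(before, after))
}
```

On a real node the two strings would come from reading `/proc/net/xfrm_stat` before and after the agent restart, alongside the `ip x s` output for the cilium_host IP.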

The conclusion is that remote nodes learn the new router IP before we have the xfrm state rule loaded. The remote nodes then start using that IP as the IPSec tunnel outer IP, which results in errors when packets reach the local node before it has the xfrm rule. The errors eventually resolve, but some packets are lost in the meantime.

This happens for two reasons. First, we configure the datapath after we push node object updates. This is wrong because we need to init the IPSec code path before we teach remote nodes about the new IP. Second, the datapath configuration does a lookup in the node object's IPAddresses{}, which is only populated from the k8s watcher in the tunnel case. So we only have the fully populated node object after we receive it through the k8s watcher. Again, it's possible that other nodes have already seen the event and started pushing traffic with the new IPs.

To resolve this, push the IPSec init code that creates the xfrm rules needed for the new IPs ahead of publishing those IPs to the k8s node object. And instead of pulling the IPs out of the node object, simply pull them directly from the node module. This resolves the XfrmInNoState and XfrmIn*Policy* errors I've seen.
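The ordering constraint can be sketched as follows. This is an illustration only, not the code in this PR: the `stateInstaller` and `nodePublisher` interfaces, the `recorder` type, and `announceRouterIP` are all hypothetical names, and the router IP is passed in directly to stand in for "pull it from the node module".

```go
package main

import (
	"errors"
	"fmt"
)

// stateInstaller and nodePublisher are hypothetical interfaces for this
// sketch; they are not Cilium's real types.
type stateInstaller interface {
	InstallInboundState(routerIP string) error
}

type nodePublisher interface {
	PublishRouterIP(routerIP string) error
}

var errInstall = errors.New("xfrm state install failed")

// recorder implements both interfaces and records the order of calls.
type recorder struct {
	calls       []string
	failInstall bool
}

func (r *recorder) InstallInboundState(routerIP string) error {
	if r.failInstall {
		return errInstall
	}
	r.calls = append(r.calls, "install "+routerIP)
	return nil
}

func (r *recorder) PublishRouterIP(routerIP string) error {
	r.calls = append(r.calls, "publish "+routerIP)
	return nil
}

// announceRouterIP installs the inbound xfrm state for routerIP first and
// publishes the IP to remote nodes only if that step succeeded, so peers
// never learn an outer IP that the local node cannot yet decrypt for.
func announceRouterIP(routerIP string, x stateInstaller, p nodePublisher) error {
	if err := x.InstallInboundState(routerIP); err != nil {
		return fmt.Errorf("not publishing %s: %w", routerIP, err)
	}
	return p.PublishRouterIP(routerIP)
}

func main() {
	r := &recorder{}
	if err := announceRouterIP("10.0.1.1", r, r); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(r.calls)
}
```

Returning early on an install failure keeps the invariant simple: an IP that was never installed locally is never advertised.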

I can consistently reproduce the errors with about 30 nodes, an httpperf test running from a pod on every node, and repeated 'rollout' restarts of the cilium agent for a while. Running for 2-3 hours almost guarantees that errors pop up; usually they happen much sooner. I initially saw these errors in upgrade tests, which is another way to reproduce them.

IPSec fix for race on init resulting in Xfrm*In* errors and dropped packets

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
@jrfastab jrfastab requested review from a team as code owners September 8, 2023 00:49
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 8, 2023
@jrfastab jrfastab added area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. needs-backport/1.11 backport/author The backport will be carried out by the author of the PR. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Sep 8, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.2 Sep 8, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.7 Sep 8, 2023
@jrfastab jrfastab added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Sep 8, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.12.14 Sep 8, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 8, 2023

@joestringer joestringer left a comment


LGTM modulo the typo below.

pkg/datapath/linux/ipsec/ipsec_linux.go (outdated, resolved)

@jschwinger233 jschwinger233 left a comment


Thanks! I left a question below regarding subnetEncryption. Besides that, I was wondering what the relationship between this PR and #27724 is.

pkg/datapath/linux/ipsec/ipsec_linux.go (outdated, resolved)
@jrfastab jrfastab force-pushed the pr/jrfastab/fixes-ipsec-init-v2 branch from ffae1ee to 35dfdfa Compare September 8, 2023 03:54
jrfastab commented Sep 8, 2023

/test


@jschwinger233 jschwinger233 left a comment


👍

@julianwiedmann julianwiedmann added the kind/bug This is a bug in the Cilium logic. label Sep 8, 2023
@aanm aanm merged commit c9ea7a5 into main Sep 8, 2023
206 checks passed
@aanm aanm deleted the pr/jrfastab/fixes-ipsec-init-v2 branch September 8, 2023 11:45
@margamanterola margamanterola mentioned this pull request Sep 8, 2023
1 task
@margamanterola margamanterola added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Sep 8, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.2 Sep 8, 2023
@margamanterola margamanterola mentioned this pull request Sep 8, 2023
1 task
@margamanterola margamanterola added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Sep 8, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.13 in 1.13.7 Sep 8, 2023
@margamanterola margamanterola mentioned this pull request Sep 8, 2023
1 task
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.12 in 1.12.14 Sep 8, 2023
@margamanterola margamanterola removed the backport/author The backport will be carried out by the author of the PR. label Sep 8, 2023
@margamanterola margamanterola mentioned this pull request Sep 8, 2023
1 task
@margamanterola margamanterola mentioned this pull request Sep 8, 2023
1 task
@michi-covalent michi-covalent added backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.12 backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Sep 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Backport done to v1.13 in 1.13.7 Sep 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.2 Sep 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.12 to Backport done to v1.12 in 1.12.14 Sep 9, 2023
Labels
area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.11 The backport for Cilium 1.11.x for this PR is done. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.12.14
Backport done to v1.12
1.13.7
Backport done to v1.13
1.14.2
Backport done to v1.14
7 participants