cilium: Ensure xfrm state is initialized for router IP before publish
[ upstream commit c9ea7a5 ]

When rolling the cilium-agent or doing an upgrade while running a stress test
with encryption, a small number of XfrmInNoState errors are seen. To capture
the error state (a cilium_host IP without an xfrm state rule) you need
to get into the pod shortly after pod init and get somewhat lucky that init
took longer than usual. For example, I ran `ip x s` in a pod about
15 seconds after launch and captured a case with new XfrmInNoState errors,
a cilium_host IP assigned, but no xfrm state rule for it. The packets
received are dropped.

The conclusion is that remote nodes learn the new router IP before we have
the xfrm state rule loaded. The remote nodes then start using that
IP as the IPSec tunnel outer IP, which results in errors when their packets
reach the local node that does not yet have the xfrm rule. The errors
eventually resolve, but some packets are lost in the meantime.

The reason this happens is twofold. First, we configure the datapath
after we push node object updates. This is wrong because we need
to init the IPSec code path before we teach remote nodes about the
new IP. Second, the configuration of the datapath does a lookup
in the node object's IPAddresses{}, which is only populated from the
k8s watcher in the tunnel case. So we only have the fully populated
node object after we receive it through the k8s watcher. Again, it's
possible other nodes have already seen the event and started pushing
traffic with the new IPs.
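
To make the ordering concrete, here is a condensed sketch of the init
sequence the fix establishes in newDaemon(); only the calls relevant to
this commit are shown, and the placement of the allocateIPs() check is
paraphrased from the surrounding code rather than part of this diff:

    // allocateIPs() is what hands us the router IP in the first place.
    if err := d.allocateIPs(); err != nil {
        return nil, nil, err
    }
    // Install the xfrm state for the router IP before anyone can learn it.
    if option.Config.EnableIPSec {
        if err := ipsec.Init(); err != nil {
            log.WithError(err).Error("IPSec init failed")
        }
    }
    // Only now publish the node object, and with it the new router IP.
    d.nodeDiscovery.StartDiscovery()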

To resolve this, push the IPSec init code that creates the needed xfrm
rules for the new IPs before we publish them to the k8s node object. And
instead of pulling the IPs out of the node object, simply pull them
directly from the node module. This resolves the XfrmInNoState and
XfrmIn*Policy* errors I've seen.
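
The new helper takes no arguments for exactly this reason: rather than
waiting for the node object to be populated by the k8s watcher, it reads
the router IP and the local allocation range straight from the node module
(condensed from the ipsec_linux.go hunk below):

    outerLocalIP := node.GetInternalIPv4Router()
    localCIDR := node.GetIPv4AllocRange().IPNet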

To reproduce the errors: I can consistently reproduce with about
30 nodes, with an httpperf test running from a pod on all nodes, and
then doing a 'rollout' of the cilium agent for a while. 2-3 hours of
this almost always ensures errors pop up; usually the errors
happen much sooner. Initially I saw these errors in upgrade tests,
which is another way to reproduce.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Margarita Manterola <margamanterola@gmail.com>
jrfastab authored and michi-covalent committed Sep 8, 2023
1 parent bc4b5ac commit 69b4ceb
Showing 2 changed files with 27 additions and 0 deletions.
10 changes: 10 additions & 0 deletions daemon/cmd/daemon.go
@@ -1111,6 +1111,16 @@ func newDaemon(ctx context.Context, cleaner *daemonCleanup, params *daemonParams
return nil, nil, err
}

// allocateIPs got us the routerIP so now we can create ipsec endpoint
// we must do this before publishing the router IP otherwise remote
// nodes could pick up the IP and send us outer headers we do not yet
// have xfrm rules for.
if option.Config.EnableIPSec {
if err := ipsec.Init(); err != nil {
log.WithError(err).Error("IPSec init failed")
}
}

// Must occur after d.allocateIPs(), see GH-14245 and its fix.
d.nodeDiscovery.StartDiscovery()

17 changes: 17 additions & 0 deletions pkg/datapath/linux/ipsec/ipsec_linux.go
@@ -1115,3 +1115,20 @@ func StartStaleKeysReclaimer(ctx context.Context) {
}
}()
}

// We need to install xfrm state for the local router (cilium_host) early
// in daemon init path. This is to ensure that we have the xfrm state in
// place before we advertise the routerIP where other nodes may potentially
// pick it up and start sending traffic to us. This was previously racing
// and creating XfrmInNoState errors because other nodes picked up node
// update before Xfrm config logic was in place. So special case init the
// rule we need early in init flow.
func Init() error {
outerLocalIP := node.GetInternalIPv4Router()
wildcardIP := net.ParseIP("0.0.0.0")
localCIDR := node.GetIPv4AllocRange().IPNet
localWildcardIP := &net.IPNet{IP: wildcardIP, Mask: net.IPv4Mask(0, 0, 0, 0)}

_, err := UpsertIPsecEndpoint(localCIDR, localWildcardIP, outerLocalIP, wildcardIP, 0, IPSecDirIn, false)
return err
}
