
daemon, node: Fix faulty router IP restoration logic #16672

Merged 1 commit into cilium:master on Jun 29, 2021

Conversation

@christarazi (Member) commented Jun 28, 2021

When running in ENI or Alibaba IPAM mode, or any CRD-backed IPAM mode
("crd"), it was very likely upon Cilium restart that `cilium_host`
was assigned an additional IP. Below is a case where Cilium was
restarted 3 times, hence getting 3 additional router IPs:

```
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
    link/ether 66:03:3c:07:8c:47 brd ff:ff:ff:ff:ff:ff
    inet 192.168.35.9/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.34.37/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.57.107/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::6403:3cff:fe07:8c47/64 scope link
       valid_lft forever preferred_lft forever
```

This was because in CRD-backed IPAM modes, we wait until we fully sync
with K8s in order to derive the VPC CIDR, which becomes the pod CIDR on
the node. Since the router IP restoration logic was using a different
pod CIDR during the router IP validation check, it was erroneously
discarding the restored router IP. This was observed with:

```
2021-06-25T13:59:47.816069937Z level=info msg="The router IP (192.168.135.3) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr=10.8.0.0/16 subsys=node
```

This is problematic because the extraneous router IPs could also be
assigned to pods, which would break pod connectivity.

The fix is to break up the router IP restoration process into 2 parts.
The first is to attempt a restoration of the IP from the filesystem
(`node_config.h`). We also fetch the router IPs from the Kubernetes
resources, since they were already retrieved earlier inside
`k8s.WaitForNodeInformation()`.

Then, after the CRD-backed IPAM is initialized and started (i.e.
`(*Daemon).startIPAM()` is called), we attempt the second part:
evaluating which IPs (either from the filesystem or from K8s) should be
set as the router IPs. The IPs from the filesystem take precedence. In
case the node was rebooted, the filesystem will have been wiped, so we
would then rely on the IPs from the K8s resources. At this point in the
daemon initialization, we have the correct CIDR range as the pod CIDR
range to validate the chosen IP (see the sketch after the trailers
below).

Fixes: beb8bde ("k8s, node: Restore router IPs (cilium_host) from
K8s resource")

Signed-off-by: Chris Tarazi <chris@isovalent.com>

Fixes: #16520
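
A minimal, self-contained sketch of the selection logic in part two.
The function and variable names here are illustrative, not Cilium's
actual identifiers:

```go
package main

import (
	"fmt"
	"net"
)

// chooseRouterIP mirrors part two of the restoration flow: the IP
// restored from node_config.h wins over the one from the K8s resource,
// and the winner is only kept if it falls inside the pod CIDR that the
// CRD-backed IPAM derived after syncing with K8s.
func chooseRouterIP(fromFS, fromK8s net.IP, podCIDR *net.IPNet) net.IP {
	ip := fromFS // the filesystem takes precedence
	if ip == nil {
		ip = fromK8s // e.g. the node rebooted and the filesystem was wiped
	}
	if ip == nil || !podCIDR.Contains(ip) {
		return nil // discard; a fresh router IP is allocated instead
	}
	return ip
}

func main() {
	_, podCIDR, _ := net.ParseCIDR("192.168.0.0/16")
	fromK8s := net.ParseIP("192.168.135.3")
	fmt.Println(chooseRouterIP(nil, fromK8s, podCIDR)) // 192.168.135.3
}
```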

@christarazi christarazi added area/daemon Impacts operation of the Cilium daemon. needs-backport/1.10 release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 28, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jun 28, 2021
@christarazi (Member, Author) commented Jun 28, 2021

test-me-please

Edit: net-next hit #12511 and 1.21-4.9 hit #14598

@christarazi christarazi marked this pull request as ready for review June 28, 2021 18:07
@christarazi christarazi requested review from a team and jibi June 28, 2021 18:07
@christarazi christarazi added the kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. label Jun 28, 2021
@christarazi (Member, Author) commented
CI has passed, pushed small change

@aanm (Member) left a comment


LGTM

daemon/cmd/daemon.go (review thread resolved)
@christarazi christarazi removed the request for review from jibi June 29, 2021 18:42
@christarazi christarazi added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. needs-backport/1.9 labels Jun 29, 2021
@aanm (Member) commented Jun 29, 2021

Merging this PR to unblock the 1.10.2 release

@aanm aanm merged commit ff63b07 into cilium:master Jun 29, 2021
@christarazi christarazi deleted the pr/christarazi/fix-16520 branch June 29, 2021 21:06
@Purushotham233 commented Aug 27, 2021

@christarazi, we are using 1.9.4, and this issue is NOT observed on cilium-agent restarts.

Did this issue exist in all 1.9.x versions?

@christarazi (Member, Author) commented
@Purushotham233 No, because this PR fixes #16307, which was released in v1.10.x, not v1.9.x. With respect to the aforementioned PR, everything should already be fine on v1.9.

@houminz commented Aug 11, 2022

Hi @christarazi, I came across the same problem with Cilium version v1.10.4.

In my case, IPAM is in crd mode. Upon Cilium restart, it was very likely that `cilium_host` was assigned an additional IP.

For example, the pod CIDR 10.24.0.0/26 is allocated for one node, and originally the `cilium_host` IP is 10.24.0.55:

```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    io.cilium.network.ipv4-cilium-host: 10.24.0.24
    io.cilium.network.ipv4-health-ip: 10.24.0.20
    io.cilium.network.ipv4-pod-cidr: 10.24.0.0/26
```

After I rebuilt the Cilium pod, `cilium_host` was assigned an additional IP:

```
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c2:35:33:f2:ab:00 brd ff:ff:ff:ff:ff:ff
    inet 10.24.0.55/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 10.24.0.24/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::c035:33ff:fef2:ab00/64 scope link
       valid_lft forever preferred_lft forever
```

Reading the code and the Cilium logs, I can see that the two parts of the router IP restoration both get 10.24.0.55, i.e. routerFromK8s and routerFromFs are both 10.24.0.55.

However, in crd mode, cidr is nil, which leads to the `does not belong in the Pod CIDR of the node` log:

level=info msg="Restored router IPs from node information" ipv4=10.24.0.55 ipv6="<nil>" subsys=k8s
level=info msg="k8s mode: Allowing localhost to reach local endpoints" subsys=daemon
level=info msg="Enabling k8s event listener" subsys=k8s-watcher
level=info msg="Removing stale endpoint interfaces" subsys=daemon
level=info msg="Skipping kvstore configuration" subsys=daemon
level=info msg="Restored router address from node_config" file=/var/run/cilium/state/globals/node_config.h ipv4=10.24.0.55 ipv6="<nil>" subsys=node
level=info msg="Initializing node addressing" subsys=daemon
level=info msg="Initializing CRD-based IPAM" subsys=ipam
level=info msg="Subscribed to CiliumNode custom resource" name=10.0.24.205 subsys=ipam
level=info msg="Creating or updating CiliumNode resource" node=10.0.24.205 subsys=nodediscovery
level=info msg="Waiting until all pre-existing resources related to policy have been received" subsys=k8s-watcher
level=info msg="Waiting for CiliumNode custom resource to become available..." name=10.0.24.205 subsys=ipam
level=info msg="All pre-existing resources related to policy have been received; continuing" subsys=k8s-watcher
level=info msg="Successfully synchronized CiliumNode custom resource" name=10.0.24.205 subsys=ipam
level=info msg="All required IPs are available in CRD-backed allocation pool" available=62 name=10.0.24.205 required=2 subsys=ipam
level=info msg="The router IP (10.24.0.55) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr="<nil>" subsys=node

As the code shows, in crd mode, cidr comes from IPv4NativeRoutingCIDR():

https://github.com/cilium/cilium/pull/16672/files#diff-2ed5415f522d0b72066d62527fb40dbea9bf745f45d5e2d47f7715680f165fcaR278

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMCRD || ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	cidr = node.GetIPv4AllocRange()
}
```

But IPv4NativeRoutingCIDR() is only set for IPAMENI/IPAMAzure/IPAMAlibabaCloud, not for IPAMCRD. So in my scenario, cidr is nil:

```go
if n.conf.IPAMMode() == ipamOption.IPAMENI || n.conf.IPAMMode() == ipamOption.IPAMAzure || n.conf.IPAMMode() == ipamOption.IPAMAlibabaCloud {
	if vpcCIDR := deriveVpcCIDR(n.ownNode); vpcCIDR != nil {
		if nativeCIDR := n.conf.IPv4NativeRoutingCIDR(); nativeCIDR != nil {
			logFields := logrus.Fields{
				"vpc-cidr":                   vpcCIDR.String(),
				option.IPv4NativeRoutingCIDR: nativeCIDR.String(),
			}

			ranges4, _ := ip.CoalesceCIDRs([]*net.IPNet{nativeCIDR.IPNet, vpcCIDR.IPNet})
			if len(ranges4) != 1 {
				log.WithFields(logFields).Fatal("Native routing CIDR does not contain VPC CIDR.")
			} else {
				log.WithFields(logFields).Info("Ignoring autodetected VPC CIDR.")
			}
		} else {
			log.WithFields(logrus.Fields{
				"vpc-cidr": vpcCIDR.String(),
			}).Info("Using autodetected VPC CIDR.")
			n.conf.SetIPv4NativeRoutingCIDR(vpcCIDR)
		}
	} else {
		minimumReached = false
	}
}
```

https://github.com/cilium/cilium/blob/v1.10.4/pkg/ipam/crd.go#L273

So maybe the code here should not include the IPAMCRD condition, or maybe my usage is not correct? Any help would be appreciated. For reference, the snippet in question again, with the change I have in mind sketched right after it:

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMCRD || ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	cidr = node.GetIPv4AllocRange()
}
```
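
To make the suggestion concrete, this is only a sketch of the change I have in mind, not a merged fix: drop IPAMCRD from the condition so that plain crd mode falls back to the node's allocation range instead of the possibly-nil native routing CIDR:

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	// Plain "crd" mode never sets IPv4NativeRoutingCIDR(), so use the
	// node's allocation range here as well.
	cidr = node.GetIPv4AllocRange()
}
```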

Labels
area/daemon Impacts operation of the Cilium daemon. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Development

Successfully merging this pull request may close these issues: Cilium assigns additional IP to cilium_host upon restart (#16520)