
daemon, node: Fix faulty router IP restoration logic #16672

Merged 1 commit into cilium:master on Jun 29, 2021

Conversation

@christarazi (Member) commented Jun 28, 2021

When running in ENI or Alibaba IPAM mode, or any CRD-backed IPAM mode
("crd"), it was very likely upon Cilium restart that `cilium_host`
was assigned an additional IP. Below is a case where Cilium was
restarted 3 times, hence getting 3 additional router IPs:

```
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
    link/ether 66:03:3c:07:8c:47 brd ff:ff:ff:ff:ff:ff
    inet 192.168.35.9/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.34.37/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.57.107/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::6403:3cff:fe07:8c47/64 scope link
       valid_lft forever preferred_lft forever
```

This was because in CRD-backed IPAM modes, we wait until we fully sync
with K8s in order to derive the VPC CIDR, which becomes the pod CIDR on
the node. Since the router IP restoration logic was using a different
pod CIDR during the router IP validation check, it was erroneously
discarding the restored router IP. This was observed with:

```
2021-06-25T13:59:47.816069937Z level=info msg="The router IP (192.168.135.3) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr=10.8.0.0/16 subsys=node
```

This is problematic because the extraneous router IPs could also be
assigned to pods, which would break pod connectivity.

The fix is to break up the router IP restoration process into 2 parts.
The first is to attempt a restoration of the IP from the filesystem
(`node_config.h`). We also fetch the router IPs from the Kubernetes
resources, since they were already retrieved earlier inside
`k8s.WaitForNodeInformation()`.

Then, after the CRD-backed IPAM is initialized and started (i.e.
`(*Daemon).startIPAM()` is called), we attempt the second part:
evaluating which IPs (either from the filesystem or from K8s) should be
set as the router IPs. The IPs from the filesystem take precedence. In
case the node was rebooted, the filesystem will have been wiped, so we
would then rely on the IPs from the K8s resources. At this point in the
daemon initialization, we have the correct CIDR range as the pod CIDR
range to validate the chosen IP (see the sketch after the trailers
below).

Fixes: beb8bde ("k8s, node: Restore router IPs (cilium_host) from
K8s resource")

Signed-off-by: Chris Tarazi <chris@isovalent.com>

Fixes: #16520
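
A minimal, self-contained sketch of the selection logic in part two.
The function and variable names here are illustrative, not Cilium's
actual identifiers:

```go
package main

import (
	"fmt"
	"net"
)

// chooseRouterIP mirrors part two of the restoration flow: the IP
// restored from node_config.h wins over the one from the K8s resource,
// and the winner is only kept if it falls inside the pod CIDR that the
// CRD-backed IPAM derived after syncing with K8s.
func chooseRouterIP(fromFS, fromK8s net.IP, podCIDR *net.IPNet) net.IP {
	ip := fromFS // the filesystem takes precedence
	if ip == nil {
		ip = fromK8s // e.g. the node rebooted and the filesystem was wiped
	}
	if ip == nil || !podCIDR.Contains(ip) {
		return nil // discard; a fresh router IP is allocated instead
	}
	return ip
}

func main() {
	_, podCIDR, _ := net.ParseCIDR("192.168.0.0/16")
	fromK8s := net.ParseIP("192.168.135.3")
	fmt.Println(chooseRouterIP(nil, fromK8s, podCIDR)) // 192.168.135.3
}
```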

@christarazi christarazi added area/daemon Impacts operation of the Cilium daemon. needs-backport/1.10 release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 28, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jun 28, 2021
@christarazi (Member, Author) commented Jun 28, 2021

test-me-please

Edit: net-next hit #12511 and 1.21-4.9 hit #14598

@christarazi christarazi marked this pull request as ready for review June 28, 2021 18:07
@christarazi christarazi requested review from a team and jibi June 28, 2021 18:07
@christarazi christarazi added the kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. label Jun 28, 2021
@christarazi (Member, Author) commented
CI has passed, pushed small change

@aanm (Member) left a comment


LGTM

daemon/cmd/daemon.go (review thread resolved)
@christarazi christarazi removed the request for review from jibi June 29, 2021 18:42
@christarazi christarazi added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. needs-backport/1.9 labels Jun 29, 2021
@aanm (Member) commented Jun 29, 2021

Merging this PR to unblock the 1.10.2 release

@aanm aanm merged commit ff63b07 into cilium:master Jun 29, 2021
@christarazi christarazi deleted the pr/christarazi/fix-16520 branch June 29, 2021 21:06
@Purushotham233 commented Aug 27, 2021

@christarazi, we are using 1.9.4, and this issue is NOT observed on cilium-agent restarts.

Did this issue exist in all 1.9.x versions?

@christarazi (Member, Author) commented
@Purushotham233 No, because this PR fixes #16307, which was released in v1.10.x, not v1.9.x. With respect to the aforementioned PR, everything should already be fine on v1.9.

@houminz commented Aug 11, 2022

Hi @christarazi, I came across the same problem with Cilium version v1.10.4.

In my case, IPAM is in crd mode. Upon Cilium restart, it was very likely that `cilium_host` was assigned an additional IP.

For example, the pod CIDR 10.24.0.0/26 is allocated for one node, and originally the `cilium_host` IP is 10.24.0.55:

```yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    io.cilium.network.ipv4-cilium-host: 10.24.0.24
    io.cilium.network.ipv4-health-ip: 10.24.0.20
    io.cilium.network.ipv4-pod-cidr: 10.24.0.0/26
```

After I rebuilt the Cilium pod, `cilium_host` was assigned an additional IP:

```
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c2:35:33:f2:ab:00 brd ff:ff:ff:ff:ff:ff
    inet 10.24.0.55/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 10.24.0.24/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::c035:33ff:fef2:ab00/64 scope link
       valid_lft forever preferred_lft forever
```

Reading the code and the Cilium logs, I can see that the two parts of the router IP restoration both get 10.24.0.55, i.e. routerFromK8s and routerFromFs are both 10.24.0.55.

However, in crd mode, cidr is nil, which leads to the `does not belong in the Pod CIDR of the node` log:

level=info msg="Restored router IPs from node information" ipv4=10.24.0.55 ipv6="<nil>" subsys=k8s
level=info msg="k8s mode: Allowing localhost to reach local endpoints" subsys=daemon
level=info msg="Enabling k8s event listener" subsys=k8s-watcher
level=info msg="Removing stale endpoint interfaces" subsys=daemon
level=info msg="Skipping kvstore configuration" subsys=daemon
level=info msg="Restored router address from node_config" file=/var/run/cilium/state/globals/node_config.h ipv4=10.24.0.55 ipv6="<nil>" subsys=node
level=info msg="Initializing node addressing" subsys=daemon
level=info msg="Initializing CRD-based IPAM" subsys=ipam
level=info msg="Subscribed to CiliumNode custom resource" name=10.0.24.205 subsys=ipam
level=info msg="Creating or updating CiliumNode resource" node=10.0.24.205 subsys=nodediscovery
level=info msg="Waiting until all pre-existing resources related to policy have been received" subsys=k8s-watcher
level=info msg="Waiting for CiliumNode custom resource to become available..." name=10.0.24.205 subsys=ipam
level=info msg="All pre-existing resources related to policy have been received; continuing" subsys=k8s-watcher
level=info msg="Successfully synchronized CiliumNode custom resource" name=10.0.24.205 subsys=ipam
level=info msg="All required IPs are available in CRD-backed allocation pool" available=62 name=10.0.24.205 required=2 subsys=ipam
level=info msg="The router IP (10.24.0.55) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr="<nil>" subsys=node

As the code shows, in crd mode, cidr comes from IPv4NativeRoutingCIDR():

https://github.com/cilium/cilium/pull/16672/files#diff-2ed5415f522d0b72066d62527fb40dbea9bf745f45d5e2d47f7715680f165fcaR278

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMCRD || ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	cidr = node.GetIPv4AllocRange()
}
```

But IPv4NativeRoutingCIDR() is only set for IPAMENI/IPAMAzure/IPAMAlibabaCloud, not for IPAMCRD. So in my scenario, cidr is nil:

```go
if n.conf.IPAMMode() == ipamOption.IPAMENI || n.conf.IPAMMode() == ipamOption.IPAMAzure || n.conf.IPAMMode() == ipamOption.IPAMAlibabaCloud {
	if vpcCIDR := deriveVpcCIDR(n.ownNode); vpcCIDR != nil {
		if nativeCIDR := n.conf.IPv4NativeRoutingCIDR(); nativeCIDR != nil {
			logFields := logrus.Fields{
				"vpc-cidr":                   vpcCIDR.String(),
				option.IPv4NativeRoutingCIDR: nativeCIDR.String(),
			}

			ranges4, _ := ip.CoalesceCIDRs([]*net.IPNet{nativeCIDR.IPNet, vpcCIDR.IPNet})
			if len(ranges4) != 1 {
				log.WithFields(logFields).Fatal("Native routing CIDR does not contain VPC CIDR.")
			} else {
				log.WithFields(logFields).Info("Ignoring autodetected VPC CIDR.")
			}
		} else {
			log.WithFields(logrus.Fields{
				"vpc-cidr": vpcCIDR.String(),
			}).Info("Using autodetected VPC CIDR.")
			n.conf.SetIPv4NativeRoutingCIDR(vpcCIDR)
		}
	} else {
		minimumReached = false
	}
}
```

https://github.com/cilium/cilium/blob/v1.10.4/pkg/ipam/crd.go#L273

So maybe the code here should not include the IPAMCRD condition, or maybe my usage is not correct? Any help would be appreciated. For reference, the snippet in question again, with the change I have in mind sketched right after it:

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMCRD || ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	cidr = node.GetIPv4AllocRange()
}
```
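
To make the suggestion concrete, this is only a sketch of the change I have in mind, not a merged fix: drop IPAMCRD from the condition so that plain crd mode falls back to the node's allocation range instead of the possibly-nil native routing CIDR:

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	// Plain "crd" mode never sets IPv4NativeRoutingCIDR(), so use the
	// node's allocation range here as well.
	cidr = node.GetIPv4AllocRange()
}
```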

Labels
area/daemon Impacts operation of the Cilium daemon. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Development

Successfully merging this pull request may close these issues: Cilium assigns additional IP to cilium_host upon restart (#16520)