daemon, node: Fix faulty router IP restoration logic #16672
Conversation
When running in ENI or Alibaba IPAM mode, or any CRD-backed IPAM mode ("crd"), it was very likely that `cilium_host` was assigned an additional IP upon Cilium restart. Below is a case where Cilium was restarted 3 times, hence getting 3 additional router IPs:

```
4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000
    link/ether 66:03:3c:07:8c:47 brd ff:ff:ff:ff:ff:ff
    inet 192.168.35.9/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.34.37/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet 192.168.57.107/32 scope link cilium_host
       valid_lft forever preferred_lft forever
    inet6 fe80::6403:3cff:fe07:8c47/64 scope link
       valid_lft forever preferred_lft forever
```

This was because in CRD-backed IPAM modes, we wait until we have fully synced with K8s in order to derive the VPC CIDR, which becomes the pod CIDR on the node. Since the router IP restoration logic was using a different pod CIDR during the router IP validation check, it was erroneously discarding the IP. This was observed with:

```
2021-06-25T13:59:47.816069937Z level=info msg="The router IP (192.168.135.3) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr=10.8.0.0/16 subsys=node
```

This is problematic because the extraneous router IPs could also be assigned to pods, which would break pod connectivity.

The fix is to break up the router IP restoration process into 2 parts. The first is to attempt a restoration of the IP from the filesystem (`node_config.h`). We also fetch the router IPs from the Kubernetes resources, since they were already retrieved earlier inside k8s.WaitForNodeInformation(). Then, after the CRD-backed IPAM is initialized and started ((*Daemon).startIPAM() is called), we attempt the second part. This includes evaluating which IPs (either from the filesystem or from K8s) should be set as the router IPs. The IPs from the filesystem take precedence. In case the node was rebooted, the filesystem will have been wiped, so then we'd rely on the IPs from the K8s resources. At this point in the daemon initialization, we have the correct CIDR range as the pod CIDR range to validate the chosen IP.

Fixes: #16520
Fixes: beb8bde ("k8s, node: Restore router IPs (`cilium_host`) from K8s resource")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
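For readers skimming the description, here is a minimal Go sketch of the second part's selection-and-validation step. The names (`chooseRouterIP`, `fromFS`, `fromK8s`) are hypothetical; this is not the actual daemon code, only an illustration of "the filesystem IP takes precedence, and a candidate is only accepted once the real pod CIDR is known".

```go
// Minimal sketch (not the actual Cilium code) of the restoration idea
// described above: prefer the router IP read back from the filesystem
// (node_config.h), fall back to the IP found on the K8s resource, and only
// accept a candidate that falls inside the pod CIDR, which in CRD-backed
// IPAM modes is only known after (*Daemon).startIPAM() has run.
package main

import (
	"fmt"
	"net"
)

// chooseRouterIP is a hypothetical helper: fromFS is the IP restored from
// node_config.h, fromK8s the IP from the Kubernetes resource, and podCIDR
// the node's pod CIDR as known after IPAM initialization.
func chooseRouterIP(fromFS, fromK8s net.IP, podCIDR *net.IPNet) net.IP {
	// The filesystem IP takes precedence over the one from K8s.
	for _, candidate := range []net.IP{fromFS, fromK8s} {
		if candidate == nil {
			continue
		}
		// Validate against the correct pod CIDR, now that it is known.
		if podCIDR != nil && podCIDR.Contains(candidate) {
			return candidate // reuse this IP instead of allocating a new one
		}
	}
	return nil // nothing restorable; a fresh router IP must be allocated
}

func main() {
	_, podCIDR, _ := net.ParseCIDR("192.168.34.0/23")
	ip := chooseRouterIP(net.ParseIP("192.168.35.9"), net.ParseIP("192.168.34.37"), podCIDR)
	fmt.Println(ip) // 192.168.35.9: the filesystem IP wins because it is inside the pod CIDR
}
```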
CI has passed, pushed a small change.
LGTM
Merging this PR to unblock the 1.10.2 release.
@christarazi, did this issue exist in all 1.9.x versions?
@Purushotham233 No, because this PR fixes #16307, which was released in v1.10.x, not v1.9.x. So with respect to the aforementioned PR, v1.9 is unaffected.
Hi @christarazi, I came across the same problem with Cilium v1.10.4. In my case, IPAM is in crd mode, and upon Cilium restart it was very likely that `cilium_host` was assigned an additional IP. For example, pod CIDR 10.24.0.0/26 is allocated to one node, and originally … After I rebuilt the …

As I read the code and the Cilium log, we can see that both parts of the router IP restoration get … However, in crd mode, `cidr` is nil, which leads to the log …

As the code shows, in crd mode `cidr` comes from:

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMCRD || ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	cidr = node.GetIPv4AllocRange()
}
```

But the native routing CIDR is only set for the ENI, Azure, and AlibabaCloud modes, not for plain crd:

```go
if n.conf.IPAMMode() == ipamOption.IPAMENI || n.conf.IPAMMode() == ipamOption.IPAMAzure || n.conf.IPAMMode() == ipamOption.IPAMAlibabaCloud {
	if vpcCIDR := deriveVpcCIDR(n.ownNode); vpcCIDR != nil {
		if nativeCIDR := n.conf.IPv4NativeRoutingCIDR(); nativeCIDR != nil {
			logFields := logrus.Fields{
				"vpc-cidr":                   vpcCIDR.String(),
				option.IPv4NativeRoutingCIDR: nativeCIDR.String(),
			}

			ranges4, _ := ip.CoalesceCIDRs([]*net.IPNet{nativeCIDR.IPNet, vpcCIDR.IPNet})
			if len(ranges4) != 1 {
				log.WithFields(logFields).Fatal("Native routing CIDR does not contain VPC CIDR.")
			} else {
				log.WithFields(logFields).Info("Ignoring autodetected VPC CIDR.")
			}
		} else {
			log.WithFields(logrus.Fields{
				"vpc-cidr": vpcCIDR.String(),
			}).Info("Using autodetected VPC CIDR.")
			n.conf.SetIPv4NativeRoutingCIDR(vpcCIDR)
		}
	} else {
		minimumReached = false
	}
}
```

https://github.com/cilium/cilium/blob/v1.10.4/pkg/ipam/crd.go#L273

So maybe the code here should not include the IPAMCRD condition, or maybe my usage is incorrect? Any help would be appreciated.

```go
if ipam := option.Config.IPAMMode(); ipam == ipamOption.IPAMCRD || ipam == ipamOption.IPAMENI || ipam == ipamOption.IPAMAlibabaCloud {
	// The native routing CIDR is the pod CIDR in these IPAM modes.
	cidr = option.Config.IPv4NativeRoutingCIDR()
} else {
	cidr = node.GetIPv4AllocRange()
}
```
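To illustrate the nil-CIDR case described in this comment, here is a small Go sketch. It assumes, as the quoted code suggests, that in plain crd mode the validation sees a nil CIDR because `SetIPv4NativeRoutingCIDR` is only called for the ENI, Azure, and AlibabaCloud modes; the function name and the commented-out workaround are hypothetical, not the upstream fix.

```go
// Hypothetical illustration (not actual Cilium code) of the mismatch above:
// with a nil CIDR, a naive containment check discards any restored router IP,
// which matches the behaviour reported for plain "crd" IPAM mode.
package main

import (
	"fmt"
	"net"
)

// validateRestoredIP mirrors the shape of the check discussed above.
func validateRestoredIP(routerIP net.IP, cidr *net.IPNet) bool {
	if cidr == nil {
		// return true // possible workaround: keep the IP when no CIDR is known yet
		return false // shape of the reported problem: a nil CIDR means the IP is discarded
	}
	return cidr.Contains(routerIP)
}

func main() {
	routerIP := net.ParseIP("10.24.0.3")
	_, podCIDR, _ := net.ParseCIDR("10.24.0.0/26")

	fmt.Println(validateRestoredIP(routerIP, podCIDR)) // true: the IP would be kept
	fmt.Println(validateRestoredIP(routerIP, nil))     // false: nil CIDR, the IP is discarded
}
```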