
ipam: Fix invalid PodCIDR in CiliumNode in ENI/Azure/MultiPool mode #26663

Merged · 2 commits · Aug 29, 2023
28 changes: 18 additions & 10 deletions pkg/datapath/linux/node.go
@@ -1081,26 +1081,29 @@ func (n *linuxNodeHandler) enableIPsecIPv4(newNode *nodeTypes.Node, nodeID uint1
var spi uint8
var errs error

- if !n.nodeConfig.EnableIPv4 || newNode.IPv4AllocCIDR == nil {
+ if !n.nodeConfig.EnableIPv4 || (newNode.IPv4AllocCIDR == nil && !n.subnetEncryption()) {
return nil
}

- new4Net := newNode.IPv4AllocCIDR.IPNet
wildcardIP := net.ParseIP(wildcardIPv4)
wildcardCIDR := &net.IPNet{IP: wildcardIP, Mask: net.IPv4Mask(0, 0, 0, 0)}

err := ipsec.IPsecDefaultDropPolicy(false)
errs = errors.Join(errs, upsertIPsecLog(err, "default-drop IPv4", wildcardCIDR, wildcardCIDR, spi))

if newNode.IsLocal() {
- errs = errors.Join(errs, n.replaceNodeIPSecInRoute(new4Net))
Member:

IIUC, on EKS & AKS, we would install a route using the internal auto-generated IPv4/IPv6AllocRange. Do we have an idea of the potential consequences of that? The second commit suggests that the auto-generated CIDR could clash with a real CIDR, although that's unlikely.

Member Author:

So yes, before this PR we installed a route for the auto-generated pod CIDR. We used an auto-generated pod CIDR because that's Cilium's ancient fallback if no pod CIDR is configured. That auto-generated pod CIDR is not used anywhere (see also #27018) and we're planning to eventually remove it, since it causes confusion for both engineers and users.

Particularly in EKS and AKS, IPs are not allocated from any pod CIDR (be it auto-generated or manually configured), so using a node's pod CIDR in AKS or EKS makes assumptions about the IP range that do not hold. By installing a route for each node's (auto-generated) pod CIDR (as we did before this PR), we routed all traffic belonging to those auto-generated CIDRs to the encrypt device.

I'm not sure what the consequences of this were. We have not received bug reports about it, I think mostly because the auto-generated CIDR is by design unlikely to clash with a real CIDR. And since the encrypt device is probably also the default egress device, the route might not have caused any issues for CIDRs that were routable on the default egress device anyway.

In any case, my primary motivation for this PR is to clean up unconditional uses of "pod CIDR" in the agent code, because such code will not be compatible with pod-CIDR-less IPAM modes such as EKS/AKS or Multi-Pool. IPsec already has "subnet encryption" as a pod CIDR replacement, but for some reason it still unconditionally used the pod CIDR, which this PR attempts to fix.
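For context, the subnetEncryption() helper is not part of this diff. A minimal sketch of the gate it provides, assuming subnet encryption counts as active exactly when the user configured explicit pod subnets to encrypt (function and parameter names here are illustrative, not the agent's actual code):

```go
package datapath

import "net"

// subnetEncryptionEnabled sketches the gate used in the diff above:
// subnet encryption replaces per-node pod CIDRs, and is assumed active
// when explicit pod subnets were configured for IPsec.
func subnetEncryptionEnabled(ipv4PodSubnets, ipv6PodSubnets []*net.IPNet) bool {
	return len(ipv4PodSubnets) > 0 || len(ipv6PodSubnets) > 0
}
```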

Member:

I did a quick test on EKS to simulate the impact of a clash between the auto-generated CIDR and the pod subnet. The pod subnet was 192.168.0.0/16, and I manually installed a route like the one removed in this PR, but with CIDR 192.168.128.0/17.

That caused some existing connections to be interrupted and prevented me from running the connectivity tests. I didn't debug further. There were no drops reported by Cilium and no XFRM errors.

I have no idea how unlikely it is for the CIDRs to clash, but maybe we need to add a release note (something like the one below, or can we make the "potential" more precise?) and backport to just v1.14?

Fix potential cross-node connectivity issue when IPsec is enabled with ENI or Azure IPAM modes.
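To reproduce a clash like the one in this test, a rough sketch using the vishvananda/netlink package might look as follows. The device name and the omission of a dedicated route table are assumptions; any route that overlaps the pod subnet and points at the encrypt device should trigger similar behavior:

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// Assumption: "cilium_host" stands in for the encrypt device the
	// removed in-route pointed at; adjust for your environment.
	link, err := netlink.LinkByName("cilium_host")
	if err != nil {
		log.Fatal(err)
	}
	// Overlaps the 192.168.0.0/16 pod subnet, mirroring the test above.
	_, dst, err := net.ParseCIDR("192.168.128.0/17")
	if err != nil {
		log.Fatal(err)
	}
	if err := netlink.RouteReplace(&netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       dst,
	}); err != nil {
		log.Fatal(err)
	}
}
```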

Member Author:

Awesome, thanks for that test 🚀

> I have no idea how unlikely it is for the CIDRs to clash, but maybe we need to add a release note (something like the one below, or can we make the "potential" more precise?) and backport to just v1.14?

In the end, it depends on user configuration. The auto-generated CIDR is of the form "10.x.0.0/16", where x is derived from the node's IP. So I think clashes can happen, but I'm not aware of any bug reports, probably because most users are using IPs in the range of "10.0.0.0/16".

I think we can absolutely backport this to v1.14. I removed the backport label because there have not been bug reports and this has likely been present for years, but given that the risk should be low, it does fit our backporting policy.
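To make the shape of that fallback concrete, here is a small illustrative sketch; which octet of the node IP feeds x is an assumption here, not the agent's actual derivation:

```go
package main

import (
	"fmt"
	"net"
)

// fallbackAllocCIDR derives an auto-generated "10.x.0.0/16" allocation
// range from a node IP, as described above. Using the last octet for x
// is an assumption for illustration only.
func fallbackAllocCIDR(nodeIP net.IP) (*net.IPNet, error) {
	ip4 := nodeIP.To4()
	if ip4 == nil {
		return nil, fmt.Errorf("not an IPv4 address: %s", nodeIP)
	}
	_, cidr, err := net.ParseCIDR(fmt.Sprintf("10.%d.0.0/16", ip4[3]))
	return cidr, err
}

func main() {
	cidr, _ := fallbackAllocCIDR(net.ParseIP("192.168.1.42"))
	// Prints 10.42.0.0/16; a clash occurs only if real workloads use that range.
	fmt.Println(cidr)
}
```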


localIP := newNode.GetCiliumInternalIP(false)
if localIP == nil {
return errs
}

if n.subnetEncryption() {
+ // FIXME: Remove the following four lines in Cilium v1.16
+ if localCIDR := n.nodeAddressing.IPv4().AllocationCIDR(); localCIDR != nil {
+ // This removes a bogus route that Cilium installed prior to v1.15
+ _ = route.Delete(n.createNodeIPSecInRoute(localCIDR.IPNet))
+ }

localNodeInternalIP, err := getV4LinkLocalIP()
if err != nil {
log.WithError(err).Error("Failed to get local IPv4 for IPsec configuration")
@@ -1126,6 +1129,7 @@ func (n *linuxNodeHandler) enableIPsecIPv4(newNode *nodeTypes.Node, nodeID uint1
}

localCIDR := n.nodeAddressing.IPv4().AllocationCIDR().IPNet
+ errs = errors.Join(errs, n.replaceNodeIPSecInRoute(localCIDR))
spi, err = ipsec.UpsertIPsecEndpoint(localCIDR, wildcardCIDR, localIP, wildcardIP, 0, ipsec.IPSecDirIn, false)
errs = errors.Join(errs, upsertIPsecLog(err, "in IPv4", localCIDR, wildcardCIDR, spi))
}
@@ -1154,7 +1158,7 @@ func (n *linuxNodeHandler) enableIPsecIPv4(newNode *nodeTypes.Node, nodeID uint1
}
} else {
remoteCIDR := newNode.IPv4AllocCIDR.IPNet
- n.replaceNodeIPSecOutRoute(new4Net)
+ n.replaceNodeIPSecOutRoute(remoteCIDR)
spi, err = ipsec.UpsertIPsecEndpoint(wildcardCIDR, remoteCIDR, localIP, remoteIP, nodeID, ipsec.IPSecDirOut, false)
errs = errors.Join(errs, upsertIPsecLog(err, "out IPv4", wildcardCIDR, remoteCIDR, spi))
}
@@ -1166,11 +1170,10 @@ func (n *linuxNodeHandler) enableIPsecIPv6(newNode *nodeTypes.Node, nodeID uint1
var errs error
var spi uint8

- if !n.nodeConfig.EnableIPv6 || newNode.IPv6AllocCIDR == nil {
+ if !n.nodeConfig.EnableIPv6 || (newNode.IPv6AllocCIDR == nil && !n.subnetEncryption()) {
return nil
}

- new6Net := newNode.IPv6AllocCIDR.IPNet
wildcardIP := net.ParseIP(wildcardIPv6)
wildcardCIDR := &net.IPNet{IP: wildcardIP, Mask: net.CIDRMask(0, 128)}

@@ -1181,14 +1184,18 @@ func (n *linuxNodeHandler) enableIPsecIPv6(newNode *nodeTypes.Node, nodeID uint1
errs = errors.Join(errs, upsertIPsecLog(err, "default-drop IPv6", wildcardCIDR, wildcardCIDR, spi))

if newNode.IsLocal() {
- errs = errors.Join(errs, n.replaceNodeIPSecInRoute(new6Net))

localIP := newNode.GetCiliumInternalIP(true)
if localIP == nil {
return errs
}

if n.subnetEncryption() {
+ // FIXME: Remove the following four lines in Cilium v1.16
+ if localCIDR := n.nodeAddressing.IPv6().AllocationCIDR(); localCIDR != nil {
+ // This removes a bogus route that Cilium installed prior to v1.15
+ _ = route.Delete(n.createNodeIPSecInRoute(localCIDR.IPNet))
+ }

localNodeInternalIP, err := getV6LinkLocalIP()
if err != nil {
log.WithError(err).Error("Failed to get local IPv6 for IPsec configuration")
@@ -1204,6 +1211,7 @@ func (n *linuxNodeHandler) enableIPsecIPv6(newNode *nodeTypes.Node, nodeID uint1
}
} else {
localCIDR := n.nodeAddressing.IPv6().AllocationCIDR().IPNet
+ errs = errors.Join(errs, n.replaceNodeIPSecInRoute(localCIDR))
spi, err = ipsec.UpsertIPsecEndpoint(localCIDR, wildcardCIDR, localIP, wildcardIP, 0, ipsec.IPSecDirIn, false)
errs = errors.Join(errs, upsertIPsecLog(err, "in IPv6", localCIDR, wildcardCIDR, spi))
}
@@ -1233,7 +1241,7 @@ func (n *linuxNodeHandler) enableIPsecIPv6(newNode *nodeTypes.Node, nodeID uint1
}
} else {
remoteCIDR := newNode.IPv6AllocCIDR.IPNet
- n.replaceNodeIPSecOutRoute(new6Net)
+ n.replaceNodeIPSecOutRoute(remoteCIDR)
spi, err := ipsec.UpsertIPsecEndpoint(wildcardCIDR, remoteCIDR, localIP, remoteIP, nodeID, ipsec.IPSecDirOut, false)
errs = errors.Join(errs, upsertIPsecLog(err, "out IPv6", wildcardCIDR, remoteCIDR, spi))
}
21 changes: 10 additions & 11 deletions pkg/nodediscovery/nodediscovery.go
@@ -507,17 +507,16 @@ func (n *NodeDiscovery) mutateNodeResource(nodeResource *ciliumv2.CiliumNode) er
}
pchaigno marked this conversation as resolved.
}

- switch option.Config.IPAM {
- case ipamOption.IPAMClusterPool, ipamOption.IPAMClusterPoolV2:
- // We want to keep the podCIDRs untouched in these IPAM modes because
- // the operator will verify if it can assign such podCIDRs.
- // If the user was running in non-IPAM Operator mode and then switched
- // to IPAM Operator, then it is possible that the previous cluster CIDR
- // from the old IPAM mode differs from the current cluster CIDR set in
- // the operator.
- // There is a chance that the operator won't be able to allocate these
- // podCIDRs, resulting in an error in the CiliumNode status.
- default:
+ if option.Config.IPAM == ipamOption.IPAMKubernetes {
+ // We only want to copy the PodCIDR from the Kubernetes Node resource to
+ // the CiliumNode resource in IPAM Kubernetes mode. In other PodCIDR
+ // based IPAM modes (such as ClusterPool or MultiPool), the operator
+ // will set the PodCIDRs of the CiliumNode and those might be different
+ // from the ones assigned by Kubernetes.
+ // For non-podCIDR based IPAM modes (e.g. ENI, Azure, AlibabaCloud), there
+ // is no such thing as a podCIDR to begin with. In those cases, the
+ // IPv4/IPv6AllocRange is auto-generated and otherwise unused, so it does not
+ // make sense to copy it into the CiliumNode either.
nodeResource.Spec.IPAM.PodCIDRs = []string{}
if cidr := node.GetIPv4AllocRange(); cidr != nil {
nodeResource.Spec.IPAM.PodCIDRs = append(nodeResource.Spec.IPAM.PodCIDRs, cidr.String())