Issues with EKS module bootstrap and Cilium with newly released EKS AL2023 #1678

ajvn opened this issue Feb 20, 2024 · 17 comments

@ajvn

ajvn commented Feb 20, 2024

What happened:
Hello folks, we've been testing the recently released EKS-optimized AL2023 AMI and have some feedback to provide.
While I'm aware that these might not strictly be issues with the AMI itself, I believe a large enough part of your user base has a similar setup that it's probably beneficial to let you know about the issues we've encountered.

  1. Issue with the user data part of the launch configuration when using the EKS Terraform module v19.20.0

When provided as part of pre_bootstrap_user_data with enable_bootstrap_user_data set to false, nodeadm does not seem to be able to parse it, and it fails during boot with Could not find NodeConfig within UserData.
This can be sorted out by using a pure YAML config file provided via user_data_template_path, which looks something like this:

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: ${cluster_name}
    apiServerEndpoint: ${cluster_endpoint}
    certificateAuthority: ${cluster_auth_base64}
    cidr: ${cluster_service_ipv4_cidr}
  2. Cilium agent fails to start due to two default routes
    There are two default routes, one via each of the ens5 and ens6 interfaces, and Cilium fails with something like this:
level=error msg="Start hook failed" error="daemon creation failed: error while initializing daemon: failed while reinitializing datapath: unable to install ip rule for ENI multi-node NodePort: failed to find interface with default route: Found multiple default routes with the same priority: {Ifindex: 2 Dst: 0.0.0.0/0 Src: <x.x.x.x> Gw: <x.x.x.x> Flags: [] Table: 254 Realm: 0} vs {Ifindex: 3 Dst: 0.0.0.0/0 Src: <x.x.x.x> Gw: <x.x.x.x> Flags: [] Table: 254 Realm: 0}" function="cmd.newDaemonPromise.func1 (daemon_main.go:1663)" subsys=hive

When testing a bare-bones AL2023 VDI image locally, there's only one default route, granted the underlying virtualization and hardware are vastly different.

The solution to this is dropping the second default route on the ens6 interface:

$ sudo ip route delete default via <x.x.x.x>

After these two conditions are handled, the node can join the cluster, is Ready, and the Cilium agent is running.

What you expected to happen:
As this is a Tech Preview, it's already in a pretty good state.

How to reproduce it (as minimally and precisely as possible):
Use the EKS-optimized AL2023 AMI from the initial announcement for one of the node groups in a cluster where Cilium is used.

Anything else we need to know?:
I don't think so.

Environment:

  • AWS Region: eu-west-1
  • Instance Type(s): m6i.xlarge
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.12
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.26
  • AMI Version: ami-0bec930579ece14e9
  • Kernel (e.g. uname -a): 6.1.75-99.163.amzn2023.x86_64
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-0334b53123ef7ceb5"
BUILD_TIME="Tue Feb 13 04:50:11 UTC 2024"
BUILD_KERNEL="6.1.75-99.163.amzn2023.x86_64"
ARCH="x86_64"
@bryantbiggs
Contributor

For 1 - I have it on my list to add AL2023 support to the EKS module so that it provides functionality similar to the other OSes we support, but I'm glad you were able to sort it out with the generic user_data_template_path! cc @ptailor1193

@jdn5126
Contributor

jdn5126 commented Feb 23, 2024

@ajvn I can take a look at number 2. There should be a default route via the primary ENI, which I assume is ens5 in your case, right? That will already be configured before any CNI/IPAM is launched. Then it sounds like you are saying there are additional default routes via each secondary ENI, which does seem wrong.

Since you are running Cilium, any secondary ENIs would be attached due to EC2 API calls from Cilium. So I am confused about the ordering here. Was the AWS VPC CNI running before you tried to run Cilium?

It is possible that some other entity present in AL2023 is installing these routes, but the AMI should be disabling all udev triggers and systemd-networkd should not be getting involved here.
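For anyone trying to pin this down, a quick way to compare the default routes with the ENIs actually attached to the instance (just a diagnostic sketch; the IMDSv2 calls are an assumption about a standard EC2 setup):

# show every IPv4 default route and the interface it uses
ip -4 route show default
# map kernel interface names to MAC addresses
ip -br link
# list the ENI MACs known to the instance metadata service (IMDSv2)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "http://169.254.169.254/latest/meta-data/network/interfaces/macs/"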

@ajvn
Author

ajvn commented Feb 24, 2024

I've added a new node group to an existing cluster, which is already using Cilium, to test out AL2023. The other node groups are using Ubuntu's take on EKS, and there's a single default route on those nodes, only for the ens5 interface. Something is adding a route for the ens6 interface on AL2023 and also making it a default route; I don't know what.

@ajvn
Author

ajvn commented Feb 29, 2024

Quick update: the Cilium failure seems to happen only if the node initially fails to join the cluster due to some error (e.g. a wrongly parsed nodeadm configuration). Although there are still two default routes, on both ens5 and ens6, Cilium works correctly.
However, if the Cilium pod is deleted and starts again, it fails with the aforementioned message.

This leads me to believe that perhaps the second route is getting added after kubelet is started and the Cilium agent pod comes up. I'm still not sure what's adding it.

Edit:
Another update after a bit more testing: the route seems to be added by the Cilium agent pod. I'm investigating further.
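If it helps anyone reproduce this, one way to confirm where the route comes from (a rough sketch, nothing AL2023-specific assumed) is to watch the routing table live while the agent pod restarts:

# on the node: print routing table changes as they happen
sudo ip monitor route
# then, from another shell, delete the cilium-agent pod scheduled on this node
# and watch whether a second default route gets added as the new pod starts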

@bryantbiggs
Contributor

bryantbiggs commented Mar 1, 2024

point 1 Is now resolved in v20.5.0 of the terraform-aws-eks module FYI

@ajvn
Author

ajvn commented Mar 1, 2024

point 1 Is not resolved in v20.5.0 of the terraform-aws-eks module FYI

No worries, and thank you for tackling it this quickly. We have to address the breaking changes for the v20.x upgrade anyway, and for the time being user_data_template_path works great, especially with this clarification of the expected format. It's easy to chain multiple bootstrap files/scripts.

@bryantbiggs
Contributor

whoops - there was a typo there not -> now 😅

@ajvn
Author

ajvn commented Mar 1, 2024

Ah, alright 😄. I'll let you know once we get around to upgrading our module and testing this functionality out.

@ajvn
Author

ajvn commented Mar 4, 2024

@bryantbiggs Gave it a go; the only "problem" is that if cluster_service_ipv4_cidr is not specified (it wasn't needed when this cluster was created, and I haven't checked whether it's now mandatory), nodeadm-config.service will fail because the cidr key is not present in the configuration file.

If it's specified in the module itself, it will also be added to the user data of the non-AL2023 nodes that are part of that module block, in the form of an exported environment variable that is not used as a bootstrap script argument; I haven't checked whether it's used somewhere else.
user_data_template_path still works and allows us to hardcode the cidr value for that node group only, so for now we'll keep using that.

Doing the same as part of

cloudinit_pre_nodeadm = [{
        content = <<-EOT
        ...
        EOT
        content_type = "application/node.eks.aws"
}]
cloudinit_post_nodeadm = [{
        content = <<-EOT
        ...
        EOT
        content_type = "text/x-shellscript; charset=\"us-ascii\""
}]

should yield the same results, but I still haven't tested it; will do soon.

@bryantbiggs
Contributor

Thank you for that info - I assume you are using self-managed node groups? For managed node groups we'll need to wait for the next provider release, which has the new AMI types #1696 (comment)

In terms of the CIDR, are you saying that nodeadm-config.service is failing to start because it's looking for a cidr value? If you have a repro that I can look at, along with what you are expecting to see, I can take a deeper look.

@ajvn
Author

ajvn commented Mar 5, 2024

We are using managed node groups, but with a custom AMI (for now, using one obtained via the aws ssm ... command provided in #1672).

You can reproduce it like this:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.5.0"
  cluster_name        = local.cluster_name
  cluster_version     = local.cluster_version
  cluster_auth_base64 = local.cluster_auth_base64

  eks_managed_node_group_defaults = {
    # settings such as taints, timeouts, block device mappings etc...
  }
  
  eks_managed_node_groups = {

  # more node groups

    general-m6i-al2023 = {
      ami_id                     = "<id-obtained-by-aws-ssm>"
      create_launch_template     = true
      platform                   = "al2023"
      enable_bootstrap_user_data = true
      desired_size               = 1
      min_size                   = 1
      max_size                   = 2
      instance_types             = ["m6i.xlarge"]
    }
  }
}

So if cluster_service_ipv4_cidr is not specified, the cidr field won't be present in the template, nodeadm-config.service will fail to start because of that, and the node won't be able to join the cluster.
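For what it's worth, the service CIDR that nodeadm wants can also be read back from the cluster with the AWS CLI and then hardcoded into the template; a small sketch, with the cluster name as a placeholder:

# fetch the cluster's service IPv4 CIDR, i.e. the value for "cidr" in NodeConfig
aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.kubernetesNetworkConfig.serviceIpv4Cidr' --output text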

I tried the cloudinit_pre_nodeadm and cloudinit_post_nodeadm options; however, the cloudinit_post_nodeadm section is not being added to the user data and won't get executed, but that's another issue.
Perhaps it's not supported when node groups are configured similarly to those in this example?

@bryantbiggs
Contributor

@cartermckinnon / @ndbaker1 does nodeadm require a value for the cidr field?

@Smana

Smana commented Mar 5, 2024

I got the same behavior; my repository is here. I'm going to roll back this PR because Cilium isn't starting, with the error:

failed to start: daemon creation failed: error while initializing daemon: failed while reinitializing datapath: unable to install ip rule for ENI multi-node NodePort: failed to find interface with default route: Found multiple default routes with the same priority: {Ifindex: 2 Dst: 0.0.0.0/0 Src: 10.0.26.111 Gw: 10.0.16.1 Flags: [] Table: 254 Realm: 0} vs {Ifindex: 16 Dst: 0.0.0.0/0 Src: 10.0.29.146 Gw: 10.0.16.1 Flags: [] Table: 254 Realm: 0}" subsys=daemon

PS: To be precise, the error only appears on Karpenter nodes... I might be missing something there. Digging...

Let me know if you want me to run a few tests :)

@ndbaker1
Member

ndbaker1 commented Mar 5, 2024

@cartermckinnon / @ndbaker1 does nodeadm require a value for the cidr field?

It does; it takes the cluster service CIDR (IPv4/v6).

@ajvn
Author

ajvn commented Mar 7, 2024

After more testing, the issue appears to be that both the default interface and the Cilium-created interface get a default route with the same priority.

Because AL2023 uses systemd-networkd for network configuration, RouteMetric= in the [DHCP] section defaults to 1024, meaning the same metric (priority) is applied to all of the interfaces.
This is problematic for Cilium, hence it complains, and it also explains why dropping the route created by Cilium also fixes the issue.

There's a configuration file that seems to apply to all of the interfaces, /usr/lib/systemd/network/80-ec2.network, so changing the priority there wouldn't be helpful in this case.

The way we worked around this is by adding the following to the launch template:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: ${cluster_name}
    apiServerEndpoint: ${cluster_endpoint}
    certificateAuthority: ${cluster_auth_base64}
    cidr: ${cluster_service_ipv4_cidr}

--BOUNDARY
Content-Type: text/x-shellscript;

# more bootstrap content

--BOUNDARY
Content-Type: text/x-shellscript;

#!/usr/bin/env bash
cat > /etc/systemd/network/05-ens5.network << EOF
[Match]
Name=ens5

[Network]
DHCP=yes

[DHCP]
RouteMetric=1000
EOF

systemctl restart systemd-networkd.service

--BOUNDARY--

You might want to adjust [DHCP] to [DHCPv4] or [DHCPv6], but in our environment it works fine with just [DHCP].

With the default values, after Cilium creates and attaches the new interface, you'd see something like this:

$ sudo ip route
default via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1024
default via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
x.x.x.x via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1024
y.y.y.y via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
...

And then Cilium would report the original issue and go into a crash loop.
With the adjustment to the default interface, which I posted above, you'll get something like this:

$ sudo ip route
default via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1000
default via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
x.x.x.x via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1000
y.y.y.y via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
...

Cilium is happy, and everything works with all of the benefits of AL2023.

In our environment, ens5 is the default interface, and ens6 is the one created and attached by the Cilium operator.

The proper solution to this would be either on the AWS side, where the default (user-attached) interface(s) would get a lower metric than the default 1024, or on the Cilium side, with a configuration option to change the route metric of the Cilium-created interface.

@fstr

fstr commented Mar 20, 2024

@ajvn Thank you, this also fixed it for me.

Note that Cilium docs about AWS ENI config also mention a systemd-networkd config in https://docs.cilium.io/en/latest/network/concepts/ipam/eni/#node-configuration

# cat <<EOF >/etc/systemd/network/99-unmanaged-devices.network
[Match]
Name=eth[1-9]*

[Link]
Unmanaged=yes
EOF
# systemctl restart systemd-networkd

I have implemented this as:

    cat <<EOF >/etc/systemd/network/99-unmanaged-devices.network
    [Match]
    Name=!ens5

    [Link]
    Unmanaged=yes
    EOF
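As a quick sanity check (a sketch; it assumes ens5 is the primary interface, as in the examples above), you can confirm which links systemd-networkd now leaves alone:

    # apply the new .network file
    sudo systemctl restart systemd-networkd
    # the SETUP column should show "unmanaged" for the secondary ENIs
    # and "configured" for ens5
    networkctl list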

A second observation: I had a single m7i-flex.xlarge which used enp39s0 as its main network device. This broke all my ens*, ens+ and ens5 patterns. I probably have to change it to en or exclude the m7i-flex for now.

@taliesins

taliesins commented Jun 6, 2024

Something like this?

eth+,ens+,enp+,pod-id-link+

pod-id-link+ is for access to things like AWS's metadata service [fd00:ec2::23]
