Issues with EKS module bootstrap and Cilium with newly released EKS AL2023 #1678

ajvn opened this issue Feb 20, 2024 · 17 comments

@ajvn

ajvn commented Feb 20, 2024

What happened:
Hello folks, we've been testing the recently released EKS-optimized AL2023 AMI and have some feedback to provide.
While I'm aware that these might not strictly be issues with the AMI itself, I believe a large enough part of your user base has a similar setup that it's probably beneficial to let you know about the issues we've encountered.

  1. Issue with the user data part of the launch configuration when using the EKS Terraform module v19.20.0

When provided as part of pre_bootstrap_user_data with enable_bootstrap_user_data set to false, nodeadm does not seem to be able to parse it, and it fails during boot with Could not find NodeConfig within UserData.
This can be sorted out by using a pure YAML config file provided via user_data_template_path, which looks something like this:

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: ${cluster_name}
    apiServerEndpoint: ${cluster_endpoint}
    certificateAuthority: ${cluster_auth_base64}
    cidr: ${cluster_service_ipv4_cidr}
  2. Cilium agent fails to start due to two default routes
    There are two default routes, one via each of the ens5 and ens6 interfaces, and Cilium fails with something like this:
level=error msg="Start hook failed" error="daemon creation failed: error while initializing daemon: failed while reinitializing datapath: unable to install ip rule for ENI multi-node NodePort: failed to find interface with default route: Found multiple default routes with the same priority: {Ifindex: 2 Dst: 0.0.0.0/0 Src: <x.x.x.x> Gw: <x.x.x.x> Flags: [] Table: 254 Realm: 0} vs {Ifindex: 3 Dst: 0.0.0.0/0 Src: <x.x.x.x> Gw: <x.x.x.x> Flags: [] Table: 254 Realm: 0}" function="cmd.newDaemonPromise.func1 (daemon_main.go:1663)" subsys=hive

When testing a bare-bones AL2023 VDI image locally, there's only one default route, granted the underlying virtualization and hardware are vastly different.

The solution to this is dropping the second default route on the ens6 interface:

$ sudo ip route delete default via <x.x.x.x>

After these two conditions are handled, the node can join the cluster, is Ready, and the Cilium agent is running.

What you expected to happen:
As this is a Tech Preview, it's already in a pretty good state.

How to reproduce it (as minimally and precisely as possible):
Use the EKS-optimized AL2023 AMI from the initial announcement for one of the node groups in a cluster where Cilium is used.

Anything else we need to know?:
I don't think so.

Environment:

  • AWS Region: eu-west-1
  • Instance Type(s): m6i.xlarge
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.12
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.26
  • AMI Version: ami-0bec930579ece14e9
  • Kernel (e.g. uname -a): 6.1.75-99.163.amzn2023.x86_64
  • Release information (run cat /etc/eks/release on a node):
BASE_AMI_ID="ami-0334b53123ef7ceb5"
BUILD_TIME="Tue Feb 13 04:50:11 UTC 2024"
BUILD_KERNEL="6.1.75-99.163.amzn2023.x86_64"
ARCH="x86_64"
@bryantbiggs
Contributor

For 1 - I have it on my list to add AL2023 support to the EKS module so that it provides functionality similar to the other OSes we support, but I'm glad you were able to sort it out with the generic user_data_template_path! cc @ptailor1193

@jdn5126
Contributor

jdn5126 commented Feb 23, 2024

@ajvn I can take a look at number 2. There should be a default route via the primary ENI, which I assume is ens5 in your case, right? That will already be configured before any CNI/IPAM is launched. Then it sounds like you are saying there are additional default routes via each secondary ENI, which does seem wrong.

Since you are running Cilium, any secondary ENIs would be attached due to EC2 API calls from Cilium. So I am confused about the ordering here. Was the AWS VPC CNI running before you tried to run Cilium?

It is possible that some other entity present in AL2023 is installing these routes, but the AMI should be disabling all udev triggers and systemd-networkd should not be getting involved here.
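For anyone trying to pin this down, a quick way to compare the default routes with the ENIs actually attached to the instance (just a diagnostic sketch; the IMDSv2 calls are an assumption about a standard EC2 setup):

# show every IPv4 default route and the interface it uses
ip -4 route show default
# map kernel interface names to MAC addresses
ip -br link
# list the ENI MACs known to the instance metadata service (IMDSv2)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" "http://169.254.169.254/latest/meta-data/network/interfaces/macs/"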

@ajvn
Author

ajvn commented Feb 24, 2024

I've added a new node group to an existing cluster, which is already using Cilium, to test out AL2023. The other node groups are using Ubuntu's take on EKS, and there's a single default route on those nodes, only for the ens5 interface. Something is adding a route for the ens6 interface on AL2023 and also making it a default route; I don't know what.

@ajvn
Author

ajvn commented Feb 29, 2024

Quick update: the Cilium failure seems to happen only if the node initially fails to join the cluster due to some error (e.g. a wrongly parsed nodeadm configuration). Although there are still two default routes, on both ens5 and ens6, Cilium works correctly.
However, if the Cilium pod is deleted and starts again, it fails with the aforementioned message.

This leads me to believe that perhaps the second route is getting added after kubelet is started and the Cilium agent pod comes up. I'm still not sure what's adding it.

Edit:
Another update after a bit more testing: the route seems to be added by the Cilium agent pod. I'm investigating further.
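If it helps anyone reproduce this, one way to confirm where the route comes from (a rough sketch, nothing AL2023-specific assumed) is to watch the routing table live while the agent pod restarts:

# on the node: print routing table changes as they happen
sudo ip monitor route
# then, from another shell, delete the cilium-agent pod scheduled on this node
# and watch whether a second default route gets added as the new pod starts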

@bryantbiggs
Contributor

bryantbiggs commented Mar 1, 2024

point 1 Is now resolved in v20.5.0 of the terraform-aws-eks module FYI

@ajvn
Author

ajvn commented Mar 1, 2024

point 1 Is not resolved in v20.5.0 of the terraform-aws-eks module FYI

No worries, and thank you for tackling it this quickly. We have to address the breaking changes for the v20.x upgrade anyway, and for the time being user_data_template_path works great, especially with this clarification of the expected format. It's easy to chain multiple bootstrap files/scripts.

@bryantbiggs
Contributor

whoops - there was a typo there not -> now 😅

@ajvn
Author

ajvn commented Mar 1, 2024

Ah, alright 😄. I'll let you know once we get around to upgrading our module and testing this functionality out.

@ajvn
Author

ajvn commented Mar 4, 2024

@bryantbiggs Gave it a go; the only "problem" is that if cluster_service_ipv4_cidr is not specified (it wasn't needed when this cluster was created, and I haven't checked whether it's now mandatory), nodeadm-config.service will fail because the cidr key is not present in the configuration file.

If it's specified in the module itself, it will also be added to the user data of the non-AL2023 nodes that are part of that module block, in the form of an exported environment variable that is not used as a bootstrap script argument; I haven't checked whether it's used somewhere else.
user_data_template_path still works and allows us to hardcode the cidr value for that node group only, so for now we'll keep using that.

Doing the same as part of

cloudinit_pre_nodeadm = [{
        content = <<-EOT
        ...
        EOT
        content_type = "application/node.eks.aws"
}]
cloudinit_post_nodeadm = [{
        content = <<-EOT
        ...
        EOT
        content_type = "text/x-shellscript; charset=\"us-ascii\""
}]

should yield the same results, but I still haven't tested it; will do soon.

@bryantbiggs
Contributor

Thank you for that info - I assume you are using self-managed node groups? For managed node groups we'll need to wait for the next provider release, which has the new AMI types #1696 (comment)

In terms of the CIDR, are you saying that nodeadm-config.service is failing to start because it's looking for a cidr value? If you have a repro that I can look at, along with what you are expecting to see, I can take a deeper look.

@ajvn
Author

ajvn commented Mar 5, 2024

We are using managed node groups, but with a custom AMI (for now, using one obtained via the aws ssm ... command provided in #1672).

You can reproduce it like this:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.5.0"
  cluster_name        = local.cluster_name
  cluster_version     = local.cluster_version
  cluster_auth_base64 = local.cluster_auth_base64

  eks_managed_node_group_defaults = {
    # settings such as taints, timeouts, block device mappings etc...
  }
  
  eks_managed_node_groups = {

  # more node groups

    general-m6i-al2023 = {
      ami_id                     = "<id-obtained-by-aws-ssm>"
      create_launch_template     = true
      platform                   = "al2023"
      enable_bootstrap_user_data = true
      desired_size               = 1
      min_size                   = 1
      max_size                   = 2
      instance_types             = ["m6i.xlarge"]
    }
  }
}

So if cluster_service_ipv4_cidr is not specified, the cidr field won't be present in the template, nodeadm-config.service will fail to start because of that, and the node won't be able to join the cluster.
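For what it's worth, the service CIDR that nodeadm wants can also be read back from the cluster with the AWS CLI and then hardcoded into the template; a small sketch, with the cluster name as a placeholder:

# fetch the cluster's service IPv4 CIDR, i.e. the value for "cidr" in NodeConfig
aws eks describe-cluster --name <cluster-name> \
  --query 'cluster.kubernetesNetworkConfig.serviceIpv4Cidr' --output text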

I tried the cloudinit_pre_nodeadm and cloudinit_post_nodeadm options; however, the cloudinit_post_nodeadm section is not being added to the user data and won't get executed, but that's another issue.
Perhaps it's not supported when node groups are configured similarly to those in this example?

@bryantbiggs
Contributor

@cartermckinnon / @ndbaker1 does nodeadm require a value for the cidr field?

@Smana

Smana commented Mar 5, 2024

I got the same behavior; my repository is here. I'm going to roll back this PR because Cilium isn't starting, with the error:

failed to start: daemon creation failed: error while initializing daemon: failed while reinitializing datapath: unable to install ip rule for ENI multi-node NodePort: failed to find interface with default route: Found multiple default routes with the same priority: {Ifindex: 2 Dst: 0.0.0.0/0 Src: 10.0.26.111 Gw: 10.0.16.1 Flags: [] Table: 254 Realm: 0} vs {Ifindex: 16 Dst: 0.0.0.0/0 Src: 10.0.29.146 Gw: 10.0.16.1 Flags: [] Table: 254 Realm: 0}" subsys=daemon

PS: To be precise, the error only appears on Karpenter nodes... I might be missing something there. Digging...

Let me know if you want me to run a few tests :)

@ndbaker1
Member

ndbaker1 commented Mar 5, 2024

@cartermckinnon / @ndbaker1 does nodeadm require a value for the cidr field?

It does; it takes the cluster service CIDR (IPv4/v6).

@ajvn
Author

ajvn commented Mar 7, 2024

After more testing, the issue appears to be that both the default interface and the Cilium-created interface get a default route with the same priority.

Because AL2023 uses systemd-networkd for network configuration, RouteMetric= in the [DHCP] section defaults to 1024, meaning the same metric (priority) is applied to all of the interfaces.
This is problematic for Cilium, hence it complains, and it also explains why dropping the route created by Cilium also fixes the issue.

There's a configuration file that seems to apply to all of the interfaces, /usr/lib/systemd/network/80-ec2.network, so changing the priority there wouldn't be helpful in this case.

The way we worked around this is by adding the following to the launch template:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: ${cluster_name}
    apiServerEndpoint: ${cluster_endpoint}
    certificateAuthority: ${cluster_auth_base64}
    cidr: ${cluster_service_ipv4_cidr}

--BOUNDARY
Content-Type: text/x-shellscript;

# more bootstrap content

--BOUNDARY
Content-Type: text/x-shellscript;

#!/usr/bin/env bash
cat > /etc/systemd/network/05-ens5.network << EOF
[Match]
Name=ens5

[Network]
DHCP=yes

[DHCP]
RouteMetric=1000
EOF

systemctl restart systemd-networkd.service

--BOUNDARY--

You might want to adjust [DHCP] to [DHCPv4] or [DHCPv6], but in our environment it works fine with just [DHCP].

With the default values, after Cilium creates and attaches the new interface, you'd see something like this:

$ sudo ip route
default via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1024
default via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
x.x.x.x via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1024
y.y.y.y via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
...

And then Cilium would report the original issue and go into a crash loop.
With the adjustment to the default interface, which I posted above, you'll get something like this:

$ sudo ip route
default via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1000
default via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
x.x.x.x via x.x.x.x dev ens5 proto dhcp src x.x.x.x metric 1000
y.y.y.y via y.y.y.y dev ens6 proto dhcp src y.y.y.y metric 1024
...

Cilium is happy, and everything works with all of the benefits of AL2023.

In our environment, ens5 is the default interface, and ens6 is the one created and attached by the Cilium operator.

The proper solution to this would be either on the AWS side, where the default (user-attached) interface(s) would get a lower metric than the default 1024, or on the Cilium side, with a configuration option to change the route metric of the Cilium-created interface.

@fstr

fstr commented Mar 20, 2024

@ajvn Thank you, this also fixed it for me.

Note that Cilium docs about AWS ENI config also mention a systemd-networkd config in https://docs.cilium.io/en/latest/network/concepts/ipam/eni/#node-configuration

# cat <<EOF >/etc/systemd/network/99-unmanaged-devices.network
[Match]
Name=eth[1-9]*

[Link]
Unmanaged=yes
EOF
# systemctl restart systemd-networkd

I have implemented this as:

    cat <<EOF >/etc/systemd/network/99-unmanaged-devices.network
    [Match]
    Name=!ens5

    [Link]
    Unmanaged=yes
    EOF
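As a quick sanity check (a sketch; it assumes ens5 is the primary interface, as in the examples above), you can confirm which links systemd-networkd now leaves alone:

    # apply the new .network file
    sudo systemctl restart systemd-networkd
    # the SETUP column should show "unmanaged" for the secondary ENIs
    # and "configured" for ens5
    networkctl list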

A second observation: I had a single m7i-flex.xlarge which used enp39s0 as its main network device. This broke all my ens*, ens+ and ens5 patterns. I probably have to change it to en or exclude the m7i-flex for now.

@taliesins

taliesins commented Jun 6, 2024

Something like this?

eth+,ens+,enp+,pod-id-link+

pod-id-link+ is for access to things like AWS's metadata service [fd00:ec2::23]
