
[aws-vpc-cni] EKS-preinstalled VPC CNI switched to being deployed by Helm without a way to determine that the deployment is AWS-managed #2763

Closed
AndiDog opened this issue Jan 18, 2024 · 8 comments

AndiDog commented Jan 18, 2024

Describe the bug

I'm describing EKS behavior here. I didn't know where else to file this to bring it to the attention of AWS EKS engineers, so please move the issue if desired.

The AWS VPC CNI is preinstalled by default in a newly created EKS cluster, and currently there is no way to make its installation optional in order to install only a custom CNI (e.g. Cilium, Calico, ...). It now seems that EKS has changed how it deploys the CNI, from a kubectl apply style to a Helm-based installation (including the "managed by" label). This is a problem for Kubernetes Cluster API Provider AWS (issue) because that operator wants to delete the AWS VPC CNI if a custom CNI is desired. It therefore has to determine whether the preinstalled AWS VPC CNI resources are AWS-managed and, if so, delete them. Since the preinstalled resources now look Helm-managed, there is no way to tell whether the chart was deployed by AWS automatically or by a cluster admin.

One solution could be to add a certain label that states "this Helm chart is managed / pre-installed by AWS".
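As a sketch of how such a marker could be consumed by an operator like CAPA (the label key used below is purely hypothetical, not something EKS sets today):

# Hypothetical label key "eks.amazonaws.com/preinstalled"; EKS does not set this today.
marker=$(kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.metadata.labels.eks\.amazonaws\.com/preinstalled}' 2>/dev/null || true)

if [ "$marker" = "true" ]; then
  # Only delete the CNI that EKS itself pre-installed, never an admin-installed chart.
  kubectl -n kube-system delete daemonset aws-node
fi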

This breaking change in behavior even applies to older EKS Kubernetes versions such as 1.24, and therefore disrupted the described feature.

Steps to reproduce

  • Use Cluster API Provider AWS (CAPA) to create a new EKS cluster with default settings. The behavior should be the same without that operator, e.g. when creating the cluster via the AWS Console, but I didn't test that.
  • Log into the new cluster and run kubectl get ds -n kube-system aws-node -o yaml. The "managed by" Helm label is there (a command that extracts just that label is sketched below the list).
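
For step 2, a minimal command that extracts just that label (assuming the default aws-node DaemonSet in kube-system):

# Prints "Helm" when the preinstalled DaemonSet carries the Helm ownership label.
kubectl -n kube-system get daemonset aws-node -o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by}'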

Environment

  • Chart name: aws-vpc-cni
  • Chart version: As pre-installed for a new EKS cluster
  • Kubernetes version: 1.24, likely also all others
  • Using EKS (yes/no), if so version? – yes
AndiDog added the bug label Jan 18, 2024

jdn5126 (Contributor) commented Jan 24, 2024

Moving to AWS VPC CNI repo

jdn5126 transferred this issue from aws/eks-charts Jan 24, 2024

jdn5126 (Contributor) commented Jan 24, 2024

@AndiDog the default addon versions that are installed on cluster creation use Kubernetes server-side apply, i.e. something like:

helm template aws-vpc-cni --include-crds --namespace kube-system eks/aws-vpc-cni --set originalMatchLabels=true | kubectl apply --server-side --force-conflicts --field-manager eks -f -
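
As a side note, a quick way to see that "eks" field manager on a live cluster (a sketch, assuming the default aws-node DaemonSet) is to dump the server-side apply metadata:

# Lists the field managers (e.g. "eks") recorded in managedFields on the DaemonSet.
kubectl -n kube-system get daemonset aws-node -o yaml --show-managed-fields | grep 'manager:'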

It sounds like you are asking for the EKS API to provide the ability to create the cluster without any addons being installed, right?

AndiDog (Author) commented Jan 25, 2024

Having an API option to not pre-install the VPC CNI would indeed be the best, non-hacky solution.

TechnoTaff commented

We found that the only way to fully take control of the VPC CNI deployment via Helm is to delete all the resources (including the CRDs) before the Helm deployment. Otherwise, some fields remain owned by the "eks" field manager.

This is more of a problem when your Helm deployment is a subset of what the default vpc-cni deploys:

  1. The default vpc-cni is deployed with the agent sidecar.
  2. We added the Helm labels and annotations (as the docs suggested) for Helm ownership.
  3. We "upgraded" the default vpc-cni with Helm to 1.16.0, but with "enabled: false" for the agent sidecar.
  4. The sidecar remains, because those fields are still owned by the "eks" field manager.

Literally the only way to guarantee ownership is to delete everything before Helm touches the deployment.
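
A rough sketch of that delete-first approach (the resource and CRD names below are assumptions based on a default install; verify them against your cluster before deleting anything, and note that removing aws-node disrupts pod networking until the chart is installed):

# Assumption: these are the objects created by the default VPC CNI install; verify first with
# "kubectl -n kube-system get ds,cm,sa | grep aws" and "kubectl get crd | grep amazonaws".
kubectl -n kube-system delete daemonset aws-node --ignore-not-found
kubectl -n kube-system delete configmap amazon-vpc-cni --ignore-not-found
kubectl -n kube-system delete serviceaccount aws-node --ignore-not-found
kubectl delete clusterrolebinding aws-node --ignore-not-found
kubectl delete clusterrole aws-node --ignore-not-found
kubectl delete crd eniconfigs.crd.k8s.amazonaws.com --ignore-not-found

# Then install the chart so Helm owns every field from the start
# (assumes the eks repo from https://aws.github.io/eks-charts is already added).
helm install aws-vpc-cni eks/aws-vpc-cni --namespace kube-system --set nodeAgent.enabled=false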

I would also welcome the option to choose which addons (even though the default VPC CNI is not really an addon) get deployed by default.

jdn5126 (Contributor) commented Jan 25, 2024

@AndiDog that support is tracked by aws/containers-roadmap#923. There is a ton of engagement and upvotes on this issue, so I assume the work is in progress, but I am not certain. It would help them prioritize if you upvote this issue as well.

jdn5126 (Contributor) commented Jan 25, 2024

@TechnoTaff typically field management is not a problem, but I tried the following script and it seems to adopt all resources into Helm and remove the network policy agent:

#!/usr/bin/env bash

set -euo pipefail

# Adopt the existing VPC CNI objects into a Helm release named "aws-vpc-cni" by
# setting the release annotations and managed-by label that Helm checks for ownership.
for kind in daemonSet clusterRole clusterRoleBinding serviceAccount; do
  echo "setting annotations and labels on $kind/aws-node"
  kubectl -n kube-system annotate --overwrite $kind aws-node meta.helm.sh/release-name=aws-vpc-cni
  kubectl -n kube-system annotate --overwrite $kind aws-node meta.helm.sh/release-namespace=kube-system
  kubectl -n kube-system label --overwrite $kind aws-node app.kubernetes.io/managed-by=Helm
done

# The configmap has a different name, so adopt it separately.
kubectl -n kube-system annotate --overwrite configmap amazon-vpc-cni meta.helm.sh/release-name=aws-vpc-cni
kubectl -n kube-system annotate --overwrite configmap amazon-vpc-cni meta.helm.sh/release-namespace=kube-system
kubectl -n kube-system label --overwrite configmap amazon-vpc-cni app.kubernetes.io/managed-by=Helm

# Re-render the chart with the network policy agent disabled and server-side apply it
# with "Helm" as the field manager, taking field ownership away from "eks".
helm template aws-vpc-cni --include-crds --namespace kube-system eks/aws-vpc-cni --set originalMatchLabels=true,nodeAgent.enabled=false | kubectl apply --server-side --force-conflicts --field-manager Helm -f -

Edit: hmm... I actually had to run the script twice to get it to properly remove the network policy agent. This seems like a strange behavior of kubectl server-side apply.
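
For anyone verifying the result, a sketch of a quick check (assuming the network policy agent runs as an extra container, typically named aws-eks-nodeagent, in the aws-node DaemonSet):

# Prints the container names in the aws-node pod template; the network policy agent
# container should be gone once the chart owns those fields.
kubectl -n kube-system get daemonset aws-node -o jsonpath='{.spec.template.spec.containers[*].name}'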

jdn5126 (Contributor) commented Jan 30, 2024

Closing this, as the true fix (outside of the script to adopt the resources into Helm) will be tracked by aws/containers-roadmap#923

jdn5126 closed this as completed Jan 30, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
