kops v1.26 upgrade fails due to serviceAccountIssuer changes #16488
Comments
@elliotdobson You definitely need to specify …
Thanks for your response @hakman. We decided to move forward with the change to the serviceAccountIssuer. Based on this comment we were able to come up with the following procedure that allowed us to migrate the Service Account Issuer (SAI) non-disruptively on two clusters:
hooks:
  - name: modify-kube-api-manifest
    before:
      - kubelet.service
    manifest: |
      User=root
      Type=oneshot
      ExecStart=/bin/bash -c "until [ -f /etc/kubernetes/manifests/kube-apiserver.manifest ];do sleep 5;done;sed -i '/- --service-account-issuer=https:\/\/api.internal.cluster.domain/i\ \ \ \ - --service-account-issuer=https:\/\/master.cluster.domain' /etc/kubernetes/manifests/kube-apiserver.manifest"
hooks:
  - name: modify-kube-api-manifest
    before:
      - kubelet.service
    manifest: |
      User=root
      Type=oneshot
      ExecStart=/bin/bash -c "until [ -f /etc/kubernetes/manifests/kube-apiserver.manifest ];do sleep 5;done;sed -i '/- --service-account-issuer=https:\/\/api.internal.cluster.domain/a\ \ \ \ - --service-account-issuer=https:\/\/master.cluster.domain' /etc/kubernetes/manifests/kube-apiserver.manifest"
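To make the two hooks easier to tell apart: kube-apiserver signs new tokens with the first --service-account-issuer flag and accepts tokens from every listed issuer, so the resulting flag ordering in kube-apiserver.manifest is roughly as follows (assuming https://master.cluster.domain is the old issuer derived from masterInternalName and https://api.internal.cluster.domain is the new default):

# After the first hook (sed .../i\): the old issuer stays first and keeps signing,
# while tokens from the new issuer are also accepted.
    - --service-account-issuer=https://master.cluster.domain
    - --service-account-issuer=https://api.internal.cluster.domain

# After the second hook (sed .../a\): the new issuer moves first and starts signing,
# while tokens from the old issuer are still accepted until they rotate out.
    - --service-account-issuer=https://api.internal.cluster.domain
    - --service-account-issuer=https://master.cluster.domain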
Care needs to be taken around the … I think the same procedure could be used to enable IRSA non-disruptively (which we will try in the future). Do you think it is worth adding this to the kops documentation and/or updating the kops cluster spec to allow multiple …?
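For reference, the issuer fields in question live under kubeAPIServer in the cluster spec, roughly like this (placeholder domain; field placement assumed from the kops API):

spec:
  kubeAPIServer:
    serviceAccountIssuer: https://api.internal.cluster.domain
    serviceAccountJWKSURI: https://api.internal.cluster.domain/openid/v1/jwks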
Thank you for the detailed guide @elliotdobson. This would be useful to add to the kOps docs.
Where would a procedure like that live in the kOps docs? Under Operations perhaps?
Yes, #16497 is the addition to the cluster spec that would've been required for this migration. Will that feature be back-ported to previous kOps releases?
Yes, under Operations.
Seems pretty simple to back-port. Probably it will be in kOps 1.28+.
Closing this issue as we've worked around it and documented the workaround.
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.26.6 (git-v1.26.6)
or
Client version: 1.25.4 (git-v1.25.4)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Server Version: v1.25.16
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Upgrading from kops v1.25.4 to v1.26.6 (a rough command sketch follows this list):
- masterInternalName is dropped from the kops cluster spec.
- kops update cluster.
- kops rolling-update cluster.
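Roughly, the commands involved (cluster name is a placeholder):

kops update cluster --name my.example.com --yes
kops rolling-update cluster --name my.example.com --yes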
5. What happened after the commands executed?
The cluster fails to validate since calico-node fails to start on the new control-plane instance.
The install-cni init container in the calico-node pod fails with an unauthorized error message. Other pods on the new control-plane node that depend on the k8s-api also have similar unauthorized logs.
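The failing container's logs can be inspected with something like the following (pod name is a placeholder; kops deploys calico-node in kube-system):

kubectl -n kube-system logs calico-node-xxxxx -c install-cni
kubectl -n kube-system describe pod calico-node-xxxxx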
6. What did you expect to happen?
kops to be upgraded from v1.25.4 to v1.26.6 successfully.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
The cluster spec update diff from kops update cluster shows the masterInternalName being removed and changes to the serviceAccountIssuer & serviceAccountJWKSURI fields:
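Roughly, the relevant part of the diff looks like this (a sketch with placeholder domains, assuming the old issuer was derived from masterInternalName; not the exact output):

-  masterInternalName: master.cluster.domain
-  serviceAccountIssuer: https://master.cluster.domain
-  serviceAccountJWKSURI: https://master.cluster.domain/openid/v1/jwks
+  serviceAccountIssuer: https://api.internal.cluster.domain
+  serviceAccountJWKSURI: https://api.internal.cluster.domain/openid/v1/jwks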
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
It seems that in kops v1.26 the ability to specify masterInternalName was removed (#14507), which is why we are noticing this problem during upgrade.
However, I can reproduce the same issue as above by simply removing masterInternalName from our cluster spec, applying it to the cluster, and rolling the first control-plane node (all using kops v1.25.4). This update also produces the same update diff as posted above.
To move forward it seems like changing the serviceAccountIssuer & serviceAccountJWKSURI fields is inevitable, but it is not clear what steps to take in order to roll this change out. Is anyone able to advise?