EOF Error from AWS api while validating cluster which was in running state #16548
Comments
@teocrispy91 Could you share why you are using kOps v1.15.2, which is 4-5 years old, when creating new clusters?
@hakman The cluster was created 4 years ago and has been running without many problems.
The title says "while validating new cluster" 😄
@hakman Sorry for the typo, I have edited the title.
No worries, the suggestion still stands: you need to look at the logs on the master nodes.
@hakman Since I am new to kOps, will a restart of the master node cause any issues? Also, are you talking about the API server cert?
I don't think that restarting the master node will do any damage, but it probably will not help much either.
@hakman I just logged into my master node, and running kubectl get ns or kubectl get pods shows that the connection to the server localhost:8080 was refused. In the netstat output I can see that neither 443 nor 8080 is open on my master node; could that be the cause? In the docker logs I can see my api-server pod restarting and going into the Exited state continuously. This is some of the log I can see inside the api-server pod. I have checked the certs and they are valid.
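To double-check the netstat observation above, something like the following sketch could be run on the master node. The helper name is_listening is made up for illustration; it only inspects text piped into it and assumes the usual netstat -tln or ss -tln column layout.

```shell
# is_listening PORT (hypothetical helper, not part of kOps or Kubernetes)
# Reads `netstat -tln` or `ss -tln` output on stdin and succeeds if some
# socket is in LISTEN state on the given local port.
# netstat prints the address before the state, ss prints the state first,
# so both orders are accepted.
is_listening() {
  grep -Eq "(:$1[[:space:]].*LISTEN|LISTEN.*:$1[[:space:]])"
}

# Usage on the master node (ports 443 and 8080 are the ones from this thread):
#   for p in 443 8080; do
#     if netstat -tln | is_listening "$p"; then echo "$p open"; else echo "$p closed"; fi
#   done
```

If both ports report closed while the api-server container keeps exiting, that matches what is described above: the API server is not staying up long enough to listen.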
Most likely the etcd certs expired and the API server cannot connect to etcd anymore.
@hakman But the etcd container seems to be running; would it run if the cert had expired? When I run the command below, the output shows March 28th, 2024. But the image version seems to be kopeio/etcd-manager:3.0.20200429, which is higher than the one mentioned. find /mnt/ -type f -name me.crt -print -exec openssl x509 -enddate -noout -in {} \;
Seems so, but you have to do rolling updates on the cluster from time to time.
When I ran find /mnt/ -type f -name me.crt -print -exec openssl x509 -enddate -noout -in {} \; I could see that the certs have expired. So how can I renew them, and what would be the next steps?
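For anyone repeating this check, here is a rough sketch that wraps the same find/openssl idea and says explicitly which certs have expired, instead of leaving the dates to be compared by eye. The function name report_cert_expiry is made up for illustration; /mnt and me.crt are the values used in this thread, and openssl x509 -checkend 0 is used to test whether a cert is already past its end date.

```shell
# report_cert_expiry [DIR] [NAME] (hypothetical helper)
# Finds certificate files called NAME under DIR and prints one line per file:
# OK, EXPIRED, or UNREADABLE (openssl could not parse it).
# Returns non-zero if any file is expired or unreadable.
report_cert_expiry() {
  dir="${1:-/mnt}"      # default taken from the command in this thread
  name="${2:-me.crt}"   # etcd-manager cert file name from this thread
  find "$dir" -type f -name "$name" -exec sh -c '
    status=0
    for cert do
      # Prints "notAfter=<date>" for a parseable certificate
      end=$(openssl x509 -enddate -noout -in "$cert" 2>/dev/null)
      if [ -z "$end" ]; then
        echo "UNREADABLE $cert"
        status=1
      elif openssl x509 -checkend 0 -noout -in "$cert" >/dev/null 2>&1; then
        # -checkend 0 exits 0 when the cert has not expired yet
        echo "OK $cert ($end)"
      else
        echo "EXPIRED $cert ($end)"
        status=1
      fi
    done
    exit "$status"
  ' sh {} +
}

# Usage on the master node:
#   report_cert_expiry /mnt me.crt
```

Any EXPIRED line for me.crt would point to the etcd cert expiry suspected earlier in the thread.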
This may work:
@hakman This will recreate the master node, right? It doesn't upgrade the cluster? Also, where do I need to run this from? I doubt the kops command will work on the master.
You need to run it from your computer that has admin permissions on the AWS account that hosts the cluster, using the kOps v1.15.2 binary. It will destroy and re-create the master.
@hakman So I terminate the control plane EC2 instance and it will be recreated automatically by the Auto Scaling group, right?
yes
@hakman Thanks a ton. After running the command you mentioned, the cluster seems to be up now. Also, in kube-system my aws-iam-authenticator pod is in ImagePullBackOff (do I need to update it to the latest image?), and the metrics pod is in CrashLoopBackOff. Any idea why?
We have a kOps cluster on version 1.15.2, and everything was working fine until I did a helm upgrade of a deployment in one of the namespaces in the cluster. After that I can't run kubectl commands; when I run them, I get "unable to connect to server EOF". I also have a Kubernetes dashboard hosted at example.com/dashboard, and that page is showing a 502 nginx error. When I checked the ELB in AWS, it shows as out of service, but my master node is running. Since we are not able to connect to the cluster, we couldn't identify the issue.
When I run kops validate cluster, I get the error below.
unexpected error during validation: error listing nodes: Get https://MY_LOAD_BALANCER_DNS_NAME.us-west-2.elb.amazonaws.com/api/v1/nodes: EOF
with MY_LOAD_BALANCER_DNS_NAME replaced by the value under the "DNS name" field on the AWS console
Also, I can still browse my applications hosted in the kOps cluster, so I am not sure whether the apiserver is down or there is some issue with the master.
It would be a great help if someone could help with this.