This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

[kubernetes] k8s cluster starts working after nodes are restarted #123

Closed
kim0 opened this issue Nov 28, 2016 · 10 comments

Comments

@kim0
Contributor

kim0 commented Nov 28, 2016

I created a service principal (SP) with az ad sp create-for-rbac --role contributor --scopes /subscriptions/xxx-yyy-zzz, then I deployed a k8s cluster through the portal UI. After the VMs were up, I ssh'ed into the master node and ran kubectl get nodes, which timed out:

Unable to connect to the server: dial tcp 40.68.165.173:443: i/o timeout

But az login was working fine with my SP account! Confused, I tried restarting the k8s API server using docker restart foo, and suddenly the API server was responding, although all the agent nodes were NotReady:

NAME                    STATUS                     AGE
k8s-agent-a21727d1-0    NotReady                   27s
k8s-agent-a21727d1-1    NotReady                   30s
k8s-agent-a21727d1-2    NotReady                   31s
k8s-master-a21727d1-0   Ready,SchedulingDisabled   27s
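
For anyone trying the same workaround, here is a minimal sketch of finding and restarting the API server container on the master; the grep pattern and <container-id> placeholder are assumptions, since container naming varies by deployment:

# List running containers and look for the API server
# (on hyperkube-based masters it usually runs as a static pod container).
docker ps | grep -i apiserver

# Restart it using the container ID printed above.
docker restart <container-id>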

I rebooted agent-1 from the web portal UI; a minute later:

NAME                    STATUS                     AGE
k8s-agent-a21727d1-0    NotReady                   6m
k8s-agent-a21727d1-1    Ready                      6m
k8s-agent-a21727d1-2    NotReady                   7m
k8s-master-a21727d1-0   Ready,SchedulingDisabled   6m

I haven't rebooted the rest of the nodes yet, in case anyone wants to take a look. If I were to guess, it seems the k8s cluster came up before AAD had fully replicated the SP account. Surprisingly, k8s does not auto-retry, but somehow gets stuck!
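
If AAD replication lag really is the cause, one hedged workaround is to wait until the newly created SP is readable before starting the deployment; a rough sketch (the loop, the sleep interval, and the <app-id> placeholder are assumptions, not an official fix):

# <app-id> is the appId returned by `az ad sp create-for-rbac`.
until az ad sp show --id <app-id> > /dev/null 2>&1; do
  echo "waiting for the service principal to replicate in AAD..."
  sleep 10
done
echo "service principal is visible; safe to start the deployment"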

@colemickens
Contributor

How much time elapsed between the SP creation and the reboot?

Also, I know it's a weird question, but what timezone did you execute the SP creation command in?

@kim0
Contributor Author

kim0 commented Dec 1, 2016

@colemickens Let me answer the second question first: I executed the SP creation command on my laptop, in the UTC+2 timezone!

For the first question, it's not entirely clear to me. I created the SP using the new Python az CLI, then submitted the k8s deployment through the Azure portal; I guess it took about 5 minutes to deploy. After that, I ssh'ed into the master. It and all the agents were not working: kubectl get nodes was not returning. Restarting the API server docker container made it responsive, and rebooting agent-1 made it Ready! I still have not rebooted agents 2 and 3, and they are still NotReady. If you want to ssh into them, I'm OK with that; otherwise, if I'm free to reboot them, let me know as well. Thanks!

@kim0
Contributor Author

kim0 commented Dec 9, 2016

Rebooted the remaining nodes, and they came up in the Ready state.

@olostan

olostan commented Dec 27, 2016

@kim0 How did you get the API server restarted? I ssh'ed into the master and tried to do the same as you, but:

root@k8s-master-F02F8C45-0:~# docker ps
CONTAINER ID        IMAGE                                             COMMAND                  CREATED             STATUS              PORTS               NAMES
9fb65bdb36c6        gcr.io/google_containers/hyperkube-amd64:v1.4.6   "/hyperkube kubelet -"   19 minutes ago      Up 19 minutes                           angry_thompson
root@k8s-master-F02F8C45-0:~# kubectl get pods
^C
root@k8s-master-F02F8C45-0:~# docker restart 9fb65bdb36c6
9fb65bdb36c6
root@k8s-master-F02F8C45-0:~# docker ps
CONTAINER ID        IMAGE                                             COMMAND                  CREATED             STATUS              PORTS               NAMES
9fb65bdb36c6        gcr.io/google_containers/hyperkube-amd64:v1.4.6   "/hyperkube kubelet -"   20 minutes ago      Up 5 seconds                            angry_thompson
root@k8s-master-F02F8C45-0:~# kubectl get pods
Unable to connect to the server: dial tcp 13.95.157.85:443: i/o timeout
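
Note that the only container in the docker ps output above is the kubelet itself (hyperkube kubelet), which suggests the API server pod was never started. A hedged way to see why the node never comes up is to read that container's logs; this uses the container ID shown above and assumes the relevant errors are logged there:

# Inspect the kubelet container's recent log lines for errors.
docker logs 9fb65bdb36c6 2>&1 | grep -iE "error|failed" | tail -n 20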

@colemickens
Contributor

@olostan Your SP is very likely misconfigured. Please check the troubleshooting steps here: https://github.com/Azure/acs-engine/blob/master/docs/kubernetes.md#troubleshooting
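
A quick, hedged way to confirm whether the SP credentials themselves are valid (all values below are placeholders) is to sign in with them directly:

# Sign in as the service principal that was handed to the cluster.
az login --service-principal -u <app-id> -p <client-secret> --tenant <tenant-id>

# If this succeeds but nodes still cannot register, the problem is more likely
# the role assignment/scope than the credentials themselves.
az account show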

@olostan

olostan commented Dec 27, 2016

@colemickens Wow, thanks. That helps; it seems I really have:

Unable to construct api.Node object for kubelet:
 failed to get external ID from cloud provider: compute.VirtualMachinesClient#Get:
 Failure responding to request: StatusCode=403
 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" 
Message="The client '<guid>' with object id '<guid>' does not have authorization to
 perform action 'Microsoft.Compute/virtualMachines/read' over scope '/subscriptions/<guid>/resourceGroups/edicircle/providers/Microsoft.Compute/virtualMachines/k8s-master-f02f8c45-0'."

Sorry if this is not the correct place (though it could be useful for others who hit the same problem), but is there any clue how to add those permissions?

Actually, I created the cluster exactly as shown in this video: https://www.youtube.com/watch?v=nhY9XdzNbbY, with az acs create ... which, as I understand it, should create the SP.

@colemickens
Contributor

@olostan If you used az acs create, you are correct, you shouldn't be experiencing this issue.

Can you please detail the exact command you ran (possibly looking through your shell history) and also paste the full output from az --version? Thanks!

(We might end up moving this over to https://github.com/Azure/azure-cli ...)

@olostan

olostan commented Dec 27, 2016

Created Azure/azure-cli#1620
@colemickens If you have any ideas how I can manually add the authorisation rights, please share! Thanks.

@anhowe
Contributor

anhowe commented Jan 23, 2017

It should be possible to add auth rights to resource groups through the Azure Portal.
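
For reference, a hedged CLI equivalent of granting the SP access over the cluster's resource group, based on the scope in the 403 error above (the appId is a placeholder, and Contributor is an assumption about the role intended here):

# Grant the service principal Contributor on the resource group with the cluster VMs.
az role assignment create \
  --assignee <app-id> \
  --role Contributor \
  --scope /subscriptions/<subscription-id>/resourceGroups/edicircle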

@anhowe
Contributor

anhowe commented Jan 23, 2017

Please re-open if you encounter this issue again, since the latest az should fix this.

@anhowe anhowe closed this as completed Jan 23, 2017