
az aks and kubectl issue: Unable to connect to the server: net/http: TLS handshake timeout #164

Closed
chetanku opened this issue Feb 4, 2018 · 52 comments


@chetanku

chetanku commented Feb 4, 2018

I am trying to connect to my existing Kubernetes cluster from Windows Server 2016.
1st issue:
On the Windows Server 2016 machine, I run az aks get-credentials with the resource group and cluster name, and I see an issue with the casing.
If I specify my cluster name as --name=testscluster and run it, it somehow changes the case to testScluster.

2nd Issue:
Then, after I run kubectl get nodes, I get Unable to connect to the server: net/http: TLS handshake timeout. I manually corrected the case and tried again, but I still see the same issue.

When I try to do the same from an Ubuntu server, it works completely fine.
Question: does AKS support Windows, or is it Linux-only?
I am on the latest az version, 2.0.26.

Any help will be great!!

@matkam

matkam commented Feb 6, 2018

Running into the same issue on Mac OS. The kubectl command was working fine just yesterday, and now throws an error:

~ $ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout

I've tried upgrading the cluster, but that also throws an error and puts the cluster into a failed state:

~ $ az aks upgrade --resource-group my-resource-group --name my-aks-cluster --kubernetes-version 1.8.7
Kubernetes may be unavailable during cluster upgrades.
Are you sure you want to perform this operation? (y/n): y
Deployment failed. Correlation ID: some-id-here. Internal server error

@chetanku
Author

chetanku commented Feb 6, 2018

~ $ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout
Can you try deleting the context and getting it again? This helped me fix it.
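For anyone following along, the context reset can be sketched roughly like this (myResourceGroup / myAKSCluster are placeholder names, not the actual cluster):

```shell
# Drop the cached context and cluster entry for the AKS cluster
# from ~/.kube/config (names below are placeholders).
kubectl config delete-context myAKSCluster
kubectl config delete-cluster myAKSCluster

# Pull fresh credentials from Azure and retry.
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster
kubectl get nodes
```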

@matkam

matkam commented Feb 6, 2018

Still the same:

~ $ rm /Users/matt/.kube/config
~ $ az aks get-credentials --resource-group xxx --name xxx
Merged "xxx" as current context in /Users/xxx/.kube/config
~ $ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout

@chetanku
Author

chetanku commented Feb 6, 2018

Is your cluster up and functional, or is it in a failed state?
Can you also check the FQDN of the cluster in the config and match it against the Azure portal?

@jchauncey
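One way to compare the two values (placeholder names; the jsonpath expression assumes a single cluster remains after --minify):

```shell
# API server address kubectl will use, taken from the current context.
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'; echo

# FQDN Azure reports for the managed cluster; the two should match.
az aks show --resource-group myResourceGroup --name myAKSCluster --query fqdn --output tsv
```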

@matkam

matkam commented Feb 6, 2018

It is in a failed state, but I can't tell if that's from the failed upgrade or if it was failing earlier. The previously set-up services/pods still seem to work.

@jchauncey

@matkam what region is this in?

cc @mboersma

@matkam

matkam commented Feb 6, 2018

@jchauncey Central US

@chetanku
Author

chetanku commented Feb 6, 2018

Can you try upgrading it again?

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm-upgrade-1-8/#recovering-from-a-bad-state

If kubeadm upgrade somehow fails and fails to roll back, due to an unexpected shutdown during execution for instance, you may run kubeadm upgrade again as it is idempotent and should eventually make sure the actual state is the desired state you are declaring.

You can use kubeadm upgrade to change a running cluster with x.x.x --> x.x.x with --force, which can be used to recover from a bad state.

@matkam

matkam commented Feb 6, 2018

How does kubeadm work with AKS? I have been using az aks upgrade to perform upgrades, but there is no --force option there.

@chetanku
Author

chetanku commented Feb 6, 2018

Sorry, can you try upgrading it again with az aks upgrade?

az aks upgrade --name myAKSCluster --resource-group myResourceGroup --kubernetes-version 1.8.7

@matkam

matkam commented Feb 6, 2018

I've tried the upgrade several times with the same result:

~ $ az aks upgrade --resource-group xxx --name xxx --kubernetes-version 1.8.7
Kubernetes may be unavailable during cluster upgrades.
Are you sure you want to perform this operation? (y/n): y
Deployment failed. Correlation ID: 068b7fae-6d54-41d7-8b2d-0ee308e22674. Internal server error

@chetanku
Author

chetanku commented Feb 6, 2018

Is your subscription good? Do you have enough credits?

https://stackoverflow.com/questions/48443320/upgrade-failed-azure-aks-from-1-8-1-to-1-8-6

@matkam

matkam commented Feb 6, 2018

My subscription is good, with plenty of credits. I was able to start another AKS cluster.

Output from your linked StackOverflow:

~ $ az aks show --name xxx --resource-group xxx --output table
Name            Location    ResourceGroup              KubernetesVersion    ProvisioningState    Fqdn
--------------  ----------  -------------------------  -------------------  -------------------  -------------------------------------------------------------------
xxx  centralus   xxx  1.8.7                Failed               xxx.hcp.centralus.azmk8s.io
~ $ az aks get-versions --name xxx --resource-group xxx --output table
Name     ResourceGroup              MasterVersion    MasterUpgrades    NodePoolVersion    NodePoolUpgrades
-------  -------------------------  ---------------  ----------------  -----------------  ------------------
default  xxx  1.8.7            1.8.7             1.8.7              1.8.7

@chetanku
Author

chetanku commented Feb 6, 2018

Can you try this in PowerShell and see if there are any logs?
Get-AzureRmLog -CorrelationId "068b7fae-6d54-41d7-8b2d-0ee308e22674"

@matkam

matkam commented Feb 6, 2018

Unfortunately not:

PS Azure:\> Get-AzureRmLog -CorrelationId "068b7fae-6d54-41d7-8b2d-0ee308e22674"
WARNING: [Get-AzureRmLog] Parameter deprecation: The DetailedOutput parameter will be deprecated in a future breaking change release.
WARNING: [Get-AzureRmLog] Parameter name change: The parameter plural names for the parameters will be deprecated in a future breaking change release in favor of the singular versions of the same names.
WARNING: [Get-AzureRmLog] Output change: The field EventChannels from the EventData object is being deprecated in the release 5.0.0 - November 2017 - since it now returns a constant value (Admin,Operation)
Azure:\

@matkam

matkam commented Feb 7, 2018

I found this on one of the Kubernetes nodes, /var/log/syslog:

Feb  6 23:53:29 aks-nodepool1-12029634-1 docker[1781]: I0206 23:53:29.889440    1867 kubelet_node_status.go:83] Attempting to register node aks-nodepool1-12029634-1
Feb  6 23:53:31 aks-nodepool1-12029634-1 docker[1781]: E0206 23:53:31.329906    1867 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443/api/v1/pods?fieldSelector=spec.nodeName%3Daks-nodepool1-12029634-1&resourceVersion=0: net/http: TLS handshake timeout
Feb  6 23:53:31 aks-nodepool1-12029634-1 docker[1781]: E0206 23:53:31.330167    1867 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: Get https://evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443/api/v1/nodes?fieldSelector=metadata.name%3Daks-nodepool1-12029634-1&resourceVersion=0: net/http: TLS handshake timeout
Feb  6 23:53:31 aks-nodepool1-12029634-1 docker[1781]: E0206 23:53:31.332125    1867 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: Failed to list *v1.Service: Get https://evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443/api/v1/services?resourceVersion=0: net/http: TLS handshake timeout
Feb  6 23:53:32 aks-nodepool1-12029634-1 docker[1781]: E0206 23:53:32.964445    1867 eviction_manager.go:238] eviction manager: unexpected err: failed to get node info: node 'aks-nodepool1-12029634-1' not found
Feb  6 23:53:32 aks-nodepool1-12029634-1 docker[1781]: E0206 23:53:32.985870    1867 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR
Feb  6 23:53:37 aks-nodepool1-12029634-1 docker[1781]: E0206 23:53:37.986996    1867 kubelet.go:2095] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: Kubenet does not have netConfig. This is most likely due to lack of PodCIDR
Feb  6 23:53:39 aks-nodepool1-12029634-1 docker[1781]: E0206 23:53:39.894470    1867 kubelet_node_status.go:107] Unable to register node "aks-nodepool1-12029634-1" with API server: Post https://evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443/api/v1/nodes: net/http: TLS handshake timeout

@chetanku
Author

chetanku commented Feb 7, 2018

What version of az are you using? Can you update it to the latest and try again?

@matkam

matkam commented Feb 7, 2018

Version 2.0.26, installed via Homebrew.

@chetanku
Author

chetanku commented Feb 7, 2018

Throwing in some thoughts: what does az aks list give? Are you able to telnet to fqdn:443?

@jchauncey @mboersma
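A quick reachability check against the API server might look like this (the FQDN below is a placeholder, and nc/openssl stand in for telnet since they are more script-friendly):

```shell
# Placeholder FQDN; substitute the value from `az aks show --query fqdn`.
FQDN=myakscluster-12345678.hcp.centralus.azmk8s.io

# Raw TCP reachability on 443 with a 5-second timeout.
nc -vz -w 5 "$FQDN" 443

# Attempt the TLS handshake itself; hanging or aborting here matches the
# "net/http: TLS handshake timeout" that kubectl reports.
openssl s_client -connect "$FQDN:443" -servername "$FQDN" </dev/null
```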

@matkam

matkam commented Feb 7, 2018

~ $ az aks list
[
  {
    "agentPoolProfiles": [
      {
        "count": 3,
        "name": "nodepool1",
        "osType": "Linux",
        "storageProfile": "ManagedDisks",
        "vmSize": "Standard_D1_v2"
      }
    ],
    "dnsPrefix": "evieAksClu-microserviceReso-51c2f9",
    "fqdn": "evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io",
    "id": "/subscriptions/xxx/resourcegroups/microserviceResourceGroup/providers/Microsoft.ContainerService/managedClusters/evieAksCluster",
    "kubernetesVersion": "1.8.7",
    "linuxProfile": {
      "adminUsername": "xxx",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "xxx"
          }
        ]
      }
    },
    "location": "centralus",
    "name": "evieAksCluster",
    "provisioningState": "Failed",
    "resourceGroup": "microserviceResourceGroup",
    "servicePrincipalProfile": {
      "clientId": "a4c334fc-c226-4e6d-a43d-050e60c6a23e"
    },
    "type": "Microsoft.ContainerService/ManagedClusters"
  }
]
~ $ curl -v https://evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443
* Rebuilt URL to: https://evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443/
*   Trying 52.173.89.237...
* TCP_NODELAY set
* Connected to evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io (52.173.89.237) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443 
* stopped the pause stream!
* Closing connection 0
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to evieaksclu-microservicereso-51c2f9-4f6fe944.hcp.centralus.azmk8s.io:443 

@danielcoman

Same issue... Created a cluster yesterday in West Europe. Can't connect.
Unable to connect to the server: net/http: TLS handshake timeout

@matkam

matkam commented Feb 7, 2018

Now I'm not able to create a new cluster at all in CentralUS. The deployment fails without creating the second resource group (or any AKS nodes).

@kamoljan

Having the same issue in Central US :(
If I remember correctly, I had the same issue in another US region as well. ;(

@matkam

matkam commented Feb 13, 2018

After a week of contacting Azure support, they could not tell me what was wrong and recommended deleting and recreating the AKS cluster 👎

@amanohar

@matkam @kamoljan regarding the deployment failing on new creates while creating the second RG: can you share your resource group and resource name?

@matkam

matkam commented Feb 13, 2018

@amanohar upgrades and new deployments no longer seem to fail. However, the upgraded and newly deployed clusters are not functioning.

Old upgraded cluster, still shows TLS handshake timeout:
RG: microserviceResourceGroup
Name: evieAksCluster

Newly created cluster info, which shows a new error, Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out:
RG: galleon-group
Name: galleon-aks-cluster

@joukojoutomies

joukojoutomies commented Feb 13, 2018 via email

@amanohar

@matkam thanks for providing the details. A follow up question to failure on:

RG: galleon-group
Name: galleon-aks-cluster

Were you able to connect to the cluster after create, and it stopped working after a scale? Or was it not working right after create as well?

@matkam

matkam commented Feb 13, 2018

@amanohar It was not working after create.

@shrutir25

@matkam - I have resolved the TLS handshake timeout error on your old cluster with resource group microserviceResourceGroup. I am still looking into the new cluster.

@danielcoman

My issue went away after about one day in West Europe. Is this still related to capacity?

@matkam

matkam commented Feb 14, 2018

Thanks for looking into it @shrutir25, but I'm still seeing the same error:

~ $ az aks get-credentials --resource-group microserviceResourceGroup --name evieAksCluster
Merged "evieAksCluster" as current context in /Users/matt/.kube/config
~ $ kubectl get nodes
Unable to connect to the server: net/http: TLS handshake timeout

As for the new cluster, it looks like the provisioning scripts are either exiting early, or not provisioning everything they need to. I'm seeing missing routes on the routing table.

@mjrousos
Member

Same issue in East US for me, too. Worked a couple days ago, doesn't work now. Hosted services appear to still be up but kubectl commands (get nodes, proxy, etc.) are failing as described above.

In case it's useful:
RG: mjr-aks
Name: mjr-aks
Location: eastus

@mjrousos
Member

My East US AKS instance seems to be healthy again. I didn't do anything, so I'm guessing it was just some transient issue with the master.

@derekperkins
Contributor

@matkam Did you ever figure this out? Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out

I have a cluster that has been working for a month, and I can run kubectl commands, but I can't use kubectl proxy or helm without seeing that error.

@matkam

matkam commented Feb 19, 2018

@derekperkins I have not figured it out. Azure support could only recommend starting a new AKS cluster (which I am now able to do).

@derekperkins
Contributor

@matkam thanks for the quick response. I tried doing that, but I'm currently running 1.9.1 and can only create a 1.8.x cluster, and my app requires 1.9. Hopefully they make that available soon. :(

@Nirmalyasen

Is there a solution for:
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.240.0.4:10250: getsockopt: connection timed out

I started getting it yesterday, and I'm getting it even on newly created clusters.

@jchauncey

Submit a support ticket so our on-call engineers can take a look. This error can be caused by different problems, so we need to investigate what's happening before providing a mitigation.

@kkorsakov

kkorsakov commented Apr 10, 2018

The same thing today.
TB_RG_KUBE_PROD
containerservice-TB_RG_KUBE_PROD

@dharmeshkakadia

I am seeing the same thing in East US. Our cluster has been running fine for some time, and suddenly we are seeing "Unable to connect to the server: net/http: TLS handshake timeout" today.

@Jagdeep1

Unable to connect to the server: net/http: TLS handshake timeout for west europe

@jchauncey

If you are seeing issues, please submit a support ticket so our on-call engineers can take a look.

@hholle

hholle commented Apr 24, 2018

We encounter exactly the same issue (Unable to connect to the server: net/http: TLS handshake timeout) in West Europe. Our subscription does not allow us to open support tickets. Do we have any other option?

We have already connected via SSH to a cluster node. The managed control/master node seems to be down/broken.

@dazdaz

dazdaz commented Apr 25, 2018

Re-deployed AKS, which was in East US, into Central US, and everything works again.

@hholle

hholle commented Apr 26, 2018

Azure Support fixed the "Unable to connect to the server: net/http: TLS handshake timeout" error for our cluster :-)

@snekcz

snekcz commented May 23, 2018

We have the same problem with an AKS cluster in West Europe today. It worked in the morning, but it has been returning the "unable to connect" message for several hours.

@chetanku
Author

I am facing the same issue with my AKS cluster in East US. It is intermittent.

@necevil

necevil commented Jun 6, 2018

@chetanku @snekcz @hholle @dazdaz @Jagdeep1 @dharmeshkakadia @Nirmalyasen @kkorsakov @matkam @mjrousos @danielcoman I am starting to collect info on this issue in a question over on StackOverflow:
https://stackoverflow.com/questions/50726534/why-cant-kubectl-connect-to-the-azure-aks-server-managing-my-cluster-net-http

If you could answer a couple of these (here or over on StackOverflow):

  1. Have you had the TLS issue impact your Clusters more than once?
  2. How long did it take before your Cluster was back up and could be connected to (what was the outage duration)?
  3. Did it 'heal' itself / resolve / come back to life on its own, or did you have to post a support ticket and wait for MS to fix it?
  4. If support, how long was that ticket posted for before the problem was solved on your Cluster?
  5. Did you notice anything weird with your CPU / Network IO dropping but your disk usage jumping higher?
  6. Would you be in favor of allowing AKS preview customers to create higher Severity Help Tickets (For this specific issue only) regardless of their support plan in order to achieve a more timely solution?

Feel free to respond here or (as mentioned) hit StackOverflow and I will collect the responses.

@qiangli

qiangli commented Jun 12, 2018

Just for info here: for me it was a combination of problems with a corporate firewall, the Azure service, and the Go http library's handling of self-signed certificates (which kubectl depends on).

"-v=9" will print more debugging info, e.g.: kubectl -v=9 get nodes
In the output there is a curl equivalent of the failing request; you can simply run that curl command and investigate.
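For example (the FQDN is a placeholder, and -k only skips CA verification to isolate where the handshake fails; it is not a fix):

```shell
# Dump every HTTP round trip kubectl makes, including a curl-equivalent line
# for each request it issues.
kubectl -v=9 get nodes

# Replay the failing request outside kubectl's Go TLS stack.
curl -kv https://<your-cluster-fqdn>:443/api/v1/nodes
```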

@seanknox
Contributor

seanknox commented Aug 2, 2018

Closing due to inactivity. Feel free to re-open if this is still an issue.

@seanknox seanknox closed this as completed Aug 2, 2018
@tbithell

I was able to get this to work after running az aks get-credentials --resource-group rsgnamehere --name clusternamehere. My error was different, though: Unable to connect to the server: dial tcp [::1]:8080: connectex: No connection could be made because the target machine actively refused it

Really just putting this here in case someone else runs into it.
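For reference, that dial tcp [::1]:8080 error usually just means kubectl found no kubeconfig/current context and fell back to the default local address; checking the context makes it obvious (resource group / cluster names are placeholders):

```shell
# Fails with "current-context is not set" when no kubeconfig is active.
kubectl config current-context

# Fetch credentials so a context exists, then retry.
az aks get-credentials --resource-group rsgnamehere --name clusternamehere
kubectl get nodes
```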

@Azure Azure locked as resolved and limited conversation to collaborators Aug 11, 2020