New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Busy azure-disk regularly fail to mount causing K8S Pod deployments to halt. #12

Open
dbalaouras opened this Issue Mar 14, 2017 · 91 comments

Comments

Projects
None yet
@dbalaouras

dbalaouras commented Mar 14, 2017

I've setup Azure Container Service with Kubernetes and I use dynamic provisioning of volumes (see details below) when deploying new Pods. Quite frequently (about 10%) I get the following error which halts the deployment:

14h 1m 439 {controller-manager } Warning FailedMount Failed to attach volume "pvc-95aa8dbf-082e-11e7-af1a-000d3a2735d9" on node "k8s-agent-1da8a8df-2" with: Attach volume "clst-west-eu-dev-dynamic-pvc-95aa8dbf-082e-11e7-af1a-000d3a2735d9.vhd" to instance "k8s-agent-1DA8A8DF-2" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="AttachDiskWhileBeingDetached" Message="Cannot attach data disk 'clst-west-eu-dev-dynamic-pvc-f843f8fa-0663-11e7-af1a-000d3a2735d9.vhd' to VM 'k8s-agent-1DA8A8DF-2' because the disk is currently being detached. Please wait until the disk is completely detached and then try again."

The Pod deployment then halts forever, or until I delete the Pod and let the ReplicationController create a new one.

Any idea what is causing this?

Workflow

I have created the following StorageClass:

Name:		azure-disk
IsDefaultClass:	No
Annotations:	<none>
Provisioner:	kubernetes.io/azure-disk
Parameters:	location=westeu,skuName=Standard_LRS,storageAccount=<<storageaccount>>

The storageaccount does contain a Blob service named vhds.

When deploying a new Pod, I create a PVC that looks like this:

{
  "apiVersion": "v1",
  "kind": "PersistentVolumeClaim",
  "Provisioner": "kubernetes.io/azure-disk",
  "metadata": {
    "name": "test-deployment-pvc",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "azure-disk"
    },
    "labels": {
      "org": "somelabel"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}

and finally use the PVC in the pods:

{
  "volumes": [
    {
      "persistentVolumeClaim": {
        "claimName": "test-deployment-pvc"
      },
      "name": "storage"
    }
  ]
}

@dbalaouras dbalaouras changed the title from Busy azure-disk regularly fail to mount causing Pod deployments to halt. to Busy azure-disk regularly fail to mount causing K8S Pod deployments to halt. Mar 14, 2017

@JackQuincy

This comment has been minimized.

Member

JackQuincy commented Mar 22, 2017

Sorry for the slow response time. There is a known performance issue here with the azure linux drivers. the kubernetes repo might have more expertise on this. But I do know that Kubernetes 1.6 has a speed up for attaching the disks. A work around we have asked people to use in the short term is to preformat the disks to ext4. Now how to do that in the PVC I'm not the most familiar with. @colemickens did I get this right?

@colemickens

This comment has been minimized.

Contributor

colemickens commented Mar 22, 2017

It's not the ext4 format being slow (which is addressed in 1.5.3+) since the PV was provisioned and in use, but rather wasn't able to detached and reattached when rescheduled.

@dbalaouras, I assume this is continuing to happen, even after 4-5 minutes? If not, as @JackQuincy indicated, the attach code has seen improvements in recent releases (1.5.3+, 1.6), and a full rewrite is coming in the future: kubernetes/kubernetes#41950

This should probably be filed as an upstream Issue on Kubernetes and diagnosed further there. @dbalaouras can you do that and link it here? It would be helpful if you can repro and include these things:

  • The instanceView of the VM where the Pod was originally scheduled (where the detach is occurring) (hostA)
  • The instanceView of the VM where the Pod has been rescheduled (where the attach is occurring) (hostB)
  • Confirm that just deleting the Pod is sufficient to repair. In this case, does it work because the Pod lands back on (hostA) when rescheduled?

The instanceViews can be collected from https://resources.azure.com. Navigate through the hierarchy to the VM and then go to the child node "instanceView". You can also get it from the CLI: az vm get-instance-view --resource-group $RG --name $VM

@dbalaouras

This comment has been minimized.

dbalaouras commented Mar 27, 2017

Hi @colemickens and @JackQuincy - thanks for the feedback.

Indeed, this error happens every 4-5 minutes; i.e. I see the following log:

Failed to attach volume "pvc-c209421d-12dd-11e7-af1a-000d3a2735d9" on node "k8s-agent-1da8a8df-2" with: Attach volume "clst-west-eu-dev-dynamic-pvc-c209421d-12dd-11e7-af1a-000d3a2735d9.vhd" to instance "k8s-agent-1DA8A8DF-2" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="AttachDiskWhileBeingDetached" Message="Cannot attach data disk 'clst-west-eu-dev-dynamic-pvc-f843f8fa-0663-11e7-af1a-000d3a2735d9.vhd' to VM 'k8s-agent-1DA8A8DF-2' because the disk is currently being detached. Please wait until the disk is completely detached and then try again."

I can also verify that the Pod is deployed successfully on a different node, right after I delete it and let the ReplicationController reschedule it.

With respect to the "VM where the Pod was originally scheduled", I don't see the Pod being re-scheduled when this error occurs. Here's the event log of a problematic Pod, 20 seconds after I launched it:

Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  20s		20s		1	{default-scheduler }			Normal		Scheduled	Successfully assigned bytecode-ghost-29-rc-tttss to k8s-agent-1da8a8df-2
  20s		3s		6	{controller-manager }			Warning		FailedMount	Failed to attach volume "pvc-2e1a5f47-131e-11e7-af1a-000d3a2735d9" on node "k8s-agent-1da8a8df-2" with: Attach volume "clst-west-eu-dev-dynamic-pvc-2e1a5f47-131e-11e7-af1a-000d3a2735d9.vhd" to instance "k8s-agent-1DA8A8DF-2" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="AttachDiskWhileBeingDetached" Message="Cannot attach data disk 'clst-west-eu-dev-dynamic-pvc-f843f8fa-0663-11e7-af1a-000d3a2735d9.vhd' to VM 'k8s-agent-1DA8A8DF-2' because the disk is currently being detached. Please wait until the disk is completely detached and then try again."

And the full description 10 minutes later:

Name:		bytecode-ghost-29-rc-tttss
Namespace:	default
Node:		k8s-agent-1da8a8df-2/10.240.0.10
Start Time:	Mon, 27 Mar 2017 21:50:03 +0300
Labels:		app_name=ghost
		deployment_id=bytecode-ghost-29
		org=bytecode
Status:		Pending
IP:
Controllers:	ReplicationController/bytecode-ghost-29-rc
Containers:
  bytecode-ghost-29:
    Container ID:
    Image:		ghost
    Image ID:
    Port:		2368/TCP
    Limits:
      cpu:	1
      memory:	1Gi
    Requests:
      cpu:		600m
      memory:		768Mi
    State:		Waiting
      Reason:		ContainerCreating
    Ready:		False
    Restart Count:	0
    Liveness:		http-get http://:2368/ delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:		http-get http://:2368/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Volume Mounts:
      /var/lib/ghost from ghost-files (rw)
      /var/lib/ghost/config.js from ghost-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8dmpk (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True
  Ready 	False
  PodScheduled 	True
Volumes:
  ghost-files:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	bytecode-ghost-29-pvc
    ReadOnly:	false
  ghost-config:
    Type:	ConfigMap (a volume populated by a ConfigMap)
    Name:	bytecode-ghost-29-cm
  default-token-8dmpk:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-8dmpk
QoS Class:	Burstable
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----				-------------	--------	------		-------
  10m		10m		1	{default-scheduler }				Normal		Scheduled	Successfully assigned bytecode-ghost-29-rc-tttss to k8s-agent-1da8a8df-2
  8m		1m		4	{kubelet k8s-agent-1da8a8df-2}			Warning		FailedMount	Unable to mount volumes for pod "bytecode-ghost-29-rc-tttss_default(2ff0e683-131e-11e7-af1a-000d3a2735d9)": timeout expired waiting for volumes to attach/mount for pod "bytecode-ghost-29-rc-tttss"/"default". list of unattached/unmounted volumes=[ghost-files]
  8m		1m		4	{kubelet k8s-agent-1da8a8df-2}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "bytecode-ghost-29-rc-tttss"/"default". list of unattached/unmounted volumes=[ghost-files]
  10m		3s		13	{controller-manager }				Warning		FailedMount	Failed to attach volume "pvc-2e1a5f47-131e-11e7-af1a-000d3a2735d9" on node "k8s-agent-1da8a8df-2" with: Attach volume "clst-west-eu-dev-dynamic-pvc-2e1a5f47-131e-11e7-af1a-000d3a2735d9.vhd" to instance "k8s-agent-1DA8A8DF-2" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="AttachDiskWhileBeingDetached" Message="Cannot attach data disk 'clst-west-eu-dev-dynamic-pvc-f843f8fa-0663-11e7-af1a-000d3a2735d9.vhd' to VM 'k8s-agent-1DA8A8DF-2' because the disk is currently being detached. Please wait until the disk is completely detached and then try again."

BTW, I just noticed that all failing Pods I see in my cluster having the same error, try to use the same volume: clst-west-eu-dev-dynamic-pvc-2e1a5f47-131e-11e7-af1a-000d3a2735d9.vhd.

Then I inspected the VM in the Azure Portal and saw this:
screenshot 2017-03-27 22 37 24

screenshot 2017-03-27 22 35 07

I'm also attaching an Instance View of the VM where the failing Pods were scheduled (with some data removed): k8s-agent-1DA8A8DF-2-instance-view.json.zip

At this point, I'm not quite sure if this is a k8s issue; not sure what to report to Kubernetes. Looks more like an issue w/ the AzureDisk Volume Plugin I guess?

@theobolo

This comment has been minimized.

theobolo commented Mar 28, 2017

Exactly the same problem here !

@colemickens

This comment has been minimized.

Contributor

colemickens commented Mar 28, 2017

What version of Kubernetes are your clusters running?

@colemickens

This comment has been minimized.

Contributor

colemickens commented Mar 28, 2017

I need the instanceView from the machine where the disk was previously attached, otherwise I have no idea what's going on - the error message indicates that the detach never finished from the other node - I need the instanceView to know if that's true.

Also, please let me know the Kubernetes cluster version.

I'm still confused about timeline of events. Are the error messages about "DiskBlobNotFound" occurring at the exact same time as the "failed to attaches"? Is the disk actually detached from wherever it was scheduled? Please review my post above and provide all of the requested information. I can not troubleshoot this any further without it, especially since I can't repro.

This also should be filed as an Issue against Kubernetes repo.


Wait... are you attempting to run multiple pods with the same disk right now???

@dbalaouras

This comment has been minimized.

dbalaouras commented Mar 28, 2017

@colemickens no, I am not attempting to run multiple pods with the same disk. I know this is not possible, plus I need a separate disk mounted to each single Pod and I am using dynamic provisioning.

For each deployment, I create a brand new (with a different name each time) PersistentVolumeClaim using a StorageClass with Provisioner kubernetes.io/azure-disk. See my first post for more details.

After a few deployments, I start seeing this error...I can't be sure in which pod (if any) the disk was originally attached to. During my initial (and quick) investigation, I could not find the disk mentioned in the error msg, mounted to any other node.

With respect to the DiskBlobNotFound error, I noticed that way after I saw the errors in the Pods. So again, can't be sure if that happened before or after the AttachDiskWhileBeingDetached error.

So let me do this maybe: I will try to reproduce in a fresh cluster and try to keep records of the disks that get created and the nodes they get mounted to. In the meanwhile, if you need me to collect any additional info or if you have any hints on what could cause this issue, please let me know.

@colemickens

This comment has been minimized.

Contributor

colemickens commented Mar 28, 2017

"After a few deployments" means what? You spun up a bunch more PVC and Pods and then ran into it? Or you did a rolling update, and the Pod got rescheduled?

If you're in Slack, please ping me, might be able to dig in a bit faster.

@dbalaouras

This comment has been minimized.

dbalaouras commented Apr 2, 2017

@colemickens thx much for the help so far. Posting some updates here to keep this thread synced:

Each "deployment" in my use case includes:

  • 1 ConfigMap
  • 1 PVC
  • 1 RC with a single POD which uses the PVC
  • 1 service
  • 1 Ingress

PVs usually get deleted after I uninstall the deployment (i.e. delete all above resources ^^). I did notice the following tho:

  • in the Azure portal, several disks (VHDs) with the same URL were shown as mounted to a single VM. That VM was of the 2 Nodes which failed to host any Pod.
  • I looked into that node and found no VHD mounted.
  • after "umounting" the disks from the Portal, almost everything got back to normal.

Finally: I am trying to reproduce the issue in a new cluster that runs k8s 1.5.3 but everything seems very stable so far! I will post here any updates.

@dbalaouras dbalaouras closed this Apr 2, 2017

@dbalaouras dbalaouras reopened this Apr 2, 2017

@colemickens

This comment has been minimized.

Contributor

colemickens commented Apr 4, 2017

Hi @dbalaouras Does this still need to be kept open? If you've not repro'd please close, otherwise give an update. Otherwise I'll go ahead and close this out again. Thanks!

@dbalaouras

This comment has been minimized.

dbalaouras commented Apr 4, 2017

@colemickens sure, I'll close it now and re-open it only if it appears again in 1.5.3. Thanks!

@dbalaouras dbalaouras closed this Apr 4, 2017

@zecke

This comment has been minimized.

zecke commented Apr 18, 2017

I have this issue right now. The blob is not mounted on any agent but it fails to mount for 8h hours straight.

Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Volumes:
  azure:
    Type:		AzureDisk (an Azure Data Disk mount on the host and bind mount to the pod)
    DiskName:		DISK-2.vhd
    DiskURI:		https://ACCOUNT.blob.core.windows.net/galera/DISK-2.vhd
    FSType:		ext4
    CachingMode:	ReadWrite
    ReadOnly:		false
  default-token-g95bh:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-g95bh
QoS Class:	Burstable
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----				-------------	--------	------		-------
  46m		46m		1	{default-scheduler }				Normal		Scheduled	Successfully assigned galera-node3-38x9r to k8s-agent-6dabf06d-0
  45m		1m		23	{controller-manager }				Warning		FailedMount	Failed to attach volume "azure" on node "k8s-agent-6dabf06d-0" with: Attach volume "DISK-2.vhd" to instance "k8s-agent-6DABF06D-0" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="AttachDiskWhileBeingDetached" Message="Cannot attach data disk 'DISK-2.vhd' to VM 'k8s-agent-6DABF06D-0' because the disk is currently being detached. Please wait until the disk is completely detached and then try again."
  44m		33s		21	{kubelet k8s-agent-6dabf06d-0}			Warning		FailedMount	Unable to mount volumes for pod "galera-node3-38x9r_default(0dff91b7-23f3-11e7-b279-000d3a28cac3)": timeout expired waiting for volumes to attach/mount for pod "default"/"galera-node3-38x9r". list of unattached/unmounted volumes=[azure]
  44m		33s		21	{kubelet k8s-agent-6dabf06d-0}			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"galera-node3-38x9r". list of unattached/unmounted volumes=[azure]
@colemickens

This comment has been minimized.

Contributor

colemickens commented Apr 18, 2017

@zecke Would you be able to use https://resources.azure.com to look at the VMs in your cluster. Particularly to look at the instanceView tab for each VM and see if the disk is still attached to one of those VMs?

I'm wondering if a detach operation failed, leaving the disk leased out, but without Kubernetes having any way of knowing that it needs to try to re-detach from the VM? If you do find such a VM, obviously please let us know, but you should be able to mitigate by doing any sort of modification to the VM. Something as simple as adding/removing a tag in the Portal can trigger it to complete the detachment operation.

If this is the case, we can try to get some details from you privately to understand why the detach failed initially. Thank you very much for any information you can provide.

@zecke

This comment has been minimized.

zecke commented Apr 18, 2017

Feel free to ping me out of band, a support request has been filed too. It is a galera cluster with three disks, three k8 agents and three RCs/SVCs. Two disks are attached from the resources view point of view while nothing is mounted on the agents. The specific disk above is not mounted and can't be mounted.

So apparently the storage with k8/Linux is not reliable. What other options do I have?

xyz@k8s-agent-6DABF06D-0:~$ mount | grep sdc
xyz@k8s-agent-6DABF06D-0:~$ 
xyz@k8s-agent-6DABF06D-1:~$ mount | grep sdc
xyz@k8s-agent-6DABF06D-1:~$ 
xyz@k8s-agent-6DABF06D-2:~$ mount | grep sdc
xyz@k8s-agent-6DABF06D-2:~$ 
@zecke

This comment has been minimized.

zecke commented Apr 18, 2017

Just for the record due out of band communication: The upstream developer(s) have no interest in fixing it. To me it seems if you want kubernetes with persistent storage, then right now don't use Azure. I will not comment on this issue anymore.

@rossedman

This comment has been minimized.

rossedman commented Apr 18, 2017

@colemickens @zecke I am having this problem as well. I am using the default StorageClass. I can access the blob store and see that the vhd is created. I can also see the drive is mounted on one of the host nodes under the instanceView mentioned above.

However, when I deploy my pod I get this error:

Failed to attach volume "pvc-67a61ade-2476-11e7-be9d-000d3a18eadc" on node 
"k8s-agent-8492d20d-1" with: Attach volume "<vhd name>.vhd" 
to instance "k8s-agent-8492D20D-1" failed with compute.VirtualMachinesClient#CreateOrUpdate: 
Failure sending request: StatusCode=200 -- Original Error: Long running operation terminated with 
status 'Failed': Code="AcquireDiskLeaseFailed" Message="Failed to acquire lease while creating 
disk '<vhd name>.vhd' using blob with URI https://<storage account>.blob.core.windows.net/vhds/<vhd name>.vhd. 
Blob is already in use."

The reason for this is a FailedMount and then followed by a FailedSync. If I try to redeploy this pod it fails because the first deployment is still holding a lease to it or something.

This blob isn't in use. I deleted the PVC, the PV, the Deployment and then the Page Blob from the storage account as well.

Any advice on what to do?

Here are versions I am using as well

Client Version: v1.6.1
Server Version: v1.5.3
@colemickens

This comment has been minimized.

Contributor

colemickens commented Apr 18, 2017

I can assure everyone we're very interested in this being a smooth experience. I apologize for any miscommunication, I simply wanted to ensure the discussion was here on GitHub instead of in personal email. I will personally try to assist with driving this issue.

@zecke Unfortunately, without the information I've requested above, I will be unable to make any further guesses or recommendations without access to your account. If you have opened an Issue with Azure Support, it will eventually make its way to our team where we can potentially look into your subscription on your behalf. If you'd like me to try to assist sooner, please try to collect the requested information from https://resources.azure.com.

@zecke To give a bit more detail on my suspicions... I believe that the disk is being unmounted but not detached. This would explain the behavior you're seeing - the disk is not mounted, but Azure complains that a lease is taken, or that a detach is currently in progress. Unforunately, as I mentioned, I can only confirm this if you collect information from the instanceView. Please let me know if you'd like to look into that, or if you'd like to just have Azure Support drive this discussion going forward.

(Getting your server version would be helpful too. Newer versions have a number of stability improvements.)

@colemickens

This comment has been minimized.

Contributor

colemickens commented Apr 18, 2017

@rossedman Good information, let me ask some followup questions:

  1. What happened after you deleted the PV + page blob? I guess it sounds like you deleted it all at once.

  2. You said you looked at instanceView and saw the disk attached... Was that on the correct VM where the Pod was trying to be launched (k8s-agent-8492d20d-1)? And did it show up in the regular VM GET as well?

  3. How easily/reliably can you repro? Are you willing to work with someone once you have repro'd? (By interactively debugging or granting someone access to the target resource group)

This one is more surprising, I've never had someone have the PV provisioned and then immediately say it's in use.

For everyone's knowledge, there is a large re-write coming for Kubernetes 1.7 that should improve the handling of a variety of edge-cases. That having been said, if we can identify issues in 1.6, we can try to get those fixes as well for future 1.6.x releases.

@colemickens

This comment has been minimized.

Contributor

colemickens commented Apr 18, 2017

And re-opening this since people are clearly having some issues with 1.5.3.

@colemickens colemickens reopened this Apr 18, 2017

@rossedman

This comment has been minimized.

rossedman commented Apr 19, 2017

@colemickens will get some more information today. actually, we are migrating our k8s cluster out of dev/test and into a prod setup so the dev/test cluster we could give access to and try to do some troubleshooting. let me try to do this process again and I will provide answers to the questions above.

My suspicions are that the VHD mount is on the wrong node. The pod may be rescheduling to other nodes due to memory constraints or something else and its staying attached to the original host node. Can't confirm yet but will have more soon.

@colemickens

This comment has been minimized.

Contributor

colemickens commented Apr 19, 2017

@rossedman That is also my suspicion. There is code specifically built to retry the attachment operation, but I fear the same may not exist for detach. (If the detach fails, but isn't specifically handled, when K8s calls to retry the detach, I believe Azure will return immediate success...)

This is why I'm so interested in seeing the instanceView (in @zecke's case as well)...

Thanks!

@rossedman

This comment has been minimized.

rossedman commented Apr 20, 2017

@colemickens So here are a couple findings that I have to report:

Files I'm Working With

This is just a PersistentVolumeClaim a single postgres container. Nothing special. I have 3 agents nodes and 1 master. In the instance below the PV mounts to the third host agent.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: sonarqube-vc
  annotations:
    volume.beta.kubernetes.io/storage-class: "default"
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: sonarqube
  labels:
    app: sonarqube
    version: v1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: sonarqube
    spec:
      volumes:
      - name: sonarqube-db
        persistentVolumeClaim:
          claimName: sonarqube-vc

      imagePullSecrets:
      - name: acrsecret

      containers:
      - name: postgres
        image: postgres:alpine
        volumeMounts:
        - mountPath: "/var/lib/postgresql/data"
          name: sonarqube-db
        env:
        - name: POSTGRES_USER
          value: whatever
        - name: POSTGRES_PASSWORD
          value: whatever
        - name: POSTGRES_DB
          value: sonarqube
        - name: PGDATA
          value: "/var/lib/postgresql/data/pgdata"
        ports:
        - containerPort: 5432

After the claim is deployed:

kubectl get pvc

NAME           STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS   AGE
sonarqube-vc   Bound     pvc-0535bb1d-25ea-11e7-86f9-000d3a19ae39   20Gi       RWO           default        17s

kubectl get pv

NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                  STORAGECLASS   REASON    AGE
pvc-0535bb1d-25ea-11e7-86f9-000d3a19ae39   20Gi       RWO           Delete          Bound     default/sonarqube-vc   default                  56s

InstanceView

  "disks": [
    {
      "name": "k8s-agent-2D24F7F6-2-osdisk",
      "statuses": [
        {
          "code": "ProvisioningState/succeeded",
          "level": "Info",
          "displayStatus": "Provisioning succeeded",
          "time": "2017-04-20T16:57:06.3153019+00:00"
        }
      ]
    },
    {
      "name": "absg-prod-acs-build-absg-prod-acs--pvc-0535bb1d-25ea-11e7-86f9-000d3a19ae39.vhd",
      "statuses": [
        {
          "code": "ProvisioningState/succeeded",
          "level": "Info",
          "displayStatus": "Provisioning succeeded",
          "time": "2017-04-20T16:57:06.7840702+00:00"
        }
      ]
    }
  ]

When deploying the above it times out and fails to mount:

If I do a kubectl describe pods/<pod-name> I see that the pod was scheduled on the same k8s agent that the disk is mounted to:

Node:		k8s-agent-2d24f7f6-2/10.240.0.6

Here is the error message from the diskmount:

  32s		32s		1	kubelet, k8s-agent-2d24f7f6-2			Warning		FailedMount	Unable to mount volumes for pod "sonarqube-2247042819-v0271_default(63086199-25ea-11e7-86f9-000d3a19ae39)": timeout expired waiting for volumes to attach/mount for pod "default"/"sonarqube-2247042819-v0271". list of unattached/unmounted volumes=[sonarqube-db]
  32s		32s		1	kubelet, k8s-agent-2d24f7f6-2			Warning		FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"sonarqube-2247042819-v0271". list of unattached/unmounted volumes=[sonarqube-db]

When deleting a PersistentVolume it does not remove it from the blob storage but does seem to unmount from host node:

If I run kubectl delete pv/<some-generated-name> && kubectl delete pvc/sonarqube-vc this does not result in the VHD being removed from the storage that is in the ACS resource group even thought the ReclaimPolicy is set to Delete. I am using the default StorageClass for this. However, it does not show up in the InstanceView after I do this, which suggests it is not mounted

@rossedman

This comment has been minimized.

rossedman commented Apr 20, 2017

@colemickens As a temporary fix I ended up using your kubernetes container that can create VHDs and I mounted that into my container. That process still works.

@rossedman

This comment has been minimized.

rossedman commented Apr 26, 2017

@colemickens Hey, wondering if there is any update on this? Was this what you needed? Thanks

@ComeMaes

This comment has been minimized.

ComeMaes commented May 24, 2017

Hi everyone,
Similar situation here, been troubleshooting it for a while now.

setup info:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Cluster deployed via ACS.

I have some database server k8s deployments (single pod per deployment, each having their own PVC, dynamically provisioned).

Problems arose after this specific event, as shown by the logs of the kube controller manager:

I0524 00:36:58.826461       1 nodecontroller.go:608] NodeController detected that zone westeurope::1 is now in state FullDisruption.
I0524 00:36:58.826549       1 nodecontroller.go:608] NodeController detected that zone westeurope::2 is now in state FullDisruption.

So judging by the logs that followed, some nodes went down, came back up, k8s rescheduled pods and we have the issue that was documented extensively above.

We tried to force pod deletion in some cases to try to understand what was going on, which resulted in them being recreated (as they're managed by the deployments), and some volumes were correctly reattached.
However even this was not always successful, and we have pods that are stuck in the ContainerCreating state:

Events:
  FirstSeen     LastSeen        Count   From                            SubObjectPath   Type            Reason          Message
  ---------     --------        -----   ----                            -------------   --------        ------          -------
  1h            3s              36      kubelet, k8s-agent-4ba79e32-1                   Warning         FailedMount   Unable to mount volumes for pod "mongodb-deployment-1225361271-d1z50_default(b39cc50b-4098-11e7-beb6-000d3a290a1b)": timeout expired waiting for volumes to
attach/mount for pod "default"/"mongodb-deployment-1225361271-d1z50". list of unattached/unmounted volumes=[mongo-data]
  1h            3s              36      kubelet, k8s-agent-4ba79e32-1                   Warning         FailedSync      Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"mongodb-deployment-1225361271-d1z50". list of unattached
/unmounted volumes=[mongo-data]

This appears to be the same issue: the blob lease is locked and the volume cannot be attached to the rescheduled pod.
The message in the Azure portal VM activity log:

Operation name
Write VirtualMachines
Time stamp
Wed May 24 2017 19:12:23 GMT+0200 (CEST)
Event initiated by
XXXXX-acs-cluster
Error code
AttachDiskWhileBeingDetached
Message
Cannot attach data disk 'XXXXX-dynamic-pvc-f71bb8b5-25dd-11e7-beb6-000d3a290a1b.vhd' to VM 'k8s-agent-4BA79E32-1' because the disk is currently being detached. Please wait until the disk is completely detached and then try again.

Appears repeatedly, forever:
image

InstanceView of the VM k8s-agent-4BA79E32-1:

{
  "platformUpdateDomain": 2,
  "platformFaultDomain": 2,
  "vmAgent": {
    "vmAgentVersion": "2.1.3",
    "statuses": [
      {
        "code": "ProvisioningState/succeeded",
        "level": "Info",
        "displayStatus": "Ready",
        "message": "Guest Agent is running",
        "time": "2017-05-24T16:32:23+00:00"
      }
    ],
    "extensionHandlers": [
      {
        "type": "Microsoft.OSTCExtensions.CustomScriptForLinux",
        "typeHandlerVersion": "1.5.2.1",
        "status": {
          "code": "ProvisioningState/succeeded",
          "level": "Info",
          "displayStatus": "Ready",
          "message": "Plugin enabled"
        }
      }
    ]
  },
  "disks": [
    {
      "name": "k8s-agent-4BA79E32-1-osdisk",
      "statuses": [
        {
          "code": "ProvisioningState/succeeded",
          "level": "Info",
          "displayStatus": "Provisioning succeeded",
          "time": "2017-05-24T15:49:53.2215883+00:00"
        }
      ]
    },
    {
      "name": "XXXXX-dynamic-pvc-f71bb8b5-25dd-11e7-beb6-000d3a290a1b.vhd",
      "statuses": [
        {
          "code": "ProvisioningState/deleting",
          "level": "Info",
          "displayStatus": "Deleting"
        }
      ]
    },
    {
      "name": "XXXXX-dynamic-pvc-eeaf97ab-25e6-11e7-beb6-000d3a290a1b.vhd",
      "statuses": [
        {
          "code": "ProvisioningState/failed/AcquireDiskLeaseFailed",
          "level": "Error",
          "displayStatus": "Provisioning failed",
          "message": "Failed to acquire lease while creating disk 'XXXXX-dynamic-pvc-eeaf97ab-25e6-11e7-beb6-000d3a290a1b.vhd' using blob with URI https://XXXXXstandardstorage.blob.core.windows.net/vhds/XXXXX-dynamic-pvc-eeaf97ab-25e6-11e7-beb6-000d3a290a1b.vhd. Blob is already in use.",
          "time": "2017-05-24T15:49:53.518542+00:00"
        }
      ]
    }
  ],
  "extensions": [
    {
      "name": "cse1",
      "type": "Microsoft.OSTCExtensions.CustomScriptForLinux",
      "typeHandlerVersion": "1.5.2.1",
      "statuses": [
        {
          "code": "ProvisioningState/succeeded",
          "level": "Info",
          "displayStatus": "Provisioning succeeded",
          "message": "Command is finished.\n---stdout---\n\n---errout---\n\n"
        }
      ]
    }
  ],
  "statuses": [
    {
      "code": "ProvisioningState/failed/AcquireDiskLeaseFailed",
      "level": "Error",
      "displayStatus": "Provisioning failed",
      "message": "Failed to acquire lease while creating disk 'XXXXX-dynamic-pvc-eeaf97ab-25e6-11e7-beb6-000d3a290a1b.vhd' using blob with URI https://XXXXXstandardstorage.blob.core.windows.net/vhds/XXXXX-dynamic-pvc-eeaf97ab-25e6-11e7-beb6-000d3a290a1b.vhd. Blob is already in use.",
      "time": "2017-05-24T15:49:53.5341685+00:00"
    },
    {
      "code": "PowerState/running",
      "level": "Info",
      "displayStatus": "VM running"
    }
  ]
}

So in the instanceView the disk XXXXX-dynamic-pvc-f71bb8b5-25dd-11e7-beb6-000d3a290a1b.vhd is being deleted. The message from the Activity Log of the portal UI says that it cannot be attached because it's being detached... apparently from the same VM.

Finally "statuses" message of the instanceView is the same as the one appearing in the Overview tab of the portal UI :

Provisioning state 
Provisioning failed. Failed to acquire lease while creating disk 'XXXXX-dynamic-pvc-eeaf97ab-25e6-11e7-beb6-000d3a290a1b.vhd' using blob with URI https://XXXXXstandardstorage.blob.core.windows.net/vhds/XXXXX-dynamic-pvc-eeaf97ab-25e6-11e7-beb6-000d3a290a1b.vhd. Blob is already in use.. AcquireDiskLeaseFailed

Which refers to another disk that is actually bound to another pod running in another instance without any problem. I don't understand why that node would be trying to acquire the lease of this volume...

This is a heavy and unexpected blocker in the process of moving to k8s on Azure for us. Happy to discuss and help troubleshooting this further (slack, email, on here or wherever), as it's a critical point to elude before moving forward with Azure. @colemickens Please let me know.

@theobolo

This comment has been minimized.

theobolo commented Jan 5, 2018

@teeterc It's exactly the same for me since yesterday....

Azure/acs-engine#2002

Only diffirence is that my disks are up on Azure portal and marked as "Unattached" but still, it can't be mounted on the worker.

@andyzhangx

This comment has been minimized.

andyzhangx commented Jan 5, 2018

@teeterc could you run kubectl getpvc and kubectl describe pvc PVC-NAME to get the status of PVC first?

@andyzhangx

This comment has been minimized.

andyzhangx commented Jan 5, 2018

@teeterc for your case, restart controller-manager in master may resolve your issue

@ajhewett

This comment has been minimized.

ajhewett commented Jan 5, 2018

@teeterc I'm also seeing the same errors with PVC/PV and Statefulsets on multiple Kubernetes clusters since the maintenance reboots. My clusters were created using acs-engine with Kubernetes v1.7.5.

AttachVolume.Attach failed for volume "pvc-f9ae8b26-f21a-11e7-8424-000d3a2aee28" : Attach volume "stage-eu-west-es5-kube-dynamic-pvc-f9ae8b26-f21a-11e7-8424-000d3a2aee28" to instance "k8s-agent-82840771-0" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="AttachDiskWhileBeingDetached" Message="Cannot attach data disk 'stage-eu-west-es5-kube-dynamic-pvc-928b8f62-ba15-11e7-b2a4-000d3a2aee28' to VM 'k8s-agent-82840771-0' because the disk is currently being detached or the last detach operation failed. Please wait until the disk is completely detached and then try again or delete/detach the disk explicitly again."

also with the mismatch between requested and error volume id.

The suggestion from @andyzhangx to restart the controller-manager has not helped. Nor has redeploying the k8s-master VMs or the affected k8s-agent VMs.

@rcconsult

This comment has been minimized.

rcconsult commented Jan 5, 2018

Hi,
I have tested two options:

  1. List the PVCs and PVs stuck after the reboots and re-create them, which means loosing existing data, because you may also need to destroy VHDs in storage account as well.
  2. In Azure portal stop your affected VM, edit disks and remove all stuck PVCs mounts, start VM and delete/refresh affected pods stuck in ContainerCrating stage, your data should stay preserved if your PV policy was set to "Retain".

Good luck

Radovan

@theobolo

This comment has been minimized.

theobolo commented Jan 5, 2018

@ajhewett @teeterc You can follow the debugging on the other issue Azure/acs-engine#2002 and compare to your own errors.

But it seems that something went wrong yesterday.

@rcconsult It's a way if your disks are already mounted, mine are brand new so basically i can't create mount new disks on workers even if they weren't mounted anywhere previously.

There is definitely a problem, mounting Managed and Non-Managed disks since yesterday.

It's really critical for production clusters guys... Yesterday was a crazy day ...

@ajhewett

This comment has been minimized.

ajhewett commented Jan 5, 2018

@rcconsult I have performed the same as your option 1 which has worked in some clusters (fortunately, the data in my volumes is replicated). However, I have a cluster where the problem is very persistent and k8s keeps trying to attach a deleted volume.

@andyzhangx

This comment has been minimized.

andyzhangx commented Jan 5, 2018

@ajhewett could you check the VM status of k8s-agent-82840771-0 and other VMs in azure portal, is it in failed status?
Update:
If it's in failed state, use this solution to fix it first.

@ajhewett

This comment has been minimized.

ajhewett commented Jan 5, 2018

@andyzhangx the k8s agents and master are not now in a failed state. Some were in a failed state right after the maintenance reboots but I performed a "Redeploy" from the portal on the failed VMs several hours ago. The errors attaching volumes have been occuring before and after my redeploy.

@andyzhangx

This comment has been minimized.

andyzhangx commented Jan 5, 2018

@ajhewett could you check the disk stage-eu-west-es5-kube-dynamic-pvc-928b8f62-ba15-11e7-b2a4-000d3a2aee28 status, is it attached to a VM? You may detach that disk manually and wait to check.
Also you could get more debugging info from Azure/acs-engine#2002
Update:
also could you provide k8s-agent-82840771-0 VM status from https://resources.azure.com ?

@ajhewett

This comment has been minimized.

ajhewett commented Jan 5, 2018

@andyzhangx the disk stage-eu-west-es5-kube-dynamic-pvc-928b8f62-ba15-11e7-b2a4-000d3a2aee28 does not exist. I had deleted it (+ PVC & PV) after redeploying the k8s-agent VM did not help. Similar to option 1 from @rcconsult which fixed a problems in other clusters after the maintenance.
Disk stage-eu-west-es5-kube-dynamic-pvc-f9ae8b26-f21a-11e7-8424-000d3a2aee28 does exist and is not attached to any VM. This is the disk that should be mounted.

@ajhewett

This comment has been minimized.

ajhewett commented Jan 5, 2018

@andyzhangx the VM status is:

{
  "properties": {
    "vmId": "ca92b311-2219-4bb9-8d2a-1911ec7888d6",
    "availabilitySet": {
      "id": "/subscriptions/REDACTED/resourceGroups/stage-eu-west-es5-kube/providers/Microsoft.Compute/availabilitySets/AGENT-AVAILABILITYSET-82840771"
    },
    "hardwareProfile": {
      "vmSize": "Standard_DS12_v2"
    },
    "storageProfile": {
      "imageReference": {
        "publisher": "Canonical",
        "offer": "UbuntuServer",
        "sku": "16.04-LTS",
        "version": "16.04.201706191"
      },
      "osDisk": {
        "osType": "Linux",
        "name": "k8s-agent-82840771-0_OsDisk_1_955d458bd14341a29c286e26ad08e8fd",
        "createOption": "FromImage",
        "caching": "ReadWrite",
        "managedDisk": {
          "storageAccountType": "Premium_LRS",
          "id": "/subscriptions/REDACTED/resourceGroups/STAGE-EU-WEST-ES5-KUBE/providers/Microsoft.Compute/disks/k8s-agent-82840771-0_OsDisk_1_955d458bd14341a29c286e26ad08e8fd"
        },
        "diskSizeGB": 128
      },
      "dataDisks": []
    },
    "osProfile": {
      "computerName": "k8s-agent-82840771-0",
      "adminUsername": "azureuser",
      "linuxConfiguration": {
        "disablePasswordAuthentication": true,
        "ssh": {
          "publicKeys": [
            {
              "path": "/home/azureuser/.ssh/authorized_keys",
              "keyData": "REDACTED"
            }
          ]
        }
      },
      "secrets": []
    },
    "networkProfile": {
      "networkInterfaces": [
        {
          "id": "/subscriptions/REDACTED/resourceGroups/stage-eu-west-es5-kube/providers/Microsoft.Network/networkInterfaces/k8s-agent-82840771-nic-0"
        }
      ]
    },
    "provisioningState": "Succeeded"
  },
  "resources": [
    {
      "properties": {
        "publisher": "Microsoft.Azure.Extensions",
        "type": "CustomScript",
        "typeHandlerVersion": "2.0",
        "autoUpgradeMinorVersion": true,
        "settings": {},
        "provisioningState": "Succeeded"
      },
      "type": "Microsoft.Compute/virtualMachines/extensions",
      "location": "westeurope",
      "id": "/subscriptions/REDACTED/resourceGroups/stage-eu-west-es5-kube/providers/Microsoft.Compute/virtualMachines/k8s-agent-82840771-0/extensions/cse0",
      "name": "cse0"
    },
    {
      "properties": {
        "publisher": "Microsoft.EnterpriseCloud.Monitoring",
        "type": "OmsAgentForLinux",
        "typeHandlerVersion": "1.0",
        "autoUpgradeMinorVersion": true,
        "settings": {
          "workspaceId": "c6e477f5-722a-4cd9-8953-635f21709e94",
          "azureResourceId": "/subscriptions/REDACTED/resourcegroups/stage-eu-west-es5-kube/providers/microsoft.compute/virtualmachines/k8s-agent-82840771-0",
          "stopOnMultipleConnections": true
        },
        "provisioningState": "Succeeded"
      },
      "type": "Microsoft.Compute/virtualMachines/extensions",
      "location": "westeurope",
      "id": "/subscriptions/REDACTED/resourceGroups/stage-eu-west-es5-kube/providers/Microsoft.Compute/virtualMachines/k8s-agent-82840771-0/extensions/OmsAgentForLinux",
      "name": "OmsAgentForLinux"
    }
  ],
  "type": "Microsoft.Compute/virtualMachines",
  "location": "westeurope",
  "tags": {
    "creationSource": "acsengine-k8s-agent-82840771-0",
    "orchestrator": "Kubernetes:1.7.5",
    "poolName": "agent",
    "resourceNameSuffix": "82840771"
  },
  "id": "/subscriptions/REDACTED/resourceGroups/stage-eu-west-es5-kube/providers/Microsoft.Compute/virtualMachines/k8s-agent-82840771-0",
  "name": "k8s-agent-82840771-0"
}
@rocketraman

This comment has been minimized.

rocketraman commented Jan 5, 2018

Same here... a 1.7.5 cluster with all of the stateful sets borked after the security updates. 3 of 6 VMs are stuck in "Failed" state (though active nodes in k8s), and the other 3 are stuck in "VM Stopping" state. Attempting the instructions @ https://blogs.technet.microsoft.com/mckittrick/azure-vm-stuck-in-failed-state-arm/ for the Failed VMs just results in a 409 conflict error saying that the disk cannot detached because it is in use. Attempting it on the "VM Stopping" state VMs just freezes the powershell script indefinitely.

It's kind of crazy and scary how poorly stateful sets on Azure work. Its doubly annoying because when everything is fine they work well -- they just can't survive any sort of unusual / exceptional situations.

@IainColledge

This comment has been minimized.

IainColledge commented Jan 5, 2018

Have now got multiple clusters with multiple containers with problems due to nodes in failed states and disk attach errors, have got to say this really needs looking at to make it robust as is very fragile.

@andyzhangx

This comment has been minimized.

andyzhangx commented Jan 6, 2018

@ajhewett thanks for providing the info. Could you use https://blogs.technet.microsoft.com/mckittrick/azure-vm-stuck-in-failed-state-arm/ to update VM k8s-agent-82840771-0 even if's in healthy status? And let me know the Powershell script result. This could be a VM level issue, thanks.

@ajhewett

This comment has been minimized.

ajhewett commented Jan 6, 2018

@andyzhangx many, many thanks!

I executed

PS C:\> Get-AzureRmVM -Name "k8s-agent-82840771-0" -ResourceGroupName 'stage-eu-west-es5-kube' | Update-AzureRmVM

RequestId IsSuccessStatusCode StatusCode ReasonPhrase
--------- ------------------- ---------- ------------
                         True         OK OK     

and without any futher manual intervention the pod that was stuck in ContainerCreating sucessfully mounted the correct volume ("pvc-f9ae8b26-f21a-11e7-8424-000d3a2aee28") and is now Running.

@andyzhangx

This comment has been minimized.

andyzhangx commented Jan 6, 2018

@ajhewett good to know. One thing to confirm, some of your VMs are in Failed state, and then you

  1. use Redeploy in azure portal which could bring VM state to Running.
  2. And then use https://blogs.technet.microsoft.com/mckittrick/azure-vm-stuck-in-failed-state-arm/ to update agent VM which has the issue.
    These two steps are the workaround for this issue? right?
@ajhewett

This comment has been minimized.

ajhewett commented Jan 6, 2018

@andyzhangx

Several VMs rebooted without any problems. Step 1 helped repair some failed VMs but not all. Step 2 was additionally needed for some VMs even though they seemed healthy.

However, I cannot be sure that only these 2 steps are sufficient because I tried several other things before step 2. e.g. reducing the number of replicas in the statefulsets, deleting PVCs, deleting the stuck pods so that the statefulset recreated them.

BTW: I still have 7 k8s clusters with VMs that have not (yet) been rebooted. If a similar problem occurs I will first try step 1 and 2 before anything else.

@andyzhangx

This comment has been minimized.

andyzhangx commented Jan 6, 2018

@ajhewett thanks for the info.
Another side quesiton: are all your statefulsets using azure disk, since azure disk only supports RWO which means if replica >=2 and the two pods are in two different agent nodes, they could not use same azure disk(PVC)

@ajhewett

This comment has been minimized.

ajhewett commented Jan 6, 2018

@andyzhangx the statefulsets use dynamically provisioned volumes (PVCs) with managed azure disks (storageclass managed-premium or managed-standard). Each pod gets its own disk, there is no disk sharing. The workload using statefulsets and PVCs is Elasticsearch.

@rocketraman

This comment has been minimized.

rocketraman commented Jan 7, 2018

@andyzhangx I've tried your workaround @ #12 (comment) for my situation. However, it was not sufficient. The fundamental problem seems to be a disconnect between the Kubernetes view of each node (node fully up and in "Running" state) and the Azure view of each node VM (VM in "VM Stopping" or in "Failed" state), which never resolves because of the persistent disk leases not being cleanly released and re-acquired.

The final solution that worked for me was to simply do the following for every agent node in my cluster one at a time:

  1. kubectl cordon <node>
  2. delete any pods on node with stateful sets
  3. kubectl drain <node>
  4. restart the Azure VM for node via the API or portal, wait untli VM is "Running"
  5. kubectl uncordon <node>

While doing this, the newly restarted nodes may still fail to mount persistent disks as one of the subsequent un-restarted VMs may still be holding a lock on a disk. This resolves itself once all the VMs are restarted and are in "Running" state.

@teeterc

This comment has been minimized.

teeterc commented Jan 7, 2018

I did find a workaround (sorry i didn't post earlier). I noticed that the failing pods were being deployed the failed Azure nodes. So, i:

1: Scaled the the kube cluster + sum(failed nodes)
2: Deleted the failed nodes in Azure Portal (this removed any disk mounting issues when the pods came up on other cluster nodes.)

Everything came back online in about 10m. It seems that the Volumes that are failing to mount are isolated to the failing pods.... still strange, however, that Pods were trying to mount unrelated stateful volumes.

@andyzhangx

This comment has been minimized.

andyzhangx commented Feb 25, 2018

Update this thread:
Recently I fixed a race condition issue which could cause attach disk error, the fix has been merged into v1.10, I am trying to cherry-pick this fix to other k8s version, you could find details about this issue here:
https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#1-disk-attach-error

1. disk attach error

Issue details:

In some corner case(detaching multiple disks on a node simultaneously), when scheduling a pod with azure disk mount from one node to another, there could be lots of disk attach error(no recovery) due to the disk not being released in time from the previous node. This issue is due to lack of lock before DetachDisk operation, actually there should be a central lock for both AttachDisk and DetachDisk opertions, only one AttachDisk or DetachDisk operation is allowed at one time.

The disk attach error could be like following:

Cannot attach data disk 'cdb-dynamic-pvc-92972088-11b9-11e8-888f-000d3a018174' to VM 'kn-edge-0' because the disk is currently being detached or the last detach operation failed. Please wait until the disk is completely detached and then try again or delete/detach the disk explicitly again.

Related issues

Mitigation:

  • option#1: Update every agent node that has attached or detached the disk in problem
in Azure cloud shell, run
$vm = Get-AzureRMVM -ResourceGroupName $rg -Name $vmname  
Update-AzureRmVM -ResourceGroupName $rg -VM $vm -verbose -debug
in Azure cli, run
az vm update -g <group> -n <name>
  • option#2:
  1. kubectl cordon node #make sure no scheduling on this node
  2. kubectl drain node #schedule pod in current node to other node
  3. restart the Azure VM for node via the API or portal, wait untli VM is "Running"
  4. kubectl uncordon node

Fix

k8s version fixed version
v1.6 no fix since v1.6 does not accept any cherry-pick
v1.7 1.7.14
v1.8 1.8.9
v1.9 1.9.5
v1.10 1.10.0
@rocketraman

This comment has been minimized.

rocketraman commented Mar 3, 2018

@andyzhangx I also note that most of my nodes, even though there are no visible issues with stateful sets right now, are getting errors like this every second or so in my logs:

Mar 03 08:56:50 k8s-agentpool1-18117938-0 docker[460]: E0303 08:56:50.123499     493 kubelet_volumes.go:128] Orphaned pod "0f2fbe5f-0004-11e8-8438-000d3af4357e" found, but volume paths are still present on disk. : There were a total of 3 errors similar to this.  Turn up verbosity to see them.                                                                                                                                                                 

Investigating that pod:

root@k8s-agentpool1-18117938-0:~# ls -lR /var/lib/kubelet/pods/0f2fbe5f-0004-11e8-8438-000d3af4357e/volumes/
/var/lib/kubelet/pods/0f2fbe5f-0004-11e8-8438-000d3af4357e/volumes/:
total 8
drwxr-x--- 3 root root 4096 Jan 23 06:14 kubernetes.io~azure-disk
drwxr-xr-x 2 root root 4096 Jan 23 06:33 kubernetes.io~secret

/var/lib/kubelet/pods/0f2fbe5f-0004-11e8-8438-000d3af4357e/volumes/kubernetes.io~azure-disk:
total 4
drwxr-x--- 2 root root 4096 Jan 23 06:14 pvc-768de031-9e81-11e7-a717-000d3af4357e

/var/lib/kubelet/pods/0f2fbe5f-0004-11e8-8438-000d3af4357e/volumes/kubernetes.io~azure-disk/pvc-768de031-9e81-11e7-a717-000d3af4357e:
total 0

/var/lib/kubelet/pods/0f2fbe5f-0004-11e8-8438-000d3af4357e/volumes/kubernetes.io~secret:
total 0
root@k8s-agentpool1-18117938-0:~# mount | grep 0f2fbe5f-0004-11e8-8438-000d3af4357e
root@k8s-agentpool1-18117938-0:~# 

Note, that pod is not running on this machine any more. So it seems like somewhere along the way these volume directories did not get cleaned up properly.

@andyzhangx

This comment has been minimized.

andyzhangx commented Mar 3, 2018

@rocketraman this could be related to pod deletion error due to race condition, if error only exists in kubelet logs, that would be ok according to my experience.

@rocketraman

This comment has been minimized.

rocketraman commented Mar 3, 2018

@rocketraman this could be related to pod deletion error due to race condition, if error only exists in kubelet logs, that would be ok according to my experience.

So after your fix is deployed, this should not occur again right? The log messages themselves seem easy to "solve" -- it appears that removing the orphaned pod directory with rm -fr /var/lib/kubelet/pods/0f2fbe5f-0004-11e8-8438-000d3af4357e does work. I'm not quite sure if there are any negatives to this but everything in that directory is zero-length files or empty directories anyway, so I suspect none.

@andyzhangx

This comment has been minimized.

andyzhangx commented May 3, 2018

The original issue is fixed by kubernetes/kubernetes#60183
you could find details here:
https://github.com/andyzhangx/demo/blob/master/issues/azuredisk-issues.md#1-disk-attach-error

Let me know if you have any question, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment