Pod volume mounting failing even after PV is bound and attached to pod #49926

Closed
amolsh opened this issue Aug 1, 2017 · 77 comments
Assignees
Labels
area/provider/aws: Issues or PRs related to aws provider
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/storage: Categorizes an issue or PR as relevant to SIG Storage.

Comments

@amolsh

amolsh commented Aug 1, 2017

kubectl version:
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.4+coreos.0", GitCommit:"97c11b097b1a2b194f1eddca8ce5468fcc83331c", GitTreeState:"clean", BuildDate:"2017-03-08T23:54:21Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

yml file:

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "nginx"
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
      annotations:
         volume.beta.kubernetes.io/storage-class: default
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

List of bound PVs:

[screenshot]

[screenshot]

ERROR:
[screenshot]

relevant kubelet logs:
[screenshot]

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 1, 2017
@xiangpengzhao
Contributor

/sig storage

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 1, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 1, 2017
@huangjiasingle

@amolsh can you get the PV? Your storage uses a StorageClass, so creating the PVC should automatically provision a PV. Can you check whether the PV exists?

@amolsh
Author

amolsh commented Aug 1, 2017

Yes, I can get it:
[screenshot]

I also forgot to mention that my cluster is on AWS.
I am getting the above error for every deployment I have tried; none of them is able to mount the attached EBS PV into the pod.

@huangjiasingle

@amolsh did you create a StorageClass named default? Can you show the StorageClass? I think the cause may be a missing StorageClass named default. Do you have any other YAML?
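As a quick check, the existing StorageClasses (and which one is marked default) can be listed with kubectl; a minimal sketch, where default is the class name the volumeClaimTemplates annotation above points at:

kubectl get storageclass
# Inspect the class referenced by the PVC annotation:
kubectl get storageclass default -o yaml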

@amolsh
Author

amolsh commented Aug 2, 2017

@huangjiasingle This is my storage class

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: default
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
  labels:
    kubernetes.io/cluster-service: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2

[screenshot]

@jingxu97
Contributor

jingxu97 commented Aug 2, 2017

@amolsh do you have the master log we can take a look at? You can email it to us if you prefer. The kubelet log shows that the volume is not attached to the node yet.
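A hedged sketch of checking, from the API side, whether the attach-detach controller believes the volume is attached (the node name below is the redacted one from this thread, and the PVC name follows the template-pod naming convention for the StatefulSet above):

# Volumes the controller has recorded as attached to the node
kubectl get node ip-100-x-x-x.us-west-2.compute.internal -o jsonpath='{.status.volumesAttached}'
# Binding state and events for the first replica's claim
kubectl describe pvc www-web-0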

@amolsh
Author

amolsh commented Aug 3, 2017

The master kube-controller-manager is not generating any logs related to the above StatefulSet (it logged nothing when I created the StatefulSet). I also couldn't find anything in the api-server logs. One more thing I forgot to mention: my cluster runs on CoreOS machines.

@amolsh
Author

amolsh commented Aug 7, 2017

@huangjiasingle
In the logs it gives the following error; the volume is not able to attach because of an authorization issue. Is it related to the AWS IAM policy?

Failed to attach volume "pvc-fff84c66-7b35-11e7-b125-02f3f42ec6aa" on node "ip-100-x-x-x.us-west-2.compute.internal" with: Error attaching EBS volume "vol-00cbb836374d1b37b" to instance "i-03d4ba6b17ab9cf5f": UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: 4nedpChQKhsxXs......
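For reference, a hedged sketch of granting the EC2 actions the attach path needs on the instance role that performs the attach (the role and policy names here are hypothetical; which role needs it depends on how the cluster was provisioned):

cat > ebs-attach-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachVolume",
        "ec2:DetachVolume",
        "ec2:DescribeVolumes",
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    }
  ]
}
EOF
# Hypothetical role name; attach to the instance role of the node/master doing the attach.
aws iam put-role-policy \
  --role-name my-k8s-node-role \
  --policy-name k8s-ebs-attach \
  --policy-document file://ebs-attach-policy.json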

@amolsh
Author

amolsh commented Aug 7, 2017

I updated the IAM policy and added ec2:AttachVolume and ec2:DetachVolume, which resolved the authorization issue. But now it is giving another error, even though the volume is available and gets attached to the instance:

Failed to attach volume "pvc-79e4c457-7b57-11e7-a96e-0698361089de" on node "ip-100-x-x-x.us-west-2.compute.internal" with: Error attaching EBS volume "vol-00da392489f12395f" to instance "i-02bbbc571c95b69fd": IncorrectState: vol-00da392489f12395f is not 'available'.

@amolsh
Author

amolsh commented Aug 8, 2017

Now the volume attachment is successful, but I am still getting an error:

[screenshot]

master kube-controller logs:


I0808 10:56:41.980229       1 event.go:217] Event(api.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"web", UID:"42d0dd3d-7c28-11e7-a194-0acdd643d1f4", APIVersion:"apps", ResourceVersion:"32766795", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' pet: web-0
I0808 10:56:42.010206       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:56:42.028325       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:56:42.045299       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:56:42.074867       1 reconciler.go:213] Started AttachVolume for volume "kubernetes.io/aws-ebs/aws://us-west-2c/vol-041e0445484858feb" to node "ip-100-x-x-216.us-west-2.compute.internal"

I0808 10:56:52.643138       1 operation_executor.go:620] AttachVolume.Attach succeeded for volume "kubernetes.io/aws-ebs/aws://us-west-2c/vol-041e0445484858feb" (spec.Name: "pvc-40b565bc-7c26-11e7-b125-02f3f42ec6aa") from node "ip-100-x-x-216.us-west-2.compute.internal".
I0808 10:56:58.393077       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:57:28.393299       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:57:58.393600       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:58:28.393739       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:58:58.394266       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:59:28.393965       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0
I0808 10:59:58.394304       1 pet_set.go:332] StatefulSet web blocked from scaling on pod web-0

@msau42
Member

msau42 commented Nov 9, 2017

@amolsh are you still seeing this issue?

@amolsh
Author

amolsh commented Nov 9, 2017 via email

@msau42
Member

msau42 commented Nov 9, 2017

This seems like a possible issue related to EBS volumes. Can @kubernetes/sig-aws-misc help out?

@mattcamp

I'm also having this issue: "AttachVolume.Attach failed for volume "pvc-5aa1db99-d04b-11e7-96e1-0a328f684a08" : Error attaching EBS volume "vol-026ef055a9d715c14" to instance "i-0e6f2c5d8edd415e4": IncorrectState: vol-026ef055a9d715c14 is not 'available'. status code: 400"

Using a single-master, 3-node cluster in AWS (deployed via kops), trying to deploy influxdb via helm.

It worked the first time I deployed it, but after deleting the deployment (to change the collectd config) I can't get it to work. It doesn't seem to matter which node; none of them work.

Grafana is deployed on the same cluster just fine (but I have only deployed it once).

helm install stable/influxdb --name influxdb --set persistence.enabled=true --set config.collectd.enabled=true --namespace poke

Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2", GitCommit:"922a86cfcd65915a9b2f69f3f193b8907d741d9c", GitTreeState:"clean", BuildDate:"2017-07-21T08:23:22Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.10", GitCommit:"bebdeb749f1fa3da9e1312c4b08e439c404b3136", GitTreeState:"clean", BuildDate:"2017-11-03T16:31:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}

kubectl.log
node.log

@gnufied
Member

gnufied commented Nov 27, 2017

The "IncorrectState: vol-026ef055a9d715c14 is not 'available'. status code: 400" error is transient: it happens when you try to attach a volume immediately after creating it, goes away within a second or so, and the controller keeps retrying the attach.

@mattcamp are you seeing the "not available" error more than once in the controller logs? Looking at the kubectl output I see the following message:

  3m            3m      1       kubelet, ip-172-20-36-162.eu-west-1.compute.internal                                            Normal  SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "pvc-5aa1db99-d04b-11e7-96e1-0a328f684a08"

This indicates the attach succeeded and the volume was attached to the node. I think something else happened when the container was started:

3m            4s      21      kubelet, ip-172-20-36-162.eu-west-1.compute.internal    spec.containers{influxdb-influxdb}      Warning BackOff                 Back-off restarting failed container

You may want to look into it.
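A minimal sketch of digging into that restart loop (the pod name is a placeholder; the container name is taken from the event above):

# Recent events and container state transitions for the failing pod
kubectl describe pod <influxdb-pod>
# Logs from the previous, crashed container instance
kubectl logs <influxdb-pod> -c influxdb-influxdb --previous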

@neverfox

neverfox commented Nov 30, 2017

Ran into "IncorrectState: ... is not 'available'. status code: 400" when trying a stateful set in 1.8 that works great in 1.7. It does retry but keeps failing and the pod stays in a CreateContainerConfigError state (something that's new to me).

@feffi

feffi commented Jan 7, 2018

Hi, I got the same (?) error when running helm install:

$ kubectl get storageclass
NAME            PROVISIONER             AGE
default         kubernetes.io/aws-ebs   4d
gp2 (default)   kubernetes.io/aws-ebs   4d
$ kubectl get persistentvolume
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                          STORAGECLASS   REASON    AGE
pvc-8b23869c-f196-11e7-b93f-020c644f020c   8Gi        RWO            Delete           Bound     default/sonarqube-postgresql   gp2                      3d
pvc-c4707186-f3f0-11e7-b93f-020c644f020c   20Gi       RWO            Delete           Bound     default/git-minio              gp2                      21m
pvc-c4710474-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            Delete           Bound     default/git-postgresql         gp2                      21m
pvc-c471bf13-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            Delete           Bound     default/git-redis              gp2                      21m
pvc-c47273d1-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            Delete           Bound     default/git-registry-data      gp2                      21m
pvc-c4732d10-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            Delete           Bound     default/git-gitlab-data        gp2                      21m
$ kubectl get persistentvolumeclaim
NAME                   STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
git-gitlab-data        Bound     pvc-c4732d10-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            gp2            21m
git-minio              Bound     pvc-c4707186-f3f0-11e7-b93f-020c644f020c   20Gi       RWO            gp2            21m
git-postgresql         Bound     pvc-c4710474-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            gp2            21m
git-redis              Bound     pvc-c471bf13-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            gp2            21m
git-registry-data      Bound     pvc-c47273d1-f3f0-11e7-b93f-020c644f020c   10Gi       RWO            gp2            21m
sonarqube-postgresql   Bound     pvc-8b23869c-f196-11e7-b93f-020c644f020c   8Gi        RWO            gp2            3d
Events:
  Type     Reason                 Age                 From                                                    Message
  ----     ------                 ----                ----                                                    -------
  Warning  FailedScheduling       24m (x2 over 24m)   default-scheduler                                       PersistentVolumeClaim is not bound: "git-registry-data" (repeated 3 times)
  Normal   Scheduled              24m                 default-scheduler                                       Successfully assigned git-registry-7b784bb55d-lvhbk to ip-172-20-34-53.eu-central-1.compute.internal
  Warning  FailedMount            24m                 attachdetach                                            AttachVolume.Attach failed for volume "pvc-c47273d1-f3f0-11e7-b93f-020c644f020c" : Error attaching EBS volume "vol-0e7150ac4806512a8" to instance "i-0ee36c5c18824a89e": "IncorrectState: vol-0e7150ac4806512a8 is not 'available'.\n\tstatus code: 400, request id: c8cc4773-af8a-402c-b08b-fb280b4fb3da"
  Warning  FailedMount            23m                 attachdetach                                            AttachVolume.Attach failed for volume "pvc-c47273d1-f3f0-11e7-b93f-020c644f020c" : Error attaching EBS volume "vol-0e7150ac4806512a8" to instance "i-0ee36c5c18824a89e": "IncorrectState: vol-0e7150ac4806512a8 is not 'available'.\n\tstatus code: 400, request id: 1e0a0209-147c-458f-9fbf-7c45ef885e7c"
  Normal   SuccessfulMountVolume  23m                 kubelet, ip-172-20-34-53.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-z2zbm"
  Normal   SuccessfulMountVolume  23m                 kubelet, ip-172-20-34-53.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "pvc-c47273d1-f3f0-11e7-b93f-020c644f020c"
  Warning  FailedMount            12m (x5 over 21m)   kubelet, ip-172-20-34-53.eu-central-1.compute.internal  Unable to mount volumes for pod "git-registry-7b784bb55d-lvhbk_default(c4a17555-f3f0-11e7-b93f-020c644f020c)": timeout expired waiting for volumes to attach/mount for pod "default"/"git-registry-7b784bb55d-lvhbk". list of unattached/unmounted volumes=[certs]
  Warning  FailedMount            11m (x14 over 23m)  kubelet, ip-172-20-34-53.eu-central-1.compute.internal  MountVolume.SetUp failed for volume "certs" : secrets "registry-server-tls" not found
  Warning  FailedSync             3m (x9 over 21m)    kubelet, ip-172-20-34-53.eu-central-1.compute.internal  Error syncing pod

@msau42
Member

msau42 commented Jan 8, 2018

@feffi the mount error you got is pretty self-explanatory:

MountVolume.SetUp failed for volume "certs" : secrets "registry-server-tls" not found

This is different from the OP's error, which is related to EBS volumes.

@macropin

It's worth noting that a message such as Warning FailedScheduling 24m (x2 over 24m) default-scheduler PersistentVolumeClaim is not bound: "git-registry-data" (repeated 3 times) can appear to indicate a PV/PVC issue, when in fact the pod is restarting due to an error and the message is repeated when the container is rescheduled.

@hugohenley

Same issue here!

@jolcese

jolcese commented Jan 31, 2018

Same problem when deploying a StatefulSet with kops on AWS.
I'm trying to deploy Cassandra with 6 replicas; sometimes it fails on the first pod and sometimes on the 2nd...

PersistentVolumeClaim is not bound: "cassandra-data-pvc-cassandra-1" (repeated 9 times)
AttachVolume.Attach failed for volume "pvc-dd201a20-06dd-11e8-9f82-06210fc81b88" : Error attaching EBS volume "vol-0a752df5ddef30f64" to instance "i-039b684aca16fc74e": "IncorrectState: vol-0a752df5ddef30f64 is not 'available'.\n\tstatus code: 400, request id: e57f9b75-e57b-4455-843c-9f9b61eb83f1"
Back-off restarting failed container
Error syncing pod

versions:

jolcese@jolcese-osx:~/src/kubernetes/kops$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.6", GitCommit:"6260bb08c46c31eea6cb538b34a9ceb3e406689c", GitTreeState:"clean", BuildDate:"2017-12-21T06:23:29Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
jolcese@jolcese-osx:~/src/kubernetes/kops$ kops  version
Version 1.8.0
jolcese@jolcese-osx:~/src/kubernetes/kops$

@lvicentesanchez

Same error here... I create a StorageClass, PV, PVC and a StatefulSet in one go and I get the same error. The EBS volume is actually attached to the node, so I'm not sure what's going on.

@lvicentesanchez

I have tried removing the pod, and it still fails once recreated.

@lvicentesanchez

lvicentesanchez commented Feb 19, 2018

I'm using Kubernetes 1.8.8, deployed using kops 1.8.1, with RBAC enabled.

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.8", GitCommit:"2f73858c9e6ede659d6828fe5a1862a48034a0fd", GitTreeState:"clean", BuildDate:"2018-02-09T21:23:25Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64

@lvicentesanchez

lvicentesanchez commented Feb 19, 2018

Some additional information. If I remove everything using kubectl delete -f mantifest/influxdb.yaml, the EBS volume goes to the available status. If I then create it again, I get a timeout while trying to mount the volume, but the EBS volume is 'in use'. So the first time I get an error because of the 'available' status, and after that the volume can't be attached even though it is 'in use' by the target node.

Unable to mount volumes for pod "monitoring-influxdb-0_kube-system(e039f409-15a5-11e8-8142-0a3a1b5232ac)": timeout expired waiting for volumes to attach/mount for pod "kube-system"/"monitoring-influxdb-0". list of unattached/unmounted volumes=[influxdb-persistent-storage]

@omerh

omerh commented Dec 26, 2018

I am also having issues with NVMe instances, running Kubernetes 1.10.x. In this case I tried using m5.large with a PVC.
This is the StatefulSet that reproduces it:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: k8s-rmq
spec:
  serviceName: "k8s-rmq"
  replicas: 1
  selector:
    matchLabels:
      app: k8s-rmq
  template:
    metadata:
      labels:
        app: k8s-rmq
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      nodeSelector:
        kops.k8s.io/instancegroup: nodes
      terminationGracePeriodSeconds: 30
      containers:
      - name: k8s-rmq
        imagePullPolicy: IfNotPresent
        image: rabbitmq:3.7.8-management-alpine
        ports:
        - containerPort: 5672
          name: amqp
        - containerPort: 15672
          name: management
        envFrom:
            - configMapRef:
                name: k8s-dev-aws         
        env:
          - name: RABBITMQ_DEFAULT_USER
            value: example
          - name: RABBITMQ_DEFAULT_PASS
            value: example
        resources:
          limits:
            cpu: "800m"
            memory: "1Gi"
          requests:
            cpu: "100m"
            memory: "128Mi"
        livenessProbe:
          tcpSocket:
            port: 5672
          initialDelaySeconds: 20
          timeoutSeconds: 5
          periodSeconds: 30
          failureThreshold: 2
          successThreshold: 1
        readinessProbe:
          tcpSocket:
            port: 5672
          initialDelaySeconds: 20
          timeoutSeconds: 5
          periodSeconds: 30
          failureThreshold: 2
          successThreshold: 1
        volumeMounts:
        - name: rmqvol
          mountPath: /var/lib/rabbitmq
  volumeClaimTemplates:
  - metadata:
      name: rmqvol
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi

These are the storage classes:

NAME            PROVISIONER             AGE
default         kubernetes.io/aws-ebs   337d
gp2 (default)   kubernetes.io/aws-ebs   337d

The EBS volume is created and attached to the instance, but the kubelet fails to mount the disk into the pod:

1m          1m           1       k8s-rmq-0.1573f1a3938f660f                                     Pod    \
                                                         Warning   FailedMount               \
     kubelet, ip-172-20-57-150.eu-west-1.compute.internal  \
    Unable to mount volumes for pod "k8s-rmq-0_default(1c45f9a0-0932-11e9-b1e7-0ac8a16a5f0c)": timeout expired waiting for volumes to attach or mount for pod "default"/"k8s-rmq-0". list of unmounted volumes=[rmqvol default-token-wp48g]. list of unattached volumes=[rmqvol default-token-wp48g]

Cluster provisioned using Kops 1.10

kubectl version

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T21:04:45Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.11", GitCommit:"637c7e288581ee40ab4ca210618a89a555b6e7e9", GitTreeState:"clean", BuildDate:"2018-11-26T14:25:46Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

When checking the mounts on the m5.large node, there is no disk mount for the NVMe drive.

When switching to m4.large, the mounts include:
/dev/xvdcu 20G 49M 19G 1% /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-1a/vol-0b73b3a1bf15aac39

The node image in Kops is: kope.io/k8s-1.10-debian-jessie-amd64-hvm-ebs-2018-08-17

And on the same note, a different use case: when launching a new cluster using kops with masters on NVMe instances like m5.large, host startup fails to mount the etcd volume and hangs with protokube:1.10.0 looping on:

I1226 20:23:59.194888    1721 aws_volume.go:320] nvme path not found "/rootfs/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0fdeaa59d34bd2ab1"

After installing nvme-cli I can see that the volume exists:

root@ip-10-101-35-149:~# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     vol0913974dacc67c490 Amazon Elastic Block Store               1           0.00   B /  68.72  GB    512   B +  0 B   1.0
/dev/nvme1n1     vol0fdeaa59d34bd2ab1 Amazon Elastic Block Store               1           0.00   B /  21.47  GB    512   B +  0 B   1.0
/dev/nvme2n1     vol0a587f2950331bf7b Amazon Elastic Block Store               1           0.00   B /  21.47  GB    512   B +  0 B   1.0

But /dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_vol0fdeaa59d34bd2ab1 does not exist.

The /dev/disk/by-id mapping doesn't exist; there is only /dev/disk/by-uuid/.

So basically I cannot use any NVMe-based instance for masters, or for nodes that have an EBS PVC.
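A rough workaround sketch, run on the node, that recreates the by-id symlinks the lookup above expects from the nvme list output; a udev rule doing the same thing persistently would be the durable fix, so treat this as illustrative only:

#!/bin/bash
# Map each NVMe namespace to its EBS volume ID (the serial number reported by nvme list)
# and create the /dev/disk/by-id symlink that kubelet/protokube look for.
mkdir -p /dev/disk/by-id
nvme list | awk '$1 ~ /^\/dev\/nvme/ {print $1, $2}' | while read -r dev serial; do
  link="/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_${serial}"
  [ -e "$link" ] || ln -s "$dev" "$link"
done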

@chey

chey commented Jan 31, 2019

Cloud: AWS
OS: RedHat 7.6
kube version: v1.12.5

Having a similar problem with storage myself. When draining a node that has an EBS volume attached to a Pod and/or Deployment, the storage doesn't move: it releases from the original node but never makes it to the new node.

Like others, switching to t2.xlarge instances fixed this for me.

@gabordk

gabordk commented Feb 12, 2019

Same problem here, AWS, k8s version 1.13.0

@brynmathias

Same problem as well: AWS EKS, k8s version 1.11.0.
I see the issue on c5.9xlarge machines and on p2.xlarge instances.

I have a feeling it might be due to the maximum number of EBS attachments per EC2 node.
I do not know whether the attachment limit is being hit, or whether attachments are not being properly removed.
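A small sketch for checking that theory, i.e. how many EBS volumes are currently attached to a given node (the node name and instance ID are placeholders):

# Kubernetes view: volumes the attach/detach controller reports on the node
kubectl get node <node-name> -o jsonpath='{range .status.volumesAttached[*]}{.name}{"\n"}{end}' | wc -l
# AWS view: EBS volumes attached to the underlying instance
aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=i-0123456789abcdef0 \
  --query 'length(Volumes)'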

@maflaven

maflaven commented Apr 24, 2019

I've been seeing a similar issue on EKS with k8s version 1.11. Our support agent suggested the following:


It's very likely that the Kubernetes scheduler was choosing worker nodes in an Availability Zone (AZ) where the volume is not available. This can happen when the node the scheduler selects for the pod is not in the availability zone where the PersistentVolume (i.e. the EBS volume) exists, for example because there aren't sufficient CPU and/or memory resources on the nodes in that zone. The scheduler then chooses a node in another zone and fails to schedule the pod with this error.

This was a known issue in Kubernetes[1][2] and has been fixed by enabling the "VolumeScheduling"[2] feature in the scheduler.

Another workaround is to create the volumes manually and update the PVCs, but both options make the cluster less available/dynamic.

References:
[1] kubernetes/enhancements#490
[2] #34583
[3] https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/
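For example, a hedged sketch of a StorageClass that uses the delayed binding mode tied to that feature, so the volume is provisioned in the AZ of the node the pod actually lands on (the class name is illustrative; volumeBindingMode needs a reasonably recent Kubernetes, roughly 1.10+ with VolumeScheduling enabled and 1.12+ for GA):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-delayed            # illustrative name
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer
EOF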

@msau42
Member

msau42 commented Apr 24, 2019

/assign @leakingtapan

@gabordk

gabordk commented Apr 29, 2019

Unfortunately this has nothing to do with AWS Availability Zones or VolumeScheduling. AZ-related problems are a popular explanation nowadays, so people tend to conflate that problem with this one, but a quick look at the Availability Zones makes it clear there is no connection.

Today's testing results:
Kubernetes: 1.13.0
Instance: m5.4xlarge
EBS: gp2

  • Both the worker node and the EBS volume are in the same AZ (us-east-1c) (VolumeScheduling is now enabled in our StorageClass; previously it wasn't; nothing changed).
  • EBS volume is successfully created and mounted on the worker node
  • kubectl describe pv reports the volume as "Bound", events are empty
  • kubectl describe pvc reports as "Bound", events are empty
  • pod events:
    - MountVolume.WaitForAttach failed for volume "pvc-xxxxx" : could not find attached AWS Volume "aws://us-east-1c/vol-xxxxx". Timeout waiting for mount paths to be created
    - Unable to mount volumes for pod "xyz": timeout expired waiting for volumes to attach or mount for pod

@gabordk

gabordk commented Apr 29, 2019

After some debugging I found that my problem is the following; all the symptoms match:
coreos/bugs#2371

@hustshawn

@gabordk I got the same issue. Any solution or progress on this?

@k8s-ci-robot k8s-ci-robot added area/provider/aws Issues or PRs related to aws provider and removed sig/aws labels Aug 6, 2019
@dfang
Contributor

dfang commented Aug 10, 2019

Same issue here.

On microk8s with the default storage addon enabled, installing helm-consul hits this issue: all PVs and PVCs are Bound, but the consul-server pods still get "pod has unbound immediate PersistentVolumeClaims". Recreating these pods didn't help.

@jhoblitt

I believe I'm seeing the same issue on AWS with EKS 1.3.8 (1.3 "eks.2"):

  Warning  FailedMount             56s (x11 over 23m)  kubelet, ip-192-168-124-20.ec2.internal  Unable to mount volumes for pod "jenkins-6758665c4c-gg5tl_jenkins(f6440463-ca87-11e9-a31c-0a4da4f89c32)": timeout expired waiting for volumes to attach or mount for pod "jenkins"/"jenkins-6758665c4c-gg5tl". list of unmounted volumes=[jenkins-home]. list of unattached volumes=[plugins tmp jenkins-config plugin-dir secrets-dir jenkins-home sc-config-volume jenkins-token-pn7mq]

@ishantanu

@jhoblitt Did you try EKS 1.2.x? I faced some issues with StatefulSets with PVCs on EKS 1.3.x, but everything ran just fine on EKS 1.2.x.

@mogaal

mogaal commented Sep 4, 2019

@jhoblitt I was facing the same issue until 10 minutes ago. I realised there was a problem/bug with the kubernetes-plugin I was using. I solved it by upgrading the Jenkins kubernetes-plugin to 1.18.3.

@jhoblitt

jhoblitt commented Sep 4, 2019

@ishantanu I don't believe I was seeing this problem with 1.2, but it's been a while since I've tested with that version.

@mogaal This problem is present outside of pods managed by Jenkins.

@s2504s

s2504s commented Sep 12, 2019

@mogaal
Thank you!! I've upgraded the Jenkins kubernetes-plugin and it works!!!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 11, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 10, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sudip-moengage

/reopen

@k8s-ci-robot
Contributor

@sudip-moengage: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
