
[BUG]: PVC fails to resize with the message spec.capacity[storage]: Invalid value: "0": must be greater than zero #507

Closed
rajbaratht opened this issue Oct 12, 2022 · 8 comments
Labels: area/csi-unity, type/bug

@rajbaratht

Bug Description

Hello - I had to resize some of the PVCs in our environment, and I did so with the kubectl patch command:

for i in $(kubectl get pvc --no-headers | grep mongo | awk '{ print $1 }'); do kubectl patch pvc "$i" --type json -p='[{"op": "replace", "path": "/spec/resources/requests/storage", "value": "10Gi"}]'; done
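If the array struggles with many concurrent modify jobs (something the maintainers suspect later in this thread), one hypothetical way to avoid submitting them all at once is to patch one PVC at a time and wait for its reported capacity to catch up before moving on. A minimal sketch, assuming the PVCs are in the current namespace, that the size is given in the same unit the PVC status reports (e.g. 10Gi), and using a helper name (resize_sequentially) made up for illustration:

```shell
# Hypothetical helper (not part of kubectl or the CSI driver): resize PVCs whose
# names match a pattern one at a time, waiting for each before starting the next.
# Usage: resize_sequentially <name-pattern> <size> [timeout-seconds]
resize_sequentially() {
  local pattern=$1 size=$2 timeout=${3:-600} pvc capacity waited
  for pvc in $(kubectl get pvc --no-headers | grep -- "$pattern" | awk '{ print $1 }'); do
    # Merge patch that sets only spec.resources.requests.storage.
    kubectl patch pvc "$pvc" --type merge \
      -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}" || return 1
    waited=0
    while :; do
      # status.capacity.storage is updated once the resize has finished.
      capacity=$(kubectl get pvc "$pvc" -o jsonpath='{.status.capacity.storage}') || return 1
      [ "$capacity" = "$size" ] && break
      if [ "$waited" -ge "$timeout" ]; then
        echo "timed out waiting for $pvc (still $capacity)" >&2
        return 1
      fi
      sleep 10
      waited=$((waited + 10))
    done
    echo "$pvc resized to $capacity"
  done
}
```

For example, resize_sequentially mongo 10Gi would patch the mongo claims in turn and stop at the first one that does not reach 10Gi within ten minutes.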

The PVCs were originally 3Gi and we wanted to expand them, so I used the command above. However, some of the PVCs failed to resize with the following error:

  Type     Reason              Age                    From                                    Message
  ----     ------              ----                   ----                                    -------
  Normal   Resizing            5m17s (x616 over 15h)  external-resizer csi-unity.dellemc.com  External resizer is resizing volume staging1-p2-c5d9366fd8
  Warning  VolumeResizeFailed  4m26s (x615 over 15h)  external-resizer csi-unity.dellemc.com  updating capacity of PV "staging1-p2-xxxxx" to 0 failed: update capacity of PV staging1-p2-xxxxxxx failed: PersistentVolume "staging1-p2-xxxxxx" is invalid: spec.capacity[storage]: Invalid value: "0": must be greater than zero

kubectl describe pvc/mongod-data-percona-mongo-psmdb-d-cfg-1
Name:          mongod-data-percona-mongo-psmdb-d-cfg-1
Namespace:     data
StorageClass:  unity-iscsi
Status:        Bound
Volume:        staging1-p2-c5d9366fd8
Labels:        app.kubernetes.io/component=cfg
               app.kubernetes.io/instance=percona-mongo-psmdb-d
               app.kubernetes.io/managed-by=percona-server-mongodb-operator
               app.kubernetes.io/name=percona-server-mongodb
               app.kubernetes.io/part-of=percona-server-mongodb
               app.kubernetes.io/replset=cfg
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi-unity.dellemc.com
               volume.kubernetes.io/selected-node: s-cont-wkr3
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      3Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       percona-mongo-psmdb-d-cfg-1
Conditions:
  Type       Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----       ------  -----------------                 ------------------                ------  -------
  Resizing   True    Mon, 01 Jan 0001 00:00:00 +0000   Wed, 12 Oct 2022 09:25:46 -0400
Events:
  Type     Reason              Age                    From                                    Message
  ----     ------              ----                   ----                                    -------
  Normal   Resizing            7m43s (x623 over 15h)  external-resizer csi-unity.dellemc.com  External resizer is resizing volume staging1-p2-c5d9366fd8
  Warning  VolumeResizeFailed  6m53s (x621 over 15h)  external-resizer csi-unity.dellemc.com  updating capacity of PV "staging1-p2-c5d9366fd8" to 0 failed: update capacity of PV staging1-p2-c5d9366fd8 failed: PersistentVolume "staging1-p2-c5d9366fd8" is invalid: spec.capacity[storage]: Invalid value: "0": must be greater than zero

The PVC now appears to be stuck in the Resizing state:

kubectl get pvc/mongod-data-percona-mongo-psmdb-d-cfg-1 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: csi-unity.dellemc.com
    volume.kubernetes.io/selected-node: s-cont-wkr3
  creationTimestamp: "2022-06-24T00:43:46Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/component: cfg
    app.kubernetes.io/instance: percona-mongo-psmdb-d
    app.kubernetes.io/managed-by: percona-server-mongodb-operator
    app.kubernetes.io/name: percona-server-mongodb
    app.kubernetes.io/part-of: percona-server-mongodb
    app.kubernetes.io/replset: cfg
  name: mongod-data-percona-mongo-psmdb-d-cfg-1
  namespace: data
  resourceVersion: "254271009"
  uid: c5d9366f-d829-49f8-bf9a-37c8e3225891
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: unity-iscsi
  volumeMode: Filesystem
  volumeName: staging1-p2-c5d9366fd8
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 3Gi
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-10-12T13:25:46Z"
    status: "True"
    type: Resizing
  phase: Bound
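The YAML above shows the mismatch directly: spec.resources.requests.storage is 10Gi, status.capacity.storage is still 3Gi, and the Resizing condition is True. A small sketch for pulling out just those three fields for one PVC with kubectl's JSONPath output; the helper name pvc_resize_state and its wording are hypothetical:

```shell
# Hypothetical helper: summarize one PVC's resize state from three fields:
# the requested size, the size reported in status, and the Resizing condition.
pvc_resize_state() {
  local pvc=$1 out requested current resizing
  out=$(kubectl get pvc "$pvc" -o \
    jsonpath='{.spec.resources.requests.storage} {.status.capacity.storage} {.status.conditions[?(@.type=="Resizing")].status}') || return 1
  # Split the space-separated JSONPath output into its three fields.
  read -r requested current resizing <<EOF
$out
EOF
  if [ "$requested" = "$current" ]; then
    echo "$pvc: resized ($current)"
  elif [ "$resizing" = "True" ]; then
    echo "$pvc: still resizing ($current -> $requested)"
  else
    echo "$pvc: not resized yet ($current -> $requested)"
  fi
}
```

For the claim above, this would report it as still resizing from 3Gi to 10Gi.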

Logs

time="2022-10-12T13:21:10Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 0625: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:22:43Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 0626: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:22:43Z" level=info  runid=626 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:22:43Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 0626: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:24:30Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 0627: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:24:30Z" level=info  runid=627 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:24:30Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 0627: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:25:59Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 0628: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:25:59Z" level=info  runid=628 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:25:59Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 0628: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:15Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 3107: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:35Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 3108: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:35Z" level=info  runid=3108 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:26:35Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 3108: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:53Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 3109: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:53Z" level=info  runid=3109 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:26:53Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 3109: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:58Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 3110: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:58Z" level=info  runid=3110 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:26:58Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 3110: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:25:40Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 2483: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:25:45Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 2484: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:25:45Z" level=info  runid=2484 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:25:45Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 2484: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:25:56Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 2485: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:25:56Z" level=info  runid=2485 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:25:56Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 2485: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:54Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 2486: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:54Z" level=info  runid=2486 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:26:54Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 2486: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:48Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 7481: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:48Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 7482: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:48Z" level=info  runid=7482 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:26:48Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 7482: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:51Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 7483: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:51Z" level=info  runid=7483 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:26:51Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 7483: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:58Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REQ 7484: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-10-12T13:26:58Z" level=info  runid=7484 msg="Executing NodeGetCapabilities with args: {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}" func="github.com/dell/csi-unity/service.(*service).NodeGetCapabilities()" file="/go/src/csi-unity/service/node.go:854"
time="2022-10-12T13:26:58Z" level=info msg="/csi.v1.Node/NodeGetCapabilities: REP 7484: Capabilities=[rpc:<>  rpc:<type:STAGE_UNSTAGE_VOLUME >  rpc:<type:EXPAND_VOLUME >  rpc:<type:SINGLE_NODE_MULTI_WRITER > ], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"

Screenshots

[screenshot not preserved]

Additional Environment Information

As the output below shows, some of the PVCs resized successfully, but the rest failed and are stuck in Resizing:

kubectl get pvc | grep mongo
mongod-data-percona-mongo-psmdb-d-cfg-0       Bound    staging1-p2-7471dc86ce   10Gi       RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-cfg-1       Bound    staging1-p2-c5d9366fd8   3Gi        RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-cfg-2       Bound    staging1-p2-3e71a48265   3Gi        RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-rs0-0       Bound    staging1-p2-2aa2bbcf46   10Gi       RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-rs0-1       Bound    staging1-p2-222b5a90a0   10Gi       RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-rs0-2       Bound    staging1-p2-b72461b7f3   3Gi        RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-rs1-0       Bound    staging1-p2-bf52ecf372   3Gi        RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-rs1-1       Bound    staging1-p2-9e4d434e26   3Gi        RWO            unity-iscsi    110d
mongod-data-percona-mongo-psmdb-d-rs1-2       Bound    staging1-p2-0fc8e738e1   3Gi        RWO            unity-iscsi    110d
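Spotting the PVCs that did not reach the new size can be scripted from that same listing. A minimal sketch, assuming kubectl get pvc output in which every claim is Bound (an unbound claim has an empty VOLUME column, which shifts the columns awk sees); the helper name is hypothetical:

```shell
# Hypothetical helper: read `kubectl get pvc` output on stdin and print
# NAME, VOLUME and CAPACITY for claims matching a pattern whose CAPACITY
# column (the 4th field) differs from the expected size.
# Usage: kubectl get pvc --no-headers | pvcs_not_at_size mongo 10Gi
pvcs_not_at_size() {
  awk -v p="$1" -v s="$2" '$1 ~ p && $4 != s { print $1, $3, $4 }'
}
```

Fed the listing above with mongo and 10Gi, it would print the six claims still at 3Gi: cfg-1, cfg-2, rs0-2, rs1-0, rs1-1 and rs1-2.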

Steps to Reproduce

  1. Take a set of PVCs.
  2. Patch the PVCs to increase their size, using a command such as: for i in $(kubectl get pvc --no-headers | grep mongo | awk '{ print $1 }'); do kubectl patch pvc "$i" --type json -p='[{"op": "replace", "path": "/spec/resources/requests/storage", "value": "10Gi"}]'; done
  3. Check whether all the PVCs were resized successfully.
  4. Check the events of the PVCs that failed to resize.
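Step 4 can also be done for all claims at once by filtering events on their reason. A sketch using kubectl's standard --field-selector and custom-columns output; it assumes the PVCs are in the current namespace, and the helper name is hypothetical:

```shell
# Hypothetical helper: list VolumeResizeFailed events in the current namespace,
# one row per event, with the claim name, repeat count and message.
failed_resize_events() {
  kubectl get events --field-selector reason=VolumeResizeFailed \
    -o custom-columns=PVC:.involvedObject.name,COUNT:.count,MESSAGE:.message
}
```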

Expected Behavior

The PVCs should resize without any errors.

CSM Driver(s)

CSI Driver for Dell Unity XT v2.2
CSI Driver for Dell Unity XT v2.4

Installation Type

Helm

Container Storage Modules Enabled

No response

Container Orchestrator

Kubernetes v1.22.10

Operating System

RockyLinux 8.6

@rajbaratht added the needs-triage and type/bug labels on Oct 12, 2022
@rajendraindukuri added the area/csi-unity label and removed the needs-triage label on Oct 12, 2022
@rajendraindukuri self-assigned this on Oct 12, 2022
@coulof
Collaborator

coulof commented Oct 13, 2022

@dell team, please refer to the conversation here: https://dellemccsm.slack.com/archives/C025E763URH/p1665582178007659
It seems that the LUN was already at the correct size, but this was not reflected in the PVC/PV size.
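Checking whether the LUN really is already at the correct size means comparing the size the array reports against the PV's capacity, which Kubernetes expresses as a binary quantity such as 3Gi. A minimal sketch for converting such quantities to bytes, assuming only whole numbers with no suffix or with the Ki/Mi/Gi/Ti suffixes (decimal suffixes like G and fractional values are rejected); the function name is hypothetical:

```shell
# Hypothetical helper: convert a whole-number Kubernetes binary quantity
# (e.g. 3Gi, 512Mi, or a plain byte count) to bytes.
quantity_to_bytes() {
  local q=$1 num unit mult
  num=${q%%[!0-9]*}   # leading digits
  unit=${q#"$num"}    # everything after them
  case "$unit" in
    '') mult=1 ;;
    Ki) mult=1024 ;;
    Mi) mult=1048576 ;;
    Gi) mult=1073741824 ;;
    Ti) mult=1099511627776 ;;
    *) echo "unsupported quantity: $q" >&2; return 1 ;;
  esac
  if [ -z "$num" ]; then
    echo "unsupported quantity: $q" >&2
    return 1
  fi
  # Drop leading zeros so shell arithmetic does not read the number as octal.
  while [ "${#num}" -gt 1 ] && [ "${num#0}" != "$num" ]; do
    num=${num#0}
  done
  echo $((num * mult))
}
```

For example, quantity_to_bytes 10Gi prints 10737418240, which can be compared directly with a byte count taken from the array.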

@rajendraindukuri
Collaborator

Thanks for the info @coulof
@rajbaratht we observed similar behavior in one of our automation scenarios, where we retry the resize after a timeout. What happened there is that, because Unity API calls are asynchronous, by the time the retry was triggered the volume had already been expanded on the backend and the PV had been updated as needed. We saw this behavior on an array that was running many parallel operations (because the array was slow).
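The retry race described above is the situation the CSI spec has in mind when it requires ControllerExpandVolume to be idempotent: a repeated request for a volume that is already at least the target size should succeed, and the response still carries the volume's capacity after expansion. A hypothetical sketch of that decision, with sizes in bytes (it is not the csi-unity code, and this thread does not say what the eventual fix changed):

```shell
# Hypothetical illustration, not the csi-unity implementation: decide which
# capacity (in bytes) an idempotent expand call reports back to the resizer.
expand_capacity_to_report() {
  local current=$1 requested=$2
  if [ "$current" -ge "$requested" ]; then
    # The LUN is already big enough, e.g. because an earlier asynchronous
    # modify job finished before this retry: do nothing, but report the real size.
    echo "$current"
  else
    # A real driver would submit the expand request here and wait for it.
    echo "$requested"
  fi
}
```

If a driver reported 0 from the first branch instead, the resizer would try to write a capacity of 0 into the PV, which would match the rejected update in the VolumeResizeFailed events at the top of this issue.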

In your case, do you have any retry logic after a timeout when you see this behavior? Are you seeing it consistently?
Can you please share a screenshot of the Unity modify jobs showing how long they take to complete, along with the timeout (if any) after which you retry the operations? We will try to reproduce it.

Thanks
Rajendra

@rajbaratht
Author

@rajendraindukuri What do you mean by retry logic? I just resized a bunch of PVCs, and while a few resized successfully, most of the others failed with the error. We can observe this behaviour across a few different clusters.

[three screenshots not preserved]

@rajendraindukuri
Collaborator

@rajbaratht I just wanted to confirm whether you have any 'retry' logic (if you are using a script to run your operations).

From the screenshots above, it looks like modify LUN operations are failing on the Unity array. Can you please double-click any of the failed "modify LUN" jobs and send a screenshot, so we can understand at which step the modify LUN operation is failing?

Thanks
Rajendra

@gashof

gashof commented Oct 14, 2022

For internal tracking KRV-8768

@gashof

gashof commented Oct 17, 2022

Hi Raj,
I just wanted to check whether you saw my email about a call on Wednesday. Let me know.

@rajendraindukuri added this to the v1.5.0 milestone on Oct 20, 2022
@rajendraindukuri
Collaborator

Hi @rajbaratht

Based on the log details and the behavior in our environment, this does not seem to be an issue with the driver code.

So I raised an issue in the resizer repo: kubernetes-csi/external-resizer#226

I saw other similar issues that are still open in the resizer repo. We will have to wait for their response and take it forward from there.

Thanks
Rajendra

@rajendraindukuri
Collaborator

Hi @rajbaratht, a fix for this is provided in the nightly build and will be available in the upcoming CSI Unity driver version v2.6.0. Closing this for now. Please get back to us if you see any issues in your testing. Thanks
