[BUG]: PowerScale Replication: Artifacts are not properly cleaned after deletion #523

Closed
ChristianAtDell opened this issue Oct 27, 2022 · 2 comments
Labels: area/csi-powerscale, area/csm-replication, type/bug

Comments

@ChristianAtDell

Bug Description

Cleanup on both Kubernetes clusters and on the PowerScale storage arrays is incomplete when the source PVC is deleted. Extensive manual cleanup is required between PV provisioning runs that use replication storage classes:

  • RGs persist indefinitely and cannot be easily deleted, because their finalizers block deletion. kubectl patch operations cannot remove these finalizers either, so every time a PV is removed for a fresh run, kubectl edit must be run on the target RG and the finalizers lines removed by hand. The retention policy for the remote RG is not honored because its finalizers must be removed manually, and there is no parameter for a source RG retention policy, only for the remote one.
  • The target PV should be removed when the source PV is deleted and the retention policy is set to Delete, but this removal is inconsistent. When it is not removed automatically, which happens often, its finalizers must also be edited out, or kubectl delete leaves it in TERMINATING state indefinitely.
  • SyncIQ policies on the storage arrays persist indefinitely, even after the RGs that created them are deleted.
  • Subdirectories (the SyncIQ policy directory and the specific PV subdirectory) remain undeleted on the arrays. Occasionally, particularly when the related PVs are deleted by manually removing their finalizers, these subdirectories become permanently locked on the array. SSHing into the array to delete them manually as admin returns 'read only filesystem' or 'operation not permitted', leaving artifacts that cannot be deleted even by hand.
  • If RGs, SyncIQ policies, and subdirectories are not properly scrubbed from the arrays and Kubernetes clusters between runs, subsequent PV provisioning is error-prone. Most frequently, no remote PV gets created. Occasionally, the target RG stays at status UNKNOWN even after the source RG updates to status SYNCHRONIZED.
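A rough sketch of how the leftovers listed above could be gathered into one report before the next provisioning run. This is only an illustration: the function name, the dictionary keys, and the idea of collecting names into lists by hand are all assumptions, not part of the driver or the replication module.

```python
from typing import Dict, List

# Hypothetical artifact categories, taken from the bullets above.
ARTIFACT_KINDS = (
    "replication_groups",
    "persistent_volumes",
    "synciq_policies",
    "array_directories",
)


def leftover_artifacts(observed: Dict[str, List[str]]) -> List[str]:
    """Flatten observed leftovers into 'kind: name' lines in a fixed kind order.

    `observed` maps a category to the names an operator found still present
    after the source PVC was deleted. Unknown categories are ignored.
    """
    report = []
    for kind in ARTIFACT_KINDS:
        for name in sorted(observed.get(kind, [])):
            report.append(f"{kind}: {name}")
    return report
```

An empty report would mean none of the four artifact types were observed, which is the state a fresh run appears to need.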

Logs

Storage Class on Source Cluster:

[root@master-1-xzmwDvByBkVAf powerscale]# k get sc isilon-rep-repctl -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"creationTimestamp":"2022-10-24T21:38:48Z","name":"isilon-rep-repctl","resourceVersion":"10943211","uid":"2ec9796e-ca79-4b4f-b419-65e4a034b7bc"},"mountOptions":["nolock","nfsvers=4.1"],"parameters":{"AccessZone":"System","ClusterName":"N94","IsiPath":"/ifs/data/csi/ccoff/sub/dir2","RootClientEnabled":"false","replication.storage.dell.com/ignoreNamespaces":"false","replication.storage.dell.com/isReplicationEnabled":"true","replication.storage.dell.com/remoteClusterID":"rep-k8s-2","replication.storage.dell.com/remotePVRetentionPolicy":"Delete","replication.storage.dell.com/remoteRGRetentionPolicy":"Delete","replication.storage.dell.com/remoteStorageClassName":"isilon-rep-repctl","replication.storage.dell.com/remoteSystem":"PIE-Isilon-X","replication.storage.dell.com/rpo":"Five_Minutes","replication.storage.dell.com/volumeGroupPrefix":"csi-ccoff"},"provisioner":"csi-isilon.dellemc.com","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
  creationTimestamp: "2022-10-26T20:47:39Z"
  name: isilon-rep-repctl
  resourceVersion: "11777024"
  uid: d3779c9f-02e0-4e7e-98a6-25281af579f6
mountOptions:
- nolock
- nfsvers=4.1
parameters:
  AccessZone: System
  ClusterName: N94
  IsiPath: /ifs/data/csi/ccoff/sub/dir2
  RootClientEnabled: "false"
  replication.storage.dell.com/ignoreNamespaces: "false"
  replication.storage.dell.com/isReplicationEnabled: "true"
  replication.storage.dell.com/remoteClusterID: rep-k8s-2
  replication.storage.dell.com/remotePVRetentionPolicy: Delete
  replication.storage.dell.com/remoteRGRetentionPolicy: Delete
  replication.storage.dell.com/remoteStorageClassName: isilon-rep-repctl
  replication.storage.dell.com/remoteSystem: PIE-Isilon-X
  replication.storage.dell.com/rpo: Five_Minutes
  replication.storage.dell.com/volumeGroupPrefix: csi-ccoff
provisioner: csi-isilon.dellemc.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Storage Class on Target Cluster:

[root@master-1-EUvyrU6w5umnT ~]# k get sc isilon-rep-repctl -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"creationTimestamp":"2022-10-24T21:38:48Z","name":"isilon-rep-repctl","resourceVersion":"31907","uid":"269e4d99-477a-4a83-bfc6-946b61e88b18"},"mountOptions":["nolock","nfsvers=4.1"],"parameters":{"AccessZone":"System","ClusterName":"PIE-Isilon-X","IsiPath":"/ifs/data/csi/ccoff/sub/dir2","RootClientEnabled":"false","replication.storage.dell.com/ignoreNamespaces":"false","replication.storage.dell.com/isReplicationEnabled":"true","replication.storage.dell.com/remoteClusterID":"ccoff-k8s","replication.storage.dell.com/remotePVRetentionPolicy":"Delete","replication.storage.dell.com/remoteRGRetentionPolicy":"Delete","replication.storage.dell.com/remoteStorageClassName":"isilon-rep-repctl","replication.storage.dell.com/remoteSystem":"N94","replication.storage.dell.com/rpo":"Five_Minutes","replication.storage.dell.com/volumeGroupPrefix":"csi-ccoff"},"provisioner":"csi-isilon.dellemc.com","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
  creationTimestamp: "2022-10-26T20:48:15Z"
  name: isilon-rep-repctl
  resourceVersion: "645017"
  uid: 16b34211-4b24-446b-bba7-c161a252f8c0
mountOptions:
- nolock
- nfsvers=4.1
parameters:
  AccessZone: System
  ClusterName: PIE-Isilon-X
  IsiPath: /ifs/data/csi/ccoff/sub/dir2
  RootClientEnabled: "false"
  replication.storage.dell.com/ignoreNamespaces: "false"
  replication.storage.dell.com/isReplicationEnabled: "true"
  replication.storage.dell.com/remoteClusterID: ccoff-k8s
  replication.storage.dell.com/remotePVRetentionPolicy: Delete
  replication.storage.dell.com/remoteRGRetentionPolicy: Delete
  replication.storage.dell.com/remoteStorageClassName: isilon-rep-repctl
  replication.storage.dell.com/remoteSystem: N94
  replication.storage.dell.com/rpo: Five_Minutes
  replication.storage.dell.com/volumeGroupPrefix: csi-ccoff
provisioner: csi-isilon.dellemc.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Creation and verification of PV/RG on source cluster:

[root@master-1-xzmwDvByBkVAf powerscale]# k apply -f ./tools/kubectl_yamls/pvc-rep.yaml
persistentvolumeclaim/test-pvc created
[root@master-1-xzmwDvByBkVAf powerscale]# k get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS        REASON   AGE
k8s-8e782008ed   5Gi        RWO            Delete           Bound    default/test-pvc   isilon-rep-repctl            4m24s
[root@master-1-xzmwDvByBkVAf powerscale]# k get rg
NAME                                      AGE     STATE   PG ID                                                                             LINK STATE     LAST LINKSTATE UPDATE
rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a   3m24s   Ready   N94::/ifs/data/csi/ccoff/sub/dir2/csi-ccoff-default-<IP>-Five_Minutes/   SYNCHRONIZED   2022-10-27T17:15:46Z

Verification of PV/RG on target cluster:

[root@master-1-EUvyrU6w5umnT ~]# k get rg
NAME                                      AGE    STATE   PG ID                                                                                      LINK STATE     LAST LINKSTATE UPDATE
rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a   5m3s   Ready   PIE-Isilon-X::/ifs/data/csi/ccoff/sub/dir2/csi-ccoff-default-<IP>-Five_Minutes/   SYNCHRONIZED   2022-10-27T17:17:30Z

Deletion of PVC on source cluster, then verification that the RG's LINK STATE shows EMPTY:

[root@master-1-xzmwDvByBkVAf powerscale]# k delete pvc test-pvc
persistentvolumeclaim "test-pvc" deleted
[root@master-1-xzmwDvByBkVAf powerscale]# k get pv
No resources found
[root@master-1-xzmwDvByBkVAf powerscale]# k get rg
NAME                                      AGE     STATE   PG ID                                                                             LINK STATE   LAST LINKSTATE UPDATE
rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a   8m36s   Ready   N94::/ifs/data/csi/ccoff/sub/dir2/csi-ccoff-default-<IP>-Five_Minutes/   EMPTY        2022-10-27T17:21:43Z

Verification that PV has been deleted on target cluster:

[root@master-1-EUvyrU6w5umnT ~]# k get pv
No resources found
[root@master-1-EUvyrU6w5umnT ~]# k get rg
NAME                                      AGE     STATE   PG ID                                                                                      LINK STATE   LAST LINKSTATE UPDATE
rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a   8m45s   Ready   PIE-Isilon-X::/ifs/data/csi/ccoff/sub/dir2/csi-ccoff-default-<IP>-Five_Minutes/   EMPTY        2022-10-27T17:21:28Z
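
To spot RGs left in LINK STATE EMPTY without reading the table by eye, something like the following could work. It is a minimal sketch that assumes the six-column layout shown in the output above and that no cell contains whitespace; the function names are made up for illustration.

```python
from typing import Dict, List


def parse_rg_rows(table: str) -> List[Dict[str, str]]:
    """Parse `kubectl get rg` table text into one dict per replication group.

    Assumes the six columns from the logs (NAME, AGE, STATE, PG ID,
    LINK STATE, LAST LINKSTATE UPDATE). Rows that do not split into
    exactly six whitespace-separated fields are skipped.
    """
    rows = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue
        name, age, state, pg_id, link_state, updated = fields
        rows.append({
            "name": name,
            "age": age,
            "state": state,
            "pg_id": pg_id,
            "link_state": link_state,
            "updated": updated,
        })
    return rows


def empty_groups(table: str) -> List[str]:
    """Return the names of RGs whose LINK STATE is EMPTY."""
    return [row["name"] for row in parse_rg_rows(table) if row["link_state"] == "EMPTY"]
```

The table text would come from saving the `k get rg` output on each cluster; the sketch does not call kubectl itself.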

Deletion of RG on source cluster (this hangs indefinitely):

[root@master-1-xzmwDvByBkVAf powerscale]# k delete rg rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a
dellcsireplicationgroup.replication.storage.dell.com "rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a" deleted

Check RG status on target cluster:

[root@master-1-EUvyrU6w5umnT ~]# k get rg
NAME                                      AGE   STATE      PG ID                                                                                      LINK STATE   LAST LINKSTATE UPDATE
rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a   24m   Deleting   PIE-Isilon-X::/ifs/data/csi/ccoff/sub/dir2/csi-ccoff-default-<IP>-Five_Minutes/   EMPTY        2022-10-27T17:37:28Z

At this point, I have to run kubectl edit on the target RG and delete its finalizers, or it never finishes deleting. Once the finalizers are gone, the target RG deletes on its own, with no further input or manual deletion. The source RG then completes its own deletion, so the stuck target RG is what holds up the source.
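
For reference, the manual edit amounts to dropping the finalizers list from the RG's metadata. The sketch below models that change on a manifest loaded as a Python dict; it does not talk to the cluster, and the finalizer string in the usage example is a placeholder rather than the real finalizer name.

```python
import copy


def strip_finalizers(manifest: dict) -> dict:
    """Return a copy of a Kubernetes object manifest without metadata.finalizers.

    The input is left unchanged. A manifest with no metadata or no
    finalizers is returned as an identical copy.
    """
    cleaned = copy.deepcopy(manifest)
    metadata = cleaned.get("metadata")
    if isinstance(metadata, dict):
        metadata.pop("finalizers", None)
    return cleaned


# Usage with a hypothetical finalizer name (not taken from the driver):
target_rg = {
    "kind": "DellCSIReplicationGroup",
    "metadata": {
        "name": "rg-2e28e5c0-34a5-4799-b679-b0e6eb48c47a",
        "finalizers": ["example.com/hypothetical-finalizer"],
    },
}
edited = strip_finalizers(target_rg)
```

This only shows what the edited object looks like. Removing finalizers by hand skips whatever cleanup they were guarding, which may be related to the locked subdirectories described in the bug description.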

SyncIQ policies on the storage arrays are NOT deleted after these events. The policies persist on both the source and target arrays.

Screenshots

No response

Additional Environment Information

No response

Steps to Reproduce

See logs for full steps.

Expected Behavior

SyncIQ policies should be removed when their relevant RG is deleted.
Remote RG should not require manual deletion of finalizers.
If a PV is deleted manually from the clusters, its related subdirectory on the storage array should never become permanently locked and undeletable by the storage admin.
The local RG could use a new retention policy parameter and auto-delete when its LINK STATE becomes EMPTY, though that is more a feature request than a bug.
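
The local RG suggestion could be expressed as a simple rule. This is a sketch of the requested behavior only: `local_retention_policy` is a hypothetical parameter that does not exist today, and the "Retain" value is assumed by analogy with the "Delete" value used for remoteRGRetentionPolicy in the storage classes above.

```python
def should_delete_local_rg(link_state: str, local_retention_policy: str) -> bool:
    """Decide whether a local RG would be auto-deleted under the proposed rule.

    Deletes only when the RG no longer protects any volume (LINK STATE is
    EMPTY) and the hypothetical local retention policy is "Delete".
    """
    return link_state == "EMPTY" and local_retention_policy == "Delete"
```

Under this rule an RG that is still SYNCHRONIZED is never removed, and any policy value other than "Delete" keeps the current behavior.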

CSM Driver(s)

CSI Driver for PowerScale v2.4

Installation Type

Helm v3

Container Storage Modules Enabled

Replication v1.3.0

Container Orchestrator

K8s 1.24.3

Operating System

RHEL 8.4

@ChristianAtDell added the needs-triage and type/bug labels on Oct 27, 2022
@harshaatdell added the area/csm-replication, area/csi-powerscale, and backlog labels and removed the needs-triage label on Oct 27, 2022
@santhoshatdell
Contributor

The workaround is to manually delete the RGs and PVs on the k8s clusters, and the SyncIQ policies and subdirectories on the PowerScale arrays, if they are not removed properly.

@ChristianAtDell
Author

Issue is closed with the above PRs.

csmbot pushed a commit that referenced this issue Aug 1, 2023
Rollback to the Old Architecture Image for Better Resolution