
pv is stuck Terminating due to race condition when csi-attacher removes finalizer and csi-provisioner tries to remove another finalizer #1217

Open
andyzhangx opened this issue May 22, 2024 · 0 comments

What happened:
pv is stuck Terminating due to race condition when csi-attacher removes finalizer and csi-provisioner tries to remove another finalizer

  • symptom
    pv is stuck in Terminating when the pvc is deleted with the HonorPVReclaimPolicy feature gate enabled.

  • process
    csi-attacher removes its finalizer (e.g. external-attacher/disk-csi-azure-com) once the pv is detached. Later, when the pvc is deleted, csi-provisioner tries to remove the external-provisioner.volume.kubernetes.io/finalizer finalizer, but it operates on a stale pv object from its own cache, so the update keeps failing with a conflict until the maximum of 6 retries is exceeded. This leaves the pv in Terminating state forever (the underlying storage is deleted before the finalizer removal finally fails). The logs below show the sequence; a conflict-safe retry sketch is included after the workaround.

csi-attacher-disk	E0510 10:18:09.499513       1 csi_handler.go:701] Failed to remove finalizer from PV "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": PersistentVolume "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0" is invalid: metadata.finalizers: Forbidden: no new finalizers can be added if the object is being deleted, found new finalizers []string{"kubernetes.io/pv-protection"}

csi-attacher-disk	I0510 10:18:09.510077       1 csi_handler.go:706] Removed finalizer from PV "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0"

csi-provisioner-disk	I0510 10:18:09.466810       1 controller.go:1517] delete "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": volume deleted

csi-azuredisk-controller	I0510 10:18:09.466386       1 azure_managedDiskController.go:325] azureDisk - deleted a managed disk: /subscriptions/xxx/resourceGroups/icto-1019_npi-lcm-cn-lcm-npi-cluster-01-nodes/providers/Microsoft.Compute/disks/pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0

csi-provisioner-disk	I0510 10:18:09.489676       1 controller.go:1554] delete "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": failed to remove finalizer for persistentvolume: Operation cannot be fulfilled on persistentvolumes "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": the object has been modified; please apply your changes to the latest version and try again

csi-provisioner-disk	W0510 10:18:09.489714       1 controller.go:989] Retrying syncing volume "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0", failure 6

csi-provisioner-disk	E0510 10:18:09.489752       1 controller.go:1007] error syncing volume "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": Operation cannot be fulfilled on persistentvolumes "pvc-b1c64ae1-6310-4a6c-aa44-12c80c9981a0": the object has been modified; please apply your changes to the latest version and try again
  • workaround
    remove all finalizers from the pv manually and then delete the pv:
kubectl patch pv NAME -p '{"metadata":{"finalizers":null}}'
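kubectl delete pv NAME

The "Operation cannot be fulfilled ... the object has been modified" error in the provisioner log is an optimistic-concurrency conflict: the update was sent with the resourceVersion of a pv copy that had since been modified (per the report, by csi-attacher racing to remove its own finalizer). Below is a minimal, illustrative Go sketch, not the actual external-provisioner code, of the usual client-go pattern for this situation: re-read the object and retry on conflict with retry.RetryOnConflict. The package name, function name and the pvName/finalizer parameters are placeholders for illustration only.

// Illustrative sketch, not external-provisioner code: remove one finalizer
// from a PV while tolerating concurrent updates (e.g. csi-attacher removing
// its own finalizer at the same time).
package pvcleanup

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func removeFinalizer(ctx context.Context, cs kubernetes.Interface, pvName, finalizer string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-read the PV from the API server on every attempt so the update
		// carries the current resourceVersion instead of a stale cached one.
		pv, err := cs.CoreV1().PersistentVolumes().Get(ctx, pvName, metav1.GetOptions{})
		if err != nil {
			return err
		}
		// Drop only the requested finalizer, keeping any others.
		kept := pv.Finalizers[:0]
		for _, f := range pv.Finalizers {
			if f != finalizer {
				kept = append(kept, f)
			}
		}
		pv.Finalizers = kept
		_, err = cs.CoreV1().PersistentVolumes().Update(ctx, pv, metav1.UpdateOptions{})
		// A Conflict error here makes RetryOnConflict re-run the whole closure.
		return err
	})
}

The point of the sketch is only that each attempt re-reads the pv, so a concurrent finalizer removal by another sidecar turns into an extra retry instead of exhausting a fixed retry budget on a stale copy.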

/kind bug
cc @jsafrane

What you expected to happen:

How to reproduce it:

Anything else we need to know?:

Environment:

  • Driver version: v4.0.0
  • Kubernetes version (use kubectl version): 1.27
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
k8s-ci-robot added the kind/bug label on May 22, 2024