Application sync stuck when using PVC with WaitForFirstConsumer storage binding mode #12840

Open
3 tasks done
yohancourbe opened this issue Mar 13, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@yohancourbe

yohancourbe commented Mar 13, 2023

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

When deploying an app that contains a PVC whose StorageClass has volumeBindingMode set to WaitForFirstConsumer, the application sync gets stuck in a deadlock: Argo CD won't create the Pod until the PVC is healthy, but the PVC stays Pending (which Argo CD reports as Progressing) until a Pod that uses the claim has been created.

See https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode

To Reproduce

StorageClass

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3-csi
provisioner: ebs.csi.aws.com
parameters:
  encrypted: 'true'
  type: gp3
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```
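The StorageClass alone is not a complete reproduction. Below is a minimal sketch of an application that would be expected to hang with it, assuming the PVC is placed in an earlier sync wave than the Deployment that mounts it, so that Argo CD waits for the claim to become Healthy before applying the Pod's owner. The resource names, the `busybox` image and the wave numbers are hypothetical and not taken from the report.

```yaml
# Hypothetical reproduction: the PVC is in sync wave 0 and its only
# consumer is in wave 1, so Argo CD must see the PVC Healthy first.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
  annotations:
    argocd.argoproj.io/sync-wave: "0"
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3-csi   # the WaitForFirstConsumer class above
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  annotations:
    argocd.argoproj.io/sync-wave: "1"
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: busybox:1.36
          # Keep the container running so the volume stays mounted.
          command: ["sh", "-c", "while true; do sleep 3600; done"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data
```

With the `gp3-csi` class, the `data` claim should remain Pending because no Pod consumes it yet, so under this assumption wave 1 is never reached.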

Expected behavior

Argo CD should break this deadlock by continuing with Pod creation when the PVC is Pending because its StorageClass uses WaitForFirstConsumer.

Screenshots

n/a

Version

```json
{
    "Version": "v2.6.1+3f143c9",
    "BuildDate": "2023-02-08T18:51:05Z",
    "GitCommit": "3f143c9307f99a61bf7049a2b1c7194699a7c21b",
    "GitTreeState": "clean",
    "GoVersion": "go1.18.10",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KustomizeVersion": "v4.5.7 2022-08-02T16:35:54Z",
    "HelmVersion": "v3.10.3+g835b733",
    "KubectlVersion": "v0.24.2",
    "JsonnetVersion": "v0.19.1"
}
```

Logs

[Screenshot of logs; image not preserved]

@yohancourbe yohancourbe added the bug Something isn't working label Mar 13, 2023
@yohancourbe yohancourbe changed the title Application sync stuck when using WaitForFirstConsumer storage binding mode Application sync stuck when using PVC with WaitForFirstConsumer storage binding mode Mar 13, 2023
@yohancourbe (Author)

A partial workaround is to set the StorageClass volumeBindingMode to Immediate, but as the Kubernetes documentation warns:

The Immediate mode indicates that volume binding and dynamic provisioning occurs once the PersistentVolumeClaim is created. For storage backends that are topology-constrained and not globally accessible from all Nodes in the cluster, PersistentVolumes will be bound or provisioned without knowledge of the Pod's scheduling requirements. This may result in unschedulable Pods.
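For comparison, the Immediate workaround would look like the reporter's StorageClass with only the binding mode changed. This is a sketch: the name `gp3-csi-immediate` is hypothetical, and a separate class is used because `volumeBindingMode` cannot be changed on an existing StorageClass. Since EBS volumes live in a single availability zone, this is exactly the topology-constrained case the quoted warning describes.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3-csi-immediate   # hypothetical name for the new class
provisioner: ebs.csi.aws.com
parameters:
  encrypted: 'true'
  type: gp3
reclaimPolicy: Delete
allowVolumeExpansion: true
# Provision and bind as soon as the PVC is created, without waiting
# for a Pod to be scheduled (the volume's zone is chosen blindly).
volumeBindingMode: Immediate
```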

@apa64

apa64 commented Jun 27, 2023

Is there a workaround on the Argo CD side, such as a delayed or relaxed wait for Ready?

@gnunn1

gnunn1 commented Jun 30, 2023

You can customize the PersistentVolumeClaim health check (the resource.customizations key of the argocd-cm ConfigMap) so that a claim in the Pending phase is considered Healthy, just like one that is Bound.

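```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Argo CD reads resource customizations from argocd-cm; the namespace
  # assumes a default install in "argocd".
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  resource.customizations: |
    PersistentVolumeClaim:
      health.lua: |
        hs = {}
        if obj.status ~= nil and obj.status.phase ~= nil then
          -- With WaitForFirstConsumer, a claim stays Pending until a Pod
          -- uses it, so treat Pending like Bound to let the sync continue.
          if obj.status.phase == "Pending" or obj.status.phase == "Bound" then
            hs.status = "Healthy"
            hs.message = obj.status.phase
            return hs
          end
          -- A Lost claim no longer has its underlying volume.
          if obj.status.phase == "Lost" then
            hs.status = "Degraded"
            hs.message = obj.status.phase
            return hs
          end
        end
        hs.status = "Progressing"
        hs.message = "Waiting for PVC status"
        return hs
```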

Projects
None yet
Development

No branches or pull requests

3 participants