
argocd progressing state "forever" #5620

Open
KlavsKlavsen opened this issue Feb 26, 2021 · 21 comments · May be fixed by #11901
Labels
bug Something isn't working

Comments

@KlavsKlavsen

If you are trying to resolve an environment-specific issue, or have a one-off question about an edge case that does not require a feature, please consider asking in the Argo CD Slack channel instead.

Checklist:

  • [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • [x] I've included steps to reproduce the bug.
  • [x] I've pasted the output of argocd version.

Describe the bug

After installing the kube-prometheus-stack Helm chart via Argo CD (on a small microk8s cluster), it runs perfectly fine.
I had initially set a wrong storage class name, so the PVC was stuck. I fixed the name and had to manually delete the stuck PVC (I had thought Argo CD would handle that, or at least flag it as an issue; I didn't see anything about it in the Argo CD UI).
All instances are now running, and in k9s the pods look fine, healthy and responding.
The Argo CD UI says "Synced", but the health check has been stuck in "Progressing" for hours and has not finished yet.

Where do I see details on this? I see no errors in the logs from argocd-server or the application-controller. Shouldn't Argo CD react to a health check that hangs/spins for this long? Or maybe it's a UI problem showing the wrong status? (Reloading does not change the status, though.)

To Reproduce

Install the kube-prometheus-stack chart v13.13.0 with the Prometheus PVC enabled and a wrong storage class name, then correct the class name once the installation fails (I guess).
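
For illustration only, a minimal values sketch that would reproduce the stuck PVC, assuming the chart's usual prometheusSpec.storageSpec layout (the storage class name below is a deliberate placeholder that does not exist on the cluster):

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          # Placeholder class missing from the cluster: the PVC stays Pending
          # and the application stays in Progressing.
          storageClassName: does-not-exist
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi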

Expected behavior

Argo CD should show some sort of failure on the health check, with details on what is wrong, instead of never timing out.

Screenshots

If you want a screenshot of the health spinner, say so. :)
The other applications in the same Argo CD instance all show green health just fine, and Argo CD itself seems to run fine.

Version

argocd: v1.8.4+28aea3d
  BuildDate: 2021-02-05T17:54:42Z
  GitCommit: 28aea3dfdede00443b52cc584814d80e8f896200
  GitTreeState: clean
  GoVersion: go1.14.12
  Compiler: gc
  Platform: linux/amd64
FATA[0000] Argo CD server address unspecified

Logs

Nothing seems relevant; I see no errors.

@KlavsKlavsen KlavsKlavsen added the bug Something isn't working label Feb 26, 2021
@KlavsKlavsen
Author

KlavsKlavsen commented Feb 26, 2021

I just rolled out the same setup to a new cluster in AWS, with auto-sync disabled. It is spinning on sync status 'Unknown', and the spinner is confusing: is it doing something or not? This time it actually shows an error. I will see if it behaves better here after I press Sync.

@KlavsKlavsen
Author

As soon as I fixed the error (a missing Helm repo in the Argo CD repositories), it switched to Missing and OutOfSync as it should.
So on slow (local microk8s) clusters, Argo CD seems not to handle timeouts very well, even though the cluster itself responds just fine, and it never recovers from this state.

@aslafy-z
Contributor

aslafy-z commented May 26, 2021

I see roughly the same behavior with v2.0.1+33eaf11, where kube-prometheus-stack stays in the Synced state for a few seconds and then re-syncs. Health does the same and is almost constantly in the Progressing state. kube-prometheus-stack itself is deployed correctly and works nicely.

@marksugar

My environment uses manual sync, but occasionally there is a ComparisonError: rpc error: code = Unknown desc = ssh: handshake failed: read tcp 10.244.1.188:33662->GITLAB:22: read: connection reset by peer. That seems to be a problem with my GitLab network.

There is no problem with the kustomize configuration, but APP HEALTH is always Progressing.
Does the ARGOCD_CLUSTER_CACHE_RESYNC_DURATION=1hr setting mentioned in #4053 (comment) mean the status is automatically re-synced every hour?
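
As far as I understand (an assumption, not confirmed in this thread), ARGOCD_CLUSTER_CACHE_RESYNC_DURATION controls how often the application controller fully invalidates and rebuilds its cluster cache, not a per-application sync interval. A hedged patch sketch for setting it on the controller, assuming a default install in the argocd namespace:

# Patch fragment for the argocd-application-controller StatefulSet (sketch only).
spec:
  template:
    spec:
      containers:
      - name: argocd-application-controller
        env:
        # Assumed to take a Go-style duration, e.g. "1h" for an hourly cache resync.
        - name: ARGOCD_CLUSTER_CACHE_RESYNC_DURATION
          value: "1h"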

@brsolomon-deloitte
Contributor

Seeing a very simple Kibana app stuck in 'Progressing' on ArgoCD v2.1.3+d855831.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kibana
spec:
  destination:
    name: ''
    namespace: elasticsearch
    server: 'https://kubernetes.default.svc'
  source:
    path: ''
    repoURL: 'https://charts.bitnami.com/bitnami'
    targetRevision: 9.0.8
    chart: kibana
    helm:
      parameters:
        - name: elasticsearch.security.auth.enabled
          value: 'true'
        - name: elasticsearch.security.auth.kibanaPassword
          value: changeme
        - name: elasticsearch.security.tls.verificationMode
          value: none
  project: default
  syncPolicy:
    syncOptions:
      - CreateNamespace=true

@javens0601

I'm having the same problem. Is there a way to get this stuck status through the API?
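
For what it's worth, the overall and per-resource health are also exposed on the Application custom resource itself (and therefore via the Kubernetes API or Argo CD's application endpoints), so the stuck state can be read without the UI. A trimmed fragment of an Application object showing where the fields live:

status:
  health:
    status: Progressing        # overall application health
  resources:
  - kind: Prometheus
    name: k8s
    health:
      status: Progressing      # health of the individual resource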

@vicky4u65

Facing the same issue with a GitHub runner. Everything inside the cluster seems to be running and healthy, and the Argo CD UI also says Healthy and Synced, but the pod status keeps saying Progressing.
Argo CD v2.4.11

@marjun786

I have the same issue. Is there any solution?

@Peter-Barrett

Same here, a solution would be great.

@ilyassikai

ilyassikai commented Nov 15, 2022

Same here.
Is there any solution?

@pbtrudel

Same here, with the Kiali Operator and CR, on the latest Argo CD version, v2.5.2+148d8da.

@mihstaub

mihstaub commented Dec 8, 2022

Same here, with the Kiali Operator and CR, on the latest Argo CD version, v2.5.2+148d8da.

@pseymournutanix

FWIW, I am seeing the same issue with a Prometheus setup on v2.5.4.

@pseymournutanix

This is also happening on the live demo site :) https://cd.apps.argoproj.io/applications/prometheus-operator?operation=false&resource=kind%3APrometheus

@pseymournutanix

pseymournutanix commented Dec 16, 2022

Output from the argocd app get CLI:

monitoring.coreos.com      Prometheus                monitoring   k8s                                        Synced   Progressing

The status of the resource in the Application object shows:

  - group: monitoring.coreos.com
    health:
      message: Waiting for initialization
      status: Progressing
    kind: Prometheus
    name: k8s
    namespace: monitoring
    status: Synced
    version: v1

@dennbagas

dennbagas commented Dec 19, 2022

Same issue here with Prometheus Operator v0.52.0 using ArgoCD v2.5.4+86b2dde

Edit: a couple of minutes after the sync, the state became Unknown, and the error says: error calculating structured merge diff: error calculating diff: error while running updater.Apply: converting (v1.Ingress) to (v1beta1.Ingress): unknown conversion
[screenshot]

@jhanbo

jhanbo commented Jan 5, 2023

In our case we were using kube-prometheus-stack Helm chart version 30.2.0, where the status sub-resource is not enabled in the Prometheus CRD:
https://github.com/prometheus-community/helm-charts/blob/f9140a1a9f929964e96e62818368d2ae9f54b1ab/charts/kube-prometheus-stack/crds/crd-prometheuses.yaml#L8301

Argo CD uses the status section of the Prometheus custom resource to check whether it is healthy:

https://github.com/argoproj/argo-cd/blob/master/resource_customizations/monitoring.coreos.com/Prometheus/health.lua

Upgrading the kube-prometheus-stack Helm chart to version 43.2.1, where the status sub-resource is enabled, resolved this issue:

https://github.com/prometheus-community/helm-charts/blob/00901c5fd431052239b35ead6659a9c083e71b3f/charts/kube-prometheus-stack/crds/crd-prometheuses.yaml#L8865
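
For anyone stuck on an older chart where upgrading is not an option, Argo CD's health assessment for a resource kind can be overridden in the argocd-cm ConfigMap. The Lua below is only an illustrative sketch (not the upstream health.lua) that treats a Prometheus object with no status section as healthy instead of leaving it Progressing forever:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  resource.customizations.health.monitoring.coreos.com_Prometheus: |
    hs = {}
    -- With the status sub-resource disabled on the CRD, .status never appears,
    -- so a status-based check can never see the resource become healthy.
    if obj.status == nil then
      hs.status = "Healthy"
      hs.message = "No status reported (status sub-resource disabled on the CRD)"
      return hs
    end
    if obj.status.availableReplicas ~= nil and obj.status.availableReplicas > 0 then
      hs.status = "Healthy"
      hs.message = "Prometheus replicas are available"
      return hs
    end
    hs.status = "Progressing"
    hs.message = "Waiting for Prometheus to report status"
    return hs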

@aslafy-z
Contributor

aslafy-z commented Jan 5, 2023

I've seen the same issue and fixed it that way: #11782.
I had to maintain old versions for a specific use case and will be able to upgrade soon.

@EsDmitrii

EsDmitrii commented May 31, 2023

The same for me on Argo CD v2.4.10+2ccc17a.
The app health stays in "Progressing" for a long time, but everything is fine, available, and working.

@lindhe
Contributor

lindhe commented Aug 31, 2023

I'm also stuck with this, for a very simple dummy application.

Argo CD version: 2.8.0.

Here's the app object:

Progressing application

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"argoproj.io/v1alpha1","kind":"Application","metadata":{"annotations":{},"finalizers":["resources-finalizer.argocd.argoproj.io"],"name":"app","namespace":"foo"},"spec":{"destination":{"name":"my-cluster","namespace":"foo"},"project":"foo","source":{"path":"manifests/app","repoURL":"git@git.example.com:foo/bar.git","targetRevision":"main"},"syncPolicy":{"automated":{"prune":true,"selfHeal":true}}}}
  creationTimestamp: "2023-08-30T13:50:14Z"
  finalizers:
  - resources-finalizer.argocd.argoproj.io
  generation: 412
  name: app
  namespace: foo
  resourceVersion: "61647168"
  uid: 4cdab6b5-926a-4c87-baec-a9623c2e8928
spec:
  destination:
    name: my-cluster
    namespace: foo
  project: foo
  source:
    path: manifests/app
    repoURL: git@git.example.com:foo/bar.git
    targetRevision: main
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
status:
  controllerNamespace: argocd
  health:
    status: Progressing
  history:
  - deployStartedAt: "2023-08-30T16:35:17Z"
    deployedAt: "2023-08-30T16:35:17Z"
    id: 0
    revision: 3f2e52c5ae1d9fa94d470d915e2556c94f3bff94
    source:
      path: manifests/app
      repoURL: git@git.example.com:foo/bar.git
      targetRevision: main
  operationState:
    finishedAt: "2023-08-30T16:35:17Z"
    message: successfully synced (all tasks run)
    operation:
      initiatedBy:
        automated: true
      retry:
        limit: 5
      sync:
        prune: true
        revision: 3f2e52c5ae1d9fa94d470d915e2556c94f3bff94
    phase: Succeeded
    startedAt: "2023-08-30T16:35:17Z"
    syncResult:
      resources:
      - group: ""
        hookPhase: Running
        kind: PersistentVolumeClaim
        message: persistentvolumeclaim/dbdisk-pvc configured
        name: dbdisk-pvc
        namespace: foo
        status: Synced
        syncPhase: Sync
        version: v1
      - group: ""
        hookPhase: Running
        kind: Service
        message: service/mariadb configured
        name: mariadb
        namespace: foo
        status: Synced
        syncPhase: Sync
        version: v1
      - group: ""
        hookPhase: Running
        kind: Service
        message: service/webserver configured
        name: webserver
        namespace: foo
        status: Synced
        syncPhase: Sync
        version: v1
      - group: ""
        hookPhase: Running
        kind: Pod
        message: pod/dbadmin configured
        name: dbadmin
        namespace: foo
        status: Synced
        syncPhase: Sync
        version: v1
      - group: ""
        hookPhase: Running
        kind: Pod
        message: pod/webserver configured
        name: webserver
        namespace: foo
        status: Synced
        syncPhase: Sync
        version: v1
      - group: ""
        hookPhase: Running
        kind: Pod
        message: pod/db1 configured
        name: db1
        namespace: foo
        status: Synced
        syncPhase: Sync
        version: v1
      - group: networking.k8s.io
        hookPhase: Running
        kind: Ingress
        message: ingress.networking.k8s.io/foo configured
        name: foo
        namespace: foo
        status: Synced
        syncPhase: Sync
        version: v1
      revision: 3f2e52c5ae1d9fa94d470d915e2556c94f3bff94
      source:
        path: manifests/app
        repoURL: git@git.example.com:foo/bar.git
        targetRevision: main
  reconciledAt: "2023-08-31T09:20:34Z"
  resources:
  - health:
      status: Healthy
    kind: PersistentVolumeClaim
    name: dbdisk-pvc
    namespace: foo
    status: Synced
    version: v1
  - health:
      status: Progressing
    kind: Pod
    name: db1
    namespace: foo
    status: Synced
    version: v1
  - health:
      status: Healthy
    kind: Pod
    name: dbadmin
    namespace: foo
    status: Synced
    version: v1
  - health:
      status: Healthy
    kind: Pod
    name: webserver
    namespace: foo
    status: Synced
    version: v1
  - health:
      status: Healthy
    kind: Service
    name: mariadb
    namespace: foo
    status: Synced
    version: v1
  - health:
      status: Healthy
    kind: Service
    name: webserver
    namespace: foo
    status: Synced
    version: v1
  - group: networking.k8s.io
    health:
      status: Healthy
    kind: Ingress
    name: foo
    namespace: foo
    status: Synced
    version: v1
  sourceType: Directory
  summary:
    externalURLs:
    - https://foo.example.com/
    images:
    - gillos/pytest1
    - mariadb
    - phpmyadmin
  sync:
    comparedTo:
      destination:
        name: my-cluster
        namespace: foo
      source:
        path: manifests/app
        repoURL: git@git.example.com:foo/bar.git
        targetRevision: main
    revision: d83d871617efcb68f499ee5abe7763da0c9758bd
    status: Synced

Here's the pod:

Pod

apiVersion: v1
kind: Pod
metadata:
  annotations:
    argocd.argoproj.io/tracking-id: foo_app:/Pod:foo/db1
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{"argocd.argoproj.io/tracking-id":"foo_app:/Pod:foo/db1"},"creationTimestamp":null,"labels":{"app":"mariadb"},"name":"db1","namespace":"foo"},"spec":{"containers":[{"env":[{"name":"MYSQL_ROOT_PASSWORD","value":"hej123"}],"image":"mariadb","name":"db1","resources":{},"volumeMounts":[{"mountPath":"/var/lib/mysql","name":"storage-volume"}]}],"dnsPolicy":"ClusterFirst","restartPolicy":"Never","volumes":[{"name":"storage-volume","persistentVolumeClaim":{"claimName":"dbdisk-pvc"}}]}}
  creationTimestamp: "2023-08-30T14:19:50Z"
  labels:
    app: mariadb
  name: db1
  namespace: foo
  resourceVersion: "61172805"
  uid: 6d09c766-fab5-4a78-a0b5-cc8c134de418
spec:
  containers:
  - image: mariadb
    imagePullPolicy: Always
    name: db1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/mysql
      name: storage-volume
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-d9zp7
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: rancher-node026
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: storage-volume
    persistentVolumeClaim:
      claimName: dbdisk-pvc
  - name: kube-api-access-d9zp7
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-08-30T14:19:50Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-08-30T14:20:13Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-08-30T14:20:13Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-08-30T14:19:50Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://4de82d4577345858cc851877dbd9db050aa944d372ea1e181eecb65dc81fc795
    image: docker.io/library/mariadb:latest
    imageID: docker.io/library/mariadb@sha256:a104070983c2a9ab542d6142de858457dd15d2cabd7ac26e4ca3891d7721e73e
    lastState: {}
    name: db1
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-08-30T14:20:13Z"
  hostIP: 93.184.216.34
  phase: Running
  podIP: 10.42.10.70
  podIPs:
  - ip: 10.42.10.70
  qosClass: BestEffort
  startTime: "2023-08-30T14:19:50Z"

But I see nothing wrong with the pod in question.

Is there anything I can probe in order to debug this further?

[screenshot]

I've opened a new issue for my very particular case: #15317

@CajuCLC

CajuCLC commented Mar 14, 2024

For those who installed Traefik via Helm, you can update your values.yml.
Find (or add):

additionalArguments

Add below:

- --providers.kubernetesingress.ingressendpoint.publishedservice=traefik/traefik

Here is what it should look like:

additionalArguments:
 - --providers.kubernetesingress.ingressendpoint.publishedservice=traefik/traefik

Then upgrade the Helm release (this command might be different for you):

helm upgrade traefik traefik/traefik --values=traefik-values.yml -n traefik

App health should become Healthy within seconds. (Presumably this works because the published service lets Traefik populate the Ingress load-balancer status, which Argo CD's Ingress health check waits for.)
