
Cache inconsistency of child resources #4053

Closed
jessesuen opened this issue Aug 5, 2020 · 4 comments · Fixed by #4202
Labels
bug Something isn't working
Milestone

Comments

@jessesuen
Member

On one Argo CD instance (v1.6.2-f282a33), a pod was deleted as a result of an Argo CD Rollout Restart action. The pod was part of a Rollout's ReplicaSet. Even though the pod had truly disappeared from Kubernetes, it remained visible in Argo CD.

The inconsistent state persisted for roughly 24 hours, which is our default cache invalidation period, after which the state was corrected and the pod disappeared from the UI.

@jessesuen jessesuen added the bug Something isn't working label Aug 5, 2020
@jessesuen jessesuen added this to the v1.8 milestone Aug 5, 2020
@jdfalk
Copy link
Contributor

jdfalk commented Aug 14, 2020

I think the offending line is this:
https://github.com/argoproj/gitops-engine/blob/master/pkg/cache/cluster.go#L29

This problem causes everything set to auto-sync to sync continuously, over and over. This is not a bug that should be put off; it needs to be addressed before v1.7 is launched.

We manage 10 clusters with each of our Argo CD instances, and due to the numerous bugs in the 1.6.2 release (extra lines added to manifests generated in the UI when auto-sync is selected, inability to process Helm hooks properly, resources going Unknown/Missing and not syncing, and a host of other issues), we attempted to use the latest release on our test clusters. Due to this cache problem, it has been attempting to reapply the deployments to all clusters repeatedly; only after 24 hours does it update correctly. I would consider this a critical bug, as it defeats the entire purpose of GitOps and undermines trust in the status of any object in the clusters as reported by Argo CD.

@alexmt
Collaborator

alexmt commented Aug 14, 2020

Agree. This issue used to happen once every few months; something has changed and now we are seeing it much more often. To mitigate the problem we've added the ARGOCD_CLUSTER_CACHE_RESYNC_DURATION environment variable, which allows reducing the cluster force-refresh period (e.g. ARGOCD_CLUSTER_CACHE_RESYNC_DURATION=1hr for a one-hour period).
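For anyone wanting to apply this mitigation, a sketch of setting the variable, assuming it is consumed by the argocd-application-controller workload (verify the workload kind and duration format against your Argo CD version):

```yaml
# Sketch: set the resync env var on the application controller.
# Assumptions: the controller is the consumer of this variable, and the
# value follows Go duration syntax (e.g. "1h"); check your manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CLUSTER_CACHE_RESYNC_DURATION
              value: "1h"
```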

@jdfalk
Contributor

jdfalk commented Aug 20, 2020

Which application did you add that env var to? Application-controller I assume?

@victorboissiere
Contributor

We see the same kind of issue with v1.6.2, and applications usually get stuck. We have about 1200-1400 applications.
