Empty objects vs. null objects cause repeated syncs #18213

pgier · 2024-05-14T12:50:58Z

Summary

When a config option is not set in the a kube manifest (e.g. statefulset) ArgoCD continually tries to re-sync the object because it sees a diff between an empty and undefined object.

Motivation

I have a statefulset that does not define the resources section. ArgoCD continually tries to sync the object because Kubernetes automatically injects an empty object.

Proposal

The diff calculator should ignore the difference between an empty and null object in a kubernetes resource field.

Is this with Server Side Diff on or off? I would imagine looking at https://argo-cd.readthedocs.io/en/stable/user-guide/diff-strategies/#server-side-diff may help with scenarios like this. I don't believe there's plans to make the default Server Side Diff until a 3.x release but this may be something you want to take a look at...

evanrich · 2024-05-15T20:04:35Z

Is this with Server Side Diff on or off? I would imagine looking at https://argo-cd.readthedocs.io/en/stable/user-guide/diff-strategies/#server-side-diff may help with scenarios like this. I don't believe there's plans to make the default Server Side Diff until a 3.x release but this may be something you want to take a look at...

i just tried enabling that and it didn't seem to make a difference on my environment

evanrich · 2024-05-15T20:41:03Z

FWIW, this works for me for the "resources: {}" diff

      ignoreDifferences:
      - group: apps
        kind: Deployment
        jqPathExpressions:
        - '.spec.template.spec.containers[].resources'
      - group: apps
        kind: StatefulSet
        jqPathExpressions:
        - '.spec.template.spec.containers[].resources'

pgier · 2024-05-22T15:43:32Z

Is this with Server Side Diff on or off?

I haven't set any options for server side diff in the application or the main configmap, so it should be off.
On the application I only have these sync options set.

        syncOptions:
        - CreateNamespace=true
        - ServerSideApply=true

@evanrich Thanks, I was hoping I wouldn't have to ignore resources for the cases where there are real differences, but I can probably use that for now.

pgier · 2024-05-22T17:20:47Z

This is very strange, I have multiple clusters with basically the same configuration.
AWS EKS (same region)
Kube version: 1.28.8
ArgoCD version: v2.10.7+b060053

Same application (external-secrets) deployed to both clusters.
One of the clusters shows a diff on the empty vs. null resources and continually tries to resync.
The other cluster shows the same difference if you look at the live manifest vs. desired manifest, but does not display anything on the diff tab and does not try to re-sync.

I verified that on the cluster which is not showing a diff and therefore not syncing has resources: {} in the live manifest and nothing defined for resources in the desired manifest. So for some reason this cluster is successfully ignoring that difference even though I didn't configure any ignoreDifferences.

So same application running in the same version of Argo and Kubernetes seems to sometimes ignore this difference and sometimes not ignore it.

batazor · 2024-05-24T02:22:18Z

Noticed this problem with version 2.11 quite a lot

Adding this to ignoreDifferences doesn't seem like a good idea to me

javydekoning · 2024-05-25T18:00:12Z

This has suddenly emerged in my k3s cluster as well.

Kubernetes version: v1.30.0+k3s1
Argo: v2.11.2

pgier · 2024-06-07T14:56:44Z

Similar issue: #13004

pgier · 2024-06-07T16:23:12Z

Just wanted to note that I also tried enabling server side diff, but it didn't have any effect on this issue.

evanrich · 2024-06-07T18:06:05Z

I haven't seen any updates in the release notes for recent releases that indicate this has been changed, so I'm staying on helm chart version 6.7.18 (argo v2.10.9) hopefully this gets addressed/commented on soon because it's not just empty objects but the a ton of other resources as well are suddenly throwing sync errors in latest versions.

vertisan · 2024-06-07T18:08:57Z

Same problem here.
Kubernetes: k3s 1.30.0+k3s1
ArgoCD: v2.11.2+25f7504

Like others, I have used ignoreDifferences as a workaround for now, but the fix will be very useful :)
If someone doesn't want to list all kinds, they can use *:

ignoreDifferences:
  - group: apps
    kind: "*"
    jqPathExpressions:
      - ".spec.template.spec.containers[].resources"
  - group: apps
    kind: "*"
    jqPathExpressions:
      - ".spec.template.spec.initContainers[].resources"

pgier · 2024-06-12T18:11:04Z

Comparing a working vs. non-working cluster, I found that the gvkParser is nil on the non-working cluster (the one where the empty resources is detected as a difference).
Looks like the gvkParser is not created due to some invalid openapi schema in the cluster. Unfortunately the error message was not printing correctly so it took a while to track this down.
argoproj/gitops-engine#585

With some changes to the code I see this error message in the logs on the cluster that's not working:

"error creating gvk parser: duplicate entry for /v1, Kind=APIResourceList" server="https://kubernetes.default.svc"

See also: argoproj/gitops-engine#425

This seems to be a known issue in the GVKParser: kubernetes/kubernetes#103597

pgier · 2024-06-13T14:26:53Z

The issue in my case turned out to be caused by an older version of the kubernetes metrics server installed in the EKS cluster. First I updated updated the application controller using a locally built container (see argoproj/gitops-engine#585) to display the parser error.

duplicate entry for /v1, Kind=APIResourceList"

This duplication comes from the kube api server having multiple versions of the APIResourceList definition, which can be seen in the raw openapi schema.

kubectl get --raw /openapi/v2 | jq -r '.definitions | keys | .[]' | sort | grep io.k8s.apimachinery.pkg.apis.meta.v1

Then I searched for where APIResourceList_v2 is referenced.

kubectl get --raw /openapi/v2 | jq | grep "io.k8s.apimachinery.pkg.apis.meta.v1.APIResourceList_v2" -B 25

The only place it was referenced was the metrics server api endpoint /apis/metrics.k8s.io/v1beta1/.
After applying a recent metrics server config, the issue seems to be resolved.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.1/components.yaml

After the metrics update removed APIResourceList_v2 the issue went away and Argo gives a clean diff. Even though the cluster has some other definitions with a _v2, those don't cause a problem for the parser, probably because they are pruned (kubernetes/kubernetes#110179).
I don't know if this is the same cause for other people that are having this issue, but hopefully these steps can help to debug this.

Edit: also wanted to note that after applying the newer metrics server it takes a few minutes before the openapi schema is updated, and then you'll also need to restart the argocd-application-controller pod.

crenshaw-dev · 2024-06-13T14:33:48Z

Interesting. I've seen similar duplications in k8s 1.30.0, so I'll be diving into the problem more. Glad you were able to sort it in your instance!

javydekoning · 2024-06-13T14:42:59Z

Interesting. I've seen similar duplications in k8s 1.30.0, so I'll be diving into the problem more. Glad you were able to sort it in your instance!

Same here, fresh k3s deployment ships with docker.io/rancher/mirrored-metrics-server:v0.7.0 and still faces this issue.

pschichtel · 2024-06-14T00:26:23Z

I'm seeing the same since (I think) I upgraded from k8s 1.29.* to 1.30.1. Most of my helm charts render resources: {} when not specifying any, so it only affects two charts for me: kube-prometheus-stack and mariadb-operator.
I can also confirm: server side diff doesn't change anything.

pschichtel · 2024-06-25T17:16:40Z

This disappeared for me with the upgrade to 1.30.2

crenshaw-dev · 2024-06-25T17:17:11Z

@pschichtel interesting... was there a change in version of metrics-server? (Or of any other aggregated API?)

pschichtel · 2024-06-25T19:18:38Z

@crenshaw-dev metrics-server was updated as part of the 1.29.* -> 1.30.1 upgrade, but it hasn't changed when upgrading to 1.30.2

crenshaw-dev · 2024-06-25T19:28:41Z

That's strange... are you able to port-forward your metrics-server service and dump the /openapi/v2 output from that service?

pschichtel · 2024-06-25T19:37:23Z

not sure that's helpful

curl -k https://localhost:10250/openapi/v2
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/openapi/v2\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}

pschichtel · 2024-06-25T20:08:46Z

I just checked argo again and the issue actually came back, sorry for the confusion!

crenshaw-dev · 2024-06-25T21:13:24Z

Okay that makes more sense, I'm not seeing any relevant changes in k8s or metrics-server.

I spent a bunch of time today trying to de-dupe OpenAPI models before feeding them to GVKParser. It avoids the NewGVKParser error, but apparently some models (like DeploymentSpec are disappearing from the GVKParser's schema), so I'm getting other failures during diff construction.

Working on it though.

crenshaw-dev · 2024-06-26T22:55:37Z

I'm working on the fix and have written up a description of the issue here: https://github.com/argoproj/gitops-engine/pull/587/files#diff-bbeadede3a86f27922253617b66c49b7672bb52896a10cf5626b398fa0fa9280

I'm trying to decide between just logging the dropped duplicate GVKs or also surfacing to the user that their diff might be wrong due to duplicate GVKs. So far I'm trying to surface it to the user, but might end up being more code than it's worth.

A couple open questions:

Should aggregated APIs like metrics-server even include definitions for things like APIResourceList in their OpenAPI doc? afaik, that type isn't directly used in any of the API's user-facing functions. Maybe it's for Kubernetes' benefit, doing discovery?
Would constructing proto.Models from an OpenAPI v3 doc make these problems go away? I haven't tested, because I'm not sure how to construct a full OpenAPI v3 doc from the per-type docs currently served by k8s. Even if I did construct that doc, I'm not confident that it wouldn't hit the same duplicate GVK issue when constructing the GVKParser.

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* fix(controller): bad server-side diffs (#18213) Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * use upstream commit Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

andrii-korotkov-verkada · 2024-06-30T15:36:10Z

The issue seems to still be there for Rollout, analysisRunMetadata field (see the screenshot).

I've build a custom image from latest master (+1 cherry pick), ArgoCD pods were updated, even Redis ones.
Update: I'd wait for a bit in case it's a stale UI issue.

andrii-korotkov-verkada · 2024-06-30T15:53:33Z

The app eventually got in sync.

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* fix(controller): bad server-side diffs (#18213) (2.10) Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix revision Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * hopefully the right hash now Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* fix(controller): bad server-side diffs (#18213) (2.11) Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * hopefully the right hash now Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* fix(controller): bad server-side diffs (#18213) (2.9) Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * fix(controller): bad server-side diffs (#18213) (2.9) Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> * hopefully the right hash now Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com> --------- Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

pgier added the enhancement New feature or request label May 14, 2024

pgier mentioned this issue Jun 12, 2024

fix: printing gvkparser error message argoproj/gitops-engine#585

Merged

pgier closed this as completed Jun 13, 2024

crenshaw-dev reopened this Jun 25, 2024

alexmt added bug/enhancement component:core Syncing, diffing, cluster state cache type:bug labels Jun 25, 2024

crenshaw-dev mentioned this issue Jun 26, 2024

Upon upgrading from EKS 1.29 to 1.30 argocd is marking assigned nodeports as out of sync #18764

Open

crenshaw-dev added a commit to crenshaw-dev/argo-cd that referenced this issue Jun 27, 2024

fix(controller): bad server-side diffs (argoproj#18213)

500b28f

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

crenshaw-dev mentioned this issue Jun 27, 2024

fix(controller): bad server-side diffs (#18213) #18840

Merged

ishitasequeira closed this as completed in #18840 Jun 29, 2024

crenshaw-dev added a commit to crenshaw-dev/argo-cd that referenced this issue Jul 1, 2024

fix(controller): bad server-side diffs (argoproj#18213) (2.11)

8377820

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

crenshaw-dev added a commit to crenshaw-dev/argo-cd that referenced this issue Jul 1, 2024

fix(controller): bad server-side diffs (argoproj#18213) (2.10)

44ed6fa

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

crenshaw-dev added a commit to crenshaw-dev/argo-cd that referenced this issue Jul 1, 2024

fix(controller): bad server-side diffs (argoproj#18213) (2.9)

063c891

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

crenshaw-dev added a commit to crenshaw-dev/argo-cd that referenced this issue Jul 1, 2024

fix(controller): bad server-side diffs (argoproj#18213) (2.9)

7a878f5

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty objects vs. null objects cause repeated syncs #18213

Empty objects vs. null objects cause repeated syncs #18213

pgier commented May 14, 2024

blakeromano commented May 15, 2024

evanrich commented May 15, 2024

evanrich commented May 15, 2024 •

edited

Loading

pgier commented May 22, 2024

pgier commented May 22, 2024

batazor commented May 24, 2024

javydekoning commented May 25, 2024

pgier commented Jun 7, 2024

pgier commented Jun 7, 2024

evanrich commented Jun 7, 2024

vertisan commented Jun 7, 2024

pgier commented Jun 12, 2024 •

edited

Loading

pgier commented Jun 13, 2024 •

edited

Loading

crenshaw-dev commented Jun 13, 2024

javydekoning commented Jun 13, 2024

pschichtel commented Jun 14, 2024 •

edited

Loading

pschichtel commented Jun 25, 2024

crenshaw-dev commented Jun 25, 2024 •

edited

Loading

pschichtel commented Jun 25, 2024

crenshaw-dev commented Jun 25, 2024

pschichtel commented Jun 25, 2024

pschichtel commented Jun 25, 2024

crenshaw-dev commented Jun 25, 2024

crenshaw-dev commented Jun 26, 2024

andrii-korotkov-verkada commented Jun 30, 2024 •

edited

Loading

andrii-korotkov-verkada commented Jun 30, 2024

Empty objects vs. null objects cause repeated syncs #18213

Empty objects vs. null objects cause repeated syncs #18213

Comments

pgier commented May 14, 2024

Summary

Motivation

Proposal

Related

blakeromano commented May 15, 2024

evanrich commented May 15, 2024

evanrich commented May 15, 2024 • edited Loading

pgier commented May 22, 2024

pgier commented May 22, 2024

batazor commented May 24, 2024

javydekoning commented May 25, 2024

pgier commented Jun 7, 2024

pgier commented Jun 7, 2024

evanrich commented Jun 7, 2024

vertisan commented Jun 7, 2024

pgier commented Jun 12, 2024 • edited Loading

pgier commented Jun 13, 2024 • edited Loading

crenshaw-dev commented Jun 13, 2024

javydekoning commented Jun 13, 2024

pschichtel commented Jun 14, 2024 • edited Loading

pschichtel commented Jun 25, 2024

crenshaw-dev commented Jun 25, 2024 • edited Loading

pschichtel commented Jun 25, 2024

crenshaw-dev commented Jun 25, 2024

pschichtel commented Jun 25, 2024

pschichtel commented Jun 25, 2024

crenshaw-dev commented Jun 25, 2024

crenshaw-dev commented Jun 26, 2024

andrii-korotkov-verkada commented Jun 30, 2024 • edited Loading

andrii-korotkov-verkada commented Jun 30, 2024

evanrich commented May 15, 2024 •

edited

Loading

pgier commented Jun 12, 2024 •

edited

Loading

pgier commented Jun 13, 2024 •

edited

Loading

pschichtel commented Jun 14, 2024 •

edited

Loading

crenshaw-dev commented Jun 25, 2024 •

edited

Loading

andrii-korotkov-verkada commented Jun 30, 2024 •

edited

Loading