2.6.1 Multi Source App - Application status / Sync / Refresh #12379

Closed
3 tasks done
carslen opened this issue Feb 9, 2023 · 33 comments · Fixed by #12576
Labels
bug (Something isn't working), multi-source-apps (Bugs or enhancements related to multi-source Applications)

Comments

@carslen

carslen commented Feb 9, 2023

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

The application status of multi-source apps shown in the WebUI and the ArgoCD CLI doesn't match the live system/deployed resources.

To Reproduce

  • Create a multi-source Argo app. I created this one for testing purposes, with a manual sync policy.
  • kubectl apply -f demo-multi-source.yaml
  • The app is listed as OutOfSync in WebUI and CLI
$ argocd app list
NAME             CLUSTER                         NAMESPACE  PROJECT  STATUS     HEALTH   SYNCPOLICY  CONDITIONS  REPO  PATH  TARGET
argocd/maria-db  https://kubernetes.default.svc  maria-db   default  OutOfSync  Missing  <none>      <none>
  • Sync the app (WebUI or CLI). The app will remain OutOfSync in the WebUI and CLI, but the resources will be created in the background:
$ argocd app list
NAME             CLUSTER                         NAMESPACE  PROJECT  STATUS     HEALTH   SYNCPOLICY  CONDITIONS  REPO  PATH  TARGET
argocd/maria-db  https://kubernetes.default.svc  maria-db   default  OutOfSync  Missing  <none>      <none>
[~/kind]$ argocd app sync maria-db
TIMESTAMP                  GROUP        KIND       NAMESPACE                  NAME    STATUS    HEALTH        HOOK  MESSAGE
2023-02-09T18:23:27+01:00          ConfigMap        maria-db      maria-db-mariadb  OutOfSync  Missing
2023-02-09T18:23:27+01:00             Secret        maria-db      maria-db-mariadb  OutOfSync  Missing
2023-02-09T18:23:27+01:00            Service        maria-db      maria-db-mariadb  OutOfSync  Missing
2023-02-09T18:23:27+01:00         ServiceAccount    maria-db      maria-db-mariadb  OutOfSync  Missing
2023-02-09T18:23:27+01:00   apps  StatefulSet       maria-db      maria-db-mariadb  OutOfSync  Missing
2023-02-09T18:23:30+01:00          Namespace                          maria-db   Running   Synced              namespace/maria-db created
2023-02-09T18:23:30+01:00   apps  StatefulSet       maria-db      maria-db-mariadb  OutOfSync  Missing              statefulset.apps/maria-db-mariadb created
2023-02-09T18:23:30+01:00         ServiceAccount    maria-db      maria-db-mariadb  OutOfSync  Missing              serviceaccount/maria-db-mariadb created
2023-02-09T18:23:30+01:00             Secret        maria-db      maria-db-mariadb  OutOfSync  Missing              secret/maria-db-mariadb created
2023-02-09T18:23:30+01:00          ConfigMap        maria-db      maria-db-mariadb  OutOfSync  Missing              configmap/maria-db-mariadb created
2023-02-09T18:23:30+01:00            Service        maria-db      maria-db-mariadb  OutOfSync  Missing              service/maria-db-mariadb created
[~/kind]$ argocd app list
NAME             CLUSTER                         NAMESPACE  PROJECT  STATUS     HEALTH   SYNCPOLICY  CONDITIONS  REPO  PATH  TARGET
argocd/maria-db  https://kubernetes.default.svc  maria-db   default  OutOfSync  Missing  <none>      <none>
[~/kind]$ k -n maria-db get all
NAME                     READY   STATUS    RESTARTS   AGE
pod/maria-db-mariadb-0   2/2     Running   0          81s

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/maria-db-mariadb   ClusterIP   10.96.213.213   <none>        3306/TCP,9104/TCP   81s

NAME                                READY   AGE
statefulset.apps/maria-db-mariadb   1/1     81s
  • After refreshing the app, the created resources appear immediately in the WebUI and CLI, and the app is in sync with the live system again.
[Attached screen recording: Bildschirmaufnahme.2023-02-09.um.18.30.58.mov]

If the multi-source app is configured for auto sync, the resources get created immediately, but the Argo app remains OutOfSync. If the values.yaml changes, an app sync isn't sufficient; a refresh is required. If a resource is out of sync and needs pruning, it gets even worse, because a combination of app sync + prune followed by an app refresh is required.

Expected behavior

App Sync should update the App and update the App status.

Version

argocd: v2.5.6+9db2c94.dirty
  BuildDate: 2023-01-11T01:02:03Z
  GitCommit: 9db2c9471f6ff599c3f630b446e940d3a065620b
  GitTreeState: dirty
  GoVersion: go1.19.4
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.6.1+3f143c9
  BuildDate: 2023-02-08T18:51:05Z
  GitCommit: 3f143c9307f99a61bf7049a2b1c7194699a7c21b
  GitTreeState: clean
  GoVersion: go1.18.10
  Compiler: gc
  Platform: linux/arm64
  Kustomize Version: v4.5.7 2022-08-02T16:35:54Z
  Helm Version: v3.10.3+g835b733
  Kubectl Version: v0.24.2
  Jsonnet Version: v0.19.1
@carslen added the bug (Something isn't working) label Feb 9, 2023
@pdeva

pdeva commented Feb 9, 2023

we are seeing this issue too

@jurgen-weber-deltatre

Yeah, I am experiencing some of the same issues. Not exactly the same, but the same sort of idea. What I experienced:

  • When first deployed, yes. Resources were on the cluster and looked good but the ArgoCD application was saying out of sync. After a bit of button bashing of the refresh and sync buttons it all of a sudden came good and went green/in-sync.

  • A values file change was not picked up. I made the change and waited 20 or so minutes. After that I hit refresh: nothing. Then a hard refresh: nothing. I came back to it 5 minutes later and it had picked up the change.

  • My test includes a rollout. Rollouts worked fine in 2.5: you could see the rollout start, the analysis run, and so on, with the application status changing along with the rollout. None of this happens now. In the rollouts UI you can see that it was attempted and failed (why the run is failing is unrelated and needs to be solved separately).

  • If I then attempt to sync again to get the rollout to go again, nothing happens. It says it synced, but there is no new rollout attempt.

@khorn7sk

khorn7sk commented Feb 10, 2023

We are also seeing this issue. As far as I can see, the application controller does not refresh (every 3 min) the status of multi-source applications. I mean, I didn't see any log messages like this: Refreshing app status (controller refresh requested), level (0).... The controller just ignored them.

@mulhotavares

I wonder if the order you define them would fix this?
I mean, you reference the 2nd repo from the 1st one. Can you try flipping the order to see if it helps? Define the values repo first, and reference it from the second repo...

@khorn7sk

@mulhotavares I tried both variants, with the same result.

@mulhotavares

Yeah, was able to reproduce it here too.
The change is correctly applied, but the app shows as out of sync, because apparently it's not automatically refreshing the live manifests.

@keithchong added the multi-source-apps (Bugs or enhancements related to multi-source Applications) label Feb 11, 2023
@keithchong
Contributor

FYI @crenshaw-dev, @ishitasequeira. I saw other issues related to sync/refresh, e.g. #11772

@zenitraM

I think this may be a dupe of #12301.

@tomjohnburton

Seeing this too

@crenshaw-dev
Collaborator

Thanks for everyone's patience on this. Since it's not a super destructive bug, I've been focused on other stuff. I'm going to pop open a debugger today and see what I can figure out.

@crenshaw-dev
Collaborator

In another of what I know must feel like painfully slow and sparse updates: debugger is open. :-)

@jurgenweber

You're the best, I can feel your pain. :)

@crenshaw-dev
Collaborator

FOUND IT. Dodgy else if problem. PR coming. :-)

crenshaw-dev added a commit to crenshaw-dev/argo-cd that referenced this issue Feb 22, 2023
…oproj#12379)

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
@mulhotavares

I wonder if this is related, but I found a similar issue on Argo 2.5 after adding an ignoreDifferences: section to my Application. In Argo 2.5, if you have an application that is always OutOfSync and you add ignoreDifferences to your application YAML, clicking Sync will not make the OutOfSync status go away. You need to refresh the app after syncing to get rid of it.
@crenshaw-dev maybe related?

@lukma99

lukma99 commented Feb 23, 2023

Also, I get "Unable to load data, error getting application" when viewing the application details of a multi-source-app. Could this be a part of this issue? The application itself is in a healthy state and all resources are present.

[Screenshot: argo]

@crenshaw-dev
Collaborator

@mulhotavares if related, it's loosely related. By my reading, the problems directly caused by this bug should be limited to only multi-source applications.

@lukma99 poooossibly. I'll try to reproduce it in the master branch.

@mulhotavares

> @mulhotavares if related, it's loosely related. By my reading, the problems directly caused by this bug should be limited to only multi-source applications.
>
> @lukma99 poooossibly. I'll try to reproduce it in the master branch.

Thanks @crenshaw-dev. I was just trying to help as I guessed it could be part of the same code path.

@crenshaw-dev
Collaborator

@mulhotavares do you know if there's another issue open for that problem? I think you're right, the fix is very close in the code path to the fix for this issue.

@mulhotavares

mulhotavares commented Feb 23, 2023

> @mulhotavares do you know if there's another issue open for that problem? I think you're right, the fix is very close in the code path to the fix for this issue.

This seems to be related:

#9678

The author says the change is applied but the app still shows as out-of-sync, as if the unrelated field was ignored. But it could be that the manifests are just not being refreshed?

@crenshaw-dev
Collaborator

@mulhotavares yikes, a lot's going on in that issue. Would you mind opening a new issue with only the problem that you know can be solved with refresh? I think I know how to fix it. Once that's done, we can link to the original issue and see if it solves those various problems.

@mulhotavares

> @mulhotavares yikes, a lot's going on in that issue. Would you mind opening a new issue with only the problem that you know can be solved with refresh? I think I know how to fix it. Once that's done, we can link to the original issue and see if it solves those various problems.

Sure. Let me create one.

crenshaw-dev added a commit that referenced this issue Feb 24, 2023
) (#12576)

* fix: evaluate all possible refresh reasons for multi-source apps (#12379)

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* remove redundant parentheses

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* tests

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* don't auto-sync, it makes tests flaky

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* auto-sync because sync CLI doesn't work for multi-source apps

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* don't require out-of-sync - app may sync quickly

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* timeout 60

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

* fix timeout

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>

---------

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
gcp-cherry-pick-bot bot pushed a commit that referenced this issue Feb 24, 2023
) (#12576)

crenshaw-dev added a commit that referenced this issue Feb 24, 2023
) (#12609)

Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Co-authored-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
@crenshaw-dev
Collaborator

I plan to release the fix Monday. :-)

@mulhotavares

Amazing! Thanks a lot @crenshaw-dev!
What is the fix version for this? 2.6.3?

@crenshaw-dev
Collaborator

Yep, it'll be 2.6.3

@mulhotavares

@crenshaw-dev As promised, I created this issue:
#12610

@khorn7sk

khorn7sk commented Feb 28, 2023

FYI: it looks like we now have some kind of infinite loop:

time="2023-02-28T11:30:48Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:48Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:48Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5
time="2023-02-28T11:30:48Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:48Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:48Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5
time="2023-02-28T11:30:49Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:49Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:49Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5
time="2023-02-28T11:30:50Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:50Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:50Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5
time="2023-02-28T11:30:50Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:50Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:50Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5
time="2023-02-28T11:30:51Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:51Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:51Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5
time="2023-02-28T11:30:52Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:52Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:52Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=4
time="2023-02-28T11:30:52Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:52Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:52Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5
time="2023-02-28T11:30:53Z" level=info msg="Refreshing app status (controller refresh requested), level (0)" application=kube-argo/dev-service-name
time="2023-02-28T11:30:53Z" level=info msg="No status changes. Skipping patch" application=kube-argo/dev-service-name
time="2023-02-28T11:30:53Z" level=info msg="Reconciliation completed" application=kube-argo/dev-service-name dest-name= dest-namespace=dev dest-server="https://kubernetes.default.svc" fields.level=0 time_ms=5

@crenshaw-dev
Collaborator

@khorn7sk would you mind opening a new issue for this?

Could you also bump the log level up to debug to see if we log a reason why the refresh was requested?

@d-wierdsma

Yesterday I believe I experienced a similar issue to @khorn7sk. I had upgraded to 2.6.4 and noticed that the repo server was making many requests each minute around the same time and was hammering our git instances with an insane amount of requests (See images 1 and 2).

My current ArgoCD setup has about 120 apps, and auto-refresh is turned off in favour of git webhooks, which don't have changes every minute.

All the logs look like this. Unfortunately, I was unable to turn on debug mode, as I had to fix it quickly after I figured out the issue.

{"error":"failed to get git client for repo git@<git-url>","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T18:38:59Z","grpc.time_ms":690.985,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T18:39:00Z"}

@crenshaw-dev please let me know if I should create a separate ticket to track this issue, as it took out our git instance last night :D

[Image 1]
[Image 2]
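
A minimal sketch for turning JSON log lines like the one quoted above into per-minute failure counts, which can help confirm how many failing calls occur each minute. The `logEntry` type and `failuresPerMinute` function are hypothetical; only the field names `level`, `grpc.method`, and `time` are taken from the log line in the comment, and the sample lines are trimmed copies of it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// logEntry holds the subset of fields used here from a JSON log line.
type logEntry struct {
	Level  string `json:"level"`
	Method string `json:"grpc.method"`
	Time   string `json:"time"`
}

// failuresPerMinute counts error-level entries, keyed by gRPC method and the
// minute (UTC) in which they were logged. Unparseable lines are skipped.
func failuresPerMinute(lines []string) map[string]int {
	counts := map[string]int{}
	for _, line := range lines {
		var e logEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil || e.Level != "error" {
			continue
		}
		t, err := time.Parse(time.RFC3339, e.Time)
		if err != nil {
			continue
		}
		key := e.Method + " " + t.UTC().Format("2006-01-02T15:04")
		counts[key]++
	}
	return counts
}

func main() {
	// Trimmed sample lines; the second timestamp is made up for illustration.
	lines := []string{
		`{"error":"failed to get git client for repo git@<git-url>","grpc.method":"GenerateManifest","level":"error","time":"2023-03-14T18:39:00Z"}`,
		`{"error":"failed to get git client for repo git@<git-url>","grpc.method":"GenerateManifest","level":"error","time":"2023-03-14T18:39:30Z"}`,
	}
	for key, n := range failuresPerMinute(lines) {
		fmt.Printf("%s: %d failures\n", key, n)
	}
}
```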

@crenshaw-dev
Collaborator

@d-wierdsma please do file a new issue!

If you go back to the beginning of the logs, do you get anything different? I suspect the error above is due to rate-limiting. But we won't know until this is merged, because the actual error message is obscured: #12876

@khorn7sk

For me it was an issue with Kyverno: it was generating a lot of reports, and those reports triggered ArgoCD.

@d-wierdsma

I can definitely look at the beginning of the logs to determine what might have been going on. Shouldn't the repo server only retry a connection a certain number of times per app and then fail, @crenshaw-dev? I've set ARGOCD_GIT_ATTEMPTS_COUNT=3 for the repo server, so I figured it wouldn't attempt so many times.

Entirely possible that it is getting rate-limited by git, in fact I expect so with that amount of requests 😅

@d-wierdsma

@crenshaw-dev created a ticket! #12878

Please let me know if there is anything I can do to help out with investigation or potential fixes as well, always eager to help out :D

yyzxw pushed a commit to yyzxw/argo-cd that referenced this issue Aug 9, 2023
…oproj#12379) (argoproj#12576)

@burkhat

burkhat commented Oct 19, 2023

> For me it was an issue with Kyverno: it was generating a lot of reports, and those reports triggered ArgoCD.

@khorn7sk We currently have a similar problem with ArgoCD, and we're using Kyverno too.
How does Kyverno trigger ArgoCD?
