ArgoCD Application Stuck In Syncing/Terminating State, when status field update exceeds Application CR resource size #8113

Open
ravi-cldcvr opened this issue Jan 7, 2022 · 13 comments
Labels
bug Something isn't working

Comments

@ravi-cldcvr

ravi-cldcvr commented Jan 7, 2022

ArgoCD Application deployment is stuck in an infinite loop of syncing the Application. Initially it works, but after some time it gets stuck in the Syncing state.
We tried terminating the sync, but after that it got stuck in the Terminating state.

Can anyone help us sort this out? What could be the cause?

Screenshots

[three screenshots, not reproduced here]

Version

2.1.7

Logs

time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/status': Unable to remove nonexistent key: status: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/status': Unable to remove nonexistent key: status: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/status': Unable to remove nonexistent key: status: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="Failed to apply normalization: error in remove for path: '/spec/preserveUnknownFields': Unable to remove nonexistent key: preserveUnknownFields: missing value"
time="2022-01-07T16:19:36Z" level=debug msg="patch: {\"status\":{\"reconciledAt\":\"2022-01-07T16:19:36Z\"}}" application=argocd
time="2022-01-07T16:19:36Z" level=info msg="Failed to Update application operation state: etcdserver: request is too large, retrying in 1s"
time="2022-01-07T16:19:36Z" level=info msg="Update successful" application=argocd
time="2022-01-07T16:19:36Z" level=info msg="Reconciliation completed" application=argocd dedup_ms=0 dest-name= dest-namespace=argocd dest-server="https://417901E660A9365B4057207C70C682EE.gr7.ap-south-1.eks.amazonaws.com" diff_ms=174 fields.level=2 git_ms=13 health_ms=3 live_ms=1 settings_ms=0 sync_ms=0 time_ms=249
time="2022-01-07T16:19:36Z" level=info msg="Refreshing app status (controller refresh requested), level (1)" application=argocd
@ravi-cldcvr ravi-cldcvr added the bug Something isn't working label Jan 7, 2022
@utkarsh-devops

time="2022-01-07T18:08:00Z" level=warning msg="finished unary call with code FailedPrecondition" error="rpc error: code = FailedPrecondition desc = another operation is already in progress" grpc.code=FailedPrecondition grpc.method=Sync grpc.service=application.ApplicationService grpc.start_time="2022-01-07T18:08:00Z" grpc.time_ms=484.012 span.kind=server system=grpc

@jgwest
Member

jgwest commented Jan 10, 2022

@ravi-cldcvr can you provide more information on the Application you are attempting to synchronize? For example, can you provide the YAML definition for it? It appears that Argo CD is unable to update the status field of the CR, due to it being too large (for example, the Application has too many child resources).
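
One rough way to check whether an Application is approaching that limit is to measure its serialized size. The sketch below assumes the object was exported as JSON (for example with `kubectl get application <name> -n argocd -o json`) and that etcd is running with its default request limit of 1.5 MiB (`--max-request-bytes`); your cluster may be configured differently. The function name and the sample object are purely illustrative, and the result is only an approximation of what the API server actually sends to etcd.

```python
import json

# Assumption: etcd's default --max-request-bytes (1.5 MiB); clusters may override it.
ETCD_DEFAULT_MAX_REQUEST_BYTES = 1_572_864


def application_size_report(app: dict, limit: int = ETCD_DEFAULT_MAX_REQUEST_BYTES) -> dict:
    """Return the compact-JSON size of an Application object and its share of the limit."""
    size = len(json.dumps(app, separators=(",", ":")).encode("utf-8"))
    return {
        "bytes": size,
        "limit": limit,
        "fraction": size / limit,
        "over_limit": size > limit,
    }


if __name__ == "__main__":
    # Illustrative stand-in for the output of `kubectl get application ... -o json`.
    sample = {
        "kind": "Application",
        "metadata": {"name": "argocd"},
        "status": {
            "resources": [{"kind": "ConfigMap", "name": f"cm-{i}"} for i in range(600)]
        },
    }
    report = application_size_report(sample)
    print(f"{report['bytes']} bytes, {report['fraction']:.1%} of the assumed limit")
```

If the reported fraction is close to 1, any further growth of the status field is likely to hit the "request is too large" error seen in the logs.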

@gouravjoshicldcvr

@jgwest Thanks for your response.
Yes, you are correct: my application is fairly large. It contains 35 Helm charts, which together create more than 600 resources.
What should we do in this case? Any suggestion would be very helpful, as this is a blocker for us.

@gouravjoshicldcvr

I think that every time Argo CD syncs with Git, it creates a revision in the backend and appends it to the same Application YAML.
After 5-6 syncs, more revisions have been created and appended to application.yaml, until it finally reaches a state where Argo CD is unable to update the YAML and the app gets stuck in the Syncing state.

@jgwest jgwest changed the title ArgoCD Application Stuck In Syncing/Terminating State ArgoCD Application Stuck In Syncing/Terminating State, when status field update exceeds Application CR resource size Jan 11, 2022
@gouravjoshicldcvr

@jgwest Can you suggest something to make this work in our current production environment? All our applications are stuck in the Syncing state, which is very painful.

@jgwest
Member

jgwest commented Jan 12, 2022

I'm not aware of any Argo CD settings that you can tweak here to reduce the size/verbosity of the status field (you could ask on the Argo CD Slack to see if anyone else has hit this or a similar issue). My only suggestion would be to refactor your Argo CD application to use fewer resources.

@gouravjoshicldcvr

@jgwest Thanks for your response.
I cannot remove the resources from the app.
Can you please tell me how long it will take the Argo CD team to assign this issue to someone?

@jgwest
Member

jgwest commented Jan 12, 2022

@gouravjoshicldcvr Argo CD planning is not that formally organized: each bug and feature is evaluated by individual teams/companies/contributors for contribution, so I can't give you such an estimate.

@gouravjoshicldcvr

@jgwest Can we restrict Argo CD to store only a certain number of revisions, such as 3 or 5, so that the limit is never breached?

@jessesuen
Member

This issue was discussed in the contributing meeting today.

After 5-6 syncs, more revisions have been created and appended to application.yaml, until it finally reaches a state where Argo CD is unable to update the YAML and the app gets stuck in the Syncing state.

The history of syncs normally contributes very little to the size. However, it's possible it may get quite large if you are leveraging inlined helm values. Is that the case?

For example, here are two items from a history list:

"history": [
  {
    "revision": "f373c3409ba9e17a44a01fef0e8ccfb267cb0ddb",
    "deployedAt": "2021-11-02T10:29:43Z",
    "id": 11,
    "source": {
      "repoURL": "https://github.com/xxx/yyy.git",
      "path": "system/argo-cd",
      "targetRevision": "HEAD"
    },
    "deployStartedAt": "2021-11-02T10:29:41Z"
  },
  {
    "revision": "3f274138d93cc0d1579dad461b0bfb2edeb53924",
    "deployedAt": "2021-11-05T18:03:33Z",
    "id": 12,
    "source": {
      "repoURL": "https://github.com/xxx/yyy.git",
      "path": "system/argo-cd",
      "targetRevision": "HEAD"
    },
    "deployStartedAt": "2021-11-05T18:02:20Z"
  }
]

@jgwest Can we restrict Argo CD to store only a certain number of revisions, such as 3 or 5, so that the limit is never breached?

Yes, if history is contributing to your CR size, you can reduce this using:

kind: Application
spec:
  revisionHistoryLimit: 1
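
To see why this helps, here is a minimal sketch of the effect. It assumes the controller keeps only the most recent `revisionHistoryLimit` entries and that each history entry carries its own copy of the source, including any inline Helm values under `source.helm.values`. The helper names and the size of the values blob are made up for illustration; this is not Argo CD's actual code.

```python
import json


def make_entry(entry_id: int, inline_values: str = "") -> dict:
    """Build a history entry shaped like the ones shown above (illustrative only)."""
    source = {
        "repoURL": "https://github.com/xxx/yyy.git",
        "path": "system/argo-cd",
        "targetRevision": "HEAD",
    }
    if inline_values:
        # Inline Helm values are copied into every history entry.
        source["helm"] = {"values": inline_values}
    return {"id": entry_id, "revision": "0" * 40, "source": source}


def trim_history(history: list, limit: int) -> list:
    """Keep only the newest `limit` entries (assumed trimming behaviour).

    For this sketch, a non-positive limit keeps nothing.
    """
    return history[-limit:] if limit > 0 else []


def json_size(obj) -> int:
    """Size in bytes of the compact JSON encoding of `obj`."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


if __name__ == "__main__":
    values = "key: value\n" * 2000  # hypothetical inline values blob
    history = [make_entry(i, values) for i in range(10)]
    for limit in (10, 3, 1):
        print(f"limit={limit}: {json_size(trim_history(history, limit))} bytes of history")
```

Under these assumptions the history size grows roughly linearly with the limit, so large inline values multiplied by many retained revisions can account for a significant share of the CR.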

Outside of that, the other largest contributors to CR size are:

  • status.resources - list of resources + health/sync status
  • status.operationState.syncResult - list of resources, apply result, and a message from the most recent sync

However, there is no way to reduce the size of those two.
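
To find out which of these fields dominates in a given Application, one option is to rank them by serialized size. The following is only a sketch: it assumes the Application was exported as JSON (for example with `kubectl get application <name> -n argocd -o json`) and loaded into a dict. `status_breakdown` and `get_path` are hypothetical helpers, not part of any Argo CD tooling.

```python
import json

# Dotted paths inside the Application object that this thread names as large.
PATHS = ("status.history", "status.resources", "status.operationState.syncResult")


def get_path(obj, dotted: str):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def status_breakdown(app: dict, paths=PATHS) -> list:
    """Return (path, bytes) pairs sorted from largest to smallest."""
    sizes = []
    for path in paths:
        value = get_path(app, path)
        if value is None:
            size = 0
        else:
            size = len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
        sizes.append((path, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Tiny made-up object; in practice, load the exported Application with json.load().
    app = {
        "status": {
            "history": [{"id": 1}],
            "resources": [{"kind": "ConfigMap", "name": f"cm-{i}"} for i in range(600)],
            "operationState": {"syncResult": {"resources": []}},
        }
    }
    for path, size in status_breakdown(app):
        print(f"{size:>8} bytes  {path}")
```

If `status.history` comes out on top, lowering `revisionHistoryLimit` should help; if one of the other two dominates, splitting the Application is likely the only option.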

@gouravjoshicldcvr

@jgwest Yes, we are providing values with --values-literal-file, and as per my understanding it appends the values to the Application YAML itself.

@gouravjoshicldcvr

@jgwest Limiting the revision history solved this issue.
Thanks

@andrewm-aero

In case anyone else ends up here with a similar issue: we ran into this, but we couldn't get the application controller to "terminate" the sync, even after setting the revision limit. The solution ended up being to clear the "status" field via "kubectl edit", setting it to an empty object.


6 participants