CLI app sync --retry-limit does not work #4505

Open
3 tasks done
mmckane opened this issue Oct 7, 2020 · 11 comments
Labels: answered question (Issue is a question or reach for support), works-for-me (Works as intended, or unable to reproduce)

Comments

mmckane commented Oct 7, 2020

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

argocd app sync myapp --retry-limit 5 does not work as expected. It does not seem to retry at all if another sync or operation is in progress. If another sync is already running, whether triggered by autosync or started manually in the UI, the CLI errors out and exits without completing the sync.

To Reproduce

Start a manual sync of an app in the UI. Shortly after the UI sync has started, manually sync the same app with the command argocd app sync myapp --retry-limit 5. The CLI does not retry the sync; it exits with code 20 and outputs the following:

time="2020-10-07T17:51:09-05:00" level=fatal msg="rpc error: code = FailedPrecondition desc = another operation is already in progress"

Expected behavior

The CLI retries the sync with backoff and does not exit with an error code.

Version

argocd: v1.7.7+33c93ae
  BuildDate: 2020-09-29T04:59:10Z
  GitCommit: 33c93aea0b9ee3d02fb9703cd82cecce3540e954
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: windows/amd64
argocd-server: v1.7.7+33c93ae
  BuildDate: 2020-09-29T04:56:23Z
  GitCommit: 33c93aea0b9ee3d02fb9703cd82cecce3540e954
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: {Version:kustomize/v3.6.1 GitCommit:c97fa946d576eb6ed559f17f2ac43b3b5a8d5dbd BuildDate:2020-05-27T20:47:35Z GoOs:linux GoArch:amd64}
  Helm Version: version.BuildInfo{Version:"v3.2.0", GitCommit:"e11b7ce3b12db2941e90399e874513fbd24bcb71", GitTreeState:"clean", GoVersion:"go1.13.10"}
  Kubectl Version: v1.17.8
mmckane added the bug (Something isn't working) label Oct 7, 2020

mmckane commented Oct 8, 2020

I'm not familiar enough with the code base yet to make a PR to fix this. To help someone else triage it: the CLI appears to call this function, which always exits with an error if an operation is in progress.

jessesuen (Member) commented

This is actually unrelated to --retry-limit. The error is "another operation is already in progress", which reflects the current design of Argo CD: we do not allow two operations to run at the same time, nor do we allow operations to queue up.

The workaround for this is:

argocd app wait APPNAME --operation && argocd app sync APPNAME

@jessesuen jessesuen added answered question Issue is a question or reach for support works-for-me Works as intended, or unable to reproduce and removed bug Something isn't working labels Oct 8, 2020

mmckane commented Oct 9, 2020

We are already using that workaround but still see this error 10-20% of the time in our pipeline. I was hoping the --retry-limit flag would help. Are there any plans to add queuing or retry logic to the CLI for when a pipeline hits this error?

Is the solution simply to turn off autosync so that our deploy pipeline doesn't hit this error, or are there other operations that could also cause it?
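A minimal sketch of the kind of retry logic asked about here, as a shell function a pipeline step could source. retry_on_conflict is a hypothetical helper, not part of the argocd CLI: it reruns a command with exponential backoff only while the command's output contains the "another operation is already in progress" message, and gives up immediately on any other error. It assumes that error text stays the same across Argo CD versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the argocd CLI).
# Usage: retry_on_conflict MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Retries COMMAND with exponential backoff while it fails with Argo CD's
# "another operation is already in progress" error; any other failure is
# returned immediately.
retry_on_conflict() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1 output rc
  while :; do
    # Capture stdout and stderr together so the error text can be matched.
    if output=$("$@" 2>&1); then
      rc=0
    else
      rc=$?
    fi
    printf '%s\n' "$output"
    [ "$rc" -eq 0 ] && return 0
    case $output in
      *"another operation is already in progress"*) ;;  # conflict: retry
      *) return "$rc" ;;                                 # other error: give up
    esac
    if [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    echo "operation in progress; retrying in ${delay}s (attempt $attempt/$max_attempts)" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, retry_on_conflict 5 2 sh -c 'argocd app wait myapp --operation && argocd app sync myapp' would make up to five attempts, sleeping 2, 4, 8 and 16 seconds between them. Matching on the message text is a form of log scraping and therefore fragile; the output of each attempt is also printed only after that attempt finishes.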

boolafish commented
+1 for this. Seeing this error too.

cbl315 (Contributor) commented Jun 3, 2021

Any update about this issue?
Argocd version: v1.8.4


mmckane commented Jun 3, 2021

We ended up turning off autosync, and it seems to have mitigated the issue somewhat. We still have problems with app of apps: apps that can be synced by multiple deployments fail because another deploy/sync is happening at the same time.

cbl315 (Contributor) commented Jun 4, 2021

> We ended up turning off autosync, and it seems to have mitigated the issue somewhat. We still have problems with app of apps: apps that can be synced by multiple deployments fail because another deploy/sync is happening at the same time.

Turning off autosync does not seem elegant; I hope there is a better way.

robermar23 commented Jun 7, 2021

We have the same issue.

Running Argo CD v2.

Multiple microservices handled by the same ArgoCD Application.

Builds start on commit. Each build calls:
argocd app wait APPNAME --operation && argocd app sync APPNAME

If multiple builds are "waiting", once the first sync completes, the rest attempt to kick off their own sync at the same time, returning the same "operation already in progress" error.

Is selective sync our only solution here?
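One partial mitigation for builds that all wake up at once is a random delay between the wait and the sync. This is a sketch under the assumption that pipeline steps run in a shell with od and /dev/urandom available; random_below and sleep_jitter are hypothetical helper names, not argocd features.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the argocd CLI) that add a random delay
# between "argocd app wait --operation" and "argocd app sync", so builds
# released at the same moment do not all start a sync at once.

# Print a random whole number in [0, $1), read from /dev/urandom.
random_below() {
  local r
  # Two random bytes as an unsigned decimal integer (0..65535).
  r=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
  echo $(( r % $1 ))
}

# Sleep for a random whole number of seconds in [0, $1).
sleep_jitter() {
  sleep "$(random_below "$1")"
}
```

Usage would look like argocd app wait APPNAME --operation && sleep_jitter 30 && argocd app sync APPNAME. This only makes collisions less likely: two builds can still draw nearly the same delay, so the "operation already in progress" error still has to be handled, for example with a retry or a lock.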

andrewm-aero commented
+1, getting bit by this too.

Given that ArgoCD is founded on the concept of declarative management, it seems bewildering that there's no single operation that says "Wait until synced to the latest, do whatever you need to ensure that happens, only fail if that is impossible or takes too long". In order to get pipelines which don't spuriously fail, we've been reduced to scraping the log output for that particular error message, and log scraping is generally a sign that something has gone wrong at a fundamental level.

tdongsi commented Jul 14, 2021

+1, getting this problem.

We want to have parallel pipelines to do argocd app sync and, optionally, argocd app rollback if there is a problem.

argocd app wait APPNAME --operation only helps when there are two active parallel pipelines. With more than that, we hit the same problem described by @robermar23 above:

> If multiple builds are "waiting", once the first sync completes, the rest attempt to kick off their own sync at the same time, returning the same "operation already in progress" error.

tdongsi commented Jul 14, 2021

My workaround is to drop argocd app wait entirely and coordinate all Argo CD access (i.e., any argocd app command) through a lock service in the CI system.

For example, my CI system happens to be Jenkins, so the Jenkins-specific solution looks like this in a Jenkinsfile:

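    def jobs = [:]
    for (String app : apps) {
      // Copy the loop variable so each closure captures its own app name.
      def appName = app
      jobs[appName] = {
        // Serialize all argocd commands across pipelines with the
        // Lockable Resources plugin's lock step.
        lock('service/argocd') {
          sh "argocd app sync ${appName}"
        }
      }
    }

    parallel jobs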

In this case, multiple pipelines (and the branches forked by their parallel steps) wait for and acquire the lock named service/argocd before running the argocd app sync command. The lock step is provided by the Lockable Resources plugin.
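For CI systems without a lock step, a rough shell analogue is sketched here, under the assumption that all pipelines run on the same host (or share a local filesystem where mkdir is atomic). with_lock is a hypothetical helper: mkdir either creates the lock directory or fails because it already exists, so only one caller at a time gets to run the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the argocd CLI).
# Usage: with_lock LOCK_DIR COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive lock implemented as a directory.
with_lock() {
  local lock_dir=$1 rc=0
  shift
  # mkdir fails if the directory already exists, so only one caller wins.
  until mkdir "$lock_dir" 2>/dev/null; do
    sleep 1   # another pipeline holds the lock; poll again
  done
  "$@" || rc=$?
  rmdir "$lock_dir"   # release the lock even if the command failed
  return "$rc"
}
```

Usage would be with_lock /tmp/argocd-sync.lock argocd app sync myapp. Unlike the Jenkins lock step, this lock is not released if the process is killed between mkdir and rmdir, so a stale lock directory would have to be removed by hand.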

itewk added a commit to itewk/ploigos-step-runner that referenced this issue Oct 1, 2021
itewk added a commit to itewk/ploigos-step-runner that referenced this issue Oct 13, 2021
itewk added a commit to ploigos/ploigos-step-runner that referenced this issue Oct 13, 2021
Development

No branches or pull requests

7 participants