
Constant git requests on v1.8.0-rc1 #4926

Closed · 3 tasks done · servo1x opened this issue Nov 28, 2020 · 8 comments · Fixed by #4937
Labels: bug (Something isn't working) · Milestone: v1.8

Comments

servo1x commented Nov 28, 2020


Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

After upgrading to v1.8.0-rc1, we noticed that git requests (ls-remote / checkout) continue to increase, which is hammering our git server.

We have a monorepo with 500 Helm chart applications and have included .argocd-allow-concurrency. We then scaled the application controller to 5 replicas and set ARGOCD_CONTROLLER_REPLICAS=5.
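
For context, here is a minimal sketch of that sharding setup, assuming the stock argocd-application-controller StatefulSet from the HA manifests (names and image tag are illustrative, not our exact manifest):

```yaml
# Sketch: run 5 controller replicas and tell each replica how many shards exist.
# ARGOCD_CONTROLLER_REPLICAS must match spec.replicas, or sharding misbehaves.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 5
  serviceName: argocd-application-controller
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-application-controller
  template:
    metadata:
      labels:
        app.kubernetes.io/name: argocd-application-controller
    spec:
      containers:
        - name: argocd-application-controller
          image: argoproj/argocd:v1.8.0-rc1
          command: [argocd-application-controller]
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "5"
```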

To Reproduce

```shell
# argocd application controller
argocd-application-controller \
  --status-processors 50 \
  --operation-processors 25 \
  --repo-server-timeout-seconds 900 \
  --app-resync 86400 \
  --app-state-cache-expiration 1h0m0s \
  --self-heal-timeout-seconds 5 \
  --loglevel info \
  --sentinel argocd-redis-master-0.argocd-redis-headless.argocd.svc:26379 \
  --sentinel argocd-redis-slave-0.argocd-redis-headless.argocd.svc:26379 \
  --sentinel argocd-redis-slave-1.argocd-redis-headless.argocd.svc:26379 \
  --sentinelmaster argocd
```

Expected behavior

The application controller should not constantly be generating git requests.

Screenshots

We deployed v1.8.0-rc1 at the end of the 26th, which is where the huge spike begins.

Here's a chart of the repo server metrics for the last 7 days; the spikes are the --app-resync runs, and otherwise the syncs are at 0 (ARGOCD_CONTROLLER_REPLICAS=5):

[screenshot: repo server git requests over the last 7 days]

We deployed v1.8.0-rc1 on the 26th, where we can see a huge spike in git requests (ARGOCD_CONTROLLER_REPLICAS=5):

[screenshot: spike in git requests after deploying v1.8.0-rc1]

After that huge spike, the requests become constant at ~60 (ARGOCD_CONTROLLER_REPLICAS=5):

[screenshot: git requests holding constant at ~60]

Here's a chart going from 0 application controllers to 1 to demonstrate the rise in syncing (ARGOCD_CONTROLLER_REPLICAS=1):

[screenshot: git requests rising after scaling from 0 controllers to 1]

Version

argocd: v1.7.4+f8cbd6b
  BuildDate: 2020-09-05T02:46:53Z
  GitCommit: f8cbd6bf432327cc3b0f70d23b66511bb906a178
  GitTreeState: clean
  GoVersion: go1.14.1
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v1.8.0-rc1+868516f
  BuildDate: 2020-11-25T18:14:46Z
  GitCommit: 868516f0bd03e067d9252ca4030e0b291594aff2
  GitTreeState: clean
  GoVersion: go1.14.12
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v3.8.1 2020-07-16T00:58:46Z
  Helm Version: v3.4.1+gc4e7485
  Kubectl Version: v1.17.8

Logs

We noticed this in the application controller logs:

2020-11-28T05:03:36.652593013Z time="2020-11-28T05:03:36Z" level=info msg="Refreshing app status (spec.destination differs), level (2)" application=test

Could this be causing the constant git checkouts?

servo1x added the bug (Something isn't working) label Nov 28, 2020
servo1x changed the title from "App continues to constantly resync on v1.8.0-rc1" to "Constant git requests checkout on v1.8.0-rc1" Nov 28, 2020
servo1x changed the title from "Constant git requests checkout on v1.8.0-rc1" to "Constant git requests on v1.8.0-rc1" Nov 28, 2020
jessesuen added this to the v1.8 milestone Nov 30, 2020

alexmt (Collaborator) commented Dec 1, 2020

Hello @servo1x,

The initial spike of git checkout / ls-remote requests is expected, because the v1.8 release invalidates the Redis cache.

I found a bug that causes the ls-remote increase. It affects applications that have a cluster name instead of a URL in the destination field (fix: #4937). Can you please confirm that you have applications with a cluster name in the destination?
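
For reference, the two destination styles look roughly like this (a minimal sketch; the app, repo, and cluster names are made up):

```yaml
# Application whose destination uses a cluster *name* (the case hit by the bug).
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/monorepo.git
    targetRevision: HEAD
    path: charts/example-app
  destination:
    name: prod-cluster                         # cluster name form (affected)
    # server: https://kubernetes.default.svc   # URL form (not affected)
    namespace: example
```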

servo1x (Author) commented Dec 1, 2020

Hi @alexmt, that's correct, we are using cluster names instead of URLs. As for the git checkout spike, we would expect it to be no larger than the number of applications we have (~500), but it is almost 5 times that, and it eventually settles into a constant checkout rate of ~60.

alexmt reopened this Dec 1, 2020

alexmt (Collaborator) commented Dec 1, 2020

Thank you for the response @servo1x. I'm trying to find the reason for the increased checkout rate.

alexmt (Collaborator) commented Dec 2, 2020

Another possibility is that the sharded controller just reconciles applications more frequently and sends more requests to the repo server.

The repo server is supposed to cache generated manifests, but it does not cache two types of manifest generation errors (the path specified in the app source is missing, or commit verification fails), so apps hitting those errors regenerate manifests on every reconciliation. I've prepared a PR that fixes it: #4947

alexmt (Collaborator) commented Dec 3, 2020

Hello @servo1x ,

We've published v1.8.0-rc2 with several fixes that should hopefully solve the problem. Can you please give it a try?
Heads up: git checkout/ls-remote requests will spike again right after the upgrade, since the cache is invalidated once more.

Thanks,
Alex
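
For anyone else rolling out the rc, one hypothetical way to pin the upgrade is a kustomize overlay over the upstream HA manifests (the manifest URL and image name are assumptions, not something from this thread):

```yaml
# kustomization.yaml — sketch for upgrading to v1.8.0-rc2
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  - https://raw.githubusercontent.com/argoproj/argo-cd/v1.8.0-rc2/manifests/ha/install.yaml
images:
  - name: argoproj/argocd
    newTag: v1.8.0-rc2  # explicit pin; redundant with the tagged manifest, but keeps the version visible
```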

servo1x (Author) commented Dec 3, 2020

Thanks @alexmt, I will give it a try and report back! I'll test it over the weekend to avoid impacting our users.

servo1x (Author) commented Dec 5, 2020

Hi @alexmt, the issue appears to be resolved. Git requests, both ls-remote and checkout, no longer exceed the number of applications.

[screenshot: git requests staying below the number of applications on v1.8.0-rc2]

The number of requests gradually increases; the earlier spike was from the git webhook triggering a refresh on everything. As the refreshes complete, the requests drop back to 0.

[screenshot: git requests dropping to 0 as refreshes complete]

Thanks for investigating and fixing the issue! 🎉 🎉 🎉

alexmt (Collaborator) commented Dec 5, 2020

Great news! Thanks a lot for reporting it and for helping with debugging!

alexmt closed this as completed Dec 5, 2020