Timeouts in argocd-repo-server after github webhook notification #9017

Open
atschabu opened this issue Apr 6, 2022 · 2 comments
Labels
bug Something isn't working

Comments


atschabu commented Apr 6, 2022

Describe the bug

Whenever we push to our GitHub repository, we see a spike in git ls-remote calls, several of which time out, most likely because GitHub is rate limiting us.

Full slack conversation here: https://cloud-native.slack.com/archives/C01TSERG0KZ/p1648534989348809

To Reproduce

Create roughly 100 applications pointing at the main branch of a single GitHub repository, then push a commit to that branch.

Expected behavior

git ls-remote results for the same repo/branch combination should be cached until the next webhook call or hard refresh.
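To make the expected behavior concrete, here is a minimal Go sketch of a revision cache keyed by repo and ref that is only cleared by an explicit invalidation (standing in for a webhook or hard refresh). This is not Argo CD's actual cache; RevisionCache, Resolve, Invalidate and the lsRemote callback are hypothetical names used for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// refKey identifies one repo/branch combination.
type refKey struct {
	Repo string
	Ref  string
}

// RevisionCache remembers the commit SHA each ref last resolved to.
// Hypothetical type for illustration, not part of Argo CD.
type RevisionCache struct {
	mu      sync.Mutex
	entries map[refKey]string
}

func NewRevisionCache() *RevisionCache {
	return &RevisionCache{entries: make(map[refKey]string)}
}

// Resolve returns the cached SHA, or calls lsRemote on a miss and stores the result.
func (c *RevisionCache) Resolve(repo, ref string, lsRemote func(repo, ref string) (string, error)) (string, error) {
	key := refKey{repo, ref}
	c.mu.Lock()
	sha, ok := c.entries[key]
	c.mu.Unlock()
	if ok {
		return sha, nil
	}
	sha, err := lsRemote(repo, ref)
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.entries[key] = sha
	c.mu.Unlock()
	return sha, nil
}

// Invalidate drops every cached ref for repo, as a webhook or hard refresh would.
func (c *RevisionCache) Invalidate(repo string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k := range c.entries {
		if k.Repo == repo {
			delete(c.entries, k)
		}
	}
}

func main() {
	cache := NewRevisionCache()
	calls := 0
	// Stand-in for the real network call; the SHA is a placeholder.
	lsRemote := func(repo, ref string) (string, error) {
		calls++
		return "0000000000000000000000000000000000000000", nil
	}
	repo := "https://github.com/<org>/<repo>"

	cache.Resolve(repo, "main", lsRemote)
	cache.Resolve(repo, "main", lsRemote) // served from the cache
	fmt.Println("upstream calls:", calls) // prints "upstream calls: 1"

	cache.Invalidate(repo) // e.g. a webhook arrived
	cache.Resolve(repo, "main", lsRemote)
	fmt.Println("upstream calls:", calls) // prints "upstream calls: 2"
}
```

Note that a cache like this only helps once it has been populated; callers that all miss at the same moment would still each go upstream.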

Version

argocd: v2.3.3+07ac038.dirty
  BuildDate: 2022-03-30T05:20:22Z
  GitCommit: 07ac038a8f97a93b401e824550f0505400a8c84e
  GitTreeState: dirty
  GoVersion: go1.18
  Compiler: gc
  Platform: darwin/amd64
WARN[0000] Failed to invoke grpc call. Use flag --grpc-web in grpc calls. To avoid this warning message, use flag --grpc-web.
argocd-server: v2.3.3+07ac038
  BuildDate: 2022-03-30T00:06:18Z
  GitCommit: 07ac038a8f97a93b401e824550f0505400a8c84e
  GitTreeState: clean
  GoVersion: go1.17.6
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v4.4.1 2021-11-11T23:36:27Z
  Helm Version: v3.8.0+gd141386
  Kubectl Version: v0.23.1
  Jsonnet Version: v0.18.0

Logs

{"error":"Get \"https://github.com/<org>/<repo>/info/refs?service=git-upload-pack\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)", "grpc.code":"Unknown", "grpc.method":"GenerateManifest", "grpc.request.deadline":"2022-03-29T06:03:43Z", "grpc.service":"repository.RepoServerService", "grpc.start_time":"2022-03-29T06:01:43Z", "grpc.time_ms":15000.721, "level":"error", "msg":"finished unary call with code Unknown", "span.kind":"server", "system":"grpc"}
atschabu added the bug label Apr 6, 2022
@crenshaw-dev
Member

I suspect that the controller is initiating a lot of GenerateManifest repo-server calls at the same time, causing a bunch of ls-remote calls before the cache can be populated. GitHub is throttling the calls, causing refreshes to fail.

I added some logging, and I think it demonstrates the problem:

INFO[0000] argocd-repo-server v99.99.99+unknown serving on [::]:8081 
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
INFO[0129] ls-remote cache miss                         
DEBU[0130] setting revision cache for "https://github.com/crenshaw-dev/test" 
DEBU[0130] setting revision cache for "https://github.com/crenshaw-dev/test" 
DEBU[0130] setting revision cache for "https://github.com/crenshaw-dev/test" 
DEBU[0130] setting revision cache for "https://github.com/crenshaw-dev/test" 
DEBU[0130] symbolic reference 'HEAD' (refs/heads/main) resolved to '4cba55b56621065d3ab00d68e737ffdb79e22cdd' 
DEBU[0130] symbolic reference 'HEAD' (refs/heads/main) resolved to '4cba55b56621065d3ab00d68e737ffdb79e22cdd' 
DEBU[0130] symbolic reference 'HEAD' (refs/heads/main) resolved to '4cba55b56621065d3ab00d68e737ffdb79e22cdd' 
DEBU[0130] setting revision cache for "https://github.com/crenshaw-dev/test" 
DEBU[0130] setting revision cache for "https://github.com/crenshaw-dev/test" 
DEBU[0130] symbolic reference 'HEAD' (refs/heads/main) resolved to '4cba55b56621065d3ab00d68e737ffdb79e22cdd' 
DEBU[0130] setting revision cache for "https://github.com/crenshaw-dev/test" 

I think we need a mechanism to "group" these simultaneous requests so that there are fewer ls-remote calls to the SCM.
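One way such grouping could look is request coalescing: while an ls-remote for a given repo/ref is in flight, later callers wait for its result instead of starting their own. The sketch below is a minimal, standard-library-only illustration of that idea, not Argo CD's implementation; Group, Do, resolveConcurrently and the fake ls-remote function are hypothetical names, and the 50 ms sleep merely simulates network latency.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight lookup that other callers can wait on.
type call struct {
	done chan struct{} // closed when the lookup finishes
	sha  string
	err  error
}

// Group coalesces concurrent lookups that share a key.
// Hypothetical type for illustration, not part of Argo CD.
type Group struct {
	mu       sync.Mutex
	inFlight map[string]*call
}

// Do runs fn at most once at a time per key. Callers that arrive while fn
// is in flight block and share its result. Nothing is cached afterwards.
// Simplification: if fn panics, waiting callers are never released.
func (g *Group) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.inFlight == nil {
		g.inFlight = make(map[string]*call)
	}
	if c, ok := g.inFlight[key]; ok {
		g.mu.Unlock()
		<-c.done // wait for the leader's result
		return c.sha, c.err
	}
	c := &call{done: make(chan struct{})}
	g.inFlight[key] = c
	g.mu.Unlock()

	c.sha, c.err = fn()

	g.mu.Lock()
	delete(g.inFlight, key)
	g.mu.Unlock()
	close(c.done) // publish the result to waiters
	return c.sha, c.err
}

// resolveConcurrently simulates n simultaneous refreshes of the same repo/ref
// and reports how many upstream calls were actually made.
func resolveConcurrently(n int) (int64, []string) {
	var g Group
	var calls int64
	// Stand-in for the real network call.
	fakeLsRemote := func() (string, error) {
		atomic.AddInt64(&calls, 1)
		time.Sleep(50 * time.Millisecond)
		return "4cba55b56621065d3ab00d68e737ffdb79e22cdd", nil
	}
	shas := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			shas[i], _ = g.Do("https://github.com/crenshaw-dev/test#HEAD", fakeLsRemote)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&calls), shas
}

func main() {
	calls, shas := resolveConcurrently(20)
	fmt.Printf("%d callers shared %d upstream ls-remote call(s)\n", len(shas), calls)
}
```

With 20 callers starting together, the upstream call count is typically 1, though the exact number depends on goroutine scheduling. Go's golang.org/x/sync/singleflight package provides a production-grade version of the same pattern.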

@poonsalai

Any update on this, please? We are also getting timeouts on webhooks, seemingly on the Argo side. Also, is there any setting we can configure in Argo to increase the timeout?
