Support and document using HPA for repo-server #2559

Open
alexmt opened this issue Oct 24, 2019 · 5 comments
Labels
- component:config-management: Tools specific issues (helm, kustomize etc)
- enhancement: New feature or request
- type:supportability: Enhancements that help operators to run Argo CD

Comments

alexmt (Collaborator) commented Oct 24, 2019

Summary

Provide the ability to automate repo-server auto-scaling using an HPA.

Motivation

The repo server needs to be scaled up when Argo CD manages too many applications or when a lot of applications are defined in the same repo. In both cases, manifest generation takes too long and app reconciliation becomes slow.

Proposal

- Add a gauge Prometheus metric which represents the number of pending manifest requests.
- Add a sample HPA configuration which auto-scales the repo-server when the number of pending manifest requests is too high (a hedged sketch follows below).
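
For illustration, here is a minimal sketch of what such a sample configuration could look like, assuming the gauge ends up published as `argocd_repo_pending_request_total` and exposed to the HPA through a custom-metrics adapter such as prometheus-adapter; the namespace, replica bounds, and threshold are illustrative assumptions, not decisions from this issue:

```yaml
# Hypothetical sample HPA (not the final design from this issue):
# scale argocd-repo-server on the proposed pending-requests gauge,
# assuming a custom-metrics adapter exposes it as a per-pod metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: argocd-repo-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: argocd_repo_pending_request_total
      target:
        type: AverageValue
        averageValue: "3"  # add replicas once pods average more than ~3 pending requests
```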

alexmt added the enhancement label Oct 24, 2019
alexec (Contributor) commented Oct 28, 2019

One more job for #2468?

alexmt pushed a commit to alexmt/argo-cd that referenced this issue Nov 8, 2019
alexmt pushed commits that referenced this issue Nov 8, 2019
jannfis added the component:config-management and type:supportability labels May 14, 2020
maxbrunet (Contributor) commented:

So I have used the gauge and came up with this KEDA ScaledObject:

```yaml
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: argocd-repo-server
spec:
  scaleTargetRef:
    deploymentName: argocd-repo-server
  maxReplicaCount: 30
  minReplicaCount: 3
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-k8s.monitoring.svc.cluster.local:9090
      metricName: argocd_repo_pending_request_total
      query: avg(sum(argocd_repo_pending_request_total{namespace="argocd", job="argocd-repo-server"}) by (instance))
      threshold: '3'
```

But something bothers me: I think scale-up is triggered too late. By then, all the requests are already queued on the existing repo-server replicas, so the added replicas can only process subsequent requests, which may not arrive for some time; we then scale back down, and the scale-up was pointless.

Here is how sum(argocd_repo_pending_request_total) graphs for us:

[screenshot: graph of sum(argocd_repo_pending_request_total) over time]

It is mostly spikes of ~50-60 requests. Have you considered using a work queue? Maybe Redis could be used as a FIFO queue and to pass the manifests (the controller would read directly from what is currently the cache)?
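
One partial mitigation for the flapping described above (an illustrative sketch, not something settled in this thread) is to slow down scale-down so replicas added for one spike are still around for the next. If the HPA is driven directly from the custom metric rather than through KEDA v1, an autoscaling/v2 behavior stanza can do this; all values below are assumptions:

```yaml
# Illustrative scale-down damping for the HPA sketch above. This does not
# fix the underlying issue that already-queued requests cannot be handed
# to new replicas; it only keeps spare capacity around longer.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: argocd-repo-server
  minReplicas: 3
  maxReplicas: 30
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # wait 10 minutes of low load before shrinking
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120             # then shed at most one replica every 2 minutes
  metrics:
  - type: Pods
    pods:
      metric:
        name: argocd_repo_pending_request_total
      target:
        type: AverageValue
        averageValue: "3"
```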

musabmasood commented:

We're seeing these CPU spikes as well; something should definitely buffer them out. I haven't tried scaling up the repo server much yet, but from your comment it seems it wouldn't really help.

maxbrunet (Contributor) commented:

For now, we are using a Cron scaler for business hours:

```yaml
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: argocd-repo-server
spec:
  scaleTargetRef:
    deploymentName: argocd-repo-server
  minReplicaCount: 3
  triggers:
  - type: cron
    metadata:
      timezone: America/Toronto
      start: 0 9 * * 1-5
      end: 0 18 * * 1-5
      desiredReplicas: "14"
```
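
(For context: with KEDA's cron trigger, the deployment is held at desiredReplicas between the start and end schedules and, if I understand the scaler correctly, falls back to minReplicaCount outside them, so this trades metric-driven reactivity for predictable business-hours capacity. The schedule and replica counts above are of course site-specific.)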

On Slack, Alexander also pointed out another area of improvement: the repo-server should reuse the cloned Git repositories; right now it clones them on each request.

PatTheSilent commented:

@maxbrunet I can imagine that could lead to problems if repo cleanups aren't implemented properly. Because of how the repo-server currently works, it's not a problem at all to use force-pushing, floating tags, or custom plugins that may create new files or change existing ones during manifest generation (for example, decrypting secrets encrypted with SOPS).
