Support and document using HPA for repo-server #2559
Comments
One more job for the #2468?
So I have used the gauge and came up with this using a KEDA ScaledObject:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: argocd-repo-server
spec:
  scaleTargetRef:
    deploymentName: argocd-repo-server
  maxReplicaCount: 30
  minReplicaCount: 3
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-k8s.monitoring.svc.cluster.local:9090
      metricName: argocd_repo_pending_request_total
      query: avg(sum(argocd_repo_pending_request_total{namespace="argocd", job="argocd-repo-server"}) by (instance))
      threshold: '3'

But something bothers me: I think the scale-up is triggered too late. By the time it happens, all the requests are already queued on the existing repo-server replicas, so the added replicas can only process subsequent requests, which may not arrive for some time; then we scale back down, and scaling up was pointless. Here is what it looks like: mostly spikes of ~50-60 requests. Have you considered using a work queue? Maybe Redis could be used as a FIFO queue and to pass the manifests (the controller would read directly what is currently the cache)?
We're seeing these CPU spikes as well; something should definitely buffer this out. I haven't tried scaling up the repo server much yet, but from your comment it seems it won't really help that much.
For now, we are using a Cron scaler for business hours:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: argocd-repo-server
spec:
  scaleTargetRef:
    deploymentName: argocd-repo-server
  minReplicaCount: 3
  triggers:
  - type: cron
    metadata:
      timezone: America/Toronto
      start: 0 9 * * 1-5
      end: 0 18 * * 1-5
      desiredReplicas: "14"

On Slack, Alexander also pointed out another area of improvement: the repo-server should re-use the cloned Git repositories; right now it clones them on each request.
@maxbrunet I can imagine that could lead to problems if repo cleanups aren't properly implemented. Due to how the repo-server currently works, it is not a problem at all to use force-pushing, floating tags, or custom plugins that may create new files or change existing ones during manifest generation (for example, decrypting secrets encrypted with SOPS).
Summary
Provide the ability to automate repo-server auto-scaling using HPA.
Motivation
The repo server needs to be scaled up if Argo CD manages too many applications or if a lot of applications are defined in the same repo. In both cases, manifest generation takes too much time and application reconciliation is slow.
Proposal
Add a gauge Prometheus metric which represents the number of pending manifest requests.
Add a sample HPA configuration which auto-scales the repo-server if the number of pending manifest requests is too high (a sketch of what this could look like is given below).
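For illustration, here is a minimal sketch of such an HPA. It assumes the pending-request gauge is exposed to the HPA as an external metric named argocd_repo_pending_request_total through a metrics adapter such as prometheus-adapter; the metric name is taken from the KEDA example above, while the namespace, replica counts, and target value are assumptions, not a final design.

# Hypothetical HPA using the proposed pending-request gauge as an external metric.
# Requires a metrics adapter (e.g. prometheus-adapter) to serve the metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: argocd-repo-server
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: External
    external:
      metric:
        name: argocd_repo_pending_request_total
      target:
        # Scale up once the average number of pending manifest requests
        # per replica exceeds 3 (illustrative threshold).
        type: AverageValue
        averageValue: "3"

The exact threshold and replica bounds would need tuning per installation, and the per-replica averaging mirrors the avg(sum(...) by (instance)) query used in the KEDA example.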