argoCD resource events impacts to etcd db size #10529

Open
3 tasks done
daro1337 opened this issue Sep 6, 2022 · 0 comments
Labels
bug Something isn't working


daro1337 commented Sep 6, 2022

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

I had a network problem reaching the kubernetes API (the connection was flapping), so argoCD applications timed out when trying to sync.
This led to constant changes in app status, and I ended up with thousands of events like this:

kubectl get events -n argocd
...
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Healthy -> Missing
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: OutOfSync -> Unknown
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Healthy -> Missing
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: Unknown -> OutOfSync
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Missing -> Healthy
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: Unknown -> OutOfSync
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Missing -> Healthy
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: OutOfSync -> Unknown
...
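
To see which applications are producing the most of these events, the output above can be tallied per object. The following is a minimal sketch, assuming the default `kubectl get events` column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE); the function name `count_status_flaps` and the `SAMPLE` data are illustrative and not part of argoCD or kubectl.

```python
from collections import Counter


def count_status_flaps(lines):
    """Count ResourceUpdated events per (object, status kind).

    Assumes each line follows the default `kubectl get events` layout:
    LAST SEEN, TYPE, REASON, OBJECT, MESSAGE (whitespace separated).
    """
    counts = Counter()
    for line in lines:
        # Split into at most 5 fields so the message keeps its spaces.
        parts = line.split(None, 4)
        if len(parts) < 5 or parts[2] != "ResourceUpdated":
            continue  # header, "..." filler, or an unrelated event
        obj, message = parts[3], parts[4]
        if message.startswith("Updated health status"):
            counts[(obj, "health")] += 1
        elif message.startswith("Updated sync status"):
            counts[(obj, "sync")] += 1
    return counts


# A few lines copied from the listing above, used as sample input.
SAMPLE = """\
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Healthy -> Missing
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: OutOfSync -> Unknown
45h         Normal    ResourceUpdated      application/some-app   Updated health status: Healthy -> Missing
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: Unknown -> OutOfSync
""".splitlines()

if __name__ == "__main__":
    for (obj, kind), n in count_status_flaps(SAMPLE).most_common():
        print(f"{obj}\t{kind}\t{n}")
```

In practice the input would be the saved output of `kubectl get events -n argocd` rather than the inline sample.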

I have 200+ apps in my argoCD, so this adds up at scale: my etcd grew to 600MB+ within a couple of days and kept growing.
I took an etcd snapshot and checked where the growth came from. Because I have a dedicated k8s cluster for argo, it was easy to tell that the issue is with argo. After inspecting etcd

To Reproduce

  1. cause a network issue so that the k8s API is flapping
  2. argo will try to sync apps every 3 minutes (the default)
  3. monitor the etcd size
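
For step 3, one way to quantify the growth is to sample the database size periodically and compute a rate. The sketch below assumes the samples come from `etcdctl endpoint status --write-out=json` and that the output contains a numeric `dbSize` field; the helper names and the sample values in the usage lines are hypothetical.

```python
import re


def db_size_bytes(status_json):
    """Extract the dbSize value (bytes) from endpoint status JSON text.

    Assumption: the JSON text contains a numeric "dbSize" field.
    """
    match = re.search(r'"dbSize"\s*:\s*(\d+)', status_json)
    if match is None:
        raise ValueError("no dbSize found in endpoint status output")
    return int(match.group(1))


def growth_mib_per_day(first, last):
    """Average growth in MiB/day between two (unix_seconds, bytes) samples."""
    (t0, s0), (t1, s1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    days = (t1 - t0) / 86400
    return (s1 - s0) / (1024 * 1024) / days


if __name__ == "__main__":
    # Hypothetical samples taken one day apart.
    size = db_size_bytes('{"dbSize":629145600}')
    print(growth_mib_per_day((0, 0), (86400, size)))
```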

Expected behavior

argoCD should clean up its event resources, because it can easily generate thousands of them

Workaround
As a workaround to restore etcd space:

  1. kubectl delete events -n argocd --all -v10 --grace-period 0 --force
  2. run the standard etcd maintenance procedure (compact and defrag)
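
The compact and defrag step can be sketched as follows. This assumes the approach from the etcd maintenance documentation: read the current revision from `etcdctl endpoint status --write-out=json`, compact up to that revision, then defragment. The helper `compaction_commands` and the trimmed `SAMPLE_STATUS` are hypothetical; endpoint and certificate flags are omitted and would need to be added for a real cluster.

```python
import re


def compaction_commands(status_json):
    """Build the etcdctl commands for a compact + defrag run.

    Assumption: status_json is the text printed by
    `etcdctl endpoint status --write-out=json` and contains "revision".
    """
    match = re.search(r'"revision"\s*:\s*(\d+)', status_json)
    if match is None:
        raise ValueError("no revision found in endpoint status output")
    revision = match.group(1)
    return [
        f"etcdctl compact {revision}",  # discard history before this revision
        "etcdctl defrag",               # release the freed space on disk
        "etcdctl endpoint status --write-out=table",  # confirm the new size
    ]


# Trimmed, hypothetical sample of the status output.
SAMPLE_STATUS = (
    '[{"Endpoint":"127.0.0.1:2379",'
    '"Status":{"header":{"revision":1516},"dbSize":629145600}}]'
)

if __name__ == "__main__":
    for cmd in compaction_commands(SAMPLE_STATUS):
        print(cmd)
```

The sketch only prints the commands so they can be reviewed before being run by hand against each etcd member.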

Screenshots
The etcd database size increased over time and decreased once I started cleaning up events:
[image: etcd-size]

Version

v2.3.4

Logs

45h         Normal    ResourceUpdated      application/some-app    Updated health status: Healthy -> Missing
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: OutOfSync -> Unknown
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Healthy -> Missing
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: Unknown -> OutOfSync
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Missing -> Healthy
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: Unknown -> OutOfSync
45h         Normal    ResourceUpdated      application/some-app    Updated health status: Missing -> Healthy
45h         Normal    ResourceUpdated      application/some-app    Updated sync status: OutOfSync -> Unknown
@daro1337 daro1337 added the bug Something isn't working label Sep 6, 2022