We're using ArgoCD to manage cluster configurations. Recently, the application-controller pod entered a crashloop due to OOM errors. It happens. The concerning part is that it caused a service degradation on the clusters it manages by replacing values in argocd-vault-plugin managed Secrets with placeholder values, as if the plugin sidecar didn't run at all. For example, this is what it did to the apiserver crt:
Multiple Secrets were affected (not all) and some services did not take it well, resulting in cluster degradation.
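To illustrate what "placeholder values" means here (this is a hypothetical sketch, not the actual Secret from the incident — the name, Vault path, and keys are made up), an AVP-managed Secret carries placeholders that the plugin is supposed to substitute at render time. When substitution doesn't happen, the raw placeholder lands on the cluster:

```yaml
# Hypothetical example of an AVP-managed Secret synced WITHOUT plugin substitution.
# Name, annotation path, and keys are illustrative, not from the incident.
apiVersion: v1
kind: Secret
metadata:
  name: apiserver-tls
  annotations:
    avp.kubernetes.io/path: "kv/data/cluster/apiserver"
type: kubernetes.io/tls
stringData:
  tls.crt: <crt>   # placeholder AVP should have replaced with the real cert
  tls.key: <key>   # placeholder AVP should have replaced with the real key
```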
I experimented with this later and was able to reproduce it with the following steps:
1. Install ArgoCD from gitops-operator 1.12.1 (v2.10.5+335875d).
2. Set up the AVP plugin as a sidecar.
3. Create ArgoCD application(s) managing Secrets using AVP.
4. Set a memory limit low enough that the application-controller gets OOMKilled.
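The last step amounts to lowering the controller's memory limit. As a rough sketch (namespace and value are assumptions; with the OpenShift GitOps operator the limit would normally be set on the ArgoCD CR rather than by patching the StatefulSet directly):

```yaml
# Illustrative strategic-merge patch for the application-controller StatefulSet:
# a deliberately low memory limit so the pod gets OOMKilled under load.
spec:
  template:
    spec:
      containers:
        - name: argocd-application-controller
          resources:
            limits:
              memory: 256Mi   # assumed value, low enough to trigger OOMKill
```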
Expected results:
Sync does not work at all or works partially, but when and where it works, it delivers correct resources, with Secrets correctly populated from Hashicorp Vault using argocd-vault-plugin (AVP).
Actual results:
A number of Secrets got synced with placeholder values instead of actual tokens/certs/passwords from Hashicorp Vault.
When the application-controller is stable, ArgoCD works as expected; no issues with Secret resolution or anything else have been observed.
❯ argocd version
argocd: v2.10.3+0fd6344
BuildDate: 2024-03-13T19:37:04Z
GitCommit: 0fd6344537eb948cff602824a1d060421ceff40e
GitTreeState: clean
GoVersion: go1.21.7
Compiler: gc
Platform: linux/amd64
WARN[0000] Failed to invoke grpc call. Use flag --grpc-web in grpc calls. To avoid this warning message, use flag --grpc-web.
argocd-server: v2.10.5+335875d
BuildDate: 2024-04-04T12:32:14Z
GitCommit: 335875d13e018bed6e03873f4742582582964745
GitTreeState: clean
GoVersion: go1.21.7 (Red Hat 1.21.7-1.module+el8.10.0+21318+5ea197f8)
Compiler: gc
Platform: linux/amd64
ExtraBuildInfo: {Vendor Information: Red Hat OpenShift GitOps version: v1.12.1}
Kustomize Version: v5.2.1 unknown
Helm Version: v3.14.0+g2a2fb3b
Kubectl Version: v0.26.11
Jsonnet Version: v0.20.0
This seems weird. Plugins are executed by the repo-server, not the application controller. The application controller does not interact with plugins in any way.
Is there something else happening when the application controller gets OOM killed?
I didn't notice anything else wrong and didn't find anything interesting in the logs. There is definitely a correlation between the application-controller crashloop and the corrupted Secrets (I've seen it twice), but I'm not sure about causality; perhaps it's indirect, or perhaps something else was going on. I'll experiment some more if I find the time. So far we've been running stable after bumping memory limits and sharding the app controller.
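For reference, the mitigation (bumped limits plus controller sharding) looks roughly like this in an upstream StatefulSet-based install — a sketch with assumed values, not our exact config; with gitops-operator the equivalent settings live on the ArgoCD CR:

```yaml
# Sketch of application-controller sharding. Upstream docs require that
# ARGOCD_CONTROLLER_REPLICAS matches spec.replicas. Values are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
spec:
  replicas: 2                 # number of controller shards
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "2"      # must match spec.replicas
          resources:
            limits:
              memory: 2Gi     # bumped limit (assumed value)
```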