We use this helm chart on our OKD cluster, and it has been working well so far. However, the EtcdBackupCronJobStatusFailed alert has started firing while a job is running.
I don't understand why these alerts only started recently. We recently updated the cluster, which is probably the cause, but the issue might come from something else.
Cluster version: 4.11.0-0.okd-2022-11-05-030711
Solutions or workarounds
I've tried replacing kube_job_status_succeeded{namespace="infra-backup-etcd"} == 0 with kube_job_status_failed{namespace="infra-backup-etcd"} > 0, but I'm not sure whether it would raise an alert if a job fails to be scheduled.
Another solution would be kube_job_status_succeeded{namespace="infra-backup-etcd"} + kube_job_status_active{namespace="infra-backup-etcd"} != 1. It seems to work fine, though there could still be some cases I've missed.
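To compare the three expressions, here is a rough sketch in Python that evaluates each condition against a few job states. It is a simplified model, not how Prometheus evaluates rules: it assumes a single Job with one completion, assumes the three gauges take the 0/1 values shown, ignores retries and any `for:` duration on the alert, and treats "no job created" as no series existing at all (so no expression can match). The state names and function names are made up for illustration.

```python
# Simplified model of the three alert conditions for a single Job.
# Each state maps to assumed gauge values; None means no series exists
# (for example, the CronJob never created a Job object).
STATES = {
    "running": {"succeeded": 0, "failed": 0, "active": 1},
    "succeeded": {"succeeded": 1, "failed": 0, "active": 0},
    "failed": {"succeeded": 0, "failed": 1, "active": 0},
    "no job created": None,
}


def original(m):
    # kube_job_status_succeeded == 0
    return m is not None and m["succeeded"] == 0


def failed_only(m):
    # kube_job_status_failed > 0
    return m is not None and m["failed"] > 0


def succeeded_plus_active(m):
    # kube_job_status_succeeded + kube_job_status_active != 1
    return m is not None and m["succeeded"] + m["active"] != 1


def truth_table():
    """Return {state: (original, failed_only, succeeded_plus_active)}."""
    return {
        name: (original(m), failed_only(m), succeeded_plus_active(m))
        for name, m in STATES.items()
    }


if __name__ == "__main__":
    for state, fires in truth_table().items():
        print(f"{state:15} {fires}")
```

Under these assumptions, only the original expression fires while a job is running, all three fire on a failed job, and none of them fires when no job was created, which is the scheduling-failure case I'm unsure about.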
I don't think we've been able to reproduce this at any point. It might be specific to 4.11; we don't currently run that version in production because we stick with LTS versions, so it's less tested.