Automated alerts for cluster issues would be extremely useful to improve the safety of a CNPG cluster.
Ideally, alerts should come in both a warning and critical severity levels, so that users have time to react to an alert before a problem escalates.
Proposed alerts should include:
- Instance failure based on the cluster spec
- Low Disk space
- High Replication Lag
- High connection count on any instance
There are some alerts such as high CPU usage and/or memory that are already covered by kube-prometheus-stack and should not be implemented here.
Automated alerts for cluster issues would be extremely useful to improve the safety of a CNPG cluster.
Ideally, alerts should come in both a
warningandcriticalseverity levels, so that users have time to react to an alert before a problem escalates.Proposed alerts should include:
There are some alerts such as high CPU usage and/or memory that are already covered by
kube-prometheus-stackand should not be implemented here.