Skip to content

Feature: Cluster Chart PrometheusRule alerts #192

@itay-grudev

Description

@itay-grudev

Automated alerts for cluster issues would be extremely useful to improve the safety of a CNPG cluster.

Ideally, alerts should come in both a warning and critical severity levels, so that users have time to react to an alert before a problem escalates.

Proposed alerts should include:

  • Instance failure based on the cluster spec
  • Low Disk space
  • High Replication Lag
  • High connection count on any instance

There are some alerts such as high CPU usage and/or memory that are already covered by kube-prometheus-stack and should not be implemented here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions