
Commit

Merge pull request #31711 from p-se/wip-pse-fix-osd-full-alert
monitoring: wait before firing osd full alert

Reviewed-by: Jan Fajerski <jfajerski@suse.com>
tchaikov committed Nov 24, 2019
2 parents 3e66ada + d262ade commit 2add8d1
Showing 1 changed file with 5 additions and 4 deletions.
9 changes: 5 additions & 4 deletions monitoring/prometheus/alerts/ceph_default_alerts.yml
@@ -50,6 +50,7 @@ groups:
description: One or more OSDs down for more than 15 minutes.
- alert: OSDs near full
expr: ((ceph_osd_stat_bytes_used / ceph_osd_stat_bytes) and on(ceph_daemon) ceph_osd_up == 1) > 0.8
for: 5m
labels:
severity: critical
type: ceph_default
@@ -65,8 +66,8 @@ groups:
oid: 1.3.6.1.4.1.50495.15.1.2.4.4
annotations:
description: >
OSD {{ $labels.ceph_daemon }} was marked down and back up at least once a
minute for 5 minutes.
OSD {{ $labels.ceph_daemon }} was marked down and back up at least once a
minute for 5 minutes.
# alert on high deviation from average PG count
- alert: high pg count deviation
expr: abs(((ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)) / on (job) group_left avg(ceph_osd_numpg > 0) by (job)) > 0.35
@@ -77,8 +78,8 @@ groups:
oid: 1.3.6.1.4.1.50495.15.1.2.4.5
annotations:
description: >
OSD {{ $labels.ceph_daemon }} deviates by more than 30% from
average PG count.
OSD {{ $labels.ceph_daemon }} deviates by more than 30% from
average PG count.
# alert on high commit latency...but how high is too high
- name: mds
rules:
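
Note on the `for: 5m` line added to the `OSDs near full` rule: with a `for` clause, Prometheus keeps an alert in the pending state until its expression has held on every evaluation for the given duration, and only then fires it. The sketch below is a rough Python model of that behaviour, not Prometheus code. The function name `alert_states`, the sample tuples, and the one-minute evaluation step are assumptions made for illustration; the 0.8 threshold, the `ceph_osd_up == 1` filter, and the 5-minute hold come from the rule above.

```python
from datetime import timedelta


def alert_states(samples, threshold=0.8,
                 hold=timedelta(minutes=5),
                 step=timedelta(minutes=1)):
    """Model the alert state at each rule evaluation (hypothetical helper).

    samples: list of (used_bytes, total_bytes, osd_up) tuples, one per
    evaluation, standing in for ceph_osd_stat_bytes_used,
    ceph_osd_stat_bytes and ceph_osd_up for a single OSD.
    """
    states = []
    active_for = None  # time the condition has held without interruption
    for used, total, up in samples:
        # Mirrors: (used / total and the OSD is up) > threshold
        breached = up == 1 and total > 0 and used / total > threshold
        if not breached:
            # Any evaluation below the threshold resets the pending timer.
            active_for = None
            states.append("inactive")
            continue
        active_for = timedelta(0) if active_for is None else active_for + step
        states.append("firing" if active_for >= hold else "pending")
    return states


# A three-minute spike above 80% never fires.
spike = [(85, 100, 1)] * 3 + [(50, 100, 1)]
print(alert_states(spike))

# Sustained usage above 80% fires once it has held for five minutes.
sustained = [(90, 100, 1)] * 7
print(alert_states(sustained))
```

In this model a short spike above 80% stays pending and then clears, which is the effect the commit title describes as waiting before firing the OSD full alert.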