monitoring: wait before firing osd full alert #31711

Merged
merged 2 commits into from Nov 24, 2019
9 changes: 5 additions & 4 deletions monitoring/prometheus/alerts/ceph_default_alerts.yml
@@ -50,6 +50,7 @@ groups:
description: One or more OSDs down for more than 15 minutes.
- alert: OSDs near full
expr: ((ceph_osd_stat_bytes_used / ceph_osd_stat_bytes) and on(ceph_daemon) ceph_osd_up == 1) > 0.8
+ for: 5m
labels:
severity: critical
type: ceph_default
@@ -65,8 +66,8 @@ groups:
oid: 1.3.6.1.4.1.50495.15.1.2.4.4
annotations:
description: >
- OSD {{ $labels.ceph_daemon }} was marked down and back up at least once a
- minute for 5 minutes.
+ OSD {{ $labels.ceph_daemon }} was marked down and back up at least once a
+ minute for 5 minutes.
# alert on high deviation from average PG count
- alert: high pg count deviation
expr: abs(((ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)) / on (job) group_left avg(ceph_osd_numpg > 0) by (job)) > 0.35
@@ -77,8 +78,8 @@ groups:
oid: 1.3.6.1.4.1.50495.15.1.2.4.5
annotations:
description: >
- OSD {{ $labels.ceph_daemon }} deviates by more than 30% from
- average PG count.
+ OSD {{ $labels.ceph_daemon }} deviates by more than 30% from
+ average PG count.
# alert on high commit latency...but how high is too high
- name: mds
rules:
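
For readers unfamiliar with the `for` clause, here is a minimal sketch of how the `OSDs near full` rule would read with the `for: 5m` line from this change in place. The indentation and the enclosing group name `osd` are assumptions, since the diff view above does not preserve them, and the rule's `oid` label and annotations are left out because this diff does not show them.

```yaml
groups:
  - name: osd            # assumed group name; not visible in the diff above
    rules:
      - alert: OSDs near full
        # Fraction of capacity used per OSD, restricted to OSDs that are up.
        expr: ((ceph_osd_stat_bytes_used / ceph_osd_stat_bytes) and on(ceph_daemon) ceph_osd_up == 1) > 0.8
        # The expression must hold continuously for 5 minutes before the
        # alert fires. Until then Prometheus keeps it in the "pending" state,
        # and pending is cleared if the expression stops returning a result.
        for: 5m
        labels:
          severity: critical
          type: ceph_default
```

Without `for`, Prometheus fires the alert on the first evaluation where the expression returns a result, so a brief spike above 80% usage would page immediately. With `for: 5m`, the OSD has to stay above the threshold across every evaluation in a 5-minute window first.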