mgr/prometheus: Fix regression with OSD/host details/overview dashboards
Fix issues with PromQL expressions and vector matching involving the `ceph_disk_occupation` metric.

As it turns out, `ceph_disk_occupation` cannot simply be used as expected. There are edge cases for users who run several OSDs on a single disk. These lead to errors that cannot be solved by PromQL alone (many-to-many matching errors). In these rare cases the data is simply shaped differently than we expected, and I have not found a pure PromQL solution to this issue.

What we basically need is the following:

1. Match on the labels `device` and `instance` to get one or more OSD names from a metadata metric (`ceph_disk_occupation`), so that a user can see which OSDs belong to which disk.

2. Match on the `ceph_daemon` label of the `ceph_disk_occupation` metric. In this case the value of `ceph_daemon` must not refer to more than a single OSD, which is the exact opposite of requirement 1.

Both operations are currently performed on a single metric, and no single metric can satisfy both requirements. This commit therefore adds a similar metric that satisfies one of them. Queries can then distinguish between a vector matching operation that shows a string to the user (where `ceph_daemon` could be `osd.1` or `osd.1+osd.2`) and one that matches vectors on a single `ceph_daemon` value.

Although the `ceph_daemon` label is used on a variety of daemons, only OSDs seem to be affected by this issue, and only when more than one OSD runs on a single disk. This means that only the `ceph_disk_occupation` metadata metric needs to be extended and provided as two metrics.

`ceph_disk_occupation` is meant to be used for matching on the `ceph_daemon` label value:

    foo * on(ceph_daemon) group_left ceph_disk_occupation

`ceph_disk_occupation_human` is meant to be used wherever the resulting data is displayed for human consumption (graphs, alert messages, etc.):
    foo * on(device,instance) group_left(ceph_daemon) ceph_disk_occupation_human

Fixes: https://tracker.ceph.com/issues/52974
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 18d3a71)

Conflicts:
    monitoring/grafana/dashboards/host-details.json
    monitoring/grafana/dashboards/hosts-overview.json
    monitoring/grafana/dashboards/jsonnet/grafana_dashboards.jsonnet
    monitoring/grafana/dashboards/osd-device-details.json
    monitoring/grafana/dashboards/tests/features/hosts_overview.feature
    src/pybind/mgr/prometheus/module.py

- Octopus does not generate Grafana dashboards using jsonnet, hence grafana_dashboards.jsonnet was removed.
- Octopus does not support features, hence hosts_overview.feature was removed.
- Features implemented in prometheus/module.py that were never backported to Octopus were removed.
- The `tox.ini` file was adapted to include the mgr/prometheus tests introduced by the backport.
- `cherrypy` was added to src/pybind/mgr/requirements.txt to fix Prometheus unit testing.
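The two-metric split can be illustrated with a small, simplified Python sketch. This is not the code in `module.py`. The sample series, label values, and the helper names `is_unique_on` and `to_human` are hypothetical. The sketch shows why matching the per-OSD metric on `device` and `instance` breaks when two OSDs share a disk. It also shows how collapsing those OSDs into one series with a joined `ceph_daemon` value (such as `osd.1+osd.2`) makes that match unique again.

```python
from collections import defaultdict

# Hypothetical sample series shaped like ceph_disk_occupation: one series per
# OSD. osd.1 and osd.2 share the same disk (/dev/sdb) on node1.
OCCUPATION = [
    {"ceph_daemon": "osd.0", "device": "/dev/sda", "instance": "node1"},
    {"ceph_daemon": "osd.1", "device": "/dev/sdb", "instance": "node1"},
    {"ceph_daemon": "osd.2", "device": "/dev/sdb", "instance": "node1"},
]


def is_unique_on(series, labels):
    """Return True if no two series share the same values for `labels`.

    With `foo * on(...) group_left ... bar`, PromQL requires the right-hand
    side (`bar`) to be unique on the matching labels; duplicates there are
    what produce the many-to-many matching error.
    """
    seen = set()
    for s in series:
        key = tuple(s[label] for label in labels)
        if key in seen:
            return False
        seen.add(key)
    return True


def to_human(series):
    """Collapse series that share (device, instance) into one series whose
    ceph_daemon value joins the OSD names with '+', e.g. 'osd.1+osd.2'.

    Hypothetical helper; names are sorted as plain strings for stable output.
    """
    grouped = defaultdict(list)
    for s in series:
        grouped[(s["device"], s["instance"])].append(s["ceph_daemon"])
    return [
        {"ceph_daemon": "+".join(sorted(names)), "device": device, "instance": instance}
        for (device, instance), names in grouped.items()
    ]


human = to_human(OCCUPATION)
# Per-OSD metric: safe to match on ceph_daemon ...
print(is_unique_on(OCCUPATION, ["ceph_daemon"]))         # True
# ... but not on (device, instance): duplicates on the "one" side.
print(is_unique_on(OCCUPATION, ["device", "instance"]))  # False
# Human-readable metric: unique on (device, instance).
print(is_unique_on(human, ["device", "instance"]))       # True
print([s["ceph_daemon"] for s in human])                 # ['osd.0', 'osd.1+osd.2']
```

Keeping the per-OSD series under `ceph_disk_occupation` preserves `on(ceph_daemon)` matching. The collapsed series only serve to label results shown to people, because a joined value like `osd.1+osd.2` no longer equals any single `ceph_daemon` value on the left-hand side of a query.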
8 changed files with 249 additions and 22 deletions.