octopus: mgr/dashboard: fix Grafana OSD/host panels #44924
Merged
Conversation
p-se requested review from sunilangadi2 and Sarthak0702 and removed the request for a team (February 7, 2022 14:10)
jenkins test make check
p-se force-pushed the wip-53883-octopus branch from d348965 to 5406b0b (February 8, 2022 08:38)
jenkins test make check
Fix issues with PromQL expressions and vector matching with the `ceph_disk_occupation` metric.

As it turns out, `ceph_disk_occupation` cannot simply be used as expected, as there are edge cases for users that have several OSDs on a single disk. This leads to issues which cannot be approached by PromQL alone (many-to-many PromQL errors). The data we expected is simply different in some rare cases. I have not found a PromQL-only solution to this issue. What we basically need is the following:

1. Match on the labels `host` and `instance` to get one or more OSD names from a metadata metric (`ceph_disk_occupation`), to let a user know which OSDs belong to which disk.
2. Match on the label `ceph_daemon` of the `ceph_disk_occupation` metric, in which case the value of `ceph_daemon` must not refer to more than a single OSD. This is the exact opposite of requirement 1.

As both operations are currently performed on a single metric, and there is no way to satisfy both requirements with a single metric, the intention of this commit is to extend the metric by providing a similar metric that satisfies one of the requirements. This enables the queries to differentiate between a vector matching operation used to show a string to the user (where `ceph_daemon` could possibly be `osd.1` or `osd.1+osd.2`) and a vector matching operation that requires a single `ceph_daemon` in the matching condition.

Although the `ceph_daemon` label is used on a variety of daemons, only OSDs seem to be affected by this issue (and only if more than one OSD runs on a single disk). This means that only the `ceph_disk_occupation` metadata metric needs to be extended and provided as two metrics.

`ceph_disk_occupation` is supposed to be used for matching on the `ceph_daemon` label value:

```
foo * on(ceph_daemon) group_left ceph_disk_occupation
```

`ceph_disk_occupation_human` is supposed to be used for anything where the resulting data is displayed to be consumed by humans (graphs, alert messages, etc.):

```
foo * on(device, instance) group_left(ceph_daemon) ceph_disk_occupation_human
```

Fixes: https://tracker.ceph.com/issues/52974
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 18d3a71)

Conflicts:
- monitoring/grafana/dashboards/host-details.json
- monitoring/grafana/dashboards/hosts-overview.json
- monitoring/grafana/dashboards/jsonnet/grafana_dashboards.jsonnet
- monitoring/grafana/dashboards/osd-device-details.json
- monitoring/grafana/dashboards/tests/features/hosts_overview.feature
- src/pybind/mgr/prometheus/module.py

Notes on conflict resolution:
- Octopus does not generate Grafana dashboards using jsonnet, hence grafana_dashboards.jsonnet was removed.
- Octopus does not support feature files, hence hosts_overview.feature was removed.
- Features implemented in prometheus/module.py that were never backported to Octopus were removed.
- `tox.ini` was adapted to include the mgr/prometheus tests introduced by the backport.
- Added `cherrypy` to src/pybind/mgr/requirements.txt to fix Prometheus unit testing.
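To illustrate the split between the two metrics, here is a hypothetical set of exported series for a disk shared by two OSDs. The label sets are abbreviated for readability and the exact labels emitted by the prometheus module may differ; only the `osd.1+osd.2` value form is taken from the description above.

```
# One series per OSD: `ceph_daemon` always names a single daemon,
# so it is safe for one-to-one matching via on(ceph_daemon).
ceph_disk_occupation{ceph_daemon="osd.1", device="sdb", instance="host1"} 1.0
ceph_disk_occupation{ceph_daemon="osd.2", device="sdb", instance="host1"} 1.0

# One series per disk: `ceph_daemon` may combine daemons ("osd.1+osd.2"),
# intended only for display via group_left(ceph_daemon).
ceph_disk_occupation_human{ceph_daemon="osd.1+osd.2", device="sdb", instance="host1"} 1.0
```

Matching `foo * on(ceph_daemon) ...` against the first metric yields at most one metadata series per OSD, avoiding the many-to-many error, while the second metric carries the combined daemon string for panels and alert messages.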
p-se
force-pushed
the
wip-53883-octopus
branch
from
February 10, 2022 14:51
5406b0b
to
48fe18e
Compare
nizamial09 approved these changes (Feb 15, 2022)

pereman2 approved these changes (Feb 15, 2022)
backport tracker: https://tracker.ceph.com/issues/53883
backport of #43685
parent tracker: https://tracker.ceph.com/issues/52974
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh