New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/dashboard:Get different storage class metrics in Prometheus dashboard #47163
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome RFE-to-PR time @aaSharma14 ! Some suggestions over there.
58742c1
to
eb5d69c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Is there anything left here @aaSharma14 ?
Also, we should squash the commits and update its title..
eb5d69c
to
c4b344e
Compare
…hboard Get metrics of the different "HDDRule" and "MixedUse" classes of the "Raw Storage" for their ceph VMs. So that Prometheus can scrape the data and display it to them in grafana Fixes: https://tracker.ceph.com/issues/56625 Signed-off-by: Aashish Sharma <aasharma@redhat.com>
c4b344e
to
2070567
Compare
Nothing left from my side here @nizamial09 . I have squashed the commits as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks a lot for addressing my comments!
Given we don't have any integration or e2e testing on the mgr/prometheus module, what about running the qa/tasks/mgr/test_prometheus.py
test in our API test suite?
I'd suggest adding an additional test there that compares the dump of the tested http://<prometheus>:9283/metrics
endpoint with a git-versioned template like this:
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 1.0
...
# HELP ceph_mon_metadata MON Metadata
# TYPE ceph_mon_metadata untyped
ceph_mon_metadata{ceph_daemon="mon.a",hostname="",public_addr="172.20.0.2",rank="0",ceph_version=""} 1.0
# HELP ceph_mgr_metadata MGR metadata
# TYPE ceph_mgr_metadata gauge
ceph_mgr_metadata{ceph_daemon="mgr.x",hostname="ceph",ceph_version="ceph version 17.0.0-13521-gf9bd8922 (f9bd8922d75a454b4a0ace5a85637213e5ee5700) quincy (dev)"} 1.0
# HELP ceph_mgr_status MGR status (0=standby, 1=active)
# TYPE ceph_mgr_status gauge
ceph_mgr_status{ceph_daemon="mgr.x"} 1.0
...
The above dump was obtained from a run of ceph-dev in my laptop, so many fields will differ from other runs (IPs, timestamps, metric values, etc). One alternative would be going through that dump looking for IP address, hostnames, etc., and replace that with REGEX (e.g.: hostname="172.20.0.2"
with hostname=REGEX_HOSTNAME
). However that would require a lot of editing and would make it really complicate to update that dump when new fields are added/removed.
So my suggestion would be to implement the 'smartness' in the code instead. Instead of making the changes inline:
- We take each metric (including the
# HELP
and# TYPE
metadata) in the tested prometheus and in the reference dump and produce a tuple of three elements to compare:- HELP: strict comparison
- TYPE: strict comparison
- metric itself: we compare the following:
- the metric name (all metrics in the reference dump should be present in the tested dump)
- the metric label names (but not the metric label values)
I'm ok with going with this PR as is, but an interesting example would be to run that test in this PR and verify that it fails because the cluster_by_class_*
metrics are missing.
What do you think?
Yes, makes sense @epuertat , maybe we can discuss this in the standup call as well to get more clear overview of its implementation |
Fixes: https://tracker.ceph.com/issues/56625
SIgned-off-by:Aashish Sharma aasharma@redhat.com
New metrics-
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows