mgr/prometheus: Update rule format and enhance SNMP support #43783
Here's the tox output:
[paul@rhp1gen3 tests]$ tox
py3 installed: attrs==21.2.0,beautifulsoup4==4.10.0,bs4==0.0.1,iniconfig==1.1.1,packaging==21.0,pluggy==1.0.0,py==1.10.0,pyparsing==3.0.3,pytest==6.2.5,PyYAML==6.0,soupsieve==2.2.1,toml==0.10.2
py3 run-test-pre: PYTHONHASHSEED='3007607034'
py3 run-test: commands[0] | pytest -rA test_syntax.py test_unittests.py
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.9.7, pytest-6.2.5, py-1.10.0, pluggy-1.0.0
cachedir: .tox/py3/.pytest_cache
rootdir: /home/paul/git/ceph/monitoring/prometheus/tests
collected 8 items
test_syntax.py ..... [ 62%]
test_unittests.py ... [100%]
===================================================================================================== PASSES ======================================================================================================
============================================================================================= short test summary info =============================================================================================
PASSED test_syntax.py::test_alerts_present
PASSED test_syntax.py::test_unittests_present
PASSED test_syntax.py::test_rules_format
PASSED test_syntax.py::test_unittests_format
PASSED test_syntax.py::test_rule_syntax
PASSED test_unittests.py::test_alerts_present
PASSED test_unittests.py::test_unittests_present
PASSED test_unittests.py::test_run_unittests
================================================================================================ 8 passed in 5.33s ================================================================================================
py3 run-test: commands[1] | ./validate_rules.py
Checking rule groups
cluster health : ..
mon : .....
osd : ................
mds : .......
mgr : ..
pgs : .........
nodes : .....
pools : ....
healthchecks : .
cephadm : ...
PrometheusServer : .
rados : .
Summary
Rule file : ../alerts/ceph_default_alerts.yml
Unit Test file : test_alerts.yml
Rule groups processed : 12
Rules processed : 56
SNMP OIDs declared : 34
Rule errors : 0
Rule warnings : 0
Rule name duplicates : 0
Unit tests missing : 0
No problems detected in the rule file
No problems detected in unit tests file
_____________________________________________________________________________________________________ summary _____________________________________________________________________________________________________
py3: commands succeeded
congratulations :)
LGTM! Just a small comment on the readability of the CamelCase alert names.
| | | | org | cluster | (alerts) | source | Category |
|---|---|---|---|---|---|---|---|
| 1.3.6.1 | .4 | .1 | .50495 | .1 | .2 | .1 | .2 (Ceph Health) |
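To make the OID row above easier to read, here is a minimal sketch that labels the arcs of such an OID. It assumes the five header labels (org, cluster, alerts, source, category) line up with the five arcs that follow the standard `1.3.6.1.4.1` private-enterprise prefix; that alignment is a reading of the row, not something taken from the README itself. The function name `label_oid` is hypothetical.

```python
# Sketch only: label the arcs of a Ceph alert OID.
# Assumption: the labels below map, in order, to the arcs that follow
# the standard private-enterprise prefix 1.3.6.1.4.1.
ENTERPRISE_PREFIX = "1.3.6.1.4.1."
FIELDS = ("org", "cluster", "alerts", "source", "category")


def label_oid(oid):
    """Return a dict mapping each assumed field name to its numeric arc.

    Any arcs beyond the five labelled ones are returned under "rest".
    """
    if not oid.startswith(ENTERPRISE_PREFIX):
        raise ValueError(f"{oid!r} is not under {ENTERPRISE_PREFIX!r}")
    arcs = [int(part) for part in oid[len(ENTERPRISE_PREFIX):].split(".")]
    if len(arcs) < len(FIELDS):
        raise ValueError(f"{oid!r} has fewer than {len(FIELDS)} arcs after the prefix")
    labelled = dict(zip(FIELDS, arcs))
    labelled["rest"] = arcs[len(FIELDS):]
    return labelled


ceph_health = label_oid("1.3.6.1.4.1.50495.1.2.1.2")
print(ceph_health)
```

Under that assumption, the `org` arc is 50495 and the `category` arc is 2, which the row annotates as Ceph Health.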
@liewegas I checked the IANA registry for the 50495 org and saw your newdream email address. Should we update that address to some generic ceph.io address?
BTW @pcuzner according to this conversation, could you plz add the RECENT_CRASH alert too? Thanks!
Rules now adhere to the format defined by Prometheus.io. This changes alert naming, and each alert now includes a summary description to provide a quick one-liner. In addition to the reformatting, some missing alerts for MDS and cephadm have been added, along with corresponding tests. The MIB has also been refactored so that it now passes standard lint tests, and a README is included for devs to understand the OID schema. Fixes: https://tracker.ceph.com/issues/53111 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
@epuertat Doh. Added, and squashed.
Thanks!
jenkins test make check
Rules now adhere to the format defined by Prometheus.io.
This changes alert naming, and each alert now includes a
summary description to provide a quick one-liner.
In addition to the reformatting, some missing alerts for MDS
and cephadm have been added, along with corresponding tests.
The MIB has also been refactored so that it now passes standard
lint tests, and a README is included for devs to understand the
OID schema.
Fixes: https://tracker.ceph.com/issues/53111
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
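As an illustration of the format change described above, the following is a rough sketch of the kind of check a rule validator could apply to one alert rule. It is not the code in `validate_rules.py`. The helper `check_rule`, the annotation keys `summary` and `description`, and both sample rules are assumptions made for the example; the rule is represented as an already-parsed dict so the sketch needs no YAML library.

```python
import re

# Assumption: a "CamelCase" alert name starts with an uppercase letter and
# contains only letters and digits (no spaces or underscores).
ALERT_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")

# Assumption: these are the annotation keys every alert is expected to carry.
REQUIRED_ANNOTATIONS = ("summary", "description")


def check_rule(rule):
    """Return a list of problems found in a single parsed alert rule."""
    problems = []
    name = rule.get("alert", "")
    if not ALERT_NAME.match(name):
        problems.append(f"alert name {name!r} is not CamelCase")
    annotations = rule.get("annotations") or {}
    for key in REQUIRED_ANNOTATIONS:
        if not annotations.get(key):
            problems.append(f"missing annotation: {key}")
    return problems


# Illustrative rules only; they are not copied from ceph_default_alerts.yml.
new_style = {
    "alert": "CephHealthError",
    "annotations": {
        "summary": "Cluster is in the ERROR state",
        "description": "The cluster has been in HEALTH_ERROR for some time.",
    },
}
old_style = {
    "alert": "health error",
    "annotations": {"description": "Ceph in HEALTH_ERROR state."},
}

print(check_rule(new_style))
print(check_rule(old_style))
```

The first rule yields no problems, while the second is flagged for its name and for the missing `summary` annotation.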