mgr/prometheus: expose repaired pgs metrics #47494
Besides my comments on the granularity, I'd like to see a new alert or a new dashboard here. Personally, I don't think we should add new metrics if they don't result in anything visible to the operator (via a dashboard or, preferably, an alert).
635bfc6 to 4bf7dbc
jenkins test make check
jenkins test windows
a few lint errors here but apart from that it looks good
flake8 run-test: commands[0] | flake8 --config=tox.ini alerts balancer cephadm cli_api crash devicehealth diskprediction_local hello iostat localpool nfs orchestrator prometheus selftest
prometheus/module.py:1570:21: E126 continuation line over-indented for hanging indent
prometheus/module.py:1574:21: E123 closing bracket does not match indentation of opening bracket's line
prometheus/module.py:1575:1: W293 blank line contains whitespace
prometheus/module.py:1576:78: W291 trailing whitespace
prometheus/module.py:1577:21: E128 continuation line under-indented for visual indent
1 E123 closing bracket does not match indentation of opening bracket's line
1 E126 continuation line over-indented for hanging indent
1 E128 continuation line under-indented for visual indent
1 W291 trailing whitespace
1 W293 blank line contains whitespace
ERROR: InvocationError for command /home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/.tox/flake8/bin/flake8 --config=tox.ini alerts balancer cephadm cli_api crash devicehealth diskprediction_local hello iostat localpool nfs orchestrator prometheus selftest (exited with code 1)
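The E123, E126 and E128 codes above all concern how continuation lines and closing brackets are indented. As a rough sketch of one layout that avoids them, the snippet below uses a plain hanging indent with the four values quoted later in this review. The `metrics` dictionary is a hypothetical stand-in, not the structure used in `prometheus/module.py`.

```python
# Hypothetical stand-in registry; the real module's data structures
# are not shown in this thread.
metrics = {}

# Hanging indent: nothing follows the opening bracket, each continuation
# line is indented by exactly four spaces (avoids E126 and E128), and the
# closing bracket lines up with the start of the statement (avoids E123).
# No line ends in whitespace (avoids W291 and W293).
metrics['pg_objects_repaired'] = (
    'counter',
    'pg_objects_repaired',
    'Number of objects repaired in a pool Count',
    ('poolid',),
)
```

Running the same `flake8 --config=tox.ini ...` command locally before pushing should show whether the layout satisfies the project's configuration.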
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
4bf7dbc to a846205
'counter',
'pg_objects_repaired',
'Number of objects repaired in a pool Count',
('poolid',)
@pereman2 Why does this metric specify a `poolid` label instead of `pool_id` like (all?) other metrics? The closely related ones, such as `num_objects_recovered` or `objects`, certainly use `pool_id`. It would be nice to be consistent.
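To illustrate why the label name matters to anyone querying the data, here is a minimal sketch that renders a counter in the Prometheus text exposition format with a `pool_id` label. The `Metric` class below is a hypothetical stand-in written for this example, not the class from the mgr module, and the pool IDs and counts are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Metric:
    """Hypothetical stand-in for a metric definition (not the mgr class)."""
    mtype: str
    name: str
    desc: str
    labels: Tuple[str, ...] = ()
    values: Dict[Tuple[str, ...], float] = field(default_factory=dict)

    def set(self, value: float, labelvalues: Tuple[str, ...]) -> None:
        # One label value is required for each declared label name.
        if len(labelvalues) != len(self.labels):
            raise ValueError("label count mismatch")
        self.values[labelvalues] = value

    def render(self) -> str:
        # Emit HELP and TYPE headers, then one sample line per label set.
        lines = [
            f"# HELP {self.name} {self.desc}",
            f"# TYPE {self.name} {self.mtype}",
        ]
        for labelvalues, value in sorted(self.values.items()):
            pairs = ",".join(
                f'{k}="{v}"' for k, v in zip(self.labels, labelvalues)
            )
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines)


repaired = Metric(
    "counter",
    "pg_objects_repaired",
    "Number of objects repaired in a pool",
    ("pool_id",),  # same label name the related per-pool metrics use
)
repaired.set(3, ("1",))  # sample values, not real cluster data
repaired.set(0, ("2",))
print(repaired.render())
```

With a shared `pool_id` label, a query can join this series against other per-pool series on that label directly; a differently named label would need relabelling first.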
Expose `num_objects_repaired` so users can monitor auto-repaired PGs.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Fixes: https://tracker.ceph.com/issues/57623
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows