mgr/prometheus: expose repaired pgs metrics #47494

Merged
merged 1 commit into from Sep 21, 2022

Conversation

pereman2

@pereman2 pereman2 commented Aug 8, 2022

Expose num_objects_repaired so users can monitor auto-repaired PGs.

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Fixes: https://tracker.ceph.com/issues/57623


@pereman2 pereman2 added this to In progress in Dashboard via automation Aug 8, 2022
@pereman2 pereman2 requested review from a team, nSedrickm, Pegonzal, epuertat, aaSharma14, nizamial09 and avanthakkar and removed request for a team August 11, 2022 08:35

@epuertat epuertat left a comment

Besides my comments on the granularity, I'd like to see a new alert here or a new Dashboard. Personally, I don't think that we should add new metrics if they don't result in anything visible to the operator (via a dashboard or, preferably, an alert).

src/pybind/mgr/prometheus/module.py (outdated)
Dashboard automation moved this from In progress to Reviewer approved Sep 13, 2022
@nizamial09

jenkins test make check

@nizamial09

jenkins test windows

@nizamial09 nizamial09 left a comment

a few lint errors here but apart from that it looks good

flake8 run-test: commands[0] | flake8 --config=tox.ini alerts balancer cephadm cli_api crash devicehealth diskprediction_local hello iostat localpool nfs orchestrator prometheus selftest
prometheus/module.py:1570:21: E126 continuation line over-indented for hanging indent
prometheus/module.py:1574:21: E123 closing bracket does not match indentation of opening bracket's line
prometheus/module.py:1575:1: W293 blank line contains whitespace
prometheus/module.py:1576:78: W291 trailing whitespace
prometheus/module.py:1577:21: E128 continuation line under-indented for visual indent
1     E123 closing bracket does not match indentation of opening bracket's line
1     E126 continuation line over-indented for hanging indent
1     E128 continuation line under-indented for visual indent
1     W291 trailing whitespace
1     W293 blank line contains whitespace
ERROR: InvocationError for command /home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/.tox/flake8/bin/flake8 --config=tox.ini alerts balancer cephadm cli_api crash devicehealth diskprediction_local hello iostat localpool nfs orchestrator prometheus selftest (exited with code 1)
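
The lint output above concerns how a multi-line call is indented. The offending lines of module.py are not shown in this conversation, so the following is only a minimal sketch of a continuation layout that avoids E123, E126, E128, W291 and W293, not the author's actual fix. The `Metric` class here is a hypothetical stand-in defined only to make the snippet self-contained; the argument values are the ones quoted in the review fragment further down.

```python
from typing import Dict, NamedTuple, Tuple


class Metric(NamedTuple):
    """Hypothetical stand-in for a metric definition (not Ceph's class)."""
    mtype: str
    name: str
    desc: str
    labels: Tuple[str, ...]


metrics: Dict[str, Metric] = {}

# Hanging indent: nothing follows the opening bracket, every argument is
# indented one level (4 spaces) relative to the line that opens the call,
# and the closing bracket lines up with the start of that line.
# No line carries trailing whitespace and blank lines are truly empty.
metrics['pg_objects_repaired'] = Metric(
    'counter',
    'pg_objects_repaired',
    'Number of objects repaired in a pool Count',
    ('poolid',)
)
```

Running the same `flake8 --config=tox.ini prometheus` command locally before pushing should surface these warnings without waiting for the Jenkins job.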

Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
@Pegonzal Pegonzal merged commit 8b6cb68 into ceph:main Sep 21, 2022
Dashboard automation moved this from Reviewer approved to Done Sep 21, 2022
@Pegonzal Pegonzal deleted the pgs-repaired branch September 21, 2022 12:10
'counter',
'pg_objects_repaired',
'Number of objects repaired in a pool Count',
('poolid',)
@pereman2 Why does this metric specify a poolid label instead of pool_id like (all?) other metrics?

The closely related ones, such as num_objects_recovered or objects, certainly use pool_id. It would be nice to be consistent.
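
To illustrate why the label name matters, here is a small sketch of what happens when two series are matched on a shared label, in the spirit of a PromQL `on(pool_id)` join. Everything in it is an assumption made for illustration: the `join_on` helper, the sample values, and the metadata series carrying a pool name are hypothetical and are not taken from the Ceph code.

```python
from typing import Dict, List, Tuple

# A sample is a (labels, value) pair.
Sample = Tuple[Dict[str, str], float]


def join_on(left: List[Sample], right: List[Sample], label: str) -> List[Sample]:
    """Pair left samples with right samples that share the same `label` value."""
    index = {labels[label]: labels for labels, _ in right if label in labels}
    joined: List[Sample] = []
    for labels, value in left:
        key = labels.get(label)
        if key in index:
            # Merge the labels of both sides, keeping the left-hand value.
            joined.append(({**labels, **index[key]}, value))
    return joined


# Hypothetical per-pool metadata series keyed by pool_id.
pool_metadata: List[Sample] = [({"pool_id": "1", "name": "rbd"}, 1.0)]

# The same counter exposed with the two competing label names.
repaired_poolid: List[Sample] = [({"poolid": "1"}, 7.0)]
repaired_pool_id: List[Sample] = [({"pool_id": "1"}, 7.0)]

mismatched = join_on(repaired_poolid, pool_metadata, "pool_id")
consistent = join_on(repaired_pool_id, pool_metadata, "pool_id")

print(mismatched)   # no pairs: the counter has no pool_id label to match on
print(consistent)   # one pair, enriched with the pool name
```

With the `poolid` spelling, a query that works for every other per-pool metric would need a relabeling step before it could be joined, which is the inconsistency the comment points out.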
