mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state#57005

Merged
yuriw merged 1 commit into ceph:main from cfsnyder:wip-cfsnyder-63195
Jul 10, 2024

Conversation

@cfsnyder
Contributor

@cfsnyder cfsnyder commented Apr 19, 2024

Reverts the change from #53993 and instead clears daemon health metrics directly for down and out OSDs. The former approach of removing down/out OSDs from the daemon state has undesirable consequences for stat output, including the output of the Prometheus exporter.

For example, removing a daemon from the daemon state causes the down/out OSD to disappear from the Prometheus output entirely. As a result, an alert for detecting down/out OSDs that triggers when count(ceph_osd_up = 0) > 1 no longer works.
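
To illustrate why the alert stops firing, here is a minimal sketch of the two approaches. It is a toy model, not Ceph's actual code: the dictionary layout, the function names, and the `stale_metric` placeholder are all assumptions made for the example, and the exporter is reduced to "one sample per daemon still present in the daemon state".

```python
# Toy model only: the data layout and helper names below are hypothetical
# and do not correspond to Ceph's real mgr or exporter implementation.
from copy import deepcopy


def export_osd_up(daemon_state):
    """Emit one illustrative 'osd up' sample per OSD still in the daemon state."""
    return {name: (1 if d["up"] else 0) for name, d in daemon_state.items()}


def down_osd_alert_fires(samples):
    """Mimic an alert that fires when more than one OSD reports up == 0."""
    return sum(1 for value in samples.values() if value == 0) > 1


def remove_down_out(daemon_state):
    """Former approach: drop OSDs that are both down and out from the state."""
    return {n: d for n, d in daemon_state.items() if d["up"] or d["in"]}


def clear_health_metrics(daemon_state):
    """Approach in this PR: keep the daemon, clear only its health metrics."""
    new_state = deepcopy(daemon_state)
    for d in new_state.values():
        if not d["up"] and not d["in"]:
            d["health_metrics"] = []
    return new_state


state = {
    "osd.0": {"up": True, "in": True, "health_metrics": []},
    "osd.1": {"up": False, "in": False, "health_metrics": ["stale_metric"]},
    "osd.2": {"up": False, "in": False, "health_metrics": ["stale_metric"]},
}

removed = remove_down_out(state)
cleared = clear_health_metrics(state)

# With the daemons removed, no sample has value 0, so the alert cannot fire.
print(down_osd_alert_fires(export_osd_up(removed)))  # False
# With only the health metrics cleared, both down OSDs are still exported.
print(down_osd_alert_fires(export_osd_up(cleared)))  # True
```

In this model, the down/out OSDs stay visible to the exporter in the second case while their stale health metrics are gone, which is the behavior this PR aims for.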

Fixes: https://tracker.ceph.com/issues/66168

@cfsnyder cfsnyder requested a review from a team as a code owner April 19, 2024 15:56
@cfsnyder
Contributor Author

jenkins test make check

@cfsnyder
Contributor Author

jenkins test api

@cfsnyder
Contributor Author

jenkins test make check

@cfsnyder
Contributor Author

@neha-ojha ping. Any reason that no one has been assigned to review this PR yet?

Contributor

@pdvian pdvian left a comment

@cfsnyder I agree with your approach here. I tried to solve the same problem in a different way with #50543. Your changes look more appropriate to me.

Contributor

@rzarzynski rzarzynski left a comment

The commit description reuses https://tracker.ceph.com/issues/63195 (the ticket for the change it reverts) which would create a mess when backporting.

I think we need a new, dedicated ticket for this patch.

…osd from daemon state

Reverts the change from ceph#53993
and directly clears daemon health metrics for down and out OSDs.
The former approach of removing down/out OSDs from the daemon
state has undesirable consequences for stat output, including
the prometheus exporter.

Fixes: https://tracker.ceph.com/issues/66168
Signed-off-by: Cory Snyder <csnyder@1111systems.com>
@cfsnyder cfsnyder force-pushed the wip-cfsnyder-63195 branch from 3c74d8f to 282558c Compare May 21, 2024 18:19
@cfsnyder
Contributor Author

> The commit description reuses https://tracker.ceph.com/issues/63195 (the ticket for the change it reverts) which would create a mess when backporting.
>
> I think we need a new, dedicated ticket for this patch.

I've created a new ticket and linked it in the commit and PR.

@ljflores
Member

ljflores commented Jul 9, 2024

@ljflores
Member

ljflores commented Jul 9, 2024

jenkins test windows

@yuriw yuriw merged commit c5a5e76 into ceph:main Jul 10, 2024
NitzanMordhai pushed a commit to NitzanMordhai/ceph that referenced this pull request Aug 1, 2024
mgr/Mgr.cc: clear daemon health metrics instead of removing down/out osd from daemon state

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>