Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr, mgr/prometheus: Fix regression with prometheus metrics #45505

Merged
merged 1 commit into from Apr 11, 2022

Conversation

pdvian
Copy link

@pdvian pdvian commented Mar 17, 2022

The ceph dameons on host are inheriting ceph version from the host.
This introduces a wrong interpretation in prometheus metrics as well
as in dump_server. Each ceph daemon should represent it's own
ceph version based on the ceph binary is use for that daemon.

Consider a situation where partial upgrade is done on host, some daemons
which are restarted should have ceph version tag as upgraded version
and rest should have older ceph version but presently all inherites
host version. In containerized environment, all daemons are
using ceph version of last daemon registered as a service on the host.

Fixes: https://tracker.ceph.com/issues/54611

Signed-off-by: Prashant D pdhange@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

The ceph dameons on host are inheriting ceph version from the host.
This introduces a wrong interpretation in prometheus metrics as well
as in dump_server. Each ceph daemon should represent it's own
ceph version based on the ceph binary is use for that daemon.

Consider a situation where partial upgrade is done on host, some daemons
which are restarted should have ceph version tag as upgraded version
and rest should have older ceph version but presently all inherites
host version. In containerized environment, all daemons are
using ceph version of last daemon registered as a service on the host.

Fixes: https://tracker.ceph.com/issues/54611

Signed-off-by: Prashant D <pdhange@redhat.com>
Copy link
Contributor

@ljflores ljflores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @pdvian! The modifications logically make sense, and I appreciate the changes to the variable names for better clarity. This would be good to test on either the LRC or Gibba if we go through another upgrade process.

@neha-ojha
Copy link
Member

Looks good @pdvian! The modifications logically make sense, and I appreciate the changes to the variable names for better clarity. This would be good to test on either the LRC or Gibba if we go through another upgrade process.

Do our upgrade tests in teuthology have any coverage for this?

@yuriw yuriw merged commit 15548e0 into ceph:master Apr 11, 2022
@pdvian
Copy link
Author

pdvian commented Apr 12, 2022

Looks good @pdvian! The modifications logically make sense, and I appreciate the changes to the variable names for better clarity. This would be good to test on either the LRC or Gibba if we go through another upgrade process.

Do our upgrade tests in teuthology have any coverage for this?

No. I suppose to add teuthology upgrade test for this scenario last week but failed to do so. Let me add it as earliest as possible. Thanks for reminder.

@pdvian
Copy link
Author

pdvian commented Jul 1, 2022

Looks good @pdvian! The modifications logically make sense, and I appreciate the changes to the variable names for better clarity. This would be good to test on either the LRC or Gibba if we go through another upgrade process.

Do our upgrade tests in teuthology have any coverage for this?

No. I suppose to add teuthology upgrade test for this scenario last week but failed to do so. Let me add it as earliest as possible. Thanks for reminder.

Opened #46162 to add teuthology upgrade test for this PR.

@epuertat
Copy link
Member

epuertat commented Jul 4, 2022

@avanthakkar @pereman2 in case this be a concern for the new ceph metrics exporter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants