
pacific: mgr, mgr/prometheus: Fix regression with prometheus metrics #46429

Merged
merged 1 commit into from Jun 8, 2022

Conversation

@pdvian commented May 30, 2022

backport tracker: https://tracker.ceph.com/issues/55309


backport of #45505
parent tracker: https://tracker.ceph.com/issues/54611

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

The ceph daemons on a host are inheriting the ceph version from the host.
This introduces a wrong interpretation in prometheus metrics as well
as in dump_server. Each ceph daemon should report its own
ceph version based on the ceph binary in use for that daemon.

Consider a situation where a partial upgrade is done on a host: the daemons
that have been restarted should carry the upgraded version as their ceph
version tag, and the rest should carry the older ceph version, but presently
all of them inherit the host version. In a containerized environment, all
daemons use the ceph version of the last daemon registered as a service on
the host.

Fixes: https://tracker.ceph.com/issues/54611

Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit aeca2e4)
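To illustrate the behavior this fix targets, here is a minimal sketch (not the actual mgr/prometheus code; the metric name, label names, and version strings below are hypothetical) of rendering a prometheus-style metadata metric where each daemon carries the version of its own binary, so a partially upgraded host exposes mixed versions instead of one inherited host version:

```python
# Hypothetical sketch of per-daemon version labels in prometheus metrics.
# After a partial upgrade, restarted daemons report the new version while
# the rest keep the old one; nothing is inherited from the host.
daemon_metadata = {
    "osd.0": {"ceph_version": "16.2.9"},  # restarted on the upgraded binary
    "osd.1": {"ceph_version": "16.2.7"},  # still running the old binary
    "mon.a": {"ceph_version": "16.2.9"},
}

def render_metadata_metrics(metadata):
    """Render one prometheus-style metadata line per daemon."""
    lines = []
    for daemon, meta in sorted(metadata.items()):
        lines.append(
            f'ceph_daemon_metadata{{ceph_daemon="{daemon}",'
            f'ceph_version="{meta["ceph_version"]}"}} 1.0'
        )
    return "\n".join(lines)

print(render_metadata_metrics(daemon_metadata))
```

With the regression, every line above would have carried the same host-wide version; the point of the fix is that `osd.1` keeps reporting its older binary's version until it is restarted.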
@ljflores (Contributor) commented Jun 7, 2022

http://pulpito.front.sepia.ceph.com/yuriw-2022-05-31_21:35:41-rados-wip-yuri2-testing-2022-05-31-1300-pacific-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/yuriw-2022-06-07_14:00:55-rados-wip-yuri2-testing-2022-06-03-1350-pacific-distro-default-smithi/

Failures, unrelated:
1. https://tracker.ceph.com/issues/53501
2. https://tracker.ceph.com/issues/55322
3. https://tracker.ceph.com/issues/55741
4. https://tracker.ceph.com/issues/53939
5. https://tracker.ceph.com/issues/52321
6. https://tracker.ceph.com/issues/54071
7. https://tracker.ceph.com/issues/51835
8. https://tracker.ceph.com/issues/49777
9. https://tracker.ceph.com/issues/54411
10. https://tracker.ceph.com/issues/54992

Details:
1. Exception when running 'rook' task. - Ceph - Orchestrator
2. test-restful.sh: mon metadata unable to be retrieved - Ceph - Mgr
3. cephadm/test_dashboard_e2e.sh: Unable to find element cd-modal .custom-control-label when testing on orchestrator/01-hosts.e2e-spec.ts - Ceph - Mgr - Dashboard
4. ceph-nfs-upgrade, pacific: Upgrade Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.0 on host smithi103 failed - Ceph - Orchestrator
5. qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
6. rados/cephadm/osds: Invalid command: missing required parameter hostname() - Ceph - Orchestrator
7. mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) - Ceph - Mgr
8. test_pool_min_size: 'check for active or peered' reached maximum tries (5) after waiting for 25 seconds - Ceph - RADOS
9. mds_upgrade_sequence: "overall HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available; 33 daemons have recently crashed" during suites/fsstress.sh - Ceph - CephFS
10. cannot stat '/etc/containers/registries.conf': No such file or directory - Ceph - Orchestrator
