
pacific: mgr, mgr/prometheus: Fix regression with prometheus metrics #46429

Merged
merged 1 commit into from Jun 8, 2022

Conversation

@pdvian commented May 30, 2022

backport tracker: https://tracker.ceph.com/issues/55309


backport of #45505
parent tracker: https://tracker.ceph.com/issues/54611

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

The ceph daemons on a host are inheriting the ceph version from the host.
This introduces a wrong interpretation in prometheus metrics as well
as in dump_server. Each ceph daemon should report its own
ceph version based on the ceph binary in use for that daemon.

Consider a situation where a partial upgrade is done on a host: the daemons
that have been restarted should carry the upgraded version as their ceph
version tag, and the rest should carry the older ceph version, but presently
all of them inherit the host version. In a containerized environment, all
daemons use the ceph version of the last daemon registered as a service on
the host.

Fixes: https://tracker.ceph.com/issues/54611

Signed-off-by: Prashant D <pdhange@redhat.com>
(cherry picked from commit aeca2e4)
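To illustrate the behavior this fix targets, here is a minimal sketch (not the actual mgr/prometheus code; the metric name, label names, and version strings below are hypothetical) of rendering a prometheus-style metadata metric where each daemon carries the version of its own binary, so a partially upgraded host exposes mixed versions instead of one inherited host version:

```python
# Hypothetical sketch of per-daemon version labels in prometheus metrics.
# After a partial upgrade, restarted daemons report the new version while
# the rest keep the old one; nothing is inherited from the host.
daemon_metadata = {
    "osd.0": {"ceph_version": "16.2.9"},  # restarted on the upgraded binary
    "osd.1": {"ceph_version": "16.2.7"},  # still running the old binary
    "mon.a": {"ceph_version": "16.2.9"},
}

def render_metadata_metrics(metadata):
    """Render one prometheus-style metadata line per daemon."""
    lines = []
    for daemon, meta in sorted(metadata.items()):
        lines.append(
            f'ceph_daemon_metadata{{ceph_daemon="{daemon}",'
            f'ceph_version="{meta["ceph_version"]}"}} 1.0'
        )
    return "\n".join(lines)

print(render_metadata_metrics(daemon_metadata))
```

With the regression, every line above would have carried the same host-wide version; the point of the fix is that `osd.1` keeps reporting its older binary's version until it is restarted.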
@ljflores (Contributor) commented Jun 7, 2022

http://pulpito.front.sepia.ceph.com/yuriw-2022-05-31_21:35:41-rados-wip-yuri2-testing-2022-05-31-1300-pacific-distro-default-smithi/
http://pulpito.front.sepia.ceph.com/yuriw-2022-06-07_14:00:55-rados-wip-yuri2-testing-2022-06-03-1350-pacific-distro-default-smithi/

Failures, unrelated:
1. https://tracker.ceph.com/issues/53501
2. https://tracker.ceph.com/issues/55322
3. https://tracker.ceph.com/issues/55741
4. https://tracker.ceph.com/issues/53939
5. https://tracker.ceph.com/issues/52321
6. https://tracker.ceph.com/issues/54071
7. https://tracker.ceph.com/issues/51835
8. https://tracker.ceph.com/issues/49777
9. https://tracker.ceph.com/issues/54411
10. https://tracker.ceph.com/issues/54992

Details:
1. Exception when running 'rook' task. - Ceph - Orchestrator
2. test-restful.sh: mon metadata unable to be retrieved - Ceph - Mgr
3. cephadm/test_dashboard_e2e.sh: Unable to find element cd-modal .custom-control-label when testing on orchestrator/01-hosts.e2e-spec.ts - Ceph - Mgr - Dashboard
4. ceph-nfs-upgrade, pacific: Upgrade Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.0 on host smithi103 failed - Ceph - Orchestrator
5. qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
6. rados/cephadm/osds: Invalid command: missing required parameter hostname() - Ceph - Orchestrator
7. mgr/DaemonServer.cc: FAILED ceph_assert(pending_service_map.epoch > service_map.epoch) - Ceph - Mgr
8. test_pool_min_size: 'check for active or peered' reached maximum tries (5) after waiting for 25 seconds - Ceph - RADOS
9. mds_upgrade_sequence: "overall HEALTH_WARN 4 failed cephadm daemon(s); 1 filesystem is degraded; insufficient standby MDS daemons available; 33 daemons have recently crashed" during suites/fsstress.sh - Ceph - CephFS
10. cannot stat '/etc/containers/registries.conf': No such file or directory - Ceph - Orchestrator
