New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quincy: Revert "ceph-exporter: cephadm changes" #51053
quincy: Revert "ceph-exporter: cephadm changes" #51053
Conversation
This reverts commit 5f04222. Issues were found with the ceph-exporter service in 17.2.6. --- - Ceph metrics could be duplicated: from the existing centralized mgr/prometheus exporter (legacy approach), and from the new ceph-exporter, which is deployed as a side-car container on each node and reports Ceph metrics for all colocated services in that node. - In a few cases (e.g.: ceph_mds_mem_cap+), where those metrics contain characters not supported by Prometheus (e.g.: +), Prometheus would complain about malformed metric names (however those metrics will still be properly reported via mgr/prometheus as ceph_mds_mem_cap_plus). - In a few cases (e.g.: ceph_mds_mem_cap_minus), with metrics ending in -, they would have different names in Prometheus (e.g.: ceph_mds_mem_cap_minus when coming from mgr/prometheus and ceph_mds_mem_cap_ when coming from ceph-exporter). --- Therefore we've decided it's best to revert cephadm's support for ceph-exporter in quincy Signed-off-by: Adam King <adking@redhat.com> Conflicts: src/cephadm/cephadm src/pybind/mgr/cephadm/tests/test_services.py
f0a5d68
to
524390a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I repeated the revert based on 17.2.6 tag instead of the tip of the quincy branch and came up with a nearly identical commit (just a tiny context change). The two conflicts in src/cephadm/cephadm
and src/pybind/mgr/cephadm/tests/test_services.py
are trivial, caused by ec770e8 and 4191614 respectively.
I don't think this invalidates your testing by any means, but technically this was an upgrade to the tip of the quincy branch, not to 17.2.6 + revert. Can we also try an upgrade from 17.2.5 just in case? |
yeah, the patches seem to be near identical. I've tried cherry-picking the commit in this PR directly on top of the 17.2.6 tag and there are no merge conflicts given the conflicts are the same for cherry-picking on 17.2.6 vs. current quincy branch (and as you mentioned they're all trivial conflicts anyway). |
The daemons landed on |
|
ACK, you're right. I only guaranteed the python code for cephadm/orchestrator stuff was on 17.2.6, but everything else was latest quincy. I'll redo it real quick with the changes on the actual 17.2.6 build |
retest with an actual 17.2.6 + reversion build
results are the same. Will test 17.2.5 to this build as well |
17.2.5 to 17.2.6 + reversion saw no issues either
no ceph-exporter support in 17.2.5 so this was just a standard upgrade |
Reruns of failed jobs: https://pulpito.ceph.com/adking-2023-05-01_12:44:48-orch:cephadm-wip-adk3-testing-2023-04-25-1440-quincy-distro-default-smithi/ After reruns, 9 failed jobs:
Overall, the mon crush location PR and any upgrade related PRs shouldn't be merged, but most others should be fine. |
Reruns of failed/dead jobs: https://pulpito.ceph.com/adking-2023-05-22_14:09:29-orch:cephadm-wip-adk3-testing-2023-05-21-1607-quincy-distro-default-smithi/ After reruns, 2 failed, 1 dead job:
Overall, nothing to block merging |
This reverts commit 5f04222.
Issues were found with the ceph-exporter service in 17.2.6.
Ceph metrics could be duplicated: from the existing centralized mgr/prometheus exporter (legacy approach), and from the new ceph-exporter, which is deployed as a side-car container on each node and reports Ceph metrics for all colocated services in that node.
In a few cases (e.g.: ceph_mds_mem_cap+), where those metrics contain characters not supported by Prometheus (e.g.: +), Prometheus would complain about malformed metric names (however those metrics will still be properly reported via mgr/prometheus as ceph_mds_mem_cap_plus).
In a few cases (e.g.: ceph_mds_mem_cap_minus), with metrics ending in -, they would have different names in Prometheus (e.g.: ceph_mds_mem_cap_minus when coming from mgr/prometheus and ceph_mds_mem_cap_ when coming from ceph-exporter).
Therefore we've decided it's best to revert cephadm's support for ceph-exporter in quincy
Conflicts:
src/cephadm/cephadm
src/pybind/mgr/cephadm/tests/test_services.py
I tested this change using an image based on the 17.2.6 release plus this reversion. The branch used for that can be found here https://github.com/adk3798/ceph/commits/17-2-6-cephadm-ceph-exporter-reversion. The testing deployed a 17.2.6 cluster, including the ceph-exporter service, and then upgraded to an image based on the linked branch that includes the ceph-exporter reversion. Before upgrade daemons and services were
Luckily, it seems the impact of the reversion was minimal. I saw a single log message
followed by cephadm discarding the ceph-exporter spec as it can't load it and the ceph-exporter daemons being subsequently removed as they no longer had a matching service spec. No other issues were seen. Daemons and services post upgrade were
which is the same as before but with the ceph-exporter daemons/service removed. Given the minimal impact of a single warning level log message (no health warnings were seen either) I think upgrading from a 17.2.6 cluster with the ceph-exporter deployed to one with cephadm's support for it reverted should be fine.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows