mgr/cephadm: allow setting mon crush locations through mon service spec #49103
This is no longer WIP and is now open for full review. Docs and a teuthology test have been added.
You're CRUSHING this ;)
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
Docs LGTM
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.
LGTM
I have a few minor comments, nothing blocking IMO, and most could be done in follow up work.
I took a specific look at the embedded shell script in the test. I think it looks OK, but it is complex. Honestly, I probably would have looked for a way to do it in Python rather than bash, but I see nothing wrong with the current version.
    current_crush_locs = [m['crush_location'] for m in quorum_status['monmap']['mons'] if m['name'] == dd.daemon_id][0]
except Exception as e:
    logger.info(f'Failed setting crush location for mon {dd.daemon_id}: {e}')
desired_crush_locs = '{' + ','.join(mon_spec.crush_locations[dd.hostname]) + '}'
Does it matter if, for example, the order changed?
Perhaps, instead of creating a string of the desired crush locations, we could parse current_crush_locs into a comparable object?
I don't think the order matters, so you're probably right that we should check for that and only set the locations when some are actually missing, not when they are merely in a different order. I want to push this into a follow-up PR, though.
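For reference, the order-insensitive comparison discussed above could look roughly like the sketch below. It assumes the monmap reports `crush_location` as a string of the form `{bucket=loc,bucket=loc}`, which is inferred from how `desired_crush_locs` is built in the hunk; `parse_crush_location` and `crush_locations_match` are hypothetical helper names, not functions from this PR.

```python
from typing import Dict, List


def parse_crush_location(raw: str) -> Dict[str, str]:
    """Parse '{datacenter=b,rack=2}' (braces optional) into a dict.

    Assumption: the input is a comma-separated list of bucket=loc pairs.
    """
    inner = raw.strip().strip('{}')
    pairs: Dict[str, str] = {}
    for item in inner.split(','):
        item = item.strip()
        if not item:
            continue  # tolerate empty input such as '{}'
        bucket, sep, loc = item.partition('=')
        if not sep:
            raise ValueError(f'not a bucket=loc pair: {item!r}')
        pairs[bucket.strip()] = loc.strip()
    return pairs


def crush_locations_match(current: str, desired: List[str]) -> bool:
    """Return True if both sides name the same bucket=loc pairs, in any order."""
    return parse_crush_location(current) == parse_crush_location(','.join(desired))
```

With this, `crush_locations_match('{rack=2,datacenter=b}', ['datacenter=b', 'rack=2'])` is true, so a reordering alone would not trigger another set command. Strict equality is used here; checking only for missing pairs would be a small variation.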
crush_locations:
  host1:
  - datacenter=a
  host2:
  - datacenter=b
  - rack=2
  host3:
  - datacenter=a
FWIW, a future enhancement that might avoid having to spell out every host would be to allow host labels to be translated into the crush rules. You could then label your hosts and have the labels select the crush rules. I think that would be a bit more expressive and scalable. Something for another day.
Definitely a good idea, and I don't think it would be too hard to build on top of the current work. I will try to do this in a follow-up.
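One way the label idea might be prototyped is sketched below. Nothing here exists in cephadm: the `label_locations` mapping, the label names, and `expand_label_locations` are all hypothetical, and the only goal is to show that a label-to-location table can be expanded into the per-host `crush_locations` mapping the spec already accepts.

```python
from typing import Dict, List


def expand_label_locations(
    host_labels: Dict[str, List[str]],
    label_locations: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Expand label -> crush locations into host -> crush locations.

    Hypothetical helper: labels without an entry in label_locations are
    ignored, and hosts that end up with no locations are omitted.
    """
    per_host: Dict[str, List[str]] = {}
    for host, labels in host_labels.items():
        locs: List[str] = []
        for label in labels:
            for loc in label_locations.get(label, []):
                if loc not in locs:  # avoid duplicate bucket=loc pairs
                    locs.append(loc)
        if locs:
            per_host[host] = locs
    return per_host


# Hypothetical labels; the result matches the crush_locations example above.
host_labels = {'host1': ['dc-a'], 'host2': ['dc-b', 'rack-2'], 'host3': ['dc-a']}
label_locations = {
    'dc-a': ['datacenter=a'],
    'dc-b': ['datacenter=b'],
    'rack-2': ['rack=2'],
}
crush_locations = expand_label_locations(host_labels, label_locations)
```

Adding a host would then only require labeling it, rather than editing the spec for every new mon host.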
…ns field
In order to allow having cephadm set the crush locations for the mons. For helping with setting up stretch mode with a cephadm cluster
Signed-off-by: Adam King <adking@redhat.com>

…n in spec
Necessary to do this for stretch mode tiebreaker mon replacement
Fixes: https://tracker.ceph.com/issues/58101
Signed-off-by: Adam King <adking@redhat.com>

Previously, the service config function was only called when we deploy a new daemon for that service. That meant that updates to the spec such as changing a cert that don't affect the daemon placement wouldn't trigger the service level config to happen again. With this change, we now mark the service as needing its config function ran if a daemon for the service is added/removed or if the spec is updated.
Fixes: https://tracker.ceph.com/issues/58100
Signed-off-by: Adam King <adking@redhat.com>

The part of this that added the --set-crush-location flag when deploying the mon was handled in another commit. This piece is to finish the functionality by having cephadm set the location through commands to handle when multiple bucket=loc pairs are specified for a single monitor
Fixes: https://tracker.ceph.com/issues/58101
Signed-off-by: Adam King <adking@redhat.com>

Trying to add a feature where mon crush locations can be set through the orchestrator using the mon service spec. This is meant to be a test for that.
Signed-off-by: Adam King <adking@redhat.com>
Signed-off-by: Adam King <adking@redhat.com>
Reruns of failed/dead jobs: https://pulpito.ceph.com/adking-2023-04-07_13:59:11-orch:cephadm-wip-adk-testing-2023-04-04-2123-distro-default-smithi/ After reruns: 6 failed and 2 dead jobs
Specifically for this PR, the new test for setting the crush locations passed.
Reruns of all but the upgrade-with-workload jobs (which are currently known to time out after multiple hours, so reruns would waste a lot of resources): https://pulpito.ceph.com/adking-2023-04-22_17:05:01-orch:cephadm-wip-adk-testing-2023-04-20-0105-distro-default-smithi/ After reruns, 2 dead and 5 failed jobs.
mgr/cephadm: allow setting mon crush locations through mon service spec
Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Reviewed-by: Redouane Kachach <rkachach@redhat.com>
Fixes: https://tracker.ceph.com/issues/58101
Fixes: https://tracker.ceph.com/issues/58100
A few things this handles.
For example, a mon service spec with per-host crush_locations entries would tell cephadm that any mon deployed on vm-00 should have crush location "datacenter=a", on vm-01 "datacenter=b,rack=2", etc.
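As a rough illustration of that lookup, the sketch below joins a host's bucket=loc pairs into the single location string described above. The dictionary mirrors the `crush_locations` field of the spec using only the two hosts named here, and `location_for_host` is a hypothetical helper, not cephadm's actual code.

```python
from typing import Dict, List, Optional

# Mirrors the crush_locations field of a mon service spec (two hosts only).
crush_locations: Dict[str, List[str]] = {
    'vm-00': ['datacenter=a'],
    'vm-01': ['datacenter=b', 'rack=2'],
}


def location_for_host(locations: Dict[str, List[str]], hostname: str) -> Optional[str]:
    """Return the comma-joined crush location for a host, or None if unlisted."""
    locs = locations.get(hostname)
    return ','.join(locs) if locs else None
```

A mon on a host that does not appear in the mapping gets `None` in this sketch, meaning no crush location would be set for it.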
What is still missing/unsolved