mgr/cephadm: HA for RGW endpoints #38615
Conversation
jenkins test make check
Force-pushed from 84fefc5 to 6efc9b6
LGTM.
- Next task: a teuthology test to cover the ha-rgw service
Force-pushed from 6ece807 to af4a252
lgtm, nice work, just some minor nits.
def populate_files(config_dir, config_files, uid, gid):
    # type: (str, Dict, int, int) -> None
    """create config files for different services"""
    for fname in config_files:
        config_file = os.path.join(config_dir, fname)
        config_content = dict_get_join(config_files, fname)
        logger.info('Write file: %s' % (config_file))
        with open(config_file, 'w') as f:
            os.fchown(f.fileno(), uid, gid)
            os.fchmod(f.fileno(), 0o600)
            f.write(config_content)
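For context, `dict_get_join` is a cephadm helper used above; a minimal sketch of plausible behavior (an assumption, not the actual cephadm implementation) is to return the value for a key, joining list values into a newline-separated string so callers can pass config files either as a single string or as a list of lines:

```python
from typing import Any, Dict

def dict_get_join(d: Dict[str, Any], key: str) -> Any:
    # Hypothetical sketch: if the value is a list of lines, join it
    # into one newline-separated string; otherwise return it as-is.
    value = d.get(key)
    if isinstance(value, list):
        return '\n'.join(str(v) for v in value)
    return value

print(dict_get_join({'haproxy.cfg': ['global', '  daemon']}, 'haproxy.cfg'))
```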
can you please make this refactoring a separate commit? I really want to avoid merging a single HUGE commit.
@sebastian-philipp: We are in a hurry with this feature now. I think we have been discussing it (across different PRs) for almost two months. In the first revision you pointed out several important issues, which I think have been addressed.
This refactoring was one of the first things you suggested to us (not as a separate PR). It would have been very easy to move it to another PR before Christmas, but at this point that move could cause more delays and leave this nice feature out of our release.
We have tested this several times manually and it passed the integration tests, so I have to ask that we do only the minimum changes needed.
if daemon_spec.daemon_type == 'haproxy':
    haspec = cast(HA_RGWSpec, daemon_spec.spec)
    if haspec.haproxy_container_image:
        image = haspec.haproxy_container_image

if daemon_spec.daemon_type == 'keepalived':
    haspec = cast(HA_RGWSpec, daemon_spec.spec)
    if haspec.keepalived_container_image:
        image = haspec.keepalived_container_image
can we please avoid adding those `if daemon_type == foobar` hacks here? I really invested a lot of time in cleaning this up. Having those hacks creep back in is super painful.
I agree this is a problem, but with haproxy and keepalived being the 4th and 5th daemon types (alongside container, osd and cephadm-exporter) to need something like this (special daemon-specific operations during _create_daemon), it feels like there needs to be some generic way to allow it that doesn't clutter up the function. I personally would rather that refactor be its own PR than have it included here, with this PR already being so large.
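One generic shape such a refactor could take (a hypothetical sketch, not code from this PR or from cephadm) is a dispatch table mapping daemon types to image-override lookups, so `_create_daemon` stays free of per-type `if` chains; the `IMAGE_OVERRIDES` and `resolve_image` names here are invented for illustration:

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical refactor sketch: each daemon type that supports a
# custom container image registers a lookup here, instead of adding
# another `if daemon_type == ...` branch in _create_daemon.
IMAGE_OVERRIDES: Dict[str, Callable[[Any], Optional[str]]] = {
    'haproxy': lambda spec: getattr(spec, 'haproxy_container_image', None),
    'keepalived': lambda spec: getattr(spec, 'keepalived_container_image', None),
}

def resolve_image(daemon_type: str, spec: Any, default: str) -> str:
    # Fall back to the default Ceph image when no override applies.
    override = IMAGE_OVERRIDES.get(daemon_type)
    if override:
        custom = override(spec)
        if custom:
            return custom
    return default

class FakeSpec:
    haproxy_container_image = 'haproxy:2.3'

print(resolve_image('haproxy', FakeSpec(), 'ceph/ceph:v16'))  # haproxy:2.3
print(resolve_image('mon', FakeSpec(), 'ceph/ceph:v16'))      # ceph/ceph:v16
```

New daemon types then extend the table rather than the function body, which keeps the special cases in one place.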
daemon_id = self.mgr.get_unique_name(daemon_type, host, daemons,
                                     prefix=spec.service_id,
                                     forcename=name)
interesting, service_id starts with rgw-ha, so what does daemon_id look like here?
The log statement a few lines later that prints daemon_type.daemon_id came out like
Placing haproxy.haproxy_for_rgw.vm-00.yaljjn on host vm-00
for haproxy and
Placing keepalived.haproxy_for_rgw.vm-00.ymborn on host vm-00
for keepalived, on host vm-00, coming from a spec file with
service_type: ha-rgw
service_id: haproxy_for_rgw
It doesn't seem like ha-rgw appears in the id anywhere unless you explicitly include it as part of your service_id in the spec.
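The log lines above suggest ids of the form `<service_id>.<hostname>.<random suffix>`. A standalone sketch of such a naming scheme (an assumption based on the observed output, not cephadm's actual `get_unique_name` implementation; `make_daemon_id` is an invented name):

```python
import random
import string

def make_daemon_id(prefix: str, host: str) -> str:
    # Hypothetical sketch matching the observed pattern
    # <service_id>.<hostname>.<6 random letters>,
    # e.g. haproxy_for_rgw.vm-00.yaljjn
    suffix = ''.join(random.choices(string.ascii_lowercase, k=6))
    return f'{prefix}.{host}.{suffix}'

daemon_id = make_daemon_id('haproxy_for_rgw', 'vm-00')
print(f'Placing haproxy.{daemon_id} on host vm-00')
```

Nothing in this scheme inserts the service_type (ha-rgw), which would explain why it never shows up in the id.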
# ha-rgw needs definitive host list to create keepalived config files
# if definitive host list has changed, all ha-rgw daemons must get new
# config, including those that are already on the correct host and not
# going to be deployed
def update_ha_rgw_definitive_hosts(self, spec: ServiceSpec, hosts: List[HostPlacementSpec],
                                   add_hosts: Set[HostPlacementSpec]) -> HA_RGWSpec:
    spec = cast(HA_RGWSpec, spec)
    if not (set(hosts) == set(spec.definitive_host_list)):
        spec.definitive_host_list = hosts
        ha_rgw_daemons = self.mgr.cache.get_daemons_by_service(spec.service_name())
        for daemon in ha_rgw_daemons:
            if daemon.hostname in [h.hostname for h in hosts] and daemon.hostname not in add_hosts:
                self.mgr.cache.schedule_daemon_action(
                    daemon.hostname, daemon.name(), 'reconfig')
    return spec
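The selection logic above (reconfig only daemons on hosts that remain in the placement and are not newly added) can be isolated into a small standalone sketch for clarity; `hosts_needing_reconfig` is an invented helper name and the host values are made up:

```python
from typing import List, Set

def hosts_needing_reconfig(daemon_hosts: List[str],
                           new_hosts: Set[str],
                           add_hosts: Set[str]) -> List[str]:
    # A daemon gets a 'reconfig' when its host stays in the new
    # definitive host list but is not a newly added host (new hosts
    # get the fresh config at deploy time, before creation).
    return [h for h in daemon_hosts
            if h in new_hosts and h not in add_hosts]

existing = ['vm-00', 'vm-01']             # hosts with running ha-rgw daemons
placement = {'vm-00', 'vm-01', 'vm-02'}   # new definitive host list
newly_added = {'vm-02'}                   # daemon not deployed yet
print(hosts_needing_reconfig(existing, placement, newly_added))  # ['vm-00', 'vm-01']
```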
I thought this is why we added self.requires_post_actions: Set[str] = set() (ceph/src/pybind/mgr/cephadm/module.py, line 426 in d8675ec)?
The definitive hosts are used in creating the keepalived config file (it needs to know where the other keepalived daemons are being placed) that a daemon uses on deployment. That means, if done using post actions, every single keepalived daemon would need to be reconfigured right after it is initially deployed to get the config file with the locations of the other daemons. Doing it the way it is here lets new keepalived daemons get this info before they are created, so you only have to reconfig already-existing daemons when the set of hosts ha-rgw is on changes. It should still be functional using post actions, but every new keepalived daemon would effectively have to be deployed twice. This is sort of an odd case, as we would ideally like to set this attribute in the small window between when placement is done and when the daemons are actually created.
Or, should we just merge it as it is?
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
Cephadm deploying keepalived and HAproxy for providing high availability for RGW endpoints
Fixes: https://tracker.ceph.com/issues/45116
Signed-off-by: Daniel-Pivonka <dpivonka@redhat.com>
Signed-off-by: Adam King <adking@redhat.com>
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
thank you @jmolmo
New HA_RGW service provides HA for RGW endpoints using keepalived and haproxy
Signed-off-by: Adam King adking@redhat.com
Fixes: https://tracker.ceph.com/issues/45116
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox