mgr/cephadm: Remove gateway.conf from iscsi pool when service is removed #40313
Conversation
lgtm
also remove the `iscsi-gateway.key` and `iscsi-gateway.crt` config keys during purge?
I thought those are pulled from the MGR config-key store now? or are you referring to pulling them out of the MGR config-key store?
```python
    with self.mgr.rados.open_ioctx(spec.pool) as ioctx:
        ioctx.remove_object("gateway.conf")
        logger.debug(f'<gateway.conf> removed from {spec.pool}')
except rados.ObjectNotFound:
```
I think we should also allow other exceptions here, like access forbidden or something. I don't see a point in retrying purge until it succeeds.
Suggested change:

```python
except rados.Error:
    logger.exception(f'failed to purge {service_name}')
```
Yeah, remove them from the mgr config-key store, similar to this: ceph/src/pybind/mgr/cephadm/services/cephadmservice.py lines 805 to 819 in 7e29d0a
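A minimal sketch of that cleanup, assuming the MgrModule config-key API where `set_store(key, None)` deletes an entry. The key naming scheme (`<host>/iscsi-gateway.crt`) and the `FakeMgr` stub are illustrative assumptions, not the exact layout cephadm uses:

```python
class FakeMgr:
    """Stub standing in for the mgr module's config-key store (illustrative only)."""
    def __init__(self):
        self.store = {'host1/iscsi-gateway.crt': 'CERT',
                      'host1/iscsi-gateway.key': 'KEY'}

    def set_store(self, key, value):
        # MgrModule.set_store(key, None) deletes the key
        if value is None:
            self.store.pop(key, None)
        else:
            self.store[key] = value

def purge_iscsi_certs(mgr, hostname: str) -> None:
    # hypothetical helper: drop the per-host cert/key entries during purge;
    # the '<host>/<name>' key format is an assumption for illustration
    for name in ('iscsi-gateway.crt', 'iscsi-gateway.key'):
        mgr.set_store(f'{hostname}/{name}', None)
```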
Force-pushed from 7e29d0a to db957fe
```python
spec = cast(IscsiServiceSpec, self.mgr.spec_store[service_name].spec)
try:
    # remove service configuration from the pool
    with self.mgr.rados.open_ioctx(spec.pool) as ioctx:
```
Next issue: this might hang indefinitely if the cluster is in trouble. That doesn't sound like a problem, but users need to be able to fix their cluster using cephadm, and that is going to fail if we're hanging indefinitely here.
wdyt of testing for HEALTH_OK here? @adk3798
I think we should avoid HEALTH_OK tests; better to have a timeout. librados can do timeouts but it's not super well supported, and this is a shared client instance; it might be better to shell out to the rados CLI to do this.
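A rough sketch of what shelling out with a bounded wait could look like. The keyring path, the `-n mgr.<id>` client name, and the helper names are assumptions for illustration, not the final implementation:

```python
import subprocess
from typing import List

def rados_rm_cmd(mgr_id: str, pool: str, obj: str) -> List[str]:
    # build the rados CLI invocation using the mgr's own keyring;
    # paths and flags here are assumptions for illustration
    return ['rados',
            '-k', f'/var/lib/ceph/mgr/ceph-{mgr_id}/keyring',
            '-n', f'mgr.{mgr_id}',
            '-p', pool,
            'rm', obj]

def remove_object(mgr_id: str, pool: str, obj: str, timeout: int = 30) -> bool:
    # a hard timeout bounds the wait even if the cluster is unhealthy,
    # unlike an in-process librados call that may block forever
    try:
        subprocess.run(rados_rm_cmd(mgr_id, pool, obj),
                       check=True, timeout=timeout, capture_output=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError,
            FileNotFoundError):
        return False
```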
Force-pushed from db957fe to 725e1fa
```python
spec = cast(IscsiServiceSpec, self.mgr.spec_store[service_name].spec)
try:
    # remove service configuration from the pool
    rm_pool_thread = Thread(target=remove_pool, args=(self.mgr, spec.pool))
```
I think we need to shell out. just a thread might not help us.
How do you suggest we shell out? The multiprocessing module would add another mgr instance into the container and can produce problems.
I think that "shelling out" the rados command can only be achieved through `_run_cephadm` (or by making it possible to use ceph commands in the mgr container). If we try to execute the `rados` command directly in the mgr container, we get this:
```
[ceph: root@cephlab2-node-00 /]# rados -p myscsi ls
2021-04-12T14:45:42.483+0000 7f90c006cd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:45:42.483+0000 7f90c006cd00 -1 AuthRegistry(0x55fd4904f440) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:45:42.484+0000 7f90c006cd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:45:42.484+0000 7f90c006cd00 -1 AuthRegistry(0x7ffd0e55ad00) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:45:42.485+0000 7f90b821d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:45:42.486+0000 7f90b8a1e700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:45:42.486+0000 7f90b921f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:45:42.486+0000 7f90c006cd00 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
failed to fetch mon config (--no-mon-config to skip)
[ceph: root@cephlab2-node-00 /]# rados -c /etc/ceph/ceph.conf -p myscsi ls
2021-04-12T14:47:23.129+0000 7f5b8cd5bd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:47:23.129+0000 7f5b8cd5bd00 -1 AuthRegistry(0x562c24dfb440) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:47:23.130+0000 7f5b8cd5bd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:47:23.130+0000 7f5b8cd5bd00 -1 AuthRegistry(0x7fff123a85e0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:47:23.130+0000 7f5b85f0e700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:47:23.131+0000 7f5b84f0c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:47:23.131+0000 7f5b8570d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:47:23.131+0000 7f5b8cd5bd00 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
failed to fetch mon config (--no-mon-config to skip)
```
What is the preferred implementation: `_run_cephadm`, or making the mgr container able to execute ceph/rados commands?
```python
cmd = ['radosgw-admin',
       '--key=%s' % keyring,
       '--user', 'rgw.%s' % rgw_id,
       'realm', 'list',
       '--format=json']
result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out = result.stdout
if not out:
    return []
try:
    j = json.loads(out)
    return j.get('realms', [])
except Exception:
    raise OrchestratorError('failed to parse realm info')
```

For the keyring, pass the keyring file for the mgr itself, which should be `rados -k /var/lib/ceph/mgr/ceph-{{mgr_id}}/keyring -i {{mgr_id}} ..`
```python
hosts=self.mgr._hosts_with_daemon_inventory(),
daemons=[],
networks=self.mgr.cache.networks)
_, slots, _ = ha.place()
```
shouldn't we just look at the existing daemons? `self.mgr.cache.find_daemons_by_service(...)`?
When we reach this part, all the iscsi daemons have already been removed. The only thing we have left from the deleted iscsi service is the spec file. Recalculating the host assignment is the only way I found to remove the various config settings that reference the hosts where the iscsi daemons were deployed.
Can you put it in the `post_remove()` method instead then? That takes a `DaemonDescription` for the just-removed daemon.
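A minimal sketch of that shape, with stand-in classes since the real `DaemonDescription` and service base class live in cephadm; the config-key names here are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class DaemonDescription:
    # minimal stand-in for the orchestrator's DaemonDescription
    hostname: str
    daemon_type: str = 'iscsi'

class IscsiService:
    def __init__(self) -> None:
        # hypothetical per-host config keys left behind by the service
        self.config_keys: Dict[str, str] = {
            'iscsi/host1/iscsi-gateway.crt': 'CERT',
            'iscsi/host1/iscsi-gateway.key': 'KEY',
            'iscsi/host2/iscsi-gateway.crt': 'CERT',
        }

    def post_remove(self, daemon: DaemonDescription) -> None:
        # called once per just-removed daemon, so the hostname is known
        # without re-running placement over the whole inventory
        prefix = f'iscsi/{daemon.hostname}/'
        for key in [k for k in self.config_keys if k.startswith(prefix)]:
            del self.config_keys[key]
```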
Force-pushed from 725e1fa to 3f1e733
jenkins test make check
```python
# remove service configuration from the pool
try:
    mgr_id = self.mgr.get_mgr_id()
    keyring = f'/var/lib/ceph/mgr/ceph-{mgr_id}/keyring'
```
I just ended up doing this for the HA NFS PR... see 5a028ae#diff-b13a6b9b90f10d54533fcaa8b7d9db7f8d24ae39bb52cd611c28c9c39cb429ecR142
- fetch the config option so it works with vstart
- use the list form of subprocess.run instead of a string
```python
# remove iscsi cert and key from ceph config
for iscsi_key, value in iscsi_config_dict.items():
    if daemon.hostname in iscsi_key:
```
shouldn't this check for a `/` or something? Otherwise, if the host is say `isc`, every key containing that substring will be removed.
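The point can be illustrated with a small sketch: treating the hostname as a full `/`-delimited path component avoids the substring collision (the function name and key layout are hypothetical):

```python
from typing import Iterable, List

def keys_for_host(iscsi_config_keys: Iterable[str], hostname: str) -> List[str]:
    # match the hostname only as a whole '/'-separated component,
    # so a host named 'isc' does not match keys for 'iscsi-gw-1'
    return [k for k in iscsi_config_keys if hostname in k.split('/')]
```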
Remove gateway.conf from iscsi pool when service is removed
Remove iscsi ceph config keys
Remove iscsi dashboard gateways config from dashboard
fixes: https://tracker.ceph.com/issues/48930
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Force-pushed from 3f1e733 to 1b9e3ed
It seems that cephadm binary installed is old and does not have the new parameter "skip-admin-label" introduced in #40941
When backporting, please include 41181
fixes: https://tracker.ceph.com/issues/48930
Signed-off-by: Juan Miguel Olmo Martínez jolmomar@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox