
mgr/cephadm: Remove gateway.conf from iscsi pool when service is removed #40313

Merged (1 commit) on May 4, 2021

Conversation

@jmolmo (Member) commented Mar 22, 2021:

fixes: https://tracker.ceph.com/issues/48930

Signed-off-by: Juan Miguel Olmo Martínez jolmomar@redhat.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@jmolmo jmolmo requested a review from a team March 22, 2021 15:15
@liewegas liewegas requested a review from dillaman March 22, 2021 15:21
@dillaman left a comment:
lgtm

@mgfritch (Contributor) left a comment:

also remove the iscsi-gateway.key and iscsi-gateway.crt config keys during purge?

@dillaman commented Mar 23, 2021:

> also remove the iscsi-gateway.key and iscsi-gateway.crt config keys during purge?

I thought those are pulled from the MGR config-key store now? or are you referring to pulling them out of the MGR config-key store?

with self.mgr.rados.open_ioctx(spec.pool) as ioctx:
    ioctx.remove_object("gateway.conf")
    logger.debug(f'<gateway.conf> removed from {spec.pool}')
except rados.ObjectNotFound:
A Contributor commented:

I think we should also allow other exceptions here, like access forbidden or something. I don't see a point in retrying purge until it succeeds.

Suggested change
-except rados.ObjectNotFound:
+except rados.Error:
+    logger.exception(f'failed to purge {service_name}')

@mgfritch (Contributor) replied:

> also remove the iscsi-gateway.key and iscsi-gateway.crt config keys during purge?
>
> I thought those are pulled from the MGR config-key store now? or are you referring to pulling them out of the MGR config-key store?

yeah, remove them from the mgr config-key store.

similar to this:

def purge(self, service_name: str) -> None:
    self.mgr.check_mon_command({
        'prefix': 'config rm',
        'who': utils.name_to_config_section(service_name),
        'name': 'rgw_realm',
    })
    self.mgr.check_mon_command({
        'prefix': 'config rm',
        'who': utils.name_to_config_section(service_name),
        'name': 'rgw_zone',
    })
    self.mgr.check_mon_command({
        'prefix': 'config-key rm',
        'key': f'rgw/cert/{service_name}',
    })

spec = cast(IscsiServiceSpec, self.mgr.spec_store[service_name].spec)
try:
    # remove service configuration from the pool
    with self.mgr.rados.open_ioctx(spec.pool) as ioctx:
A Contributor commented:

Next issue: this might hang indefinitely if the cluster is in trouble. That doesn't sound like a problem, but users need to be able to fix their cluster using cephadm, and this is going to fail if we're hanging indefinitely here.

wdyt of testing for HEALTH_OK here? @adk3798

A Member commented:

I think we should avoid HEALTH_OK tests; better to have a timeout. librados can do timeouts but it's not super well supported, and this is a shared client instance; it might be better to shell out to the rados CLI to do this.
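The shell-out-with-timeout idea above can be sketched roughly as follows. This is a minimal illustration, not cephadm's actual implementation; the helper name is invented and the 5-second default is an arbitrary choice:

```python
import subprocess
from typing import List, Tuple


def run_cli_with_timeout(cmd: List[str], timeout: float = 5.0) -> Tuple[bool, str]:
    """Run an external CLI command (e.g. the rados tool) with a hard
    timeout, so a troubled cluster cannot hang the caller indefinitely.

    Returns (success, stderr text); a timeout counts as failure.
    """
    try:
        result = subprocess.run(cmd,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, f'timed out after {timeout}s'
    return result.returncode == 0, result.stderr.decode(errors='replace')
```

With something like this, the purge path could call `run_cli_with_timeout(['rados', '-p', pool, 'rm', 'gateway.conf'], timeout=10)` and log a warning on failure instead of blocking the mgr.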

@jmolmo jmolmo requested review from liewegas and sebastian-philipp and removed request for dillaman April 7, 2021 12:46
spec = cast(IscsiServiceSpec, self.mgr.spec_store[service_name].spec)
try:
    # remove service configuration from the pool
    rm_pool_thread = Thread(target=remove_pool, args=(self.mgr, spec.pool))
A Contributor commented:

I think we need to shell out; just a thread might not help us.

@jmolmo (author) replied:

How do you suggest we shell out? The multiprocessing module would add another mgr instance to the container and could cause problems.

@jmolmo (author):

I think "shelling out" the rados command can only be achieved through "_run_cephadm" (or by making it possible to use ceph commands inside the mgr container). If we try to execute the "rados" command directly in the mgr container we get this:

[ceph: root@cephlab2-node-00 /]# rados  -p myscsi ls
2021-04-12T14:45:42.483+0000 7f90c006cd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:45:42.483+0000 7f90c006cd00 -1 AuthRegistry(0x55fd4904f440) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:45:42.484+0000 7f90c006cd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:45:42.484+0000 7f90c006cd00 -1 AuthRegistry(0x7ffd0e55ad00) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:45:42.485+0000 7f90b821d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:45:42.486+0000 7f90b8a1e700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:45:42.486+0000 7f90b921f700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:45:42.486+0000 7f90c006cd00 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
failed to fetch mon config (--no-mon-config to skip)
 
 
 
[ceph: root@cephlab2-node-00 /]# rados -c /etc/ceph/ceph.conf -p myscsi ls  
2021-04-12T14:47:23.129+0000 7f5b8cd5bd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:47:23.129+0000 7f5b8cd5bd00 -1 AuthRegistry(0x562c24dfb440) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:47:23.130+0000 7f5b8cd5bd00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-04-12T14:47:23.130+0000 7f5b8cd5bd00 -1 AuthRegistry(0x7fff123a85e0) no keyring found at /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
2021-04-12T14:47:23.130+0000 7f5b85f0e700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:47:23.131+0000 7f5b84f0c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:47:23.131+0000 7f5b8570d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
2021-04-12T14:47:23.131+0000 7f5b8cd5bd00 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
failed to fetch mon config (--no-mon-config to skip) 

@jmolmo (author):

What is the preferred implementation: _run_cephadm, or making the mgr container able to execute ceph/rados commands?

A Member replied:

-            cmd = ['radosgw-admin',
-                   '--key=%s' % keyring,
-                   '--user', 'rgw.%s' % rgw_id,
-                   'realm', 'list',
-                   '--format=json']
-            result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-            out = result.stdout
-            if not out:
-                return []
-            try:
-                j = json.loads(out)
-                return j.get('realms', [])
-            except Exception:
-                raise OrchestratorError('failed to parse realm info')

For the keyring, pass the keyring file for the mgr itself, which should be rados -k /var/lib/ceph/mgr/ceph-{{mgr_id}}/keyring -i {{mgr_id}} ...
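Following that suggestion, the command line could be assembled along these lines. The helper is invented for illustration, and the `-n mgr.<id>` spelling of the auth-name flag is an assumption (the comment above uses `-i {{mgr_id}}`); only the keyring path is taken directly from the discussion:

```python
from typing import List


def build_rados_rm_cmd(mgr_id: str, pool: str, obj: str) -> List[str]:
    """Build a rados CLI invocation that authenticates with the mgr's own
    keyring, as suggested above.

    The '-n mgr.<id>' auth-name flag is an assumption about the exact
    option spelling; check it against the rados CLI help before relying
    on it.
    """
    keyring = f'/var/lib/ceph/mgr/ceph-{mgr_id}/keyring'
    return ['rados',
            '-k', keyring,          # authenticate with the mgr keyring
            '-n', f'mgr.{mgr_id}',  # assumed auth-name flag
            '-p', pool,
            'rm', obj]
```

The resulting list can then be handed to subprocess.run with a timeout, keeping command construction separate from execution.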

    hosts=self.mgr._hosts_with_daemon_inventory(),
    daemons=[],
    networks=self.mgr.cache.networks)
_, slots, _ = ha.place()
A Member commented:

Shouldn't we just look at the existing daemons, i.e. self.mgr.cache.find_daemons_by_service(...)?

@jmolmo (author) replied:

When we reach this part, all the iscsi daemons have already been removed. The only thing left from the deleted iscsi service is the spec file. Recalculating the host assignment is the only way I found to remove the config settings that reference the hosts where the iscsi daemons were deployed.

A Member replied:

Can you put it in the post_remove() method instead then? That takes a DaemonDescription for the just-removed daemon.
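A rough sketch of that alternative: per-daemon cleanup in a post_remove() hook, using the removed daemon's own hostname instead of re-running placement. The class name, config-key layout, and the `hostname` field are all illustrative assumptions, not the real cephadm code:

```python
class IscsiServiceSketch:
    """Illustration only: clean up host-specific config keys in a
    post_remove()-style hook that receives the just-removed daemon."""

    def __init__(self, mgr) -> None:
        self.mgr = mgr

    def post_remove(self, daemon) -> None:
        # 'daemon' is assumed to carry the hostname of the just-removed
        # iscsi daemon; the config-key naming below is invented.
        for name in ('iscsi-gateway.crt', 'iscsi-gateway.key'):
            self.mgr.check_mon_command({
                'prefix': 'config-key rm',
                'key': f'iscsi/{name}/{daemon.hostname}',
            })
```

The advantage over recomputing placement is that each removed daemon identifies exactly which host-scoped keys it owned.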

@jmolmo (author) commented Apr 26, 2021:

jenkins test make check

# remove service configuration from the pool
try:
    mgr_id = self.mgr.get_mgr_id()
    keyring = f'/var/lib/ceph/mgr/ceph-{mgr_id}/keyring'
A Member commented:

I just ended up doing this for the HA NFS PR; see 5a028ae#diff-b13a6b9b90f10d54533fcaa8b7d9db7f8d24ae39bb52cd611c28c9c39cb429ecR142

  • fetch the config option so it works with vstart
  • use the list form of subprocess.run instead of a string


# remove iscsi cert and key from ceph config
for iscsi_key, value in iscsi_config_dict.items():
    if daemon.hostname in iscsi_key:
A Member commented:

Shouldn't this check for a / or something? Otherwise, if the host is, say, isc, every key containing that substring will be removed.
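The substring pitfall can be shown directly, along with a delimiter-aware check that avoids it. The `<prefix>/<name>/<hostname>` key layout is assumed for illustration:

```python
def key_matches_host(key: str, hostname: str) -> bool:
    """Match a per-host config key by its trailing '/'-delimited segment,
    so that a host named 'isc' does not also match keys belonging to a
    host named 'iscsi-gw-1'."""
    return key.split('/')[-1] == hostname


# plain substring matching over-matches:
assert 'isc' in 'iscsi/iscsi-gateway.key/iscsi-gw-1'
# the delimiter-aware check does not:
assert not key_matches_host('iscsi/iscsi-gateway.key/iscsi-gw-1', 'isc')
assert key_matches_host('iscsi/iscsi-gateway.key/isc', 'isc')
```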

Remove gateway.conf from iscsi pool when service is removed
Remove iscsi ceph config keys
Remove iscsi dashboard gateways config from dashboard

fixes: https://tracker.ceph.com/issues/48930

Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
@jmolmo (author) commented May 4, 2021:

Errors in:
https://pulpito.ceph.com/kchai-2021-04-30_10:06:45-rados-wip-kefu-testing-2021-04-30-1335-distro-basic-smithi/

See:
https://pulpito.ceph.com/kchai-2021-04-30_10:06:45-rados-wip-kefu-testing-2021-04-30-1335-distro-basic-smithi/6086127/
https://pulpito.ceph.com/kchai-2021-04-30_10:06:45-rados-wip-kefu-testing-2021-04-30-1335-distro-basic-smithi/6086137/

Details:

2021-04-30T04:21:05.860 DEBUG:teuthology.orchestra.run.smithi141:> sudo /home/ubuntu/cephtest/cephadm --image docker.io/ceph/daemon-base:latest-pacific -v bootstrap --fsid 7ac5e406-a96b-11eb-821c-001a4aab830c --config /home/ubuntu/cephtest/seed.ceph.conf --output-config /etc/ceph/ceph.conf --output-keyring /etc/ceph/ceph.client.admin.keyring --output-pub-ssh-key /home/ubuntu/cephtest/ceph.pub --mon-id a --mgr-id y --orphan-initial-daemons --skip-monitoring-stack --mon-ip 172.21.15.141 --skip-admin-label && sudo chmod +r /etc/ceph/ceph.client.admin.keyring
2021-04-30T04:21:05.885 INFO:journalctl@ceph.mgr.y.smithi141.stdout:-- Logs begin at Fri 2021-04-30 04:13:11 UTC. --
2021-04-30T04:21:06.118 INFO:teuthology.orchestra.run.smithi141.stderr:usage: cephadm [-h] [--image IMAGE] [--docker] [--data-dir DATA_DIR]
2021-04-30T04:21:06.118 INFO:teuthology.orchestra.run.smithi141.stderr:               [--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR]
2021-04-30T04:21:06.118 INFO:teuthology.orchestra.run.smithi141.stderr:               [--unit-dir UNIT_DIR] [--verbose] [--timeout TIMEOUT]
2021-04-30T04:21:06.118 INFO:teuthology.orchestra.run.smithi141.stderr:               [--retry RETRY] [--env ENV] [--no-container-init]
2021-04-30T04:21:06.119 INFO:teuthology.orchestra.run.smithi141.stderr:               {version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,unit,logs,bootstrap,deploy,check-host,prepare-host,add-repo,rm-repo,install,registry-login,gather-facts,exporter,host-maintenance,verify-prereqs}
2021-04-30T04:21:06.119 INFO:teuthology.orchestra.run.smithi141.stderr:               ...
2021-04-30T04:21:06.119 INFO:teuthology.orchestra.run.smithi141.stderr:cephadm: error: unrecognized arguments: --skip-admin-label

It seems that the installed cephadm binary is old and does not have the new "--skip-admin-label" parameter introduced in #40941.

2021-04-30T10:27:36.607 INFO:teuthology.orchestra.run.smithi002.stdout:================================================================================
2021-04-30T10:27:36.607 INFO:teuthology.orchestra.run.smithi002.stdout: Package     Arch       Version                           Repository       Size
2021-04-30T10:27:36.607 INFO:teuthology.orchestra.run.smithi002.stdout:================================================================================
2021-04-30T10:27:36.607 INFO:teuthology.orchestra.run.smithi002.stdout:Installing:
2021-04-30T10:27:36.607 INFO:teuthology.orchestra.run.smithi002.stdout: cephadm     noarch     2:16.2.1-257.g717ce59b.el8        ceph-noarch      71 k

@sebastian-philipp (Contributor):

When backporting, please include 41181
