
Misc fixes #170

Merged
merged 3 commits into from Jan 8, 2020

Conversation

mikechristie
Contributor

This fixes some misc bugs:

  1. If no targets are mapped to a gateway, rbd-target-gw crashes.
  2. rbd-target-api crashes in python3 setups if there are blacklist entries.
  3. Gateway deletion does not fully clean up the kernel on the local node.

Mike Christie added 3 commits January 7, 2020 18:43
If no targets are set up, or the target is not mapped to a gateway,
then the rbd-target-gw metrics call will crash trying to access
target.wwn or an empty list of tpgs. This adds checks to make sure
we have a target and that it is mapped to a gateway.

Signed-off-by: Mike Christie <mchristi@redhat.com>
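The guard described above can be sketched roughly as follows. This is an illustrative sketch, not the actual rbd-target-gw code: the function name `export_target_metrics` and the `config`/`portals` layout are assumed stand-ins for the real config object.

```python
# Hypothetical sketch of the metrics guard: bail out early when no
# targets exist or when a target is not mapped to this gateway, instead
# of touching target.wwn or iterating an empty tpg list.
def export_target_metrics(config, this_host):
    targets = config.get("targets", {})
    if not targets:
        # No targets defined at all: nothing to report.
        return []

    metrics = []
    for iqn, target in targets.items():
        portals = target.get("portals", {})
        if this_host not in portals:
            # Target exists but is not mapped to this gateway,
            # so there are no tpgs to walk on this node.
            continue
        metrics.append((iqn, len(portals)))
    return metrics
```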
In python3, check_output returns the result as a bytes type, whereas
in python2 we got a string. In python3 there is an encoding argument
for check_output, but python2 does not have it. So this has us rely
on the "ceph osd blacklist ..." command returning a proper error code
if it fails, instead of parsing the output.

Signed-off-by: Mike Christie <mchristi@redhat.com>
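A minimal sketch of the pattern this commit moves to, not the actual rbd-target-api code: rather than parsing check_output's result (bytes on Python 3, str on Python 2), rely on check_output raising CalledProcessError on any non-zero exit. The `blacklist_rm` name and the injectable `runner` parameter are illustrative.

```python
import subprocess

def blacklist_rm(ip, runner=subprocess.check_output):
    # Simplified command line; the real one also passes client name
    # and ceph.conf path.
    cmd = "ceph osd blacklist rm {ip}".format(ip=ip)
    try:
        runner(cmd, stderr=subprocess.STDOUT, shell=True)
        return True   # exit code 0, including the "isn't blacklisted" case
    except subprocess.CalledProcessError:
        return False  # non-zero exit: the command reported a failure
```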
When running the gateway deletion command, always try to do the full
(LIO and config) deletion if the gateway to be deleted is the local
host. The force option, where we clean up only the config, only makes
sense when a remote node is dead. If we only cleaned up the config
object on the local node, we would leave the target running in the
kernel even though we can always clean it up there.

Signed-off-by: Mike Christie <mchristi@redhat.com>
    format(blacklisted_ip=blacklisted_ip,
           client_name=conf.cluster_client_name,
           cephconf=conf.cephconf),
    stderr=subprocess.STDOUT, shell=True)

Don't you need check=True to have it raise an exception? Does the ceph CLI use a non-zero return code for the "isn't blacklisted" case? Could also probably keep the old code but add a results = results.decode('utf-8') to ensure bytes are converted to a string.

Contributor Author

The "isn't blacklisted" case returns 0.

You don't need to pass in check=True when using check_output:

https://docs.python.org/3/library/subprocess.html

says:

If the return code was non-zero it raises a CalledProcessError. The CalledProcessError object will have the return code in the returncode attribute and any output in the output attribute.

This is equivalent to:

run(..., check=True, stdout=PIPE).stdout

I tested the failure case for both python 3 and 2 and you get an exception.

Yes, we could do a "results.decode('utf-8')" instead.

Let me know if you want me to go the decode() route since you know the "ceph osd" command output a lot better than me.
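The decode() alternative discussed above can be sketched as follows, with `echo` standing in for the "ceph osd blacklist ..." command line; the substring being searched for is illustrative, not the real ceph output.

```python
import subprocess

# On Python 3 check_output returns bytes, so the old string-parsing
# code would need a decode() before any substring checks.
results = subprocess.check_output("echo listed 1 entries",
                                  stderr=subprocess.STDOUT, shell=True)
results = results.decode("utf-8")   # bytes -> str; required on Python 3
blacklisted = "listed 1 entries" in results
```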


Does that mean that it was always (silently) throwing an exception if ceph failed? If you have tested it on the happy path of blacklist successfully removed and blacklist IP didn't exist, I'm happy w/ it.

Contributor Author

Yes. It was silently throwing an exception. The thing is, it's kind of hard to make it return failure under normal circumstances, so we probably never saw it. For the common/easy failure case where the cluster is not reachable, the command just hangs/waits for at least 5 minutes (I gave up waiting for a failure after 5 min).

I had to test it by disabling the network when doing the "ceph osd blacklist rm" command and adding the arg "--connect-timeout 5", so the command would return failures within a normal time. The "ceph osd blacklist rm" command would then return failure and check_output would raise an exception.
