nautilus: mgr/PyModule: fix missing tracebacks in handle_pyerror() #34627

Merged
merged 1 commit into from Apr 24, 2020

@shyukri shyukri commented Apr 18, 2020

backport tracker: https://tracker.ceph.com/issues/45043


backport of #34366
parent tracker: https://tracker.ceph.com/issues/44799

this backport was staged using ceph-backport.sh version 15.1.1.389
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

In certain cases, errors raised in mgr modules don't actually result in a
proper traceback in the mgr log; all you see is a message like "'Hello'
object has no attribute 'dneasdfasdf'", but you have no idea where that
came from, which is a complete PITA to debug.

Here's what's going on: handle_pyerror() calls PyErr_Fetch() to get
information about the error that occurred, then passes that information
back to Python's traceback.format_exception() function to get the traceback.
If we write code in an mgr module that explicitly raises an exception
(e.g.: 'raise RuntimeError("that didn't work")'), the error value returned
by PyErr_Fetch() is of type RuntimeError, and traceback.format_exception()
does the right thing.  If however we accidentally write code that's just
broken (e.g.: 'self.dneasdfasdf += 1'), the error value returned is not
an actual exception, it's just a string.  So traceback.format_exception()
freaks out with something like "'str' object has no attribute '__cause__'"
(which we don't actually ever see in the logs), which in turn dumps us in a
"catch (error_already_set const &)" block, which just prints out the
single line error string.
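
The failure mode described above can be reproduced in pure Python, without
the mgr or the C API. This is only a sketch: the bare string stands in for
the unnormalized value that PyErr_Fetch() hands back, and the exact text of
the secondary error depends on the Python version, so it is printed rather
than assumed.

```python
import traceback

# A real exception instance: format_exception() does the right thing.
real = RuntimeError("that didn't work")
formatted = traceback.format_exception(RuntimeError, real, None)

# A bare string standing in for an unnormalized error value (assumption:
# this mimics what handle_pyerror() received before the fix).
unnormalized = "'Hello' object has no attribute 'dneasdfasdf'"
try:
    traceback.format_exception(AttributeError, unnormalized, None)
    secondary_error = None
except Exception as exc:
    # The formatter itself blows up because a str lacks exception attributes.
    secondary_error = exc

print(repr(secondary_error))
```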

https://docs.python.org/3/c-api/exceptions.html#c.PyErr_NormalizeException
tells us that "Under certain circumstances, the values returned by
PyErr_Fetch() below can be “unnormalized”, meaning that *exc is a class
object but *val is not an instance of the same class.".  And that's exactly
the problem we're having here.  We're getting a 'str', not an Exception.
Adding a call to PyErr_NormalizeException() turns the value back into a
proper Exception type and traceback.format_exception() now always does the
right thing.
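
As a rough Python analogue of what PyErr_NormalizeException() does, the
sketch below turns a non-instance value into an instance of the exception
class before formatting. The normalize() helper is hypothetical and exists
only for illustration; the actual fix calls the C API function directly.

```python
import traceback

def normalize(exc_type, exc_value):
    """Hypothetical helper: approximate PyErr_NormalizeException() in Python."""
    if isinstance(exc_value, exc_type):
        return exc_value              # already a proper instance
    if exc_value is None:
        return exc_type()             # no arguments
    if isinstance(exc_value, tuple):
        return exc_type(*exc_value)   # tuple is used as the argument list
    return exc_type(exc_value)        # anything else is the single argument

value = normalize(AttributeError,
                  "'Hello' object has no attribute 'dneasdfasdf'")
# With a real AttributeError instance, formatting no longer fails.
lines = traceback.format_exception(AttributeError, value, None)
print("".join(lines))
```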

I've also added calls to peek_pyerror() in the catch blocks, so if anything
else ever somehow causes traceback.format_exception to fail, we'll at least
have an idea of what it is in the log.

Fixes: https://tracker.ceph.com/issues/44799
Signed-off-by: Tim Serong <tserong@suse.com>
(cherry picked from commit dee5980)
@smithfarm smithfarm added this to the nautilus milestone Apr 21, 2020
@smithfarm smithfarm added the mgr label Apr 21, 2020
@smithfarm smithfarm added nautilus-batch-1 nautilus point releases needs-qa labels Apr 21, 2020
@sebastian-philipp sebastian-philipp requested review from tserong and removed request for sebastian-philipp April 21, 2020 11:13
yuriw commented Apr 22, 2020

@yuriw yuriw merged commit 43cb7d6 into ceph:nautilus Apr 24, 2020
@shyukri shyukri deleted the wip-45043-nautilus branch April 24, 2020 19:34