New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/dashboard: Improve exception handling #21066

Merged
merged 9 commits into from May 9, 2018

Conversation

Projects
None yet
6 participants
@sebastian-philipp
Copy link
Member

sebastian-philipp commented Mar 27, 2018

  • Added dashboard_exception_handler() to catch our exceptions.
  • Changed the behaviour of ViewCache to raise exceptions
  • Added RadosReturnError raised in send_command()
  • Added unit tests

Signed-off-by: Sebastian Wagner sebastian.wagner@suse.com

TODO:

  • Split into multiple commits
  • Run API Tests
  • Fix the UI (although, I'd like to postpone this, until I get reviews for the backend implementation)

Also

Right now, we're building urls like rgw/proxy/{path:.*}/:path and this is of course broken. This fixes http://tracker.ceph.com/issues/23823 and replaces #21642

@sebastian-philipp sebastian-philipp requested review from s0nea and rjfd Mar 27, 2018

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch 2 times, most recently from d449641 to 66920ee Mar 28, 2018

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch 2 times, most recently from 528dcb1 to c5617b8 Apr 5, 2018

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch 3 times, most recently from c37d1ff to 67da525 Apr 13, 2018

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch 3 times, most recently from da6310e to fa6d11b Apr 24, 2018

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch from fa6d11b to f20e948 Apr 25, 2018

@sebastian-philipp

This comment has been minimized.

Copy link
Member Author

sebastian-philipp commented Apr 25, 2018

Ran 54 tests in 1829.976s

@sebastian-philipp sebastian-philipp removed the DNM label Apr 25, 2018

@sebastian-philipp sebastian-philipp changed the title [WIP] mgr/dashboard: Improve exception handling mgr/dashboard: Improve exception handling Apr 25, 2018

@sebastian-philipp

This comment has been minimized.

Copy link
Member Author

sebastian-philipp commented Apr 25, 2018

Right. Regarding the status code 400 vs 409. I couldn't find real best-practice recommending 409 for general API errors. Instead, I found several recommending 400:

Regarding 409. There is a thread where someone uses 409. But I disagree with him.

@rjfd
Copy link
Contributor

rjfd left a comment

Overall I think this is the right approach. But with a change in the focus of where to declare which exception handler to use.

Currently the focus is on the controller method, but I think usually we want to use the same exception handlers for all methods of a controller class, and therefore the focus should be on the controller class.

The idea would be to declare a list of exceptions handlers in the controller class, and if some controller method exceptionally requires a different exception handler, then use a decorator on that method to indicate a different exception handler.

@@ -163,53 +163,41 @@ def _task_request(cls, method, url, data, timeout):
task_name = res['name']
task_metadata = res['metadata']

class Waiter(threading.Thread):

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

why are you changing this part? did you find any bug while using a thread to poll for the task to finish?

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 27, 2018

Author Member

This waiter-thread obscured the function and thus made debugging of error codes more complex. As the thread is not needed, I removed it.

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

But the new code changed a bit the semantics. While before the timeout value really represents the maximum time we will wait for the task to finish, with the new code, you may wait for much more time because you need to take into account the latency of each HTTP request.

Moreover, this is code that will be run in teuthology, and if you were having problems with debug information, you should instead improve the logging information of existing code in order for everyone to get better information when investigating testing failures.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 27, 2018

Author Member

No. We shouldn't keep code that is not really necessary. Especially threads add lots of complexity. We should keep all code clean: Production code and QA code.

"detail": str(e), # E.g. "[errno -42] <some error message>"
"component": "rbd", # this can be null to represent a global error code
"code": 3, # Or a error name, e.g. "code": "some_error_key"
"errno": e.errno, # may be omitted

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

why do we need to send the "errno" if we already have "code". I think the frontend will always look at the "code" key and will never look at the "errno" key.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

done.

"errno": e.errno, # may be omitted
}

This is a list of known error codes:

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

The "Comment" column of the table below is just describing the OS errors, which is not very useful. I would even suggest to remove this table from the documentation because it's very hard to maintain.

But the frontend people would like to have this table to know what messages to show to the user, so what I would like to see would be, for instance, for "RBD" error code "17" a comment saying "Image already exists", or for error code "39" a comment saying "Image still has snapshots".

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 27, 2018

Author Member

Is there somewhere am (official) documentation that code 39 means "Image still has snapshots"?

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

No, that's why I favor the option of completely removing that error code table from the documentation.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 27, 2018

Author Member

@tspmelo How would you like to have this table of error messages?

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

This is actually not outdated!

@@ -454,14 +482,14 @@ def isfunction(m):

methods = []
for k, v in cls._method_mapping.items():
if not k[1] and hasattr(cls, v[0]):
if not k[1] and (hasattr(cls, v[0]) or k[1] in ('GET', 'POST')):

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

I don't understand the second part of the condition. k[1] should evaluate to a boolean value, why are you checking it against a list of string values?

And even if what you intended to test was k[0], I don't understand why are you checking if it's a "GET" or "POST".

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

fixed.

methods.append(k[0])
if methods:
result.append((methods, None, '_collection', []))
methods = []
args = []
for k, v in cls._method_mapping.items():
if k[1] and hasattr(cls, v[0]):
if k[1] and (hasattr(cls, v[0]) or k[1] in ('GET', 'POST')):

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

Same comment as above.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

done.

@@ -45,7 +45,7 @@ def get(self, svc_id):
}
service = CephService.get_service('rgw', svc_id)
if not service:
return daemon
raise cherrypy.NotFound('unknown RGW')

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

nit: I would change the string message to: "Service 'rgw' is not available"

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

done.


def dashboard_exception_handler(handler, *args, **kwargs):
try:
with handle_rados_error(component=None): # make the None controller the fallback.

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

Just to make sure I understood the code. If no exception handler was defined as a decorator in the controller method (handler) we will always at least handle rados errors, right?

This comment has been minimized.

@sebastian-philipp
better_wraps = wraps
_getargspec = inspect.getfullargspec
else:
def better_wraps(func):

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

nit: can we give a different prefix instead of better?

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 27, 2018

Author Member

py3_wraps? wraps_with_wrapped? Just wraps? controller_wraps? I'm heading towards just wraps

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

I'm also fine with just wraps.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

done.

@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

We already have a file called exceptions.py in the root dir. We should move all classes that extend exceptions to that file. The remaining functions could stay here.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

done.

raise DashboardException(e, component=component)


def c2d(my_contextmanager, *cargs, **ckwargs):

This comment has been minimized.

@rjfd

rjfd Apr 27, 2018

Contributor

can we rename this decorator to ExceptionHandler?

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 27, 2018

Author Member

My favorite would be to fix @contentextmanager to do the right thing. I'm thinking about:

def correct_contextmanager(func):
    func = contextmanager(func):
    func.__call__ = c2d(func)
    return func

Then, it just works™

This comment has been minimized.

@sebastian-philipp

sebastian-philipp Apr 30, 2018

Author Member

done. Even though it was a bit more complicated.

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch from 9fbcafe to 2b1087c Apr 30, 2018

@rjfd rjfd added the needs-qa label May 2, 2018

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch 2 times, most recently from 5b5558f to d27b641 May 2, 2018

@s0nea

This comment has been minimized.

Copy link
Member

s0nea commented May 3, 2018

QA tests run was not successful: http://pulpito.ceph.com/tdehler-2018-05-03_06:37:29-rados:mgr-wip-tdehler-testing-distro-basic-mira/

Failure Reason:
Test failure: test_flatten (tasks.mgr.dashboard.test_rbd.RbdTest)

@sebastian-philipp

This comment has been minimized.

Copy link
Member Author

sebastian-philipp commented May 3, 2018

16:37.003 mgr.dashboard.helper:request POST to https://mira081.front.sepia.ceph.com:7789/api/block/image/rbd/img1/snap
16:38.032 mgr.dashboard.helper:task finished immediately
16:38.032 mgr.dashboard.helper:request PUT to https://mira081.front.sepia.ceph.com:7789/api/block/image/rbd/img1/snap/snapf
16:38.033 ceph.mgr.x.mira081.stdout:::ffff:172.21.0.51 - - [03/May/2018:07:16:38] "POST /api/block/image/rbd/img1/snap HTTP/1.1" 201 4 "" "python-requests/2.12.5"
16:38.112 mgr.dashboard.helper:task finished immediately
16:38.113 mgr.dashboard.helper:request POST to https://mira081.front.sepia.ceph.com:7789/api/block/image/rbd/img1/snap/snapf/clone
16:38.114 ceph.mgr.x.mira081.stdout:::ffff:172.21.0.51 - - [03/May/2018:07:16:38] "PUT /api/block/image/rbd/img1/snap/snapf HTTP/1.1" 200 4 "" "python-requests/2.12.5"
16:38.244 mgr.dashboard.helper:task finished immediately
16:38.244 mgr.dashboard.helper:request GET to https://mira081.front.sepia.ceph.com:7789/api/block/image/rbd_iscsi/img1_snapf_clone
16:38.246 ceph.mgr.x.mira081.stdout:::ffff:172.21.0.51 - - [03/May/2018:07:16:38] "POST /api/block/image/rbd/img1/snap/snapf/clone HTTP/1.1" 200 4 "" "python-requests/2.12.5"
16:38.305 ceph.mgr.x.mira081.stdout:::ffff:172.21.0.51 - - [03/May/2018:07:16:38] "GET /api/block/image/rbd_iscsi/img1_snapf_clone HTTP/1.1" 200 533 "" "python-requests/2.12.5"
16:38.306 mgr.dashboard.helper:request POST to https://mira081.front.sepia.ceph.com:7789/api/block/image/rbd_iscsi/img1_snapf_clone/flatten
16:38.880 mgr.dashboard.helper:task finished immediately
16:38.881 cephfs_test_runner:test_flatten (tasks.mgr.dashboard.test_rbd.RbdTest) ... FAIL
16:38.881 logy.orchestra.run.mira081:Running: "sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph log 'Ended test tasks.mgr.dashboard.test_rbd.RbdTest.test_flatten'"
16:38.882 ceph.mgr.x.mira081.stdout:::ffff:172.21.0.51 - - [03/May/2018:07:16:38] "POST /api/block/image/rbd_iscsi/img1_snapf_clone/flatten HTTP/1.1" 200 4 "" "python-requests/2.12.5"
16:42.447 cephfs_test_runner:FAIL: test_flatten (tasks.mgr.dashboard.test_rbd.RbdTest)
16:42.447 cephfs_test_runner:----------------------------------------------------------------------
16:42.447 cephfs_test_runner:Traceback (most recent call last):
16:42.447 cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-tdehler-testing/qa/tasks/mgr/dashboard/test_rbd.py", line 528, in test_flatten
16:42.448 cephfs_test_runner:    self.assertStatus(201)
16:42.448 cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_wip-tdehler-testing/qa/tasks/mgr/dashboard/helper.py", line 246, in assertStatus
16:42.448 cephfs_test_runner:    self.assertEqual(self._resp.status_code, status)
16:42.448 cephfs_test_runner:AssertionError: 200 != 201
@@ -525,7 +531,7 @@ def test_flatten(self):
self.assertIsNotNone(img['parent'])

self.flatten_image('rbd_iscsi', 'img1_snapf_clone')
self.assertStatus(200)
self.assertStatus([200, 201])

This comment has been minimized.

@rjfd

rjfd May 3, 2018

Contributor

The only status code returned by this operation is 200

This comment has been minimized.

This comment has been minimized.

@rjfd

rjfd May 4, 2018

Contributor

@sebastian-philipp you're right

@s0nea

This comment has been minimized.

Copy link
Member

s0nea commented May 4, 2018

@s0nea

s0nea approved these changes May 4, 2018

Good error handling is a key requirement in creating a good user experience
and providing a good API.

Dashboard code should not duplicate C++ code. Thus, if error handling in C++

This comment has been minimized.

@s0nea

s0nea May 4, 2018

Member

Just a suggestion:

"The Dashboard code should not duplicate the C++ code. Thus, if the error handling in C++ is sufficient, we shouldn't build your own wrapper. On the other hand, input validation is the best place to catch errors and generate the best error messages. If required, generate errors as soon as possible."

This comment has been minimized.

@sebastian-philipp

sebastian-philipp May 8, 2018

Author Member

fixed.

log the exception by adding ``logger.exception()`` with a description to the
handler.

Distinguish user errors from internal errors and programming errors. Using

This comment has been minimized.

@s0nea

s0nea May 4, 2018

Member

Another suggestion:

"We need to distinguish between user errors from internal errors and programming errors."

This comment has been minimized.

@sebastian-philipp

sebastian-philipp May 8, 2018

Author Member

fixed.

@tspmelo
Copy link
Contributor

tspmelo left a comment

I haven't look much into the python code, but I was able to successfully use this in the frontend.

Beside the few changes I requested, I notice you have a typo on one commit message, pop8, and some commits are missing the component prefix in the title.


Status Code: 400
{
"detail": str(e), # E.g. "[errno -42] <some error message>"

This comment has been minimized.

@tspmelo

tspmelo May 4, 2018

Contributor

Missing task.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp May 8, 2018

Author Member

fixed.

"detail": str(e), # E.g. "[errno -42] <some error message>"
"component": "rbd", # this can be null to represent a global error code
"code": 3, # Or a error name, e.g. "code": "some_error_key"
'status': 3, # Indicating the @ViewCache error status

This comment has been minimized.

@tspmelo

tspmelo May 4, 2018

Contributor

missing task.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp May 8, 2018

Author Member

fixed. Added task to a new section.

@@ -35,6 +35,7 @@ export class ApiInterceptorService implements HttpInterceptor {
case 404:
this.router.navigate(['/404']);
break;
case 400:
case 409:

This comment has been minimized.

@tspmelo

tspmelo May 4, 2018

Contributor

You can remove this line, since we don't use 409 anymore.

This comment has been minimized.

@sebastian-philipp

sebastian-philipp May 8, 2018

Author Member

fixed.

sebastian-philipp added some commits Apr 24, 2018

mgr/dashboard: Improve exception handling
* Added `dashboard_exception_handler()` to catch our exceptions.
* Changed the behaviour of `ViewCache` to raise exceptions
* Added `RadosReturnError` raised in `send_command()`
* Added unit tests

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Applied Exception Handling
* Minor changes to CephFS, OSD, Pool, RbdMirroring, Summary and RGW

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Improved Exception handling in Tasks
* Set default status code to 400
* Added tests
* Fixed globbing in `test_task.py`

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Add task info to exception result
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Change to Status Code 400 by default.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
@ricardoasmarques
Copy link
Member

ricardoasmarques left a comment

lgtm

sebastian-philipp added some commits Apr 24, 2018

mgr/dashboard: Applied Exception handling to RBDs
* DashboardTestCase: Removed waiter-thread

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Exceptions: Added Documentation
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Exception handling: browsable API
* Added display of Exceptions
* Fixed missing sub-path
* Added delete-form
* Fixed default arguments

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
mgr/dashboard: Fix duplicate params error
* Remove params, if they use the `{name:regex}` syntax.
* Fixes http://tracker.ceph.com/issues/23823

Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>

@sebastian-philipp sebastian-philipp force-pushed the sebastian-philipp:dashboard_error_handling branch from 64f23b1 to 6487b4b May 8, 2018

@sebastian-philipp sebastian-philipp removed the DNM label May 8, 2018

@sebastian-philipp

This comment has been minimized.

Copy link
Member Author

sebastian-philipp commented May 8, 2018

Addressed comments from @rjfd @tspmelo and @s0nea

@rjfd

rjfd approved these changes May 9, 2018

@tspmelo

tspmelo approved these changes May 9, 2018

@votdev votdev merged commit a3cf914 into ceph:master May 9, 2018

5 checks passed

Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment