mgr/dashboard: asynchronous task support #20870

Merged
merged 9 commits into ceph:master from rjfd:wip-dashboard-tasks on Mar 28, 2018

Conversation

Contributor

rjfd commented Mar 13, 2018

This PR introduces support for executing long-running tasks from dashboard backend controllers.
The tasks run asynchronously and can be queried for their current execution status.

For detailed information please read the changes made to HACKING.rst file.

Signed-off-by: Ricardo Dias rdias@suse.com

@rjfd rjfd force-pushed the rjfd:wip-dashboard-tasks branch from 7d017f5 to 7a21c3a Mar 13, 2018

@rjfd rjfd requested a review from LenzGr Mar 13, 2018


To help in the development of the above scenario we added the support for
asynchronous tasks. To trigger the execution of an asynchronous task we must
use the follwoing class method of the ``TaskManager`` class::


ricardoasmarques Mar 13, 2018

Member

s/follwoing/following/


To help in the development of the above scenario we added the support for
asynchronous tasks. To trigger the execution of an asynchronous task we must
use the follwoing class method of the ``TaskManager`` class::


votdev Mar 13, 2018

Contributor

s/follwoing/following/

* ``func`` is the python function that implements the operation code, which
will be executed asynchronously.

* ``args`` and ``kwargs`` are the positional and named argguments that will be


votdev Mar 13, 2018

Contributor

s/argguments/arguments/

is not created and you get the task object of the current running task.


How to get the list of executing and finished asynchronous tasks?


votdev Mar 13, 2018

Contributor

Maybe 'running' is better than 'executing' here.

asynchronous tasks. To trigger the execution of an asynchronous task we must
use the follwoing class method of the ``TaskManager`` class::

import ..tools import TaskManager


ricardoasmarques Mar 13, 2018

Member

s/import ..tools import TaskManager/from .tools import TaskManager/


@ApiController('task')
@AuthRequired()
class TaskController(RESTController):


sebastian-philipp Mar 13, 2018

Member

's/TaskController/Task/g' or add Controller to all controllers.

@ApiController('task')
@AuthRequired()
class TaskController(RESTController):
def list(self, namespace=None):


sebastian-philipp Mar 13, 2018

Member

Regarding the namespace:

Do you have a requirement from the UI to add namespaces? If not, I'd add namespaces later, only if they are needed.
Looks like I can filter by namespace here. Why not instead by task name? In my experience, we won't have that many different tasks.


rjfd Mar 14, 2018

Author Contributor

Namespaces are useful to group related tasks, for instance to group all tasks related to a single component like RBD under rbd/*.
Since the frontend is the main consumer of the task list, working with a "task name" is difficult: the frontend's memory is volatile and can easily lose the task name on a refresh, so namespaces are easier to use.

Also, you can define a unique namespace to mimic the same behavior of a "task name" or "task id".

The namespace parameter in this function accepts a glob expression, which makes it pretty good for filtering groups of tasks.
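The glob filtering described above can be sketched with Python's `fnmatch` module. This is a hypothetical illustration of the behavior, not code from the PR; the actual TaskManager may match namespaces with a different mechanism.

```python
# Hypothetical sketch of glob-based namespace filtering; fnmatch is an
# assumption, the real TaskManager may match namespaces differently.
from fnmatch import fnmatch

tasks = [
    {'namespace': 'rbd/create', 'metadata': {'image_name': 'img1'}},
    {'namespace': 'rbd/delete', 'metadata': {'image_name': 'img2'}},
    {'namespace': 'pool/create', 'metadata': {'pool_name': 'p1'}},
]

def filter_tasks(task_list, glob='*'):
    """Return only the tasks whose namespace matches the glob expression."""
    return [t for t in task_list if fnmatch(t['namespace'], glob)]

print([t['namespace'] for t in filter_tasks(tasks, 'rbd/*')])
# ['rbd/create', 'rbd/delete']
```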


sebastian-philipp Mar 14, 2018

Member

Just renaming namespace to name or task_name would make things clearer. Especially when moving the name to the task definition. We're talking about the same thing here.


jecluis Mar 14, 2018

Member

I don't have strong feelings either way, but namespace does sound a bit better than name or task_name IF namespace here is supposed to allow more than just a specific task unique identifier; name and task_name, on the other hand, seem to convey something a bit too specific.

However, I don't particularly like namespace for this particular case, but I don't have a better name for it, so...

@cherrypy.expose
@cherrypy.tools.json_out()
def default(self):
task = TaskManager.run("dummy/task", {}, self._dummy)


sebastian-philipp Mar 13, 2018

Member

The name "dummy/task" should be part of the definition of the task, as it should be constant across all usages.


rjfd Mar 14, 2018

Author Contributor

Can you clarify what you mean by "should be part of the definition of the task"?

The namespace parameter is a dynamic property on purpose; if controller developers want all their tasks to share the same namespace string, they can declare a constant with that string.


sebastian-philipp Mar 14, 2018

Member

The task name is typically very similar to the method name plus the class or package name. There is no need to make this a dynamic property, unless you have a clear use case in mind.


jecluis Mar 14, 2018

Member

@sebastian-philipp If you only have one single task_name as the sole identifier of a single task in the queue, how do you expect to handle different tasks performing the same base action if you are not also differentiating them based on their arguments?

You actually gave a pretty good example of how what you are proposing would not work (in another comment), even though you presented it as an argument against the current implementation:

TaskManager.run('set_pg_count_per_pool', 42, lambda: set_pg_count_per_pool(mypool, 42))
TaskManager.run('set_pg_count_per_pool', 12, lambda: set_pg_count_per_pool(mypool, 12))
TaskManager.run('set_pg_count_per_pool', 42, lambda: set_pg_count_per_pool(mypool, 42))

Whereas the above would be expected to run two different tasks, with the third call returning the status of the first call (given it's an idempotent operation), how would you handle such a sequence of operations if you only had a single, static task_name identifier?
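The deduplication behavior described here can be sketched with a toy task manager. The class and method names below are hypothetical, not the PR's actual implementation; the point is that keying the running-task table on the (namespace, metadata) pair makes the third call return the first call's task object.

```python
# Toy sketch of (namespace, metadata) deduplication; names are
# hypothetical and do not reflect the PR's actual TaskManager code.
import threading

class Task:
    def __init__(self, namespace, metadata, fn):
        self.namespace = namespace
        self.metadata = metadata
        self.fn = fn

class MiniTaskManager:
    _tasks = {}
    _lock = threading.Lock()

    @classmethod
    def run(cls, namespace, metadata, fn):
        # metadata dicts are unhashable, so key on a sorted item tuple
        key = (namespace, tuple(sorted(metadata.items())))
        with cls._lock:
            if key in cls._tasks:          # same pair -> same task object
                return cls._tasks[key]
            task = Task(namespace, metadata, fn)
            cls._tasks[key] = task
            return task

t1 = MiniTaskManager.run('pool/set_pg_num', {'pool': 'mypool', 'pg_num': 42}, lambda: None)
t2 = MiniTaskManager.run('pool/set_pg_num', {'pool': 'mypool', 'pg_num': 12}, lambda: None)
t3 = MiniTaskManager.run('pool/set_pg_num', {'pool': 'mypool', 'pg_num': 42}, lambda: None)
print(t1 is t3, t1 is t2)  # True False
```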

``value == None``, and if ``state == VALUE_EXCEPTION`` then ``value`` stores
the exception object raised by the execution of function ``func``.

The pair ``(namespace, metadata)`` should univocally identify the task being


sebastian-philipp Mar 13, 2018

Member

why not use a task name instead? Why invent something like namespace?


rjfd Mar 14, 2018

Author Contributor

(namespace, metadata) provides the ability to identify a task unequivocally without having to know a "task name" or "task id". This gives much more flexibility to the consumers of the list of executing/finished tasks, as they are not required to store the list of task names they want to keep track of.


* ``VALUE_DONE = 0``
* ``VALUE_EXECUTING = 1``
* ``VALUE_EXCEPTION = 2``


sebastian-philipp Mar 13, 2018

Member

Please remove VALUE_EXCEPTION as a valid return code. Instead raise the original exception in the caller's thread. If someone wants to catch a specific exception, he should use `try: ... except:' instead.


rjfd Mar 14, 2018

Author Contributor

Right, maybe we don't need to pass the exception this way. Will try your suggestion.

{
'namespace': "namespace", # str
'metadata': { }, # dict
'begin_time': 0.0, # float


sebastian-philipp Mar 13, 2018

Member

again, ISO 8601
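The ISO 8601 suggestion could be sketched as follows. The helper below is hypothetical, not code from this PR; it only illustrates converting a `time.time()` float into the kind of string the reviewer is asking for.

```python
# Hypothetical helper: serialize begin_time/end_time as ISO 8601
# strings instead of raw time.time() floats.
from datetime import datetime, timezone

def to_iso8601(ts):
    """Convert a POSIX timestamp (float seconds) to an ISO 8601 UTC string."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.isoformat().replace('+00:00', 'Z')

print(to_iso8601(0.0))  # 1970-01-01T00:00:00Z
```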

'namespace': "namespace", # str
'metadata': { }, # dict
'begin_time': 0.0, # float
'end_time': 0.0, # float

'latency': 0.0, # float
'progress': 0 # int (percentage)
'success': True, # bool


sebastian-philipp Mar 13, 2018

Member

I'd replace this with a state property and unify executing and finished tasks. This would also improve the TaskManager itself.

'latency': 0.0, # float
'progress': 0 # int (percentage)
'success': True, # bool
'ret_value': None, # object, populated only if 'success' == True


sebastian-philipp Mar 13, 2018

Member

I'd allow generic JSON data structures here. Sometimes, a simple Boolean is enough.


rjfd Mar 14, 2018

Author Contributor

When I say object, I mean anything, it could be a boolean. Of course the restriction is that it must be something that can be serialized in JSON.
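The constraint rjfd states can be demonstrated directly: any value accepted by `json.dumps` qualifies as a ret_value, a plain boolean included, while arbitrary objects are rejected.

```python
# Demonstrates the JSON-serializability restriction on ret_value.
import json

print(json.dumps({'ret_value': True}))  # {"ret_value": true}

try:
    json.dumps({'ret_value': object()})  # arbitrary objects cannot be serialized
except TypeError:
    print('ret_value must be JSON-serializable')
```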

return {
'executing_tasks': executing_t,
'finished_tasks': finished_t
}


sebastian-philipp Mar 13, 2018

Member

I'd unify both lists into one and add a state property.


jecluis Mar 14, 2018

Member

I can imagine this being useful from the UI point of view. Maybe a frontend dev could weigh in on what they would like to be returned here?

with self.lock:
if self.executor_thread is None:
self.executor_thread = AsyncTask.ExecutorThread(self)
self.executor_thread.start()


sebastian-philipp Mar 14, 2018

Member

You're unconditionally creating a new posix thread per scheduled task here and dashboard_v2 already starts about ten posix threads.


LenzGr Mar 14, 2018

Contributor

But wouldn't each task have to be handled in a separate thread? How else would you execute them in parallel?


rjfd Mar 14, 2018

Author Contributor

@LenzGr I think what @sebastian-philipp means is that with the current implementation the number of threads might grow unbounded. I agree that we can improve this part of the code by using a thread queue that limits the number of worker threads. I think this improvement can be done in a separate PR; it is not critical at this point.
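The bounded-worker improvement mentioned here could be sketched with the standard library's `concurrent.futures` (a hypothetical replacement, not this PR's code): a pool with a fixed worker count queues tasks instead of spawning one thread per task.

```python
# Hypothetical sketch: cap the number of task threads with a
# ThreadPoolExecutor instead of creating one thread per scheduled task.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)  # bound chosen for illustration

def run_task(fn, *args, **kwargs):
    # if all 4 workers are busy the task waits in the pool's queue,
    # rather than starting a new posix thread
    return executor.submit(fn, *args, **kwargs)

future = run_task(lambda a, b: a + b, 2, 3)
print(future.result())  # 5
```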

val = self._task.fn(*self._task.fn_args, **self._task.fn_kwargs)
self._task.end_time = time.time()
except Exception as ex:
logger.exception("Error while calling %s: ex=%s", self._task, str(ex))


sebastian-philipp Mar 14, 2018

Member

replace with

logger.exception("Error while calling %s", self._task)

as the exception is already printed by logger.exception
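A quick demonstration of why the `str(ex)` argument is redundant: `logger.exception()` logs at ERROR level and automatically appends the active exception's traceback to the message.

```python
# logger.exception() appends the current exception's traceback, so
# formatting str(ex) into the message duplicates information.
import logging

logging.basicConfig()
logger = logging.getLogger('dashboard.tasks')

try:
    1 / 0
except Exception:
    # output includes the message plus the full traceback ending in
    # "ZeroDivisionError: division by zero"
    logger.exception("Error while calling %s", "my_task")
```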

self._task.exception = ex
else:
with self._task.lock:
self._task.latency = self._task.end_time - self._task.begin_time


sebastian-philipp Mar 14, 2018

Member

s/latency/duration?

Member

jecluis left a comment

I only focused on evaluating the general architecture and approach, which seemed sound and generic enough to make me happy. I will leave it to the backend developers to figure out code correctness, and to the frontend developers to ascertain whether the interfaces make sense to them.

Some nits on grammar and English, but it does look good to me.


* ``namespace`` is a string that can be used to group tasks. For instance
for RBD image creation tasks we could specify ``"rbd/create"`` as the
namespace, or conversly ``"rbd/remove"`` for RBD image removal tasks.


jecluis Mar 14, 2018

Member

s/conversly/conversely/. Also, I don't think conversely applies here. I'd go with similarly instead, as you are not pointing out a reversion of the previous statement.

The pair ``(namespace, metadata)`` should univocally identify the task being
run, which means that if you try to trigger a new task that matches the same
``(namespace, metadata)`` pair of the currently running task, then the new task
is not created and you get the task object of the current running task.


jecluis Mar 14, 2018

Member

I would also mention the idempotent nature of running two calls to run() with the same namespace and metadata, with an example.

will return all executing and finished tasks which namespace starts with
``rbd/``.

To prevent the finished tasks list from growing unboundly, the finished tasks


jecluis Mar 14, 2018

Member

s/unboundly/unbounded/

}


How to updated the execution progress of an asynchronous task?


jecluis Mar 14, 2018

Member

Either s/updated/update/, or you're missing the verb.

The ``inc_progress`` method receives as argument an integer value representing
the delta we want to increment to the current execution progress percentage.
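A minimal sketch of what `inc_progress` might look like internally. This is hypothetical; the actual Task class lives in the PR's tools module and may differ.

```python
# Hypothetical sketch of inc_progress: add a delta to the task's
# progress percentage under a lock, clamped to 100.
import threading

class Task:
    def __init__(self):
        self.progress = 0
        self.lock = threading.Lock()

    def inc_progress(self, delta):
        with self.lock:
            self.progress = min(self.progress + delta, 100)

t = Task()
t.inc_progress(30)
t.inc_progress(30)
t.inc_progress(50)  # 110 is clamped to 100
print(t.progress)   # 100
```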

Now we show a full example of a controller that triggers a new task and


jecluis Mar 14, 2018

Member

Instead of

Now we show a full example of a controller [...]

I'd go with

Take the following example of a controller [...]

dismissed review

@rjfd rjfd force-pushed the rjfd:wip-dashboard-tasks branch from 7a21c3a to 84e3a5e Mar 14, 2018

Contributor Author

rjfd commented Mar 14, 2018

@votdev @ricardoasmarques @jecluis I addressed your documentation comments.

@sebastian-philipp I addressed all the comments that are not under discussion.

@rjfd rjfd force-pushed the rjfd:wip-dashboard-tasks branch 3 times, most recently from ddc38e0 to a9e8f20 Mar 14, 2018

Contributor Author

rjfd commented Mar 15, 2018

jenkins retest this please

Contributor Author

rjfd commented Mar 16, 2018

@ricardoasmarques I implemented the solution you proposed for the clean up of old finished tasks.

@rjfd rjfd force-pushed the rjfd:wip-dashboard-tasks branch 2 times, most recently from b29edb1 to 2801f25 Mar 16, 2018

@rjfd rjfd changed the title mgr/dashboard_v2: asynchronous task support mgr/dashboard: asynchronous task support Mar 16, 2018

Contributor Author

rjfd commented Mar 16, 2018

jenkins retest this please

Member

ricardoasmarques left a comment

I've tested this PR, and it meets all requirements that will be needed for the front-end. Lgtm.

Contributor Author

rjfd commented Mar 21, 2018

@sebastian-philipp here's an example of how to mimic the oA code you referred to using dashboard asynchronous tasks. Since getting the pg_status of a pool is a blocking operation, we execute the monitor_pg_state function as a task using the default ThreadedExecutor, which internally spawns a thread.

def monitor_pg_state(pool_name, pg_num):
    pool_info = CephService.get_pool_info(pool_name)
    if not pool_info or 'pg_status' not in pool_info:
        return {'success': False, 'msg': "Could not get pool pg_status"}
    pg_status = pool_info['pg_status']
    active = 0
    if 'active+clean' in pg_status:
        if pg_status['active+clean'] == pg_num:
            return {'success': True, 'msg': "All PGs are active+clean"}
        else:
            active = pg_status['active+clean']

    progress = int(round(active * 100.0 / pg_num))
    TaskManager.current_task().set_progress(progress)

    time.sleep(1.0)  # sleep for a bit, no need to check every millisecond
    return monitor_pg_state(pool_name, pg_num)


@ApiController('test')
class Test(BaseController):
    @cherrypy.expose
    @cherrypy.tools.json_out()
    def pg_monitor(self, pool_name):
        pool_info = CephService.get_pool_info(pool_name)
        task = TaskManager.run("osd/pool/pg/monitor", {'pool_name': pool_name},
                               monitor_pg_state, [pool_name, pool_info['pg_num']])
        return task.wait(2.0)
Contributor Author

rjfd commented Mar 21, 2018

@sebastian-philipp (and others @jecluis @ricardoasmarques @jcsp ) here's an example of how to implement the "create pool" operation using the asynchronous nature of mgr.send_command and a custom task executor. The implementation is based on an asynchronous state machine that executes all steps asynchronously, and the task only finishes when all of the pool's PGs are in the active+clean state.

https://github.com/rjfd/ceph/blob/wip-dashboard-pr-20865/src/pybind/mgr/dashboard/controllers/pool.py

@rjfd rjfd force-pushed the rjfd:wip-dashboard-tasks branch 2 times, most recently from 4e0d0cd to f387e7b Mar 21, 2018

Contributor Author

rjfd commented Mar 21, 2018

Added cc5ef6a to this PR to avoid memory leaks caused by dangling NotificationQueue listeners registered by short-lived objects, like the custom executor example in HACKING.rst. Updated the example accordingly.

Member

ricardoasmarques commented Mar 22, 2018

@rjfd I've retested this PR and it still lgtm.

Member

sebastian-philipp commented Mar 23, 2018

I just looked at /task and I still find it confusing to have two success bools contradicting each other:

{
  "executing_tasks": [],
  "finished_tasks": [
    {
      "exception": null,
      "end_time": "2018-03-23T12:30:37.318413Z",
      "success": true,
      "begin_time": "2018-03-23T12:30:37.317111Z",
      "duration": 0.0013020038604736328,
      "progress": 100,
      "ret_value": {
        "msg": "Could not get pool pg_status",
        "success": false
      },
      "namespace": "osd/pool/pg/monitor",
      "metadata": {
        "pool_name": ".rgw.root"
      }
    }
  ]
}

But I'm OK with it, if @ricardoasmarques is handling this in the UI.

Member

jecluis commented Mar 23, 2018

But... Are they not on different objects though?

Member

sebastian-philipp commented Mar 23, 2018

The first one is on the task itself and the second one is in the task's return value.

rjfd added some commits Mar 8, 2018

mgr/dashboard: privatize NotificationQueue methods
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: Support for handler priorities in NotificationQueue
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: fix NotificationQueue waiting loop
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: implemented NotificationQueue listener removal
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: Asynchronous tasks implementation
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: async tasks controller
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: added tasks info to summary controller
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: add task manager usage instructions to HACKING.rst
Signed-off-by: Ricardo Dias <rdias@suse.com>
mgr/dashboard: task manager unit tests implementation
Signed-off-by: Ricardo Dias <rdias@suse.com>

@rjfd rjfd force-pushed the rjfd:wip-dashboard-tasks branch from f387e7b to 9f11a68 Mar 27, 2018

Contributor Author

rjfd commented Mar 28, 2018

comments have been addressed

@rjfd rjfd merged commit 5a861f5 into ceph:master Mar 28, 2018

4 of 5 checks passed

make check (arm64): make check failed
Docs: build check: OK - docs built
Signed-off-by: all commits in this PR are signed
Unmodified Submodules: submodules for project are unmodified
make check: make check succeeded
Contributor

tchaikov commented on src/pybind/mgr/dashboard/tests/test_notification.py in 6b0afa3 Mar 30, 2018

@rjfd is it possible that

        self.assertLess(self.listener.type1_3_ts[0], self.listener.all_ts[0])
        self.assertLess(self.listener.all_ts[0], self.listener.type1_ts[0])
        self.assertLess(self.listener.type2_ts[0], self.listener.all_ts[1])
>       self.assertLess(self.listener.type1_3_ts[1], self.listener.all_ts[2])
E       AssertionError: 1522423799.316104 not less than 1522423799.316104

the timestamp of type1_3_ts[1] is identical to that of self.listener.all_ts[2]? If yes, we should probably s/assertLess/assertLessEqual/ here? See https://jenkins.ceph.com/job/ceph-pull-requests/43376/console
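The proposed change in isolation: assertLessEqual tolerates the equal timestamps that a coarse clock can produce within one tick, where assertLess fails.

```python
# assertLess rejects equal values; assertLessEqual accepts them, which
# is what two events timestamped within the same clock tick need.
import unittest

class ClockTickCase(unittest.TestCase):
    def test_equal_timestamps(self):
        ts = 1522423799.316104  # the colliding timestamp from the CI failure
        with self.assertRaises(AssertionError):
            self.assertLess(ts, ts)      # assertLess fails on equality
        self.assertLessEqual(ts, ts)     # assertLessEqual passes

unittest.main(argv=['-'], exit=False, verbosity=0)
```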

Contributor Author

rjfd replied Mar 30, 2018

Yes, I'll open a PR with the fix

Contributor Author

rjfd replied Mar 30, 2018

PR: #21147
