mgr/dashboard: asynchronous task support #20870
Conversation
7d017f5
to
7a21c3a
Compare
|
||
To help in the development of the above scenario we added the support for | ||
asynchronous tasks. To trigger the execution of an asynchronous task we must | ||
use the follwoing class method of the ``TaskManager`` class:: |
s/follwoing/following/
* ``func`` is the python function that implements the operation code, which | ||
will be executed asynchronously. | ||
|
||
* ``args`` and ``kwargs`` are the positional and named argguments that will be |
s/argguments/arguments/
is not created and you get the task object of the current running task. | ||
|
||
|
||
How to get the list of executing and finished asynchronous tasks? |
Maybe 'running' is better than 'executing' here.
asynchronous tasks. To trigger the execution of an asynchronous task we must | ||
use the follwoing class method of the ``TaskManager`` class:: | ||
|
||
import ..tools import TaskManager |
s/import ..tools import TaskManager/from .tools import TaskManager/
|
||
@ApiController('task') | ||
@AuthRequired() | ||
class TaskController(RESTController): |
`s/TaskController/Task/g`, or add `Controller` to all controllers.
@ApiController('task') | ||
@AuthRequired() | ||
class TaskController(RESTController): | ||
def list(self, namespace=None): |
Regarding the namespace:
Do you have a requirement from the UI to add namespaces? If not, I'd add namespaces later, only if they are needed.
Looks like I can filter by namespace here. Why not instead by task name? In my experience, we won't have that many different tasks.
Namespaces are useful to group related tasks. For instance, to group all tasks related to a single component like RBD: `rbd/*`.
Since the frontend is the main entity to consume the list of tasks, working with a "task name" is difficult. The frontend memory is volatile and might easily lose the task name upon refresh, therefore it is easier to use namespaces.
Also, you can define a unique namespace to mimic the same behavior of a "task name" or "task id".
The `namespace` parameter in this function accepts a glob expression, which makes it pretty good for filtering groups of tasks.
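To illustrate the glob-based filtering described above, here is a minimal sketch. It assumes the matching behaves like Python's `fnmatch` module; the PR does not say which matcher is used, and the `tasks` list and the `filter_by_namespace` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical task records; only the 'namespace' and 'metadata' keys
# mirror the fields discussed in this PR.
tasks = [
    {"namespace": "rbd/create", "metadata": {"image": "img1"}},
    {"namespace": "rbd/remove", "metadata": {"image": "img2"}},
    {"namespace": "osd/pool/pg/monitor", "metadata": {"pool_name": "data"}},
]


def filter_by_namespace(tasks, pattern=None):
    """Return the tasks whose namespace matches the glob pattern.

    A pattern of None matches every task.
    """
    if pattern is None:
        return list(tasks)
    return [t for t in tasks if fnmatchcase(t["namespace"], pattern)]


rbd_tasks = filter_by_namespace(tasks, "rbd/*")
print([t["namespace"] for t in rbd_tasks])
```

A pattern without wildcards, such as `"rbd/create"`, selects a single namespace, which is how a unique namespace can stand in for a task name.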
Just renaming `namespace` to `name` or `task_name` would make things clearer. Especially when moving the name to the task definition. We're talking about the same thing here.
I don't have strong feelings either way, but `namespace` does sound a bit better than `name` or `task_name` IF `namespace` here is supposed to allow more than just a specific task unique identifier; `name` and `task_name`, on the other hand, seem to convey something a bit too specific.
However, I don't particularly like `namespace` for this particular case, but I don't have a better name for it, so...
@cherrypy.expose | ||
@cherrypy.tools.json_out() | ||
def default(self): | ||
task = TaskManager.run("dummy/task", {}, self._dummy) |
The name `"dummy/task"` should be part of the definition of the task, as it should be constant across all usages.
Can you be more clear on what you mean by "should be part of the definition of the task"?
The `namespace` parameter is supposed to be a dynamic property on purpose; if the controller developer wants all its tasks to have the same `namespace` string then it can declare a constant variable with that string.
The task name is typically very similar to the method name plus class or package name. There is no need to make this a dynamic property, unless you have a clear use case in mind.
@sebastian-philipp If you only have one single `task_name` as the sole identifier of a single task in the queue, how do you expect to handle different tasks performing the same base action if you are not also differentiating them based on their arguments?
You actually gave a pretty good example of how what you are proposing would not work (in another comment), even though you presented it as an argument against the current implementation:
TaskManager.run('set_pg_count_per_pool', 42, lambda: set_pg_count_per_pool(mypool, 42))
TaskManager.run('set_pg_count_per_pool', 12, lambda: set_pg_count_per_pool(mypool, 12))
TaskManager.run('set_pg_count_per_pool', 42, lambda: set_pg_count_per_pool(mypool, 42))
The calls above would be expected to run two different tasks, with the third call being returned the status of the first call (given it's an idempotent operation). How would you handle such a sequence of operations if you only had one single, static identifier `task_name`?
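The behaviour expected from the three calls above can be sketched as follows. This is not the PR's `TaskManager`: `TaskRegistry` is a hypothetical stand-in that only records tasks (it never executes or retires them), and it assumes the metadata is a JSON-serializable dict rather than a bare value.

```python
import json
import threading


class TaskRegistry:
    """Hypothetical registry that deduplicates on (namespace, metadata)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = {}

    @staticmethod
    def _key(namespace, metadata):
        # Dicts are not hashable, so use a canonical JSON form as the key.
        return (namespace, json.dumps(metadata, sort_keys=True))

    def run(self, namespace, metadata, fn):
        """Register a task, or return the matching one already registered."""
        key = self._key(namespace, metadata)
        with self._lock:
            task = self._running.get(key)
            if task is None:
                task = {"namespace": namespace, "metadata": metadata, "fn": fn}
                self._running[key] = task
            return task


registry = TaskRegistry()
first = registry.run("set_pg_count_per_pool", {"pg_num": 42}, lambda: None)
second = registry.run("set_pg_count_per_pool", {"pg_num": 12}, lambda: None)
third = registry.run("set_pg_count_per_pool", {"pg_num": 42}, lambda: None)
print(third is first, second is first)
```

With a single static `task_name` as the key, the second call would collide with the first even though its arguments differ.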
``value == None``, and if ``state == VALUE_EXCEPTION`` then ``value`` stores | ||
the exception object raised by the execution of function ``func``. | ||
|
||
The pair ``(namespace, metadata)`` should univocally identify the task being |
Why not use a task name instead? Why invent something like `namespace`?
`(namespace, metadata)` provides the ability to identify a task unequivocally without having to know the "task name" or "task id". This provides much more flexibility for the consumers of the list of executing/finished tasks, as they are not required to store the list of task names that they want to keep track of.
|
||
* ``VALUE_DONE = 0`` | ||
* ``VALUE_EXECUTING = 1`` | ||
* ``VALUE_EXCEPTION = 2`` |
Please remove `VALUE_EXCEPTION` as a valid return code. Instead, raise the original exception in the caller's thread. If someone wants to catch a specific exception, they should use `try: ... except:` instead.
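One way to picture this suggestion, using only the standard library rather than the PR's `TaskManager`: `concurrent.futures.Future.result()` re-raises an exception from the worker in the thread that asks for the result, so the caller can use an ordinary `try`/`except`. The `failing_operation` function is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_operation():
    # Hypothetical long-running operation that fails in the worker thread.
    raise ValueError("pool does not exist")


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_operation)
    try:
        # result() re-raises the worker's exception in the calling thread.
        future.result(timeout=5)
        outcome = "done"
    except ValueError as ex:
        outcome = "failed: {}".format(ex)

print(outcome)
```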
Right, maybe we don't need to pass the exception this way. Will try your suggestion.
{ | ||
'namespace': "namespace", # str | ||
'metadata': { }, # dict | ||
'begin_time': 0.0, # float |
again, ISO 8601
'namespace': "namespace", # str | ||
'metadata': { }, # dict | ||
'begin_time': 0.0, # float | ||
'end_time': 0.0, # float |
ISO 8601
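For reference, a minimal sketch of turning the float timestamps returned by `time.time()` into ISO 8601 strings in UTC. The helper name `to_iso8601_utc` is hypothetical and the code assumes Python 3; it is not taken from the PR.

```python
from datetime import datetime, timezone


def to_iso8601_utc(timestamp):
    """Format a POSIX timestamp (as returned by time.time()) in UTC."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    # Microsecond precision with a trailing 'Z' to mark UTC.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


print(to_iso8601_utc(0.0))
```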
'end_time': 0.0, # float | ||
'latency': 0.0, # float | ||
'progress': 0 # int (percentage) | ||
'success': True, # bool |
I'd replace this with `state` and unify executing and finished tasks. This would also improve the TaskManager itself.
'latency': 0.0, # float | ||
'progress': 0 # int (percentage) | ||
'success': True, # bool | ||
'ret_value': None, # object, populated only if 'success' == True |
I'd allow generic JSON data structures here. Sometimes, a simple Boolean is enough.
When I say `object`, I mean anything; it could be a boolean. Of course, the restriction is that it must be something that can be serialized in JSON.
return { | ||
'executing_tasks': executing_t, | ||
'finished_tasks': finished_t | ||
} |
I'd unify both lists into one and add a `state` property.
I can imagine this being useful from the UI point-of-view. Maybe a frontend dev could opinionate on what they would like to be returned here?
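As a rough illustration of the unified shape being discussed, the two lists could be merged into one with a `state` field on each entry. The helper `unify_task_lists` and the state strings `"executing"` and `"finished"` are assumptions for this sketch, not names from the PR.

```python
def unify_task_lists(executing_tasks, finished_tasks):
    """Merge both lists into one, tagging each task with a 'state' key."""
    unified = []
    for task in executing_tasks:
        # dict(task, state=...) copies the task and adds the new key.
        unified.append(dict(task, state="executing"))
    for task in finished_tasks:
        unified.append(dict(task, state="finished"))
    return unified


response = unify_task_lists(
    [{"namespace": "rbd/create", "progress": 40}],
    [{"namespace": "rbd/remove", "progress": 100}],
)
print(response)
```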
src/pybind/mgr/dashboard_v2/tools.py
Outdated
with self.lock: | ||
if self.executor_thread is None: | ||
self.executor_thread = AsyncTask.ExecutorThread(self) | ||
self.executor_thread.start() |
You're unconditionally creating a new posix thread per scheduled task here and dashboard_v2 already starts about ten posix threads.
But wouldn't each task have to be handled in a separate thread? How else would you execute them in parallel?
@LenzGr I think what @sebastian-philipp means is that with the current implementation the number of threads might grow unbounded. I agree that we can improve this part of code by using a thread queue that limits the number of worker threads. I think this improvement can be done in a separate PR and is not critical at this point.
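A bounded pool of worker threads could look roughly like the sketch below. It uses the standard library's `ThreadPoolExecutor` instead of the PR's `AsyncTask.ExecutorThread`; `MAX_WORKERS` and `operation` are hypothetical, and tasks beyond the limit simply wait in the executor's internal queue.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # hypothetical upper bound on worker threads

executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)
seen_threads = set()
seen_lock = threading.Lock()


def operation(n):
    # Record which worker thread ran this task, then do some "work".
    with seen_lock:
        seen_threads.add(threading.current_thread().name)
    return n * n


# Twenty tasks are scheduled, but at most MAX_WORKERS threads execute them.
futures = [executor.submit(operation, n) for n in range(20)]
results = [f.result() for f in futures]
executor.shutdown()

print(len(seen_threads), results[:5])
```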
src/pybind/mgr/dashboard_v2/tools.py
Outdated
val = self._task.fn(*self._task.fn_args, **self._task.fn_kwargs) | ||
self._task.end_time = time.time() | ||
except Exception as ex: | ||
logger.exception("Error while calling %s: ex=%s", self._task, str(ex)) |
Replace with `logger.exception("Error while calling %s", self._task)`, as the exception is already printed by `logger.exception`.
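The point can be checked with a small standalone sketch: `logger.exception` appends the traceback and the exception message on its own, so passing `str(ex)` as an extra argument only duplicates it. The logger name and the in-memory stream are just for the demonstration.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("task_example")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

try:
    raise RuntimeError("boom")
except Exception:
    # No need to format the exception into the message by hand.
    logger.exception("Error while calling %s", "dummy/task")

output = stream.getvalue()
print(output)
```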
src/pybind/mgr/dashboard_v2/tools.py
Outdated
self._task.exception = ex | ||
else: | ||
with self._task.lock: | ||
self._task.latency = self._task.end_time - self._task.begin_time |
s/latency/duration/?
I only focused on evaluating the general architecture and approach, which seemed sound and generic enough to make me happy. I will leave it to the backend developers to figure out the code correctness, and to the frontend developers to ascertain whether the interfaces make sense to them.
Some nits on grammar and english foo, but does look good to me.
|
||
* ``namespace`` is a string that can be used to group tasks. For instance | ||
for RBD image creation tasks we could specify ``"rbd/create"`` as the | ||
namespace, or conversly ``"rbd/remove"`` for RBD image removal tasks. |
s/conversly/conversely/. Also, I don't think `conversely` applies here. I'd go with `similarly` instead, as you are not pointing out a reversion of the previous statement.
The pair ``(namespace, metadata)`` should univocally identify the task being | ||
run, which means that if you try to trigger a new task that matches the same | ||
``(namespace, metadata)`` pair of the currently running task, then the new task | ||
is not created and you get the task object of the current running task. |
I would also mention the idempotent nature of running two calls to `run()` with the same namespace and metadata, with an example.
will return all executing and finished tasks which namespace starts with | ||
``rbd/``. | ||
|
||
To prevent the finished tasks list from growing unboundly, the finished tasks |
s/unboundly/unbounded/
} | ||
|
||
|
||
How to updated the execution progress of an asynchronous task? |
Either s/updated/update/, or you the verb.
The ``inc_progress`` method receives as argument an integer value representing | ||
the delta we want to increment to the current execution progress percentage. | ||
|
||
Now we show a full example of a controller that triggers a new task and |
Instead of "Now we show a full example of a controller [...]", I'd go with "Take the following example of a controller [...]".
7a21c3a
to
84e3a5e
Compare
@votdev @ricardoasmarques @jecluis I addressed your documentation comments. @sebastian-philipp I addressed all the comments that are not under discussion. |
ddc38e0
to
a9e8f20
Compare
jenkins retest this please |
@ricardoasmarques I implemented the solution you proposed for the clean up of old finished tasks. |
b29edb1
to
2801f25
Compare
jenkins retest this please |
I've tested this PR, and it meets all requirements that will be needed for the front-end. Lgtm.
f14778b
to
70abc8d
Compare
@rjfd How would you implement a task that monitors the successful creation of PGs? (This is the asynchronous implementation of that task in openATTIC). Would you start an extra thread for this task? |
@sebastian-philipp here's an example of how to mimic the oA code you referred to using dashboard asynchronous tasks. Since getting the

    def monitor_pg_state(pool_name, pg_num):
        pool_info = CephService.get_pool_info(pool_name)
        if not pool_info or 'pg_status' not in pool_info:
            return {'success': False, 'msg': "Could not get pool pg_status"}
        pg_status = pool_info['pg_status']
        active = 0
        if 'active+clean' in pg_status:
            if pg_status['active+clean'] == pg_num:
                return {'success': True, 'msg': "All PGs are active+clean"}
            else:
                active = pg_status['active+clean']
        progress = int(round(active * 100.0 / pg_num))
        TaskManager.current_task().set_progress(progress)
        time.sleep(1.0)  # sleep for a bit, no need to check every millisecond
        return monitor_pg_state(pool_name, pg_num)

    @ApiController('test')
    class Test(BaseController):
        @cherrypy.expose
        @cherrypy.tools.json_out()
        def pg_monitor(self, pool_name):
            pool_info = CephService.get_pool_info(pool_name)
            task = TaskManager.run("osd/pool/pg/monitor", {'pool_name': pool_name},
                                   monitor_pg_state, [pool_name, pool_info['pg_num']])
            return task.wait(2.0)
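The same polling logic can also be written as a loop, which avoids adding a stack frame per poll when the PGs take a long time to settle. The sketch below is standalone: `poll_until_clean`, `get_active_pgs` and `set_progress` are hypothetical callables standing in for the `CephService` and `TaskManager` calls above.

```python
import time


def poll_until_clean(get_active_pgs, pg_num, set_progress, interval=1.0):
    """Poll until all PGs are active+clean, reporting progress on the way."""
    while True:
        active = get_active_pgs()
        if active == pg_num:
            set_progress(100)
            return {'success': True, 'msg': "All PGs are active+clean"}
        set_progress(int(round(active * 100.0 / pg_num)))
        time.sleep(interval)  # no need to check every millisecond


# Simulated cluster that reports 0, then 4, then 8 active+clean PGs.
samples = iter([0, 4, 8])
progress_log = []
result = poll_until_clean(lambda: next(samples), 8, progress_log.append,
                          interval=0.0)
print(progress_log, result)
```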
@sebastian-philipp (and others @jecluis @ricardoasmarques @jcsp ) here's an example of how to implement the "create pool" operation using the asynchronous nature of |
4e0d0cd
to
f387e7b
Compare
Added cc5ef6a to this PR to avoid memory leaks because of dangling notificationqueue listeners registered by short lived objects like in the custom executor example in |
@rjfd I've retested this PR and it still lgtm. |
I just looked at

    {
        "executing_tasks": [],
        "finished_tasks": [
            {
                "exception": null,
                "end_time": "2018-03-23T12:30:37.318413Z",
                "success": true,
                "begin_time": "2018-03-23T12:30:37.317111Z",
                "duration": 0.0013020038604736328,
                "progress": 100,
                "ret_value": {
                    "msg": "Could not get pool pg_status",
                    "success": false
                },
                "namespace": "osd/pool/pg/monitor",
                "metadata": {
                    "pool_name": ".rgw.root"
                }
            }
        ]
    }

But I'm OK with it, if @ricardoasmarques is handling this in the UI.
But... Are they not on different objects though? |
The first one is on the task and the second one is on the return value of the task.
Signed-off-by: Ricardo Dias <rdias@suse.com>
f387e7b
to
9f11a68
Compare
Another round of QA tests was successful: http://pulpito.ceph.com/rdias-2018-03-27_15:04:07-rados:mgr-wip-rdias-testing-distro-basic-smithi/
This PR introduces the support for executing long-running tasks by dashboard backend controllers.
The tasks are executed asynchronously and can be queried for their respective execution status.
For detailed information, please read the changes made to the HACKING.rst file.

Signed-off-by: Ricardo Dias <rdias@suse.com>