flow.async: perf: Concurrent_task_loop hierarchy should be de-virtualized and templatized. #72
Labels: `enhancement` (New feature or request), `from-akamai-pre-open` (Issue origin is Akamai, before opening source)

Filed by @ygoldfeld pre-open-source:
flow::async has been quite the work-horse for (projects) (EDIT: now also in Flow-IPC). It's held up under pressure IMO. However, as I make changes to it, I learn some lessons - and one of those is that I probably made a (non-fatal) design mistake in basing the Concurrent_task_loop hierarchy on a classic Java-style interface/virtual basis.
For ease of discussion, I'll pretend that the only way of posting a task is to use `L->post(F)`, or else `...async_X(..., asio_handler_via_op(L, O, F))`. The `schedule_*()` stuff is similar, as is `asio_handler_timed()`, respectively.

Currently there is a hierarchy:

* `Ctl::post()` is virtual and `override`s in the other 2.
  * Since it is virtual, it cannot take a tparam `Handler H`; it takes a `boost::function<void()>` (alias `Task`). This creates some unnecessary computing, as such an object must always be constructed (typically implicitly).
  * Furthermore, it is not possible to write something like `post()` that would not lose a bound executor (e.g., strand) of `H`; constructing a `function<>` always loses any bound executor - silently and evilly, as the resulting bugs are hard to detect (I speak from experience). While this is not a problem for public `post()` itself, it would be a problem if one wanted to wrap a handler with some added processing without losing its bound executor (like a strand).
* `asio_handler_via_op(L, O, F)` returns a handler that `L->post(O)`s the `H(params...)` call.
  * This is a perf loss; ideally it would return a handler that would just `bind_executor()` from the `O` (the strand within) in the case of `Cross_thread_task_loop`, without any added async delay from the added `post()`; and for `Segregated_thread_task_loop` it could just return the handler unchanged.
    * But it cannot do this by delegating different behavior depending on which `C_t_l` subclass it is, as any virtual method cannot take a tparam `Handler` (see above).
    * Note: This applies to any other thing like `asio_handler_timed()` and anything else of that nature we might want in the future.
What we should do IMO is to get rid of the C_t_l interface and replace it with a concept (it could still be rigorously documented like today). And then the individual subclasses wouldn't be subclasses anymore; they'd just behave the way a C_t_l must behave (duck typing).
This would make further development quite a bit more natural, I've found. I ran into issues (in implementing asio_handler_timed() and related) that wouldn't have been issues with this change. Vaguely speaking, the template-y nature of dealing with boost.asio handlers tends to come in conflict with the virtual interface design here.
And it would improve performance - in quite a critical place; who knows how many processor cycles the services that use flow.async could get back.
What would be lost? The following things actively use the virtual/interface design:

* Run-time selection of the concrete loop type through the common `C_t_l` interface.
  * But in reality - even though such use is documented in my doc headers - no code demands this ability that I know of. (EDIT: Flow is now open-source; that could change rapidly, in theory anyway.)
  * Moreover, even if run-time swap-in/swap-out is required somewhere, a virtual hierarchy can still exist by wrapping a subset of the modified classes. I.e., virtual on top of template = possible; but the reverse = not possible.
* The timed-loop decorator, which wraps any loop via the interface.
  * But it could without much pain just be rewritten to be templated. Hell, the main class (`Timed_concurrent_task_loop_impl<Accumulator>`) is already a template anyway; it would just gain an arg (ideally on ctor only, but failing that on the class).

In terms of priority, we can tackle this when there is a bit of free time. It's not urgent; the current setup works fine. But the sooner the better, both for field perf and before further flow.async changes are needed and/or more flow.async users exist.