Implement local-only primary namespace service #2232
- adding agas::detail::local_primary_namespace
- simplified and unified generation of gids
- refactored performance counters for server::primary_namespace
- moved part of implementation for agas::detail::hosted_data_type and agas::detail::bootstrap_data_type into source files
- refactoring primary_namespace into a base-class component and two derived classes
- split generate_unique_ids into two implementations
- disable AGAS caching when in local mode
- hello_world is running, more testing is required
- this also disables all networking and will not expect any other localities to connect
- the command line option --hpx:expect-connecting-localities can now take an (optional) argument
- fly-by: removed docs of configuration settings for IPC and VERBS parcelports
- fly-by: simplified implementation of performance counters for primary AGAS namespaces
- enabled performance counters for local primary AGAS namespace
Which kinds of applications would benefit from such an optimization? What's the actual performance increase when using this?
Any application which is potentially distributed but has to run on a single locality (embedded devices?). This patch allows global addresses to be used more efficiently when running on one locality. This will also benefit our comparisons with libraries like TBB.
I have not done any solid performance analysis. Here are the performance counter results for agas/primary_namespace for hello_world with and without this optimization, though:
So it looks like in this case both the number of calls and the time required to execute them are reduced.
That was mainly the reason why I asked for performance data ... I am not sure that this optimization buys us a lot in general. Especially in comparison with TBB, where we would compare a solution written for distributed execution with a completely local solution (apples and oranges). Our parallel algorithms etc. don't really depend on AGAS anyway. Having this option enabled might also give a distorted picture once you go distributed. Given that the overheads are indeed lower, you need to adapt grain sizes etc. once again. I am not sure if this optimization is really worth it in the end.
Why do we have fewer requests all of a sudden? But the timing looks nice. It would be nice to have a baseline with master. P.S.: I am not through with reading the code ;)
I agree. I'm not sure myself how many applications are written with AGAS in mind which might have to run on single localities in the end.
Not quite. The segmented algorithms depend on it.
Fewer operations are needed for the initialization of things in this case. So in the end it's a one-time benefit.
I absolutely agree, even more so as the overall runtime is probably determined by the console IO anyway. Overall, this PR has potential benefits; the question is probably how big the maintenance burden of having it in would be. As I think it wouldn't be too large, I'd like to have that available...
I would like to go ahead and merge this. Are there still any objections?
On Tuesday, 26 July 2016 06:26:44 CEST Hartmut Kaiser wrote:
My main objection is still whether this is really worth it; we add a significant
On 26.07.2016 at 3:49 PM, "Thomas Heller" thom.heller@gmail.com wrote:
The transpose example might be a good candidate to give an indication of
If we want to go ahead with this, we need to extend the test suite to run the local-only case as well.
Here are some results from running transpose_block:
So the difference is not significant, but measurable.
In addition to the posted results, I did a similar run and compared it to master. I ran the following command:
master:
fixing_1591:
fixing_1591: (Adding
In order to increase the load onto AGAS, I changed the number of blocks:
master:
fixing_1591:
fixing_1591: (Added
So the overall result is that adding
The run on fixing_1591 without --hpx:run-locally unfortunately sometimes fails with the following exception:
This PR implements an optimization of AGAS for local-only operation. In this mode (command line option --hpx:run-locally) all networking is disabled and the local virtual addresses of global objects are directly encoded in their global identifiers. This removes a large part of the AGAS overheads introduced by global reference counting and global address resolution.

This fixes #1591
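To illustrate what "directly encoding the local virtual address in the global identifier" could look like, here is a minimal sketch. It is not the code from this PR: the `global_id` struct, the `local_address_flag` bit, and the helper names are hypothetical and do not reflect HPX's actual gid layout. The point is only that an identifier carrying the address can be resolved by a cast instead of a lookup in an address-resolution table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit global id (assumption: not HPX's real gid_type layout).
struct global_id {
    std::uint64_t msb;  // metadata bits
    std::uint64_t lsb;  // payload: here, the local virtual address
};

// Hypothetical flag marking ids whose payload is a local address.
constexpr std::uint64_t local_address_flag = std::uint64_t{1} << 63;

// Build an id that carries the object's address directly.
inline global_id encode_local(void const* p) {
    return global_id{local_address_flag,
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p))};
}

// Check whether an id was produced by encode_local.
inline bool is_local_encoded(global_id const& id) {
    return (id.msb & local_address_flag) != 0;
}

// Resolve the id back to a pointer without any table lookup.
inline void* resolve_local(global_id const& id) {
    assert(is_local_encoded(id));
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(id.lsb));
}
```

In this sketch, resolving an id costs a flag check and a cast, which is the kind of saving the description attributes to local-only mode; the real implementation additionally has to deal with reference counting and id generation, which are omitted here.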