Significant performance mismatch between MPI and HPX in SMP for allgather example #445
Comments
[comment by manderson] [Trac time Fri Jul 6 17:01:05 2012] The performance mismatch becomes even more significant in distributed runs, and it shows no sign of improving as the number of processors increases. All timings are in seconds; startup costs for both codes are excluded, so only the actual allgather communication cost is reported.
[comment by hkaiser] [Trac time Sun Jul 8 20:53:42 2012] The MPI and HPX codes are not comparable. While the MPI version uses MPI_Allgather, which has a complexity of O(N), where N is the number of participants, the algorithm implemented in the HPX example has a complexity of O(N*N) — it even gathers the local values. What needs to be done is to develop a new algorithm specifically targeted towards HPX (or, in general terms, towards message-driven models). Additionally, what's interesting from your numbers is that the 8-worker MPI version runs 20 times slower than the version with 1 worker (which shouldn't do anything, btw), while the HPX example's performance deteriorates only by a factor of 8.
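The complexity argument above can be illustrated with a simple message-count model. This is a hypothetical sketch, not HPX or MPI code: `naive_allgather_messages` models the scheme in the HPX example, where each of the N participants fetches a value from every participant (including itself), while `recursive_doubling_messages` models the classic recursive-doubling allgather, in which each participant sends only log2(N) messages.

```cpp
#include <cassert>
#include <cstddef>

// Naive all-pairs scheme: every one of the N participants requests a
// value from all N participants, giving N*N messages in total.
std::size_t naive_allgather_messages(std::size_t n) {
    return n * n;
}

// Recursive doubling: in each of ceil(log2(N)) rounds every participant
// exchanges its accumulated data with one partner, so the total message
// count is N * ceil(log2(N)).
std::size_t recursive_doubling_messages(std::size_t n) {
    std::size_t rounds = 0;
    while ((std::size_t{1} << rounds) < n) ++rounds;
    return n * rounds;
}
```

For 8 participants this gives 64 messages for the naive scheme versus 24 for recursive doubling, and the gap widens with N.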
[comment by manderson] [Trac time Sun Jul 8 21:35:59 2012] The MPI and HPX codes do the same thing and are comparable:
Removing the local gather has no impact on the reported results. Removing the O(N*N) complexity in the HPX call would remove the ability to extract asynchrony and defeat the purpose of using HPX. It is difficult to say anything bad about the MPI numbers by using the HPX results, since they are orders of magnitude slower. If HPX ran as fast as MPI, would its scaling behavior be the same?
[comment by blelbach] [Trac time Mon Jul 9 14:02:24 2012] In each compute iteration, this code passes the GIDs to all the components as an argument to each future. This is probably significantly affecting performance, as the GIDs end up being split every 8 iterations. These GIDs are never updated throughout the lifetime of the computation, so there's absolutely no need to pass them to each call to compute_async. Instead, they should be copied once into a data member of the allgather component (or some similar approach).
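The suggested pattern can be sketched as follows. This is a hypothetical illustration, not the actual HPX component (the names `allgather_component`, `set_gids`, and `compute` are made up, and plain `int` stands in for `hpx::naming::id_type`): the GID list is stored once in a data member, so later calls carry no GID argument at all.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical component: the GID list is copied in once, up front,
// instead of being shipped as an argument with every compute call.
struct allgather_component {
    std::vector<int> gids_;  // stand-in for a vector of HPX GIDs

    // Called a single time before the computation starts.
    void set_gids(std::vector<int> gids) { gids_ = std::move(gids); }

    // Subsequent calls reuse the cached list; no per-call GID argument.
    std::size_t compute() const { return gids_.size(); }
};
```

With this shape, repeated invocations avoid re-serializing the (unchanging) GID list on every call.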
[comment by manderson] [Trac time Mon Jul 9 14:06:35 2012] There is only one iteration in this example. np is the number of components, and each component needs to receive the gids of all other components in order to use the stubs in the asynchronous allgather. There are no extraneous gids sent as suggested above. Further, this is not the source of the performance slowdown. You can easily verify this (no need to speculate as above) by simply commenting out the gather in compute. The performance is near optimal then.
Is this still an issue? Can someone re-run the aforementioned numbers on top of trunk?
That seems to be resolved now. Here is the message from Matt:
Those GTC numbers were for distributed runs and incorporate many more collectives than allgather. I have re-run the allgather HPX and MPI comparison, and the significant mismatch remains regardless of the good results in comparing GTC as a whole in HPX and MPI. This ticket was closed prematurely.
HPX will always perform worse than MPI with that type of code. It's a matter of the programming model differences between MPI and HPX, not an intrinsic problem of HPX.
[reported by manderson] [Trac time Fri Jul 6 16:40:11 2012]
ea07d6f
Boost 1.48.0
g++ 4.4
OpenMPI 1.4.2 (for MPI equivalent code)
Release Mode
examples/allgather
examples/allgather/mpi_equivalent
MPI Allgather versus HPX Allgather show HPX unexpectedly an order of magnitude slower than MPI in SMP mode for simple allgather operations.
Performance Results (timings in seconds):
tasks   MPI      HPX
1       4.0E-6   1.1E-4
2       1.3E-5   1.9E-4
4       1.4E-5   3.7E-4
8       9.9E-5   8.2E-4
To reproduce: