Conversation

@artpol84 artpol84 commented Jun 21, 2019

Status

Details:

  • Open MPI was used as the MPI implementation.
  • The implementation was tested with the following simple application: [link].
  • The test application will be updated to increase coverage.

Test no. 1: Multi-threaded point-to-point

  • cmdline: mpirun -np 2 -x LD_PRELOAD ./mt_test1 -m -t 2 -n 1000 -v p2p-sb-rb
    • -m enables MPI_THREAD_MULTIPLE
    • -t 2 sets the number of threads per rank to 2
    • -n 1000 sets the number of sends/receives each thread performs
    • -v p2p-sb-rb selects the test type: each thread of rank 0 performs 1000 blocking sends (using the integer thread id as the tag), and rank 1 performs the matching receives (a sketch of this pattern is shown below).
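
For reference, here is a minimal sketch of this communication pattern. It is not the actual mt_test1 source (linked above), only an approximation of the p2p-sb-rb mode under the stated assumptions: 2 ranks, 2 threads per rank, 1000 iterations per thread, the integer thread id used as the message tag, and MPI_THREAD_MULTIPLE requested.

/*
 * Illustrative sketch only -- not the actual mt_test1 application.
 * Each thread on rank 0 performs NITER blocking MPI_Send calls tagged
 * with its thread id; each thread on rank 1 posts the matching MPI_Recv
 * calls. Intended to be run with exactly 2 ranks and requires
 * MPI_THREAD_MULTIPLE support.
 */
#include <mpi.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTHREADS 2
#define NITER    1000

static int my_rank;

static void *worker(void *arg)
{
    int tid = (int)(intptr_t)arg;
    int buf = 0;

    for (int i = 0; i < NITER; i++) {
        if (my_rank == 0) {
            buf = i;
            MPI_Send(&buf, 1, MPI_INT, 1, tid, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&buf, 1, MPI_INT, 0, tid, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    pthread_t threads[NTHREADS];

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE is not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)(intptr_t)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);

    MPI_Finalize();
    return 0;
}

Built with mpicc -pthread and run with mpirun -np 2, this produces 2000 Send and 2000 Recv invocations in total, which is what the aggregate counts below should report.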

Results (mpiP/master)

---------------------------------------------------------------------------
@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call                 Site       Time    App%    MPI%      Count    COV
Recv                    2       10.6     inf   71.18        876   0.00
Send                    1        4.3     inf   28.82        243   0.00

The number of Send and Recv invocations should be 2000 each (1000 iterations * 2 threads),
and the application time is calculated incorrectly (App% is reported as inf).

Results (mpiP/PR#13)

---------------------------------------------------------------------------
@--- Aggregate Time (top twenty, descending, milliseconds) ----------------
---------------------------------------------------------------------------
Call                 Site       Time    App%    MPI%      Count    COV
Recv                    2       78.2   71.26   54.53       2000   0.00
Send                    1       65.2   59.41   45.47       2000   0.00

The number of Send and Recv invocations is correct, and the application percentage is no longer broken.

@artpol84

I performed the initial performance measurements.
This implementation suffers from unwind overhead: it looks like the DWARF library (used by libunwind in my case) has internal pthread locks protecting some of its caches.

@artpol84

After reducing the number of TLS accesses (c2c1ccf), the performance of profile-free MPI seems to be comparable to the case where mpiP is running with -k 0 (no callsite resolution).
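
For context, a minimal sketch of the kind of TLS-access reduction described here; this is hypothetical, not the actual c2c1ccf change, and all names are illustrative rather than real mpiP symbols. The idea is to read the per-thread statistics pointer from TLS once per wrapper entry and pass it down explicitly, instead of re-looking it up in every helper.

/*
 * Hypothetical sketch of reducing TLS accesses; names do not correspond
 * to real mpiP symbols. The thread-local pointer is fetched once per
 * wrapper call and handed to lower-level helpers as an argument.
 */
#include <stdlib.h>

typedef struct {
    long   send_count;
    double send_time;
} tls_stats_t;

static __thread tls_stats_t *tls_cache = NULL;

static tls_stats_t *tls_get(void)
{
    /* Single TLS read on the fast path; allocate lazily on first use. */
    if (tls_cache == NULL)
        tls_cache = calloc(1, sizeof(*tls_cache));
    return tls_cache;
}

/* Helpers take the pointer instead of touching TLS again. */
static void record_send(tls_stats_t *tls, double elapsed)
{
    tls->send_count++;
    tls->send_time += elapsed;
}

/* Wrapper-style caller: one TLS access for the whole call. */
static void on_send_event(double elapsed)
{
    tls_stats_t *tls = tls_get();
    if (tls != NULL)
        record_send(tls, elapsed);
}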

@artpol84
Copy link
Contributor Author

I am now working through the test suite that comes with mpiP to make sure it is not broken.
I am seeing some issues and will update once they are fixed.

@artpol84 artpol84 force-pushed the topic/mt branch 6 times, most recently from 252b093 to 8d9ec1b on June 27, 2019 at 23:28
@artpol84

All issues are cleared now.

1. Add an arch directory with the required atomics implementations (for ARM64,
PPC, and x86_64). TODO: add other architectures later.
2. Add a thread-safe list that will be used for tracking TLS objects (see the
illustrative sketch after this commit message).
3. Update Makefile.in to build the thread-safe list.
4. Add stubs for mpiP threads support.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
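
As an illustration of item 2 above, here is a minimal sketch of a CAS-based thread-safe list that could be used to track per-thread TLS objects. It is an assumption based on the commit description, not the actual mpiP code: it uses C11 atomics rather than the per-architecture atomics added in the arch directory, and all names are hypothetical.

/*
 * Hypothetical lock-free singly linked list for tracking per-thread TLS
 * objects. Uses C11 atomics in place of the hand-written per-arch
 * atomics; names are illustrative, not mpiP symbols.
 */
#include <stdatomic.h>
#include <stdlib.h>

typedef struct tls_node {
    void            *tls_obj;   /* per-thread statistics object */
    struct tls_node *next;
} tls_node_t;

typedef struct {
    _Atomic(tls_node_t *) head;
} tls_list_t;

static void tls_list_init(tls_list_t *list)
{
    atomic_store(&list->head, NULL);
}

/* Push-front with a CAS loop; safe to call concurrently from many threads. */
static int tls_list_add(tls_list_t *list, void *tls_obj)
{
    tls_node_t *node = malloc(sizeof(*node));
    if (node == NULL)
        return -1;
    node->tls_obj = tls_obj;
    node->next = atomic_load(&list->head);
    /* On failure, the CAS reloads the current head into node->next. */
    while (!atomic_compare_exchange_weak(&list->head, &node->next, node))
        ;
    return 0;
}

/* Walk the list, e.g. at report time, to aggregate per-thread statistics. */
static void tls_list_foreach(tls_list_t *list, void (*fn)(void *))
{
    for (tls_node_t *n = atomic_load(&list->head); n != NULL; n = n->next)
        fn(n->tls_obj);
}

A push-only list is typically sufficient for this purpose, since entries are added once per thread and only traversed when the report is generated.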
Introduce a new "mt" (multi-threaded) layer for statistics to abstract the
multi-threaded isolation logic from the upper layers as well as from the
single-threaded statistics collection.

NOTE: this commit only introduces the layer; internally it performs a 1-to-1
bypass to the single-threaded layer.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
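
A minimal sketch of what such a 1-to-1 bypass could look like; this is an assumption for illustration only, and the mpiPi_mt_* / mpiPi_st_* names are hypothetical, not the actual mpiP symbols.

/*
 * Hypothetical sketch of the "mt" indirection: upper layers call the
 * mt-layer entry point, which in this initial commit simply forwards
 * 1-to-1 to the single-threaded statistics routine.
 */
typedef struct {
    long   count;
    double time;
} mpiPi_st_stats_t;            /* existing single-threaded state */

typedef struct {
    mpiPi_st_stats_t st;       /* for now: exactly one single-threaded context */
} mpiPi_mt_stats_t;

/* Existing single-threaded collection routine (assumed). */
static void mpiPi_st_update(mpiPi_st_stats_t *st, double elapsed)
{
    st->count++;
    st->time += elapsed;
}

/* New mt-layer entry point: later it will resolve a per-thread context;
 * for now it just bypasses to the single-threaded layer. */
static void mpiPi_mt_update(mpiPi_mt_stats_t *mt, double elapsed)
{
    mpiPi_st_update(&mt->st, elapsed);
}

Keeping the mt layer as the only entry point would let per-thread isolation be added later behind mpiPi_mt_update without touching the upper layers.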
The nested calls flag is located at the single-thread level.
Move the check to the same level as the flag.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
This functionality may be used on different layers.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
@artpol84

@cchambreau, rebased
