# GOAL OF LABORATORY WORK
The goal of this task was to show how MPI_Send, MPI_Ssend, MPI_Bsend, and
MPI_Rsend work and to analyze the performance of each of them.

# TASK DEFINITION
We develop a simple application that sends a few bytes of data from one process to
another.

# BRIEF THEORY

The Open MPI Project is an open source Message Passing Interface implementation that is
developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

MPI has a number of different "send modes." These represent different choices of buffering (where is the data kept until it is received) and synchronization (when does a send complete). In the following, I use "send buffer" for the user-provided buffer to send. 

- MPI_Send
  MPI_Send will not return until you can use the send buffer. It may or may not block (it is allowed to buffer, either on the sender or receiver side, or to wait for the matching receive).
- MPI_Bsend 
  May buffer; returns immediately and you can use the send buffer. A late add-on to the MPI specification. Should be used only when absolutely necessary. 
- MPI_Ssend
  Will not return until the matching receive has been posted.
- MPI_Rsend
  May be used ONLY if the matching receive has already been posted. The user is responsible for writing a correct program.
- MPI_Isend
  Nonblocking send, but not necessarily asynchronous. You can NOT reuse the send buffer until either a successful wait/test completes or you KNOW that the message has been received (see MPI_Request_free). Note also that while the I refers to immediate, there is no performance requirement on MPI_Isend. An immediate send must return to the user without requiring a matching receive at the destination. An implementation is free to send the data to the destination before returning, as long as the send call does not block waiting for a matching receive. Different strategies of when to send the data offer different performance advantages and disadvantages that will depend on the application.
- MPI_Ibsend
  Buffered nonblocking.
- MPI_Issend
  Synchronous nonblocking. Note that a Wait/Test will complete only when the matching receive is posted.
- MPI_Irsend
  As with MPI_Rsend, but nonblocking.

Note that "nonblocking" refers ONLY to whether the data buffer is available for reuse after the call. No part of the MPI specification, for example, mandates concurrent operation of data transfers and computation.
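
The difference between "returns once the data is buffered" and "returns once the matching receive is posted" can be made visible with a toy model. The sketch below is plain Python threads and a queue, not MPI: `ToyChannel` and `time_send` are hypothetical names invented for this illustration, and the model ignores details such as the user-attached buffer that a real MPI_Bsend requires. It only mimics the completion rule of the buffered and synchronous modes described above.

```python
import queue
import threading
import time


class ToyChannel:
    """Toy one-message link between two threads. This is NOT MPI."""

    def __init__(self):
        self._buffer = queue.Queue()              # stands in for message buffering
        self._receive_posted = threading.Event()  # set when the receiver calls recv()

    def bsend(self, data):
        # Buffered mode: copy the data into the buffer and return at once.
        self._buffer.put(bytes(data))

    def ssend(self, data):
        # Synchronous mode: hand over the data, then wait for the matching receive.
        self._buffer.put(bytes(data))
        self._receive_posted.wait()

    def recv(self):
        # Posting the receive releases a sender blocked in ssend().
        self._receive_posted.set()
        return self._buffer.get()


def time_send(mode, receiver_delay):
    """Return how long one send call blocks when the receiver posts late."""
    channel = ToyChannel()

    def receiver():
        time.sleep(receiver_delay)  # the receive is posted only after this delay
        channel.recv()

    thread = threading.Thread(target=receiver)
    thread.start()
    start = time.perf_counter()
    getattr(channel, mode)(b"hello")
    elapsed = time.perf_counter() - start
    thread.join()
    return elapsed


if __name__ == "__main__":
    print(f"bsend blocked for {time_send('bsend', 0.2):.3f} s")
    print(f"ssend blocked for {time_send('ssend', 0.2):.3f} s")
```

In this model the buffered call returns almost immediately, while the synchronous call blocks for roughly the receiver's delay. A real MPI_Send may behave like either one, depending on the implementation and the message size.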

# RESULT AND EXPERIMENTS

In [1]:
import subprocess


def compile(*defs, **defskw):
    # Positional names become -DNAME, keyword pairs become -DNAME=VALUE.
    # Note: this helper shadows Python's built-in compile() in the notebook.
    args = [f"-D{k}" for k in defs] + [f"-D{k}={v}" for k, v in defskw.items()]
    _cmd = 'mpicc -o hello hello.c'.split() + args
    print(' '.join(_cmd))
    cmd = subprocess.run(_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # Surface any compiler output so that build failures are visible.
    if cmd.stdout: print('cmd.stdout', cmd.stdout)
    if cmd.stderr: print('cmd.stderr', cmd.stderr)


def run(env=None):
    # Launch the compiled binary on two MPI processes; only stderr is reported.
    cmd = subprocess.run('mpiexec -np 2 ./hello'.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE, env=env)
    if cmd.stderr: print('cmd.stderr', cmd.stderr)
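
To check which `-D` flags a given call produces without invoking `mpicc`, the flag construction can be pulled out into a pure function. This is a minimal sketch: `build_compile_cmd` and its `source` and `output` parameters are hypothetical names for this illustration, and the flag logic is assumed to mirror the `compile` helper above.

```python
def build_compile_cmd(*defs, source="hello.c", output="hello", **defskw):
    """Return the mpicc argument list for the given preprocessor definitions."""
    # Bare names become -DNAME (a flag such as ASYNC).
    flags = [f"-D{name}" for name in defs]
    # Keyword pairs become -DNAME=VALUE (for example MSG_LEN=1000).
    flags += [f"-D{name}={value}" for name, value in defskw.items()]
    return ["mpicc", "-o", output, source] + flags


if __name__ == "__main__":
    print(" ".join(build_compile_cmd("ASYNC", MSG_LEN=10)))
    print(" ".join(build_compile_cmd(MSG_LEN=1000, SEND_FN="MPI_Ssend")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues.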
    


## Message length

Let's analyze the effect the message length has on the performance of the application.

In [6]:

for i in range(8):
    msg_len = 10**i  # MSG_LEN is a C macro, so keep the value in a Python variable
    compile(MSG_LEN=msg_len)
    print(f"Using message length {msg_len}")
    %timeit run()
    print()


mpicc -o hello hello.c -DMSG_LEN=32
12 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=61
12 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=128
12.2 ms ± 604 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=256
12 ms ± 50.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=512
16.5 ms ± 4.27 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=1024
20.4 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
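
Because `%timeit` mixes units (ms for the mean, µs or ms for the deviation), comparing runs by eye is error-prone. The sketch below converts a summary line to milliseconds. `parse_timeit` is a hypothetical helper written for this report, and the regular expression assumes the summary format shown in the output above.

```python
import re

# Conversion factors from the units %timeit prints to milliseconds.
UNIT_TO_MS = {"ns": 1e-6, "µs": 1e-3, "us": 1e-3, "ms": 1.0, "s": 1e3}

# Matches e.g. "12 ms ± 154 µs per loop".
TIMEIT_RE = re.compile(r"([\d.]+) (ns|µs|us|ms|s) ± ([\d.]+) (ns|µs|us|ms|s) per loop")


def parse_timeit(line):
    """Return (mean_ms, std_ms) for one %timeit summary line, or None."""
    match = TIMEIT_RE.search(line)
    if match is None:
        return None  # e.g. an echoed mpicc command line
    mean, mean_unit, std, std_unit = match.groups()
    return float(mean) * UNIT_TO_MS[mean_unit], float(std) * UNIT_TO_MS[std_unit]


if __name__ == "__main__":
    # Two summary lines copied from the output above.
    samples = {
        256: "12 ms ± 50.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)",
        1024: "20.4 ms ± 3.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)",
    }
    for msg_len, line in samples.items():
        mean_ms, std_ms = parse_timeit(line)
        print(f"MSG_LEN={msg_len}: {mean_ms:.2f} ms ± {std_ms:.2f} ms")
```

With both values in one unit, it is easier to judge whether a difference between two runs exceeds the reported standard deviation.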


## MPI send method

Let's analyze the effect the send method has on the performance of the application.

In [25]:
for snd in 'MPI_Rsend MPI_Ssend MPI_Send'.split():
    print('---')
    for i in range(8):
        # SEND_FN selects the send routine that hello.c is compiled with
        compile(MSG_LEN=10**i, SEND_FN=snd)
        %timeit run()


---
mpicc -o hello hello.c -DMSG_LEN=1 -DSEND_FN=MPI_Rsend
13.6 ms ± 1.46 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=10 -DSEND_FN=MPI_Rsend
12.4 ms ± 834 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
mpicc -o hello hello.c -DMSG_LEN=100 -DSEND_FN=MPI_Rsend
12.2 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=1000 -DSEND_FN=MPI_Rsend
12.7 ms ± 731 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=10000 -DSEND_FN=MPI_Rsend
14.5 ms ± 705 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
mpicc -o hello hello.c -DMSG_LEN=100000 -DSEND_FN=MPI_Rsend
25.4 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
mpicc -o hello hello.c -DMSG_LEN=1000000 -DSEND_FN=MPI_Rsend
156 ms ± 14.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
mpicc -o hello hello.c -DMSG_LEN=10000000 -DSEND_FN=MPI_Rsend
153 ms ± 6.14 ms per loop (mean 

In [27]:

for snd in 'ASYNC SEND_RECV'.split():
    print('---')
    for i in range(5):
        compile(snd, MSG_LEN=10**i)
        %timeit run()

---
mpicc -o hello hello.c -DASYNC -DMSG_LEN=1


KeyboardInterrupt: 

# CONCLUSION

The best performance is likely when the program can be written to use just
MPI_Ssend for larger data, because in that case MPI_Ssend can completely avoid
buffering the data. For smaller data MPI_Send performs better, since it allows
the MPI implementation the maximum flexibility in choosing how to deliver your
data. Use MPI_Bsend only when it is too inconvenient to use MPI_Isend, as
MPI_Bsend returns the send buffer to the caller immediately. The remaining
routines, MPI_Rsend, MPI_Issend, etc., are rarely used but may be of value in
writing system-dependent message-passing code entirely within MPI.