# Part 3: MPI for Python

MPI is the __Message Passing Interface__ - a standard for parallel programming in which separate processes, each with its own memory allocation, communicate with one another. Because MPI processes do not share memory, they have to pass messages between themselves to coordinate code execution and share data.

MPI is commonly used on distributed-memory systems: machines composed of multiple compute nodes, each with its own physical memory, such as high-performance computers or clusters. Equally, MPI will run just as well on your laptop and is not tied to any particular architecture.

In this mini tutorial we introduce the `mpi4py` Python package, which provides an interface to the MPI libraries, similar to the corresponding C, C++, and Fortran MPI interfaces. 

If you are familiar with the existing C/C++ MPI bindings, you will find the syntax of the Python API very similar and easy to pick up. However, don't worry if you are completely new to MPI - we only introduce the very basics of the standard here as an example of a Python parallel programming interface.

## Following this tutorial

Jupyter notebooks do not work particularly well with `mpi4py`. (It is possible to set them up to work together, but that is beyond the scope of this tutorial.) We will demonstrate a few simple examples in this notebook, but to actually run the code it is easier to copy or type out the examples into a Python script and save it as a `.py` file. You would then execute the Python script using the MPI launcher, called `mpirun` on most systems. For example, to run an MPI Python script called `mpi_pi.py` over 4 processes, we would run:

```
mpirun -np 4 python mpi_pi.py
```

To get started with any MPI script, we would add to our Python code the following import statement:

In [1]:
from mpi4py import MPI

Then a simple test of `mpi4py` would print "Hello world!" or similar from each MPI process:

In [2]:
from mpi4py import MPI

comm = MPI.COMM_WORLD

print("Hello! I'm rank {} from {} running in total...".format(comm.rank, comm.size))

# Wait for everyone to synchronize here:
comm.Barrier()

Hello! I'm rank 0 from 1 running in total...


So what's going on here?

Firstly we set up a `comm` object, which is a communicator that lets us get information about and talk to the other processes. (You will see it in most MPI programs.)

In the print function, we use `comm.rank`, which gives the ID of the current process within the communicator (its 'rank'), and `comm.size`, which gives the total number of running processes in the communicator.

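At the end we call `comm.Barrier()`, which makes every process wait until all of the processes in the communicator have reached that point. It is not strictly needed in a program this small, but it shows how the processes can be synchronised.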

Unfortunately, if you run this in a Jupyter notebook, you will most likely see only rank 0 and a total of 1 process, because the notebook runs Python inside a single process. To see the effect of multiple MPI processes, run the above code as a script from the terminal, as outlined above.

## Broadcasting example

Another example that demonstrates the MPI interface is broadcasting a vector to `n` MPI processes. Broadcasting is a commonly used technique in MPI in which a copy of some data is sent to every process, typically so that each process can perform some further calculation on it.

In this example, we are going to take a 1D numpy array and _broadcast_ it from rank 0 to the other ranks (processes). 

***Note**: Perhaps confusingly MPI and NumPy both use the term _broadcasting_ to mean different things.*
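
For contrast, here is a minimal sketch of what NumPy means by broadcasting: combining arrays of different shapes within a single process, with no communication between processes involved. The array names and values are made up for illustration.

```python
import numpy as np

column = np.array([[0.0], [10.0], [20.0]])   # shape (3, 1)
row = np.array([1.0, 2.0, 3.0])              # shape (3,)

# NumPy "broadcasts" the two operands to a common shape (3, 3)
# before adding them element by element.
result = column + row

print(result.shape)
print(result)
```

MPI broadcasting, by comparison, copies one process's data to all the other processes, which is what the example below does.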

In [3]:
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

print("-"*78)
print(" Running on %d cores" % comm.size)
print("-"*78)

comm.Barrier()

# Prepare a vector of N=5 elements to be broadcast...
N = 5
if comm.rank == 0:
    A = np.arange(N, dtype=np.float64)    # rank 0 has the proper data
else:
    A = np.empty(N, dtype=np.float64)     # all other ranks just allocate an empty buffer

# Broadcast array A from rank 0 (the root) to everybody.
# The list argument contains the array to be broadcast and the corresponding
# MPI data type: MPI.DOUBLE
comm.Bcast([A, MPI.DOUBLE], root=0)

# Everybody should now have the same...
print("[%02d] %s" % (comm.rank, A))

------------------------------------------------------------------------------
 Running on 1 cores
------------------------------------------------------------------------------
[00] [0. 1. 2. 3. 4.]


Note that if you run this in a Jupyter notebook, you will only see `1 cores` reported. This is because the code runs through a standard Python interpreter, rather than being launched with a command like `mpirun`. To see the effect of multiple MPI processes running at once, save the code above in a script called `mpi_broadcast.py` and run it from the command line with:

`mpirun -np 4 python mpi_broadcast.py`

This command launches four copies of the `python` interpreter through the `mpirun` executable, so that `mpi4py` can connect them through MPI and they can run on multiple cores of a multicore CPU.

The output (from the terminal) should now look something like the following, although the order in which the processes print may vary between runs:

```
------------------------------------------------------------------------------
 Running on 4 cores
------------------------------------------------------------------------------
[00] [0. 1. 2. 3. 4.]
------------------------------------------------------------------------------
 Running on 4 cores
------------------------------------------------------------------------------
[01] [0. 1. 2. 3. 4.]
------------------------------------------------------------------------------
 Running on 4 cores
------------------------------------------------------------------------------
[02] [0. 1. 2. 3. 4.]
------------------------------------------------------------------------------
 Running on 4 cores
------------------------------------------------------------------------------
[03] [0. 1. 2. 3. 4.]
```

Note how each process has now received a copy of the same array from the broadcast operation. In a real program, we might then proceed to manipulate this array or perform calculations on it.

## MPI Pi Example
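
Reading the code below, it estimates $\pi$ by numerical integration. The following summary of the underlying formula is inferred from the `comp_pi` function rather than stated in the original text, and the symbols $r$ (for `myrank`) and $p$ (for `nprocs`) are notation introduced here. The starting point is the integral

$$\pi = \int_0^1 \frac{4}{1+x^2}\,dx,$$

which the code approximates with the midpoint rule on $n$ equal intervals of width $h = 1/n$:

$$\pi \approx h \sum_{i=1}^{n} \frac{4}{1+x_i^2}, \qquad x_i = h\left(i - \tfrac{1}{2}\right).$$

Rank $r$ of $p$ processes adds up only the terms $i = r+1,\ r+1+p,\ r+1+2p,\ \dots$, and `comm.Reduce` with `MPI.SUM` then adds the partial sums together on rank 0.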



In [5]:
from mpi4py import MPI
from math   import pi as PI
from numpy  import array

def get_n():
    # Number of intervals used for the integration. It is fixed here so the
    # example runs non-interactively; replace it with int(input(...)) to
    # prompt the user instead.
    return 1000

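def comp_pi(n, myrank=0, nprocs=1):
    # Midpoint-rule estimate of the integral of 4/(1+x^2) over [0, 1].
    # Each rank handles intervals myrank+1, myrank+1+nprocs, ... and
    # returns only its own partial sum.
    h = 1.0 / n
    s = 0.0
    for i in range(myrank + 1, n + 1, nprocs):
        x = h * (i - 0.5)
        s += 4.0 / (1.0 + x**2)
    return s * h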

def prn_pi(pi, PI):
    message = "pi is approximately %.16f, error is %.16f"
    print(message % (pi, abs(pi - PI)))

comm = MPI.COMM_WORLD
nprocs = comm.Get_size()
myrank = comm.Get_rank()

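# Zero-dimensional arrays act as single-value buffers for Bcast/Reduce.
n    = array(0, dtype='i')      # C int, to match MPI.INT
pi   = array(0, dtype=float)    # C double, to match MPI.DOUBLE
mypi = array(0, dtype=float)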

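def main():
    # Rank 0 chooses the number of intervals...
    if myrank == 0:
        n.fill(get_n())
    # ...and broadcasts it to every other rank.
    comm.Bcast([n, MPI.INT], root=0)
    # Each rank computes its own partial sum.
    mypi.fill(comp_pi(n, myrank, nprocs))
    # Add the partial sums from all ranks together on rank 0.
    comm.Reduce([mypi, MPI.DOUBLE], [pi, MPI.DOUBLE],
                op=MPI.SUM, root=0)
    # Only rank 0 holds the final result, so only rank 0 prints it.
    if myrank == 0:
        prn_pi(pi, PI)

main()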

pi is approximately 3.1415927369231227, error is 0.0000000833333296
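
To see how the work is divided without launching MPI, the following serial sketch repeats `comp_pi` from above and loops over the ranks by hand. The value `nprocs = 4` and the helper `rank_indices` are assumptions made for illustration, and the built-in `sum` stands in for `comm.Reduce` with `MPI.SUM`.

```python
from math import pi as PI

def comp_pi(n, myrank=0, nprocs=1):
    # Partial midpoint-rule sum over the intervals owned by one rank.
    h = 1.0 / n
    s = 0.0
    for i in range(myrank + 1, n + 1, nprocs):
        x = h * (i - 0.5)
        s += 4.0 / (1.0 + x**2)
    return s * h

def rank_indices(n, myrank, nprocs):
    # Hypothetical helper: the interval numbers that a given rank handles.
    return range(myrank + 1, n + 1, nprocs)

n = 1000
nprocs = 4  # assumed number of MPI processes

# Show the first few intervals assigned to each rank.
for rank in range(nprocs):
    print("rank", rank, "starts with intervals", list(rank_indices(n, rank, nprocs))[:3])

# Each "rank" computes a partial sum; adding them mimics the Reduce step.
partials = [comp_pi(n, rank, nprocs) for rank in range(nprocs)]
total = sum(partials)

print("combined estimate:", total)
print("error:", abs(total - PI))
```

The intervals are dealt out to the ranks in round-robin fashion, so each rank does roughly the same amount of work and every interval is counted exactly once.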


## Summary and taking it further

**mpi4py** is a Python interface to MPI, the Message Passing Interface: a powerful standard for message-passing parallel programming that is commonly used in distributed-memory parallel programming.

In _Part 1: Multiprocessing_, you may recall we discussed the `multiprocessing` module, another Python module that creates separate processes and can be used to distribute parallel tasks. Multiprocessing is limited to creating OS-level processes in a shared-memory computing environment, and (to my knowledge) is not easy to apply across multi-node cluster computers, something which MPI (and by extension `mpi4py`) was specifically designed to do.

So if your problem size requires the resources of larger compute clusters, `mpi4py` may be the more appropriate choice, though the learning curve is inevitably steeper due to requiring knowledge of the MPI standard to get the most out of the Python implementation.

### Useful resources for taking it further

Introductory MPI texts and tutorials are just as useful for learning about MPI in greater depth. The current documentation for mpi4py, although good, does tend to assume some knowledge of the concepts behind message passing and distributed-memory programming. Therefore, starting with a basic MPI tutorial may be a good first move. For example, http://mpitutorial.com/tutorials/ is a nice set of introductory tutorials.

MPI for Python documentation: https://mpi4py.readthedocs.io/en/stable/tutorial.html

Excellent Python MPI tutorial (goes into more depth): https://nyu-cds.github.io/python-mpi/

