## Exercise 1 Hello World

1. Write an MPI program displaying the number of processes used for the execution and the rank of each process.
2. Test the programs obtained with different numbers of threads for the parallel program.

**Output Example**
```shell
Hello from the rank 2 process
Hello from the rank 0 process
Hello from the rank 3 process
Hello from the rank 1 process
Parallel execution of hello_world with 4 process
```
*Note that the output order maybe different*

In [1]:
# !pip install mpi4py

In [2]:
%%file hello.py
from mpi4py import MPI
COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
SIZE = COMM.Get_size()

print()
print("Hello from the rank {rank} process".format(rank=RANK))
COMM.Barrier()
if RANK == SIZE-1:
    print("Parallel execution of hello_world with {} process".format(SIZE))

Overwriting hello.py


In [3]:
# enter command for compile and run the program
! mpirun -n 4 python hello.py

Invalid MIT-MAGIC-COOKIE-1 key
Hello from the rank 2 process

Hello from the rank 3 process

Hello from the rank 0 process

Hello from the rank 1 process
Parallel execution of hello_world with 4 process


## Exercise 2 Sharing Data 

A common need is for one process to get data from the user, either by reading from the terminal or command line arguments, and then to distribute this information to all other processors.

Write a program that reads an integer value from the terminal and distributes the value to all of the MPI processes. Each process should print out its rank and the value it received. Values should be read until a negative integer is given as input.

You may want to use these MPI routines in your solution:
`Get_rank` `Bcast` 

**Output Example**
```shell
10
Process 0 got 10
Process 1 got 10
```

In [4]:
%%file sharing.py
from mpi4py import MPI
COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()

recv = 1

while recv > 0:
    if RANK == 0 :
        send = int(input()) 
    else: 
        send = None
    
    recv = COMM.bcast(send, root=0)
    print("Process {rank} got {recv} ".format(rank=RANK, recv=recv))

Overwriting sharing.py


In [6]:
# enter command for compile and run the program
# ! mpirun -n 2  python3 sharing.py

## Exercise 3 Sending in a ring (broadcast by ring)

Write a program that takes data from process zero and sends it to all of the other processes by sending it in a ring. That is, process i should receive the data and send it to process i+1, until the last process is reached.
Assume that the data consists of a single integer. Process zero reads the data from the user.
![](../data/ring.gif)

You may want to use these MPI routines in your solution:
`Send` `Recv` 

In [7]:
%%file ring.py
from mpi4py import MPI
COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
SIZE = COMM.Get_size()

if RANK == 0 :
    print()
    sendbuf = int(input()) 
    COMM.send(sendbuf, dest = RANK+1)

if RANK != 0:
    recvbuf = COMM.recv(source=RANK-1)
    print("Process : ", RANK, "obtain ",recvbuf, "from ",RANK-1)

    if RANK < SIZE - 1:
        COMM.send(recvbuf, dest = RANK+1)

Overwriting ring.py


In [8]:
# enter command for compile and run the program
# ! mpirun -n 4  python ring.py

## Exercise 4 Matrix vector product

1. Use the `MatrixVectorMult.py` file to implement the MPI version of matrix vector multiplication.
2. Process 0 compares the result with the `dot` product.
3. Plot the scalability of your implementation. 

**Output Example**
```shell
CPU time of parallel multiplication using 2 processes is  174.923446
The error comparing to the dot product is : 1.4210854715202004e-14
```

In [7]:
%%file MatrixVectorMult_V0.py
import time 
import numpy as np
from scipy.sparse import lil_matrix
from numpy.random import rand, seed
from numba import njit
from mpi4py import MPI

COMM = MPI.COMM_WORLD
nbproc = COMM.Get_size()
RANK = COMM.Get_rank()

seed(42)

# function générique pour le produit matrice vecteur
def matVectMult(A, b, C):
    
    row, col = A.shape
    for i in range(row):
        a = A[i]
        for j in range(col):
            C[i] += a[j] * b[j]

    return 0


# créons la matrice A et le vecteur b
SIZE = 1000
local_size = SIZE // nbproc


# counts est la liste contenant les blocks de chaque  proc
proc_block = local_size * SIZE
counts =  [proc_block for i in range(nbproc)]

if RANK == 0:
    A = lil_matrix((SIZE, SIZE))
    A[0, :100] = rand(100)
    A[1, 100:200] = A[0, :100]
    A.setdiag(rand(SIZE))
    A = A.toarray()
    b = rand(SIZE)
else :
    A = None
    b = None

## on envoie une copie de b sur chaque proc et on distribue une partie de  A 
## à chaque processus

localMatrix = np.empty((local_size, SIZE), dtype = np.float64)
b = COMM.bcast(b, root = 0)

COMM.Scatterv([A, counts, MPI.DOUBLE], localMatrix, root = 0)

## Au niveau de chaque processus on fait un produit entre la matrice locale et b
localC = np.zeros(local_size)
start = MPI.Wtime()
matVectMult(localMatrix, b, localC)
stop = MPI.Wtime()
if RANK == 0:
    print("\n\n CPU time of parallel multiplication using", nbproc,"processes is ", (stop - start)*1000)


## On rassemble maintenant les résultats obtenus au niveau de chaque processus

sendcounts = [local_size for i in range(nbproc)] 
if RANK == 0: 
    C = np.empty(SIZE, dtype = np.float64)
else :
    C = None

# rassemblement des résultats dans C
COMM.Gatherv(localC,[C, sendcounts, MPI.DOUBLE], root = 0)

if RANK == 0 :
    C_ = A.dot(b)
    print("The error comparing to the dot product is :", np.max(C_ - C))

Overwriting MatrixVectorMult_V0.py


In [9]:
# enter command for compile and run the program
!mpirun -n 2 python3 MatrixVectorMult_V0.py

Invalid MIT-MAGIC-COOKIE-1 key

 CPU time of parallel multiplication using 2 processes is  174.909336
The error comparing to the dot product is : 1.4210854715202004e-14


## Exercise 5 Calculation of π (Monte Carlo)

1. Use the `PiMonteCarlo.py` file to implement the calculation of PI using Monte Carlo.
2. Process 0 prints the result.
3. Plot the scalability of your implementation. 

In [10]:
%%file PiMonteCarlo_V0.py
# write your program here
import random 
import timeit
from mpi4py import MPI


COMM = MPI.COMM_WORLD
nbproc = COMM.Get_size()
RANK = COMM.Get_rank()

INTERVAL = 1000 ** 2

local_int = INTERVAL //nbproc 
random.seed(42)  

def gen_points():
     
    
    nbpoints= 0

    
    for i in range(local_int): 
      
        # on choisit de générer les points x et y 
        # suivant une loi uniforme sur [0,2]
                
        x= random.uniform(0, 2) 
        y= random.uniform(0, 2) 
      
        # Distance entre chaque point (x, y) et le centre du cercle O(1,1)
        dist_centre= (x - 1)**2 + (y - 1)**2
      
        # on vérifie si (x, y) est à l'intérieur du cercle 
        if dist_centre<= 1: 
            nbpoints+= 1
    
    return nbpoints


start = timeit.default_timer()
nb_points = gen_points()
end = timeit.default_timer()

# on fait la somme de tous les points obtenus au niveau de chaque processus
nb_points = COMM.reduce(nb_points, op = MPI.SUM, root = 0)
if RANK == 0:
    
    pi = 4 * nb_points/ INTERVAL
    print('\n')
    print("Circle points number :",nb_points)
    print("Final Estimation of Pi=", pi, "cpu time :",(end-start) * 1000) 


Overwriting PiMonteCarlo_V0.py


In [11]:
# enter command for compile and run the program
!mpirun -n 2 python3 PiMonteCarlo_V0.py

Invalid MIT-MAGIC-COOKIE-1 key

Circle points number : 785596
Final Estimation of Pi= 3.142384 cpu time : 319.21276000502985
