## Exercise 1 Hello World

1. Write an MPI program displaying the number of processes used for the execution and the rank of each process.
2. Test the programs obtained with different numbers of threads for the parallel program.

**Output Example**
```shell
Hello from the rank 2 process
Hello from the rank 0 process
Hello from the rank 3 process
Hello from the rank 1 process
Parallel execution of hello_world with 4 process
```
*Note that the output order maybe different*

In [1]:
from mpi4py import MPI

In [28]:
%%file hello_mpi.py
from mpi4py import MPI
COMM = MPI.COMM_WORLD
n = COMM.Get_size()

RANK = COMM.Get_rank()
print("hello from the rank {RANK} thread".format(RANK = RANK))

Overwriting hello_mpi.py


In [31]:
#mpirun -n 3 python hello_mpi.py

Results : 

hello from the rank 0 thread

hello from the rank 1 thread

hello from the rank 2 thread



## Exercise 2 Sharing Data


A common need is for one process to get data from the user, either by reading from the terminal or command line arguments, and then to distribute this information to all other processors.

Write a program that reads an integer value from the terminal and distributes the value to all of the MPI processes. Each process should print out its rank and the value it received. Values should be read until a negative integer is given as input.

You may want to use these MPI routines in your solution: Get_rank Bcast

In [2]:
%%file sharing.py
from mpi4py import MPI
import numpy as np
COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank( )
if RANK == 0 :
    sendbuf = int(input())
else :
    sendbuf = None

recvbuf = COMM.bcast(sendbuf , root=0 )
print( "the rank : {RANK} got the value : {recvbuf}".format(RANK=RANK,recvbuf=recvbuf))


Overwriting sharing.py


In [51]:
# enter command for compile and run the program
#mpirun -n 6 python hati.py

Results : 

2

the rank : 2 got the value : 2

the rank : 3 got the value : 2

the rank : 4 got the value : 2

the rank : 5 got the value : 2

the rank : 0 got the value : 2

the rank : 1 got the value : 2

## Exercise 3 Sending in a ring (broadcast by ring)

Write a program that takes data from process zero and sends it to all of the other processes by sending it in a ring. That is, process i should receive the data and send it to process i+1, until the last process is reached.
Assume that the data consists of a single integer. Process zero reads the data from the user.
![](../data/ring.gif)

You may want to use these MPI routines in your solution:
`Send` `Recv` 

In [3]:
%%file ring.py
from mpi4py import MPI
COMM = MPI.COMM_WORLD
SIZE = COMM.Get_size()
RANK = COMM.Get_rank()
#print(RANK)
sendbuf =  1000
tag = 9
if RANK == 0 :
    COMM.send(sendbuf,dest= RANK + 1 , tag=tag )
    print( "Process "+str(RANK)+" got "+str(sendbuf)+"\n")
elif RANK < SIZE - 1 :
    COMM.send(sendbuf,dest= RANK + 1 , tag=tag )
    print( "Process "+str(RANK)+" got "+str(sendbuf)+" from the process : "+str(RANK-1)+"\n")
else :
    COMM.recv( source= RANK-1,tag=tag)
    print( "Process "+str(RANK)+" got "+str(sendbuf)+" from the process : "+str(RANK-1)+"\n")   

Overwriting ring.py


In [None]:
#mpirun -n 6 python ring.py

Results : 

Process 3 got 1000 from the process : 2

Process 4 got 1000 from the process : 3

Process 5 got 1000 from the process : 4

Process 0 got 1000

Process 1 got 1000 from the process : 0

Process 2 got 1000 from the process : 1

## Exercise 4 Matrix vector product

1. Use the `MatrixVectorMult.py` file to implement the MPI version of matrix vector multiplication.
2. Process 0 compares the result with the `dot` product.
3. Plot the scalability of your implementation. 

**Output Example**
```shell
CPU time of parallel multiplication using 2 processes is  174.923446
The error comparing to the dot product is : 1.4210854715202004e-14
```

In [5]:
 %%file MatrixVectorMult_V0.py
 # write your program here
import numpy as np
from scipy.sparse import lil_matrix
from numpy.random import rand, seed

from mpi4py import MPI
COMM = MPI.COMM_WORLD
nbOfproc = COMM.Get_size()
RANK = COMM.Get_rank()

seed(42)

def matrixVectorMult(A, b, x):
    
    row, col = A.shape
    for i in range(row):
        a = A[i]
        for j in range(col):
            x[i] += a[j] * b[j]

    return 0

########################initialize matrix A and vector b ######################
#matrix sizes
SIZE = 1000
#Local_size = 

# counts = block of each proc
#counts = 

if RANK == 0:
    A = lil_matrix((SIZE, SIZE))
    A[0, :100] = rand(100)
    A[1, 100:200] = A[0, :100]
    A.setdiag(rand(SIZE))
    A = A.toarray()
    b = rand(SIZE)
else :
    A = None
    b = None

start = MPI.Wtime()
matrixVectorMult(LocalMatrix, b, LocalX)
stop = MPI.Wtime()
if RANK == 0:
    print("CPU time of parallel multiplication is ", (stop - start)*1000)
if RANK == 0 :
    X_ = A.dot(b)
    print("The result of A*b using dot is :", np.max(X_ - X))
    # print("The result of A*b using parallel version is :", X)
    

Writing MatrixVectorMult_V0.py


In [89]:
 %%file MatrixVectorMult_V0.py
import numpy as np
from scipy.sparse import lil_matrix
from numpy.random import rand, seed

from mpi4py import MPI


''' This program compute parallel csc matrix vector multiplication using mpi '''
def split(a, n):
    k, m = divmod(len(a), n)
    return np.array([a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)])


COMM = MPI.COMM_WORLD
nbOfproc = COMM.Get_size()
RANK = COMM.Get_rank()

#print(nbOfproc)
#print(RANK)
seed(42)

def matrixVectorMult(A, b, x):
    
    row, col = A.shape
    for i in range(row):
        a = A[i]
        for j in range(col):
            x[i] += a[j] * b[j]

    return 0

########################initialize matrix A and vector b ######################
#matrix sizes
SIZE = 1000
Local_size = 500

# counts = block of each proc
counts = 4 

if RANK == 0:
    A = lil_matrix((SIZE, SIZE))
    A[0, :100] = rand(100)
    A[1, 100:200] = A[0, :100]
    A.setdiag(rand(SIZE))
    A = A.toarray(order='C')
    b = rand(SIZE)
    X_ = A.dot(b)
    A = split(A, nbOfproc)
else :
    A = None
    b = None


#########Send b to all procs and scatter A (each proc has its own local matrix#####
recvbuf = COMM.bcast(b , root=0 )
# Scatter the matrix A
LocalMatrix = COMM.scatter(A,root=0)
COMM.Barrier()
#print("I'm the rank : "+str(RANK)+" I got "+str(LocalMatrix))
#####################Compute A*b locally#######################################
LocalX = np.ones((Local_size,1))

start = MPI.Wtime()
#print(RANK)
matrixVectorMult(LocalMatrix, recvbuf, LocalX)
stop = MPI.Wtime()
if RANK == 0:
    print("CPU time of parallel multiplication is ", (stop - start)*1000)
##################Gather te results ###########################################
# sendcouns = local size of result
sendcounts = LocalX.shape
if RANK == 0: 
    X = LocalX
else :
     X = LocalX

# Gather the result into X
X = COMM.gather(X,root=0)
if RANK == 0 :
    print("The result of A*b using dot is :", X_ )
    print("The result of A*b using parallel version is :", X)


Overwriting MatrixVectorMult_V0.py


In [90]:
# mpirun -n 3 python MatrixVectorMult_V0.py

The result for 3 threads is : 

CPU time of parallel multiplication is  1056.589517


The result of A*b using dot is : [2.63781326e+01 2.13071154e+01 2.77823379e-01 4.71827620e-01
 9.02944984e-01 4.33507343e-02 1.62610958e-01 5.72887944e-01
 1.59248241e-01 1.18468930e-02 2.36388833e-01 3.61845979e-02
 2.08082711e-01 4.33939974e-01 3.75570382e-01 5.05522269e-01
 7.35254166e-02 1.63707925e-01 2.37063708e-01 6.98528680e-02
 7.17610888e-01 8.56350993e-01 2.74159578e-01 8.90888020e-02
 ...

The result of A*b using parallel version is : [array([[27.37813265],
       [22.30711543],
       [ 1.27782338],
       [ 1.47182762],
       [ 1.90294498],
       [ 1.04335073],
       [ 1.16261096],
       [ 1.57288794],...

## Exercise 5 Calculation of π (Monte Carlo)

1. Use the `PiMonteCarlo.py` file to implement the calculation of PI using Monte Carlo.
2. Process 0 prints the result.
3. Plot the scalability of your implementation. 

In [99]:
%%file PiMonteCarlo_V0.py
 # write your program here
import random 
import timeit
from mpi4py import MPI
import numpy as np
COMM = MPI.COMM_WORLD
size = COMM.Get_size()
rank= COMM.Get_rank()

INTERVAL= 1000

random.seed(42)  
if rank == 0:
    total = np.zeros(1)
else:
    total = None
def compute_points():  
    
    circle_points= np.zeros(1)

    # Total Random numbers generated= possible x 
    # values* possible y values 
    num_per_rank = INTERVAL**2 // size # the floor division // rounds the result down to the nearest whole number.
    lower_bound = 1 + rank * num_per_rank
    upper_bound = 1 + (rank + 1) * num_per_rank
    print("This is processor ", rank, "and I am executing the loop from", lower_bound," to ", upper_bound - 1, flush=True)
    COMM.Barrier()
    for i in range(lower_bound, upper_bound):
        
        # Randomly generated x and y values from a 
        # uniform distribution 
        # Rannge of x and y values is -1 to 1 
                
        rand_x= random.uniform(-1, 1) 
        rand_y= random.uniform(-1, 1) 
      
        # Distance between (x, y) from the origin 
        origin_dist= rand_x**2 + rand_y**2
      
        # Checking if (x, y) lies inside the circle 
        if origin_dist<= 1: 
            circle_points[0]+= 1
      
        # Estimating value of pi, 
        # pi= 4*(no. of points generated inside the  
        # circle)/ (no. of points generated inside the square) 
    
    COMM.Barrier() 
    # collect the partial results and add to the total sum
    COMM.Reduce(circle_points, total, op=MPI.SUM, root=0)
    
    return total

start = timeit.default_timer()
circle_points = compute_points()
end = timeit.default_timer()


pi = 4* circle_points/ INTERVAL**2 
print("Circle points number :",circle_points)
print("Final Estimation of Pi=", pi, "cpu time :",end-start) 

Overwriting PiMonteCarlo_V0.py


In [None]:
# enter command for compile and run the program

The result with 6 process :

This is processor  3 and I am executing the loop from 499999  to  666664

This is processor  4 and I am executing the loop from 666665  to  833330

This is processor  0 and I am executing the loop from 1  to  166666

This is processor  2 and I am executing the loop from 333333  to  499998

This is processor  1 and I am executing the loop from 166667  to  333332

This is processor  5 and I am executing the loop from 833331  to  999996

Circle points number : [785028.]
Final Estimation of Pi= [3.140112] cpu time : 0.2647160682827234