## Exercise 1 Hello World

1. Write an MPI program displaying the number of processes used for the execution and the rank of each process.
2. Test the program with different numbers of processes.

**Output Example**
```shell
Hello from the rank 2 process
Hello from the rank 0 process
Hello from the rank 3 process
Hello from the rank 1 process
Parallel execution of hello_world with 4 process
```
*Note that the output order may be different.*

In [1]:
%%file hello.py
# hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # rank of this process
N = comm.Get_size()     # total number of processes

print("Hello from the rank", rank, "process")
# Wait until every process has reached this point before printing the summary
comm.Barrier()
if rank == 0:
    print("Parallel execution of hello_world with", N, "process")

Overwriting hello.py


In [7]:
!mpirun -n 2 python hello.py

## Exercise 2 Sharing Data 

A common need is for one process to get data from the user, either by reading from the terminal or from command-line arguments, and then to distribute this information to all other processes.

Write a program that reads an integer value from the terminal and distributes the value to all of the MPI processes. Each process should print out its rank and the value it received. Values should be read until a negative integer is given as input.

You may want to use these MPI routines in your solution:
`Get_rank` `Bcast` 

**Output Example**
```shell
10
Process 0 got 10
Process 1 got 10
```

In [24]:
%%file sharing.py
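# write your program here
from mpi4py import MPI

comm = MPI.COMM_WORLD
RANK = comm.Get_rank()

while True:
    # Only process 0 reads from the terminal; the others pass a placeholder
    if RANK == 0:
        sendbuf = int(input("Give an integer: "))
    else:
        sendbuf = None

    # bcast returns the value of process 0 on every process
    recvbuf = comm.bcast(sendbuf, root=0)
    print("Process", RANK, "got", recvbuf)
    # A negative value ends the loop on all processes
    if recvbuf < 0:
        break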

Writing sharing.py


In [28]:
# enter the command to run the program
!mpirun -n 2 python sharing.py

## Exercise 3 Sending in a ring (broadcast by ring)

Write a program that takes data from process zero and sends it to all of the other processes by sending it in a ring. That is, process i should receive the data and send it to process i+1, until the last process is reached.
Assume that the data consists of a single integer. Process zero reads the data from the user.
![](../data/ring.gif)

You may want to use these MPI routines in your solution:
`Send` `Recv` 
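
The ring pattern described above can be sketched in plain Python before any MPI call is involved. The helper `ring_neighbours` is hypothetical (it is not part of mpi4py); it only works out which process a given rank would receive from and send to, assuming an open ring that starts at process 0 and stops at the last process.

```python
def ring_neighbours(rank, size):
    """Return (source, dest) for an open ring 0 -> 1 -> ... -> size-1.

    None means "no neighbour on that side": process 0 has no source
    and the last process has no destination.
    """
    source = rank - 1 if rank > 0 else None
    dest = rank + 1 if rank < size - 1 else None
    return source, dest


# Show the pattern for 4 processes
for r in range(4):
    print(r, ring_neighbours(r, 4))
```

In an MPI program each process would use its own rank here, receive only if `source` is not `None`, and send only if `dest` is not `None`.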

In [8]:
%%file send_recv.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
RANK = comm.Get_rank()
N = comm.Get_size()
tag = 0

if RANK == 0:
    # Process 0 reads the value and starts the ring
    data = int(input("Give an integer: "))
    if N > 1:
        comm.send(data, dest=RANK + 1, tag=tag)
else:
    # Every other process receives from its left neighbour
    data = comm.recv(source=RANK - 1, tag=tag)
    print("I am process", RANK, "Received", data, "from process", RANK - 1)
    # Forward the value unless this is the last process of the ring
    if RANK != N - 1:
        comm.send(data, dest=RANK + 1, tag=tag)

Writing send_recv.py


In [None]:
# enter the command to run the program
!mpirun -n 2 python3 send_recv.py

## Exercise 4 Matrix vector product

1. Use the `MatrixVectorMult.py` file to implement the MPI version of matrix vector multiplication.
2. Process 0 compares the result with the `dot` product.
3. Plot the scalability of your implementation. 
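
When the number of rows is not a multiple of the number of processes, an even split leaves some rows over. The sketch below shows one possible way to compute per-process element counts and offsets for a row-major matrix, in the form that a `Scatterv`-style call takes. The helper name `block_counts` and its parameters are assumptions made for this illustration, not part of `MatrixVectorMult.py`.

```python
def block_counts(n_rows, n_cols, n_procs):
    """Split n_rows rows as evenly as possible among n_procs processes.

    Returns (counts, displs), both expressed in matrix elements:
    counts[p] is how many elements process p gets, displs[p] is the
    offset of its first element in the flattened row-major matrix.
    """
    base, extra = divmod(n_rows, n_procs)
    # The first `extra` processes take one additional row
    rows = [base + 1 if p < extra else base for p in range(n_procs)]
    counts = [r * n_cols for r in rows]
    displs = [sum(counts[:p]) for p in range(n_procs)]
    return counts, displs


counts, displs = block_counts(10, 4, 3)
print(counts)  # [16, 12, 12]
print(displs)  # [0, 16, 28]
```

With 10 rows of 4 elements on 3 processes, the first process gets 4 rows and the other two get 3 rows each.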

**Output Example**
```shell
CPU time of parallel multiplication using 2 processes is  174.923446
The error comparing to the dot product is : 1.4210854715202004e-14
```

In [29]:
%%file MatrixVectorMult_V0.py
# write your program here
import numpy as np
from scipy.sparse import lil_matrix
from numpy.random import rand, seed
from mpi4py import MPI


''' This program computes a parallel dense matrix-vector multiplication using MPI '''

COMM = MPI.COMM_WORLD
nbOfproc = COMM.Get_size()
RANK = COMM.Get_rank()

seed(42)

def matrixVectorMult(A, b, x):
    
    row, col = A.shape
    for i in range(row):
        a = A[i]
        for j in range(col):
            x[i] += a[j] * b[j]

    return 0

########################initialize matrix A and vector b ######################
# matrix sizes (SIZE is assumed to be a multiple of the number of processes)
SIZE = 1000
Local_size = SIZE // nbOfproc

# counts = number of matrix elements sent to each process
proc_block = Local_size * SIZE
counts = [proc_block for i in range(nbOfproc)]

if RANK == 0:
    A = lil_matrix((SIZE, SIZE))
    A[0, :100] = rand(100)
    A[1, 100:200] = A[0, :100]
    A.setdiag(rand(SIZE))
    A = A.toarray()
    b = rand(SIZE)
else :
    A = None
    b = None



#########Send b to all procs and scatter A (each proc has its own local matrix)#####
# Broadcast b so that every process holds the full vector
b = COMM.bcast(b, root = 0)
LocalMatrix = np.zeros((Local_size, SIZE), dtype = np.float64)
# Scatter the matrix A
COMM.Scatterv([A, counts, MPI.DOUBLE], LocalMatrix, root = 0)

#####################Compute A*b locally#######################################
LocalX = np.zeros(Local_size) 

start = MPI.Wtime()
matrixVectorMult(LocalMatrix, b, LocalX)
stop = MPI.Wtime()
if RANK == 0:
    print("CPU time of parallel multiplication using", nbOfproc, "processes is ", (stop - start)*1000)

##################Gather the results ##########################################
sendcounts = [Local_size for i in range(nbOfproc)] 

if RANK == 0:
    X = np.zeros(SIZE, dtype = np.float64)
else :
    X = None

# Gather the result into X
COMM.Gatherv(LocalX,[X, sendcounts, MPI.DOUBLE], root = 0)
##################Print the results ###########################################

if RANK == 0 :
    X_ = A.dot(b)
    print("The error comparing to the dot product is :", np.max(np.abs(X_ - X)))

Overwriting MatrixVectorMult_V0.py


In [30]:
# enter the command to run the program
! mpirun -n 2 python MatrixVectorMult_V0.py

## Exercise 5 Calculation of π (Monte Carlo)

1. Use the `PiMonteCarlo.py` file to implement the calculation of PI using Monte Carlo.
2. Process 0 prints the result.
3. Plot the scalability of your implementation. 
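
Scalability is usually summarised by speedup, T(1) / T(p), and efficiency, speedup / p, where T(p) is the run time on p processes. The sketch below computes both from a dictionary of timings; the helper name `scalability` is hypothetical, and the timings in the usage example are made up for illustration, not measured.

```python
def scalability(timings):
    """Compute speedup and efficiency from {n_procs: seconds}.

    Speedup is T(1) / T(p); efficiency is speedup / p.
    A measurement for 1 process is required as the baseline.
    """
    t1 = timings[1]
    result = {}
    for p in sorted(timings):
        speedup = t1 / timings[p]
        result[p] = (speedup, speedup / p)
    return result


# Made-up timings in seconds, for illustration only (not measured)
example = {1: 8.0, 2: 4.0, 4: 2.5}
for p, (speedup, efficiency) in scalability(example).items():
    print(p, speedup, efficiency)
```

The resulting speedup values can then be plotted against the number of processes, next to the ideal line speedup = p.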

In [2]:
%%file PiMonteCarlo_V0.py
# write your program here
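import random
import timeit
from mpi4py import MPI

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
nbOfproc = COMM.Get_size()

INTERVAL = 1000

def compute_points(n_points):

    # Seed each process differently so the ranks do not draw identical samples
    random.seed(42 + RANK)

    circle_points = 0

    for i in range(n_points):

        # Randomly generated x and y values from a
        # uniform distribution
        # Range of x and y values is -1 to 1
        rand_x = random.uniform(-1, 1)
        rand_y = random.uniform(-1, 1)

        # Squared distance of (x, y) from the origin
        origin_dist = rand_x**2 + rand_y**2

        # Checking if (x, y) lies inside the circle
        if origin_dist <= 1:
            circle_points += 1

    return circle_points


# Split the INTERVAL**2 points across the processes; the first
# (INTERVAL**2 % nbOfproc) ranks take one extra point
total_points = INTERVAL**2
local_points = total_points // nbOfproc + (1 if RANK < total_points % nbOfproc else 0)

start = timeit.default_timer()
local_circle_points = compute_points(local_points)
# Sum the per-process counts on process 0
circle_points = COMM.reduce(local_circle_points, op=MPI.SUM, root=0)
end = timeit.default_timer()

if RANK == 0:
    # Estimating value of pi,
    # pi = 4*(no. of points generated inside the
    # circle)/ (no. of points generated inside the square)
    pi = 4 * circle_points / total_points
    print("Circle points number :", circle_points)
    print("Final Estimation of Pi=", pi, "cpu time :", end - start)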

Writing PiMonteCarlo_V0.py


In [None]:
# enter the command to run the program
! mpirun -n 2 python PiMonteCarlo_V0.py