## $\textbf{Exercise 1 Hello World}$

1. Write an MPI program displaying the number of processes used for the execution and the rank of each process.
2. Test the programs obtained with different numbers of threads for the parallel program.

$\textbf{Output Example}$
```shell
Hello from the rank 2 process
Hello from the rank 0 process
Hello from the rank 3 process
Hello from the rank 1 process
Parallel execution of hello_world with 4 process
```
*Note that the output order maybe different*

In [2]:
%%file hello.py
from mpi4py import MPI
if __name__ == '__main__':
    
    COMM = MPI.COMM_WORLD
    SIZE =  RANK = COMM.Get_size()
    RANK = COMM.Get_rank()

    print(f"Hello from the rank {RANK} process")

    COMM.Barrier()
    if RANK == 0:

        print(f"Parallel execution of hello_world with {SIZE} processes")
        

Overwriting hello.py


In [3]:
# enter command for compile and run the program
#!mpirun -n 2 python hello.py
!mpirun -n 3 --allow-run-as-root python hello.py

Invalid MIT-MAGIC-COOKIE-1 keyHello from the rank 2 process
Hello from the rank 0 process
Hello from the rank 1 process
Parallel execution of hello_world with 3 processes


In [4]:
!mpirun -n 4 --allow-run-as-root python hello.py

Invalid MIT-MAGIC-COOKIE-1 keyHello from the rank 1 process
Hello from the rank 3 process
Hello from the rank 2 process
Hello from the rank 0 process
Parallel execution of hello_world with 4 processes


In [5]:
!mpirun -n 2 --allow-run-as-root python hello.py

Invalid MIT-MAGIC-COOKIE-1 keyHello from the rank 1 process
Hello from the rank 0 process
Parallel execution of hello_world with 2 processes


## $\textbf{Exercise 2 Sharing Data }$

A common need is for one process to get data from the user, either by reading from the terminal or command line arguments, and then to distribute this information to all other processors.

Write a program that reads an integer value from the terminal and distributes the value to all of the MPI processes. Each process should print out its rank and the value it received. Values should be read until a negative integer is given as input.

You may want to use these MPI routines in your solution:
`Get_rank` `Bcast` 

**Output Example**
```shell
10
Process 0 got 10
Process 1 got 10
```

In [105]:
%%file sharing.py

from mpi4py import MPI

if __name__ == '__main__':
    COMM = MPI.COMM_WORLD
    RANK = COMM.Get_rank()

    if RANK == 0 :
        data_send = int(input(" Enter data : "))
    else :
        data_send = None

    received = COMM.bcast(data_send , root=0)
    print("\n")
    print(f"Proces {RANK} got {received}".format(RANK, received))

Overwriting sharing.py


In [106]:
# enter command for compile and run the program
!mpirun -n 3 python sharing.py run in a terminal for the cas of the input function

Invalid MIT-MAGIC-COOKIE-1 key

Proces 0 got 77


Proces 1 got 77


Proces 2 got 77


## $\textbf{Exercise 3 Sending in a ring (broadcast by ring)}$

Write a program that takes data from process zero and sends it to all of the other processes by sending it in a ring. That is, process i should receive the data and send it to process i+1, until the last process is reached.
Assume that the data consists of a single integer. Process zero reads the data from the user.
![](../data/ring.gif)

You may want to use these MPI routines in your solution:
`Send` `Recv` 

In [97]:
%%file ring.py

from mpi4py import MPI

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
SIZE = COMM.Get_size()

if RANK == 0:
    data_send = int(input("Enter data : "))
    COMM.send(data_send, dest = RANK+1)

if RANK != 0:
    RECEIVER = RANK - 1
    received = COMM.recv(source=RECEIVER)
    print("\n")
    print(f"Process {RANK} got {received} from process {RECEIVER}".format(RANK, received, RECEIVER))

    if RANK < SIZE - 1:
        COMM.send(received, dest = RANK+1)

Overwriting ring.py


In [98]:
!mpirun -n 4 python ring.py 

Invalid MIT-MAGIC-COOKIE-1 key

Process 1 got 77 from process 0


Process 2 got 77 from process 1


Process 3 got 77 from process 2


## $\textbf{Exercise 4 Matrix vector product}$

1. Use the `MatrixVectorMult.py` file to implement the MPI version of matrix vector multiplication.
2. Process 0 compares the result with the `dot` product.
3. Plot the scalability of your implementation. 

**Output Example**
```shell
CPU time of parallel multiplication using 2 processes is  174.923446
The error comparing to the dot product is : 1.4210854715202004e-14
```

In [93]:
%%file MatrixVectorMult_V0.py

import time 
import numpy as np
from scipy.sparse import lil_matrix
from numpy.random import rand, seed
from numba import njit
from mpi4py import MPI

COMM = MPI.COMM_WORLD
PRECESSES = COMM.Get_size()
RANK = COMM.Get_rank()


def MatProduct(A, B, C):
    
    rows, columns = A.shape
    for i in range(rows):
        a = A[i]
        for j in range(columns):
            C[i] += a[j] * B[j]

    return 0


LENGTH = 1000
LOCAL_LENGTH = LENGTH // PRECESSES

process_block = LOCAL_LENGTH * LENGTH
counts =  [process_block for i in range(PRECESSES)]

if RANK == 0:
    A = lil_matrix((LENGTH, LENGTH))
    A[0, :100] = rand(100)
    A[1, 100:200] = A[0, :100]

    A.setdiag(rand(LENGTH))
    A = A.toarray()
    b = rand(LENGTH)
else :
    A = None
    b = None

localMatrix = np.empty((LOCAL_LENGTH, LENGTH), dtype = np.float64)
b = COMM.bcast(b, root = 0)

COMM.Scatterv([A, counts, MPI.DOUBLE], localMatrix, root = 0)

localC = np.zeros(LOCAL_LENGTH)
START = MPI.Wtime()
MatProduct(localMatrix, b, localC)

END = MPI.Wtime()
if RANK == 0:
    print("\n")
    print("CPU time of parallel multiplication using", PRECESSES,"processes is ", (END - START)*1000)


sendcounts = [LOCAL_LENGTH for i in range(PRECESSES)] 
if RANK == 0: 
    C = np.empty(LENGTH, dtype = np.float64)
else :
    C = None

COMM.Gatherv(localC,[C, sendcounts, MPI.DOUBLE], root = 0)

if RANK == 0 :
    C_TRUE = A.dot(b)
    print("The error comparing to the dot product is :", np.max(C_TRUE - C))

Overwriting MatrixVectorMult_V0.py


In [94]:
# enter command for compile and run the program
!mpirun -n 2 python MatrixVectorMult_V0.py

Invalid MIT-MAGIC-COOKIE-1 key

CPU time of parallel multiplication using 2 processes is  189.78453100000002
The error comparing to the dot product is : 3.552713678800501e-15


In [95]:
# enter command for compile and run the program
!mpirun -n 3 python MatrixVectorMult_V0.py

Invalid MIT-MAGIC-COOKIE-1 key

CPU time of parallel multiplication using 3 processes is  122.155525
The error comparing to the dot product is : 0.11809239991176056


In [96]:
# enter command for compile and run the program
!mpirun -n 4 python MatrixVectorMult_V0.py

Invalid MIT-MAGIC-COOKIE-1 key

CPU time of parallel multiplication using 4 processes is  149.14102300000002
The error comparing to the dot product is : 1.7763568394002505e-14


## $\textbf{Exercise 5 Calculation of π (Monte Carlo)}$

1. Use the `PiMonteCarlo.py` file to implement the calculation of PI using Monte Carlo.
2. Process 0 prints the result.
3. Plot the scalability of your implementation. 

In [91]:
%%file PiMonteCarlo_V0.py

import random 
import timeit
from mpi4py import MPI

COMM = MPI.COMM_WORLD
PROCESSES = COMM.Get_size()
RANK = COMM.Get_rank()

INTERVAL = 1000 ** 2

LOCAL_INT = INTERVAL // PROCESSES 
random.seed(42)  

def generate_points():

    points = 0
    
    for _ in range(LOCAL_INT):

        x = random.uniform(0, 2) 
        y = random.uniform(0, 2) 

        if (x - 1)**2 + (y - 1)**2 <= 1: 
            points += 1
    
    return points 


start = timeit.default_timer()
POINTS = generate_points()
end = timeit.default_timer()

POINTS = COMM.reduce(POINTS, op = MPI.SUM, root = 0)
if RANK == 0:
    
    PI = 4 * POINTS/ INTERVAL
    print("\n")
    print("Circle points => ", POINTS)
    print("PI estimation => ", PI)
    print("CPU TIME => ", (end - start) * 1000)

Overwriting PiMonteCarlo_V0.py


In [92]:
# Running the above function
!mpirun -n 4 python PiMonteCarlo_V0.py

Invalid MIT-MAGIC-COOKIE-1 key

Circle points =>  785032
PI estimation =>  3.140128
CPU TIME =>  177.258093999626
