# MPI Assignments

### Exercise 1: Hello World
1. Write an MPI program which prints the message "Hello World".
2. Modify your program so that each process prints out both its rank and the total number of processes P that the code is running on, i.e. the size of `MPI_COMM_WORLD`.
3. Modify your program so that only a single controller process (e.g. rank 0) prints out a message (very useful when you run with hundreds of processes).
4. What happens if you omit the final MPI procedure call in your program?


In [3]:
from mpi4py import MPI

#Communicator, Rank and size
COMM = MPI.COMM_WORLD
SIZE = COMM.Get_size()
RANK = COMM.Get_rank()

print("I am the process {RANK} among {SIZE}".format(RANK=RANK, SIZE=SIZE))

I am the process 0 among 1


In [10]:
from mpi4py import MPI

#Initialize MPI environment
comm = MPI.COMM_WORLD

#Get the total number of processes
world_size = comm.Get_size()

#Get the rank of the current process
rank = comm.Get_rank()

#print "Hello World " message from each process
print(f"Hello World from process {rank} of {world_size}")



### Q3

if rank == 0:
    # Only the controller process prints the banner and the message.
    print("***"*10, "Q3", "***"*10)
    print(f"I am the process {rank} of {world_size}")

Hello World from process 0 of 1
****************************** Q3 ******************************
I am the process 0 of 1


If you omit the final MPI procedure call, `MPI_Finalize()`, the MPI environment is not shut down cleanly before the program exits. Every process that initialized MPI is expected to call `MPI_Finalize()` before it exits. What happens otherwise depends on the implementation: the launcher may report an abnormal termination, pending communication may be lost, or the program may hang.

`MPI_Finalize()` cleans up the state created by `MPI_Init()`. This includes freeing memory, closing communication channels, and releasing other resources allocated by the MPI implementation. Without the call, these resources may not be released properly.

The mpi4py programs above contain no explicit call because mpi4py, by default, initializes MPI when the `MPI` module is imported and finalizes it automatically when the Python interpreter exits. In C or Fortran you must call `MPI_Finalize()` yourself at the end of the program.


### Exercise 2: Sharing Data
Create a program that obtains an integer input from the terminal and distributes it to all the MPI processes.
Each process must display its rank and the received value. 
Keep reading values until a negative integer is entered.
**Output Example**
```shell
10
Process 0 got 10
Process 1 got 10
```


In [12]:
from mpi4py import MPI

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()

while True:
    if RANK == 0:
        # Only the root process reads from the terminal.
        value = int(input("Enter a number: "))
    else:
        value = None
    # Broadcast the value from rank 0 to every process.
    value = COMM.bcast(value, root=0)
    # Stop on a negative integer, as the exercise requires.
    if value < 0:
        break
    print("Process {rank} got {data}".format(rank=RANK, data=value))

Enter a number: 45
Process 0 got 45
Enter a number: 45
Process 0 got 45
Enter a number: 67
Process 0 got 67
Enter a number: 98
Process 0 got 98
Enter a number: -1



### Exercise 3: Sending in a ring (broadcast by ring)

Write a program that takes data from process zero and sends it to all of the other processes by passing it around a ring. That is, process i should receive the data, add its own rank to it, and then send it to process i+1, until the last process is reached.
Assume that the data consists of a single integer. Process zero reads the data from the user.
Print the process rank and the value received.


![ring](../data/ring.gif)

You may want to use these MPI routines in your solution:
`Send` `Recv` 
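
To check a solution, it helps to know which value each rank should print. The sketch below simulates the ring in plain Python, without MPI, under the reading that rank i receives a value, adds i to it, and forwards the sum. The function name `ring_values` is hypothetical and used only for illustration.

```python
def ring_values(x, size):
    """Return the value each rank receives when x travels around the ring.

    Assumption: rank i receives a value, adds its own rank i to it,
    and forwards the result to rank i + 1.
    """
    received = []
    value = x  # rank 0 obtains x from the user
    for rank in range(size):
        received.append(value)  # the value this rank prints
        value = value + rank    # the value this rank forwards
    return received


for rank, value in enumerate(ring_values(10, 4)):
    print(f"Process {rank} received {value}")
```

With an input of 10 on four processes, the ranks should report 10, 10, 11 and 13.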




In [None]:
from mpi4py import MPI

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()
size = COMM.Get_size()

while True:
    if RANK == 0:
        # Process zero reads the value from the user.
        x = int(input("Put a number: "))
    else:
        # Every other process receives from its left neighbour.
        x = COMM.recv(source=RANK - 1)
        print("Receiving process is:", RANK)
    if RANK < size - 1:
        # Add this rank before forwarding. A negative value is the stop
        # signal and is forwarded unchanged so that every process exits.
        COMM.send(x if x < 0 else x + RANK, dest=RANK + 1)
    if x < 0:
        break
    print("rank : ", RANK, "data ", x)

In [1]:
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])

print(f'Original Array:\n{arr1}')

arr1_transpose = arr1.transpose()

print(f'Transposed Array:\n{arr1_transpose}')

Original Array:
[[1 2 3]
 [4 5 6]]
Transposed Array:
[[1 4]
 [2 5]
 [3 6]]



### Exercise 4: Scattering Matrix
1. Create an n by m matrix A on processor 0.
2. Use MPI_Scatterv to send parts of the matrix to the other processors.
3. Processor 1 receives A(i,j) for i=0 to (n/2)-1 and j=m/2 to m-1.
4. Processor 2 receives A(i,j) for i=n/2 to n-1 and j=0 to (m/2)-1.
5. Processor 3 receives A(i,j) for i=n/2 to n-1 and j=m/2 to m-1.
**Example:** using n=m=8 for simplicity.


![N2utM.png](attachment:N2utM.png)
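
The index ranges above map directly onto slices. The sketch below illustrates this with plain nested lists for n = m = 8, with A filled row by row with 1 to 64. The helper name `quadrant` is hypothetical, and the entry for processor 0 is an assumption (the remaining top-left block), since the list does not state it.

```python
n, m = 8, 8
# Fill A row by row with 1, 2, ..., n*m.
A = [[i * m + j + 1 for j in range(m)] for i in range(n)]


def quadrant(A, rows, cols):
    """Return the block of A covering the half-open row and column ranges."""
    return [row[cols[0]:cols[1]] for row in A[rows[0]:rows[1]]]


# Row and column ranges taken from the exercise statement.
# The entry for processor 0 is an assumption (the remaining block).
ranges = {
    0: ((0, n // 2), (0, m // 2)),
    1: ((0, n // 2), (m // 2, m)),
    2: ((n // 2, n), (0, m // 2)),
    3: ((n // 2, n), (m // 2, m)),
}
blocks = {p: quadrant(A, rows, cols) for p, (rows, cols) in ranges.items()}

for p, block in blocks.items():
    print(f"Processor {p} first row: {block[0]}")
```

Each block is not contiguous in the row-major storage of A, so the blocks have to be packed one after another (or described with a derived datatype) before `Scatterv` can send them.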


In [3]:
from mpi4py import MPI
import numpy as np

# Intended for 4 processes: each rank receives one quadrant of A.
comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
n = 8
m = 8
# (row range, column range) of the quadrant owned by ranks 0 to 3.
quadrants = [((0, n // 2), (0, m // 2)), ((0, n // 2), (m // 2, m)),
             ((n // 2, n), (0, m // 2)), ((n // 2, n), (m // 2, m))]
if rank == 0:
    A = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            A[i, j] = i * m + j + 1
    print("Original matrix on processor 0:")
    print(A)
    # Scatterv sends contiguous chunks, so pack the quadrants one after
    # another and record the size and offset of each chunk.
    sendcounts = np.zeros(size, dtype=int)
    displs = np.zeros(size, dtype=int)
    blocks = []
    for r, ((r0, r1), (c0, c1)) in enumerate(quadrants):
        blocks.append(A[r0:r1, c0:c1].ravel())
        sendcounts[r] = blocks[r].size
        displs[r] = sendcounts[:r].sum()
    sendspec = [np.concatenate(blocks), sendcounts, displs, MPI.DOUBLE]
else:
    sendspec = None
# Each rank allocates a buffer with the shape of its own quadrant;
# any rank beyond the first four receives nothing.
if rank < 4:
    (r0, r1), (c0, c1) = quadrants[rank]
    recvA = np.zeros((r1 - r0, c1 - c0))
else:
    recvA = np.zeros(0)
comm.Scatterv(sendspec, recvA, root=0)
if 0 < rank < 4:
    print(f"Received matrix on processor {rank}:")
    print(recvA)

Original matrix on processor 0:
[[ 1.  2.  3.  4.  5.  6.  7.  8.]
 [ 9. 10. 11. 12. 13. 14. 15. 16.]
 [17. 18. 19. 20. 21. 22. 23. 24.]
 [25. 26. 27. 28. 29. 30. 31. 32.]
 [33. 34. 35. 36. 37. 38. 39. 40.]
 [41. 42. 43. 44. 45. 46. 47. 48.]
 [49. 50. 51. 52. 53. 54. 55. 56.]
 [57. 58. 59. 60. 61. 62. 63. 64.]]


IndexError: index 1 is out of bounds for axis 0 with size 1

In [11]:
A = lil_matrix((SIZE, SIZE))
A[0, :100] = rand(100)
A[1, 100:200] = A[0, :100]

In [12]:
A.shape

(1000, 1000)



### Exercise 5: Matrix vector product

1. Use the `MatrixVectorMult.py` file to implement the MPI version of matrix vector multiplication.
2. Process 0 compares the result with the `dot` product.
3. Plot the scalability of your implementation. 

**Output Example**
```shell
CPU time of parallel multiplication using 2 processes is  174.923446
The error comparing to the dot product is : 1.4210854715202004e-14
```


In [5]:
import numpy as np
from scipy.sparse import lil_matrix
from numpy.random import rand, seed
from numba import njit
from mpi4py import MPI
SIZE = 1000
A = lil_matrix((SIZE, SIZE))
A

<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
	with 0 stored elements in List of Lists format>

In [9]:
def matrixVectorMult(A, b, x):
    """Accumulate the product of A and b into x, one row at a time."""
    row, col = A.shape
    for i in range(row):
        for j in range(col):
            x[i] += A[i, j] * b[j]

    return 0

In [10]:
LocalMatrix = lil_matrix((SIZE, SIZE))
# Scatter the matrix A
b = rand(SIZE)
#####################Compute A*b locally#######################################
LocalX = np.zeros(SIZE)

start = MPI.Wtime()
matrixVectorMult(LocalMatrix, b, LocalX)
stop = MPI.Wtime()
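
The cell above leaves the scatter step open. One common approach, shown here as a sketch and not as the intended solution, is to give each process a contiguous block of rows, multiply that block by the full vector b, and join the partial results. The code simulates the ranks with a loop over plain Python lists, so it runs without MPI. `row_range` and `local_matvec` are hypothetical helper names, and the sizes are small illustrative values.

```python
import random


def row_range(n_rows, size, rank):
    """Half-open range of rows owned by `rank`; early ranks take the remainder."""
    base, extra = divmod(n_rows, size)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)


def local_matvec(rows, b):
    """Multiply a block of rows by the full vector b."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in rows]


random.seed(0)
n, size = 10, 4  # small illustrative sizes, 4 simulated processes
A = [[random.random() for _ in range(n)] for _ in range(n)]
b = [random.random() for _ in range(n)]

# Each simulated rank computes its slice of x. Joining the pieces is the
# role a gather operation would play in an MPI version.
x_parallel = []
for rank in range(size):
    start, stop = row_range(n, size, rank)
    x_parallel.extend(local_matvec(A[start:stop], b))

x_serial = local_matvec(A, b)
error = max(abs(p - s) for p, s in zip(x_parallel, x_serial))
print("Max error compared with the serial product:", error)
```

Splitting by rows keeps every local product independent: each process needs only its own rows and a full copy of b, so no communication is required during the multiplication itself.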


### Exercise 6: Pi calculation
An approximation to the value π can be obtained from the following expression

![Pi expression](../data/pi.PNG)

where the answer becomes more accurate with increasing N. Iterations over i are independent so the
calculation can be parallelized.

For the following exercises you should set N = 840. This number is divisible by 2, 3, 4, 5, 6, 7 and 8
which is convenient when you parallelize the calculation!

1. Create a program where each process independently computes the value of `π` and prints it to the screen. Check that the values are correct (each process should print the same value)
2. Now arrange for different processes to do the computation for different ranges of i. For example, on two processes: rank 0 would do i = 0, 1, 2, . . . , N/2 - 1; rank 1 would do i = N/2, N/2 + 1, . . . , N-1.
Print the partial sums to the screen and check the values are correct by adding them up by hand.
3. Now we want to accumulate these partial sums by sending them to the controller (e.g. rank 0) to add up:
   - all processes (except the controller) send their partial sum to the controller
   - the controller receives the values from all the other processes, adding them to its own partial sum
4. Use the function `MPI_Wtime` (see below) to record the time it takes to perform the calculation. For a given value of N, does the time decrease as you increase the number of processes? Note that to ensure that the calculation takes a sensible amount of time (e.g. more than a second) you will probably have to perform the calculation of `π` several thousands of times.
5. Ensure your program works correctly if N is not an exact multiple of the number of processes P.
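
Splitting the range of i among processes, including the case where N is not a multiple of P, can be sketched without MPI by looping over simulated ranks. The formula is only given as an image here, so the code assumes the midpoint-rule sum π ≈ (1/N) Σ 4 / (1 + x_i²) with x_i = (i + 1/2)/N for i = 0, ..., N-1. Adjust the term if the expression in the image differs. `index_range` and `partial_sum` are hypothetical names.

```python
import math


def index_range(n, size, rank):
    """Half-open range of i handled by `rank`.

    Early ranks take one extra term when n is not a multiple of size.
    """
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)


def partial_sum(n, start, stop):
    """Sum of the assumed midpoint-rule terms for i in [start, stop)."""
    return sum(4.0 / (1.0 + ((i + 0.5) / n) ** 2) for i in range(start, stop)) / n


N = 840
for size in (2, 3, 9):  # 840 is not divisible by 9
    partials = [partial_sum(N, *index_range(N, size, rank)) for rank in range(size)]
    # Adding the partial sums is what the controller does after receiving them.
    pi_estimate = sum(partials)
    print(size, "processes:", pi_estimate, "error:", abs(pi_estimate - math.pi))
```

In an MPI version, each rank would call `index_range` with its own rank and the communicator size, and only the final addition would require communication.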

