<img src="../../img/ICHEC_Logo.png" alt="Drawing" style="width: 500px;"/>

## Exercise 1 - Hello World
- Write a parallel Python program using MPI that prints out the number of processes and the MPI rank of each process. 
  - Use 6 cores.

## Exercise 2 - Simple Message Exchange (general Python objects)
- Write a program where 2 processes send and receive a message between each other using `send` and `recv`.
  - The message should be a dictionary with the key `'rank': myrank`, where myrank is the rank of the sending process. 
  - After receiving the message, each process should print out its own rank and the values in the received dictionary.

## Exercise 3 - Simple Message Exchange (NumPy arrays)
- Write a program where in each process a 100,000 element NumPy array is initialised to the rank of the process. 
- Send and receive the array (using `Send` and `Recv`).
- After receiving, print out the rank of the process along with the first element of the received array.

## Exercise 4 - Message Chain
- Write a simple program where every MPI task sends data to the next one. Let **ntasks** be the number of the tasks, and **myid** the rank of the current task. Your program should work as follows:
  - Every task with a rank less than ntasks-1 sends a message to task myid+1. For example, task 0 sends a message to task 1.
  - The message content is an integer array where each element is initialized to myid.
  - The sender prints out the number of elements it sends.
  - All tasks with rank ≥ 1 receive messages.
  - Each receiver prints out its myid and the first element in the received array.

- Implement the program described above using `Send` and `Recv`.

- Implement it again, but use `Sendrecv` instead of `Send` and `Recv` when sending and receiving.

- Can the code be simplified using `MPI.PROC_NULL`?
  - Use 10 cores.
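
The neighbour logic of the chain can be sketched without MPI. The helper below is hypothetical (the name `chain_neighbours` and the `null` argument are not part of the exercise) and uses `None` as a stand-in for `MPI.PROC_NULL`. In MPI, a send to or receive from `MPI.PROC_NULL` completes immediately without transferring data, which is why every rank can then issue the same calls without an `if` on its rank.

```python
def chain_neighbours(myid, ntasks, null=None):
    """Return (source, dest) for a non-periodic chain of ntasks tasks.

    `null` is a placeholder for MPI.PROC_NULL (an assumption made so that
    this sketch runs without mpi4py).
    """
    # The last task has nobody to send to.
    dest = myid + 1 if myid < ntasks - 1 else null
    # The first task has nobody to receive from.
    source = myid - 1 if myid > 0 else null
    return source, dest


ntasks = 10
for myid in range(ntasks):
    source, dest = chain_neighbours(myid, ntasks)
    print("task", myid, "receives from", source, "and sends to", dest)
```

In an mpi4py program one would pass `MPI.PROC_NULL` as `null` and hand `source` and `dest` to the communication calls.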

In [None]:
%%writefile message_chain.py

from mpi4py import MPI
import sys
import numpy as np
from numpy.random import randint, seed

"""Insert MPI variables below"""
###

###

if rank == 0:
    t_start = MPI.Wtime()

'''Matrix size inputted through the terminal'''
numberRows = int( sys.argv[1])
numberColumns = int( sys.argv[2])

"""checks"""
assert numberRows == numberColumns
assert numberRows % (worldSize) == 0 #make sure it is divisible

#Calculate the slice per worker
if (worldSize == 1):
    Slice = int(numberRows)
else:
    Slice = int(numberRows / (worldSize)) 

assert Slice >= 1

"""Set Matrix"""
# Set seed to ensure same matrices when comparing the different techniques.
# Feel free to change how the matrix is populated.
seed(30) 
def populateMatrix():
    p = randint(0,10,size=(numberRows,numberColumns))
    return p

"""Initialising matrix a"""  
if rank == 0:
    ###
    a = populateMatrix()
    recv_data = a
    ###

    """Distributing the work to the processes"""
    # Rank 0 sends the slices to the other ranks, keeping the first slice
    # (rows 0 to Slice-1) for itself
    for i in range(1, worldSize):
        offset = i * Slice
        # Rows offset to offset+Slice-1 form the block destined for rank i
        row = recv_data[offset:offset + Slice, :]

        """send the value of the offset and the group of rows to each rank"""
        ###

        ###

b = populateMatrix()
"""Receive the offset and group of rows from rank 0"""
# Rank 0 doesn't receive anything
# Hint: Use comm.Recv() for arrays of data

if rank != 0:
    ###
    # Remember: comm.Recv() needs a preallocated receive buffer
    ###

"""Calculation"""
b = populateMatrix()
solution_chunk = np.dot(recv_chunk,b)

"""Send the chunks of solution back to rank 0"""
# Hint: Use comm.Send() for arrays of data
###

###

if rank == 0:
    # Stack solution chunk from rank 0
    product_solution = np.vstack(solution_chunk)
    """Receive the rest"""
    for i in range(1, worldSize):
        """Remember: comm.Recv() needs a preallocated buffer (buffer_sol_i below)"""
        ###

        ###
        print("Received response from %d.\n" % (i))
        # Append each chunk in rank order so the rows line up with matrix a
        product_solution = np.vstack((product_solution, buffer_sol_i))

    print("Result AxB.\n")
    print(product_solution)

    print("Parallel sends and receives - Total Time:", MPI.Wtime() - t_start)

## Exercise 5 - Non-blocking communication
- Implement the message chain program from Exercise 4 using non-blocking communication.

## Exercise 6 - Collective Operations
- In this exercise you will use different routines for collective communication. Use the skeleton code to get started.
  
  
A) First, write a program where rank 0 sends an array containing the integers from 0 to 7 to all other ranks using collective communication. Use four cores.
[Hint: Use a broadcast, e.g. `comm.Bcast()`]

- From these arrays create the initial arrays shown below.

B)

|        |  |  |  |  |  |  |  |  |
|--------|--|--|--|--|--|--|--|--|
|Task 0: | 0| 1| 2| 3| 4| 5| 6| 7|
|Task 1: | 8| 9|10|11|12|13|14|15|
|Task 2: |16|17|18|19|20|21|22|23|
|Task 3: |24|25|26|27|28|29|30|31|
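
The pattern in table B is that task `rank` holds the eight integers starting at `8 * rank`. The NumPy-only sketch below reproduces the table by looping over the ranks in a single process; in the MPI program each task would build only its own row. The variable names are illustrative.

```python
import numpy as np

size = 4  # number of tasks, as in the exercise

# Row r is the data vector that task r would build locally.
data = np.array([np.arange(8) + 8 * rank for rank in range(size)])

for rank in range(size):
    print("Task {0}: {1}".format(rank, data[rank]))
```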


- Each task should have a receive buffer of eight elements, each initialised to -1. 

- Implement a program that sends and receives values from the data arrays to the receive buffers using single collective routines, so that the receive buffers will have the following values:

C)

|        |  |  |  |  |  |  |  |  |
|--------|--|--|--|--|--|--|--|--|
|Task 0: | 0| 1|-1|-1|-1|-1|-1|-1|
|Task 1: | 2| 3|-1|-1|-1|-1|-1|-1|
|Task 2: | 4| 5|-1|-1|-1|-1|-1|-1|
|Task 3: | 6| 7|-1|-1|-1|-1|-1|-1|

- [Hint: Use `comm.Scatter()`]
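
Table C corresponds to task 0's eight elements being cut into four consecutive chunks of two, with chunk `r` landing at the start of task `r`'s buffer. The sketch below imitates that data movement with plain NumPy in one process. It is an illustration of the expected result, not the mpi4py call itself.

```python
import numpy as np

size = 4
sendbuf = np.arange(8)              # task 0's data vector from table B
chunks = sendbuf.reshape(size, -1)  # one row of 8 // size = 2 elements per task

# Row r plays the role of task r's receive buffer.
buffers = np.full((size, 8), -1)
for rank in range(size):
    buffers[rank, :2] = chunks[rank]

print(buffers)
```

The rest of each buffer stays at -1 because only two elements arrive per task.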

D)

|        |  |  |  |  |  |  |  |  |
|--------|--|--|--|--|--|--|--|--|
|Task 0: |-1|-1|-1|-1|-1|-1|-1|-1|
|Task 1: | 0| 1| 8| 9|16|17|24|25|
|Task 2: |-1|-1|-1|-1|-1|-1|-1|-1|
|Task 3: |-1|-1|-1|-1|-1|-1|-1|-1|

- [Hint: Use `comm.Gather()`]
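
In table D the root is task 1, and it ends up with the first two elements of every task's data vector, in rank order. A NumPy-only imitation of that result (assuming the data vectors of table B) might look like this:

```python
import numpy as np

size = 4
root = 1  # the task that collects the data in table D

# Data vectors from table B, one row per task.
data = np.array([np.arange(8) + 8 * rank for rank in range(size)])

# Row r plays the role of task r's receive buffer.
buffers = np.full((size, 8), -1)
# Each task contributes its first two elements; the root stores them in rank order.
buffers[root] = np.concatenate([data[rank, :2] for rank in range(size)])

print(buffers)
```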

E)

|        |  |  |  |  |  |  |  |  |
|--------|--|--|--|--|--|--|--|--|
|Task 0: | 8|10|12|14|16|18|20|22|
|Task 1: |-1|-1|-1|-1|-1|-1|-1|-1|
|Task 2: |40|42|44|46|48|50|52|54|
|Task 3: |-1|-1|-1|-1|-1|-1|-1|-1|

- [Hint: Create two communicators and use `comm.Reduce()`]
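
The values in table E are consistent with an element-wise sum inside two groups: tasks 0 and 1 (result on task 0) and tasks 2 and 3 (result on task 2). The sketch below checks that arithmetic with NumPy alone. The grouping `color = rank // 2` is an assumption inferred from the table; in mpi4py, `comm.Split(color, key)` is the call that creates such sub-communicators.

```python
import numpy as np

size = 4
# Data vectors from table B, one row per task.
data = np.array([np.arange(8) + 8 * rank for rank in range(size)])

# Row r plays the role of task r's receive buffer.
buffers = np.full((size, 8), -1)
for color in range(size // 2):
    # Tasks that would share one sub-communicator.
    members = [rank for rank in range(size) if rank // 2 == color]
    root = members[0]  # local rank 0 of the group receives the sum
    buffers[root] = data[members].sum(axis=0)

print(buffers)
```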

In [None]:
%%writefile collective_operations.py

from mpi4py import MPI
import numpy
from sys import stdout

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

assert size == 4, 'Number of MPI tasks has to be 4.'

if rank == 0:
    print('A) Broadcast:')

# TODO: create data vector at task 0 and send it to everyone else
#       using collective communication
if rank == 0:
    data = ...
else:
    data = ...
...
print('  Task {0}: {1}'.format(rank, data))


# Prepare data vectors ..
data = ...  # TODO: create the data vectors
# .. and receive buffers
buff = numpy.full(8, -1, int)

# ... wait for every rank to finish ...
comm.barrier()
if rank == 0:
    print('')
    print('-' * 32)
    print('')
    print('B) Initial Data vectors:')
print('  Task {0}: {1}'.format(rank, data))
comm.barrier()
if rank == 0:
    print('')
    print('-' * 32)
    print('')
    print('C) Scatter:')

# TODO: how to get the desired receive buffer using a single collective
#       communication routine?
...
print('  Task {0}: {1}'.format(rank, buff))

# ... wait for every rank to finish ...
buff[:] = -1
comm.barrier()
if rank == 0:
    print('')
    print('-' * 32)
    print('')
    print('D) Gather:')

# TODO: how to get the desired receive buffer using a single collective
#       communication routine?

...
print('  Task {0}: {1}'.format(rank, buff))

# ... wait for every rank to finish ...
buff[:] = -1
comm.barrier()
if rank == 0:
    print('')
    print('-' * 32)
    print('')
    print('E) Reduce:')

# TODO: how to get the desired receive buffer using a single collective
#       communication routine?
...
print('  Task {0}: {1}'.format(rank, buff))

## Exercise 7 - Matrix Multiplier

- Write an MPI script that multiplies two matrices together. 
  1) Divide up the tasks using sends and receives.
  2) Divide up the tasks using collective operations. 
- Start with the provided skeleton scripts. 
- Submit a job that multiplies two 4000 x 4000 matrices, using 20 processes. 
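
Both variants rely on the same decomposition: matrix A is cut into equal blocks of rows, each process multiplies its block by the full matrix B, and stacking the partial results in rank order gives A x B. The single-process NumPy sketch below checks that idea on a small case. The sizes and variable names are illustrative, and the random generator differs from the one used in the skeleton.

```python
import numpy as np

rng = np.random.default_rng(30)
n, workers = 8, 4  # small stand-ins for 4000 and 20
assert n % workers == 0  # the rows must divide evenly among the workers

a = rng.integers(0, 10, size=(n, n))
b = rng.integers(0, 10, size=(n, n))

rows_per_worker = n // workers
# Block i holds the rows that worker i would receive.
chunks = [a[i * rows_per_worker:(i + 1) * rows_per_worker, :]
          for i in range(workers)]

# Every worker multiplies its row block by the full matrix b.
partial = [chunk @ b for chunk in chunks]

# Stacking the partial results in worker order reproduces the full product.
product = np.vstack(partial)
print(np.array_equal(product, a @ b))
```

Note that every worker needs the same matrix B for the stacked result to be correct.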

In [None]:
%%writefile matrix_multiplier.py

from mpi4py import MPI
import sys
import numpy as np
from numpy.random import randint, seed

"""Insert MPI variables below"""
###

###

if rank == 0:
    t_start = MPI.Wtime()

"""Matrix size inputted through the terminal"""
numberRows = int( sys.argv[1])
numberColumns = int( sys.argv[2])

"""checks"""
assert numberRows == numberColumns
assert numberRows % (worldSize) == 0

#Calculate the slice per worker
if (worldSize == 1):
    Slice = int(numberRows)
else:
    Slice = int(numberRows / (worldSize)) 
    
assert Slice >= 1

"""Set Matrix"""
# Set seed to ensure same matrices when comparing the different techniques.
# Feel free to change how the matrix is populated.
seed(30)
def populateMatrix():
    p = randint(0,10,size=(numberRows,numberColumns))
    return p

"""Initialising matrix a"""  
if rank == 0:
    ###

    ###
else:
    a = None
 
"""Scatter matrix a from rank 0 to the other ranks"""
###
# recv_chunk = ...
###

"""Calculation"""
b = populateMatrix()
solution_chunk = np.dot(recv_chunk, b)

"""Gather all the solution chunks"""
###

###

"""Print the solution"""
if rank == 0: 
    print ("Result AxB.\n")
    ###

    ###
    print ("Parallel collective operations - Total Time", MPI.Wtime() - t_start)