## <center>Synchronous Computing</center>
### <center> Linh B. Ngo </center>
### <center> CPSC 3620 </center>

In [1]:
import ipyparallel
c=ipyparallel.Client(profile="mpicluster")
print(c.ids)

[0, 1, 2, 3, 4, 5, 6, 7]


#### <center> Synchronous Computation </center>

In a (fully) synchronous computation, all processes synchronize at regular points, usually to exchange data or to make sure that every process has completed the same set of steps (for example, updating its own data) before any of them proceeds.

In [2]:
!mpirun -np 8 python nobarrier.py

process 1 is here
process 2 is here
process 3 is here
process 4 is here
process 5 is here
process 6 is here
process 7 is here
process 0 is here


In [3]:
!mpirun -np 8 python barrier.py

process 0 is here
process 1 is here
process 2 is here
process 3 is here
process 4 is here
process 5 is here
process 6 is here
process 7 is here


#### <center> Barrier </center>

- A basic mechanism for synchronizing processes: it is inserted at the point in each process where that process must wait.
- No process continues past this point until all processes have reached it.

Comm.Barrier()

Parameters:
- Comm (MPI comm) – the communicator whose processes are blocked; in mpi4py, `Barrier()` is called as a method on this communicator and takes no further arguments
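
The barrier semantics described above can be tried without an MPI installation. The following is a minimal sketch that uses threads and `threading.Barrier` from Python's standard library as a stand-in for MPI processes and `Comm.Barrier()`; the names `NUM_WORKERS`, `worker`, and `run_workers` are illustrative and do not come from `barrier.py`.

```python
import threading

NUM_WORKERS = 4  # hypothetical worker count, standing in for MPI processes
barrier = threading.Barrier(NUM_WORKERS)
events = []
lock = threading.Lock()


def worker(worker_id):
    # Phase 1: record that this worker reached the synchronization point.
    with lock:
        events.append(("before", worker_id))
    # Block here until all NUM_WORKERS workers have called wait().
    barrier.wait()
    # Phase 2: runs only after every worker has finished phase 1.
    with lock:
        events.append(("after", worker_id))


def run_workers():
    events.clear()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)


log = run_workers()
phases = [phase for phase, _ in log]
print(phases)
```

The order of worker ids within each phase may vary from run to run, but every `"before"` entry is recorded ahead of every `"after"` entry, which is the guarantee a barrier provides.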

<center> <img src="pictures/treebarrier1.png" width="700"/> 
<sub>Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003. </sub>
</center>

<center> <img src="pictures/treebarrier2.png" width="700"/> 
<sub>Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003. </sub>
</center>

<center> <img src="pictures/butterflybarrier1.png" width="700"/> 
<sub>Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003. </sub>
</center>
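
A rough sketch of the pairing pattern behind a butterfly barrier, assuming the usual construction in which, at stage $s$, process $i$ synchronizes with process $i \oplus 2^s$ and the number of processes is a power of two. The function names `butterfly_schedule` and `knowledge_after_barrier` are illustrative only. The simulation tracks which processes each process has (directly or indirectly) heard from, to check that $\log_2 p$ stages are enough.

```python
def butterfly_schedule(num_procs):
    """Return, for each stage, the partner of every rank (assumes a power of two)."""
    if num_procs < 1 or num_procs & (num_procs - 1) != 0:
        raise ValueError("num_procs must be a power of two")
    stages = num_procs.bit_length() - 1  # log2(num_procs)
    return [[rank ^ (1 << s) for rank in range(num_procs)] for s in range(stages)]


def knowledge_after_barrier(num_procs):
    """Simulate the stages: each rank merges what its partner already knows."""
    known = [{rank} for rank in range(num_procs)]
    for partners in butterfly_schedule(num_procs):
        # Build the new list from the old one so all exchanges in a stage
        # use the state from the end of the previous stage.
        known = [known[r] | known[partners[r]] for r in range(num_procs)]
    return known


schedule = butterfly_schedule(8)
for stage, partners in enumerate(schedule):
    print("stage", stage, "partners", partners)
```

With 8 processes the schedule has three stages, and after the last one every simulated process has heard from all eight, so none can leave the barrier before all have arrived.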

#### <center> Prefix Sum Problem </center>

Given a list of numbers, $x_0, \ldots, x_{n-1}$, compute all partial sums, i.e.:
- $x_0 + x_1$
- $x_0 + x_1 + x_2$
- $x_0 + x_1 + x_2 + x_3$
- $x_0 + x_1 + x_2 + x_3 + x_4$
- ...

The problem is widely studied and has practical applications in process allocation, data compaction, sorting, and polynomial evaluation.
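
As a sequential reference point, the partial sums can be computed in a single pass. This is a minimal sketch using `itertools.accumulate` from the standard library; the helper name `prefix_sums` and the sample list are illustrative. Note that `accumulate` also emits the trivial first sum $x_0$, which the list above omits.

```python
from itertools import accumulate


def prefix_sums(xs):
    """Return [x0, x0+x1, x0+x1+x2, ...] for the input sequence."""
    return list(accumulate(xs))


xs = [3, 1, 4, 1, 5]  # hypothetical input
print(prefix_sums(xs))  # [3, 4, 8, 9, 14]
```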

<center> <img src="pictures/prefixsum.png" width="700"/> 
<sub>Wilkinson, Barry, and Michael Allen. Parallel programming. 2nd Ed. 2003. </sub>
</center>
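
The doubling-distance scheme can be simulated in plain Python before running it under MPI. This is only a sketch: each simulated rank owns a contiguous block of `n // size` numbers (so `size` is assumed to divide `n`), computes its local prefix sums, and at each step adds the last partial sum of the rank `distance` positions to its left. The function name `simulate_block_prefix_sum` is illustrative.

```python
from itertools import accumulate


def simulate_block_prefix_sum(n, size):
    """Simulate a block-distributed prefix sum over the numbers 0..n-1."""
    chunk = n // size  # assumes size divides n
    # blocks[r] holds the running partial sums owned by simulated rank r.
    blocks = []
    for r in range(size):
        nums = [r * chunk + i for i in range(chunk)]
        blocks.append(list(accumulate(nums)))

    distance = 1
    while distance < size:
        # Snapshot what every rank would send this step (its last partial sum)
        # before any rank applies an update, mirroring send-before-receive.
        sent = [block[-1] for block in blocks]
        for r in range(distance, size):
            incoming = sent[r - distance]
            blocks[r] = [v + incoming for v in blocks[r]]
        distance *= 2
    return blocks


blocks = simulate_block_prefix_sum(16, 8)
flat = [v for block in blocks for v in block]
print(blocks[3], blocks[7])  # [21, 28] [105, 120]
```

Flattening the blocks gives the same list as a sequential prefix sum over `range(16)`, which is a convenient correctness check for the parallel version.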

In [4]:
import ipyparallel
c=ipyparallel.Client(profile="mpicluster")
print(c.ids)

[0, 1, 2, 3, 4, 5, 6, 7]


In [8]:
%%px
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
N = 16
chunk = N // size  # numbers owned by each process; assumes size divides N

local_nums = np.zeros(chunk, dtype="int")
recv_sum = np.zeros(1, dtype="int")
local_sums = np.zeros(chunk, dtype="int")

# Each process fills its own block and computes the prefix sums within it.
for i in range(chunk):
    local_nums[i] = rank * chunk + i
    local_sums[i] = np.sum(local_nums[0:(i + 1)])

print("Process ", rank, " has ", local_nums)
print("Process ", rank, " has ", local_sums)

# The communication distance doubles at every step: 1, 2, 4, ...
distance = 1
while distance < size:
    if rank == 0:
        print(distance)
    if rank < size - distance:
        # Send the last (largest) partial sum as a one-element array view.
        comm.Send(local_sums[-1:], dest=rank + distance, tag=0)
#        print("Process ", rank, " sends to ", rank + distance)
    if rank >= distance:
        comm.Recv(recv_sum, source=rank - distance, tag=0)
#        print("Process ", rank, " receives from ", rank - distance, " value ", recv_sum)
        # Add the received sum to every local partial sum.
        local_sums += recv_sum[0]
    print("Process ", rank, " has ", local_sums)
    distance *= 2


[stdout:0] 
Process  3  has  [6 7]
Process  3  has  [ 6 13]
Process  3  has  [15 22]
Process  3  has  [21 28]
Process  3  has  [21 28]
[stdout:1] 
Process  1  has  [2 3]
Process  1  has  [2 5]
Process  1  has  [3 6]
Process  1  has  [3 6]
Process  1  has  [3 6]
[stdout:2] 
Process  7  has  [14 15]
Process  7  has  [14 29]
Process  7  has  [39 54]
Process  7  has  [77 92]
Process  7  has  [105 120]
[stdout:3] 
Process  5  has  [10 11]
Process  5  has  [10 21]
Process  5  has  [27 38]
Process  5  has  [49 60]
Process  5  has  [55 66]
[stdout:4] 
Process  0  has  [0 1]
Process  0  has  [0 1]
1
Process  0  has  [0 1]
2
Process  0  has  [0 1]
4
Process  0  has  [0 1]
[stdout:5] 
Process  2  has  [4 5]
Process  2  has  [4 9]
Process  2  has  [ 9 14]
Process  2  has  [10 15]
Process  2  has  [10 15]
[stdout:6] 
Process  6  has  [12 13]
Process  6  has  [12 25]
Process  6  has  [33 46]
Process  6  has  [63 76]
Process  6  has  [78 91]
[stdout:7] 
Process  4  has  [8 9]
Process  4  has  [ 8 17]