### Parallel Average (Python) [4 points]

Consider the following program. It uses two _worker threads_ in Python to search for an element in an array. The program won't be faster than a sequential one, but it illustrates the concepts. The two workers do not communicate with each other; the main program collects their results. This is thus an example of "embarrassing parallelism", where concurrency is used to potentially achieve a speedup.

In [None]:
from threading import Thread

class Worker(Thread):
    def __init__(self, a, x, l, u):
        super().__init__()
        self.a, self.x, self.l, self.u = a, x, l, u
    def run(self):
        # search a[l:u] for x and record the result in self.found
        for i in range(self.l, self.u):
            if self.a[i] == self.x:
                self.found = True
                return
        self.found = False

def parallelfind():
    # populate array a with N "random" values
    N = 100; a = [i for i in range(N)]

    # search in parallel for 42
    w0 = Worker(a, 42, 0, N // 2) # w0 searches lower half
    w1 = Worker(a, 42, N // 2, N) # w1 searches upper half
    w0.start(); w1.start()
    w0.join(); w1.join()
    print(w0.found, w1.found)

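Creating a thread only runs the constructor of the class; method `start()` needs to be called to execute method `run()` concurrently with the caller. Method `join()` blocks the caller until that thread's `run()` has returned, so `w0.found` and `w1.found` are read only after both searches are complete.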
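A minimal sketch of this life cycle, using a hypothetical `Flag` thread whose `run()` only sets an attribute: constructing the object does not call `run()`, while after `start()` and `join()` it has run to completion.

```python
from threading import Thread

class Flag(Thread):
    # Hypothetical example thread: run() only records that it was executed.
    def __init__(self):
        super().__init__()
        self.ran = False

    def run(self):
        self.ran = True

t = Flag()           # only __init__ runs; no new thread exists yet
print(t.ran)         # False
t.start()            # run() now executes concurrently in a new thread
t.join()             # block until run() has returned
print(t.ran)         # True
print(t.is_alive())  # False: the thread has finished
```

A `Thread` object can be started only once; calling `start()` a second time on the same object raises a `RuntimeError`.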

Run the next cell to test whether `42` appears in the lower half or upper half:

In [None]:
parallelfind()
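The same pattern extends to more than two workers. The sketch below is a hypothetical generalization, `parallelfind_k`, in which worker `i` searches the indices from `i * n // k` up to `(i + 1) * n // k`; these bounds cover the whole list without gaps or overlap, even when `k` does not divide `n`. The `Worker` class repeats the one above, except that `found` is initialized in the constructor.

```python
from threading import Thread

class Worker(Thread):
    # Searches a[l:u] for x; found is False until a match is seen.
    def __init__(self, a, x, l, u):
        super().__init__()
        self.a, self.x, self.l, self.u = a, x, l, u
        self.found = False

    def run(self):
        for i in range(self.l, self.u):
            if self.a[i] == self.x:
                self.found = True
                return

def parallelfind_k(a, x, k):
    # Hypothetical helper: start k workers on contiguous slices, wait for all.
    n = len(a)
    workers = [Worker(a, x, i * n // k, (i + 1) * n // k) for i in range(k)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [w.found for w in workers]

print(parallelfind_k(list(range(100)), 42, 4))  # [False, True, False, False]
```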

---

The task is to compute the average of `n` numbers `a(0)`, ..., `a(n - 1)`. For example, for `n = 5`, the average can be computed in different ways:

      (a(0) + a(1) + a(2) + a(3) + a(4)) / 5
    = a(0) / 5 + a(1) / 5 + a(2) / 5 + a(3) / 5 + a(4) / 5
    = (a(0) + a(1) + a(2)) / 5 + (a(3) + a(4)) / 5

The last variant suggests a computation in parallel: one thread computes `(a(0) + a(1) + a(2)) / 5`, and a second thread computes `(a(3) + a(4)) / 5`; the main program collects the results of the two threads and adds them.
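The last line of the derivation can be checked directly. In this sketch, the hypothetical helper `two_part_average` splits the list at index `m` and adds the two partial results, each divided by the full length `n`; with `m = 3` and `n = 5` it reproduces the split above.

```python
def two_part_average(a, m):
    # Hypothetical helper: (a(0) + ... + a(m - 1)) / n + (a(m) + ... + a(n - 1)) / n
    n = len(a)
    return sum(a[:m]) / n + sum(a[m:]) / n

a = [1, 2, 3, 4, 5]
print(sum(a) / len(a))         # 3.0
print(two_part_average(a, 3))  # (1 + 2 + 3) / 5 + (4 + 5) / 5
```

Mathematically the two expressions are equal, but in floating-point arithmetic two divisions followed by an addition can round differently from a single division, so the results may differ in the last digits. This explains why the parallel run with `n = 10000` below prints `501.85519999999997` instead of `501.8552`.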

The program below computes the average of `n` random integers sequentially; you are asked to complete the parallel computation with two workers, following `parallelfind`. The average is computed both ways, and the time taken by each computation is printed. [2 points]

In [13]:
from threading import Thread
import random, time

class Worker(Thread):
    def __init__(self, a, l, u):
        super().__init__()
        self.a, self.l, self.u = a, l, u
    def run(self):
        # divide the partial sum by the length of the whole list, not by u - l,
        # so that the partial results of the workers add up to the average
        self.average = sum(self.a[self.l:self.u]) / len(self.a)

def sequentialaverage(a):
    return sum(a) / len(a)

def parallelaverage(a):
    n = len(a)
    m0 = Worker(a, 0, n // 2); m1 = Worker(a, n // 2, n)
    m0.start(); m1.start()
    m0.join(); m1.join()
    return m0.average + m1.average

def average(n):
    a = [random.randint(0, 1000) for i in range(n)]

    start = time.time_ns() / 1000
    avg = sequentialaverage(a)
    end = time.time_ns() / 1000
    print("Sequential:", avg, "Time:", end - start, "µs")

    start = time.time_ns() / 1000
    avg = parallelaverage(a)
    end = time.time_ns() / 1000
    print("Parallel:", avg, "Time:", end - start, "µs")

Test your implementation with the cells below; you may use more cells.

In [14]:
average(10)

Sequential: 432.4 Time: 3.0 µs
Parallel: 432.4 Time: 2400.75 µs
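As an alternative to storing results in `Worker` attributes, the standard library's `concurrent.futures.ThreadPoolExecutor` hands each thread's result back through a future. The sketch below, with the hypothetical names `partial` and `poolaverage`, splits the list into `k` parts. It still uses threads, so it should not be expected to be faster than `parallelaverage`.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def partial(a, l, u):
    # Contribution of a[l:u] to the average of the whole list a.
    return sum(a[l:u]) / len(a)

def poolaverage(a, k=2):
    # Hypothetical variant of parallelaverage using k pooled threads;
    # f.result() waits for the corresponding task, much like join().
    n = len(a)
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(partial, a, i * n // k, (i + 1) * n // k)
                   for i in range(k)]
        return sum(f.result() for f in futures)

a = [random.randint(0, 1000) for _ in range(1000)]
print(sum(a) / len(a), poolaverage(a), poolaverage(a, 4))
```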


Run your implementation with the following values of `n`; you may also include more values. As each run can produce different timing results, run your implementation with the same value of `n` several times. The above program measures the elapsed time, not the CPU time. If there are other processes (users) on the same CPU, the elapsed time will be larger than the CPU time. If you are using a server, choose a time of the day with few other users. In multiple runs with the same parameter, smaller times approximate the CPU time better.

In [5]:
average(10)

Sequential: 330.1 Time: 3.0 µs
Parallel: 330.1 Time: 2086.25 µs


In [6]:
average(100)

Sequential: 470.28 Time: 3.0 µs
Parallel: 470.28 Time: 1086.5 µs


In [7]:
average(1000)

Sequential: 500.205 Time: 8.75 µs
Parallel: 500.205 Time: 810.0 µs


In [8]:
average(10000)

Sequential: 501.8552 Time: 84.25 µs
Parallel: 501.85519999999997 Time: 1508.0 µs


In [9]:
average(100000)

Sequential: 499.02917 Time: 691.25 µs
Parallel: 499.02917 Time: 2090.0 µs


In [10]:
average(1000000)

Sequential: 500.129648 Time: 9811.0 µs
Parallel: 500.129648 Time: 20299.75 µs


In [11]:
average(10000000)

Sequential: 500.0328096 Time: 160008.0 µs
Parallel: 500.0328096 Time: 179737.0 µs


In [12]:
average(100000000)

Sequential: 500.0117042 Time: 1919722.75 µs
Parallel: 500.0117042 Time: 1956827.0 µs
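The advice above to run each measurement several times and prefer the smaller times can be automated. The hypothetical helper `measure` below calls a function repeatedly and keeps the minimum of two clocks: `time.perf_counter_ns()` for elapsed time and `time.process_time_ns()` for the CPU time of the current process (summed over all its threads, and not counting time during which other processes occupy the CPU). On some platforms `process_time_ns()` has coarse resolution, so very short runs may report 0.

```python
import random
import time

def measure(f, *args, repeat=5):
    # Hypothetical helper: call f(*args) `repeat` times and return the smallest
    # elapsed time and the smallest CPU time observed, both in microseconds.
    best_wall = best_cpu = float("inf")
    for _ in range(repeat):
        w0, c0 = time.perf_counter_ns(), time.process_time_ns()
        f(*args)
        w1, c1 = time.perf_counter_ns(), time.process_time_ns()
        best_wall = min(best_wall, (w1 - w0) / 1000)
        best_cpu = min(best_cpu, (c1 - c0) / 1000)
    return best_wall, best_cpu

a = [random.randint(0, 1000) for _ in range(1_000_000)]
wall, cpu = measure(lambda x: sum(x) / len(x), a)
print("Elapsed:", wall, "µs  CPU:", cpu, "µs")
```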


What do you observe about the execution time of the sequential and parallel implementations? Explain, citing resources of the Python documentation! [2 points]

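When running the tests, I observed that the sequential computation was faster than the parallel computation for every list size. For small lists the difference is dramatic (3.0 µs versus 2086.25 µs for `n = 10`), because the parallel time is dominated by the overhead of creating, starting, and joining two threads. As `n` grows, the relative gap shrinks (1919722.75 µs versus 1956827.0 µs for `n = 100000000`), but the parallel version never becomes faster. The reason is CPython's global interpreter lock (GIL): only one thread can execute Python code at a time, so the two workers take turns instead of running truly in parallel, even on a multi-core CPU. The documentation of the `threading` module notes this as a CPython implementation detail and recommends `multiprocessing` or `concurrent.futures.ProcessPoolExecutor` for making better use of multi-core machines.

Source: https://realpython.com/python-gil/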
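Following the `threading` documentation's pointer to `concurrent.futures.ProcessPoolExecutor`, a minimal sketch of a process-based average, with the hypothetical names `split`, `partial_average`, and `processaverage`: each part of the list is summed in a separate process with its own interpreter and its own GIL. Each part must be pickled and sent to a worker process, which for an operation as cheap as `sum` can cost as much as the work itself, so a speedup is not guaranteed.

```python
from concurrent.futures import ProcessPoolExecutor

def split(a, k):
    # Split a into k contiguous slices whose lengths differ by at most one.
    n = len(a)
    return [a[i * n // k:(i + 1) * n // k] for i in range(k)]

def partial_average(part, n):
    # Contribution of one slice to the average of the whole list of length n.
    # Defined at top level so that it can be pickled for the worker processes.
    return sum(part) / n

def processaverage(a, k=2):
    # Hypothetical process-based variant of parallelaverage.
    n = len(a)
    with ProcessPoolExecutor(max_workers=k) as pool:
        return sum(pool.map(partial_average, split(a, k), [n] * k))
```

Run it as a script and call `processaverage(a)` only under an `if __name__ == "__main__":` guard, because with the spawn start method the worker processes import the main module again. In a Jupyter notebook on such platforms, the worker function may have to be moved to a separate module.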