### Parallel Average (Java) [6 points]

Consider the following program, which uses two _worker threads_ in Java to search for an element in an array. It is not necessarily faster than a sequential search, but it illustrates the concepts. The two workers do not communicate with each other; the main program collects their results. This makes it an example of an "embarrassingly parallel" problem, where concurrency is used to potentially achieve a speedup.

In [5]:
%%writefile ParallelFind.java
class Worker extends Thread {
    int[] a;
    int x, l, u;
    boolean found;
    public Worker(int[] a, int x, int l, int u) {
        // a.length > 0 && 0 <= l <= u <= a.length
        this.a = a; this.x = x;
        this.l = l; this.u = u;
    }
    public void run() {
        for (int i = l; i < u; i++)
            if (a[i] == x) {found = true; return;}
        found = false;
    }
}

public class ParallelFind {
    public static void main(String args[]) {
        // populate array a with N "random" values
        final int N = 100;
        final int[] a = new int[N];
        for (int i = 0; i < N; i++) a[i] = i;

        // search in parallel for 42
        Worker w0 = new Worker(a, 42, 0, N / 2); // w0 searches lower half
        Worker w1 = new Worker(a, 42, N / 2, N); // w1 searches upper half
        w0.start(); w1.start();
        try {
            w0.join(); w1.join();
        } catch (Exception e) {}
        System.out.println(w0.found + " " + w1.found);
    }
}

Overwriting ParallelFind.java


Creating a thread object only runs the constructor of the class; `start()` must be called to execute `run()` concurrently with the caller.
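The difference between `start()` and a direct call to `run()` can be checked with a small experiment. The sketch below is not part of the assignment; the class names `StartVsRun` and `Recorder` are made up for illustration. `run()` records which thread executed it, and the helper compares that with the calling thread.

```java
// Hypothetical demo class for illustration; not part of the assignment code.
class StartVsRun {
    // A thread whose run() records which thread actually executed it.
    static class Recorder extends Thread {
        volatile Thread executedBy;
        @Override
        public void run() {
            executedBy = Thread.currentThread();
        }
    }

    // Returns true if run() was executed by a thread other than the caller.
    static boolean ranOnOtherThread(boolean useStart) {
        Recorder t = new Recorder();  // only the constructor runs here
        if (useStart) {
            t.start();                // a new thread executes run()
            try {
                t.join();             // wait until that thread has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } else {
            t.run();                  // ordinary method call on the calling thread
        }
        return t.executedBy != Thread.currentThread();
    }

    public static void main(String[] args) {
        System.out.println("start(): " + ranOnOtherThread(true));
        System.out.println("run():   " + ranOnOtherThread(false));
    }
}
```

With `start()` the helper returns `true`; with a direct call to `run()` it returns `false`, because no second thread is involved.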

Run the next cells to test whether `42` appears in the lower half or upper half:

In [10]:
!javac ParallelFind.java

In [11]:
!java ParallelFind

true false


---

The task is to compute the average of `n` numbers `a(0)`, ..., `a(n - 1)`. For example, for `n = 5`, the average can be computed in different ways:

      (a(0) + a(1) + a(2) + a(3) + a(4)) / 5
    = a(0) / 5 + a(1) / 5 + a(2) / 5 + a(3) / 5 + a(4) / 5
    = (a(0) + a(1) + a(2)) / 5 + (a(3) + a(4)) / 5

The last variant suggests a computation in parallel: one thread computes `(a(0) + a(1) + a(2)) / 5`, and a second thread computes `(a(3) + a(4)) / 5`; the main program collects the results of the two threads and adds them.
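The same split works for any split point, not only for `n = 5`. As a minimal sequential sketch (the class `SplitAverage`, its methods, and the sample values are assumptions for illustration, not part of the assignment), each partial sum is divided by the length of the whole array, so the two contributions add up to the average:

```java
// Hypothetical helper class for illustration; names are not from the assignment.
class SplitAverage {
    // Sum of a[l..u-1], divided by the length of the whole array.
    static double partial(int[] a, int l, int u) {
        double s = 0;
        for (int i = l; i < u; i++) s += a[i];
        return s / a.length;
    }

    // Average computed from two partial results, split at index m
    // (assumes 0 <= m <= a.length and a.length > 0).
    static double splitAverage(int[] a, int m) {
        return partial(a, 0, m) + partial(a, m, a.length);
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5};  // sample values, n = 5
        System.out.println("whole array: " + partial(a, 0, a.length));
        System.out.println("split at 3:  " + splitAverage(a, 3));
    }
}
```

Because each partial result is rounded by its own division, the split result can differ from the single-division result in the last digits of a `double`.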

The program below computes the average of `n` random integers sequentially; you are asked to complete the parallel computation with two workers, following `ParallelFind`. The average is computed in both ways, and the times the sequential and parallel computation take are printed. The program reads `n` from the command line to make testing easier. [4 points]

In [29]:
%%writefile Average.java
import java.util.Random;

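class Worker extends Thread {
    int a[];         // array shared with the main program (only read)
    int l;           // lower bound of this worker's range, inclusive
    int u;           // upper bound of this worker's range, exclusive
    double average;  // this worker's contribution to the overall average

    public Worker(int a[], int l, int u){
        // a.length > 0 && 0 <= l <= u <= a.length
        this.a = a;
        this.l = l;
        this.u = u;
    }

    public void run(){
        // sum a[l..u-1], then divide by the total length so that the
        // contributions of all workers add up to the overall average
        double s = 0;
        for (int i = l; i < u; i++) s += a[i];
        average = s / a.length;
    }
}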

public class Average {
    static double sequentialaverage(int a[]) {
        // a.length > 0
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s / a.length;
    }
    static double parallelaverage(int a[]) {
        // a.length > 0
        int midpoint = a.length / 2;
        Worker w0 = new Worker(a, 0, midpoint);
        Worker w1 = new Worker(a, midpoint, a.length);
        w0.start(); w1.start();
        try{
            w0.join(); w1.join();
        } catch (Exception e) {}
        return w0.average + w1.average;
    }
    public static void main(String args[]) {
        int n = Integer.parseInt(args[0]); // compute the average of n random numbers
        int[] a = new int[n];
        Random rand = new Random();
        for (int i = 0; i < n; i++) a[i] = rand.nextInt(10000);
        
        long start = System.currentTimeMillis();
        double avg = sequentialaverage(a);
        long end = System.currentTimeMillis();
        System.out.println("Sequential: " + avg + " Time: " + (end - start) + " ms");

        start = System.currentTimeMillis();
        avg = parallelaverage(a);
        end = System.currentTimeMillis();
        System.out.println("Parallel: " + avg + " Time: " + (end - start) + " ms");
    }
}

Overwriting Average.java


Test your implementation with the cells below; you may use more cells.

In [31]:
!javac Average.java



Run your implementation with the following values of `n`; you may also include more values. As each run can produce different timing results, run your implementation with the same value of `n` several times. The above program measures the elapsed time, not the CPU time. If there are other processes (users) on the same CPU, the elapsed time will be larger than the CPU time. If you are using a server, choose a time of the day with few other users. In multiple runs with the same parameter, smaller times approximate the CPU time better.
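One way to act on the last remark is to repeat a measurement inside the program and keep the smallest elapsed time. The following is only a sketch under assumptions: the class `MinTiming`, the helper `minElapsedNanos`, and the choice of 5 runs are made up for illustration, and it uses `System.nanoTime()` rather than the `System.currentTimeMillis()` of the program above.

```java
// Hypothetical timing helper for illustration; not part of the assignment code.
class MinTiming {
    // Runs the task `runs` times (assumes runs >= 1) and returns the
    // smallest elapsed wall-clock time in nanoseconds.
    static long minElapsedNanos(Runnable task, int runs) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (elapsed < best) best = elapsed;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] a = new int[1_000_000];
        for (int i = 0; i < a.length; i++) a[i] = i % 10000;
        double[] result = new double[1];  // holder so the lambda can store the average
        long best = minElapsedNanos(() -> {
            double s = 0;
            for (int x : a) s += x;
            result[0] = s / a.length;
        }, 5);
        System.out.println("average = " + result[0]
            + ", best of 5 runs: " + best / 1_000_000.0 + " ms");
    }
}
```

The minimum is still an elapsed time, not a CPU time; it only reduces the influence of other processes on the reported value.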

In [18]:
!java Average 10

Sequential: 5204.1 Time: 0 ms
Parallel: 5204.1 Time: 1 ms


In [19]:
!java Average 100

Sequential: 4959.14 Time: 0 ms
Parallel: 4959.14 Time: 1 ms


In [20]:
!java Average 1000

Sequential: 5004.766 Time: 0 ms
Parallel: 5004.766 Time: 2 ms


In [21]:
!java Average 10000

Sequential: 5020.5233 Time: 0 ms
Parallel: 5020.5233 Time: 2 ms


In [22]:
!java Average 100000

Sequential: 5003.14827 Time: 2 ms
Parallel: 5003.14827 Time: 4 ms


In [23]:
!java Average 1000000

Sequential: 4997.399788 Time: 4 ms
Parallel: 4997.399788000001 Time: 6 ms


In [24]:
!java Average 10000000

Sequential: 4999.8944607 Time: 15 ms
Parallel: 4999.894460699999 Time: 12 ms


In [25]:
!java Average 100000000

Sequential: 4999.25893807 Time: 112 ms
Parallel: 4999.25893807 Time: 67 ms


In [26]:
!java Average 1000000000

Sequential: 4999.56641348 Time: 1169 ms
Parallel: 4999.566413480001 Time: 593 ms


How large does `n` have to be for the parallel version to show a speedup? Add additional cells as you like. State your answer in the cell below! State the processor (model, frequency, number of cores) on which you ran the test; research how to find that out from the command line. [2 points]

In these tests the sequential version is faster up to about n = 1,000,000 (4 ms vs. 6 ms). From about n = 10,000,000 onward the parallel version is faster (12 ms vs. 15 ms), and the gap widens as n grows. At n = 1,000,000,000 the parallel run takes 593 ms against 1169 ms for the sequential run, which is 576 ms less and a speedup of roughly 2.

Processor: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 16 CPUs as reported by `lscpu` (16 sockets with 1 core each)
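The speedup can be quantified as the sequential time divided by the parallel time; a value above 1 means the parallel version is faster. The sketch below applies this to the timings printed by the runs above for `n` from 1,000,000 upward. The class `Speedup` is hypothetical, and the numbers come from single runs, so they are only indicative.

```java
// Hypothetical helper for illustration; timings are copied from the runs above.
class Speedup {
    // Speedup = sequential time / parallel time (assumes parallelMs > 0).
    static double speedup(long sequentialMs, long parallelMs) {
        return (double) sequentialMs / parallelMs;
    }

    public static void main(String[] args) {
        long[] n          = {1_000_000L, 10_000_000L, 100_000_000L, 1_000_000_000L};
        long[] sequential = {4, 15, 112, 1169};  // ms, from the output above
        long[] parallel   = {6, 12, 67, 593};    // ms, from the output above
        for (int i = 0; i < n.length; i++) {
            System.out.println("n = " + n[i] + ": speedup = "
                + speedup(sequential[i], parallel[i]));
        }
    }
}
```

With two workers the speedup cannot be expected to exceed 2, so the value for the largest `n` is close to the best case for this design.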


In [2]:
!lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  1
    Core(s) per socket:  1
    Socket(s):           16
    Stepping:            1
    BogoMIPS:            5993.05
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall n
                         x pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtop
                         ology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni
                          pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic mov
                         be popcnt tsc_deadline_timer aes xsave 