### Reductions and Strong Scaling

A reduction applies an operator to all elements of a vector or matrix, typically to produce a scalar. Wikipedia says, 'Reduce is a collective communication primitive used in the context of a parallel programming model to combine multiple vectors into one, using an associative binary operator $\oplus$.' It is easiest to envision this operation on a tree, with addition as the operator:

![this](https://upload.wikimedia.org/wikipedia/commons/e/ee/Binomial_tree.gif)

The most common way to execute a reduction is as a tree hierarchy over blocks (chunks) of elements:

  1. reduce each chunk to a scalar
  2. build new chunks from the scalars output by step 1
  3. repeat until a single scalar remains

More complex approaches that pipeline intermediate results may be faster on some architectures, but they are not important here.
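The chunked tree reduction described above can be sketched in plain Python. This is a minimal serial illustration, not how Dask implements it; the function name `tree_reduce` and the `chunk_size` parameter are made up for this example. In a parallel setting, the chunks at each level would be reduced concurrently.

```python
from operator import add


def tree_reduce(values, op, chunk_size=2):
    """Reduce `values` with an associative binary operator, one tree level at a time."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    while len(level) > 1:
        next_level = []
        # step 1: reduce each chunk to a scalar
        for i in range(0, len(level), chunk_size):
            chunk = level[i:i + chunk_size]
            acc = chunk[0]
            for x in chunk[1:]:
                acc = op(acc, x)
            next_level.append(acc)
        # step 2: the scalars from this level form the chunks of the next level
        level = next_level
    return level[0]


total = tree_reduce(range(1, 9), add)
largest = tree_reduce([3, 1, 4, 1, 5, 9, 2, 6], max, chunk_size=3)
```

Because the operator only has to be associative, the same function works for `add`, `max`, `min`, and similar aggregations.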

Dask and other frameworks do this implicitly when you call aggregation functions such as <code>mean, min, max, sum</code>, and explicitly, with a user-defined function, through <code>dask.bag.Bag.fold()</code> and <code>dask.bag.Bag.foldby()</code>.

Let's compute an aggregate on our turbulent field.

In [None]:
import dask.array as da
import h5py

# load a file and grab the data
f = h5py.File("../../input/isotropic4096.h5","r")
d = f['u00000']

# convert data into a dask array and take the velocity magnitude
uvec = da.from_array(d[0,:,:,:], chunks=(512, 512, 3))
umag = da.linalg.norm(uvec, axis=2)

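# hint dask to keep this around; persist() returns a new collection,
# so the result must be assigned back for the hint to take effect
umag = umag.persist()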

# compute the maximum velocity 
umag.max().compute()

Design an experiment to measure the speedup of this computation for 1, 2, 4, 8 workers.  From that speedup, infer the Amdahl number of this reduction.
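As a reminder of the quantities involved, here is a short derivation. It assumes that "Amdahl number" means the parallel fraction $p$ in Amdahl's law and that $T(n)$ is the measured run time on $n$ workers; both symbols are introduced here for illustration.

$$S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}$$

Amdahl's law predicts the speedup for a program whose parallel fraction is $p$:

$$S(n) = \frac{1}{(1 - p) + \frac{p}{n}}$$

Solving for $p$ gives an estimate from any measured speedup with $n > 1$:

$$p = \frac{1 - \frac{1}{S(n)}}{1 - \frac{1}{n}}$$

For example, a speedup of $S(4) = 2$ would imply $p = \frac{1 - 1/2}{1 - 1/4} = \frac{2}{3}$.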

In [None]:
import time
import pandas as pd

# compute once to get any caches warm
umag.max().compute()

# list to store timings
exptimes=[]

for cores in [1,2,4,8]:
    for trials in range(20):
        start = time.perf_counter()
        # TODO: run the reduction here, telling dask how many cores to use
        tdiff = time.perf_counter() - start
        exptimes.append([cores,tdiff])    

df = pd.DataFrame(exptimes, columns = ["cores","time"])
df

### Look at the raw data

Describe the data frame to show summary statistics and then plot the raw data.

In [None]:
df.groupby('cores').describe()

In [None]:
%matplotlib inline
df.plot(x='cores', y='time', kind='scatter')

### Speedup Chart

Use your data to make a speedup chart.

In [None]:
# TODO 

### Parallel Efficiency Chart

Do the same for a parallel efficiency chart.

In [None]:
# TODO 

# resetting the index turns index labels (e.g. after a groupby)
# back into a regular data column
df.reset_index(inplace=True)


### Estimate the Amdahl Number

In [None]:
#TODO
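One possible way to approach the estimate, sketched with the standard library only. It assumes the "Amdahl number" is the parallel fraction $p$ in Amdahl's law, $S(n) = 1 / ((1 - p) + p/n)$, which rearranges to $p = (1 - 1/S(n)) / (1 - 1/n)$. The helper name `amdahl_fraction` and the timings are hypothetical; the numbers are made up for illustration and are not measurements.

```python
from statistics import median


def amdahl_fraction(times_by_cores):
    """Estimate the parallel fraction p for every worker count n > 1.

    `times_by_cores` maps a worker count to a list of measured run times
    and must contain an entry for 1 worker as the baseline.
    """
    t1 = median(times_by_cores[1])
    estimates = {}
    for n, times in sorted(times_by_cores.items()):
        if n == 1:
            continue
        speedup = t1 / median(times)
        # invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n)
        estimates[n] = (1 - 1 / speedup) / (1 - 1 / n)
    return estimates


# Made-up timings in seconds, for illustration only (not measurements).
example = {1: [8.0, 8.0, 8.0], 2: [5.0, 5.0, 5.0], 4: [3.5, 3.5, 3.5]}
fractions = amdahl_fraction(example)
```

With real data, the per-core estimates will generally differ from one another. Using the median rather than the mean makes each estimate less sensitive to outlier trials.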

### Discussion

* Why did scaling stop after 4 cores?
  * My laptop has 4 cores and 8 threads. What does this mean?
* Why is the Amdahl number so low?
