# Adding up a big array

In [None]:
from __future__ import print_function
import time
import numpy as np

Let's make a numpy array of $10^8$ random numbers, and add them up. How long does it take?

In [None]:
n_values = int(1e8)

start = time.time()

arr = np.random.random(n_values)
result = np.sum(arr)
print("Result: ", result)

duration = time.time() - start
print(duration * 1000, "ms")

Given how fast that was, how many numbers could we add up in an hour?

In [None]:
# n_values / duration is number of values we sum up per second
# multiply by 60 sec/min; 60 min/hour for number per hour
print("{:g}".format(n_values / duration * 60*60))

Questions:

* Could we just put in that number for `n_values`, run it, and have the sum completed in an hour or so?
* How much memory would be needed for that array?
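
One way to check the memory question is a little arithmetic. This is only a rough sketch: the helper `array_bytes` and the value of `assumed_duration` are made up for illustration (substitute the duration you actually measured), and it assumes the array holds 8-byte `float64` values, which is what `np.random.random` returns.

```python
import numpy as np

def array_bytes(n_values, dtype=np.float64):
    """Bytes needed to hold n_values elements of the given dtype in memory."""
    return n_values * np.dtype(dtype).itemsize

# Hypothetical timing: assume the 1e8-element run took about 1 second.
assumed_duration = 1.0  # seconds (an assumption; use your own measurement)
values_per_hour = int(1e8 / assumed_duration * 60 * 60)

needed = array_bytes(values_per_hour)
print("{:g} values need about {:.1f} TB".format(values_per_hour, needed / 1e12))
```

A slower or faster machine changes the exact figure, but not the conclusion that the array is far larger than typical laptop RAM.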

## Using dask

If you worked out how much memory that array would need, you should have ended up with a number on the order of terabytes. My laptop definitely does not have terabytes of RAM. By splitting the problem up into smaller tasks, dask can solve the whole problem with much less memory usage.

Compare the code below to the numpy code at the top of this notebook: nearly identical! Dask arrays mimic the numpy array API (as much as possible), making it easy to convert numpy code to dask code. Under the hood, dask uses numpy within each task, because numpy is already very fast.

In [None]:
import dask.array as da

In [None]:
%%time
# chunk sizes must be integers, so use floor division (// also works on Python 2)
d_arr = da.random.random(n_values, chunks=n_values // 10)
dask_sum = da.sum(d_arr)
print("Result:", dask_sum.compute())
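
To get a feel for why chunking keeps memory usage low, here is a plain-numpy sketch of the same idea: generate one chunk, sum it, discard it, and keep a running total. This is not how dask is actually implemented (dask builds a task graph and can run chunks in parallel); `chunked_sum` and its parameters are hypothetical names used only for illustration.

```python
import numpy as np

def chunked_sum(n_values, n_chunks=10, seed=0):
    """Sum n_values random numbers while holding only one chunk in memory."""
    rng = np.random.RandomState(seed)
    # Guard against a zero chunk size when n_values < n_chunks.
    chunk_size = max(1, n_values // n_chunks)
    total = 0.0
    remaining = n_values
    while remaining > 0:
        size = min(chunk_size, remaining)
        chunk = rng.random_sample(size)  # only this chunk is in memory
        total += chunk.sum()             # numpy does the fast per-chunk work
        remaining -= size
    return total

print("Result:", chunked_sum(int(1e6)))
```

Peak memory here is set by the chunk size rather than by `n_values`, which is the property that lets the chunked approach handle arrays that would not fit in RAM.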

In [None]:
dask_sum.visualize()

Questions:

* Why does the dask graph group these tasks into groups of 4 (on my laptop)?
* What is the speedup for this process? How does that compare to the theoretical speedup?