
**ASSIGNMENT ANSWERS AT BOTTOM**

##**Assignments**

In the following exercises, you'll be working with the code snippet below:

%%timeit
x = da.random.random((10000, 10000), chunks=(1000, 1000))

y = x + x.T

z = y[::2, 5000:].mean(axis=1)

z.compute()


To complete this assignment, create a Jupyter Notebook containing your solutions to the following tasks and submit it as a link on GitHub.

Change the code above by setting chunks=(250, 250).

 How long does it take to run?

Now, set the parameter to chunks=(500, 500).

 How long does it take to run? 

Which one runs faster, this one or the previous one? Why?

In [2]:
!pip install --upgrade "dask[complete]"

Collecting dask[complete]
  Downloading https://files.pythonhosted.org/packages/2a/00/1d9d5a0a6e9b500dd82b280298966db03d32f133bba7805a8a459ef486b6/dask-2.16.0-py3-none-any.whl (802kB)
     |████████████████████████████████| 808kB 4.9MB/s 
Collecting partd>=0.3.10; extra == "complete"
  Downloading https://files.pythonhosted.org/packages/44/e1/68dbe731c9c067655bff1eca5b7d40c20ca4b23fd5ec9f3d17e201a6f36b/partd-1.1.0-py3-none-any.whl
Collecting distributed>=2.0; extra == "complete"
  Downloading https://files.pythonhosted.org/packages/c0/7f/58454dac9c2603f926b9a5abd260a7c7818ece31cee4c3e46dad6aad8bb3/distributed-2.16.0-py3-none-any.whl (629kB)
     |████████████████████████████████| 634kB 28.5MB/s 
Collecting locket
  Downloading https://files.pythonhosted.org/packages/d0/22/3c0f97614e0be8386542facb3a7dcfc2584f7b83608c02333bced641281c/locket-0.2.0.tar.gz
Collecting contextvars; python_version < "3.7"
  Downloading https://files.pythonhosted.org/packages/83/96/55b82

In [3]:
pip install -U ipykernel

Requirement already up-to-date: ipykernel in /usr/local/lib/python3.6/dist-packages (5.2.1)


In [1]:
import warnings
warnings.filterwarnings("ignore")

from dask.distributed import Client, progress

client = Client(n_workers=4, threads_per_worker=2, memory_limit='2GB')
client

| Client | Cluster |
|---|---|
| Scheduler: tcp://127.0.0.1:37361 | Workers: 4 |
| Dashboard: http://127.0.0.1:8787/status | Cores: 8 |
| | Memory: 8.00 GB |


##**Using Dask as if it's NumPy**
We'll create random Dask arrays and perform mathematical calculations on them.

We'll then do the same thing with NumPy arrays and compare the run times.

###**Creating a random array**
To use Dask arrays, we import the module as follows:

In [0]:
import dask.array as da
import numpy as np

Below we create a 10000x10000 array of random numbers.

Note that we set the chunks parameter to (1000, 1000).

This differs from what we normally do when generating random arrays with NumPy.

By setting chunks, we tell Dask to represent the array as many NumPy arrays of size 1000x1000 (or smaller if the array cannot be divided evenly).

In our case, there will be 100 NumPy arrays of size 1000x1000.
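
The splitting rule described above can be sketched in plain Python. This is only an illustration of the arithmetic, not Dask's implementation, and `chunk_sizes` is a hypothetical helper name. A Dask array exposes its actual chunk layout through its `.chunks` attribute.

```python
def chunk_sizes(length, chunk):
    """Split an axis of `length` elements into pieces of at most `chunk` elements."""
    full, rest = divmod(length, chunk)
    # All pieces have the requested size except a smaller last one when
    # the axis length is not a multiple of the chunk size.
    return (chunk,) * full + ((rest,) if rest else ())

rows = chunk_sizes(10000, 1000)
cols = chunk_sizes(10000, 1000)
print(len(rows), len(cols))       # 10 10
print(len(rows) * len(cols))      # 100 blocks in total

# An axis that does not divide evenly ends with a smaller piece.
print(chunk_sizes(10000, 3000))   # (3000, 3000, 3000, 1000)
```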

What we do below is:

**We first create a random Dask array of size 10000x10000.**

**Then we add this array to its transpose**.

**Last, we slice the resulting array (every second row, columns 5000 onward) and calculate the mean of each row**.

As usual, we call .compute() to make Dask evaluate the results.
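
To make the three steps concrete, here is a minimal sketch on a 4x4 list of lists with fixed values. It uses plain Python rather than Dask or NumPy, and the small size and the values are assumptions chosen so the result can be checked by hand.

```python
# Toy 4x4 stand-in for the 10000x10000 array, with fixed values instead of random ones.
n = 4
x = [[r * n + c for c in range(n)] for r in range(n)]

# Steps 1 and 2: add the matrix to its transpose (y[r][c] = x[r][c] + x[c][r]).
y = [[x[r][c] + x[c][r] for c in range(n)] for r in range(n)]

# Step 3a: keep every second row and the right half of the columns,
# mirroring y[::2, 5000:] on the full-size array.
sub = [row[n // 2:] for row in y[::2]]

# Step 3b: mean(axis=1) averages across each remaining row, one value per row.
z = [sum(row) / len(row) for row in sub]
print(z)  # [12.5, 22.5]
```

On the full-size array the slice has shape (5000, 5000), so `z` holds 5000 row means.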

Note that we measure the run time of the following cell with the Jupyter Notebook **magic command %%time**.

In [5]:
%%time
x = da.random.random((10000, 10000), chunks=(1000, 1000))
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z.compute()

CPU times: user 372 ms, sys: 69.4 ms, total: 441 ms
Wall time: 1.56 s
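
The output above reports CPU time and wall time separately. Outside a notebook, a comparable measurement can be sketched with the standard library; `report_times` is a hypothetical helper, not part of IPython or Dask. One plausible reason (an assumption, not verified here) that the wall time exceeds the CPU time in the Dask cell is that most of the work runs in the worker processes started by `Client`, while the CPU figure covers the notebook process only.

```python
import time
from contextlib import contextmanager

@contextmanager
def report_times(label):
    """Print CPU and wall-clock time for the enclosed block, similar in spirit to %%time."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process only
    result = {}
    try:
        yield result
    finally:
        result["wall"] = time.perf_counter() - wall_start
        result["cpu"] = time.process_time() - cpu_start
        print(f"{label}: CPU {result['cpu']:.3f} s, wall {result['wall']:.3f} s")

# Waiting costs wall time but almost no CPU time in this process,
# much like waiting for results computed elsewhere.
with report_times("sleep") as t:
    time.sleep(0.2)
```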


**Now, let's do the same thing using NumPy arrays:**

In [4]:
%%time
x = np.random.random((10000, 10000))
y = x + x.T
z = y[::2, 5000:].mean(axis=1)

CPU times: user 1.61 s, sys: 745 ms, total: 2.35 s
Wall time: 2.26 s


**THE CHUNKED DASK VERSION IS FASTER: 1.56 s VS. 2.26 s WALL TIME**

**/////////////////////////////////////////////////////////////////**

##**Persisting data in memory -- IF POSSIBLE**

**First, we make our computations without persisting the array**

In [0]:
x = da.random.random((10000, 10000), chunks=(1000, 1000))

In [7]:
%%time
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z.compute()

CPU times: user 265 ms, sys: 54.7 ms, total: 320 ms
Wall time: 1.42 s


Now we do the same thing, this time **after persisting our array in memory**

In [8]:
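x = da.random.random((10000, 10000), chunks=(1000, 1000))
# persist() returns a new array backed by in-memory chunks; it does not modify x in place,
# so assign the result back to x for the later cells to use it
x = x.persist()
x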

| | Array | Chunk |
|---|---|---|
| Bytes | 800.00 MB | 8.00 MB |
| Shape | (10000, 10000) | (1000, 1000) |
| Count | 100 Tasks | 100 Chunks |
| Type | float64 | numpy.ndarray |


**Then we run the same computations as above after persisting the array in memory**

In [9]:
%%time
y = x + x.T
z = y[::2, 5000:].mean(axis=1)
z.compute()

CPU times: user 200 ms, sys: 30 ms, total: 230 ms
Wall time: 856 ms


**FASTER (856 ms VS. 1.42 s WALL TIME), BUT NOT HALF THE TIME**
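
One detail worth keeping in mind when reading these timings: Dask's `persist()` returns a new collection rather than changing the array it is called on, so the result is usually assigned back (`x = x.persist()`). The toy class below is a plain-Python analogy of that behavior under simplified assumptions. `LazyChunks` is hypothetical and is not how Dask is implemented.

```python
class LazyChunks:
    """Toy stand-in for a lazy chunked array (an analogy, not Dask's implementation)."""

    def __init__(self, n_chunks, cached=None):
        self.n_chunks = n_chunks
        self._cached = cached
        self.generated = 0  # how many chunks this object has had to build

    def _build(self, i):
        self.generated += 1
        return [i] * 3

    def compute(self):
        # A persisted object returns its stored chunks; a lazy one rebuilds them.
        if self._cached is not None:
            return self._cached
        return [self._build(i) for i in range(self.n_chunks)]

    def persist(self):
        # Returns a NEW object holding the chunks; self stays lazy.
        return LazyChunks(self.n_chunks, cached=self.compute())

x = LazyChunks(4)
x.persist()              # result discarded, so x itself is still lazy
x.compute()
wasted = x.generated     # 8: built once for persist, then again for compute

x2 = LazyChunks(4)
x2 = x2.persist()        # keep the persisted object
x2.compute()
reused = x2.generated    # 0: the persisted object only returns stored chunks
print(wasted, reused)    # 8 0
```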

---



**////////////////////////////////////////////////////////////////////////////////**

In [10]:
%%timeit x = da.random.random((10000, 10000), chunks=(1000, 1000))

y = x + x.T

z = y[::2, 5000:].mean(axis=1)

z.compute()

1 loop, best of 3: 1.41 s per loop


**Change the code above by setting chunks=(250, 250).**

In [11]:
%%timeit x = da.random.random((10000, 10000), chunks=(250, 250))

y = x + x.T

z = y[::2, 5000:].mean(axis=1)

z.compute()

1 loop, best of 3: 5.23 s per loop


**MUCH SLOWER: 5.23 s VS. 1.41 s WITH chunks=(1000, 1000)**

**Now, set the parameter to chunks=(500, 500).**

In [12]:
%%timeit x = da.random.random((10000, 10000), chunks=(500, 500))

y = x + x.T

z = y[::2, 5000:].mean(axis=1)

z.compute()

1 loop, best of 3: 2.28 s per loop


**MUCH FASTER THAN chunks=(250, 250) (2.28 s VS. 5.23 s), BUT STILL SLOWER THAN chunks=(1000, 1000) (1.41 s)**

**NOT ASSIGNED BUT TRYING CHUNK SIZE=2000**

In [14]:
%%timeit x = da.random.random((10000, 10000), chunks=(2000, 2000))

y = x + x.T

z = y[::2, 5000:].mean(axis=1)

z.compute()

1 loop, best of 3: 1.37 s per loop


**SLIGHTLY FASTER STILL: 1.37 s VS. 1.41 s WITH chunks=(1000, 1000)**
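
A likely explanation for these timings, offered as an assumption rather than a measured result: smaller chunks mean many more pieces for the scheduler to track, and each piece carries some fixed overhead. The sketch below counts the chunks and the size of each chunk for the four settings tried. `chunk_summary` is a hypothetical helper, and the timings are copied from the `%%timeit` outputs recorded above.

```python
import math

shape = (10000, 10000)

# Seconds per loop, copied from the %%timeit outputs recorded above.
recorded_seconds = {250: 5.23, 500: 2.28, 1000: 1.41, 2000: 1.37}

def chunk_summary(shape, side, itemsize=8):
    """Return (number of chunks, megabytes per full chunk) for square float64 chunks."""
    blocks = 1
    for n in shape:
        blocks *= math.ceil(n / side)   # pieces along this axis
    megabytes = side * side * itemsize / 1e6
    return blocks, megabytes

for side, seconds in recorded_seconds.items():
    n_chunks, mb = chunk_summary(shape, side)
    print(f"chunks=({side}, {side}): {n_chunks:5d} chunks of {mb:5.1f} MB, "
          f"{seconds:.2f} s per loop")
```

Going from 250 to 1000 cuts the chunk count from 1600 to 100 and the run time drops sharply, while going from 1000 to 2000 (100 to 25 chunks) changes the time only slightly.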