!["Anaconda"](img/anaconda-logo.png)
<br>
*Copyright Continuum 2012-2016 All Rights Reserved.*

# Dask and Imperative Programming

The **dask `do()` function** lets you construct custom dask graphs in an ordinary imperative coding style, rather than by building a dictionary explicitly.

# Table of Contents
* [Dask and Imperative Programming](#Dask-and-Imperative-Programming)
* [Imperative Programming](#Imperative-Programming)
	* [Custom graphs with `do`](#Custom-graphs-with-do)
	* [A Familiar Example](#A-Familiar-Example)
* [Exercise 1: Create a Graph](#Exercise-1:-Create-a-Graph)
* [Exercise 2: Parallel Estimation of Pi](#Exercise-2:-Parallel-Estimation-of-Pi)
* [Exercise 3: Small Tasks in Parallel](#Exercise-3:-Small-Tasks-in-Parallel)
* [Exercise 4: GIL vs Multiprocessing](#Exercise-4:-GIL-vs-Multiprocessing)


# Imperative Programming

Many problems don't fit cleanly into `ndarray` or `DataFrame` abstractions.  How can we use dask to parallelize more custom workloads?

We can always fall back to creating dictionaries manually:

    dsk = {'load-1': (load, filename1), 'clean-1': (clean, 'load-1'), ...,
           'load-2': (load, filename2), 'clean-2': (clean, 'load-2'), ...,
           ...}
    
Manual dictionary creation has some drawbacks:

* it can be tedious
* it is prone to programmer error
* it feels foreign to many developers

## Custom graphs with `do`

The `do` function delays the evaluation of a function, producing a lazily evaluated result. To use it, wrap the function in a `do` call before calling it:

*  Before:  

        result = f(a, b, c=10)
*  After:  

        result = do(f)(a, b, c=10)
        
Calling the wrapped function returns a lazy `Value` object rather than a result. We can pass this object into further `do` calls, or call `.compute()` on it when we finally want the value:

    >>> result.compute()

## A Familiar Example

To explore this abstraction, we revisit our examples from the [Foundations Notebook](Foundations.ipynb).

In [None]:
def inc(x):
    return x + 1

def add(x, y):
    return x + y

a = 1
b = inc(a)

x = 10
y = inc(x)

z = add(b, y)
z

Originally, we parallelized this by constructing a dask graph explicitly:

In [None]:
dsk = {'a': 1, 
       'b': (inc, 'a'),
       
       'x': 10,
       'y': (inc, 'x'),
       
       'z': (add, 'b', 'y')}
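To make the graph format concrete, here is a minimal sequential evaluator for dictionaries like `dsk`: a task is a tuple whose first element is callable, and any argument that names another key is a dependency. This is a sketch under simplifying assumptions (no tasks nested inside lists, no cycle detection), not one of dask's own schedulers; `simple_get`, `istask` and `is_key` are hypothetical names.

```python
def inc(x):
    return x + 1

def add(x, y):
    return x + y

def istask(obj):
    """A task is a tuple whose first element is callable."""
    return isinstance(obj, tuple) and len(obj) > 0 and callable(obj[0])

def is_key(dsk, obj):
    """True if obj names another entry in the graph."""
    try:
        return obj in dsk
    except TypeError:  # unhashable arguments (e.g. lists) cannot be keys
        return False

def simple_get(dsk, key, cache=None):
    """Recursively compute `key` from graph `dsk`, memoizing each result."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = dsk[key]
    if istask(value):
        func, args = value[0], value[1:]
        # Replace key references with their computed values; keep literals
        resolved = [simple_get(dsk, a, cache) if is_key(dsk, a) else a
                    for a in args]
        result = func(*resolved)
    else:
        result = value  # plain data, e.g. 'a': 1
    cache[key] = result
    return result

dsk = {'a': 1,
       'b': (inc, 'a'),
       'x': 10,
       'y': (inc, 'x'),
       'z': (add, 'b', 'y')}

print(simple_get(dsk, 'z'))  # 13
```

The memoizing `cache` ensures a key shared by several tasks is computed only once, which is the same property a real scheduler needs.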

We can also use the `do()` function to construct the same dask graph using ordinary Python code.

In [None]:
from dask import do

a = 1
b = do(inc)(a)

x = 10
y = do(inc)(x)

z = do(add)(b, y)
z

In [None]:
z.compute()

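These `Value` objects build up the dask graph as they go. Because their keys are generated automatically rather than chosen by hand (like `'a'` or `'z'` above), the resulting graphs are harder to read, but they execute just as well.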
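To show how a graph can "build up as it goes", here is a toy re-implementation of the idea in plain Python. It is an illustrative sketch, not dask's actual code: the `do`, `Value` and `_get` below are hypothetical stand-ins that support positional arguments only, and the `name-N` key scheme is an assumption (dask generates its own keys).

```python
import itertools

_ids = itertools.count()  # source of unique key suffixes (toy scheme)

def _get(dsk, key):
    """Evaluate `key` in graph `dsk` (tasks are tuples headed by a callable)."""
    task = dsk[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        args = [_get(dsk, a) if isinstance(a, str) and a in dsk else a
                for a in task[1:]]
        return task[0](*args)
    return task

class Value(object):
    """Toy lazy result: a key plus the graph needed to compute it."""
    def __init__(self, key, dsk):
        self.key = key
        self.dask = dsk

    def compute(self):
        return _get(self.dask, self.key)

def do(func):
    """Toy stand-in for dask's do(): calls record a task instead of running it."""
    def wrapper(*args):
        dsk = {}
        task_args = []
        for arg in args:
            if isinstance(arg, Value):
                dsk.update(arg.dask)       # merge the upstream graph
                task_args.append(arg.key)  # refer to the upstream result by key
            else:
                task_args.append(arg)      # literal argument
        key = '%s-%d' % (func.__name__, next(_ids))
        dsk[key] = (func,) + tuple(task_args)
        return Value(key, dsk)
    return wrapper

def inc(x):
    return x + 1

def add(x, y):
    return x + y

b = do(inc)(1)
y = do(inc)(10)
z = do(add)(b, y)
print(sorted(z.dask))  # ['add-2', 'inc-0', 'inc-1']
print(z.compute())     # 13
```

Each call merges the graphs of its `Value` arguments and adds one new task, so the final object carries everything needed to compute it.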

In [None]:
z.dask

In [None]:
z._visualize()

# Exercise 1: Create a Graph

Recall our first exercise: reading three CSV files with `pd.read_csv` and then measuring their total length.

In [None]:
from src.dask_prep import accounts_csvs  # Prep data if it doesn't exist
accounts_csvs(3, 1000000, 500)

In [None]:
import pandas as pd
import os
filenames = [os.path.join('tmp', 'accounts.%d.csv' % i) 
                for i in [0, 1, 2]]
filenames

In [None]:
%%time 
a = pd.read_csv(filenames[0])
b = pd.read_csv(filenames[1])
c = pd.read_csv(filenames[2])

na = len(a)
nb = len(b)
nc = len(c)

total = sum([na, nb, nc])
total

In the first notebook, we constructed a dask graph from this computation and then executed it in parallel, using multiple processes to get a speedup:

In [None]:
%load solutions/Foundations-01.py

In [None]:
from dask.multiprocessing import get
%time  get(dsk, 'total')

Your task is to recreate this graph using the `do` function on the original Python code.

In [None]:
a = do(pd.read_csv)(filenames[0])
#...

#total = ...

%time total.compute(get=get) # use multiprocessing get function in call to compute

In [None]:
%load solutions/Imperative-01.py

# Exercise 2: Parallel Estimation of Pi

<img src="https://upload.wikimedia.org/wikipedia/commons/8/84/Pi_30K.gif" align="right" width="40%">



Below is a function that approximates $\pi$ using a [Monte Carlo method](https://en.wikipedia.org/wiki/Monte_Carlo_method). It works by generating random points in a $1 \times 1$ square and counting those that fall inside a quarter circle of radius one (as seen in the image to the right). The quarter circle has area $\pi/4$ and the square has area 1, so the fraction of points that land inside the quarter circle approaches $\pi/4$. Therefore $\pi$ can be estimated by

$$4 \times \frac{\text{points in circle}}{\text{total points}}$$

In [None]:
from __future__ import division
from random import random

def is_inside_circle():
    """Generates a random x, y point, returns 1 if in circle, else returns 0."""
    x = random()
    y = random()
    if x**2 + y**2 <= 1:
        return 1
    else:
        return 0


def estimate_pi(nsamples):
    count = [is_inside_circle() for i in range(nsamples)]
    return 4. * sum(count) / nsamples

In [None]:
estimate_pi(10000)

Your task is to use `dask.do()` to make a parallel version of `estimate_pi()` by using `do()` on the `is_inside_circle()` calls and anything else that needs to be delayed as a result.

Test out your function as we did above on the serial version.  What does your function return?  How does this perform compared to the serial version?

In [None]:
%load solutions/Imperative-02.py


# Exercise 3: Small Tasks in Parallel

Your parallel version probably runs significantly slower than the sequential version, even though there is a large amount of available parallelism.

This is because each of our tasks is *very small*. The dask schedulers add an overhead of around 1 ms per task, which makes them well suited to *medium-grained parallelism*, where tasks take around 100 ms or so. When tasks are much smaller than this, the scheduler overhead dominates.
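The effect of a fixed per-task overhead can be made concrete with a little arithmetic. The sketch below assumes, as the text says, about 1 ms of overhead per task that is not overlapped with useful work; `overhead_fraction` is a hypothetical helper, not part of dask.

```python
def overhead_fraction(task_seconds, overhead_seconds=1e-3):
    """Fraction of wall time spent on scheduler overhead for one task.

    Assumes a fixed per-task overhead (about 1 ms by default) that is
    paid in addition to the task's own running time.
    """
    return overhead_seconds / (overhead_seconds + task_seconds)

# Compare a microsecond-scale task, a 1 ms task and a 100 ms task
for task in [1e-6, 1e-3, 1e-1]:
    print('task %8.6f s -> %5.1f%% overhead' % (task, 100 * overhead_fraction(task)))
```

Under these assumptions a 1 ms task spends half of its wall time in overhead, while a 100 ms task spends about 1%.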

This can be fixed by bundling up many calls to `is_inside_circle()` into a single task.  Make a new function that calls `is_inside_circle()` many times (and figure out how many is a good number) and then rewrite your parallel function to call this function instead.
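One way to pick a good number of calls per task is to time a single call and divide it into the roughly 100 ms task size mentioned above. Below is a sketch using the standard-library `timeit` module; `target_task_seconds` and `calls_per_task` are illustrative names, and the number you get depends on your machine.

```python
import timeit
from random import random

def is_inside_circle():
    """Same check as above: 1 if a random point lands in the quarter circle."""
    x = random()
    y = random()
    return 1 if x**2 + y**2 <= 1 else 0

# Average cost of one call, measured over many calls
ntrials = 100000
seconds_per_call = timeit.timeit(is_inside_circle, number=ntrials) / ntrials

# Aim for roughly 100 ms of work per task so the ~1 ms overhead stays small
target_task_seconds = 0.1
calls_per_task = max(1, int(target_task_seconds / seconds_per_call))
print('%.2e s per call -> about %d calls per task' % (seconds_per_call, calls_per_task))
```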

Do you get a speedup?

In [None]:
%load solutions/Imperative-03.py

# Exercise 4: GIL vs Multiprocessing

Finally, these computations all happen in pure Python, so when we use the default threaded scheduler we are bound by the Global Interpreter Lock (GIL), which lets only one thread execute Python code at a time.

Use the multiprocessing scheduler instead and see how your performance changes.
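To see the GIL effect without dask, here is a stdlib-only sketch (Python 3) that runs the same pure-Python work serially and in a thread pool. On a standard, GIL-enabled CPython build, expect the threaded time to be about the same as the serial time. `count_inside` and the chunk sizes are hypothetical choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from random import random

def count_inside(n):
    """Pure-Python loop: count how many of n random points fall in the quarter circle."""
    count = 0
    for _ in range(n):
        x, y = random(), random()
        if x * x + y * y <= 1:
            count += 1
    return count

chunks = [100000] * 4  # four equal pieces of work

start = time.perf_counter()
serial = [count_inside(n) for n in chunks]
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count_inside, chunks))
threaded_time = time.perf_counter() - start

print('serial:   %.2f s' % serial_time)
print('threaded: %.2f s' % threaded_time)
print('pi estimate: %.4f' % (4.0 * sum(threaded) / sum(chunks)))
```

A process pool, which is what dask's multiprocessing scheduler uses, sidesteps the GIL, at the cost of sending inputs and results between processes.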

In [None]:
from dask.multiprocessing import get

# Pass get=get to .compute() to use the multiprocessing scheduler

