Custom Workloads with Futures
-----------------------------

<img src="http://dask.readthedocs.io/en/latest/_images/dask_horizontal.svg" 
     width="30%" 
     align=right
     alt="Dask logo">

Dask futures provide fine-grained real-time execution for custom situations.  This is the foundation for other APIs like Dask arrays and dataframes.

## Start Dask Client

Unlike Dask arrays and dataframes, the Futures interface requires a Dask client.  The client also provides a dashboard, which is useful for gaining insight into the computation.

In [None]:
from dask.distributed import Client, progress
client = Client(processes=False, threads_per_worker=4, n_workers=1)
client

If running from Binder you can access the dashboard here:

-  [Dask Diagnostic Dashboard](../proxy/8787/status)

We recommend having it open on one side of your screen while using your notebook on the other side.  It can take some effort to arrange your windows, but seeing both at the same time is very useful when learning.

## Create simple functions

These functions perform simple operations, such as adding two numbers together, but they sleep for a random amount of time to simulate real work.

In [None]:
from time import sleep

def inc(x):
    from random import random
    sleep(random())
    return x + 1

def double(x):
    from random import random
    sleep(random())
    return 2 * x
    
def add(x, y):
    from random import random
    sleep(random())
    return x + y 

We can run them locally

In [None]:
inc(1)

Or we can submit them to run remotely with Dask.  This immediately returns a future that points to the ongoing computation, and eventually to the stored result.

In [None]:
future = client.submit(inc, 1)  # returns immediately with pending future
future

If you wait a second, and then check on the future again, you'll see that it has finished.

In [None]:
future  # scheduler and client talk constantly

You can block on the computation and gather the result with the `.result()` method.

In [None]:
future.result()
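
The pending-then-finished lifecycle described above can also be tried without a cluster. The sketch below is a stand-in, not Dask code: it uses the standard library's `concurrent.futures`, whose `submit` / `result` pattern Dask futures follow. The name `slow_inc` and the 0.2 second sleep are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def slow_inc(x):
    sleep(0.2)  # simulate real work with a fixed delay (illustrative value)
    return x + 1


with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(slow_inc, 1)   # returns immediately with a pending future
    was_pending = not fut.done()     # usually True here: the task is still sleeping
    value = fut.result()             # blocks until the task has finished
    finished = fut.done()            # True once the result is available

print(was_pending, value, finished)
```

With Dask, `client.submit(inc, 1)` plays the role of `pool.submit(slow_inc, 1)`, and the work runs on the cluster's workers rather than in a local thread pool.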

## Submit many tasks

So we've learned how to run Python functions remotely.  This becomes useful when we add two things:

1.  We can submit thousands of tasks per second
2.  Tasks can depend on each other by consuming futures as inputs

We submit many tasks that depend on each other in a normal Python for loop

In [None]:
%%time
zs = []
for i in range(256):
    x = client.submit(inc, i)     # x = inc(i)
    y = client.submit(double, x)  # y = double(x)
    z = client.submit(add, x, y)  # z = add(x, y)
    zs.append(z)
    
total = client.submit(sum, zs)

To make this go faster, add more workers, each with several cores.

(We are still working on our local machine here; this is more practical when using an actual cluster.)

In [None]:
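# Ask the local cluster for ten workers in total; each new worker
# uses the same threads_per_worker setting given to the Client above.
client.cluster.scale(10)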

In [None]:
total.result()
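
As a sanity check on the value returned by `total.result()`, the same arithmetic can be reproduced locally without the sleeps. The sketch below assumes sleep-free stand-ins for the three functions; the names `inc_fast`, `double_fast`, and `add_fast` are hypothetical and not used elsewhere in this notebook.

```python
def inc_fast(x):
    return x + 1


def double_fast(x):
    return 2 * x


def add_fast(x, y):
    return x + y


expected_total = 0
for i in range(256):
    x = inc_fast(i)        # i + 1
    y = double_fast(x)     # 2 * (i + 1)
    z = add_fast(x, y)     # 3 * (i + 1)
    expected_total += z

print(expected_total)      # 98688, i.e. 3 * (1 + 2 + ... + 256)
```

If the distributed run is correct, `total.result()` should return the same number.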

### Custom computation: Tree summation

As an example of a non-trivial algorithm, consider the classic tree reduction.  We accomplish this with a for loop nested inside a while loop and a bit of normal Python logic.

```
finish           total             single output
    ^          /        \
    |        c1          c2        neighbors merge
    |       /  \        /  \
    |     b1    b2    b3    b4     neighbors merge
    ^    / \   / \   / \   / \
start   a1 a2 a3 a4 a5 a6 a7 a8    many inputs
```

In [None]:
L = zs
while len(L) > 1:
    new_L = []
    for i in range(0, len(L), 2):
        future = client.submit(add, L[i], L[i + 1])  # add neighbors
        new_L.append(future)
    L = new_L                                   # swap old list for new
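
The loop above pairs `L[i]` with `L[i + 1]`, which works here because `zs` has 256 elements, a power of two. For other lengths a level can have an odd number of items, and the last one has no neighbor. A minimal sketch of a more general version follows; `tree_reduce` and its `combine` parameter are hypothetical names introduced for illustration, and the example runs on plain integers so it can be tried without a cluster.

```python
import operator


def tree_reduce(items, combine):
    """Combine neighbors pairwise, level by level, until one item remains."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce needs at least one item")
    while len(level) > 1:
        next_level = []
        # Pair up neighbors; stop before a trailing item that has no partner.
        for i in range(0, len(level) - 1, 2):
            next_level.append(combine(level[i], level[i + 1]))
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd item moves up unchanged
        level = next_level
    return level[0]


even_total = tree_reduce(range(1, 9), operator.add)  # 8 inputs, as in the diagram
odd_total = tree_reduce(range(1, 8), operator.add)   # 7 inputs, one is carried up
print(even_total, odd_total)
```

With Dask, the same helper could be called as `tree_reduce(zs, lambda a, b: client.submit(add, a, b))`, which returns a single future; calling `.result()` on it blocks until the whole tree has been computed.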
   