# Examples - Distributed Concurrent.futures - Custom workflows
https://gist.github.com/mrocklin/ef9ccd29a6ec5f4de84d6192be95042a

## Custom Workflows 

We submit tasks directly to the task scheduler.  This demonstrates the flexibility that can be achieved with the **`submit`** function and normal Python for loops.

Later on we map functions across Python queues to construct data processing pipelines.

In [28]:
from dask.distributed import Client, progress

e = Client()
e

Client: Scheduler: tcp://127.0.0.1:62038, Dashboard: http://127.0.0.1:62039/status
Cluster: Workers: 4, Cores: 4, Memory: 8.48 GB


In [29]:
from time import sleep
from random import random

def inc(x):
    sleep(random())  # simulate work that takes up to one second
    return x + 1

def double(x):
    sleep(random())
    return 2 * x

def add(x, y):
    sleep(random())
    return x + y

In [30]:
inc(1)

2

In [31]:
future = e.submit(inc, 1)  # returns immediately with pending future
future

In [32]:
future  # scheduler and client talk constantly

In [33]:
future.result()

2
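The `submit` / `result` pattern above follows the interface of the standard library's `concurrent.futures` executors. As a minimal local sketch, assuming a `ThreadPoolExecutor` as a stand-in for the distributed `Client` (no scheduler or cluster is involved), the same two calls look like this:

```python
from concurrent.futures import ThreadPoolExecutor

def inc(x):
    return x + 1

# ThreadPoolExecutor is only a local stand-in for the distributed Client.
with ThreadPoolExecutor(max_workers=4) as pool:
    future = pool.submit(inc, 1)  # returns immediately with a Future
    result = future.result()      # blocks until the task has finished

print(result)
```

The difference in the notebook is where the work runs: the `Client` sends the task to a worker process managed by the scheduler, while the thread pool runs it in the local process.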

In [34]:
future.executor

Client: Scheduler: tcp://127.0.0.1:62038, Dashboard: http://127.0.0.1:62039/status
Cluster: Workers: 4, Cores: 4, Memory: 8.48 GB


### Submit many tasks

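We submit many tasks that depend on each other in a normal Python for loop. Futures are passed directly as arguments to later `submit` calls, so the loop finishes without waiting for any result.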

In [35]:
%%time

zs = []

for i in range(16):
    x = e.submit(inc, i)     # x = inc(i)
    y = e.submit(double, x)  # y = double(x)
    z = e.submit(add, x, y)  # z = add(x, y)
    zs.append(z)

Wall time: 45.1 ms


In [36]:
zs[0]

In [37]:
e.gather(zs)

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
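In the loop above, futures such as `x` and `y` are handed straight to the next `submit` call. Standard library executors do not unwrap futures passed as arguments, so a rough local imitation needs a helper that does it. The sketch below assumes a `ThreadPoolExecutor` stand-in; `resolve` and `submit_resolved` are hypothetical helper names, not part of any library, and the `sleep` calls are left out to keep it quick.

```python
from concurrent.futures import Future, ThreadPoolExecutor

def inc(x):
    return x + 1

def double(x):
    return 2 * x

def add(x, y):
    return x + y

def resolve(arg):
    """Return the result of a Future, or the argument itself otherwise."""
    return arg.result() if isinstance(arg, Future) else arg

def submit_resolved(pool, func, *args):
    """Hypothetical helper: submit func, unwrapping Future arguments in the worker."""
    return pool.submit(lambda: func(*[resolve(a) for a in args]))

with ThreadPoolExecutor(max_workers=4) as pool:
    zs = []
    for i in range(16):
        x = pool.submit(inc, i)
        y = submit_resolved(pool, double, x)  # waits for x inside the worker
        z = submit_resolved(pool, add, x, y)  # waits for x and y inside the worker
        zs.append(z)
    results = [z.result() for z in zs]

print(results)
```

Each result equals `3 * (i + 1)`, since `z = inc(i) + 2 * inc(i)`. Note that this imitation occupies a worker thread while it waits on a dependency; it only avoids deadlock here because every dependency is submitted before the task that needs it.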

### Custom computation: Tree summation

As an example of a non-trivial algorithm, consider the classic tree reduction.  We accomplish this with a nested for loop and a bit of normal Python logic.

```
finish           total             single output
    ^          /        \
    |        c1          c2        neighbors merge
    |       /  \        /  \
    |     b1    b2    b3    b4     neighbors merge
    ^    / \   / \   / \   / \
start   a1 a2 a3 a4 a5 a6 a7 a8    many inputs
```

In [38]:
L = zs

while len(L) > 1:
    
    new_L = []
    
    for i in range(0, len(L), 2):
        future = e.submit(add, L[i], L[i + 1])  # add neighbors
        new_L.append(future)
        
    L = new_L                                   # swap old list for new
    
progress(L)

In [39]:
L

[<Future: status: pending, key: add-5e8b42843257692ac469617cd2ced2f0>]

In [40]:
L[0]

In [41]:
e.gather(L)

[408]
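The reduction loop above indexes `L[i + 1]` unconditionally, so it relies on every level having an even number of items, which holds for the 16 inputs used here. A minimal sketch of the same pairwise reduction that also handles odd lengths is shown below on plain values; `tree_reduce` is a hypothetical helper name, and carrying the unpaired last item up to the next level is one possible choice.

```python
def tree_reduce(items, combine):
    """Reduce items pairwise, level by level; an unpaired last item is carried up."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce needs at least one item")
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(combine(level[i], level[i + 1]))  # merge neighbors
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd item out moves up unchanged
        level = next_level
    return level[0]

values = [3 * (i + 1) for i in range(16)]  # the same values gathered from zs
total = tree_reduce(values, lambda a, b: a + b)
odd_total = tree_reduce([1, 2, 3, 4, 5], lambda a, b: a + b)

print(total, odd_total)
```

To run this on futures rather than plain values, `combine` could be something like `lambda a, b: e.submit(add, a, b)`, with `e` and `add` as defined earlier in the notebook; the final item would then be a single future to gather.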

Example with data streams
----------------------------

The executor can **map functions over lists or queues**.  This is nothing more than **calling `submit` many times**.  We can chain maps on queues together to construct **simple data processing pipelines**.

All of this logic happens on the client side; none of it is hard-coded into the scheduler.  This simple streaming system is a good example of the kind of system that becomes easy for users to build when given access to custom task scheduling.

In [42]:
from queue import Queue
from threading import Thread

def multiplex(n, q, **kwargs):
    """ Convert one queue into several equivalent Queues
    
    >>> q1, q2, q3 = multiplex(3, in_q)
    """
    out_queues = [Queue(**kwargs) for i in range(n)]
    
    def f():
        while True:
            x = q.get()
            for out_q in out_queues:
                out_q.put(x)
                
    t = Thread(target=f)
    t.daemon = True
    t.start()
    return out_queues

```
           ----inc---->
          /            \ 
in_q --> q              \_add__ results
          \             / 
           ---double-->/
```

In [43]:
Queue()

<queue.Queue at 0x1ba10ad9908>

In [44]:
in_q = Queue()
in_q

<queue.Queue at 0x1ba10b665f8>

In [45]:
in_q.put(1)
in_q.get()

1

In [47]:
# in_q is an ordinary Queue
# Client.scatter converts it: q is the Dask counterpart of in_q
# Objects put into in_q come out of q as Future objects

q = e.scatter(in_q)
q

<queue.Queue at 0x1ba10b8bb70>

In [48]:
in_q.put(1)
future = q.get()
future

In [49]:
future.result()

1

In [22]:
# q_1 and q_2 are copies of q:
# anything put into in_q can be taken from q, and therefore from q_1 and q_2 as well
q_1, q_2 = multiplex(2, q)

# e.map applies inc to every item of q_1, producing a new Queue, inc_q
inc_q = e.map(inc, q_1)

# e.map applies double to every item of q_2, producing a new Queue, double_q
double_q = e.map(double, q_2)

# e.map applies add to paired items of inc_q and double_q, producing a new Queue, add_q
add_q = e.map(add, inc_q, double_q)

# gather turns the futures in add_q into concrete results in out_q
out_q = e.gather(add_q)

In [51]:
inc_q

<queue.Queue at 0x1ba0f0c9898>

In [52]:
double_q

<queue.Queue at 0x1ba0f0c9b70>

In [53]:
add_q

<queue.Queue at 0x1ba0f0c9da0>

In [54]:
out_q

<queue.Queue at 0x1ba0f0c8128>

In [24]:
# Putting 10 into in_q sends it along both branches:
#   q -> q_1 -> inc_q    : inc(10)    = 11
#   q -> q_2 -> double_q : double(10) = 20
# add_q combines them and out_q receives add(11, 20) = 31
in_q.put(10)
out_q.get()

31
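The same pipeline shape can be sketched with only standard library queues and threads, which makes the data flow easier to inspect without a cluster. This is an illustration under stated assumptions, not how the `Client` implements `map` over queues: `map_queue` is a hypothetical helper, each stage runs in a single daemon thread, and items are plain values rather than futures.

```python
from queue import Queue
from threading import Thread

def multiplex(n, q):
    """Copy every item taken from q into n new queues."""
    out_queues = [Queue() for _ in range(n)]

    def forward():
        while True:
            x = q.get()
            for out_q in out_queues:
                out_q.put(x)

    Thread(target=forward, daemon=True).start()
    return out_queues

def map_queue(func, *in_queues):
    """Hypothetical helper: apply func to items taken in lockstep from in_queues."""
    out_q = Queue()

    def worker():
        while True:
            args = [q.get() for q in in_queues]  # one item from each input queue
            out_q.put(func(*args))

    Thread(target=worker, daemon=True).start()
    return out_q

in_q = Queue()
q_1, q_2 = multiplex(2, in_q)
inc_q = map_queue(lambda x: x + 1, q_1)
double_q = map_queue(lambda x: 2 * x, q_2)
add_q = map_queue(lambda x, y: x + y, inc_q, double_q)

in_q.put(10)
result = add_q.get(timeout=5)  # (10 + 1) + (2 * 10)
print(result)
```

Because each stage here is a single thread reading in order, results leave `add_q` in the order their inputs entered `in_q`.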

In [25]:
from random import random

def feed(q):
    for i in range(10000):
        sleep(random())
        q.put(i)
        
t = Thread(target=feed, args=(in_q,))  # feed the plain input queue so items pass through scatter
t.daemon = True
t.start()

In [26]:
out_q.qsize()

1