# Parallelism in Python

### John Kirkham

# The Problem

* Typical threading models are hard for (new) users to understand
* Easy to run into difficult-to-debug scenarios (e.g. deadlocks, race conditions)
* Implementation often becomes tied to a certain scale (e.g. moving from multithreaded code to cluster-parallelized code requires a rewrite)
* How could this be done better?

# Task-based parallelism

* Describe the pieces of the computation
* Relate these pieces to each other
* Use a scheduler to perform the computation
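
The three steps above can be sketched in plain Python. This is only an illustration: the `get` function below is a hypothetical toy scheduler written for this slide, not how Dask is implemented, and the key names (`"x"`, `"y"`, `"sum"`, `"result"`) are made up.

```python
from operator import add, mul


def get(graph, key):
    """Toy scheduler: compute `key` by first computing what it depends on."""
    value = graph[key]
    # A task is a tuple whose first element is callable: (func, arg1, arg2, ...)
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        # String arguments that name another key in the graph are computed first.
        resolved = [
            get(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        return func(*resolved)
    # Anything else is plain data.
    return value


# 1. Describe the pieces, and 2. relate them to each other through their keys.
graph = {
    "x": 3,
    "y": 4,
    "sum": (add, "x", "y"),
    "result": (mul, "sum", 2),
}

# 3. Use the scheduler to perform the computation.
result = get(graph, "result")
print(result)  # 14
```

A real scheduler would also decide which independent tasks can run at the same time; this sketch simply walks the dependencies one by one.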

# Common implementations

* Dask
* ipyparallel
* Luigi

# Introducing a Task Graph

![]( images/pipeline.svg )

# A short example

![]( images/dask_example1.svg )

# A short example

In [None]:
import dask

a = [0, 1, 2, 3]
d = {"a": a, "b": (sum, "a")}

# A short example

In [None]:
import dask

a = [0, 1, 2, 3]
d = {"a": a, "b": (sum, "a")}

print(dask.get(d, "a"))
print(dask.get(d, "b"))

# A short example (follow-up question)

In [None]:
import dask

d = {"b": (sum, [0, 1, 2, 3])}

dask.get(d, "b")

# Using delayed

In [None]:
import dask

a = [0, 1, 2, 3]

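# Wrap sum so that calling it records a task instead of running it
r = dask.delayed(sum)(a)

# Pull out the underlying task graph and take the first key in it
d = dict(r.__dask_graph__())
k = next(iter(d.keys()))

print(d)               # the raw task graph
print(dask.get(d, k))  # run the graph by hand
print(r.compute())     # or let Dask run it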

# Map (intro)

![]( images/dask_map.svg )

# Map (intro)

In [None]:
def my_map(func, *args):
    for v in args:
        yield func(v)
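
A quick usage sketch of `my_map`, repeated here so the snippet runs on its own. The helper `add_two` is hypothetical. Note that the values are passed as separate arguments, and that each call is independent of the others, which is why a thread pool can do the same work in any order.

```python
from concurrent.futures import ThreadPoolExecutor


def my_map(func, *args):
    for v in args:
        yield func(v)


def add_two(x):  # hypothetical helper for illustration
    return x + 2


# my_map is a generator, so nothing is computed until it is consumed.
mapped = list(my_map(add_two, 0, 1, 2, 3))
print(mapped)  # [2, 3, 4, 5]

# No call depends on another, so the calls can run concurrently.
# Executor.map returns results in input order.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(add_two, [0, 1, 2, 3]))
print(parallel)  # [2, 3, 4, 5]
```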

# Map (question)

In [None]:
import pprint
import dask

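# Decorating with dask.delayed makes each call return a lazy task
@dask.delayed
def add_two(x):
    return x + 2

a = [0, 1, 2, 3]

# Plain map now builds one independent task per element
b = list(map(add_two, a))
pprint.pprint(b)

# Nothing has run yet; compute evaluates all of the tasks
dask.compute(*b)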

# Reduce (intro)

![]( images/dask_reduce.svg )

# Reduce (intro)

In [None]:
def my_reduce(func, *args):
    r = args[0]
    for v in args[1:]:
        r = func(r, v)
    return r
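
A usage sketch of `my_reduce`, repeated here so the snippet runs on its own. The `traced_add` helper is hypothetical; it records each call to show that, unlike map, every step needs the result of the previous one.

```python
from operator import add


def my_reduce(func, *args):
    r = args[0]
    for v in args[1:]:
        r = func(r, v)
    return r


total = my_reduce(add, 0, 1, 2, 3)
print(total)  # 6

calls = []


def traced_add(x, y):  # hypothetical helper that logs its arguments
    calls.append((x, y))
    return x + y


my_reduce(traced_add, 0, 1, 2, 3)
# The first argument of each call is the result of the call before it.
print(calls)  # [(0, 1), (1, 2), (3, 3)]
```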

# Reduce (question)

In [None]:
from functools import reduce
from operator import add
import dask

add = dask.delayed(add)

a = [0, 1, 2, 3]
b = reduce(add, a)

print(b)
print(b.compute())

# Reduce (follow-up question)

In [None]:
from functools import reduce
from operator import add
import dask

add = dask.delayed(add)

a = [0, 1, 2, 3]
b = reduce(add, a)

b.visualize()

# Reduce (follow-up question)

![]( images/dask_reduce_example.svg )

# Reduce (performance)

1. Where did the values go?
2. How can we make this parallel friendly?

# Reduce (performance)

In [None]:
import pprint
from functools import reduce
from operator import add
import dask

add = dask.delayed(add)

a = [0, 1, 2, 3]
b = reduce(add, a)

pprint.pprint(dict(b.__dask_graph__()))

# Reduce (performance)

In [None]:
from operator import add
import dask

add = dask.delayed(add)

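def reduce(func, values):
    # Split the values in half, reduce each half, then combine the two
    # results, so that the halves do not depend on each other.
    n = len(values)
    if n == 1:
        return values[0]
    half = n // 2
    return func(reduce(func, values[:half]), reduce(func, values[half:]))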

a = [0, 1, 2, 3]
b = reduce(add, a)
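
To see why the recursive version is more parallel friendly, here is a plain-Python sketch that compares the two shapes without Dask. The names `tree_reduce`, `pair`, and `depth` are hypothetical helpers for this illustration. Instead of adding, `pair` records each call as a nested tuple, and `depth` measures the longest chain of calls that must run one after another.

```python
from functools import reduce as chain_reduce


def tree_reduce(func, values):
    """Same recursive halving as the reduce defined above."""
    n = len(values)
    if n == 1:
        return values[0]
    half = n // 2
    return func(tree_reduce(func, values[:half]), tree_reduce(func, values[half:]))


def pair(x, y):
    """Record the call as a nested tuple instead of computing anything."""
    return (x, y)


def depth(expr):
    """Length of the longest chain of dependent calls in the expression."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max(depth(e) for e in expr)


values = list(range(8))
chain = chain_reduce(pair, values)  # (((((((0, 1), 2), 3), 4), 5), 6), 7)
tree = tree_reduce(pair, values)    # (((0, 1), (2, 3)), ((4, 5), (6, 7)))

print(depth(chain))  # 7
print(depth(tree))   # 3
```

Both shapes make the same number of calls, but the chain forces them to run strictly in sequence, while the tree lets calls on the same level run at the same time.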

# Reduce (follow-up question)

![]( images/dask_reduce_tree.svg )

# Using Map and Reduce (problem)

1. Write a function to compute a weighted mean.

2. Try it on a list of values and weights.

   1. Inspect the graph it creates.

   2. Compute the result.

3. Does it seem to be the optimal solution? Why or why not?

# Using Map and Reduce (answer)

In [None]:
from __future__ import division
from operator import add, mul
import dask

add = dask.delayed(add)
mul = dask.delayed(mul)

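def reduce(func, values):
    # Tree reduction: reduce each half independently, then combine
    n = len(values)
    if n == 1:
        return values[0]
    half = n // 2
    return func(reduce(func, values[:half]), reduce(func, values[half:]))

def weighted_mean(values, weights):
    # Map: multiply each value by its weight
    products = list(map(mul, weights, values))
    # Reduce: sum the products and the weights, then divide
    return reduce(add, products) / reduce(add, weights)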

# Using Map and Reduce (answer)

In [None]:
v = [0, 1, 2, 3]
w = [5, 10, 33, 16]

wm = weighted_mean(v, w)

print(wm.compute())

wm.visualize()

# Using Map and Reduce (answer)

![]( images/dask_delayed_weighted_mean.svg )
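
As a sanity check, the same calculation can be done eagerly in plain Python, without `dask.delayed`. This sketch assumes the same values and weights as above; `tree_reduce` is a renamed copy of the recursive reduce so the snippet runs on its own.

```python
from operator import add, mul


def tree_reduce(func, values):
    n = len(values)
    if n == 1:
        return values[0]
    half = n // 2
    return func(tree_reduce(func, values[:half]), tree_reduce(func, values[half:]))


def weighted_mean(values, weights):
    # Map: one product per (weight, value) pair
    products = list(map(mul, weights, values))
    # Reduce: two independent sums, then a single division
    return tree_reduce(add, products) / tree_reduce(add, weights)


v = [0, 1, 2, 3]
w = [5, 10, 33, 16]

expected = weighted_mean(v, w)
print(expected)  # 1.9375, i.e. (0 + 10 + 66 + 48) / (5 + 10 + 33 + 16)
```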

# Using Dask Array

In [None]:
import numpy as np

a1 = np.arange(4)
a2 = np.array([0.3, 0.2, 0.4, 0.1])

a1 * a2

In [None]:
import numpy as np
import dask.array as da

d1 = da.arange(4, chunks=1)
d2 = da.from_array(np.array([0.3, 0.2, 0.4, 0.1]), chunks=1)

d1 * d2

# Weighted Mean with Dask Arrays (problem)

1. Write a function to compute a weighted mean.

2. Try it on arrays of values and weights.

   1. Inspect the graph it creates.

   2. Compute the result.

3. Does it seem to be the optimal solution? Why or why not?

4. How does this compare to your Dask Delayed solution?

# Weighted Mean with Dask Arrays (answer)

In [None]:
import dask.array as da

def weighted_mean(values, weights):
    return (values * weights).sum() / weights.sum()

v = da.from_array([0, 1, 2, 3], chunks=1)
w = da.from_array([5, 10, 33, 16], chunks=1)

wm = weighted_mean(v, w)

print(wm.compute())

wm.visualize()

# Weighted Mean with Dask Arrays (answer)

![]( images/dask_array_weighted_mean.svg )
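
The general idea behind a chunked weighted mean can be sketched with the standard library alone. This is an assumption-laden illustration of a blocked algorithm, not how `dask.array` is implemented; `chunked`, `partial_sums`, and `blocked_weighted_mean` are hypothetical names. Each chunk produces a partial numerator and denominator independently, and only the small partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, size):
    """Split a sequence into consecutive pieces of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def partial_sums(block):
    """Return (sum of value * weight, sum of weights) for one chunk."""
    values, weights = block
    return sum(v * w for v, w in zip(values, weights)), sum(weights)


def blocked_weighted_mean(values, weights, chunk_size=1):
    blocks = list(zip(chunked(values, chunk_size), chunked(weights, chunk_size)))
    # Each chunk is independent of the others, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_sums, blocks))
    # Combine the per-chunk results.
    numerator = sum(p[0] for p in partials)
    denominator = sum(p[1] for p in partials)
    return numerator / denominator


result = blocked_weighted_mean([0, 1, 2, 3], [5, 10, 33, 16], chunk_size=2)
print(result)  # 1.9375
```

With only four elements the chunking overhead outweighs any benefit; the pattern pays off when each chunk holds enough data to keep a worker busy.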