# Parallelism in Python

### John Kirkham

# The Problem

* Typical threading models are hard for (new) users to understand
* It is easy to run into difficult-to-debug scenarios (e.g. deadlocks, race conditions)
* An implementation often becomes tied to a certain scale (e.g. multithreaded code has to be rewritten before it can run in parallel on a cluster)
* How could this be done better?

# Task-based parallelism

* Describe the pieces of the computation
* Relate these pieces to each other
* Use a scheduler to perform the computation
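
The three steps above can be sketched with only the standard library. This is a toy illustration, not how any particular scheduler is implemented: the task names (`load`, `total`, `count`, `mean`) and the `run` helper are hypothetical, and `graphlib.TopologicalSorter` (Python 3.9+) stands in for a real scheduler.

```python
from graphlib import TopologicalSorter

# 1. Describe the pieces of the computation (hypothetical task names).
tasks = {
    "load": lambda: [0, 1, 2, 3],
    "total": lambda load: sum(load),
    "count": lambda load: len(load),
    "mean": lambda total, count: total / count,
}

# 2. Relate the pieces: each task maps to the tasks it depends on.
dependencies = {
    "load": (),
    "total": ("load",),
    "count": ("load",),
    "mean": ("total", "count"),
}


# 3. A toy scheduler: run each task once all of its dependencies are done.
def run(tasks, dependencies):
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        args = [results[dep] for dep in dependencies[name]]
        results[name] = tasks[name](*args)
    return results


results = run(tasks, dependencies)
print(results["mean"])  # 1.5
```

Here `total` and `count` depend only on `load`, so a real scheduler would be free to run them at the same time; this sketch simply runs everything in one valid order.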

# Common implementations

* Dask
* ipyparallel
* Luigi

# Dask - Introducing a Task Graph

![]( images/pipeline.svg )

# Dask - A short example

![]( images/dask_example1.svg )

# Dask - A short example


```python
In [1]: import dask

In [2]: a = [0, 1, 2, 3]

In [3]: d = {"a": a, "b": (sum, "a")}
```

# Dask - A short example (question)


```python
In [1]: import dask

In [2]: a = [0, 1, 2, 3]

In [3]: d = {"a": a, "b": (sum, "a")}

In [4]: dask.get(d, "a")
Out[4]: ?

In [5]: dask.get(d, "b")
Out[5]: ?
```

# Dask - A short example (answer)


```python
In [1]: import dask

In [2]: a = [0, 1, 2, 3]

In [3]: d = {"a": a, "b": (sum, "a")}

In [4]: dask.get(d, "a")
Out[4]: [0, 1, 2, 3]

In [5]: dask.get(d, "b")
Out[5]: 6
```
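
To see why the answers come out this way, here is a simplified sketch of how a dict-based graph like `d` could be resolved. It is not Dask's implementation: `toy_get` and `_evaluate` are hypothetical names, only string keys are handled, and results are not cached.

```python
def _evaluate(graph, value):
    # A task is a tuple whose first element is callable: (func, arg1, arg2, ...).
    if isinstance(value, tuple) and value and callable(value[0]):
        func, *args = value
        return func(*(_evaluate(graph, arg) for arg in args))
    # A string that names a key in the graph stands for that key's result.
    if isinstance(value, str) and value in graph:
        return _evaluate(graph, graph[value])
    # Anything else is plain data and is returned unchanged.
    return value


def toy_get(graph, key):
    """Compute the value stored under `key` in a dict-based task graph."""
    return _evaluate(graph, graph[key])


a = [0, 1, 2, 3]
d = {"a": a, "b": (sum, "a")}

print(toy_get(d, "a"))  # [0, 1, 2, 3]
print(toy_get(d, "b"))  # 6
```

The key `"a"` holds plain data, so it comes back as is; the key `"b"` holds a task, so its argument `"a"` is looked up first and then passed to `sum`.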

# Dask - Using delayed (question)


```python
In [1]: import dask

In [2]: a = [0, 1, 2, 3]

In [3]: r = dask.delayed(sum)(a)

In [4]: d = dict(r.__dask_graph__())

In [5]: d
Out[5]: {'sum-11e9df38-2121-41a5-b9be-f4e83318ac72': (<function sum(iterable, start=0, /)>, [0, 1, 2, 3])}

In [6]: dask.get(d, "sum-11e9df38-2121-41a5-b9be-f4e83318ac72")
Out[6]: ?

In [7]: r.compute()
Out[7]: ?
```

# Dask - Using delayed (answer)


```python
In [1]: import dask

In [2]: a = [0, 1, 2, 3]

In [3]: r = dask.delayed(sum)(a)

In [4]: d = dict(r.__dask_graph__())

In [5]: d
Out[5]: {'sum-11e9df38-2121-41a5-b9be-f4e83318ac72': (<function sum(iterable, start=0, /)>, [0, 1, 2, 3])}

In [6]: dask.get(d, "sum-11e9df38-2121-41a5-b9be-f4e83318ac72")
Out[6]: 6

In [7]: r.compute()
Out[7]: 6
```
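
Conceptually, a delayed call records the function and its arguments under a generated key instead of running it. The sketch below illustrates that idea only; `ToyDelayed` and `toy_delayed` are hypothetical names and do not mirror Dask's internals.

```python
import uuid


class ToyDelayed:
    """Remember a function call so it can be run later."""

    def __init__(self, func, args):
        # Build a unique key from the function name, similar in spirit
        # to the 'sum-<uuid>' key shown in the graph above.
        self.key = f"{func.__name__}-{uuid.uuid4()}"
        self.func = func
        self.args = args

    def compute(self):
        # Compute any delayed arguments first, then call the function.
        resolved = [
            arg.compute() if isinstance(arg, ToyDelayed) else arg
            for arg in self.args
        ]
        return self.func(*resolved)


def toy_delayed(func):
    """Wrap `func` so that calling it records the call instead of running it."""
    def wrapper(*args):
        return ToyDelayed(func, args)
    return wrapper


r = toy_delayed(sum)([0, 1, 2, 3])
print(r.key)        # sum-<some uuid>, different on every run
print(r.compute())  # 6
```

Nothing is summed until `compute()` is called, which is the same behaviour the `dask.delayed` example shows.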

# Dask - Map

![]( images/dask_map.svg )

# Dask - Map (question)


```python
In [1]: import dask

In [2]: @dask.delayed
   ...: def addTwo(x):
   ...:     return x + 2
   ...:

In [3]: a = [0, 1, 2, 3]

In [4]: b = list(map(addTwo, a))

In [5]: b
Out[5]: ?

In [6]: dask.compute(*b)
Out[6]: ?
```

# Dask - Map (answer)


```python
In [1]: import dask

In [2]: @dask.delayed
   ...: def addTwo(x):
   ...:     return x + 2
   ...:

In [3]: a = [0, 1, 2, 3]

In [4]: b = list(map(addTwo, a))

In [5]: b
Out[5]:
[Delayed('addTwo-0bec16d0-6189-42a8-9390-8768548cef33'),
 Delayed('addTwo-728c5880-4a29-401c-afe6-144f17875cf7'),
 Delayed('addTwo-c82784a4-9000-49bd-88f9-976b5b07d479'),
 Delayed('addTwo-a492a9a2-9262-4e9e-93dd-4d0d557ad50f')]

In [6]: dask.compute(*b)
Out[6]: (2, 3, 4, 5)
```
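
The mapped tasks do not depend on each other, which is what allows them to run in parallel. As a rough stand-in for a parallel scheduler (an assumption for illustration, not what Dask does internally), the same map can be written with the standard library's `ThreadPoolExecutor`; `add_two` is a plain-function counterpart of `addTwo`.

```python
from concurrent.futures import ThreadPoolExecutor


def add_two(x):
    return x + 2


a = [0, 1, 2, 3]

# Each call is independent, so the pool may run them concurrently.
# pool.map returns results in input order regardless of completion order.
with ThreadPoolExecutor(max_workers=4) as pool:
    b = tuple(pool.map(add_two, a))

print(b)  # (2, 3, 4, 5)
```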

# Dask - Reduce

![]( images/dask_reduce.svg )

# Dask - Reduce (question)


```python
In [1]: from functools import reduce

In [2]: import dask

In [3]: @dask.delayed
   ...: def add(x, y):
   ...:     return x + y
   ...:

In [4]: a = [0, 1, 2, 3]

In [5]: b = reduce(add, a)

In [6]: b
Out[6]: ?

In [7]: b.compute()
Out[7]: ?
```

# Dask - Reduce (answer)


```python
In [1]: from functools import reduce

In [2]: import dask

In [3]: @dask.delayed
   ...: def add(x, y):
   ...:     return x + y
   ...:

In [4]: a = [0, 1, 2, 3]

In [5]: b = reduce(add, a)

In [6]: b
Out[6]: Delayed('add-91c483ed-2953-4795-b572-5d61455aaace')

In [7]: b.compute()
Out[7]: 6
```
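
`functools.reduce` folds from left to right, so `reduce(add, a)` builds a chain `((0 + 1) + 2) + 3` in which each `add` waits for the previous one. When the operation is associative, the same result can be computed by combining neighbours pairwise, which gives a shallower graph with independent tasks at each level. The sketch below uses plain functions to show the shape; `tree_reduce` is a hypothetical helper, not a Dask function.

```python
from operator import add


def tree_reduce(op, items):
    """Combine items pairwise, level by level; return (result, depth)."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce() needs at least one item")
    depth = 0
    while len(level) > 1:
        # Combine neighbours: (0, 1), (2, 3), ...
        paired = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        # An unpaired last item is carried over to the next level.
        if len(level) % 2:
            paired.append(level[-1])
        level = paired
        depth += 1
    return level[0], depth


result, depth = tree_reduce(add, [0, 1, 2, 3])
print(result)  # 6
print(depth)   # 2 levels, versus 3 dependent steps in the left-to-right chain
```

This assumes `op` is associative; for operations where grouping matters, the pairwise version can give a different answer than `reduce`.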