# Exercise 1. Embarrassingly parallel: multiple threads vs. multiple processes

In this exercise, you will investigate the speed-up (or lack of speed-up) of an embarrassingly parallel workflow built from a CPU-bound function, executed with three different paradigms: a single thread, multiple threads and multiple processes. Here is a diagram of the workflow:
<img src='figures/embarrassingly_parallel.png'>

**Your task**:
1. Modify the code inside the `func` function in `1_problem.py` so that it takes approximately 1 second to execute. (What it computes is up to you.)
1. Load it into this notebook by running the cell below that contains the `%load` magic.
1. Use the cell with the `%%time` magic to see how long it takes to run 4 instances of the `func` function with 1) a single thread, 2) multiple threads and 3) multiple processes.
1. Note down the times for all executions and any other interesting observations.
1. Submit your work (see instructions in last cell).
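
For task 1, one way to land near 1 second without trial and error is to time a short trial run and scale the loop count. This is a minimal sketch, assuming a countdown loop like the one in the loaded solution and assuming its run time grows roughly linearly with the count; `countdown` and `calibrate` are hypothetical helper names, not part of the exercise files.

```python
import time


def countdown(n):
    """Hypothetical CPU-bound body for `func`: a pure-Python countdown loop."""
    while n > 0:
        n -= 1


def calibrate(target=1.0, n=100_000):
    """Estimate the count n for which countdown(n) takes about `target` seconds.

    Doubles n until a trial run takes at least target/10 seconds, then scales
    linearly (assumption: run time is proportional to n).
    """
    while True:
        start = time.perf_counter()
        countdown(n)
        elapsed = time.perf_counter() - start
        if elapsed >= target / 10:
            return int(n * target / elapsed)
        n *= 2


if __name__ == "__main__":
    n = calibrate()
    start = time.perf_counter()
    countdown(n)
    print(f"n = {n:,} took {time.perf_counter() - start:.2f} s")
```

The printed time will vary from machine to machine, so it is worth re-checking the value you put into `func`.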

**Hints**:
* If your `func` function does not reproduce the multithreading/multiprocessing behaviour you expect, make sure it is CPU-bound. Ask a tutor if you have questions. :)
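
A quick way to check whether a candidate `func` is CPU-bound is to compare the CPU time it consumes with the wall-clock time it takes. The sketch below is one rough heuristic, not an official test; `cpu_fraction` and `busy` are hypothetical names. Note that `time.process_time()` counts CPU time for the whole current process, so run the check while nothing else in that process is busy.

```python
import time


def cpu_fraction(fn, *args):
    """Return CPU time / wall time for one call of fn(*args).

    A ratio near 1 suggests the call kept the CPU busy (CPU-bound); a ratio
    near 0 suggests it mostly waited (sleeping, disk, network).
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return cpu / wall


def busy(n):
    # Pure-Python arithmetic only: the CPU is working the whole time.
    while n > 0:
        n -= 1


if __name__ == "__main__":
    print(f"busy loop : {cpu_fraction(busy, 5_000_000):.2f}")  # typically near 1
    print(f"time.sleep: {cpu_fraction(time.sleep, 0.5):.2f}")  # typically near 0
```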

### Step 0: Import the package(s) we need

```python
import dask  # we'll explain this package later
```

### Step 1: Modify `1_problem.py`, load it and execute the cell

```python
# %load 1_solution.py
from dask import delayed  # DON'T CHANGE (explained later)

def func(i):
    """A dummy CPU-bound function."""
    print(f'Function {i} starting...')
    # Pure-Python busy loop: it only does arithmetic, so it keeps the CPU busy.
    # Change n to tune the run time (here: 20 million iterations).
    n = 2e7
    while n > 0:
        n -= 1
    print(f'Function {i} done')
    return i

lazy = [delayed(func)(i) for i in range(4)]  # DON'T CHANGE (explained later)
```

### Step 2: Make predictions

Write down how long you think it will take to run the four instances of your function with  
a) a single thread  
b) multiple threads  
c) multiple processes  

### Step 3: Test your predictions

The cell below is where you execute the computations with the different schedulers (i.e., single-threaded, multithreaded or multiprocess). We will explain later what the code is actually doing; for now, all you need to change is the `scheduler` variable. The options for `scheduler` are:
* `single-threaded`: a single thread
* `threads`: multiple threads
* `processes`: multiple processes
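
To make the three options more concrete, here is a rough standard-library analogue built on `concurrent.futures`. It is a sketch under assumptions, not Dask's own implementation: `busy` stands in for `func`, and `run` is a hypothetical helper that times 4 calls in the chosen mode. Run it as a script rather than in the notebook, because `ProcessPoolExecutor` may re-import the file in its child processes (hence the `if __name__ == "__main__":` guard), which does not work for functions defined interactively on platforms that start processes with `spawn`.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy(i, n=2_000_000):
    """CPU-bound stand-in for `func`: a pure-Python countdown loop."""
    while n > 0:
        n -= 1
    return i


def run(mode, n_tasks=4, n=2_000_000):
    """Run n_tasks calls of busy() in the given mode; return (results, seconds)."""
    start = time.perf_counter()
    if mode == "single-threaded":
        # One call after another in the current thread.
        results = [busy(i, n) for i in range(n_tasks)]
    elif mode == "threads":
        # One worker thread per task, all inside this single Python process.
        with ThreadPoolExecutor(max_workers=n_tasks) as pool:
            results = list(pool.map(busy, range(n_tasks), [n] * n_tasks))
    elif mode == "processes":
        # One worker process per task, each running its own interpreter.
        with ProcessPoolExecutor(max_workers=n_tasks) as pool:
            results = list(pool.map(busy, range(n_tasks), [n] * n_tasks))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for mode in ("single-threaded", "threads", "processes"):
        results, elapsed = run(mode)
        print(f"{mode:>15}: {results} in {elapsed:.2f} s")
```

`Executor.map` returns results in input order, which is why every mode should print `[0, 1, 2, 3]` even if the calls finish in a different order.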

Try all three scheduler options and write down your results. **Consider the following questions:** Do the results match the times you predicted? Why or why not? Do you notice anything else interesting?
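
One simple number for comparing the times you note down is the speed-up relative to the single-threaded run, $S = T_\text{serial} / T_\text{parallel}$. The sketch below plugs in the wall times shown in the output cells of this notebook; `speedup` is a hypothetical helper, and you should replace the times with your own.

```python
def speedup(t_serial, t_parallel):
    """Speed-up S = T_serial / T_parallel; S > 1 means the parallel run was faster."""
    return t_serial / t_parallel


# Wall times from the output cells of this notebook; replace with your own.
t_single = 12.5   # scheduler='single-threaded'
t_threads = 12.5  # scheduler='threads'
t_procs = 8.71    # scheduler='processes'

print(f"threads:   {speedup(t_single, t_threads):.2f}x")  # 1.00x
print(f"processes: {speedup(t_single, t_procs):.2f}x")    # 1.44x
```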

**Last note**: I recommend copying the cell into three separate cells, one for each scheduler option, so you can compare the paradigms more easily.

```python
%%time

scheduler = 'single-threaded'  # <--- ***CHANGE ME!***

# don't change the code below
res = dask.compute(lazy, scheduler=scheduler)
print(res)
```

```
Function 3 starting...
Function 3 done
Function 2 starting...
Function 2 done
Function 1 starting...
Function 1 done
Function 0 starting...
Function 0 done
([0, 1, 2, 3],)
Wall time: 12.5 s
```

```python
%%time

scheduler = 'threads'  # <--- ***CHANGE ME!***

# don't change the code below
res = dask.compute(lazy, scheduler=scheduler)
print(res)
```

```
Function 3 starting...Function 2 starting...
Function 1 starting...

Function 0 starting...
Function 1 doneFunction 3 done

Function 2 done
Function 0 done
([0, 1, 2, 3],)
Wall time: 12.5 s
```

```python
%%time

scheduler = 'processes'  # <--- ***CHANGE ME!***

# don't change the code below
res = dask.compute(lazy, scheduler=scheduler)
print(res)
```

```
([0, 1, 2, 3],)
Wall time: 8.71 s
```

### Step 4: Submit your work

Please follow these instructions carefully! Otherwise I won't be able to load your solution into my notebook.

In short: create a branch, add and commit **only** the file you modified (`1_problem.py`), and then push the branch.

```bash
git checkout -b <unique branch name>
git add 1_problem.py
git commit -m "uploading our solution"
git push origin <unique branch name>
```