# 2.3. Dask Parallelism

You are now ready to parallelize your Dask computations with either the *multiprocessing* or the *threaded* Dask scheduler. To do this effectively, however, you need to know a bit about how Dask parallelizes your Task Graphs, which depends on the available resources and on the scheduler you choose.

In [None]:
import time
import dask
import dask.multiprocessing

## Example: *Exploiting Parallelism*

Let's consider another example where we want to apply two independent functions to a list of numbers.

*This time, I will use the `sleep` function from the `time` module. It releases the GIL while it waits, so threads do not contend for the interpreter.*
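To see why a sleeping function is thread-friendly, here is a minimal sketch that uses only the standard library (no Dask). The helper name `nap` and the 0.2-second duration are illustrative assumptions. If `sleep` held the GIL, four naps would take about 0.8 seconds in total; because it releases the GIL, four threads can wait at the same time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def nap(seconds):
    """Hypothetical helper: wait, then return how long we waited."""
    time.sleep(seconds)  # the GIL is released while the thread sleeps
    return seconds


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Four naps are submitted at once, one per thread.
    results = list(pool.map(nap, [0.2] * 4))
elapsed = time.perf_counter() - start

print(f"4 naps of 0.2 s on 4 threads took {elapsed:.2f} s")
```

The elapsed time should be close to a single nap rather than the sum of all four, although the exact figure depends on your machine.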

In [None]:
@dask.delayed
def inc(x):
    time.sleep(1)
    return x + 1

In [None]:
@dask.delayed
def dbl(x):
    time.sleep(1)
    return 2*x

Now, let's consider a short list of data:

In [None]:
data = [2,5,7,3]

And from this data, let's construct an odd number from each element of the list using the operation:

    2*x + 1

In [None]:
%time odd_data = [inc(dbl(x)) for x in data]
odd_data

And, finally, let's sum up these values using Python's built-in `sum` function:

In [None]:
%time sum_odds = sum(odd_data)
sum_odds
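As a sanity check on the arithmetic, the same calculation can be done eagerly in plain Python (no Dask, no sleeping). The names `odd_values` and `expected_sum` are illustrative; the result is the value the Dask computation should eventually produce.

```python
data = [2, 5, 7, 3]

# Apply 2*x + 1 to every element, then add the results together.
odd_values = [2 * x + 1 for x in data]
expected_sum = sum(odd_values)

print(odd_values)    # [5, 11, 15, 7]
print(expected_sum)  # 38
```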

#### Notice!

Since `Delayed` objects can be added (`+`), the standard Python `sum` function operates on `Delayed` objects, too!

#### What does this Task Graph look like?

In [None]:
sum_odds.visualize()

#### Notice!

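The extra nodes in the graph (labeled `_inner` here; other Dask versions may label them `add`) are inserted automatically. The built-in `sum` adds the `Delayed` elements one pair at a time, and each of those additions becomes its own task in the graph.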
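The shape of that chain can be illustrated without Dask. The sketch below uses a hypothetical `Lazy` class (not part of Dask) that records additions as text instead of performing them. It is only a toy model of how the built-in `sum` folds a list with `+`, starting from `0`.

```python
class Lazy:
    """Hypothetical stand-in for a lazy value: records the expression as text."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        # Lazy + something
        return Lazy(f"add({self.expr}, {_expr(other)})")

    def __radd__(self, other):
        # something + Lazy, e.g. the initial 0 that sum() starts from
        return Lazy(f"add({_expr(other)}, {self.expr})")


def _expr(value):
    """Return the recorded expression for Lazy values, repr() otherwise."""
    return value.expr if isinstance(value, Lazy) else repr(value)


total = sum([Lazy("a"), Lazy("b"), Lazy("c"), Lazy("d")])
print(total.expr)  # add(add(add(add(0, a), b), c), d)
```

Each nested `add` corresponds to one extra node, and each one depends on the previous, so the additions form a sequential chain rather than a single reduction step.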

## How long will this take to compute with 2 workers?

#### Multi-Threading:

In [None]:
%time sum_odds.compute(scheduler='threads', num_workers=2)

#### Multi-Processing:

In [None]:
%time sum_odds.compute(scheduler='multiprocessing', num_workers=2)

## Did you guess correctly?  Was it obvious?

If it was not obvious, what might you do to "clean up" the Task Graph?

> #### TIP:
> 
> Make sure all of your "substantial" functions are `delayed`!

In [None]:
delayed_sum = dask.delayed(sum)

In [None]:
%time delayed_sum_odds = delayed_sum(odd_data)
delayed_sum_odds

In [None]:
delayed_sum_odds.visualize()

#### Multi-Threading with 2 Workers:

In [None]:
%time delayed_sum_odds.compute(scheduler='threads', num_workers=2)

#### Multi-Processing with 2 Workers:

In [None]:
%time delayed_sum_odds.compute(scheduler='multiprocessing', num_workers=2)

## Now, let's scale up!

#### Multi-Threading with 4 Workers:

In [None]:
%time delayed_sum_odds.compute(scheduler='threads', num_workers=4)

#### Multi-Processing with 4 Workers:

In [None]:
%time delayed_sum_odds.compute(scheduler='multiprocessing', num_workers=4)

You can see that the 4 independent `dbl`/`inc` chains are parallelized, with each thread or process taking about 2 seconds (one second for `dbl`, one for `inc`). (The `sum` function does not have a `time.sleep` call in it, so it finishes very quickly.)
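These timings can be estimated with a simple model. The sketch below is a greedy list scheduler written in plain Python; it is not Dask's scheduler, and the names `makespan`, `durations`, and `deps` are illustrative assumptions. It ignores all overhead (process start-up, data transfer between processes) and treats `sum` as taking zero time.

```python
import heapq


def makespan(durations, deps, num_workers):
    """Estimate wall time for a task graph with a greedy list scheduler."""
    # Count unfinished parents per task and record each task's children.
    remaining = {task: len(deps.get(task, ())) for task in durations}
    children = {task: [] for task in durations}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)

    ready = sorted(task for task, n in remaining.items() if n == 0)
    running = []  # min-heap of (finish_time, task)
    now = 0.0
    while ready or running:
        # Start ready tasks while there are idle workers.
        while ready and len(running) < num_workers:
            task = ready.pop(0)
            heapq.heappush(running, (now + durations[task], task))
        # Advance the clock to the next task that finishes.
        now, done = heapq.heappop(running)
        for child in children[done]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return now


# Model of the delayed-sum graph: four dbl -> inc chains feeding one sum.
durations, deps = {}, {}
for i in range(4):
    durations[f"dbl-{i}"] = 1.0
    durations[f"inc-{i}"] = 1.0
    deps[f"inc-{i}"] = [f"dbl-{i}"]
durations["sum"] = 0.0
deps["sum"] = [f"inc-{i}" for i in range(4)]

for workers in (1, 2, 4):
    print(workers, "worker(s):", makespan(durations, deps, workers), "s")
```

Under these assumptions the model gives 8 seconds for 1 worker, 4 seconds for 2 workers, and 2 seconds for 4 workers. Measured times will be somewhat higher, especially with the multiprocessing scheduler.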