# 2.4. Dask Profiling & Diagnostics

Profiling Dask computations shows where time is actually spent. That is the first step toward understanding how a scheduler executes a task graph and where performance can be improved.

In [None]:
import time
import dask
import dask.multiprocessing

## Example: *Adding Odds*

Let's consider the example from the previous notebook, but with time delays determined by the input data:

In [None]:
@dask.delayed
def add(x, y):
    time.sleep((x + y)/10.0)
    return x + y

In [None]:
@dask.delayed
def inc(x):
    time.sleep(x/10.0)
    return x + 1

In [None]:
@dask.delayed
def dbl(x):
    time.sleep(x/10.0)
    return 2*x

In [None]:
@dask.delayed
def dsum(*args):
    s = sum(*args)  # dsum is called with a single iterable, so this is sum(iterable)
    time.sleep(s/10.0)
    return s

In [None]:
data = [1,3,2,0]

In [None]:
sum_odds = dsum(inc(dbl(x)) for x in data)
sum_odds

In [None]:
sum_odds.visualize()
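Before profiling, it can help to estimate how long the graph should take. The sketch below is a back-of-the-envelope calculation derived only from the `time.sleep` calls in `dbl`, `inc`, and `dsum` above. It assumes zero scheduler overhead and, for the parallel case, at least four workers; the variable names are illustrative and not part of Dask.

```python
data = [1, 3, 2, 0]

# Each branch runs dbl(x), which sleeps x/10 s, then inc(2*x), which sleeps 2*x/10 s.
branch_times = [x / 10.0 + (2 * x) / 10.0 for x in data]

# dsum sleeps for (sum of its inputs)/10 s, and each input is 2*x + 1.
total = sum(2 * x + 1 for x in data)
dsum_time = total / 10.0

# One worker: every task runs back to back.
serial_estimate = sum(branch_times) + dsum_time

# Four workers (assumed): the branches overlap, so only the slowest one counts.
parallel_estimate = max(branch_times) + dsum_time

print(f"result: {total}")
print(f"serial estimate: {serial_estimate:.1f} s")
print(f"parallel estimate: {parallel_estimate:.1f} s")
```

Under these assumptions the single-threaded run should take roughly 3.4 s and the four-worker run roughly 2.5 s. The final `dsum` step accounts for 1.6 s of both figures, and no amount of parallelism can shorten it.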

## Let's profile this with standard Python techniques

#### Multi-Threading Scheduler:

In [None]:
%prun sum_odds.compute(scheduler='threads', num_workers=4)

### Was that helpful? Do you know why it ran the way it did?

#### Single-Threaded Scheduler:

In [None]:
%prun sum_odds.compute(scheduler='single-threaded')

### Was that more helpful? Do you know why it ran the way it did?

## Dask Profiler

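The `dask.diagnostics` subpackage records profiling (or diagnostic) information for computations run on Dask's local schedulers and plots it with the `Bokeh` package. The `Profiler` context manager records the start time, end time, and worker of every task, which `visualize` then draws as a timeline.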

In [None]:
from dask.diagnostics import Profiler, visualize
from bokeh.io import output_notebook
output_notebook()

with Profiler() as p:
    sum_odds.compute(scheduler='threads', num_workers=4)
    
visualize(p)
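The recorded data can also be inspected without Bokeh. The following is a minimal sketch, assuming the `results` attribute of `Profiler` holds one record per task with `key`, `start_time`, and `end_time` fields. It rebuilds a smaller version of the graph with shorter delays, and the function name `total` and the dictionary `seconds_by_function` are illustrative, not taken from the notebook.

```python
import time
from collections import defaultdict

import dask
from dask.diagnostics import Profiler
from dask.utils import key_split


@dask.delayed
def dbl(x):
    time.sleep(x / 100.0)  # shorter delays than the notebook, to keep the demo quick
    return 2 * x


@dask.delayed
def inc(x):
    time.sleep(x / 100.0)
    return x + 1


@dask.delayed
def total(values):
    return sum(values)


graph = total([inc(dbl(x)) for x in [1, 3, 2, 0]])

with Profiler() as prof:
    result = graph.compute(scheduler="threads", num_workers=4)

# Add up the wall-clock time spent in each function across all of its tasks.
seconds_by_function = defaultdict(float)
for record in prof.results:
    name = key_split(record.key)  # strips the unique suffix from the task key
    seconds_by_function[name] += record.end_time - record.start_time

for name, seconds in sorted(seconds_by_function.items()):
    print(f"{name}: {seconds:.3f} s")
```

Summing by function name gives a quick text summary of where the time went, which can be easier to compare across runs than the plot.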

### The ResourceProfiler & CacheProfiler

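Like the `Profiler` above, the `ResourceProfiler` and `CacheProfiler` are context managers in `dask.diagnostics`. The `ResourceProfiler` samples memory and CPU usage while the computation runs (it relies on the `psutil` package), and the `CacheProfiler` records when each task's result enters and leaves the scheduler's cache. All three can be combined in one `with` statement and plotted together: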

> ```python
> from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, visualize
> from bokeh.io import output_notebook
> output_notebook()
> 
> with Profiler() as p, ResourceProfiler() as r, CacheProfiler() as c:
>     sum_odds.compute()
>     
> visualize([r, p, c])
> ```

## Dask ProgressBar

Dask can visualize the progress of computation using the `diagnostics` subpackage's `ProgressBar`.

In [None]:
from dask.diagnostics import ProgressBar

with ProgressBar():
    sum_odds.compute(scheduler='threads', num_workers=4)
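Besides the `with` form, a `ProgressBar` can be registered globally so that every subsequent computation reports progress. The sketch below assumes the `register` and `unregister` methods and the `minimum` and `dt` constructor arguments of `ProgressBar`; the `square` function is a stand-in for real work.

```python
import dask
from dask.diagnostics import ProgressBar


@dask.delayed
def square(x):
    return x * x


# minimum: only draw the bar for computations lasting at least this many seconds.
# dt: how often (in seconds) the bar is refreshed.
pbar = ProgressBar(minimum=0, dt=0.05)
pbar.register()
try:
    values = dask.compute(*[square(i) for i in range(5)], scheduler="threads")
finally:
    # Always unregister, so later computations are not affected.
    pbar.unregister()

print(values)
```

Raising `minimum` is a simple way to keep short computations from cluttering the output while still showing progress for long ones.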