# Using Dask for Distributed Image Processing

In [None]:
import dask
import dask.array as da
import dask_image.ndfilters as di
import matplotlib.pyplot as plt

## The image class and dask arrays

We start by creating an image object that points to our HDF5 file:

In [None]:
from image import image
i = image("test_images/orion.hdf5")

The data attribute of the image class is a dask array storing the image data.

In [None]:
i.data

We can see that the dask array is broken down into several chunks, each of which is a NumPy ndarray that can be computed independently and in parallel.
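To make the chunk structure concrete, here is a minimal sketch that builds a chunked dask array from an in-memory NumPy array. The synthetic 1000x1000 array and the 250x250 chunk size are assumptions chosen for illustration; they are not taken from the `image` class or the Orion file.

```python
import numpy as np
import dask.array as da

# Synthetic 1000x1000 "image" standing in for real data (illustrative values only).
image_np = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

# Wrap it as a dask array split into 250x250 chunks (chunk size is an assumption).
image_da = da.from_array(image_np, chunks=(250, 250))

print(image_da.chunks)     # chunk sizes along each axis
print(image_da.numblocks)  # number of chunks along each axis

# Computing a single block returns an ordinary NumPy array.
first_block = image_da.blocks[0, 0].compute()
print(type(first_block), first_block.shape)
```

Smaller chunks give the scheduler more tasks to run in parallel, at the cost of more scheduling overhead per task.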

Dask arrays are lazy by default, so we need to call compute() to access the values in the array:

In [None]:
%time i.data.compute()
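The laziness can also be seen without the image file. The sketch below uses a synthetic array of ones (an assumption for illustration) and shows that arithmetic on a dask array only builds a task graph until compute() is called.

```python
import dask.array as da

# Synthetic array of ones, split into 500x500 chunks (illustrative sizes).
x = da.ones((2000, 2000), chunks=(500, 500))

# This only records the operations; nothing is evaluated yet.
lazy_mean = (x * 2).mean()
print(type(lazy_mean))

# compute() runs the task graph and returns a concrete value.
value = lazy_mean.compute()
print(value)
```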

Matplotlib can be used to view the image, which in this case is a picture of the Orion constellation:

In [None]:
%matplotlib notebook
plt.imshow(i.data, cmap='gray')

## Histogramming

Dask provides chunk-aware equivalents of most of the standard NumPy functions. Here we demonstrate the histogramming function:

In [None]:
histo, bins = da.histogram(i.data, bins=100, range=[5000,10000])

In [None]:
histo

Note again that the dask object returned is lazy and needs to be explicitly computed:

In [None]:
%time y = histo.compute()
y

In [None]:
%matplotlib notebook
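# bins holds 101 edges; draw each bar from its left edge with the full bin width
plt.bar(bins[:-1], y, width=bins[1] - bins[0], align='edge')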
plt.show()
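As a sanity check on the histogram workflow above, the following sketch runs da.histogram on synthetic pixel values and compares the result with np.histogram. The random data, array shape, and chunk size are assumptions for illustration; only the `bins` and `range` arguments mirror the cell above.

```python
import numpy as np
import dask.array as da

# Synthetic integer "pixel values" standing in for the image data.
rng = np.random.default_rng(0)
pixels_np = rng.integers(0, 16000, size=(512, 512))
pixels_da = da.from_array(pixels_np, chunks=(128, 128))

# Lazy histogram: da.histogram needs an explicit range when bins is an integer.
histo, bins = da.histogram(pixels_da, bins=100, range=[5000, 10000])
counts = histo.compute()

# The same histogram computed eagerly with NumPy for comparison.
expected, expected_bins = np.histogram(pixels_np, bins=100, range=[5000, 10000])
print(np.array_equal(counts, expected))
```

Each chunk is histogrammed separately and the per-chunk counts are summed, which is why the bin edges must be fixed up front.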

## Smoothing

The dask-image package provides dask versions of many scipy.ndimage functions. Here we demonstrate a similar workflow for Gaussian image smoothing:

In [None]:
smooth_array = di.gaussian_filter(i.data, sigma=10)
smooth_array

In [None]:
%time smooth_array.compute()

In [None]:
%matplotlib notebook
plt.imshow(smooth_array, cmap='gray')
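Filtering a chunked array needs care at chunk boundaries, because each output pixel depends on its neighbours. The sketch below shows one way this can be handled, using dask's map_overlap with scipy.ndimage.gaussian_filter on synthetic data. It is an illustration of the general technique under assumed sizes and parameters, not a description of how dask-image implements gaussian_filter internally.

```python
import numpy as np
import dask.array as da
from scipy import ndimage

sigma = 2
truncate = 4.0
# Kernel radius scipy uses for these settings; neighbouring chunks must share this many pixels.
depth = int(truncate * sigma + 0.5)

# Synthetic image split into four 100x100 chunks (illustrative sizes).
rng = np.random.default_rng(0)
image_np = rng.random((200, 200))
image_da = da.from_array(image_np, chunks=(100, 100))

# Pad each chunk with `depth` pixels from its neighbours, filter, then trim the padding.
smooth_da = image_da.map_overlap(
    ndimage.gaussian_filter,
    depth=depth,
    boundary="reflect",
    dtype=image_np.dtype,
    sigma=sigma,
    truncate=truncate,
)
smooth = smooth_da.compute()

# Reference result from filtering the whole array at once.
reference = ndimage.gaussian_filter(image_np, sigma=sigma, truncate=truncate)
print(np.allclose(smooth[10:-10, 10:-10], reference[10:-10, 10:-10]))
```

The overlap grows with sigma, so large smoothing scales combined with small chunks mean more data is shared between neighbouring chunks.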

## dask.distributed

In this simple example, each chunk of the dask array was processed in parallel across the four cores of one machine. For larger datasets, however, dask can be configured with dask.distributed to schedule tasks automatically across the nodes of a compute cluster over a network.
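A minimal sketch of the distributed scheduler is shown below. It assumes the `distributed` package is installed and starts a small in-process scheduler and worker purely for illustration. On a real cluster the Client would instead be given the address of a running scheduler. The worker counts and array sizes here are assumptions.

```python
import dask.array as da
from dask.distributed import Client

# Start an in-process scheduler and one worker with two threads (illustrative settings).
# On a cluster, pass the scheduler address to Client(...) instead of these arguments.
with Client(processes=False, n_workers=1, threads_per_worker=2,
            dashboard_address=None) as client:
    x = da.ones((2000, 2000), chunks=(500, 500))
    # While the client is active, compute() sends the task graph to its scheduler.
    total = x.sum().compute()

print(total)
```

The array code itself is unchanged; only the scheduler that executes the task graph differs.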