Spectral Clustering Example.

The image loaded here is a cropped portion of the ``MERCATOR_LC80210392016114LGN00_B10.TIF`` LANDSAT image included as a public [datashader example](http://datashader.org/topics/landsat.html).

In addition to `dask-ml`, we'll use `rasterio` to read the data and `holoviews` (with datashader's `regrid` operation) to plot the figures.
I'm just working on my laptop, so either the threaded or the distributed scheduler would work. I'll use the distributed scheduler for its diagnostics.

In [None]:
import rasterio
import holoviews as hv
from holoviews.operation.datashader import regrid
import dask.array as da
from dask_ml.cluster import SpectralClustering
from dask.distributed import Client
hv.extension('bokeh')

In [None]:
with rasterio.open('landsat-sample.tiff') as dataset:
    arr = dataset.read(1)

arr = arr.astype(float)
# Standardize to zero mean and unit variance for the clustering algorithm
arr = (arr - arr.mean()) / arr.std()
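
The rescaling step above is a plain standardization (z-score). As a minimal sketch of what it does, here is the same arithmetic on a handful of made-up pixel values using only the standard library. The `standardize` helper and the sample values are illustrative assumptions, not part of the notebook. `pstdev` is used because NumPy's `std()` defaults to the population standard deviation.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Shift values to zero mean and scale them to unit standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population std, matching NumPy's default (ddof=0)
    return [(v - mu) / sigma for v in values]

pixels = [10.0, 12.0, 14.0, 16.0, 18.0]  # made-up intensities, for illustration only
scaled = standardize(pixels)

print(scaled)
print(fmean(scaled), pstdev(scaled))  # approximately 0 and 1
```

After this step every pixel is expressed in units of standard deviations from the image mean, so distances between samples no longer depend on the raw sensor scale.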

In [None]:
%%opts Image (cmap='viridis')
regrid(hv.Image(arr))

In [None]:
client = Client(processes=False)
client

We'll reshape the image into the layout dask-ml / scikit-learn expect: `(n_samples, n_features)`, where `n_features` is 1 in this case. Then we'll persist it in memory. The dataset is still small at this point. The large dataset, which dask helps us manage, is the intermediate `n_samples x n_samples` array that spectral clustering operates on. For our 2,500 x 2,500 pixel subset, that's ~50

In [None]:
X = da.from_array(arr.reshape(-1, 1), chunks=100_000)
X = client.persist(X)
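
To get a feel for what `chunks=100_000` means here, the sketch below works out the chunk layout by hand. It assumes the 2,500 x 2,500 shape quoted in the text and 8-byte floats. `chunk_layout` is a hypothetical helper for the arithmetic, not a dask function. An integer `chunks` applies to every dimension, but the second dimension has length 1, so each chunk is a block of up to 100,000 rows.

```python
def chunk_layout(n_rows, chunk_rows, itemsize=8):
    """Return (number of chunks, rows in the last chunk, bytes per full chunk)."""
    n_chunks = (n_rows + chunk_rows - 1) // chunk_rows  # ceiling division
    last_rows = n_rows - (n_chunks - 1) * chunk_rows
    return n_chunks, last_rows, chunk_rows * itemsize

n_samples = 2_500 * 2_500  # one sample per pixel (shape assumed from the text)
n_chunks, last_rows, chunk_bytes = chunk_layout(n_samples, 100_000)

print(n_samples)    # 6250000
print(n_chunks)     # 63
print(last_rows)    # 50000
print(chunk_bytes)  # 800000
```

So `X` itself is only a few dozen small blocks, which is why persisting it in memory is cheap.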

And we'll fit the estimator.

In [None]:
clf = SpectralClustering(n_clusters=4, random_state=0,
                         kmeans_params={'init_max_iter': 5})

In [None]:
%time clf.fit(X)
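
For intuition about the `n_samples x n_samples` array mentioned earlier, here is a minimal sketch of the textbook dense recipe for a two-way spectral split, run on six made-up values. It is not dask-ml's implementation, and `two_way_spectral_split` and its `gamma` parameter are names invented for this illustration. It assumes NumPy is installed.

```python
import numpy as np

def two_way_spectral_split(x, gamma=1.0):
    """Split 1-D samples into two groups with a dense, textbook spectral method."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    # RBF affinity: one entry per pair of samples, hence shape (n, n).
    affinity = np.exp(-gamma * (x - x.T) ** 2)
    # Symmetric normalized Laplacian: I - D^(-1/2) A D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue separates the two groups by sign.
    _, vecs = np.linalg.eigh(laplacian)
    split_vector = vecs[:, 1] * d_inv_sqrt
    return affinity, (split_vector > 0).astype(int)

values = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]  # two obvious groups, made up
affinity, toy_labels = two_way_spectral_split(values)

print(affinity.shape)  # (6, 6)
print(toy_labels)      # first three share one label, last three the other
```

Which group gets label 0 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. The affinity matrix grows with the square of the number of samples, which is what makes the full-size problem too large for a single in-memory array.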

In [None]:
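# assign_labels_ is the fitted k-means step; its labels_ is a dask array,
# so compute() brings the cluster labels back as a NumPy array.
labels = clf.assign_labels_.labels_.compute()
# One label per pixel, so the labels reshape back to the image's shape.
c = labels.reshape(arr.shape)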
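
The reshape back to `arr.shape` works because `reshape(-1, 1)` flattens the image in row-major order, NumPy's default, so sample `r * n_cols + c` is pixel `(r, c)`. Below is a small pure-Python sketch of that round trip. The helper names, the 2 x 3 image, and the threshold used as a stand-in for the cluster labels are all made up for illustration.

```python
def flatten_row_major(image):
    """Turn a 2-D list into a list of one-feature samples, row by row."""
    return [[pixel] for row in image for pixel in row]

def unflatten_row_major(flat, n_rows, n_cols):
    """Inverse of flatten_row_major for a flat list of per-sample values."""
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]

image = [[0.1, 0.2, 0.9],
         [0.8, 0.3, 0.7]]
samples = flatten_row_major(image)  # 6 samples, 1 feature each

# Stand-in for clustering: a simple threshold, NOT spectral clustering.
fake_labels = [int(sample[0] > 0.5) for sample in samples]
label_image = unflatten_row_major(fake_labels, n_rows=2, n_cols=3)

print(samples)
print(label_image)  # [[0, 0, 1], [1, 0, 1]]
```

Each label lands on the pixel it was computed from, so the clustered image lines up with the original when the two are plotted side by side.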

In [None]:
%%opts Image (cmap='viridis')
regrid(hv.Image(arr)).relabel('Image') + regrid(hv.Image(c)).relabel('Clustered')