### Clustering RGB values in an image

Just as a toy example, we're going to take the RGB values from each pixel in an image and cluster them.

This means that we have three dimensions (red, green and blue). The number of data points is simply the number of pixels in the image.
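
As a small illustration of this layout, here is a sketch using a made-up 2 x 2 image (the names `tiny_image` and `tiny_data` are hypothetical and not part of the notebook). It shows how the pixel grid is flattened into one row per pixel:

```python
import numpy as np

# Hypothetical 2 x 2 RGB image: one (red, green, blue) triple per pixel.
tiny_image = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

# Flatten the two spatial axes into one: each row is now one pixel.
tiny_data = tiny_image.reshape(-1, 3).astype(float)

print(tiny_image.shape)  # (2, 2, 3): height, width, colour channels
print(tiny_data.shape)   # (4, 3): 4 pixels, 3 dimensions
```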

_NB: In Jupyter, you simply go to a cell and hit `Shift + Enter` to run the cell and advance to the next cell._

In [None]:
import sklearn.cluster
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import Image as show_image

In [None]:
# let's look at what the picture is like

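# load the picture into an array
# (scipy.misc.imread is no longer available in current SciPy releases, so we read the file with Pillow)
from PIL import Image
image = np.asarray(Image.open('landscape.png'))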

# for the machine learning later, we need to turn it into a (number of pixels) x 3 array:
data = np.reshape(image[:, :, :3], (-1, 3)).astype(float)

# show the image
plt.imshow(image)
plt.show()

# print the size of the image
print('Shape of our image array:', image.shape)

However, whatever algorithm we use won't see the data like this (we're not do

Since this is "only" three-dimensional, we can visualise it with just three 2D scatter plots, one for each pair of colour channels.

This can give us some idea as to how hard this would be for an algorithm to solve.

In [None]:
# let's visualise the data points in all three dimensions
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
for dim in range(3):
    plt.sca(axes[dim])  # make this subplot the current axes
    x, y = ((dim + 1) % 3, (dim + 2) % 3)
    plt.scatter(data[:, x], data[:, y], alpha=0.1)
    plt.xlabel(['Red', 'Green', 'Blue'][x])
    plt.ylabel(['Red', 'Green', 'Blue'][y])

plt.show()


## Using a k-clusters approach

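First, an algorithm where we ask for a number of clusters and it optimises the location of those clusters.

Let's use a very simple algorithm to make a start. KMeans places _k_ cluster centers into our data and then iterates: it assigns each point to its nearest center and moves each center to the mean of the points assigned to it, until the placement stops improving. The result is a local optimum, so it is not guaranteed to be the best possible placement.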
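
The assign-and-update loop can be sketched in plain NumPy. This is a toy version under simplifying assumptions (random data points as starting centers, a fixed number of iterations), not scikit-learn's implementation; `simple_kmeans` and the made-up `toy_pixels` are hypothetical names used only for illustration:

```python
import numpy as np

def simple_kmeans(points, k, n_iter=20, seed=0):
    """Toy k-means. Returns the centers and the labels from the last assignment step."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: distance from every point to every center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two made-up colour blobs: dark pixels and bright pixels.
rng = np.random.default_rng(1)
dark = rng.normal(loc=30, scale=5, size=(50, 3))
bright = rng.normal(loc=220, scale=5, size=(50, 3))
toy_pixels = np.vstack([dark, bright])

toy_centers, toy_labels = simple_kmeans(toy_pixels, k=2)
print(np.round(toy_centers))
```

On data this well separated, the two centers should end up close to the dark and the bright blob. On real images the outcome depends on the starting centers, which is why library implementations use more careful initialisation.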

In [None]:
from sklearn.cluster import KMeans

# decide how many clusters you want to get from the algorithm
k = ?

# initialise the clustering class
cluster_algorithm = KMeans(n_clusters=k, init='k-means++', max_iter=1000)

# Use the "fit" method of the clustering class to fit the data.
# Note: in Python, you call a method on an object as object_name.method_name(method_arguments)
# <your code here>

# get the cluster label that the fitted algorithm assigns to each pixel
cluster_labels = cluster_algorithm.predict(data)


To check whether the clustering algorithm produced sensible labels, let's plot the image again, colouring each pixel by its cluster label:

In [None]:
plt.imshow(np.reshape(cluster_labels, (image.shape[0], image.shape[1])), cmap='Set3')
plt.show()
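
The reshape works because the labels come back in the same pixel order as the flattened data. A minimal sketch with made-up values shows this, and also how the labels can index an array of center colours to recolour the image. Here `example_labels` and `example_centers` are hypothetical stand-ins; for a fitted scikit-learn `KMeans`, the centers are available as `cluster_centers_`:

```python
import numpy as np

# Hypothetical result of clustering a 2 x 3 image into k = 2 clusters:
# one label per pixel, in the same row-major order as the flattened data.
example_labels = np.array([0, 0, 1, 1, 0, 1])
height, width = 2, 3

# Hypothetical cluster centers (one RGB colour per cluster).
example_centers = np.array([
    [20.0, 120.0, 40.0],    # cluster 0: a green
    [200.0, 220.0, 250.0],  # cluster 1: a pale blue
])

# Reshaping the labels restores the image layout ...
label_image = example_labels.reshape(height, width)

# ... and indexing the centers with the labels gives every pixel
# the colour of its cluster.
recoloured = example_centers[label_image].astype(np.uint8)

print(label_image)
print(recoloured.shape)  # (2, 3, 3)
```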


In [None]:
# we can also visualise these in our original three-slice plots
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
for dim in range(3):
    plt.sca(axes[dim])  # make this subplot the current axes
    x, y = ((dim + 1) % 3, (dim + 2) % 3)
    plt.scatter(data[:, x], data[:, y], alpha=0.1, c=cluster_labels, cmap='Set3')
    plt.xlabel(['Red', 'Green', 'Blue'][x])
    plt.ylabel(['Red', 'Green', 'Blue'][y])

plt.show()


---

## Using a density approach

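So k-means produces good results when we "tell" the algorithm how many labels we want back. Let's try DBSCAN, which uses a density approach instead: rather than asking for a number of clusters, it groups points that lie in densely populated regions and marks isolated points as noise (label `-1`).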
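
To make "density" concrete, here is a minimal sketch of the neighbour count that a density-based method such as DBSCAN relies on: a point counts as a core point if at least `min_samples` points (itself included) lie within distance `eps` of it. The helper `count_neighbours` and the data are made up for illustration, and the all-pairs distance matrix used here would be far too large for a full image:

```python
import numpy as np

def count_neighbours(points, eps):
    """Number of points within distance eps of each point (itself included)."""
    distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return (distances <= eps).sum(axis=1)

# Made-up data: a tight group of five points plus one far-away outlier.
toy_points = np.array([
    [10.0, 10.0, 10.0],
    [11.0, 10.0, 10.0],
    [10.0, 11.0, 10.0],
    [10.0, 10.0, 11.0],
    [11.0, 11.0, 10.0],
    [200.0, 200.0, 200.0],
])

neighbour_counts = count_neighbours(toy_points, eps=3)
is_core = neighbour_counts >= 4  # 4 plays the role of min_samples here

print(neighbour_counts)  # [5 5 5 5 5 1]
print(is_core)           # [ True  True  True  True  True False]
```

The outlier has no neighbours other than itself, so it cannot start a cluster; a point like that ends up as noise unless it lies within `eps` of a core point.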

In [None]:
from sklearn.cluster import DBSCAN

# initialise the algorithm
cluster_algorithm = DBSCAN(eps=3, min_samples=10, n_jobs=4)

# Make the algorithm fit the data
cluster_algorithm.fit(data)

cluster_labels = cluster_algorithm.labels_

In [None]:
plt.imshow(np.reshape(cluster_labels, (image.shape[0], image.shape[1])), cmap='Set3')
plt.show()


In [None]:
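# visualise the DBSCAN labels in our three-slice plots (noise points have the label -1)
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
for dim in range(3):
    plt.sca(axes[dim])  # make this subplot the current axes
    x, y = ((dim + 1) % 3, (dim + 2) % 3)
    scatter = plt.scatter(data[:, x], data[:, y], alpha=0.1, c=cluster_labels, cmap='Set3')
    plt.xlabel(['Red', 'Green', 'Blue'][x])
    plt.ylabel(['Red', 'Green', 'Blue'][y])

# build the legend from the label colours (it is drawn on the last plot only)
plt.legend(*scatter.legend_elements(), title='Cluster label')
plt.show()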
