### **Example 1:**
We will use K-means to perform color segmentation on an image.

<br>

[Idoia Ochoa](https://portalcientifico.unav.edu/investigadores/329427/detalle) (Tecnun, University of Navarra)

<br>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/iochoa/JII-GradoIA/blob/main/notebooks/JII_1_Kmeans_ColorSegmentation_easier_2024.ipynb)


## Common imports and loading image

We will first import some common Python libraries.

In [None]:
# For images and importing
from PIL import Image
import requests
from io import BytesIO

# For plotting
import matplotlib.pyplot as plt

# For arrays
import numpy as np

We will now read the image to which we will apply color segmentation.

The end of the notebook includes example code for loading a different image from the internet, or one of your own pictures stored in Google Drive. Be mindful of the image size: K-means runs over every pixel, so large images take much longer to cluster.

In [None]:
# Load the image
path_img_small = 'https://raw.githubusercontent.com/iochoa/ml-datasets/main/mandrill-small.tiff'

response = requests.get(path_img_small)
img = Image.open(BytesIO(response.content))
plt.imshow(img)

Let's check the size of the Mandrill image.

In [None]:
# Let's check its size
rgb_pixels = np.array(img)
rgb_pixels.shape

Does the shape make sense?

The image is composed of 128 x 128 pixels, with each pixel having 3 channels (R, G, B). NumPy orders the shape as (rows, columns, channels), i.e., (height, width, channels).

In [None]:
# NumPy shape order is (rows, columns, channels)
height = rgb_pixels.shape[0]
width = rgb_pixels.shape[1]
channels = rgb_pixels.shape[2]
print('Height is: ', height)
print('Width is: ', width)
print('Number of channels is: ', channels)

We will now reshape the image to a 2D tensor (matrix) of pixels, with dimension (128*128) x 3, i.e., each row of the matrix represents one pixel, composed of 3 channels (R, G, B).

In [None]:
# We use the reshape function
img_vector = np.array(img)
img_vector = img_vector.reshape((width*height, channels))
img_vector.shape
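
To make the reshape step concrete, here is a small sketch on a made-up 2 x 3 "image" (the array `tiny_img` and its values are assumptions for illustration, not part of the Mandrill data). It shows where each pixel ends up in the matrix, and that reshaping back recovers the original.

```python
import numpy as np

# A tiny 2 x 3 "image" with 3 channels; values are made up for illustration
tiny_img = np.arange(2 * 3 * 3).reshape(2, 3, 3)
rows, cols, n_channels = tiny_img.shape

# Flatten to one row per pixel: shape (rows*cols, channels)
tiny_vector = tiny_img.reshape(rows * cols, n_channels)
print(tiny_vector.shape)  # (6, 3)

# The pixel at (row r, column c) ends up in row r*cols + c of the matrix
r, c = 1, 2
print(np.array_equal(tiny_vector[r * cols + c], tiny_img[r, c]))  # True

# Reshaping back recovers the original image exactly
print(np.array_equal(tiny_vector.reshape(rows, cols, n_channels), tiny_img))  # True
```

Because the mapping is reversible, we can cluster the rows of the matrix and later put the result back into image form.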

## K-Means clustering

We will now apply K-Means to cluster the pixels by similarity, i.e., we want to group pixels that are similar to each other (based on their R, G, B values) together.

For example, we would like all red(ish) pixels to form a cluster, all blue(ish) pixels to form another cluster, and so on. The clustering algorithm K-means will do this automatically.

In [None]:
# Apply Kmeans with K=16 (this will generate 16 clusters/groups)
from sklearn.cluster import KMeans
k = 16
kmeans = KMeans(n_clusters=k)

# Cluster assignment for each pixel is stored in vector 'y_pred'
y_pred = kmeans.fit_predict(img_vector)
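
To give an intuition for what happens inside `fit_predict`, the following is a minimal NumPy sketch of the basic K-means loop (alternating an assignment step and an update step). It is a simplified illustration, not scikit-learn's actual implementation; the function name `kmeans_sketch` and the toy pixels are assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(points, k, n_iters=20, seed=0):
    """Basic K-means loop for illustration (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    points = points.astype(float)
    # Initialise the centroids with k distinct points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: squared distance from every point to every centroid
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy "pixels": two reddish and two bluish
toy_pixels = np.array([[250, 10, 10], [240, 20, 15], [10, 10, 250], [20, 15, 240]])
toy_labels, toy_centroids = kmeans_sketch(toy_pixels, k=2)
print(toy_labels)
print(toy_centroids)
```

The two reddish pixels should share one label and the two bluish pixels the other; which group gets label 0 depends on the random initialisation.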

In [None]:
# y_pred contains the cluster assignment to each pixel (it should range from 0 to 15)
y_pred

In [None]:
# We can check how many pixels are assigned to each cluster
np.unique(y_pred, return_counts=True)

We can generate a bar plot with the number of pixels in each cluster

In [None]:
from collections import Counter

# Count the frequency of each element
count = Counter(y_pred)

# Separate the keys (the elements in the vector) and the values (their counts)
elements = list(count.keys())
frequencies = list(count.values())

# Create the bar plot
plt.bar(elements, frequencies)
plt.xlabel('Cluster')
plt.ylabel('Frequency')
plt.title('Number of pixels assigned to each cluster')
plt.show()

We can now check what the representatives (centroids) of each of these clusters look like.

A cluster representative is computed as the average of all pixels assigned to that cluster, so it can be viewed as the "average pixel" (it has 3 values, one for each of the R, G, B channels).

Hence, we would expect one of them to be red(ish), another one blue(ish), and so on.

In [None]:
# Plot the centroids to better understand what the clusters look like
# (-1 lets NumPy infer the number of centroids, so this works for any K)
tmp = kmeans.cluster_centers_.reshape(-1, 1, 3)
plt.imshow(np.uint8(tmp))

We can also check their R, G, B values explicitly

In [None]:
kmeans.cluster_centers_
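
Since a centroid is an average, its values are generally not integers, which is why the plotting code converts them with `np.uint8`. The sketch below uses made-up pixels and labels (`demo_pixels` and `demo_labels` are assumptions for illustration) to show the averaging, and the difference between truncating and rounding when converting back to 8-bit values.

```python
import numpy as np

# Made-up pixels and cluster labels, for illustration only
demo_pixels = np.array([[200, 0, 0], [220, 20, 10], [212, 12, 7],
                        [0, 0, 180], [10, 20, 200]], dtype=float)
demo_labels = np.array([0, 0, 0, 1, 1])

# Each centroid is the mean of the pixels assigned to that cluster
demo_centroids = np.array([demo_pixels[demo_labels == j].mean(axis=0) for j in range(2)])
print(demo_centroids)

# Casting floats to uint8 drops the fractional part ...
truncated = demo_centroids.astype(np.uint8)
# ... while rounding first gives the nearest integer colour
rounded = np.rint(demo_centroids).astype(np.uint8)
print(truncated[0], rounded[0])
```

The difference is at most one intensity level per channel, so it is not visible in the plots, but it is worth knowing what the cast does.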

We can also plot pixels belonging to specific clusters (groups). We would expect the pixels of a cluster to look similar to the corresponding centroid (representative).

In other words, if the representative of a cluster is red(ish), the pixels of that cluster should also look red.

In [None]:
# Select the cluster for which you want to plot pixels assigned to it
cluster = 4

# Select the first 100 pixels
pixels_in_cluster = img_vector[y_pred == cluster][0:100]

# Reshape them to 10x10 for visualization purposes
tmp1 = pixels_in_cluster.reshape(10, 10, 3)

# Plot them
plt.imshow(np.uint8(tmp1))

## Color segmentation (compression)

Next we will replace every pixel with the representative (centroid) of the cluster it belongs to.

After this transformation, the pixels of the newly generated image can take only 16 colors, those of the centroids (see above).

In [None]:
# Substitute each pixel by the representative of the cluster it belongs to
new_image_vector = kmeans.cluster_centers_[y_pred]
new_image_vector.shape
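
The substitution relies on NumPy integer-array indexing: indexing an array of centroids with a vector of cluster labels returns one centroid per label. Here is a small sketch with made-up values (`toy_centers` and `toy_assign` are assumptions for illustration).

```python
import numpy as np

toy_centers = np.array([[255.0, 0.0, 0.0],   # cluster 0: red
                        [0.0, 0.0, 255.0]])  # cluster 1: blue
toy_assign = np.array([1, 0, 0, 1, 1])       # cluster label of 5 pixels

# One centroid row is picked for each label, in order
toy_quantized = toy_centers[toy_assign]
print(toy_quantized.shape)  # (5, 3)
print(toy_quantized[0])     # the blue centroid, since the first label is 1
```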

Let's take a closer look at the first pixel, to make sure its new value does indeed match the centroid of its cluster.

In [None]:
# Let's check the pixel value (R,G,B) of the first pixel
print('New pixel: ', new_image_vector[0])

# Cluster assignment of the first pixel
print('Cluster assignment: ', y_pred[0])

# Centroid of that cluster
print('Centroid of cluster ', y_pred[0], ' is ', kmeans.cluster_centers_[y_pred[0]])

We will now plot the newly generated image, which contains only 16 unique colors.

In [None]:
# Reshape back to the original image shape (128,128,3) to plot it
new_image = new_image_vector.reshape(rgb_pixels.shape)

# Plot new image
plt.imshow(np.uint8(new_image))

In [None]:
# Let's plot the original image for comparison
plt.imshow(img)
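
Besides comparing the two images by eye, we could quantify how much they differ. One common choice is the mean squared error (MSE) between the original and the reconstructed pixel values. The sketch below is one possible way to compute it; the helper name `mean_squared_error_sketch` and the demo arrays are assumptions for illustration.

```python
import numpy as np

def mean_squared_error_sketch(original, reconstructed):
    """Average squared difference over all pixels and channels (hypothetical helper)."""
    # Convert to float first: subtracting uint8 arrays directly would wrap around
    diff = original.astype(float) - reconstructed.astype(float)
    return np.mean(diff ** 2)

# Made-up 1 x 2 image and a slightly different reconstruction
original_demo = np.array([[[10, 20, 30], [200, 100, 50]]], dtype=np.uint8)
reconstructed_demo = np.array([[[12.0, 18.0, 30.0], [190.0, 100.0, 60.0]]])
print(mean_squared_error_sketch(original_demo, reconstructed_demo))
```

In this notebook the same function could be applied to `rgb_pixels` and `new_image`; a smaller value means the reconstruction is closer to the original.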

Let's see how the reconstruction changes for different values of K (number of clusters). Recall that K corresponds to the number of unique colors in the reconstructed (newly generated) image.

In [None]:
# Values of K to test
knumber = [5, 10, 16, 32, 64, 128]

# Apply K-means for each K and plot the reconstructions in a 2x3 grid
f, axarr = plt.subplots(2, 3)
for ax, k in zip(axarr.flat, knumber):
  kmeans = KMeans(n_clusters=k)
  y_pred = kmeans.fit_predict(img_vector)
  new_image_vector = kmeans.cluster_centers_[y_pred]
  # Reshape back to the original image shape instead of hard-coding it
  new_image = new_image_vector.reshape(rgb_pixels.shape)
  ax.imshow(np.uint8(new_image))
  ax.set_title(f"K={k}")
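
To see why this is also a form of compression, we can estimate the storage needed in each case. This is a rough back-of-the-envelope sketch that assumes 8 bits per channel, a fixed-length cluster index per pixel, and no file-format overhead or further encoding; the function name `estimated_bits` is an assumption for illustration.

```python
import math

def estimated_bits(n_pixels, k, bits_per_channel=8, channels=3):
    """Rough storage estimate in bits (hypothetical helper)."""
    # Original: every pixel stores all of its channels
    original = n_pixels * channels * bits_per_channel
    # Quantized: every pixel stores only a cluster index ...
    index_bits = math.ceil(math.log2(k))
    # ... plus a table with the K centroid colours
    quantized = n_pixels * index_bits + k * channels * bits_per_channel
    return original, quantized

for k_value in [5, 10, 16, 32, 64, 128]:
    original, quantized = estimated_bits(128 * 128, k_value)
    print(f"K={k_value}: {original} bits -> {quantized} bits "
          f"(ratio {original / quantized:.2f})")
```

Under these assumptions, K=16 needs 4 bits per pixel instead of 24, so the 128 x 128 image shrinks from 393216 bits to 65920 bits, roughly a sixfold reduction. Larger K gives a better reconstruction but a smaller reduction.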

## Using other images

Using a different image from the internet

In [None]:
# Just modify the path to point to the image, for example:
path_tiger = "https://cdn.britannica.com/22/226322-050-C17930D6/Bengal-tiger-Panthera-tigris-tigris-Maharastra-India.jpg"

response = requests.get(path_tiger)
img_web = Image.open(BytesIO(response.content))
plt.imshow(img_web)
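
Images from the internet are often much larger than 128 x 128, which makes K-means slow. One crude way to shrink the pixel array before clustering is to keep only every few rows and columns. The sketch below is a minimal illustration; the helper name `subsample` and the `max_side` parameter are assumptions, and Pillow's own resizing methods would give smoother results.

```python
import math
import numpy as np

def subsample(pixels, max_side=128):
    """Keep every `step`-th row and column so the longer side is at most max_side."""
    step = max(1, math.ceil(max(pixels.shape[:2]) / max_side))
    return pixels[::step, ::step]

# Made-up large image (1000 rows x 1600 columns), for illustration only
big = np.zeros((1000, 1600, 3), dtype=np.uint8)
small = subsample(big)
print(big.shape, '->', small.shape)  # (1000, 1600, 3) -> (77, 124, 3)
```

In this notebook it could be applied as `subsample(np.array(img_web))` before reshaping the pixels into a matrix.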

Using an image located in your Google Drive

In [None]:
# Mount Google Drive so that you can access the picture
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Specify the path of your image
# Note that your Google Drive path ALWAYS starts with "/content/drive/MyDrive/"
# Alternative example (uncomment to use it instead):
# path_google_drive = "/content/drive/MyDrive/courses/JII-NuevoGrado/mandrill-small.tiff"
path_google_drive = "/content/drive/MyDrive/courses/JII-NuevoGrado/picture-family-LQ.jpg"

In [None]:
# Let's load the image to which we want to apply color segmentation and plot it
# (Image was already imported from PIL at the top of the notebook)
img = Image.open(path_google_drive)
plt.imshow(img)