# Using `K-means` for image compression


Clustering can be used for image compression. The scheme is surprisingly simple: colors that are similar (close to each other in color space) are merged into a single color. Let's do it ourselves. You will work with the following image.

Read the image from your individual assignment.

In [None]:
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
import cv2
import numpy as np


# Path to the image
img_path = '/content/china.jpg'

img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.axis('off')
plt.imshow(img)

The image shown is a three-channel (RGB) image. Calculate the average pixel intensity across all channels.

In [None]:
# < ENTER YOUR CODE HERE > 
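
As a minimal sketch of what "average across all channels" means, the snippet below averages a tiny made-up image. The array `toy_img` and the name `toy_mean` are hypothetical and are not part of the assignment; the same call would apply to the real image array.

```python
import numpy as np

# Hypothetical 2x2 RGB image (uint8), standing in for the assignment image
toy_img = np.array(
    [[[0, 0, 0], [255, 255, 255]],
     [[255, 0, 0], [0, 0, 255]]],
    dtype=np.uint8,
)

# With no axis argument, .mean() averages over height, width and channels at once
toy_mean = toy_img.mean()
print(toy_mean)
```

Passing `axis=(0, 1)` instead would give one average per channel, which is a different quantity from the single number asked for here.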

Normalize the pixel intensities by dividing all values by $255$. Calculate the average pixel intensity across all channels after this transformation.

In [None]:
# < ENTER YOUR CODE HERE > 
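
The normalization step can be illustrated on a small made-up array. This is only a sketch: `toy_img` and `toy_norm` are hypothetical names, and the real image would be processed the same way.

```python
import numpy as np

# Hypothetical 2x2 RGB image (uint8), standing in for the assignment image
toy_img = np.array(
    [[[0, 0, 0], [255, 255, 255]],
     [[255, 0, 0], [0, 0, 255]]],
    dtype=np.uint8,
)

# Dividing by a float converts the array to float64 and maps [0, 255] to [0, 1]
toy_norm = toy_img / 255.0

print(toy_norm.dtype, toy_norm.min(), toy_norm.max())
print(toy_norm.mean())
```

Because the division is linear, the mean after normalization equals the original mean divided by $255$.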

Let's verify that the original color space, with about $16$ million possible colors ($256^3$), is larger than needed and that the number of distinct colors can be reduced. To keep the plots readable, we'll plot only some pairwise dependencies between channel intensities (red vs. green and red vs. blue), and only for a random subset of pixels.

"Straighten" the image so that <code>.shape</code> of the corresponding array has the following form <code>(height * width, 3)</code>.

In [None]:
# The function takes as input a "straightened" array corresponding to the image,
# with values normalized to [0, 1] (by default they are also used as point colors)

def plot_pixels(data, colors=None, N=10000):
    if colors is None:
        colors = data

    rng = np.random.RandomState(0)
    i = rng.permutation(data.shape[0])[:N]
    colors = colors[i]
    R, G, B = data[i].T

    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    ax[0].scatter(R, G, color=colors, marker='.')
    ax[0].set(xlabel='Red', ylabel='Green', xlim=(0, 1), ylim=(0, 1))

    ax[1].scatter(R, B, color=colors, marker='.')
    ax[1].set(xlabel='Red', ylabel='Blue', xlim=(0, 1), ylim=(0, 1))

In [None]:
# < ENTER YOUR CODE HERE > 
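
The "straightening" step described above can be sketched with `reshape`. The array `toy_img` below is a hypothetical stand-in for the normalized image, and `flat` is an illustrative name.

```python
import numpy as np

# Hypothetical normalized image with height=2, width=3 and 3 channels
toy_img = np.arange(2 * 3 * 3, dtype=np.float64).reshape(2, 3, 3) / 17.0

# -1 lets NumPy infer height * width; each row is then one pixel (R, G, B)
flat = toy_img.reshape(-1, 3)

print(toy_img.shape, "->", flat.shape)
```

With NumPy's default row-major order, the pixel at row `r` and column `c` ends up at index `r * width + c` of the flattened array, so reshaping back to `(height, width, 3)` restores the image.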

Let's reduce the $16$ million possible colors to just $16$ by using <code>K-means</code>. To speed up the algorithm, we'll use the mini-batch variant, <a href="https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html"><code>MiniBatchKMeans</code></a>, which works on batches (subsets of the dataset). We'll check whether this degrades the result.

Train the <code>MiniBatchKMeans</code> model on a normalized and "straightened" image with the parameters specified in your assignment.

In the image, replace the color of each pixel with the coordinates of the centroid of the cluster to which the pixel was assigned. Plot the color dependency graphs using <code>plot_pixels()</code> with the new palette (passed via the <code>colors</code> parameter).

In [None]:
# < ENTER YOUR CODE HERE > 
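
A minimal sketch of the quantization step on random data is shown below. The array `toy_pixels` is a hypothetical stand-in for the normalized, "straightened" image, and the values of `random_state` and `n_init` are placeholders chosen for illustration, not the parameters from the assignment.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.RandomState(0)

# Hypothetical "straightened" image: 1000 pixels with RGB values in [0, 1)
toy_pixels = rng.rand(1000, 3)

# n_clusters=16 gives a 16-color palette; the other arguments are placeholders
toy_kmeans = MiniBatchKMeans(n_clusters=16, random_state=0, n_init=3)
labels = toy_kmeans.fit_predict(toy_pixels)

# Index the centroid array by label: every pixel takes its cluster's centroid color
toy_recolored = toy_kmeans.cluster_centers_[labels]

print(toy_kmeans.cluster_centers_.shape)  # one RGB centroid per cluster
print(toy_recolored.shape)                # same shape as the input pixels
```

The recolored array can then be passed as the `colors` argument of `plot_pixels()` while the original pixel values are kept as `data`, so the points stay in place and only their colors change.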

Calculate the average pixel intensity value of the resulting image.

In [None]:
# < ENTER YOUR CODE HERE > 

Enter the number of the image that corresponds to the $16$-color palette.

In [None]:
# < ENTER YOUR CODE HERE > 

Construct an image of size $4 \times 4$ based on the $16$ colors obtained. Select the correct image.

In [None]:
# < ENTER YOUR CODE HERE > 
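
One possible way to lay out $16$ colors as a $4 \times 4$ image is to reshape the palette array. In this sketch `toy_palette` is a made-up array of $16$ RGB rows; in the assignment the palette would presumably come from the fitted model's centroids.

```python
import numpy as np

# Hypothetical palette: 16 RGB colors with values in [0, 1]
toy_palette = np.linspace(0, 1, 48).reshape(16, 3)

# Rows of the palette fill the 4x4 grid left to right, top to bottom
toy_swatch = toy_palette.reshape(4, 4, 3)

print(toy_swatch.shape)
```

The arrangement of the cells depends on the order of the colors in the palette, so the resulting picture should be compared with the answer options cell by cell.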

It's time to look at the result! Display the original image side by side with the image that uses only $16$ colors.

In [None]:
# < ENTER YOUR CODE HERE > 
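
A side-by-side comparison could be sketched as follows. Both arrays are hypothetical: `toy_original` is random noise, and `toy_quantized` uses simple rounding as a crude stand-in for a clustering result, only to show how a flattened array is reshaped back and displayed.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)

# Hypothetical normalized image of size 8x8 with 3 channels
toy_original = rng.rand(8, 8, 3)

# Stand-in for a quantized result: process the flattened pixels,
# then reshape back to the original (height, width, 3) layout
toy_flat = toy_original.reshape(-1, 3)
toy_quantized = np.round(toy_flat).reshape(toy_original.shape)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, image, title in zip(
    axes, (toy_original, toy_quantized), ("Original", "Reduced palette")
):
    ax.imshow(image)   # float RGB values in [0, 1] are displayed directly
    ax.set_title(title)
    ax.axis("off")
```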