# K-Means / PCA

## Task Description

In this exercise, your task is to compress images using the K-Means and PCA algorithms.

You are given three RGB images: a butterfly, a flower, and a planet.

<img src="butterfly.jpg" alt="Photo by David Clode on Unsplash" height="300"/>
<img src="flower.jpg" alt="Photo by Y S on Unsplash"  height="300"/>
<img src="nasa.jpg" alt="Photo by NASA on Unsplash" height="300"/>

You have to implement the following and answer the accompanying questions:
1. Compute the singular values of an image.

2. K-Means clustering using sklearn with the arguments (see function's signature) and random_state=42.
Each data point represents a pixel in 3-dimensional space (RGB).
You should replace each pixel with its nearest cluster center.
- What number of clusters is good for compressing each image?

3. PCA using sklearn with the arguments (see function's signature) and random_state=42.
Treat each $n \times m$ image as three different data matrices, where each color channel (Red, Green, and Blue) gives rise to exactly one $n \times m$ data matrix.
- What number of principal components is enough to reconstruct each image almost perfectly?
- What number of principal components is enough for a human to recognize what is in the image?
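
To make "good for compressing" concrete, it can help to count how much each representation has to store. The sketch below is only a rough estimate and not part of the exercise: the helper names, the 8-bits-per-value assumption, and the 400 × 600 image size are illustrative assumptions, and file-format overhead (for example JPEG encoding) is ignored.

```python
import math


def raw_storage_bits(height: int, width: int, bits_per_value: int = 8) -> int:
    """Bits for an uncompressed RGB image (three values per pixel)."""
    return height * width * 3 * bits_per_value


def kmeans_storage_bits(height: int, width: int, n_clusters: int, bits_per_value: int = 8) -> int:
    """Bits for a palette of n_clusters RGB colors plus one palette index per pixel."""
    palette_bits = n_clusters * 3 * bits_per_value
    # Each pixel stores an index into the palette; at least one bit per pixel.
    bits_per_index = max(1, math.ceil(math.log2(n_clusters)))
    return palette_bits + height * width * bits_per_index


def pca_storage_values(height: int, width: int, n_components: int) -> int:
    """Numbers stored for per-channel PCA: scores, components, and the column mean."""
    per_channel = height * n_components + n_components * width + width
    return 3 * per_channel


# Hypothetical 400 x 600 image
raw_bits = raw_storage_bits(400, 600)
kmeans_bits = kmeans_storage_bits(400, 600, n_clusters=16)
pca_values = pca_storage_values(400, 600, n_components=20)
print(raw_bits, kmeans_bits, pca_values)
```

Under these assumptions, 16 clusters need 960,384 bits against 5,760,000 bits for the raw image, and 20 principal components need 61,800 stored numbers against 720,000 raw values. The PCA numbers are floating-point values, so the ratio in bits depends on the precision chosen for them.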

In [1]:
# Run `pip install PyQt6` if you get an error

import functools

import matplotlib

matplotlib.use("QtAgg")

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import imread
from matplotlib.widgets import Slider
from numpy.typing import ArrayLike
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

np.random.seed(42)

In [2]:
def get_singular_values(image: ArrayLike) -> ArrayLike:
    """
    Compute singular values directly from the reshaped and centered RGB image.

    Args:
        image: RGB image of shape (H, W, 3).
    Returns:
        Singular values of the centered 2D image matrix.
    """
    image = np.asarray(image)

    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("Expected RGB image with shape (H, W, 3).")

    # Flatten the channels into the width dimension: (H, W, 3) -> (H, W*3)
    reshaped_image = image.reshape(image.shape[0], -1)

    # Center the data (subtract column-wise mean)
    centered = reshaped_image - np.mean(reshaped_image, axis=0)

    # Compute SVD directly
    _, s, _ = np.linalg.svd(centered, full_matrices=False)

    return s
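
The singular values of the centered matrix are closely tied to PCA: scikit-learn's `PCA` exposes the same values as `singular_values_`, and its `explained_variance_` equals the squared singular values divided by `n_samples - 1`. The sketch below checks this on a small synthetic matrix standing in for one image channel. The 95% threshold used to pick a component count is an arbitrary assumption for illustration, not a rule from the exercise.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.random((50, 12))  # synthetic stand-in for one (H, W) channel

# Singular values of the column-centered matrix
centered = data - data.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)

pca = PCA(n_components=12, svd_solver="full").fit(data)

# PCA centers the data itself, so its singular values match the SVD above
sv_match = np.allclose(pca.singular_values_, s)
# Explained variance per component is s^2 / (n_samples - 1)
ev_match = np.allclose(pca.explained_variance_, s**2 / (data.shape[0] - 1))

# Smallest number of components whose cumulative variance share reaches 95%
# (the threshold is an assumption made for this example)
cumulative = np.cumsum(s**2) / np.sum(s**2)
n_for_95 = int(np.searchsorted(cumulative, 0.95) + 1)

print(sv_match, ev_match, n_for_95)
```

A steep drop in the singular value plot therefore suggests that a few components already carry most of the variance.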

In [3]:
def transform_kmeans(image: ArrayLike, n_clusters: int, n_init: int = 1) -> ArrayLike:
    """Transform the image using KMeans clustering.
    Args:
        image (ArrayLike): The image to be transformed.
        n_clusters (int): The number of clusters to use.
        n_init (int): The number of times the KMeans algorithm will be run with different centroid seeds.
    Returns:
        ArrayLike: The transformed image.
    """
    image = np.asarray(image)
    # Each pixel becomes one data point in RGB space: (H, W, 3) -> (H*W, 3)
    reshaped = image.reshape(-1, 3)
    kmeans = KMeans(n_clusters=n_clusters, n_init=n_init, random_state=42)
    kmeans.fit(reshaped)
    # Replace every pixel with the center of the cluster it was assigned to
    transformed = kmeans.cluster_centers_[kmeans.labels_]
    return transformed.reshape(image.shape)  # Back to (H, W, 3)
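
One possible way to approach the question of how many clusters are "good" is to watch how the within-cluster squared error (`inertia_`) shrinks as `n_clusters` grows and look for the point of diminishing returns. The sketch below does this on a synthetic image. The image, the candidate cluster counts, and the variable names are assumptions for illustration, and the visual check with the slider remains the actual criterion in this exercise.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
toy_image = rng.random((20, 30, 3))  # synthetic (H, W, 3) image with values in [0, 1]
pixels = toy_image.reshape(-1, 3)    # one row per pixel

inertias = {}
for k in (2, 4, 8, 16):
    kmeans = KMeans(n_clusters=k, n_init=1, random_state=42).fit(pixels)
    # inertia_ is the sum of squared distances of pixels to their assigned center
    inertias[k] = float(kmeans.inertia_)

for k, value in inertias.items():
    print(f"n_clusters={k:2d}  inertia={value:.2f}")
```

The synthetic pixels here are uniformly random, so there is no natural elbow. Real photographs with a few dominant colors usually flatten out much earlier.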

In [4]:
def transform_pca(image: ArrayLike, n_components: int) -> ArrayLike:
    """Transform an image using PCA.
    Args:
        image (ArrayLike): The image to be transformed.
        n_components (int): The number of PCA components to use.
    Returns:
        ArrayLike: The transformed image.
    """
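    image = np.asarray(image)
    channels = []
    # Treat each color channel as its own (H, W) data matrix, as the task description says
    for c in range(image.shape[2]):
        channel = image[:, :, c]
        pca = PCA(n_components=n_components, random_state=42)
        # Project onto the leading components, then map back to pixel space
        reduced = pca.fit_transform(channel)
        channels.append(pca.inverse_transform(reduced))
    return np.stack(channels, axis=2)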
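
To judge how many components are "enough", the relative reconstruction error can be measured for increasing `n_components`. The sketch below is a minimal illustration on a synthetic channel that is rank 5 by construction plus a little noise. The data, the helper name `reconstruction_error`, and the tested component counts are assumptions, not part of the exercise.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic (40, 60) channel: rank-5 signal plus small Gaussian noise
channel = rng.random((40, 5)) @ rng.random((5, 60))
channel = channel + 1e-3 * rng.standard_normal((40, 60))


def reconstruction_error(channel: np.ndarray, n_components: int) -> float:
    """Relative Frobenius error after projecting onto n_components and back."""
    pca = PCA(n_components=n_components, random_state=42)
    scores = pca.fit_transform(channel)
    restored = pca.inverse_transform(scores)
    return float(np.linalg.norm(channel - restored) / np.linalg.norm(channel))


errors = {k: reconstruction_error(channel, k) for k in (1, 2, 5, 10)}
for k, err in errors.items():
    print(f"n_components={k:2d}  relative error={err:.5f}")
```

Because the synthetic signal has rank 5, the error collapses to the noise level once five components are used. Real images rarely have such a sharp cutoff, so their error tends to decay more gradually.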

In [5]:
def plot_slider_image(image: ArrayLike, transform: callable, title: str) -> None:
    """Plot a transformed image with a slider to adjust the number of clusters or components.
    Args:
        image (ArrayLike): The image to be transformed.
        transform (callable): The transformation function to apply to the image.
        title (str): The title of the plot.
    """

    @functools.cache
    def cached_transform(*args, **kwargs):
        return transform(image, *args, **kwargs)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 6))
    fig.subplots_adjust(bottom=0.25)

    d = min(min(image.shape[:2]) - 1, 60)  # max number of clusters/components is 60
    allowed_pca = list(range(1, d + 1))

    pos = ax1.get_position()
    ax_pca_cnt = fig.add_axes([pos.x0 + 0.02, pos.y0 - 0.1, pos.width, 0.03])

    if transform == transform_kmeans:
        slider_title = "n_clusters"
    elif transform == transform_pca:
        slider_title = "n_components"
    else:
        slider_title = "unknown"
    spca = Slider(
        ax_pca_cnt,
        slider_title,
        1,
        d,
        valinit=allowed_pca[1],
        valstep=allowed_pca,
        color="red",
    )

    disp = ax1.imshow(cached_transform(allowed_pca[1]).clip(0, 1))

    def update(val):
        n_pcs = spca.val
        disp.set_data(cached_transform(n_pcs).clip(0, 1))
        fig.canvas.draw_idle()

    spca.on_changed(update)

    sigmas = get_singular_values(image)
    ax2.plot(list(range(d)), sigmas[:d], ".", color="#D81B60")
    ax2.set_xlabel("Singular value index")
    ax2.set_ylabel("Singular value")

    if transform == transform_pca:
        transform_name = "PCA"
    elif transform == transform_kmeans:
        transform_name = "KMeans"
    else:
        transform_name = "Unknown"

    fig.suptitle(f"{transform_name} of {title} Image")

    plt.show()

In [6]:
def pipeline(path: str, transform: callable, title: str) -> None:
    """Load an image, scale it to [0, 1], and show the interactive plot for the given transform.
    Args:
        path (str): The path to the image file.
        transform (callable): The transformation function to apply to the image.
        title (str): The title of the plot.
    """
    im = imread(path) / 255.0
    plot_slider_image(im, transform, title)

## Butterfly

In [8]:
pipeline("butterfly.jpg", transform_kmeans, title="Butterfly")

ImportError: libGL.so.1: cannot open shared object file: No such file or directory
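
The `ImportError` above usually indicates that the Qt backend could not load the system OpenGL library, which tends to happen on headless machines or in containers. One possible workaround, sketched below, is to skip the interactive slider and write static comparisons to disk with Matplotlib's non-interactive `Agg` backend. `save_comparison` is a hypothetical helper, not part of the exercise, and the snippet is best run in a fresh kernel so that the backend choice takes effect.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display or Qt

import matplotlib.pyplot as plt
import numpy as np


def save_comparison(original, transformed, out_path, title=""):
    """Save the original and transformed images side by side (hypothetical helper)."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    ax1.imshow(np.clip(original, 0, 1))
    ax1.set_title("Original")
    ax2.imshow(np.clip(transformed, 0, 1))
    ax2.set_title(title)
    for ax in (ax1, ax2):
        ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
```

With the functions from this notebook, a call could look like `save_comparison(im, transform_kmeans(im, 16), "butterfly_kmeans_16.png", "KMeans, 16 clusters")`, where `im` is the image loaded and scaled as in `pipeline`.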

In [None]:
pipeline("butterfly.jpg", transform_pca, title="Butterfly")

## Flower

In [None]:
pipeline("flower.jpg", transform_kmeans, title="Flower")

In [None]:
pipeline("flower.jpg", transform_pca, title="Flower")

## NASA

In [None]:
pipeline("nasa.jpg", transform_kmeans, title="NASA")

In [None]:
pipeline("nasa.jpg", transform_pca, title="NASA")