# Import Dependencies

In [1]:
import os 
import numpy as np 
import cv2 as cv
import matplotlib.pyplot as plt

from pygments.formatters import img
from tqdm import tqdm

from preprocessing.edge_extraction import *
from feature_extraction import * 
from preprocessing.fourier_transform import * 
from preprocessing.image_conversion import * 
from clustering import *
from preprocessing.contrast_enhancement import *

# Pre-processing

To reduce noise in images of whole artworks and fragments, we initially considered using the Fourier transform to process the images in the frequency domain.

While converting an image from RGBA to grayscale simplifies processing, it results in the loss of RGB color and alpha channel data, which can be problematic if that information is needed later. Therefore, we chose to split the image into its primary color channels (excluding the alpha channel) and process each channel separately in the frequency domain. After filtering, we planned to reconstruct the filtered image by recombining the processed channels.

However, after several trials, we found that processing the channels separately led to significant information loss in one or more channels. Consequently, we decided to use the NLMeansDenoising filter instead.

Since our goal is to cluster fragments that belong to the same image, we focus on maintaining "continuity" along the fragment borders. Therefore, our process emphasizes the information present along these edges.

Steps:
1. Extract a working region from the borders of the fragment.
2. Filter out the transparent pixels from the working region.
3. Denoise the working region.

**CONSIDERATION**: Contrast enhancement.

In [None]:
images = create_dataset("./data", threshold=5)

# Feature Extraction

To extract relevant features from the fragments, we employ two methods:
- Color Histograms
- Gradient Jacobians

## Color Histograms

Color histograms are graphical representations of the distribution of colors in an image. They quantify the number of pixels that have specific color values, effectively capturing the color composition of the image. By analyzing the color histograms of image fragments, we can compare and cluster similar fragments based on their color distributions.

**This technique is particularly useful for identifying and matching regions of images that share similar color patterns**.

In [None]:
flatten_color_histograms = compute_color_histograms(images)
unflatten_color_histograms = compute_color_histograms(images, flatten=False)

In [None]:
distance_matrix_color_histogram = compute_color_histogram_dist_matrix(unflatten_color_histograms)
distance_matrix_color_histogram

### K-Means

In [2]:
references = [reference.split(".")[1] for reference in os.listdir("./references")]
references

['33', '34', '35', '36', '37', '38', '39', '40']

In [5]:
import pickle 
from sklearn.cluster import KMeans


os.makedirs("./optimal_data", exist_ok=True)

while len(references) > 0:
    images = create_dataset("./data", threshold=5)
    flatten_color_histograms = compute_color_histograms(images)
    kmeans = KMeans(n_clusters=len(references), random_state=42)
    fit_kmeans = kmeans.fit(flatten_color_histograms)

    create_cluster_dirs(data_dir="./data", output_dir="clusters/kmeans/colors", labels=fit_kmeans.labels_)
    scores = {}
    for reference_id in references:
        scores[reference_id] = compute_metrics(reference_id, "clusters/kmeans/colors", output_file=f"results/kmeans_color_n_42_ref_{reference_id}.json")

    threshold = 0.80
    opt_clusters = {}
    for reference_id, d in scores.items():
        max_items = d["max_items"]
        for max_item in max_items:
            if max_item[1] >= threshold:
                if reference_id in opt_clusters:
                    opt_clusters[reference_id].append(max_item[0])
                else:
                    opt_clusters[reference_id] = [max_item[0]]
                
    # move the optimal clusters to another path and reinitiate the clustering process without those fragments
    opt_dir = "optimal_clusters/kmeans/colors"
    os.makedirs(opt_dir, exist_ok=True)
    for reference_id, cluster_dirs in opt_clusters.items():
        reference_dir = os.path.join(opt_dir, reference_id)
        os.makedirs(reference_dir, exist_ok=True)
        for cluster_dir in cluster_dirs:
            img_dir = os.path.join("clusters/kmeans/colors", cluster_dir)
            for filename in os.listdir(img_dir):
                shutil.copy(os.path.join(img_dir, filename), os.path.join(reference_dir, filename))
                shutil.move(os.path.join("./data", filename), os.path.join("./optimal_data", filename))
            shutil.rmtree(img_dir)
        references.remove(reference_id)

Creating dataset: 100%|██████████| 328/328 [00:09<00:00, 33.11it/s]
Computing color histograms: 100%|██████████| 328/328 [00:00<00:00, 65629.79it/s]
Creating cluster dirs: 100%|██████████| 328/328 [00:00<00:00, 1336.91it/s]


FileNotFoundError: [WinError 3] Impossibile trovare il percorso specificato: 'clusters/kmeans/colors\\cluster_4'

## Gradient Jacobians

Gradient Jacobians represent the gradients of pixel intensities in an image. They capture the rate of change of pixel values in both the horizontal and vertical directions, highlighting edges and texture details. By computing the Jacobians of image fragments, we can compare and group fragments that exhibit similar edge and texture patterns. Formally, the gradient jacobians we use are of the form:

$$
\begin{align}
\begin{bmatrix} G_x & G_{x\_gray} \\ G_y & G_{y\_gray} \end{bmatrix}
\end{align}
$$

where $G_x$ and $G_y$ are the aggregated gradient of the RGB channels, while $G_{x\_gray}$ and $G_{y\_gray}$ are the gradient of the grayscale image.

This method is especially valuable for identifying structural similarities and continuities between different fragments.

In [None]:
flatten_jacobians = compute_jacobians(images)
unflatten_jacobians = compute_jacobians(images, flatten=False)

In [None]:
distance_matrix_jacobians = compute_jacobians_dist_matrix(unflatten_jacobians)
distance_matrix_jacobians

### K-Means

In [None]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)
fit_kmeans = kmeans.fit(flatten_jacobians)
create_cluster_dirs(data_dir="./data", output_dir="clusters/kmeans/jacobians", labels=fit_kmeans.labels_)