**Custom clustering**


The custom clustering method implements a simple form of vector quantization (in effect scalar quantization, since each pixel contributes a single intensity value).

This method partitions the intensity levels of a grayscale image into distinct clusters based on their values. The image is first flattened into a one-dimensional vector. The range of intensity values (maximum minus minimum) is then divided into equal-width intervals, one per desired cluster. These intervals serve as the initial clusters.
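A minimal NumPy sketch of the interval construction, assuming the intensities arrive as a NumPy array. `interval_midpoints` is a hypothetical helper, not a function from the notebook; it splits the intensity range into equal-width intervals and returns their boundaries together with their mid-points, which act as the clusters' representative values.

```python
import numpy as np

def interval_midpoints(values, num_clusters):
    """Split [min, max] of `values` into equal-width intervals (hypothetical helper).

    Returns (edges, midpoints): num_clusters + 1 boundaries and one mid-point per interval.
    """
    values = np.asarray(values, dtype=np.float64).ravel()  # flatten to 1-D
    lo, hi = values.min(), values.max()
    edges = np.linspace(lo, hi, num_clusters + 1)   # equally spaced boundaries
    midpoints = (edges[:-1] + edges[1:]) / 2        # one representative per interval
    return edges, midpoints

# A tiny "image" spanning the full 8-bit range, split into 4 intervals
edges, mids = interval_midpoints(np.array([[0, 90], [170, 255]]), 4)
# edges: 0, 63.75, 127.5, 191.25, 255
# midpoints: 31.875, 95.625, 159.375, 223.125
```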

For each pixel in the image, the method calculates the absolute difference between the pixel's intensity value and the representative values of the clusters (the mid-points of the intervals). The pixel is assigned to the cluster with the smallest difference.

Every pixel in the image is assigned in this way, so each cluster ends up holding the pixels whose intensities lie closest to its representative value.
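The assignment can be written without a per-pixel loop by broadcasting every pixel against every mid-point. `assign_to_nearest` is a hypothetical helper for illustration, and the mid-points below are simply those of four equal intervals over 0 to 255.

```python
import numpy as np

def assign_to_nearest(values, midpoints):
    """Index of the closest mid-point for each value (hypothetical helper)."""
    values = np.asarray(values, dtype=np.float64).ravel()
    midpoints = np.asarray(midpoints, dtype=np.float64)
    # |value - midpoint| for every (pixel, cluster) pair: shape (n_pixels, n_clusters)
    difference = np.abs(values[:, None] - midpoints[None, :])
    return np.argmin(difference, axis=1)  # ties go to the lower cluster index

# Mid-points of four equal-width intervals over 0-255
mids = np.array([31.875, 95.625, 159.375, 223.125])
pixels = np.array([0, 40, 100, 200, 255])
labels = assign_to_nearest(pixels, mids)
# labels: 0, 0, 1, 3, 3
```

Because the intervals have equal width, the nearest mid-point is just the interval a value falls into, so `np.digitize` with the inner interval boundaries yields the same labels, except for values lying exactly on a boundary, where the two break ties differently.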

For each cluster, a mask is created in which pixels belonging to the cluster are set to a unique identifier (the cluster index plus one) and all other pixels are set to zero. Despite the name binary_masks, each mask therefore holds 0 and the cluster's identifier rather than 0 and 1. The function returns these masks, one per cluster, which together give a segmented version of the original image.
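Turning per-pixel labels into one mask per cluster can be sketched as follows. `labels_to_masks` is hypothetical, and storing the masks as `uint8` is an assumption of this sketch (the notebook's function inherits the input image's dtype through `np.zeros_like`).

```python
import numpy as np

def labels_to_masks(labels, image_shape, num_clusters):
    """One mask per cluster from flat per-pixel labels (hypothetical helper).

    Pixels of cluster i get the value i + 1; all other pixels are 0.
    """
    label_img = np.asarray(labels).reshape(image_shape)  # back to image layout
    masks = []
    for i in range(num_clusters):
        # Assumption: uint8 is enough because identifiers stay small (<= 255)
        masks.append(np.where(label_img == i, i + 1, 0).astype(np.uint8))
    return masks

# A 2x3 toy image whose pixels were labelled 0..2
masks = labels_to_masks([0, 1, 2, 2, 1, 0], (2, 3), 3)
# masks[2]: [[0, 0, 3], [3, 0, 0]]
```

Every pixel is non-zero in exactly one mask, so the masks form a partition of the image.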

This method resembles a single assignment step of k-means clustering in which the initial centroids are equally spaced across the range of intensity values. Unlike k-means, however, it never updates the centroids iteratively: the clusters are determined solely by the initial partitioning of the intensity range.
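To make the contrast concrete, here is a sketch of 1-D k-means (Lloyd's algorithm) started from the same equally spaced mid-points. It is not part of the notebook's method, which stops after the first assignment; `kmeans_1d` and its `max_iter` parameter are illustrative assumptions.

```python
import numpy as np

def _nearest(values, centroids):
    # Assignment step: index of the closest centroid for every value
    return np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)

def kmeans_1d(values, num_clusters, max_iter=100):
    """1-D k-means (Lloyd's algorithm) from equally spaced mid-points (sketch)."""
    values = np.asarray(values, dtype=np.float64).ravel()
    lo, hi = values.min(), values.max()
    step = (hi - lo) / num_clusters
    centroids = lo + step * (np.arange(num_clusters) + 0.5)  # same start as the custom method
    for _ in range(max_iter):
        labels = _nearest(values, centroids)
        # Update step: move each centroid to the mean of its members (keep it if empty)
        new_centroids = np.array([
            values[labels == k].mean() if np.any(labels == k) else centroids[k]
            for k in range(num_clusters)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment consistent with the returned centroids
    return centroids, _nearest(values, centroids)

centroids, labels = kmeans_1d([0, 1, 2, 10, 11, 12], 2)
# centroids: 1.0 and 11.0
```

On this toy data the starting mid-points 3 and 9 move to the cluster means 1 and 11; the custom method would keep 3 and 9.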

This custom clustering method may prove effective for certain types of images, particularly when the intensity values of different regions of the image are distinctly separated.
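A small sketch of what "distinctly separated" means when the intervals are fixed and equal-width. `equal_interval_labels` is a hypothetical helper reproducing the nearest-mid-point assignment, and the two toy inputs are invented for illustration.

```python
import numpy as np

def equal_interval_labels(values, num_clusters):
    """Nearest-mid-point labels over equal-width intervals (hypothetical helper)."""
    values = np.asarray(values, dtype=np.float64).ravel()
    lo, hi = values.min(), values.max()
    step = (hi - lo) / num_clusters
    mids = lo + step * (np.arange(num_clusters) + 0.5)  # interval mid-points
    return np.argmin(np.abs(values[:, None] - mids[None, :]), axis=1)

# Separated and roughly evenly spaced levels: each level gets its own cluster
a = equal_interval_labels([0, 128, 255], 3)   # labels 0, 1, 2
# Separated but unevenly spaced levels: 0 and 10 share cluster 0, cluster 1 stays empty
b = equal_interval_labels([0, 10, 255], 3)    # labels 0, 0, 2
```

Because the intervals never move, two distinct levels that fall into the same interval are merged while another interval can remain empty, so the method works best when the levels also line up with the equal-width intervals.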

**Imports**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
import os
from datetime import datetime


**Setup Directory Paths**

path - location of the input datasets (the processed datasets to be clustered)

path_output - where the clustered images will be saved

In [2]:
# Path to the directory containing the input datasets
path = "../Data Sets/Processed Datasets"

# Path to the output directory for the clustered images
path_output = "../Data Sets/Clustered Datasets"

**Import data to dictionary**

Imports the images into a dictionary of the form resized_dataset[folder] -> array of images.

**Reshape Images**

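Resize images to a consistent shape by rescaling them to fit the target size (preserving the aspect ratio) and zero-padding the remainder.

The rest of the pipeline expects images to have a standard size; in particular, np.array(data) in the loading cell stacks each folder's images into a single array, which requires identical shapes.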

In [4]:
def resize_image(image, target_size):
    height, width = image.shape[:2]
    target_height, target_width = target_size

    # Calculate the aspect ratio and the new dimensions: fit the target width first,
    # then fit the target height instead if the image would be too tall
    aspect_ratio = width / height
    new_width = target_width
    new_height = max(1, int(target_width / aspect_ratio))

    if new_height > target_height:
        new_height = target_height
        new_width = max(1, int(target_height * aspect_ratio))

    # Resize the image (note: cv2.resize takes the size as (width, height))
    resized_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_AREA)

    # Zero-pad the resized image, split evenly between both sides, to match the target size.
    # The padding spec has two axes, so this assumes a 2-D grayscale image.
    pad_height = target_height - new_height
    pad_width = target_width - new_width
    padding = [(pad_height // 2, pad_height - pad_height // 2), (pad_width // 2, pad_width - pad_width // 2)]

    padded_image = np.pad(resized_image, padding, mode='constant', constant_values=0)

    return padded_image
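The size arithmetic in resize_image can be checked without loading any images. `fit_and_pad_dims` is a hypothetical helper that repeats the same dimension and padding calculation without touching any pixels.

```python
def fit_and_pad_dims(height, width, target_size):
    """New (height, width) and ((top, bottom), (left, right)) padding (hypothetical helper)."""
    target_height, target_width = target_size
    aspect_ratio = width / height
    # First try matching the target width...
    new_width = target_width
    new_height = int(target_width / aspect_ratio)
    # ...and match the target height instead if the result would be too tall
    if new_height > target_height:
        new_height = target_height
        new_width = int(target_height * aspect_ratio)
    pad_h = target_height - new_height
    pad_w = target_width - new_width
    padding = ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2))
    return (new_height, new_width), padding

# A 512x384 (height x width) portrait image fitted into 256x256
dims, padding = fit_and_pad_dims(512, 384, (256, 256))
# dims == (256, 192); padding == ((0, 0), (32, 32))
```

For a wide 100x300 image the width is matched instead, giving 85x256 with 85 rows of padding above and 86 below.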

In [9]:
# Empty dictionary to store the processed image data
resized_dataset = {}

target_size = (256, 256)  # Set your target size here

# Loop through all folders in the input directory and import the .npy images.
# Images will be stored as resized_dataset[folder][image_index]
for folder in sorted(os.listdir(path)):
    folder_path = os.path.join(path, folder)
    if not os.path.isdir(folder_path):
        continue  # skip stray files next to the dataset folders
    data = []
    for file_name in sorted(os.listdir(folder_path)):
        if not file_name.endswith('.npy'):
            continue  # only NumPy image files are expected here
        # Import data (np.load raises an error on failure rather than returning None)
        file_path = os.path.join(folder_path, file_name)
        image = np.load(file_path)
        # Resize the image to the common target size
        resized_image = resize_image(image, target_size)
        data.append(resized_image)

    # Stack this folder's images into one numpy array
    data = np.array(data)
    # Add the data to the resized_dataset dictionary
    resized_dataset[folder] = data

**Cluster Images**


In [7]:
def Custom_clustering(image, num_clusters):
    """Quantize a grayscale image into num_clusters equal-width intensity intervals.

    Returns one mask per cluster: pixels of cluster i are set to i + 1, all others to 0.
    """
    img_vector = image.flatten()

    # Range of intensity values and the width of each interval
    # (computed as Python floats to avoid integer overflow)
    min_value = float(np.min(img_vector))
    range_value = float(np.max(img_vector)) - min_value
    stepv = range_value / num_clusters

    # Cluster initialization: exactly num_clusters representative values at the
    # interval mid-points, offset by the minimum so every cluster index stays in range
    K = min_value + stepv * (np.arange(num_clusters) + 0.5)

    # Assign every pixel to the cluster with the closest representative value
    # (vectorized form of looping over the pixels and taking np.argmin per pixel)
    difference = np.abs(img_vector[:, None].astype(np.float64) - K[None, :])
    labels = np.argmin(difference, axis=1)

    # Create one mask per cluster, reshaped back to the image dimensions
    binary_masks = []
    for i in range(num_clusters):
        cluster_img = np.zeros_like(img_vector)
        cluster_img[labels == i] = i + 1
        binary_masks.append(cluster_img.reshape(image.shape))

    return binary_masks

In [10]:
Custom_results = []
num_clusters = 4
for label, images in resized_dataset.items():
    for img in images:
        binary_masks = Custom_clustering(img, num_clusters)
        Custom_results.append(binary_masks)

**Save the cluster images**

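Uses the masks generated by clustering to create 4 cluster images per image in the dataset, one per cluster, in which the pixels belonging to that cluster are highlighted.

These cluster images are saved to the output directory in a new folder named Custom_<timestamp>, with one subfolder per dataset folder.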
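Scaling a cluster mask by 255 to get a viewable image needs care, because the masks store the cluster index plus one rather than 1, and they inherit the input image's dtype through `np.zeros_like`. Assuming the images (and hence the masks) are `uint8`, this sketch shows the wrap-around and one safe conversion; for float images the naive product would instead exceed 255, which 8-bit PNG cannot store.

```python
import numpy as np

# A uint8 mask for the third cluster: its pixels hold 3 (index 2 plus one)
mask_cluster3 = np.array([[0, 3], [3, 0]], dtype=np.uint8)

# Naive scaling stays in uint8 and wraps modulo 256: 3 * 255 = 765 -> 253
naive = mask_cluster3 * 255

# Safer: collapse to a true 0/1 mask first, then scale to 0/255 in uint8
safe = (mask_cluster3 > 0).astype(np.uint8) * 255
```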

In [None]:
# Create the output directory if it does not already exist
os.makedirs(path_output, exist_ok=True)

In [11]:
def save_individual_clusters(image, binary_masks, label, image_index, base_path):
    # Note: `image` is not used here; the cluster masks themselves are written to disk
    cluster_folder = os.path.join(base_path, label)
    os.makedirs(cluster_folder, exist_ok=True)

    for i, mask in enumerate(binary_masks):
        # Masks hold 0 or (cluster index + 1); convert to a 0/255 uint8 image so the
        # scaling cannot overflow and cv2.imwrite receives a PNG-compatible dtype
        single_cluster_image = (mask > 0).astype(np.uint8) * 255

        # Save the single cluster image, e.g. image1_cluster1.png
        file_name = f"image{image_index}_cluster{i + 1}.png"
        file_path = os.path.join(cluster_folder, file_name)
        if not cv2.imwrite(file_path, single_cluster_image):
            raise IOError(f"Could not write {file_path}")

In [12]:
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
base_path = os.path.join(path_output, f"Custom_{timestamp}")

global_image_index = 0  # Global image index for position in the dictionary
for label, images in resized_dataset.items():
    local_image_index = 1  # Local image index for naming, resets for each folder
    for img in images:
        binary_masks = Custom_results[global_image_index]
        save_individual_clusters(img, binary_masks, label, local_image_index, base_path)  # Use local_image_index for naming
        global_image_index += 1
        local_image_index += 1
print(f"All images were successfully saved in the directory: '{base_path}'")

All images were successfully saved in the directory: '../Data Sets/Clustered Datasets\Custom_20230518_013052'
