**FCM Clustering**

Fuzzy C-Means (FCM) is an unsupervised machine learning algorithm that is often applied to image segmentation tasks. It's a method of clustering that allows a single data point (in this case, a pixel) to belong to multiple clusters.

In FCM, each pixel in the image is assigned a membership value for each cluster, representing the degree to which the pixel belongs to the cluster. This is in contrast to hard clustering methods like K-means, where each pixel is assigned to exactly one cluster.
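To make the contrast concrete, here is a small sketch for a single pixel intensity. It assumes single-channel intensities, fixed example cluster centers, and the standard FCM membership formula with fuzziness m = 2. `fuzzy_memberships` and `hard_assignment` are hypothetical helpers, not part of scikit-fuzzy.

```python
import numpy as np

def fuzzy_memberships(x, centers, m=2.0):
    """FCM membership degrees of scalar intensity x for each cluster center."""
    d = np.abs(x - np.asarray(centers, dtype=float))
    # A pixel sitting exactly on a center belongs fully to that center
    if np.any(d == 0):
        u = (d == 0).astype(float)
        return u / u.sum()
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)); the u_i sum to 1
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)

def hard_assignment(x, centers):
    """K-means style assignment: membership 1 for the nearest center, 0 elsewhere."""
    d = np.abs(x - np.asarray(centers, dtype=float))
    u = np.zeros(len(centers))
    u[np.argmin(d)] = 1.0
    return u

centers = [50.0, 100.0, 200.0]
print(fuzzy_memberships(80.0, centers))  # exactly [16/53, 36/53, 1/53]
print(hard_assignment(80.0, centers))    # [0. 1. 0.]
```

The pixel at intensity 80 belongs mostly to the cluster centered at 100, but keeps a sizeable membership in the cluster centered at 50, which hard clustering discards.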

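The FCM algorithm works by initializing a fuzzy partition matrix with random values between 0 and 1, normalized so that each pixel's memberships across all clusters sum to 1. This matrix represents the initial membership degrees of the pixels for each cluster. The algorithm then iterates through two main steps until convergence (the memberships change by less than a set tolerance, or a maximum number of iterations is reached):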


1.   Compute the cluster centers: The center of each cluster is calculated as a weighted average of all pixels, where the weights are the pixels' membership degrees for that cluster raised to the fuzziness exponent m (m > 1).
2.   Update the membership degrees: The degree to which each pixel belongs to each cluster is updated based on the pixel's distance to that cluster center relative to its distances to all the other centers. Pixels closer to a cluster center will have a higher membership degree for that cluster.
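The two steps can be sketched in plain NumPy for single-channel pixel intensities. This is a minimal illustration, not scikit-fuzzy's implementation: `fcm_1d` is a hypothetical helper, the centers are weighted by memberships raised to the fuzziness exponent `m`, and the loop stops when the membership matrix changes by less than `error` (Frobenius norm) or after `maxiter` iterations.

```python
import numpy as np

def fcm_1d(x, c, m=2.0, error=0.005, maxiter=1000, seed=None):
    """Minimal Fuzzy C-Means on a 1-D array of pixel intensities.

    Returns (centers, u), where u has shape (c, N) and each column sums to 1.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).ravel()
    # Random fuzzy partition matrix, normalized so each pixel's memberships sum to 1
    u = rng.random((c, x.size))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(maxiter):
        um = u ** m
        # Step 1: each center is a membership-weighted average of all pixels
        centers = um @ x / um.sum(axis=1)
        # Step 2: memberships from the distances to every center
        d = np.abs(x[None, :] - centers[:, None])
        d = np.fmax(d, np.finfo(float).eps)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        # Converged once the memberships barely change
        if np.linalg.norm(u_new - u) < error:
            u = u_new
            break
        u = u_new
    return centers, u
```

On well-separated data such as `np.array([10, 12, 11, 200, 205, 198])` with `c=2`, the two centers should settle near the means of the dark and bright groups (about 11 and 201), and `np.argmax(u, axis=0)` then gives a hard label per pixel.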

The result of the FCM algorithm is a membership matrix with one row of membership degrees per cluster, which can be used to generate a segmented image. Each pixel in the segmented image can be labeled according to the cluster for which it has the highest membership degree, or the membership degrees themselves can be used to create a fuzzy segmented image.
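A sketch of both options, assuming a membership matrix `u` of shape `(num_clusters, H*W)` with one column per pixel (the layout the clustering cell in this notebook relies on when it calls `np.argmax(u, axis=0)`). `defuzzify` is a hypothetical helper.

```python
import numpy as np

def defuzzify(u, image_shape):
    """Turn a (c, H*W) membership matrix into a hard label image and fuzzy maps."""
    c = u.shape[0]
    # Hard segmentation: each pixel takes the cluster with the highest membership
    labels = np.argmax(u, axis=0).reshape(image_shape)
    # Fuzzy segmentation: one membership map per cluster, values in [0, 1]
    fuzzy_maps = u.reshape((c,) + tuple(image_shape))
    return labels, fuzzy_maps

# Toy 2x2 image with 2 clusters; each column of u sums to 1
u = np.array([[0.9, 0.2, 0.6, 0.1],
              [0.1, 0.8, 0.4, 0.9]])
labels, maps = defuzzify(u, (2, 2))
print(labels)  # labels == [[0, 1], [0, 1]]
```

`maps[k]` can be displayed directly as a grayscale image to show how strongly each pixel belongs to cluster `k`.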

In [6]:
%pip install scikit-fuzzy

Note: you may need to restart the kernel to use updated packages.

**Imports**

In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import skfuzzy as fuzz  # provided by the scikit-fuzzy package
import cv2
import os
from datetime import datetime

**Set Up Directory Paths**

`path` - location of the input datasets (the preprocessed images)

`path_output` - where the clustered images will be saved

In [8]:
# Path to the directory containing the input (preprocessed) datasets
path = "../Data Sets/Processed Datasets"

# Path to the output directory for the clustered images
path_output = "../Data Sets/Clustered Datasets"

**Import Data to Dictionary**

Imports the images into a dictionary of the form `dataset[folder] -> images`.

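**Reshape Images**

Resize the images to a consistent shape by scaling each one to fit the target size (preserving its aspect ratio) and zero-padding the remainder.

The pipeline expects all images to share a standard size, since the images in each folder are stacked into a single NumPy array.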

In [9]:
def resize_image(image, target_size):
    """Scale `image` to fit inside `target_size` (height, width), keeping its
    aspect ratio, then zero-pad it to exactly `target_size`."""
    height, width = image.shape[:2]
    target_height, target_width = target_size

    # Calculate the aspect ratio and the new dimensions (fit the width first)
    aspect_ratio = width / height
    new_width = target_width
    new_height = int(target_width / aspect_ratio)

    # If the image would be too tall, fit the height instead
    if new_height > target_height:
        new_height = target_height
        new_width = int(target_height * aspect_ratio)

    # Resize the image (cv2.resize takes the size as (width, height))
    resized_image = cv2.resize(image, (new_width, new_height), interpolation=cv2.INTER_AREA)

    # Add padding to the resized image to match the target size
    pad_height = target_height - new_height
    pad_width = target_width - new_width
    padding = [(pad_height // 2, pad_height - pad_height // 2),
               (pad_width // 2, pad_width - pad_width // 2)]
    # Leave any channel axis unpadded (e.g. for H x W x 3 images)
    padding += [(0, 0)] * (resized_image.ndim - 2)

    padded_image = np.pad(resized_image, padding, mode='constant', constant_values=0)

    return padded_image
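To make the arithmetic concrete, `letterbox_geometry` below is a hypothetical, dependency-free helper that repeats the size and padding calculation from `resize_image` without calling OpenCV, so the resulting dimensions can be checked directly:

```python
def letterbox_geometry(height, width, target_size):
    """Repeat resize_image's arithmetic: new (h, w) plus (before, after) padding."""
    target_height, target_width = target_size
    aspect_ratio = width / height
    # Fit the width first, then fall back to fitting the height
    new_width = target_width
    new_height = int(target_width / aspect_ratio)
    if new_height > target_height:
        new_height = target_height
        new_width = int(target_height * aspect_ratio)
    pad_height = target_height - new_height
    pad_width = target_width - new_width
    return ((new_height, new_width),
            (pad_height // 2, pad_height - pad_height // 2),
            (pad_width // 2, pad_width - pad_width // 2))

print(letterbox_geometry(300, 600, (256, 256)))  # ((128, 256), (64, 64), (0, 0))
print(letterbox_geometry(600, 300, (256, 256)))  # ((256, 128), (0, 0), (64, 64))
```

For a 300 × 600 (height × width) input and a 256 × 256 target, the width is fitted first, giving a 128 × 256 image with 64 rows of zero-padding above and below; a 600 × 300 input instead gets 64 columns of padding on each side. When the padding is odd, the extra row or column goes at the bottom or right.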

In [10]:
# Empty dictionary to store the resized image data
resized_dataset = {}
target_size = (256, 256)  # Target (height, width); set your target size here

# Loop through all folders in the input directory and import the .npy images
# Images will be stored as resized_dataset[folder][image_index]
for folder in sorted(os.listdir(path)):
    folder_path = os.path.join(path, folder)
    if not os.path.isdir(folder_path):
        continue  # skip stray files in the input directory
    data = []
    for file_name in sorted(os.listdir(folder_path)):
        if not file_name.endswith('.npy'):
            continue  # np.load is used for .npy files only
        # Import data (np.load raises on failure rather than returning None)
        file_path = os.path.join(folder_path, file_name)
        image = np.load(file_path)
        image = resize_image(image, target_size)
        data.append(image)

    # Convert data to a numpy array (all images now share target_size)
    data = np.array(data)
    # Add the data to the resized_dataset dictionary
    resized_dataset[folder] = data

**Cluster Images**

In [11]:
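fcm_results = []
# Fuzzy C-means
# Set up parameters
num_clusters = 4   # number of clusters (c)
fuzziness = 2      # fuzziness exponent m (> 1); larger values give softer memberships
error = 0.005      # stop once the membership matrix changes by less than this
maxiter = 1000     # maximum number of iterations
for label, images in resized_dataset.items():
    for img in images:
        # Flatten the image into one intensity value per pixel
        img_flattened = img.reshape(-1, 1)

        # Perform Fuzzy C-means clustering
        # skfuzzy expects data of shape (features, samples), hence the transpose
        cntr, u, _, _, _, _, _ = fuzz.cluster.cmeans(
            img_flattened.T, num_clusters,
            fuzziness,
            error,
            maxiter,
            init=None
        )

        # Hard labels: the cluster with the highest membership for each pixel
        fcm_labels = np.argmax(u, axis=0)
        fcm_results.append(fcm_labels.reshape(img.shape))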
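Because FCM starts from a random partition (`init=None`), the cluster indices it returns are arbitrary: cluster 0 of one image need not cover the same intensity range as cluster 0 of the next, so the `clusterN` file names saved below are not necessarily comparable across images. One option, sketched here with a hypothetical helper `relabel_by_center`, is to renumber the clusters by ascending center intensity. It assumes single-channel pixels, so `cntr` holds exactly one value per cluster (the `ravel` call flattens it either way).

```python
import numpy as np

def relabel_by_center(labels, centers):
    """Renumber labels so cluster 0 has the darkest center, 1 the next, etc."""
    centers = np.asarray(centers).ravel()
    order = np.argsort(centers)          # old cluster indices, darkest first
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)  # maps old index -> new index
    return rank[labels]

centers = np.array([[180.0], [20.0], [90.0]])  # one value per cluster
labels = np.array([[0, 1], [2, 0]])
print(relabel_by_center(labels, centers))  # labels become [[2, 0], [1, 2]]
```

Inside the clustering loop, this would replace the argmax line with `fcm_labels = relabel_by_center(np.argmax(u, axis=0), cntr)`.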

**Save the cluster images**

Applies the binary masks generated by clustering to create 4 cluster images (one per cluster) for each image in the dataset.

These clustered images are saved to the output directory in a folder named in "FCM_Timestamp" format, with one subfolder per dataset folder.

In [12]:
def save_individual_clusters(image, cluster_labels, num_clusters, label, image_index, base_path):
    # Create the output folder for this dataset folder once, before the loop
    cluster_folder = os.path.join(base_path, label)
    os.makedirs(cluster_folder, exist_ok=True)

    for i in range(num_clusters):
        # Binary mask: 255 where the pixel belongs to cluster i, 0 elsewhere
        # (uint8, since cv2.imwrite writes PNGs as 8- or 16-bit images)
        single_cluster_image = np.zeros_like(image, dtype=np.uint8)
        single_cluster_image[cluster_labels == i] = 255

        # Save the single cluster image
        file_name = f"image{image_index}_cluster{i + 1}.png"
        file_path = os.path.join(cluster_folder, file_name)

        cv2.imwrite(file_path, single_cluster_image)
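The per-cluster loop above can also be expressed with NumPy broadcasting, building all masks at once. A sketch, with `cluster_masks` as a hypothetical helper; each mask is a `uint8` array that `cv2.imwrite` can save as a PNG.

```python
import numpy as np

def cluster_masks(cluster_labels, num_clusters):
    """Stack one uint8 mask per cluster: 255 where the pixel has that label, else 0."""
    # Shape (num_clusters, 1, 1, ...) so it broadcasts against the label image
    ids = np.arange(num_clusters).reshape((-1,) + (1,) * cluster_labels.ndim)
    return (cluster_labels[None, ...] == ids).astype(np.uint8) * 255

labels = np.array([[0, 1], [3, 1]])
masks = cluster_masks(labels, 4)
print(masks.shape)  # (4, 2, 2)
```

Every pixel is 255 in exactly one mask, so the masks partition the image; a cluster that received no pixels (cluster 2 here) yields an all-black mask.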

In [13]:
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
base_path = os.path.join(path_output, f"FCM_{timestamp}")

global_image_index = 0  # Index into fcm_results (counts across all folders)
for label, images in resized_dataset.items():
    local_image_index = 1  # Local image index for naming, resets for each folder
    for img in images:
        fcm_labels = fcm_results[global_image_index]
        save_individual_clusters(img, fcm_labels, num_clusters, label, local_image_index, base_path)
        global_image_index += 1
        local_image_index += 1
print(f"All images were successfully saved in the directory: '{base_path}'")

All images were successfully saved in the directory: '../Data Sets/Clustered Datasets\FCM_20230518_013415'
