## <center><b>Western University</b></center>
## <center><b>Faculty of Engineering</b></center>
## <center><b>Department of Electrical and Computer Engineering</b></center>

# <center><b>AISE 3350A FW24: Cyber-Physical Systems Theory</b></center>
# <center><b>Group 13 - Project</b></center>


Students:
- Jahangir (Janik) Abdullayev (251283871)
- Richard Augustine (251275608)
- Matthew Linders (251296414)
- Xander Chin (251314531)
- Joseph Kim (251283383)


# Introduction

&nbsp;&nbsp;&nbsp;&nbsp;As cyber-physical systems become increasingly prevalent, the sensors they rely on have had to become more complex as well. One result is inspection through computer vision, an application of artificial intelligence that interprets visual data such as images and videos. Computer vision can be indispensable in many different areas. For instance, in civil engineering it has many uses in structural health monitoring [[1]](#bib), the process of using sensing technology to evaluate the structural integrity and changing conditions of existing structures over time. With computer vision, structural health monitoring can detect missing components such as bolts, as well as visible deterioration, with more accuracy and lower labour costs than manual inspection.

&nbsp;&nbsp;&nbsp;&nbsp;However, computer vision is a challenging solution to implement. Success varies greatly based on the quality of the video or image given to the system. Computer vision software may be able to identify an object perfectly in some scenarios, but if the object is rotated or partially occluded, or the colours are darker or desaturated, the software may struggle. In the real world, this makes computer vision quite complicated, as real objects very rarely appear consistent with each other to the extent that a basic computer vision model may expect. Computer vision for counting is a valuable application in the industry as it enables accurate, automated inventory management, reducing the time, cost, and errors associated with manual counting. Its scalability and adaptability make it ideal for diverse use cases, from retail stock tracking to industrial supply chain optimization.

&nbsp;&nbsp;&nbsp;&nbsp;Through this assignment, these challenges are explored more thoroughly in a physical example. This project involved developing a computer vision application to count M&M candies by addressing challenges such as object overlap, inconsistent lighting, and varied appearances. The implementation utilized the FastSAM [[2]](#bib), [[3]](#bib) model, a lightweight and efficient variant of the Segment Anything Model (SAM) [[4]](#bib), chosen for its zero-shot segmentation capabilities. FastSAM allowed the system to accurately segment M&M candies without requiring extensive training data, making it well-suited for real-world variability.

# Methodology


## Assumptions

It was assumed that the M&Ms presented would be roughly circular and would each come in one of a predefined set of colours, which simplifies the classification process. It was also assumed that the images used for testing would have sufficient resolution.

## Implementation
The application was developed from scratch in Python 3.12.1 [[5]](#bib) and uses the FastSAM [[2]](#bib) model to segment candidate M&Ms, which are then filtered by shape and classified by colour. The results are presented to the user through a GUI built with tkinter [[6]](#bib). The application code can be found in [[7]](#bib).
To run the code, a user needs to install the required dependencies listed in the first code cell. This can be done through pip using:

```pip install transformers opencv-python matplotlib```

In addition, a user needs to download the FastSAM model locally from [[8]](#bib) and place the downloaded FastSAM-x.pt file alongside this Jupyter notebook file.
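Because the notebook loads the checkpoint by file name, a missing or misnamed file only surfaces once processing starts. The following is a minimal sketch of a pre-flight check; the helper `model_is_available` and the constant `MODEL_FILE` are hypothetical names introduced here for illustration and are not part of the project code.

```python
from pathlib import Path

# Assumption: the checkpoint keeps the file name that the notebook loads later.
MODEL_FILE = "FastSAM-x.pt"


def model_is_available(folder="."):
    """Return True if the FastSAM checkpoint file exists in the given folder."""
    return (Path(folder) / MODEL_FILE).is_file()


if not model_is_available():
    print(f"{MODEL_FILE} not found; download it and place it next to the notebook.")
```

Running a check like this before launching the GUI gives a clearer message than a failure in the middle of image processing.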

## Code Breakdown

The following code cell imports the necessary dependencies outlined earlier.

In [None]:
# Code dependencies

# Fixes version mismatch when using ultralytics.yolo
%pip install ultralytics==8.0.100

# For CV
import cv2
import matplotlib.pyplot as plt
from fastsam import FastSAM, FastSAMPrompt
import numpy as np

# Also need the FastSAM model, which is downloaded from Google Drive:
# https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view
# Place the downloaded FastSAM-x.pt file alongside this Jupyter notebook file

# For GUI
import tkinter as tk
from tkinter import filedialog
from tkinter import ttk
from PIL import Image, ImageTk

%matplotlib inline

The following helper functions are designed to assist in the image processing and colour analysis tasks.

The `apply_mask` function creates a mask from a set of coordinates and uses it to isolate a specific region in an image. The mask is applied to the original image, setting the pixels outside the masked area to zero, thereby enhancing the region of interest for further analysis.

The `check_circularity` function calculates the circularity of a contour by assessing its area and perimeter, then fitting an ellipse to the contour to determine its shape. The function combines these metrics to provide a score that reflects how circular the contour is. This is important for identifying round objects like M&Ms in images, as it helps distinguish them from other shapes.
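To make the area-and-perimeter part of this score concrete, the sketch below evaluates the isoperimetric quotient $4\pi A / P^2$ for three simple polygons. It is a pure-Python illustration under stated assumptions: the helper `isoperimetric_quotient` and the test shapes are hypothetical, and the ellipse-fit axis ratio that the notebook averages into its final score is deliberately left out.

```python
import math


def isoperimetric_quotient(points):
    """Return 4*pi*A / P**2 for a closed polygon given as a list of (x, y) vertices."""
    twice_area = 0.0   # twice the signed area, accumulated with the shoelace formula
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    if perimeter == 0:
        return 0.0     # degenerate polygon: treat as not circular
    return 4 * math.pi * (abs(twice_area) / 2) / perimeter ** 2


# Illustrative shapes: a 360-vertex approximation of a circle, a unit square,
# and a 4:1 rectangle
circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
          for k in range(360)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
strip = [(0, 0), (4, 0), (4, 1), (0, 1)]

for name, shape in [("circle", circle), ("square", square), ("4:1 rectangle", strip)]:
    print(f"{name}: {isoperimetric_quotient(shape):.3f}")
```

The circle scores close to 1, the square scores $\pi/4 \approx 0.785$, and the elongated rectangle scores about 0.503, which shows how the quotient falls as a shape departs from a circle.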

The `get_average_color` function computes the average colour of an image by discarding black (masked-out) pixels and calculating the mean RGB values of the remaining pixels. This average colour is then used for classification, allowing the system to identify and distinguish different coloured M&Ms. If every pixel is black, the function returns [0, 0, 0].
The `classify_color` function matches an RGB value to a predefined set of colours by calculating the Euclidean distance between the given RGB and reference colours. The colour with the smallest distance is selected, providing a classification of the colour present in the image.

In [None]:
# Helper functions

# Applies the mask to passed image
def apply_mask(image, xy_array):
    # Create empty mask of same size as image
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    
    # Fill polygon defined by xy coordinates with ones
    cv2.fillPoly(mask, [xy_array], 1)
    
    # Apply mask to image
    masked_image = image.copy()
    masked_image[mask == 0] = 0
    
    return masked_image

# Checks how circular the passed contour is
def check_circularity(contour):
    # Calculate area and perimeter
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    
    # Circularity using isoperimetric inequality (a degenerate contour scores 0)
    if perimeter == 0:
        return 0
    circularity = 4 * np.pi * area / (perimeter * perimeter)
    
    # Fit an ellipse and check ratio of axes
    if len(contour) >= 5:  # Need at least 5 points to fit ellipse
        ellipse = cv2.fitEllipse(contour)
        major_axis = max(ellipse[1])
        minor_axis = min(ellipse[1])
        axis_ratio = minor_axis / major_axis
    else:
        axis_ratio = 0
    
    # Combine metrics (weight them equally)
    final_score = (circularity + axis_ratio) / 2
    
    return final_score

# Returns average colour of the passed image
def get_average_color(img):
    pixels = np.array(img)
    
    # Create mask for non-black pixels (where not all RGB values are 0)
    non_black_mask = ~np.all(pixels == 0, axis=2)
    
    # Only consider non-black pixels for average
    valid_pixels = pixels[non_black_mask]
    
    # Return average of valid pixels, or [0,0,0] if all pixels were black
    if len(valid_pixels) > 0:
        avg_rgb = np.round(valid_pixels.mean(axis=0)).astype(int)
        return avg_rgb
    return np.array([0, 0, 0])

# Predefined colors
def classify_color(rgb):
    color_dict = {
        'Red': [206, 38, 38],
        'Orange': [255, 120, 0],
        'Yellow': [255, 255, 0],
        'Green': [0, 204, 0],
        'Blue': [51, 153, 255],
        'Brown': [70, 5, 5],
        'White': [255, 255, 255]
    }
    
    distances = {
        color: np.sqrt(sum((rgb - np.array(ref_rgb))**2))
        for color, ref_rgb in color_dict.items()
    }
    
    return min(distances.items(), key=lambda x: x[1])[0]

The `processImage` function processes an image to count M&Ms of each colour using FastSAM. If no image path is provided, it alerts the user in the GUI output textbox and returns. Otherwise, it clears the textbox, notifies the user that processing has started, and reads and displays the image using OpenCV (`cv2`) and Matplotlib (`plt`). Next, the FastSAM model is loaded and applied to the image with the specified parameters (device, retina masks, image size, confidence, IoU). The function then initializes a `FastSAMPrompt`, calls its `everything_prompt()` method, and stores the detection results. In this way, the function ties together image loading, object detection, and visualization.

After processing the image with FastSAM, the `processImage` function continues by evaluating the circularity of each detected mask. Using a predefined circularity threshold (`CIRCULAR_THRESHOLD = 0.75`), it filters out masks that do not resemble M&Ms. For qualifying masks, a binary mask is created from the contour, and this mask is applied to the original image to extract the region of interest. The average colour of this region is computed using the `get_average_color` function, and the colour is classified using the `classify_color` function. Each classified colour is counted and stored in a dictionary that tracks the frequency of each M&M colour detected. The function then updates the GUI output box with these counts, providing a summary of the M&M distribution by colour. The cropped region of each accepted mask is also plotted in a grid, so that the detections can be checked against the original image.

In [None]:
# Main image processing function
# Takes the path to the image on the machine along with the reference to the GUI output textbox
def processImage(img_url, output_text):

    # Handle the case where the image path is None
    if img_url is None:
        # Inform the user that no image has been loaded
        output_text.delete("1.0", tk.END)  # Clear previous text
        output_text.insert(tk.END, "No image loaded")
        print("No image loaded")
        return

    # Informs user processing has begun
    output_text.delete("1.0", tk.END)  # Clear previous text
    output_text.insert(tk.END, "Processing...")
    
    # Load and display the image
    raw_image = cv2.cvtColor(cv2.imread(img_url), cv2.COLOR_BGR2RGB)
    plt.imshow(raw_image)
    plt.axis("off")
    plt.show()

    # Load the fastSAM
    modelSAM = FastSAM("FastSAM-x.pt")

    # Stores results provided the passed settings
    everything_results = modelSAM(
        img_url,
        device="cpu",
        retina_masks=True,
        imgsz=384,
        conf=0.3,
        iou=0.9,
    )
    prompt_process = FastSAMPrompt(img_url, everything_results, device="cpu")

    # Everything prompt
    prompt_process.everything_prompt()

    num_of_masks = len(everything_results[0])
    print(num_of_masks)

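    # Display images with matplotlib (always create at least one row so that
    # an image with no detected masks does not raise an error)
    n_rows = max(1, int(np.ceil(num_of_masks / 6)))
    fig, axes = plt.subplots(nrows=n_rows, ncols=6, figsize=(10, 5))

    # Flatten the axes array for easy iteration and hide unused panels
    axes = axes.flatten()
    for ax in axes:
        ax.axis("off")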

    final_dict = {
        "Red": 0,
        "Orange": 0,
        "Yellow": 0,
        "Green": 0,
        "Blue": 0,
        "Brown": 0,
        "White": 0,
    }
    for index, r in enumerate(everything_results[0]):
        maskCoords = (r.masks.xy)[0]
        xy_array = np.array(maskCoords)

        CIRCULAR_THRESHOLD = 0.75
        
        # Checks if the mask is circular enough
        if(check_circularity(xy_array) > CIRCULAR_THRESHOLD):   
            contour = xy_array.reshape((-1, 1, 2)).astype(np.int32)
            # Create binary mask from contour
            mask = np.zeros(raw_image.shape[:2], dtype=np.uint8)
            cv2.fillPoly(mask, [contour], 255)

            # Apply mask to image
            masked_image = cv2.bitwise_and(raw_image, raw_image, mask=mask)

            # Get bounding box just to determine region of interest
            x, y, w, h = cv2.boundingRect(contour)
            result_image = masked_image[y:y+h, x:x+w]

            # Get average RGB and classify it as a color
            avg_rgb = get_average_color(result_image)
            color_category = classify_color(avg_rgb)
            final_dict[color_category] += 1

            ax = axes[index]
            ax.axis("off")
            ax.imshow(result_image)

    print(final_dict)

    # Display results in the GUI output box
    output_text.delete("1.0", tk.END)  # Clear previous text
    output_text.insert(tk.END, final_dict)

    plt.tight_layout()
    plt.show()

The `uploadImage` function allows the user to select an image file through a file dialog, supporting various formats such as PNG, JPG, and BMP. Upon selection, the function resizes the image for display, updates the GUI with the image preview, and provides details about the file path, size, and format in the output textbox.

In [None]:
# Function to handle uploading the image
def uploadImage():
    global file_path
    file_path = filedialog.askopenfilename(
        filetypes=[("Image Files", "*.png *.jpg *.jpeg *.bmp *.gif")]
    )
    if file_path:
        img = Image.open(file_path)
        img.thumbnail((300, 300))  # Resize the image to fit in the window
        img_tk = ImageTk.PhotoImage(img)
        image_label.config(image=img_tk)
        image_label.image = img_tk
        file_path_label.config(text=f"File: {file_path}")

        # Display some information in the text box
        output_text.delete("1.0", tk.END)  # Clear previous text
        output_text.insert(tk.END, f"File Path: {file_path}\n")
        output_text.insert(tk.END, f"Image Size: {img.size}\n")
        output_text.insert(tk.END, f"Image Format: {img.format}\n")

The main script initializes a graphical user interface (GUI) for image uploading, processing, and information display. Using the `tkinter` library, it creates a main window with a title, specified dimensions, and various interactive elements. Buttons for uploading an image and processing it trigger the `uploadImage` and `processImage` functions, respectively, providing core functionality for file handling and analysis. The GUI includes an image display area, a label to show the selected file path, and a text box to display metadata or analysis results. By packing these components with appropriate layouts and functionality, the script creates an intuitive interface for interacting with images and viewing their processed outputs.

In [None]:
# Main script for GUI

# Initialize the main window
root = tk.Tk()
root.title("Image and Info Display GUI")
root.geometry("400x600")

file_path = None

# Upload image button
upload_button = ttk.Button(
    root, text="Upload Image", command=uploadImage
)
upload_button.pack(pady=10)

# Process image button
process_button = ttk.Button(
    root, text="Process Image", command=lambda: processImage(file_path, output_text)
)
process_button.pack(pady=10)

# Image display label
image_label = tk.Label(root)
image_label.pack(pady=10)

# File path label
file_path_label = tk.Label(root, text="No file selected", wraplength=300)
file_path_label.pack()

# Text box for output information
output_text = tk.Text(root, height=10, width=40, state=tk.NORMAL)
output_text.pack(pady=10)

# Run the main loop
root.mainloop()

# Results

To use the developed code, a user must first install the dependencies, as explained in the methodology. Once the dependencies are installed, running the code opens the GUI. The user can then press the "Upload Image" button and choose the image file to analyze. Once the image has been selected, the user can press the "Process Image" button to process it. The model's results are printed in the text box as a count of each colour.

<div style="text-align: center;">
    <img src="GUI_No_Image.png" alt="GUI with no image" width="300"/>
</div>

<div style="text-align: center;">
    <img src="GUI_With_Image.png" alt="GUI with image" width="300"/>
</div>

During testing, the program also outputs an image with all the masks of the objects it found that resembled M&Ms due to their round shape. To measure the effectiveness of the model, this image of selected masks was compared to the original image. Specifically, the values recorded were the true number of candies in the image, the number of objects that the model classified as M&Ms, the number of M&Ms missed, the colour(s) of the candies missed, the quantity of each colour the model reported, the number of colours misidentified, and the total number of objects detected. Using these observations, a variety of metrics were derived. These metrics can be split into M&M identification metrics and colour identification metrics.

M&M identification metrics: 
- The number of correct classifications of M&Ms (true positives)
- The number of non-M&M objects considered as M&Ms (false positives)
- The number of non-M&M objects identified and discarded (true negatives)
- The number of M&Ms missed (false negatives)
- The percentage of M&Ms identified
- The percentage of objects considered as M&Ms that were correct
- Precision, accuracy, sensitivity, and specificity



Colour identification metrics: 
-	The number of incorrect colour classifications
-	The percentage of correct colour classifications


<div style="text-align: center;">
    <img src="results1.png" alt="results1" width="1080"/>
</div>

<div style="text-align: center;">
    <img src="results2.png" alt="results2" width="820"/>
</div>

These metrics were gathered from 15 unique testing images including varying numbers of M&Ms, varying backgrounds, the addition of extra objects, and different lighting.

In terms of M&M identification, the model does quite well. It misses M&Ms in only two trials and counts non-M&M objects as M&Ms in five trials. The missed M&Ms are due to faulty object segmentation, where the masks of the M&Ms include too much of the surrounding background to be circular. Most of the extra objects were either perfectly circular or images of M&Ms on the packaging, all of which fit the criteria and the stated assumption that M&Ms are circular objects. Only 4 objects mistaken for M&Ms were neither circular nor very similar in appearance to M&Ms.

This relationship between true positives and false positives is summarized by the precision:

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

for which this model achieved an average of 96%. Similarly, there is the sensitivity:

$$\text{Sensitivity} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

with an average of 99%. Additionally, it is important to note that the model segments all objects in the image, and the number of true negatives that are discarded is far greater than the number of false negatives. This is captured by the specificity:

$$\text{Specificity} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}$$

with an average of 87%, and the negative predictive value:

$$\text{NPV} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Negatives}}$$

with an average of 96%. Finally, all of this is summarized by the accuracy:

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{True Positives} + \text{True Negatives} + \text{False Positives} + \text{False Negatives}}$$

which has an average of 96%. Overall, the model is quite good at distinguishing M&Ms from other objects, even with a simple criterion.
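The five ratios above can be computed directly from the four confusion counts of a single test image. The sketch below is illustrative only: the function `detection_metrics` is a hypothetical helper, and the counts passed to it are made-up values chosen for easy arithmetic, not figures from the 15 test images.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return the five M&M identification ratios from the confusion counts."""
    def ratio(numerator, denominator):
        # Guard against empty categories (for example, an image with no negatives)
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical counts for one image (NOT taken from the project's results)
metrics = detection_metrics(tp=24, fp=1, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Averaging each ratio over all test images would then give per-metric summaries of the kind reported in this section.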

In contrast, the model does a poor job of determining the colour of an M&M once it has found it. Across the 15 trials, the model identifies the correct colour for an average of 61% of the candies. Typically, the model can accurately identify the red M&Ms, struggles to identify orange, yellow, green, and blue, and disproportionately labels the candies as brown.

Three main causes of this discrepancy are the model's basic understanding of colours, the lighting, and the background of the image. First, the orange M&Ms are often categorized as red, which makes sense as those colours are very similar, especially if the orange is in darker lighting. Second, when the lighting is darker, almost all colours shift toward brown, which explains why brown is the only colour detected when the photo is taken in a dark space. Finally, if the background is light, the model makes more colour errors. This seems to be because the reflection off lighter backgrounds makes the photo more saturated, shadowed, and contrasted, so the candies look darker and closer to brown.
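The effect of darker lighting on a nearest-colour classifier can be reproduced in a few lines. The sketch below copies the reference colours from the notebook's `color_dict` and simulates dimmer lighting by scaling all three channels by a constant factor, which is a crude assumption rather than a model of the test photographs. The names `classify_colour` and `darken` are hypothetical and used only for illustration.

```python
import math

# Reference colours copied from the notebook's color_dict
REFERENCE_COLOURS = {
    "Red": (206, 38, 38),
    "Orange": (255, 120, 0),
    "Yellow": (255, 255, 0),
    "Green": (0, 204, 0),
    "Blue": (51, 153, 255),
    "Brown": (70, 5, 5),
    "White": (255, 255, 255),
}


def classify_colour(rgb):
    """Return the reference colour with the smallest Euclidean distance to rgb."""
    return min(REFERENCE_COLOURS,
               key=lambda name: math.dist(rgb, REFERENCE_COLOURS[name]))


def darken(rgb, factor):
    """Simulate dimmer lighting by scaling every channel (simplifying assumption)."""
    return tuple(channel * factor for channel in rgb)


orange = REFERENCE_COLOURS["Orange"]
for factor in (1.0, 0.7, 0.5):
    print(f"brightness x{factor}: {classify_colour(darken(orange, factor))}")
```

Under this simplification, the reference orange is labelled Orange at full brightness, Red at 70% brightness, and Brown at 50% brightness, which mirrors the pattern of errors described above.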

# Discussion

### Approach Explanation
FastSAM was selected for its zero-shot segmentation capabilities [[2]](#bib), which allow it to segment objects in an image without requiring labelled examples or additional training data. The segmented regions are further processed using circularity checks [[9]](#bib) and colour classification to identify and count M&Ms based on predefined colour categories. The results support this decision: the model achieved an average precision of 96% in identifying M&Ms.
### Requirements for Approach
To implement this model, some key components were required. A custom colour classifier was implemented to categorize the objects based on their average RGB values. Additionally, a GUI was developed using tkinter to provide a user-friendly interface, allowing users to upload images, process them, and view results interactively. These elements worked together to create an automated system for identifying and classifying M&Ms.
### Implementation Issues
Although the program performs well in most cases, some challenges came up during development. The FastSAM model was initially highly sensitive to overlapping or incomplete masks, which sometimes resulted in irrelevant or partial regions being segmented. To address this, the circularity threshold was introduced to filter out non-circular shapes. However, this threshold occasionally excluded valid objects, such as slightly deformed or partially hidden M&Ms. Additionally, while FastSAM's zero-shot segmentation capabilities eliminated the need for training data, its general-purpose design meant that the segmentation output was not always optimized for this specific use case.

Another issue concerned colour classification. Predefined RGB values worked effectively for clear and distinct colours, but shades that closely resembled multiple categories (e.g., dark red versus brown) led to occasional misclassifications. Variations in lighting or image quality further complicated colour identification. A potential improvement could involve incorporating CMYK colour space conversion, which separates the colour components (cyan, magenta, yellow) from the brightness and shading component (black). This could allow the algorithm to focus on pure colour information, reducing the impact of lighting variations. Additionally, refining classification thresholds or training a custom colour classification model could further enhance accuracy in challenging conditions.
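As a rough illustration of this idea, the sketch below applies the common device-independent RGB-to-CMYK formula to a colour and to a half-brightness copy of it. This is a sketch under stated assumptions, not the project's implementation: `rgb_to_cmyk` is a hypothetical helper, the formula ignores printer colour profiles, and uniform channel scaling is only a crude stand-in for real lighting changes.

```python
def rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB to CMYK fractions in [0, 1] with the naive formula."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)          # black component: how far the brightest channel is from full
    if k == 1:                    # pure black: colour components are undefined, report zeros
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k


bright_red = (206, 38, 38)   # the notebook's reference red
dim_red = (103, 19, 19)      # the same colour at half brightness (simulated)

for label, rgb in [("bright", bright_red), ("dim", dim_red)]:
    c, m, y, k = rgb_to_cmyk(*rgb)
    print(f"{label}: C={c:.3f} M={m:.3f} Y={y:.3f} K={k:.3f}")
```

In this simplified setting, the cyan, magenta, and yellow values of the two samples agree and only the black component changes, so a classifier comparing C, M, and Y alone would not be affected by the uniform darkening.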

### Potential Societal/Economic Impact
This project demonstrates significant societal and economic potential. For example, in the food industry, this system could automate quality control processes, ensuring that items are correctly sorted and packaged by colour. This automation would improve efficiency and reduce labour costs, benefiting manufacturers and consumers alike. Additionally, the approach could be applied to other industries, such as recycling or manufacturing, where accurate object identification and sorting are crucial.
On the societal side, the automation of such processes raises concerns about job displacement in industries dependent on manual sorting. Furthermore, FastSAM's ease of use and reliance on general-purpose segmentation reduce the need for extensive training data, lowering entry barriers for deploying similar solutions across various applications. Weighing these benefits and challenges, this project illustrates the transformative potential of AI in automation and efficiency.


# Conclusion

This project explored the application of a computer vision model to counting M&M candies, demonstrating a real-world use of machine learning. By combining the FastSAM model with checks for circularity and colour, the system achieved a high level of precision in detecting M&Ms with zero-shot segmentation.

The results showed that the detection of M&Ms was largely successful, with the rare exception of false negatives produced by the circularity check applied to the FastSAM masks. This exposed a limitation of circularity detection: it could not detect M&Ms that were occluded or whose masks were not sufficiently circular.

Additionally, colour detection was a significant and recurring issue with this program. This provided a lesson in the importance of controlling lighting and background colour in future evaluations. Future iterations could therefore aim to reduce the effects of lighting, background, and colour similarity by incorporating CMYK colour space conversion.

In conclusion, this project illustrated the obstacles faced by the machine learning industry in integrating image recognition technology into real-world applications. While the model was not perfect, it was largely successful as a scalable, efficient, and cost-effective method of object detection that can be used in various industries.

# <a id="bib">Bibliography</a>

[1]	Z. Peng, J. Li, H. Hao, and Y. Zhong, “Smart structural health monitoring using computer vision and edge computing,” Engineering Structures, vol. 319, p. 118809, Nov. 2024, doi: [10.1016/j.engstruct.2024.118809](https://doi.org/10.1016/j.engstruct.2024.118809).

[2]	Ultralytics, “FastSAM (Fast Segment Anything Model).” Accessed: Dec. 19, 2024. [Online]. Available: https://docs.ultralytics.com/models/fast-sam

[3]	CASIA-IVA-Lab/FastSAM. (Dec. 19, 2024). Python. CASIA-IVA-Lab. Accessed: Dec. 19, 2024. [Online]. Available: https://github.com/CASIA-IVA-Lab/FastSAM

[4]	Ultralytics, “SAM (Segment Anything Model).” Accessed: Dec. 19, 2024. [Online]. Available: https://docs.ultralytics.com/models/sam

[5]	“Python Release Python 3.12.1,” Python.org. Accessed: Dec. 19, 2024. [Online]. Available: https://www.python.org/downloads/release/python-3121/

[6]	“tkinter — Python interface to Tcl/Tk,” Python documentation. Accessed: Dec. 19, 2024. [Online]. Available: https://docs.python.org/3/library/tkinter.html

[7]	janik, JanikThePanic/AISE3350-project. (Dec. 20, 2024). Jupyter Notebook. Accessed: Dec. 19, 2024. [Online]. Available: https://github.com/JanikThePanic/AISE3350-project

[8]	“FastSAM-x.pt - Google Drive.” Accessed: Dec. 19, 2024. [Online]. Available: https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view

[9]	“Roundness,” Wikipedia. Oct. 03, 2024. Accessed: Dec. 19, 2024. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Roundness&oldid=1249101195