## <center><b>Western University</b></center>
## <center><b>Faculty of Engineering</b></center>
## <center><b>Department of Electrical and Computer Engineering</b></center>

# <center><b>AISE 3350A FW24: Cyber-Physical Systems Theory</b></center>
# <center><b>Group 13 - Project</b></center>


Students:
- Jahangir (Janik) Abdullayev (251283871)
- Richard Augustine (251275608)
- Matthew Linders (251296414)
- Xander Chin (251314531)
- Joseph Kim (251283383)


# Introduction

&nbsp;&nbsp;&nbsp;&nbsp;As cyber-physical systems become increasingly prevalent, their sensors have had to become more sophisticated as well. One result is inspection by computer vision, an application of artificial intelligence that interprets visual data such as images and videos. Computer vision is valuable in many areas. In civil engineering, for instance, it is used for structural health monitoring [[1]](#bib), the process of using sensing technology to evaluate the structural integrity and changing conditions of existing structures over time. With computer vision, structural health monitoring can detect missing components such as bolts, as well as visible deterioration, with greater accuracy and lower labour costs than manual inspection.

&nbsp;&nbsp;&nbsp;&nbsp;However, computer vision is challenging to implement. Success depends heavily on the quality of the video or image given to the system. Software may identify an object perfectly in some scenarios yet struggle when the object is rotated or partially occluded, or when the colours are darker or desaturated. This complicates real-world use, because real objects rarely look as consistent as a basic computer vision model may expect. Despite these difficulties, counting is a valuable industrial application of computer vision: it enables accurate, automated inventory management and reduces the time, cost, and errors associated with manual counting. Its scalability and adaptability suit diverse use cases, from retail stock tracking to industrial supply chain optimization.

&nbsp;&nbsp;&nbsp;&nbsp;This project explores these challenges through a physical example: a computer vision application that counts M&M candies while coping with object overlap, inconsistent lighting, and varied appearances. The implementation uses the FastSAM model [[2]](#bib), [[3]](#bib), a lightweight and efficient variant of the Segment Anything Model (SAM) [[4]](#bib), chosen for its zero-shot segmentation capabilities. FastSAM lets the system segment M&M candies without requiring extensive training data, which makes it well suited to real-world variability.

# Methodology

In [None]:
# Code dependencies

# For CV
import cv2
import matplotlib.pyplot as plt
from fastsam import FastSAM, FastSAMPrompt
import numpy as np

# The FastSAM model weights are also needed; download them from Google Drive:
# https://drive.google.com/file/d/1m1sjY4ihXBU1fZXdQ-Xdj-mDltW-2Rqv/view
# Place the downloaded FastSAM-x.pt file alongside this Jupyter notebook file

# For GUI
import tkinter as tk
from tkinter import filedialog
from tkinter import ttk
from PIL import Image, ImageTk

%matplotlib inline

In [None]:
# Helper functions

# Applies the mask to passed image
def apply_mask(image, xy_array):
    # Create empty mask of same size as image
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    
    # Fill polygon defined by xy coordinates with ones
    cv2.fillPoly(mask, [xy_array], 1)
    
    # Apply mask to image
    masked_image = image.copy()
    masked_image[mask == 0] = 0
    
    return masked_image

# Checks how circular the passed contour is
def check_circularity(contour):
    # Calculate area and perimeter
    area = cv2.contourArea(contour)
    perimeter = cv2.arcLength(contour, True)
    
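    # A degenerate contour has no perimeter, so score it as not circular
    if perimeter == 0:
        return 0

    # Circularity from the isoperimetric quotient: 1.0 for a perfect circle, lower otherwise
    circularity = 4 * np.pi * area / (perimeter * perimeter)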
    
    # Fit an ellipse and check ratio of axes
    if len(contour) >= 5:  # Need at least 5 points to fit ellipse
        ellipse = cv2.fitEllipse(contour)
        major_axis = max(ellipse[1])
        minor_axis = min(ellipse[1])
        axis_ratio = minor_axis / major_axis
    else:
        axis_ratio = 0
    
    # Combine metrics (weight them equally)
    final_score = (circularity + axis_ratio) / 2
    
    return final_score

# Returns average colour of the passed image
def get_average_color(img):
    pixels = np.array(img)
    
    # Create mask for non-black pixels (where not all RGB values are 0)
    non_black_mask = ~np.all(pixels == 0, axis=2)
    
    # Only consider non-black pixels for average
    valid_pixels = pixels[non_black_mask]
    
    # Return average of valid pixels, or [0,0,0] if all pixels were black
    if len(valid_pixels) > 0:
        avg_rgb = np.round(valid_pixels.mean(axis=0)).astype(int)
        return avg_rgb
    return np.array([0, 0, 0])

# Classifies an RGB value as the nearest predefined colour (Euclidean distance in RGB space)
def classify_color(rgb):
    color_dict = {
        'Red': [206, 38, 38],
        'Orange': [255, 120, 0],
        'Yellow': [255, 255, 0],
        'Green': [0, 204, 0],
        'Blue': [51, 153, 255],
        'Brown': [70, 5, 5],
        'White': [255, 255, 255]
    }
    
    distances = {
        color: np.sqrt(sum((rgb - np.array(ref_rgb))**2))
        for color, ref_rgb in color_dict.items()
    }
    
    return min(distances.items(), key=lambda x: x[1])[0]
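
As a rough sanity check on the circularity term used in `check_circularity`, the sketch below evaluates the isoperimetric quotient $4\pi A / P^2$ on a few simple polygons using only the standard library. The helper names (`polygon_area`, `polygon_perimeter`, `isoperimetric_quotient`) and the test shapes are illustrative assumptions, not part of the project code, and the ellipse-fit term is deliberately left out.

```python
import math

def polygon_area(points):
    """Shoelace formula for a closed polygon given as (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(points):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:] + points[:1]))

def isoperimetric_quotient(points):
    """4*pi*A / P^2: approaches 1 for a circle and drops for less compact shapes."""
    perimeter = polygon_perimeter(points)
    if perimeter == 0:
        return 0.0
    return 4 * math.pi * polygon_area(points) / perimeter ** 2

# 64-sided regular polygon approximating a circle of radius 10
circle = [
    (10 * math.cos(2 * math.pi * k / 64), 10 * math.sin(2 * math.pi * k / 64))
    for k in range(64)
]
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
strip = [(0, 0), (40, 0), (40, 10), (0, 10)]  # elongated 40 x 10 rectangle

for name, shape in [("circle", circle), ("square", square), ("strip", strip)]:
    print(f"{name}: {isoperimetric_quotient(shape):.3f}")
```

In this sketch the near-circle scores close to 1, the square scores π/4 (about 0.785), and the elongated strip scores about 0.503. Since a square already exceeds 0.75 on this term alone, the quotient by itself may not be enough to reject square-ish masks.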

In [None]:
# Main image processing function
# Takes the path to the image on the machine along with the reference to the GUI output textbox
def processImage(img_url, output_text):

    # Handle the case where no image has been selected yet
    if img_url is None:
        # Inform the user that there is nothing to process
        output_text.delete("1.0", tk.END)  # Clear previous text
        output_text.insert(tk.END, "No image loaded")
        print("No image loaded")
        return

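    # Informs user processing has begun
    output_text.delete("1.0", tk.END)  # Clear previous text
    output_text.insert(tk.END, "Processing...")
    output_text.update_idletasks()  # Redraw now so the message shows before the long-running work

    # Load the image and display it in the notebook output
    bgr_image = cv2.imread(img_url)
    if bgr_image is None:  # cv2.imread returns None when the file cannot be read
        output_text.delete("1.0", tk.END)  # Clear previous text
        output_text.insert(tk.END, "Could not read image")
        return
    raw_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    plt.imshow(raw_image)
    plt.axis("off")
    plt.show()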

    # Load the FastSAM model weights
    modelSAM = FastSAM("FastSAM-x.pt")

    # Run segmentation over the whole image with the settings below
    everything_results = modelSAM(
        img_url,
        device="cpu",
        retina_masks=True,
        imgsz=384,
        conf=0.3,
        iou=0.9,
    )
    prompt_process = FastSAMPrompt(img_url, everything_results, device="cpu")

    # Everything prompt
    prompt_process.everything_prompt()

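    num_of_masks = len(everything_results[0])
    print(num_of_masks)

    # Nothing to count if the model found no masks (also avoids a zero-row subplot grid)
    if num_of_masks == 0:
        output_text.delete("1.0", tk.END)  # Clear previous text
        output_text.insert(tk.END, "No objects detected")
        return

    # Display images with matplotlib, six per row
    fig, axes = plt.subplots(nrows=int(np.ceil(num_of_masks / 6)), ncols=6, figsize=(10, 5))

    # Flatten the axes array for easy iteration and hide every axis up front,
    # so cells for rejected masks and unused grid slots stay blank
    axes = axes.flatten()
    for ax in axes:
        ax.axis("off")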

    final_dict = {
        "Red": 0,
        "Orange": 0,
        "Yellow": 0,
        "Green": 0,
        "Blue": 0,
        "Brown": 0,
        "White": 0,
    }
    # Minimum circularity score for a mask to be counted as a candy
    CIRCULAR_THRESHOLD = 0.75

    for index, r in enumerate(everything_results[0]):
        maskCoords = (r.masks.xy)[0]
        xy_array = np.array(maskCoords)

        # Checks if the mask is circular enough
        if check_circularity(xy_array) > CIRCULAR_THRESHOLD:
            contour = xy_array.reshape((-1, 1, 2)).astype(np.int32)
            # Create binary mask from contour
            mask = np.zeros(raw_image.shape[:2], dtype=np.uint8)
            cv2.fillPoly(mask, [contour], 255)

            # Apply mask to image
            masked_image = cv2.bitwise_and(raw_image, raw_image, mask=mask)

            # Get bounding box just to determine region of interest
            x, y, w, h = cv2.boundingRect(contour)
            result_image = masked_image[y:y+h, x:x+w]

            # Get average RGB and classify it as a color
            avg_rgb = get_average_color(result_image)
            color_category = classify_color(avg_rgb)
            final_dict[color_category] += 1

            ax = axes[index]
            ax.axis("off")
            ax.imshow(result_image)

    print(final_dict)

    # Display results in the GUI output box
    output_text.delete("1.0", tk.END)  # Clear previous text
    output_text.insert(tk.END, str(final_dict))  # Convert the counts to text explicitly

    plt.tight_layout()
    plt.show()
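
The colour step above (average the non-black pixels of a masked crop, then pick the nearest reference colour) can be illustrated without OpenCV or NumPy. The following is a minimal sketch that reuses the reference RGB values from `classify_color`; the functions `average_non_black` and `nearest_color` and the sample pixels are hypothetical and only meant to show the arithmetic.

```python
import math

# Same reference values as the color_dict in classify_color
REFERENCE_COLORS = {
    "Red": (206, 38, 38),
    "Orange": (255, 120, 0),
    "Yellow": (255, 255, 0),
    "Green": (0, 204, 0),
    "Blue": (51, 153, 255),
    "Brown": (70, 5, 5),
    "White": (255, 255, 255),
}

def average_non_black(pixels):
    """Mean RGB of pixels that are not pure black (the masked-out background)."""
    valid = [p for p in pixels if p != (0, 0, 0)]
    if not valid:
        return (0, 0, 0)
    return tuple(round(sum(channel) / len(valid)) for channel in zip(*valid))

def nearest_color(rgb):
    """Name of the reference colour with the smallest Euclidean distance to rgb."""
    return min(REFERENCE_COLORS, key=lambda name: math.dist(rgb, REFERENCE_COLORS[name]))

# Hypothetical crop: two background pixels plus three reddish candy pixels
crop = [(0, 0, 0), (0, 0, 0), (200, 40, 30), (190, 50, 45), (210, 30, 40)]
avg = average_non_black(crop)
print(avg, nearest_color(avg))
```

Because the distance is taken in raw RGB, a shadowed red such as `(100, 20, 20)` lands closer to the Brown reference than to Red in this sketch, which is one way the lighting sensitivity described in the Introduction can show up.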

In [None]:
# Function to handle uploading the image
def uploadImage():
    global file_path
    file_path = filedialog.askopenfilename(
        filetypes=[("Image Files", "*.png *.jpg *.jpeg *.bmp *.gif")]  # Tk expects space-separated patterns
    )
    if file_path:
        img = Image.open(file_path)
        img.thumbnail((300, 300))  # Resize the image to fit in the window
        img_tk = ImageTk.PhotoImage(img)
        image_label.config(image=img_tk)
        image_label.image = img_tk
        file_path_label.config(text=f"File: {file_path}")

        # Display some information in the text box
        output_text.delete("1.0", tk.END)  # Clear previous text
        output_text.insert(tk.END, f"File Path: {file_path}\n")
        output_text.insert(tk.END, f"Image Size: {img.size}\n")
        output_text.insert(tk.END, f"Image Format: {img.format}\n")

In [None]:
# Main script for GUI

# Initialize the main window
root = tk.Tk()
root.title("Image and Info Display GUI")
root.geometry("400x600")

file_path = None

# Upload image button
upload_button = ttk.Button(
    root, text="Upload Image", command=uploadImage
)
upload_button.pack(pady=10)

# Process image button
process_button = ttk.Button(
    root, text="Process Image", command=lambda: processImage(file_path, output_text)
)
process_button.pack(pady=10)

# Image display label
image_label = tk.Label(root)
image_label.pack(pady=10)

# File path label
file_path_label = tk.Label(root, text="No file selected", wraplength=300)
file_path_label.pack()

# Text box for output information
output_text = tk.Text(root, height=10, width=40, state=tk.NORMAL)
output_text.pack(pady=10)

# Run the main loop
root.mainloop()
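
Since `processImage` only calls `delete` and `insert` on the object it is given, one possible way to exercise it without opening the window is to pass a small stand-in for the text box. The sketch below is an assumption about how that could look; `TextSink` and the image path are hypothetical and not part of the project code.

```python
class TextSink:
    """Hypothetical stand-in for the tk.Text widget: records text instead of drawing it."""

    def __init__(self):
        self.content = ""

    def delete(self, start, end):
        # The notebook always clears the whole box, so the range is ignored here
        self.content = ""

    def insert(self, index, chars):
        # The notebook only appends at the end, so the index is ignored here
        self.content += str(chars)

    def update_idletasks(self):
        # Nothing to redraw without a GUI
        pass


sink = TextSink()
sink.insert("end", "Processing...")
sink.delete("1.0", "end")
sink.insert("end", {"Red": 3, "Blue": 1})  # made-up counts, for illustration only
print(sink.content)

# With the notebook's cells loaded, the same object could be passed as:
# processImage("some_image.jpg", sink)  # hypothetical image path
```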

# Results

# Discussion

# Conclusion

# <a id="bib">Bibliography</a>

[1]	Z. Peng, J. Li, H. Hao, and Y. Zhong, “Smart structural health monitoring using computer vision and edge computing,” Engineering Structures, vol. 319, p. 118809, Nov. 2024, doi: [10.1016/j.engstruct.2024.118809](https://doi.org/10.1016/j.engstruct.2024.118809).

[2]	Ultralytics, “FastSAM (Fast Segment Anything Model).” Accessed: Dec. 19, 2024. [Online]. Available: https://docs.ultralytics.com/models/fast-sam

[3]	CASIA-IVA-Lab/FastSAM. (Dec. 19, 2024). Python. CASIA-IVA-Lab. Accessed: Dec. 19, 2024. [Online]. Available: https://github.com/CASIA-IVA-Lab/FastSAM

[4]	Ultralytics, “SAM (Segment Anything Model).” Accessed: Dec. 19, 2024. [Online]. Available: https://docs.ultralytics.com/models/sam