# Clustering Experiment 
#### By: Charbel Marche

We decided to tackle the problem individually, each using one method; once everyone is done, we will merge our techniques and select the optimal one. Currently we only detect individual digits, but we also need to recognize them as coherent multi-digit numbers and be able to assign entries to numbers.
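One simple way to turn individual digit detections into coherent numbers is sketched below. It is an assumption, not the method any of us has settled on. The idea is to sort the digit boxes left to right, then attach each box to an existing group when it sits on the same row as that group's last digit and only a small horizontal gap away. The names `DigitBox`, `parse_yolo_line`, `group_digits_into_numbers`, and `group_to_number`, and the tolerances `row_tol` and `gap_tol`, are hypothetical. The tolerances would need tuning on real sheets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DigitBox:
    """One detected digit: class label plus normalized YOLO geometry."""
    digit: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_line(line: str) -> DigitBox:
    """Parse a 'class x_center y_center width height' YOLO string."""
    label, x, y, w, h = line.split()
    return DigitBox(int(label), float(x), float(y), float(w), float(h))


def group_digits_into_numbers(
    boxes: List[DigitBox], row_tol: float = 0.5, gap_tol: float = 1.0
) -> List[List[DigitBox]]:
    """Greedily chain digit boxes, left to right, into multi-digit groups.

    A box joins an existing group when its vertical center is within
    row_tol * box height of the group's last digit and the horizontal gap to
    that digit is below gap_tol * box width. Both tolerances are guesses.
    """
    groups: List[List[DigitBox]] = []
    for box in sorted(boxes, key=lambda b: b.x_center):
        for group in groups:
            last = group[-1]
            same_row = abs(box.y_center - last.y_center) < row_tol * max(box.height, last.height)
            gap = (box.x_center - box.width / 2) - (last.x_center + last.width / 2)
            if same_row and gap < gap_tol * max(box.width, last.width):
                group.append(box)
                break
        else:
            # No existing group fits: this box starts a new number
            groups.append([box])
    return groups


def group_to_number(group: List[DigitBox]) -> int:
    """Read a group's digits left to right as one integer."""
    return int("".join(str(b.digit) for b in group))


# Example: three same-row boxes copied from the selection cell's output further down
example_lines = [
    "2 0.13794031316583807 0.39902471804151346 0.00507585005326705 0.010464657054227944",
    "2 0.1434342216722893 0.39900106991038603 0.005159921357125952 0.010430668849571112",
    "0 0.14874451145981296 0.3989715576171875 0.004878808223839959 0.010312284581801445",
]
groups = group_digits_into_numbers([parse_yolo_line(line) for line in example_lines])
print([group_to_number(g) for g in groups])  # [220]
```

Chaining against each group's *last* digit, rather than its first, lets numbers of any length form. The trade-off is that two numbers printed very close together on the same row could merge.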

### Register Images to Start

To start, we need to register the images using the `utilities/conversion/apply_homography_to_labels.ipynb` notebook, which must be run before this one. This notebook assumes that the `data/registered_images` directory has been created and populated, and that the `data/yolo_data.json` file exists. Both are created by the referenced notebook.
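A quick guard can fail early when the registration notebook has not been run yet. The sketch below checks that `registered_images` exists and is non-empty and that `yolo_data.json` is present. The helper name `check_prerequisites` is hypothetical, and the `../../data` path simply mirrors the relative paths used later in this notebook.

```python
from pathlib import Path
from typing import List


def check_prerequisites(data_dir: Path) -> List[str]:
    """Return the prerequisite paths that are missing or empty (empty list means ready)."""
    missing = []
    registered_dir = data_dir / "registered_images"
    # The directory must exist and contain at least one entry
    if not registered_dir.is_dir() or not any(registered_dir.iterdir()):
        missing.append(str(registered_dir))
    yolo_json = data_dir / "yolo_data.json"
    if not yolo_json.is_file():
        missing.append(str(yolo_json))
    return missing


missing = check_prerequisites(Path("../../data"))
if missing:
    print("Run apply_homography_to_labels.ipynb first; missing:", missing)
```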

#### Install Packages

This list will grow as development continues.

In [2]:
import os
import json
import random
from pathlib import Path
from typing import List

import cv2
from PIL import Image


#### Start By Loading YOLO Data

To start, I want to bring in the YOLO-formatted data for each sheet; the respective images can then be loaded as well.

In [3]:
# Load yolo_data.json
PATH_TO_YOLO_DATA = '../../data/yolo_data.json'
PATH_TO_REGISTERED_IMAGES = '../../data/registered_images'
UNIFIED_IMAGE_PATH = '../../data/unified_intraoperative_preoperative_flowsheet_v1_1_front.png'
with open(PATH_TO_YOLO_DATA) as json_file:
    yolo_data = json.load(json_file)

print(f"Found {len(yolo_data)} sheets in yolo_data.json")

Found 19 sheets in yolo_data.json
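The later cells treat `yolo_data` as a dictionary that maps each sheet's image filename to a list of YOLO strings of the form `class x_center y_center width height`. Assuming that structure, a small sanity check can catch malformed entries before any ROI selection. The helper `validate_yolo_data` is hypothetical.

```python
from typing import Dict, List


def validate_yolo_data(data: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems found in the data (empty list means it looks valid).

    Assumes each value is a list of 'class x_center y_center width height'
    strings with coordinates normalized to [0, 1].
    """
    problems = []
    for sheet, boxes in data.items():
        for i, box in enumerate(boxes):
            fields = box.split()
            if len(fields) != 5:
                problems.append(f"{sheet}[{i}]: expected 5 fields, got {len(fields)}")
                continue
            try:
                int(fields[0])  # class label, e.g. the digit 0-9
                coords = [float(v) for v in fields[1:]]
            except ValueError:
                problems.append(f"{sheet}[{i}]: non-numeric field in {box!r}")
                continue
            if not all(0.0 <= v <= 1.0 for v in coords):
                problems.append(f"{sheet}[{i}]: coordinates outside [0, 1]")
    return problems
```

In the notebook this could be called as `print(validate_yolo_data(yolo_data))` right after loading the file.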


Now let's select the relevant bounding boxes from the blood pressure and heart rate (HR) zone.

Start by defining a function that converts YOLO bounding boxes to pixel coordinates, so we can check whether a box lies within the region of interest (ROI). Then create a function that lets the user select an ROI and returns the list of bounding boxes inside it.
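For reference, take normalized YOLO values $(x_c, y_c, w, h)$ and an image of $W \times H$ pixels. The pixel corners computed by the conversion below, before truncation to integers, are:

$$
x_{\min} = x_c W - \frac{w W}{2}, \qquad x_{\max} = x_c W + \frac{w W}{2}, \qquad
y_{\min} = y_c H - \frac{h H}{2}, \qquad y_{\max} = y_c H + \frac{h H}{2}
$$

For example, $x_c = 0.5$ and $w = 0.1$ on the 800-pixel-wide resized image give $x_{\min} = 400 - 40 = 360$ and $x_{\max} = 440$. A box counts as inside an ROI $(x, y, w_{\text{roi}}, h_{\text{roi}})$ when all four of these hold: $x_{\min} \ge x$, $y_{\min} \ge y$, $x_{\max} \le x + w_{\text{roi}}$, and $y_{\max} \le y + h_{\text{roi}}$.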

In [4]:
def YOLO_to_pixels(x_center, y_center, width, height, image_width, image_height):
    """
    Convert a YOLO bounding box to pixel coordinates.

    YOLO coordinates are normalized to [0, 1] relative to the image size.

    Args:
        x_center: float, normalized x center of the bounding box
        y_center: float, normalized y center of the bounding box
        width: float, normalized width of the bounding box
        height: float, normalized height of the bounding box
        image_width: int, width of the image in pixels
        image_height: int, height of the image in pixels

    Returns:
        A single tuple with the following values:
            x_min: int, minimum x coordinate of the bounding box in pixels
            y_min: int, minimum y coordinate of the bounding box in pixels
            x_max: int, maximum x coordinate of the bounding box in pixels
            y_max: int, maximum y coordinate of the bounding box in pixels
    """
    # Accept strings or numbers for all four YOLO values, not just the center
    x_center, y_center = float(x_center), float(y_center)
    width, height = float(width), float(height)

    x_min = int((x_center * image_width) - (width * image_width) / 2)
    y_min = int((y_center * image_height) - (height * image_height) / 2)
    x_max = int((x_center * image_width) + (width * image_width) / 2)
    y_max = int((y_center * image_height) + (height * image_height) / 2)
    return x_min, y_min, x_max, y_max

def select_relevant_bounding_boxes(sheet_data: List[str], path_to_image: Path) -> List[str]:
    """
    Display the image, let the user select a region of interest (ROI), and return
    the bounding boxes from sheet_data that lie entirely within that region.
    Boxes inside the ROI are outlined in random colors on a preview image.

    Args:
        sheet_data: List of bounding boxes, each a YOLO-format string
            "class x_center y_center width height" with normalized coordinates.
        path_to_image: Path to the image file.

    Returns:
        Bounding boxes that are within the selected region, in YOLO format.
    """
    # Display size used for ROI selection; YOLO coordinates are converted to this size
    display_width, display_height = 800, 600

    # Load the image (cv2.imread expects a string path and returns None on failure)
    image = cv2.imread(str(path_to_image))
    if image is None:
        raise FileNotFoundError(f"Could not read image: {path_to_image}")

    # Display the image and allow the user to select an ROI
    resized_image = cv2.resize(image, (display_width, display_height))
    roi = cv2.selectROI("Select ROI", resized_image)
    print(f"ROI selected: {roi}")

    # selectROI returns a tuple (x, y, width, height) in resized-image pixels
    x, y, w, h = roi
    print(f"Selected region: x={x}, y={y}, w={w}, h={h}")

    # Draw a rectangle around the selected region
    cv2.rectangle(img=resized_image, pt1=(x, y), pt2=(x + w, y + h), color=(0, 255, 0), thickness=1)

    # Close all OpenCV windows
    cv2.destroyAllWindows()

    # Random BGR color used to tell individual boxes apart in the preview
    def generate_color():
        return (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))

    # List of bounding boxes that are within the selected region
    bounding_boxes_within_region = []

    # Identify bounding boxes that lie entirely within the selected region
    for bounding_box in sheet_data:
        # Boxes are YOLO strings; convert to display pixels to compare against the ROI
        identifier, x_center, y_center, bb_width, bb_height = bounding_box.split()
        x_min, y_min, x_max, y_max = YOLO_to_pixels(
            float(x_center), float(y_center), float(bb_width), float(bb_height),
            display_width, display_height,
        )
        if x_min >= x and y_min >= y and x_max <= x + w and y_max <= y + h:
            print(f"Bounding box {bounding_box} is within the selected region")

            # Outline the bounding box and keep it
            cv2.rectangle(img=resized_image, pt1=(x_min, y_min), pt2=(x_max, y_max), color=generate_color(), thickness=1)
            bounding_boxes_within_region.append(bounding_box)

    # OpenCV images are BGR; convert to RGB before displaying with PIL
    preview = Image.fromarray(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB))
    preview.show()

    return bounding_boxes_within_region
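
Because `cv2.selectROI` needs a GUI window, the selection logic above is hard to test or rerun. The sketch below assumes the ROI is already known as `(x, y, w, h)` in the same 800x600 display coordinates, and separates the containment test from the interactive part. The function name `filter_boxes_in_roi` is hypothetical. Since the sheets are registered, one ROI chosen once could plausibly be reused across sheets, but that assumption should be checked visually.

```python
from typing import List, Tuple


def filter_boxes_in_roi(
    sheet_data: List[str],
    roi: Tuple[int, int, int, int],
    image_width: int = 800,
    image_height: int = 600,
) -> List[str]:
    """Return the YOLO strings whose pixel box lies entirely inside roi = (x, y, w, h).

    Coordinates are converted to an image of image_width x image_height pixels,
    matching the resized image used for ROI selection.
    """
    x, y, w, h = roi
    selected = []
    for bounding_box in sheet_data:
        _, x_c, y_c, bb_w, bb_h = bounding_box.split()
        x_c, y_c, bb_w, bb_h = float(x_c), float(y_c), float(bb_w), float(bb_h)
        # Same arithmetic as YOLO_to_pixels: center +/- half-size, truncated to int
        x_min = int(x_c * image_width - bb_w * image_width / 2)
        y_min = int(y_c * image_height - bb_h * image_height / 2)
        x_max = int(x_c * image_width + bb_w * image_width / 2)
        y_max = int(y_c * image_height + bb_h * image_height / 2)
        if x_min >= x and y_min >= y and x_max <= x + w and y_max <= y + h:
            selected.append(bounding_box)
    return selected
```

For example, `filter_boxes_in_roi(yolo_data["RC_0001_intraoperative.JPG"], (103, 220, 632, 203))` would reapply the ROI from the run shown below without opening a window.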


Now let's use these functions to get the relevant bounding boxes for clustering.

In [5]:
# Iterate over all images
for sheet, bounding_boxes in yolo_data.items():
    print(f"Sheet: {sheet}")
    full_image_path = os.path.join(PATH_TO_REGISTERED_IMAGES, sheet)
    print(f"Full image path: {full_image_path}")

    # Let the user select the ROI and collect the bounding boxes inside it
    relevant_bounding_boxes = select_relevant_bounding_boxes(bounding_boxes, full_image_path)

    # Now we need to cluster the bounding boxes that pertain to the same multi-digit number
    # For now, we will just print the relevant bounding boxes
    print(f"Relevant bounding boxes: {relevant_bounding_boxes}")

    # Break after the first image
    break

Sheet: RC_0001_intraoperative.JPG
Full image path: ../../data/registered_images\RC_0001_intraoperative.JPG
ROI selected: (103, 220, 632, 203)
Selected region: x=103, y=220, w=632, h=203
Bounding box 5 0.9098491136955492 0.38141891180300247 0.0047951438210227515 0.010082098268995088 is within the selected region
Bounding box 2 0.13794031316583807 0.39902471804151346 0.00507585005326705 0.010464657054227944 is within the selected region
Bounding box 2 0.1434342216722893 0.39900106991038603 0.005159921357125952 0.010430668849571112 is within the selected region
Bounding box 0 0.14874451145981296 0.3989715576171875 0.004878808223839959 0.010312284581801445 is within the selected region
Bounding box 2 0.13839600996537643 0.41458115521599265 0.005050483472419515 0.010243135340073484 is within the selected region
Bounding box 1 0.14340712576201467 0.4147241689644608 0.004516906738281257 0.010044136795343106 is within the selected region
Bounding box 0 0.1484095810398911 0.4144839298023897 0.0