## Exercise 5: #YOLO ALMA


**Warning: This problem will take a variable amount of time to set up depending on the time of day!**

In this exercise, we'll try to detect and classify everyday objects in images from the
[Alma webcam](https://illinois.edu/about/almacam.html) using the [You Only Look Once (YOLO) v11](https://docs.ultralytics.com/yolov11/) algorithm.

(Fun fact: there's another web camera on the quad called "[the Quadcam](https://illinois.edu/about/quadcam.html)", but its resolution is problematic for this algorithm.)

Prior to beginning this problem, please make sure that a **GPU is enabled** by going to:

```
Runtime -> Change runtime type -> Hardware Accelerator -> GPU
```

Next, please run the following setup code to set up the environment for predicting with YOLO v11. In-depth instructions follow immediately after the setup code.

## Setup Code

Please run each code chunk in order. Failure to do so may result in issues when trying to detect objects in an image with YOLO v11.

### Install Dependencies

The YOLO model is very popular, and a variety of implementations exist. The implementation we'll use is [actively maintained](https://github.com/ultralytics/ultralytics) and [available on PyPI](https://pypi.org/project/ultralytics/).

In [None]:
%%capture
# The %%capture cell magic suppresses output; it must be the first line of the cell

# 1. Install required dependencies from PyPI
!pip install ultralytics

### Initialize Trained Model

Instead of training a model from scratch, we'll use a version of YOLOv11 pretrained on the [COCO data set](https://cocodataset.org/#home), which can classify objects into the following [possible classes](https://github.com/ultralytics/ultralytics/blob/9f4eb2491b6a25abaae29b74da6698cefe8dc662/ultralytics/cfg/datasets/coco.yaml).

In [None]:
# 2. Retrieve and load the pretrained model
from ultralytics import YOLO

# Load the YOLO11 model
model = YOLO("yolo11n.pt")

### Model Predictions

Consider the following image from a UIUC blog:

In [1]:
from IPython.display import Image
Image(url='https://blogs.illinois.edu/files/6231/545166/117021.jpg', width=500)

We can apply the YOLOv11 model to this image as follows:

In [None]:
# Importing cv2 (OpenCV [Computer Vision] library)
import cv2

# Patch the cv2.imshow() function (for Google Colab only)
from google.colab.patches import cv2_imshow

# Predict on an image
results = model('https://blogs.illinois.edu/files/6231/545166/117021.jpg', iou = 0.7, conf = 0.25)

Visually, we can see the predictions alongside the bounding boxes by plotting the resulting frame.

In [None]:
for result in results:
  # This automatically draws bounding boxes, labels, and confidence scores
  annotated_frame = result.plot()

  # Display the annotated frame with the Colab replacement for cv2.imshow()
  # (will error if run locally)
  cv2_imshow(annotated_frame)
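
The prediction call above passes `iou = 0.7` and `conf = 0.25`. As we understand the Ultralytics documentation, `conf` is the minimum confidence a detection needs in order to be kept, and `iou` is the intersection-over-union threshold used when suppressing overlapping boxes. The following is a minimal sketch of how intersection-over-union can be computed for two boxes in `(xmin, ymin, xmax, ymax)` form. The helper `box_iou` and the two example boxes are hypothetical and are not part of the Ultralytics API.

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if any)
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Width and height are clamped at zero when the boxes do not overlap
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two made-up 10 x 10 boxes that share half of their area
box_a = (0, 0, 10, 10)
box_b = (5, 0, 15, 10)
iou = box_iou(box_a, box_b)  # overlap of 50 over a union of 150
print(round(iou, 3))
```

Two boxes with an overlap ratio above the `iou` threshold are treated as likely duplicates of the same object, so a lower threshold removes more overlapping boxes.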

From our prediction statement, we receive the `results` list, which holds one entry per input image. Each entry contains the following:

- `boxes.xyxy`: Bounding box coordinates of where each detected object lies
  - e.g. blue boxes in the above image
- `boxes.cls`: Class identifiers of objects detected in the above image.
  - e.g. `0 -> "person"`, `25 -> "umbrella"`
- `boxes.conf`: Confidence score of each detection.
  - e.g. the blue numeric values in the above image (e.g. bottom-left is 0.30)
- `names`: All possible labels of objects
  - e.g. a dictionary with the id (e.g. `0`, `1`, `2`) as a key and the name (e.g. `'person'`, `'bicycle'`, `'car'`) as a value.

From the above variables, we are mainly interested in `boxes.cls` and `names`. In particular, we want to know how many items were detected in the image via `boxes.cls` and which class each one belongs to via `names`.
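
As a minimal sketch of that lookup, the snippet below uses hard-coded stand-ins rather than a real `results` object. The `names` dictionary is a small hypothetical excerpt, and `cls_values` assumes the class IDs arrive as floating-point numbers, which is why each one is cast with `int()` before it is used as a dictionary key.

```python
# Hypothetical stand-ins for results[0].names and results[0].boxes.cls
names = {0: "person", 1: "bicycle", 25: "umbrella"}
cls_values = [0.0, 0.0, 0.0, 25.0, 0.0]  # assumed: one float class ID per detection

# Cast each ID to int before using it as a key into the names dictionary
detected = [names[int(class_id)] for class_id in cls_values]

print(len(detected))  # number of detected objects
print(detected)       # class name of each detected object
```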

In [None]:
print("Bounding Box Coordinates (xmin, ymin, xmax, ymax)")
print(results[0].boxes.xyxy)

In [None]:
print("Classes detected")
print(results[0].boxes.cls)

In [None]:
print("List of classes possible")
print(results[0].names)

## (a) Organizing Data

For this exercise, we're looking to create a Pandas data frame that translates the predictions from being stored in an object with multiple dictionaries and lists into a single data frame. For simplicity, we'll use the example image and its `results` object.

We would like the data frame to have the following structure:

| classid | classname|   xmin |   ymin |   xmax |   ymax |
|--------:|:---------|-------:|-------:|-------:|-------:|
|       0 | person   |    582 |    643 |    642 |    786 |
|       0 | person   |    651 |    580 |    713 |    763 |
|       0 | person   |    376 |    560 |    438 |    720 |
|      25 | umbrella |    209 |    442 |    318 |    468 |
|       0 | person   |     73 |    641 |    151 |    785 |

We can retrieve the necessary items from `results` with:

- `results[0].names`: dictionary of _possible_ class names
- `results[0].boxes.cls`: predicted class IDs from the image
- `results[0].boxes.xyxy`: predicted bounding box coordinates, with one row per detected object and one column per coordinate, where:
  - `results[0].boxes.xyxy[:, 0]`: predicted `xmin` values
  - `results[0].boxes.xyxy[:, 1]`: predicted `ymin` values
  - `results[0].boxes.xyxy[:, 2]`: predicted `xmax` values
  - `results[0].boxes.xyxy[:, 3]`: predicted `ymax` values

**Hint:** You may wish to use a list comprehension or a `for` loop to cycle through the predicted object data contained in the results object.

**Hint:** You may wish to also explicitly cast each value as an integer. You will want to explicitly cast `classid` as an integer before attempting to look up its name in the dictionary, e.g. `results[0].names[int(classid)]`.
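
To illustrate the shape of the data, here is a minimal sketch that uses plain Python lists in place of the real tensors. It assumes each detection occupies one row of four coordinates in `xmin, ymin, xmax, ymax` order. The decimal values and the variable names `xyxy`, `cls_values`, and `rows` are made up for illustration.

```python
# Hypothetical stand-ins: one row per detection, columns are xmin, ymin, xmax, ymax
xyxy = [
    [582.3, 643.1, 642.8, 786.0],
    [209.4, 442.7, 318.2, 468.9],
]
cls_values = [0.0, 25.0]
names = {0: "person", 25: "umbrella"}

rows = []
for class_id, box in zip(cls_values, xyxy):
    # int() truncates the fractional part of each coordinate
    xmin, ymin, xmax, ymax = (int(value) for value in box)
    rows.append({
        "classid": int(class_id),
        "classname": names[int(class_id)],
        "xmin": xmin,
        "ymin": ymin,
        "xmax": xmax,
        "ymax": ymax,
    })

for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed to `pd.DataFrame(rows)`, with the dictionary keys becoming the column names.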

In [None]:
import pandas as pd

def organized_yolo_df(results):
  # Implement logic here
  pass


df = organized_yolo_df(results)
df

## (b) Detecting Objects

In this exercise, we're interested in detecting the following object classes within images from the Alma Mater webcam:

- person
- bicycle
- backpack
- handbag
- cell phone

Please use the spelling given above, as it matches the classes recognized by the YOLOv11 network (see [coco.yaml](https://github.com/ultralytics/ultralytics/blob/9f4eb2491b6a25abaae29b74da6698cefe8dc662/ultralytics/cfg/datasets/coco.yaml) for details).

Obtain the images from:

 - https://coatless.github.io/alma-cam/alma-cam-1.png
 - https://coatless.github.io/alma-cam/alma-cam-2.png
 - https://coatless.github.io/alma-cam/alma-cam-3.png
 - https://coatless.github.io/alma-cam/alma-cam-4.png
 - https://coatless.github.io/alma-cam/alma-cam-5.png
 - https://coatless.github.io/alma-cam/alma-cam-6.png
 - https://coatless.github.io/alma-cam/alma-cam-7.png
 - https://coatless.github.io/alma-cam/alma-cam-8.png
 - https://coatless.github.io/alma-cam/alma-cam-9.png
 - https://coatless.github.io/alma-cam/alma-cam-10.png

So, the first image can be retrieved with:

```
https://coatless.github.io/alma-cam/alma-cam-1.png
```
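
Since the ten URLs differ only in their number, they can be generated rather than typed out. The sketch below shows one way to do that, along with recovering an image ID from each URL using `Path(...).stem`. The variable name `image_ids` is introduced here for illustration only.

```python
from pathlib import Path

# Base location of the images (no trailing slash, so the join below adds exactly one)
url_base = "https://coatless.github.io/alma-cam"
urls = [f"{url_base}/alma-cam-{i}.png" for i in range(1, 11)]

# The stem is the final path component without its file extension
image_ids = [Path(url).stem for url in urls]

print(urls[0])
print(image_ids[0], image_ids[-1])
```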

Dynamically construct a Pandas data frame that contains the ImageID and a count of the detected objects in each of the given classes, e.g.

| ImageID    | person | bicycle | backpack | handbag | cell phone   |
|:-----------|--------|---------|----------|---------|--------------|
| example-1  | 0      |     1   | 1        | 0       |    0         |



For all of our URLs, under the default settings of `0.7` for `iou` and `0.25` for `conf`, we would expect to have:

| ImageID     |   person |   bicycle |   backpack |   handbag |   cell phone |
|:------------|---------:|----------:|-----------:|----------:|-------------:|
| alma-cam-1  |       12 |         0 |          0 |         0 |            0 |
| alma-cam-2  |        5 |         0 |          0 |         0 |            0 |
| alma-cam-3  |        3 |         0 |          0 |         0 |            0 |
| alma-cam-4  |        5 |         0 |          0 |         0 |            0 |
| alma-cam-5  |        9 |         0 |          0 |         0 |            0 |
| alma-cam-6  |       11 |         0 |          0 |         0 |            0 |
| alma-cam-7  |       17 |         0 |          0 |         0 |            0 |
| alma-cam-8  |       12 |         0 |          0 |         0 |            0 |
| alma-cam-9  |        9 |         0 |          0 |         0 |            0 |
| alma-cam-10 |       11 |         0 |          0 |         0 |            0 |

To help in this endeavor, we suggest using the function designed in part **(a)** to isolate the `classname` for each prediction.
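
One possible way to turn a column of class names into a single row of counts is sketched below with `collections.Counter`. The `detected` list is a made-up stand-in for the `classname` column of one image, and `example-1` is a placeholder ID. Classes that never appear count as zero, and classes outside `classes_to_count` are ignored.

```python
from collections import Counter

classes_to_count = ['person', 'bicycle', 'backpack', 'handbag', 'cell phone']

# Hypothetical classname values for a single image
detected = ["person", "bicycle", "person", "umbrella", "backpack"]

# A Counter returns 0 for any key it has not seen
tally = Counter(detected)

# Build one row: the image ID followed by a count for each class of interest
row = {"ImageID": "example-1"}
row.update({name: tally[name] for name in classes_to_count})

print(row)
```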

In [None]:
import pandas as pd
from pathlib import Path

# Generate a list of problem URLs
# (no trailing slash on url_base, otherwise the URLs would contain "//")
url_base = "https://coatless.github.io/alma-cam"
urls = [f"{url_base}/alma-cam-{i}.png" for i in range(1, 11)]

# urls now holds ten entries, numbered 1 through 10, e.g.
# https://coatless.github.io/alma-cam/alma-cam-1.png
# List of classes
# Important: These names _must_ match with what is specified in coco.yaml.
classes_to_count = ['person', 'bicycle', 'backpack', 'handbag', 'cell phone']

## code here
def yolo_classifications(urls, classes_to_count):
  # Pre-populate a data frame
  cols = ['ImageID'] + classes_to_count
  df = pd.DataFrame(columns = cols)

  # Process each image
  for i, url in enumerate(urls):
    # Extract image ID from URL (e.g., "alma-cam-1")
    image_id = Path(url).stem

    print(f"Processing {image_id}...")

    ## Implement logic here ##

    # 1. Make prediction on the image
    # results = ???? #

    # 2. Get the organized DataFrame for this image's detections
    # detection_df = organized_yolo_df(results)

    # 3. Initialize counts for all classes

    # 4. Count occurrences of each class of interest

    # 5. Create row data for this image

    # 6. Add to the main dataframe

  # Return the completed data frame so it can be displayed or reused
  return df



## (c) Upload and run your own image!

In this exercise, you will repeat the prior exercise, but this time you will select the classes that should be detected from an image you supply.

View all possible classes by exploring the `results[0].names` dictionary of class names.


In [None]:
# Upload an image file
from google.colab import files
uploaded_image = files.upload()

Running the next line of code should embed the uploaded image inside of the notebook.

In [None]:
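from IPython.display import Image

# files.upload() returns a dict of {filename: bytes}, so index by filename, not 0
filename = next(iter(uploaded_image))
# Display the embedded image in the notebook.
Image(data=uploaded_image[filename], width=100)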

Apply the functions developed in **(a)** and **(b)** to the uploaded image:

In [None]:
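# Modify this list to contain the classes you want to count in your image
classes_to_count = ['person', 'bicycle', 'backpack', 'handbag', 'cell phone']

# Pass the uploaded filenames (the dictionary keys) as the image sources
yolo_classifications(list(uploaded_image.keys()), classes_to_count)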