<p>
  <b>AI Lab: Deep Learning for Computer Vision</b><br>
  <b><a href="https://www.wqu.edu/">WorldQuant University</a></b>
</p>

<div class="alert alert-danger" role="alert">
  <p>
    <center><b>Usage Guidelines</b></center>
  </p>
  <p>
    This notebook can only be used on the WorldQuant University platform. It is not licensed for personal use or for use on any other platform.
  </p>
  <p>
    You <b>cannot</b>:
    <ul>
      <li><span style="color: red">✗</span> Download this notebook</li>
      <li><span style="color: red">✗</span> Show this notebook to friends or colleagues</li>
      <li><span style="color: red">✗</span> Post this notebook in public or private repositories</li>
      <li><span style="color: red">✗</span> Upload this notebook (or screenshots of it) to other websites, including websites for study resources</li>
    </ul>
  </p>
  <p>
    Failure to follow these guidelines is a violation of your terms of service and will lead to your expulsion from WorldQuant University and the revocation of your certificate.
  </p>
</div>

# 3.7. Istanbul Traffic Object Detection

# Prepare the Environment

First, import the libraries you'll need. You can also add imports later, as they become necessary.

In [None]:
import random
import shutil
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

import torch
import yaml
from PIL import Image
from tqdm.notebook import tqdm
from ultralytics import YOLO

Since GPUs are available on your machine, make sure you place your tensors on the proper device.

**Task 3.7.1:** Check the availability of GPUs on this machine and determine the correct device name. Store the device name in the variable `device`.

In [None]:
device = ...
print(f"Using {device} device.")

### Check the data

The bounding box data is provided in XML format. Unfortunately, both the images and the XML files are in the same directory, `istanbul_traffic/train`. We'll need to rearrange the files and directories into the layout we'll need later. Let's take a peek at the current directory hierarchy.

In [None]:
!tree istanbul_traffic --filelimit=10

**Task 3.7.2:** Create a `pathlib` path for the training directory, `istanbul_traffic/train`.

In [None]:
istanbul_dir = ...

print("Training data directory:", istanbul_dir)

There should be one XML file for each JPG image. The corresponding XML and JPG share the same filename except for the file extension. E.g., `0ab6f274892b9b370e6441886b2d7b9d.jpg` and `0ab6f274892b9b370e6441886b2d7b9d.xml` belong together. We should verify that each training image has a corresponding XML file.

**Task 3.7.3:** Create a variable that counts, for each base name, how many files share it.

In [None]:
file_extension_counts = ...

In [None]:
print(
    f"Number of files with 0ab6f274892b9b370e6441886b2d7b9d basename: {file_extension_counts['0ab6f274892b9b370e6441886b2d7b9d']}"
)

**Task 3.7.4:** Check that all of the values in `file_extension_counts` are 2.  An easy way to do this is to pass all of the values into a `set` object. Submit all the unique counts to the grader.

In [None]:
unique_counts = ...
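
The pairing check described above can be tried out on a throwaway directory first. This is only a minimal sketch: the helper `count_stems` and the toy file names are hypothetical, and it assumes every file in the directory is either an image or an annotation.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_stems(directory):
    """Count how many files in `directory` share each base name (stem)."""
    return Counter(p.stem for p in Path(directory).iterdir() if p.is_file())


# Build a temporary directory with two complete pairs and one orphan image.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.jpg", "a.xml", "b.jpg", "b.xml", "c.jpg"]:
        (Path(tmp) / name).touch()
    stem_counts = count_stems(tmp)

# A clean data set would give exactly {2}; anything else flags a problem.
unique = set(stem_counts.values())
orphans = sorted(stem for stem, n in stem_counts.items() if n != 2)
print(unique)
print(orphans)
```

Listing the offending stems, rather than only the set of counts, makes it easier to see which files are missing a partner.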

Let's look at the format of the XML data.

In [None]:
xml_filepath = istanbul_dir / "0ab6f274892b9b370e6441886b2d7b9d.xml"
!head -n 25 $xml_filepath

Luckily for us, the XML data already has the bounding box values in the format that YOLO expects. No need to transform those values! However, we'll still need to convert those XML files into text files where every line represents one object in `class_index x_center y_center width height` format. It's time to turn our attention to getting things ready.
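
For comparison, here is roughly what the transformation would look like if the boxes had instead been stored as pixel corners, as many XML annotation sets do. This is a sketch rather than part of the assignment: the function name `corners_to_yolo` and the sample numbers are made up, and it assumes YOLO's usual convention of center coordinates and sizes normalized by the image dimensions.

```python
def corners_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height):
    """Convert pixel corner coordinates to normalized YOLO box values."""
    x_center = (xmin + xmax) / 2 / img_width
    y_center = (ymin + ymax) / 2 / img_height
    width = (xmax - xmin) / img_width
    height = (ymax - ymin) / img_height
    return x_center, y_center, width, height


# A 200x100 pixel box centered in a 400x200 image, with class index 2.
box = corners_to_yolo(100, 50, 300, 150, img_width=400, img_height=200)
line = " ".join(str(v) for v in (2, *box))
print(line)  # 2 0.5 0.5 0.5 0.5
```

Because our XML files already store normalized center-based values, none of this arithmetic is needed here.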

**Task 3.7.5:** Finish the function below that returns a list of the bounding box data. The part you will need to finish is inside the `for` loop.

In [None]:
def parse_annotations(f):
    """Parse all of the objects in a given XML file to YOLO format.

    Input:  f      The path of the file to parse.

    Output: A list of objects in YOLO format.
            Each object is a list of numbers [class_id, x_center, y_center, width, height].
    """

    objects = []

    tree = ET.parse(f)
    root = tree.getroot()
    # Image size in pixels, named img_* so the box values in the loop don't overwrite it
    img_width = int(root.find("size").find("width").text)
    img_height = int(root.find("size").find("height").text)

    for obj in root.findall("object"):
        class_id = int(obj.find("name").text)
        bndbox = obj.find("bndbox")
        # Getting the bounding box values
        x_c = ...
        y_c = ...
        width = ...
        height = ...

        # Appending the object in the form of
        # [class_id, x_center, y_center, width, height]
        objects.append(...)

    return objects

With this function working, we can write a second one that takes the list of bounding box data and writes it to disk as a text file.

**Task 3.7.6:** Finish the function below so that it writes the bounding box data as a text file for YOLO.

In [None]:
def write_label(objects, filename):
    """Write the annotations to a file in the YOLO text format.

    Input:  objects   A list of YOLO objects, each a list of numbers.
            filename  The path to write the text file."""

    with open(filename, "w") as f:
        for obj in objects:
            # Write the object out as space-separated values
            ...
            # Write a newline
            ...


In [None]:
objects = parse_annotations(istanbul_dir / "0ab6f274892b9b370e6441886b2d7b9d.xml")

In [None]:
write_label(objects, "yolo_test.txt")
!head -n 1 yolo_test.txt
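
Besides eyeballing the first line, it can help to read a label file back and check its shape. The sketch below does that on a small made-up file. The helper `read_labels` is hypothetical, and it assumes the class index is written as an integer and that all four box values are normalized to the range 0 to 1.

```python
import tempfile
from pathlib import Path


def read_labels(path):
    """Read a YOLO label file into a list of [class_id, x_c, y_c, w, h]."""
    objects = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        class_id = int(fields[0])
        coords = [float(v) for v in fields[1:]]
        if not all(0.0 <= v <= 1.0 for v in coords):
            raise ValueError(f"coordinates outside [0, 1]: {line!r}")
        objects.append([class_id, *coords])
    return objects


# Write a two-object label file to a temporary location and parse it back.
with tempfile.TemporaryDirectory() as tmp:
    label_path = Path(tmp) / "example.txt"
    label_path.write_text("2 0.5 0.5 0.25 0.1\n4 0.1 0.2 0.05 0.3\n")
    parsed = read_labels(label_path)

print(parsed)
```

A round trip like this catches missing newlines or stray separators before they turn into confusing training errors.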

We now need to set up our directory structure for YOLO. Recall that YOLO expects a structure like this:
```
data_yolo
├── images
│   ├── train
│   └── val
└── labels
    ├── train
    └── val
```
We'll need to:
- Create the directories.
- Split the data into training and validation sets (80/20).
- Copy images to the correct folders.
- Convert the XML files to text files in the correct folders.
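
The first two steps can be sketched in isolation on a temporary directory. This is one possible approach, not necessarily the intended one: the names `data_yolo_demo` and `img_000`, `img_001`, ... are hypothetical, and the split uses its own `random.Random` instance so it doesn't disturb the notebook's global seed.

```python
import random
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "data_yolo_demo"  # hypothetical directory name

    # Create images/{train,val} and labels/{train,val} in one pass.
    for kind in ("images", "labels"):
        for split in ("train", "val"):
            (base / kind / split).mkdir(parents=True, exist_ok=True)

    created = sorted(p.relative_to(base).as_posix() for p in base.glob("*/*"))

# Assign each (made-up) image stem to a split with an independent random draw.
rng = random.Random(0)
train_frac = 0.8
stems = [f"img_{i:03d}" for i in range(1000)]
assignment = {s: "train" if rng.random() < train_frac else "val" for s in stems}
n_train = sum(1 for v in assignment.values() if v == "train")

print(created)
print(n_train, len(stems) - n_train)
```

Note that a per-file random draw gives a split that is only approximately 80/20. Shuffling the list and slicing it would give exact proportions instead.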

**Task 3.7.7:** Set up the directory structure.

In [None]:
yolo_base = Path("data_yolo")
# Make sure everything's cleared out
shutil.rmtree(yolo_base, ignore_errors=True)

# Make the directories
...


**Task 3.7.8:** Populate the YOLO training directory. 80% of the data will be sent to `train` and the remaining 20% to `val`.

In [None]:
# Don't change this
random.seed(42)

train_frac = 0.8
images = list(istanbul_dir.glob("*.jpg"))

for img in tqdm(images):
    # Randomly choose train or val split
    split = ... # this should be `train` or `val`
    # XML file path, from image stem
    annotation = istanbul_dir / f"{img.stem}.xml"
    # Parse annotations.  Watch out for errors!
    ...
    
    # Write label file based on parsed annotation
    ...

    
    # Copy image file to correct location
    shutil.copy(...)

### Train YOLO

For this assignment, we'll load a pre-trained YOLO model to cut down on training time. Nevertheless, it's important to understand how the training would be performed.

The classes we wish to predict are:

In [None]:
classes = ["bicycle", "bus", "car", "motorcycle", "person"]

**Task 3.7.9:** Create a dictionary with the appropriate keys for a YOLO data set, for the creation of a YAML file.

In [None]:
metadata = {
    "path": str(
        yolo_base.absolute()
    ),  # It's easier to specify absolute paths with YOLO.
    "train": ..., # Training images, relative to above.

    "val": ..., # Validation images

    "names": ..., # Class names, as a list
    
    "nc": ..., # Number of classes
}

print(metadata)

**Task 3.7.10:** Save `metadata` as a YAML file named `data.yaml`.

In [None]:
yolo_config = "data.yaml"
yaml.safe_dump(...)

!cat data.yaml

Let's use the nano pre-trained YOLO model as our base model. Recall that the nano model is roughly 30% of the size of the small model while retaining about 80% of its performance.

In [None]:
model = YOLO("yolov8n.pt")

#print(model)

**Task 3.7.11:** Load the pre-trained model for this assignment.

In [None]:
saved_model = YOLO(...)

**Task 3.7.12:** Define the variable `save_dir`.

In [None]:
save_dir = ...

### Evaluating our Model

Before using our model, let's evaluate how it performed. The results are saved in the directory given by `.save_dir`.

In [None]:
!tree $save_dir

**Task 3.7.13:** Display and examine the precision-recall curves for the model.  They are plotted in `PR_curve.png`.

In [None]:
pr_curve_image = Image.open(...)
pr_curve_image

Which classes does the model do well at detecting? Remember that the more area under the curve, the better the model is performing.
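
To make "area under the curve" concrete, the sketch below integrates two precision-recall curves with the trapezoid rule. The points are made up for two hypothetical classes, and this simple rule is not necessarily the exact procedure YOLO uses to compute its reported average precision.

```python
def area_under_curve(recall, precision):
    """Trapezoidal area under a precision-recall curve.

    `recall` must be sorted in increasing order and match `precision` in length.
    """
    area = 0.0
    for i in range(1, len(recall)):
        step = recall[i] - recall[i - 1]
        area += step * (precision[i] + precision[i - 1]) / 2
    return area


# Made-up curves: precision holds up longer for the "strong" class.
recall = [0.0, 0.5, 1.0]
strong = area_under_curve(recall, [1.0, 0.8, 0.4])
weak = area_under_curve(recall, [1.0, 0.4, 0.2])
print(round(strong, 3), round(weak, 3))
```

The class whose precision stays high as recall grows ends up with the larger area, which is what the plot shows visually.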

### Run YOLO on Image

We can confidently start using the YOLO model to detect objects in our images.

**Task 3.7.14:** Detect the objects in image `istanbul_traffic/test/3c794894a576d0d6355379613c2dadc5.jpg`. Set the confidence to 50% and make sure to save the results.

In [None]:
image_path_task = ...

result = ...

print(type(result))

The next thing we'd like to check is how many objects we detected.

**Task 3.7.15:** Determine the number of objects we detected.

In [None]:
num_detections = ...
print(f"Number of objects detected: {num_detections}")

What exactly did we detect?

**Task 3.7.16:** Create a dictionary that maps class names to how many objects we detected. E.g., how many "cars" we detected.

In [None]:
detected_objects = ...
print(detected_objects)
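
The counting pattern itself doesn't depend on YOLO. Here is a minimal sketch using a plain list of made-up class indices in place of the detector's output. It assumes the indices have already been converted to Python integers and that they index into the `classes` list defined earlier.

```python
from collections import Counter

classes = ["bicycle", "bus", "car", "motorcycle", "person"]

# Made-up class indices standing in for one image's detections.
class_indices = [2, 2, 4, 1, 2, 4]

# Map each index to its name, then tally the names.
counts = Counter(classes[i] for i in class_indices)
detected = dict(counts)
print(detected)
```

Classes that never appear are simply absent from the dictionary. Use `counts[name]` on the `Counter` if you'd rather get 0 for them.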

YOLO gained popularity because it's both fast and accurate.

**Task 3.7.17:** Calculate the total time object detection took.

In [None]:
total_time = ...

print(f"Total time in milliseconds: {total_time}")
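
If the timings come as a dictionary of per-stage values in milliseconds, the total is just their sum. The sketch below assumes that layout. The stage names and numbers are made up for illustration and should be checked against what the result object actually reports.

```python
# Made-up per-stage timings in milliseconds (assumed layout).
speed = {"preprocess": 2.1, "inference": 9.4, "postprocess": 1.5}

# Total time is the sum over all stages.
total_ms = sum(speed.values())
slowest_stage = max(speed, key=speed.get)
print(f"{total_ms:.1f} ms total, slowest stage: {slowest_stage}")
```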

We have configured YOLO to save the image with the bounding boxes. Let's see how it did.

**Task 3.7.18:** Create a path object with the location of the saved results.

In [None]:
location_of_results = ...
print(f"Location of saved results: {location_of_results}")

With the location of the saved results, we can take a look at the bounding boxes YOLO drew.

In [None]:
Image.open(location_of_results / "3c794894a576d0d6355379613c2dadc5.jpg")

How did we fare? If you are not satisfied with the results, what would you recommend?

**Task 3.7.19:** Run YOLO on all test images, `istanbul_traffic/test`. Set the confidence to 50% and save the results.

In [None]:
test_images_path = ...
results_test = ...

By running YOLO on all our test images, we can determine the distribution of detected objects.

**Task 3.7.20:** Create a dictionary that maps class names to how many objects we detected across all of the test images. E.g., how many "cars" we detected.

In [None]:
detected_objects_test = ...

detected_objects_test

Are you surprised by this distribution? Or does it make sense given our images?

---
This file &#169; 2024 by [WorldQuant University](https://www.wqu.edu/) is not licensed for personal or commercial use of any kind. **Any downloading, reproduction or redistribution of this material is strictly prohibited.**