**Setting Up the Environment for Object Detection**

We install Roboflow to download and manage the dataset, and Ultralytics to train and run YOLO models.

In [None]:
!pip install roboflow
!pip install ultralytics

**Importing Libraries and Tools**

We begin by importing all the essential libraries needed for our object detection project:

Roboflow: To load and manage datasets from the Roboflow platform.

os and glob: For handling file and directory operations.

cv2 (OpenCV): For image processing and manipulation.

matplotlib.pyplot: To visualize images and results.

yaml: To read and write configuration files.

ultralytics.YOLO: To load YOLO models for training and inference.

shutil: To perform file operations like copying or moving files.

In [None]:
from roboflow import Roboflow
import os
import glob
import cv2
import matplotlib.pyplot as plt
import yaml
from ultralytics import YOLO
import shutil

**Loading the Dataset from Roboflow**

Next, we connect to Roboflow using our API key and access the desired dataset:

We authenticate with the Roboflow API using our personal key.

Then, we navigate to our workspace and select the specific Cityscapes project.

We choose version 1 of the dataset and download it in the YOLOv8 format, ready for training with Ultralytics' YOLO models.

In [None]:
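# Read the API key from an environment variable (name chosen here) instead of hard-coding it
rf = Roboflow(api_key=os.environ["ROBOFLOW_API_KEY"])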
project = rf.workspace("luigiaworkspace").project("cityscapes-zz0ur")
dataset = project.version("1").download("yolov8")

**Exploring the Dataset Structure**

We now dive into the structure of our downloaded dataset:

We define the base training folder and locate its images and labels subfolders.

We list a few sample files from each folder to ensure the dataset is loaded correctly.

We then search for image files (.jpg or .png) in the images folder.

To proceed with analysis or visualization, we select the first image in the dataset.

Based on the image filename, we construct the corresponding label file path from the labels folder by replacing the image extension with .txt.

In [None]:
# Base train folder
train_folder = "/content/Cityscapes-1/train"

# Images and labels folders
images_folder = os.path.join(train_folder, "images")
labels_folder = os.path.join(train_folder, "labels")

print("Images folder files:", os.listdir(images_folder)[:5])
print("Labels folder files:", os.listdir(labels_folder)[:5])

# Find image files (.jpg or .png)
image_files = glob.glob(images_folder + "/*.jpg") + glob.glob(images_folder + "/*.png")
if not image_files:
    raise FileNotFoundError(f"No image files found in: {images_folder}")

# Pick first image
image_path = image_files[0]

# Corresponding label path in labels folder
label_name = os.path.splitext(os.path.basename(image_path))[0] + ".txt"
label_path = os.path.join(labels_folder, label_name)

print("Selected image:", image_path)
print("Corresponding label:", label_path)
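
The cell above checks a single image. As a minimal sketch of the same pairing rule applied to the whole folder, the hypothetical helper `find_unpaired` below compares file stems in the images and labels folders and reports any that lack a counterpart. The helper name and the list of extensions are assumptions for illustration, not part of the Roboflow export.

```python
import os

# Image extensions considered here (an assumption; adjust to your dataset)
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_unpaired(images_folder, labels_folder):
    """Return (images_without_label, labels_without_image) as sorted lists of file stems."""
    image_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(images_folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    }
    label_stems = {
        os.path.splitext(name)[0]
        for name in os.listdir(labels_folder)
        if name.endswith(".txt")
    }
    # Set differences give the stems that appear on one side only
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling `find_unpaired(images_folder, labels_folder)` with the folders defined above should return two empty lists when every image has a label file and vice versa.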

**Visualizing a Sample Image and Its Labels**

With our dataset ready, we take a closer look at one of the training examples:

We load the image using OpenCV and convert its color format from BGR to RGB for correct visualization with Matplotlib.

The image is then displayed with a title, giving us a visual sense of what the model will learn from.

We also open the corresponding YOLOv8 label file and print its first two lines to see how segmentation data is structured: each line holds a class ID followed by normalized polygon coordinates.

In [None]:
# Load image and show
img = cv2.imread(image_path)
if img is None:
    raise ValueError(f"Could not load image: {image_path}")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)
plt.title("Sample Cityscapes Image")
plt.axis("off")
plt.show()

# Print first two lines of label
with open(label_path, "r") as f:
    lines = f.readlines()
    print("Sample YOLOv8 segmentation labels:")
    print(lines[:2])
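
To make the label format concrete, here is a small sketch that parses one label line into a class ID and a list of pixel coordinates. It assumes the layout described above (class ID, then alternating normalized x and y values); `parse_segmentation_line` is a hypothetical helper name, not an Ultralytics function.

```python
def parse_segmentation_line(line, img_width, img_height):
    """Parse 'cls x1 y1 x2 y2 ...' into (class_id, [(x_px, y_px), ...])."""
    parts = line.split()
    # A polygon needs at least 3 points, i.e. 6 coordinates after the class ID
    if len(parts) < 7 or (len(parts) - 1) % 2 != 0:
        raise ValueError(f"Malformed segmentation label line: {line!r}")
    class_id = int(parts[0])
    coords = [float(value) for value in parts[1:]]
    # Even positions are x values, odd positions are y values; scale both to pixels
    points = [
        (x * img_width, y * img_height)
        for x, y in zip(coords[0::2], coords[1::2])
    ]
    return class_id, points
```

With the image loaded above, `parse_segmentation_line(lines[0], img.shape[1], img.shape[0])` would give the polygon of the first annotation in pixel units.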

**Preparing and Customizing the Dataset Configuration**

Before training our model, we configure the data.yaml file that tells YOLO how to interpret our dataset:

Step 1: We create a backup of the original data.yaml (if it exists and hasn’t been backed up yet), ensuring we can revert changes if needed.

Step 2: We read and print the contents of the original configuration from the backup to review its structure.

Step 3: We define a new data.yaml file that specifies:

The paths to the training and validation image directories.

The number of object classes (nc: 3).

The names of the classes: "bicycle", "car", and "motorcycle".

In [None]:
import shutil

dataset_path = "/content/Cityscapes-1"
yaml_path = f"{dataset_path}/data.yaml"
backup_yaml_path = f"{dataset_path}/original_data.yaml"

# STEP 1: Backup original if it exists and not already backed up
if os.path.exists(yaml_path) and not os.path.exists(backup_yaml_path):
    shutil.copy(yaml_path, backup_yaml_path)
    print(f"📦 Backed up original data.yaml to {backup_yaml_path}")

# STEP 2: Read original (or backup if exists)
if os.path.exists(backup_yaml_path):
    with open(backup_yaml_path, "r") as f:
        original_yaml = f.read()
    print("📄 Original data.yaml (from backup):\n", original_yaml)
else:
    print("⚠️ No original data.yaml backup found.")

# STEP 3: Define and write new data.yaml
new_yaml = f"""
train: {dataset_path}/train/images
val: {dataset_path}/valid/images

nc: 3
names: ["bicycle", "car", "motorcycle"]
"""

with open(yaml_path, "w") as f:
    f.write(new_yaml.strip())

print("\n✅ New data.yaml written:\n", new_yaml.strip())
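
A quick sanity check on the new configuration can catch mismatches before training starts. The sketch below assumes the file has already been loaded into a dictionary, for example with `yaml.safe_load`; `check_dataset_config` is a hypothetical helper, and the two checks it performs are only the ones implied by the steps above.

```python
import os


def check_dataset_config(cfg):
    """Return a list of human-readable problems found in a data.yaml-style dict."""
    problems = []
    names = cfg.get("names", [])
    # The declared class count must match the number of class names
    if cfg.get("nc") != len(names):
        problems.append(f"nc is {cfg.get('nc')} but names has {len(names)} entries")
    # Both image directories must exist on disk
    for split in ("train", "val"):
        path = cfg.get(split)
        if not path or not os.path.isdir(path):
            problems.append(f"{split} path is missing or not a directory: {path}")
    return problems
```

An empty list means both checks passed; anything else is worth fixing before calling `model.train`.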

**Remapping Class Labels for YOLO**

In this step, we ensure our label files align with the updated class definitions in data.yaml by remapping class IDs:

We define a remap dictionary that translates the original class IDs (1, 2, 3) to the new IDs (0, 1, 2) used in training.

We target the labels folders in both the train and validation datasets.

For each label file:

We read its content and process each line.

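If the line starts with a class ID listed in the remap dictionary and is followed by an even number of coordinate values, we replace the class ID with its mapped value. Lines with any other class ID, or with a malformed coordinate list, are dropped.

The updated lines are written back to the file.

This keeps the label files consistent with the class names defined in data.yaml. Because the files are rewritten in place, run this cell only once: after the first run the IDs are already 0, 1, and 2, so a second run would drop every class-0 annotation and shift the remaining IDs down again.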

In [None]:
import os

# Class remapping: original_id -> new_id
remap = {1: 0, 2: 1, 3: 2}

# Folder paths
label_dirs = [
    "/content/Cityscapes-1/train/labels",
    "/content/Cityscapes-1/valid/labels"
]

for label_dir in label_dirs:
    for filename in os.listdir(label_dir):
        if not filename.endswith(".txt"):
            continue

        path = os.path.join(label_dir, filename)
        with open(path, "r") as f:
            lines = f.readlines()

        new_lines = []
        for line in lines:
            parts = line.strip().split()
            if not parts or len(parts) < 3:
                continue
            try:
                cls = int(parts[0])
                if cls in remap and (len(parts) - 1) % 2 == 0:
                    new_cls = remap[cls]
                    new_line = " ".join([str(new_cls)] + parts[1:]) + "\n"
                    new_lines.append(new_line)
            except ValueError:
                continue

        with open(path, "w") as f:
            f.writelines(new_lines)
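
Since the remapping rewrites files in place, it helps to look at which class IDs a labels folder actually contains, both before and after running it. The following is a minimal sketch under that assumption; `count_class_ids` is a hypothetical helper that only reads the files and keeps the leading token of each line as a string.

```python
import os
from collections import Counter


def count_class_ids(label_dir):
    """Count how often each leading class token appears across all .txt files in label_dir."""
    counts = Counter()
    for filename in os.listdir(label_dir):
        if not filename.endswith(".txt"):
            continue
        with open(os.path.join(label_dir, filename), "r") as f:
            for line in f:
                parts = line.split()
                if parts:  # skip blank lines
                    counts[parts[0]] += 1
    return counts
```

After the remapping cell has run once, the keys returned for each folder in `label_dirs` should be a subset of "0", "1", and "2".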

**Training the YOLOv8 Segmentation Model**

With everything in place, we launch the training process using Ultralytics' YOLOv8 segmentation model:

We load a pretrained YOLOv8 model (yolov8m-seg.pt) as the starting point, benefiting from its pre-learned features.

The training is configured using our customized data.yaml file, which defines the dataset structure and classes.

In [None]:
from ultralytics import YOLO

# Load the pretrained segmentation model
model = YOLO("yolov8m-seg.pt")  # Larger variants such as yolov8l-seg.pt or yolov8x-seg.pt may be more accurate but train more slowly


model.train(
    data="/content/Cityscapes-1/data.yaml",
    epochs=50,
    imgsz=640,
    batch=16,        # number of images per batch (increase if your GPU has enough memory)
    device=0,        # GPU device index; 0 means first GPU; use 'cpu' if no GPU available
    workers=4        # number of CPU workers for loading data (helps speed up data loading)
)
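
Ultralytics writes each training run to its own directory (train, train2, and so on), so the weights from the latest run are not always under the first folder. As a hedged sketch, the hypothetical helper `latest_best_weights` below picks the most recently modified best.pt; the default root `/content/runs/segment` is an assumption based on the paths used elsewhere in this notebook.

```python
import glob
import os


def latest_best_weights(runs_root="/content/runs/segment"):
    """Return the path of the most recently modified best.pt under runs_root."""
    candidates = glob.glob(os.path.join(runs_root, "*", "weights", "best.pt"))
    if not candidates:
        raise FileNotFoundError(f"No best.pt found under {runs_root}")
    # Pick the newest file by modification time
    return max(candidates, key=os.path.getmtime)
```

The returned path can be passed directly to `YOLO(...)` when loading the trained model for inference.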

In [None]:
from google.colab import drive
drive.mount('/content/drive')

**Displaying a Test Video in the Notebook**

After training, we want to test the model on video input. This cell previews the input video inline before we run inference on it:

We specify the path to a video file stored in Google Drive.

The video is read in binary mode, encoded in Base64, and embedded directly into the notebook using HTML.

This allows the video to be played back inline, without needing to download or open it externally.
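
Base64 embedding copies the whole file into the notebook output, so large videos can make the notebook slow or unresponsive. A minimal sketch of the encoding step with a size guard is shown below; `video_data_url` is a hypothetical helper and the 50 MB default is an arbitrary threshold chosen for illustration.

```python
import os
from base64 import b64encode


def video_data_url(path, max_mb=50):
    """Return a base64 data URL for an MP4 file, refusing files larger than max_mb."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(f"{path} is {size_mb:.1f} MB, above the {max_mb} MB limit")
    with open(path, "rb") as f:
        encoded = b64encode(f.read()).decode("ascii")
    return "data:video/mp4;base64," + encoded
```

The returned string can be used as the `src` attribute of a `<source>` tag in the HTML snippet.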

In [None]:
from IPython.display import HTML
from base64 import b64encode

# Update the filename if it's different
video_path = "/content/drive/MyDrive/Prof data/video_tst.mp4"

# Encode video to base64
with open(video_path, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

# Display video
HTML(f"""
<video width=640 controls>
    <source src="{data_url}" type="video/mp4">
</video>
""")

**Running Inference on a Video with the Trained YOLOv8 Model**

Now that our model is trained, we use it to detect and segment objects in a video:

Load the trained model from the best weights saved during training.

Specify the input video to test the model’s performance in real-world scenarios.

Run predictions using the YOLO model, saving the output video with segmentation overlays.

Locate the output .avi file generated by YOLO.

Convert the video to .mp4 using ffmpeg for browser compatibility and smoother playback.

Read and Base64 encode the .mp4 file.

Display the result inline using HTML so we can instantly watch how well our model performs.
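
The conversion step can also be run from Python, which makes a failed conversion raise an error instead of passing silently. The sketch below builds the same ffmpeg arguments used in this notebook and runs them with `subprocess`; `build_ffmpeg_command` and `convert_to_mp4` are hypothetical helper names.

```python
import shutil
import subprocess


def build_ffmpeg_command(src_path, dst_path, crf=23):
    """Build the ffmpeg argument list for re-encoding a video to H.264."""
    return ["ffmpeg", "-y", "-i", src_path, "-vcodec", "libx264", "-crf", str(crf), dst_path]


def convert_to_mp4(src_path, dst_path, crf=23):
    """Run ffmpeg and raise if it is missing or exits with a non-zero status."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_command(src_path, dst_path, crf), check=True)
    return dst_path
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces, such as the Google Drive folder used here.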

In [None]:
from ultralytics import YOLO
from IPython.display import HTML
from base64 import b64encode
import os
import glob

# Step 1: Load your trained YOLO model
model = YOLO("/content/runs/segment/train/weights/best.pt")

# Step 2: Path to the input video
input_video = "/content/drive/MyDrive/Prof data/video_tst.mp4"

# Step 3: Run prediction and save results
results = model.predict(source=input_video, save=True, conf=0.4)

# Step 4: Find the saved .avi video in the output directory
output_dir = results[0].save_dir  # e.g. 'runs/segment/predict3'
avi_files = glob.glob(os.path.join(output_dir, "*.avi"))
if not avi_files:
    raise FileNotFoundError("No AVI file found in output directory!")

avi_path = avi_files[0]
mp4_path = os.path.splitext(avi_path)[0] + ".mp4"

# Step 5: Convert .avi to .mp4 using ffmpeg
!ffmpeg -y -i "$avi_path" -vcodec libx264 -crf 23 "$mp4_path"

# Step 6: Read and encode the .mp4 video for inline display
with open(mp4_path, 'rb') as f:
    video_data = f.read()
data_url = "data:video/mp4;base64," + b64encode(video_data).decode()

# Step 7: Display video in notebook
HTML(f"""
<video width=640 controls>
    <source src="{data_url}" type="video/mp4">
    Your browser does not support the video tag.
</video>
""")