**Name: Jessi King**
**NetID: xxz230009**


**Verify NVIDIA GPU Availability**

Make sure you're using a GPU-equipped machine by going to "Runtime" -> "Change runtime type" in the top menu bar, and then selecting one of the GPU options in the Hardware accelerator section. Click Play on the following code block to verify that the NVIDIA GPU is present and ready for training.

In [None]:
!nvidia-smi

Sun Apr  6 01:38:43 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   37C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
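The same check can be run from Python, which is handy when a later cell should stop early if no GPU is attached. This is a minimal sketch using only the standard library; the helper name `gpu_available` is hypothetical and not part of the original notebook.

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Return True if the nvidia-smi tool exists and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver tools are not on PATH, so no usable NVIDIA GPU
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

print("GPU ready" if gpu_available() else "No GPU detected; check the runtime type")
```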
                                                

# 1.&nbsp;Gather and Label Training Images

Before we start training, we need to gather and label images that will be used for training the object detection model. A good starting point for a proof-of-concept model is 200 images. The training images should have random objects in the image along with the desired objects, and should have a variety of backgrounds and lighting conditions.

There are a couple options for gathering images:


*   Build a custom dataset by taking your own pictures of the objects and labeling them (this typically results in the best performance)
*   Find a pre-made dataset from sources like [Roboflow Universe](), [Kaggle](), or [Google's Open Images V7]()


If you want to build your own dataset, there are several tools available for labeling images. One good option is [Label Studio](https://labelstud.io/?utm_source=youtube&utm_medium=video&utm_campaign=edjeelectronics), a free and open-source labeling tool that has a simple workflow while providing capabilities for more advanced features. My YouTube video that walks through this notebook (link to be added soon) shows how to label images with Label Studio.


If you used Label Studio to label and export the images, they'll be exported in a `project.zip` file that contains the following:

- An `images` folder containing the images
- A `labels` folder containing the labels in YOLO annotation format
- A `classes.txt` labelmap file that contains all the classes
- A `notes.json` file that contains info specific to Label Studio (this file can be ignored)
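For reference, each line in a YOLO label file describes one object as `class_id x_center y_center width height`, with the four box values normalized to the range 0 to 1 relative to the image size. The sketch below converts one such line to pixel coordinates; the function name and the sample line are made up for illustration.

```python
def parse_yolo_line(line: str, img_w: int, img_h: int):
    """Convert one YOLO label line to (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized center and size back to pixel units
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

# Hypothetical label: class 5, box centered in a 640x480 image
print(parse_yolo_line("5 0.5 0.5 0.25 0.5", 640, 480))
# prints (5, 240.0, 120.0, 400.0, 360.0)
```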

If you obtained your dataset from another source (like Roboflow Universe) or used another tool to label your dataset, make sure the files are organized in the same folder structure.

<p align=center>
<img src="https://raw.githubusercontent.com/EdjeElectronics/Train-and-Deploy-YOLO-Models/refs/heads/main/doc/zipped-data-example.png" height=""><br>
<i>Organize your data in the folders shown here. See my <a href="https://s3.us-west-1.amazonaws.com/evanjuras.com/resources/candy_data_06JAN25.zip">Candy Detection Dataset</a> for an example.</i>
</p>

Once you've got your dataset built, put into the file structure shown above, and zipped into `data.zip`, you're ready to move on to the next step.
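Before zipping, it can save time to confirm that the folder really matches the structure described above. The following is a minimal sketch, assuming the `images` / `labels` / `classes.txt` layout from this section; `check_dataset_layout` is a hypothetical helper, and the extension list simply mirrors the one used later in this notebook.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png")

def check_dataset_layout(root: str) -> list:
    """Return a list of human-readable problems found under `root`."""
    problems = []
    images_dir = os.path.join(root, "images")
    labels_dir = os.path.join(root, "labels")
    for required in (images_dir, labels_dir):
        if not os.path.isdir(required):
            problems.append(f"missing folder: {required}")
    if not os.path.isfile(os.path.join(root, "classes.txt")):
        problems.append("missing classes.txt")
    if problems:
        return problems  # no point pairing files if the layout itself is wrong
    for name in sorted(os.listdir(images_dir)):
        if not name.lower().endswith(IMAGE_EXTS):
            continue
        label = os.path.splitext(name)[0] + ".txt"
        if not os.path.exists(os.path.join(labels_dir, label)):
            problems.append(f"no label file for image: {name}")
    return problems
```

An empty list means the folder is ready to zip. A "no label file" entry is worth a second look, although it may be intentional for images that contain none of the target objects.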

# 2.&nbsp;Upload Image Dataset and Prepare Training Data

Next, we'll upload our dataset and prepare it for training with YOLO. We'll split the dataset into train and validation folders, and we'll automatically generate the configuration file for training the model.

## 2.1 Upload images

First, we need to upload the dataset to Colab. The option below moves the `data.zip` file into this Colab instance.


**Copy from Google Drive**

You can upload your images to your personal Google Drive, mount the drive in this Colab session, and copy them over to the Colab filesystem. This option works well if you want to upload the images beforehand so you don't have to wait for them to upload each time you restart this Colab. If you have more than 50MB worth of images, I recommend using this option.

First, upload the `data.zip` file to your Google Drive, and make note of the folder you uploaded it to. In the `cp` command, the source should be the path to your zip file under `/content/gdrive`, for example `/content/gdrive/MyDrive/path/to/data.zip`. (For example, I uploaded the zip file to a folder called "candy-dataset1", so I would use `MyDrive/candy-dataset1/data.zip` for the path.) Then, run the following block of code to mount your Google Drive to this Colab session and copy the zip file to this filesystem.

In [None]:
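from google.colab import drive
drive.mount('/content/gdrive')

# Copy the dataset zip into /content. The source should be a path under the mounted
# drive (for example /content/gdrive/MyDrive/path/to/data.zip). In this run the zip was
# already in /content, so the command below copies the file onto itself and only
# produces the "same file" message shown in the output.
!cp /content/custom_data.zip /content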

Mounted at /content/gdrive
cp: '/content/custom_data.zip' and '/content/custom_data.zip' are the same file
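The "same file" message above is what `cp` reports when the source and destination resolve to the same path. A minimal Python sketch of the copy step that checks for this case and for a missing source file is shown here; `copy_from_drive` is a hypothetical helper, and the path in the example call is a placeholder.

```python
import os
import shutil

def copy_from_drive(src_path: str, dest_dir: str = "/content") -> str:
    """Copy src_path into dest_dir and return the destination path."""
    if not os.path.isfile(src_path):
        raise FileNotFoundError(f"zip file not found: {src_path}")
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    # Skip the copy when the file is already where it needs to be
    if os.path.abspath(src_path) == os.path.abspath(dest_path):
        print(f"{src_path} is already in {dest_dir}; nothing to copy")
        return dest_path
    shutil.copy(src_path, dest_path)
    return dest_path

# Example call (placeholder path, to be replaced with your own):
# copy_from_drive("/content/gdrive/MyDrive/path/to/data.zip")
```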


## 2.2 Split images into train and validation folders

At this point, you should be able to click the folder icon on the left and see your zip file in the list of files (`custom_data.zip` in this notebook, which is the name the code below expects). Next, we'll unzip it and create some folders to hold the images. Run the following code block to unzip the data.

In [None]:
# Unzip images to a custom data folder
!unzip -q /content/custom_data.zip -d /content/custom_data
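If the zip wraps the dataset in a top-level folder, the extracted files land one level deeper than the extraction directory, which is why the split code uses `/content/custom_data/custom_data` as its base directory. The sketch below does the same extraction with the standard library and then searches for the folder that holds `images`; `unzip_and_find_root` is a hypothetical helper, not part of the original notebook.

```python
import os
import zipfile

def unzip_and_find_root(zip_path: str, extract_dir: str) -> str:
    """Extract zip_path and return the folder that directly contains `images`."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(extract_dir)
    # Walk top-down so the shallowest matching folder is returned first
    for current, dirs, _files in os.walk(extract_dir):
        if "images" in dirs:
            return current
    raise FileNotFoundError(f"no 'images' folder found under {extract_dir}")

# Example call using the paths from this notebook:
# data_dir = unzip_and_find_root("/content/custom_data.zip", "/content/custom_data")
```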

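Next, split the data: randomly assign 90% of the images to the train folder (`/content/train`) and the remaining 10% to the validation folder (`/content/val`). The files are copied rather than moved, so the extracted dataset is left unchanged.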

In [None]:
import os
import random
import shutil

# Set the base data directory
# Folder and file names follow the dataset layout described in Section 1 ('labels', 'classes.txt')
data_dir = '/content/custom_data/custom_data'
images_dir = os.path.join(data_dir, 'images')
labels_dir = os.path.join(data_dir, 'labels')
class_file = os.path.join(data_dir, 'classes.txt')

# List all image files (filtering by common image extensions)
image_files = sorted(os.listdir(images_dir))
image_files = [f for f in image_files if f.lower().endswith(('.jpg', '.jpeg', '.png'))]

# Randomly shuffle the list of images
random.shuffle(image_files)

# Split images into 90% train and 10% validation
split_index = int(len(image_files) * 0.9)
train_images = image_files[:split_index]
val_images = image_files[split_index:]

# Define output directories for train and validation sets
train_dir = '/content/train'
val_dir = '/content/val'

train_images_dir = os.path.join(train_dir, 'images')
train_labels_dir = os.path.join(train_dir, 'labels')  # Ultralytics looks for a 'labels' folder next to 'images'
val_images_dir = os.path.join(val_dir, 'images')
val_labels_dir = os.path.join(val_dir, 'labels')

# Create directories if they don't exist
os.makedirs(train_images_dir, exist_ok=True)
os.makedirs(train_labels_dir, exist_ok=True)
os.makedirs(val_images_dir, exist_ok=True)
os.makedirs(val_labels_dir, exist_ok=True)

# Function to copy images and corresponding label files
def copy_files(image_list, dest_images_dir, dest_labels_dir):
    for image in image_list:
        # Copy image file
        src_image_path = os.path.join(images_dir, image)
        dst_image_path = os.path.join(dest_images_dir, image)
        shutil.copy(src_image_path, dst_image_path)

        # Determine corresponding label file (assumes label file has same basename with .txt)
        label_file = os.path.splitext(image)[0] + '.txt'
        src_label_path = os.path.join(labels_dir, label_file)
        if os.path.exists(src_label_path):
            dst_label_path = os.path.join(dest_labels_dir, label_file)
            shutil.copy(src_label_path, dst_label_path)

# Copy training files
copy_files(train_images, train_images_dir, train_labels_dir)
# Copy validation files
copy_files(val_images, val_images_dir, val_labels_dir)



print("Data split complete!")
print(f"Training set: {len(train_images)} images")
print(f"Validation set: {len(val_images)} images")


Data split complete!
Training set: 106 images
Validation set: 12 images
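Because the shuffle is unseeded, a different set of images lands in the validation folder on every run. If a repeatable split is needed, one option is to shuffle with a seeded local random generator. This is a minimal sketch: the function name, the seed value, and the placeholder file names are all assumptions made for illustration.

```python
import random

def split_train_val(items, train_fraction=0.9, seed=42):
    """Return (train, val) lists; the same seed always gives the same split."""
    shuffled = sorted(items)               # fixed starting order
    random.Random(seed).shuffle(shuffled)  # local RNG, global random state untouched
    split_index = int(len(shuffled) * train_fraction)
    return shuffled[:split_index], shuffled[split_index:]

# 118 placeholder names reproduce the 106 / 12 counts printed above
names = [f"img_{i:03d}.png" for i in range(118)]
train, val = split_train_val(names)
print(len(train), len(val))  # prints 106 12
```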


Zip the split data so it can be downloaded to a local machine.

In [None]:
!zip -r /content/train.zip /content/train
!zip -r /content/val.zip /content/val


  adding: content/train/ (stored 0%)
  adding: content/train/label/ (stored 0%)
  adding: content/train/images/ (stored 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 8.05.21 PM.png (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 8.52.24 PM.png (deflated 0%)
  adding: content/train/images/Nesting-birds-750x465.png (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 8.16.07 PM.png (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 6.50.01 PM.png (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 6.52.27 PM.png (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 8.50.41 PM.png (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 7.18.35 PM.png (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-05 at 7.17.06 PM.png (deflated 0%)
  adding: content/train/images/ATT-osprey-rescue-696x369.jpg (deflated 0%)
  adding: content/train/images/Screenshot 2025-04-
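Before downloading, it can help to confirm that each archive holds label files as well as images. The sketch below counts both by file extension; `count_zip_contents` is a hypothetical helper. A label count of zero would suggest the label files were not copied during the split.

```python
import os
import zipfile

def count_zip_contents(zip_path: str) -> dict:
    """Count image files and .txt label files inside a zip archive."""
    counts = {"images": 0, "labels": 0}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            ext = os.path.splitext(name)[1].lower()
            if ext in (".jpg", ".jpeg", ".png"):
                counts["images"] += 1
            elif ext == ".txt":
                # Any .txt file is counted as a label, which holds for the
                # train/val archives built above (they contain no classes.txt)
                counts["labels"] += 1
    return counts

# Example call using the archive created above:
# print(count_zip_contents("/content/train.zip"))
```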

In [None]:
from google.colab import files

files.download("/content/train.zip")
files.download("/content/val.zip")



There's one last step before we can run training: we need to create the Ultralytics training configuration YAML file. This file specifies the location of your train and validation data, and it also defines the model's classes. An example configuration file is available [here](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/coco128.yaml).

Run the code block below to automatically generate a `data.yaml` configuration file. Make sure you have a labelmap file at `/content/custom_data/custom_data/classes.txt`, which is the path the code below reads. If you used Label Studio or one of my pre-made datasets, it should already be present. If you assembled the dataset another way, you may have to manually create the `classes.txt` file (see [here](https://github.com/EdjeElectronics/Train-and-Deploy-YOLO-Models/blob/main/doc/classes.txt) for an example of how it's formatted).

In [None]:
# Python function to automatically create data.yaml config file
# 1. Reads "classes.txt" file to get list of class names
# 2. Creates data dictionary with correct paths to folders, number of classes, and names of classes
# 3. Writes data in YAML format to data.yaml

import yaml
import os

def create_data_yaml(path_to_classes_txt, path_to_data_yaml):

  # Read classes.txt to get class names
  if not os.path.exists(path_to_classes_txt):
    print(f'classes.txt file not found! Please create a classes.txt labelmap and move it to {path_to_classes_txt}')
    return
  with open(path_to_classes_txt, 'r') as f:
    classes = []
    for line in f.readlines():
      if len(line.strip()) == 0: continue
      classes.append(line.strip())
  number_of_classes = len(classes)

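  # Create data dictionary
  # NOTE: 'path', 'train', and 'val' must point at the folders that actually hold the
  # split data. The split cell above writes to /content/train and /content/val, so
  # adjust these values (or move the folders) if your layout differs.
  data = {
      'path': '/content/data',
      'train': 'train/images',
      'val': 'validation/images',
      'nc': number_of_classes,
      'names': classes
  }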

  # Write data to YAML file
  with open(path_to_data_yaml, 'w') as f:
    yaml.dump(data, f, sort_keys=False)
  print(f'Created config file at {path_to_data_yaml}')

  return

# Define path to classes.txt and run function
path_to_classes_txt = '/content/custom_data/custom_data/classes.txt'
path_to_data_yaml = '/content/data.yaml'

create_data_yaml(path_to_classes_txt, path_to_data_yaml)

print('\nFile contents:\n')
!cat /content/data.yaml

Created config file at /content/data.yaml

File contents:

path: /content/data
train: train/images
val: validation/images
nc: 12
names:
- GSM Antenna
- Microwave Antenna
- Antenna
- Lattice
- M Type Tower
- Birdnest
- Rust
- Guyed
- Fire
- Snow
- Person
- Tower Breakage
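Since the `train` and `val` entries are resolved relative to `path`, a quick sanity check is to confirm that both folders exist and that `nc` matches the number of names before starting training. The sketch below works on a plain dictionary with the same keys as `data.yaml`; `check_data_config` is a hypothetical helper, and the `names` list in the example call is shortened for brevity.

```python
import os

def check_data_config(cfg: dict) -> list:
    """Return a list of problems found in a dataset config dictionary."""
    problems = []
    if cfg.get("nc") != len(cfg.get("names", [])):
        problems.append("nc does not match the number of names")
    for key in ("train", "val"):
        # train/val are joined onto the dataset root given by 'path'
        folder = os.path.join(cfg.get("path", ""), cfg.get(key, ""))
        if not os.path.isdir(folder):
            problems.append(f"{key} folder not found: {folder}")
    return problems

# Example call with the paths printed above (names list shortened here)
cfg = {"path": "/content/data", "train": "train/images",
       "val": "validation/images", "nc": 2, "names": ["GSM Antenna", "Birdnest"]}
for problem in check_data_config(cfg):
    print(problem)
```

Any line printed here points to a folder or count that should be fixed before training is started.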
