## Training for Tennis Ball Camera Model
If you need to download the required dataset, run the first section. WARNING: the downloaded files are large (2.24GB)!

#### Download Dataset
From: https://storage.googleapis.com/openimages/web/index.html 

Download instructions: https://storage.googleapis.com/openimages/web/download_v7.html#download-manually

FiftyOne: https://docs.voxel51.com/user_guide/dataset_zoo/index.html

Visualisation: https://storage.googleapis.com/openimages/web/visualizer/index.html?type=detection&set=train&c=%2Fm%2F05ctyq


In [1]:
%pip install ultralytics
%pip install fiftyone

Note: you may need to restart the kernel to use updated packages.


In [2]:
import warnings

import fiftyone as fo
import fiftyone.zoo as foz

# Download up to 1000 samples per split that contain at least one "Tennis ball" detection.
for split in 'train', 'test', 'validation':
    train = split == 'train'

    # Note: only_matching defaults to False, so every detection label on the
    # matching images is loaded, not just "Tennis ball".
    dataset = foz.load_zoo_dataset("open-images-v7", split=split, label_types=["detections"],
                                   classes=["Tennis ball"],
                                   dataset_dir="./camera_data",
                                   max_samples=1000, shuffle=True)

    # Take the class list from the training split so that every split is exported with the same class indices.
    if train:
        classes = dataset.distinct('ground_truth.detections.label')  # only the labels observed in the dataset

    # Export the dataset to the YOLOv5 format for easy training and configuration using the Ultralytics API.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", category=UserWarning, module="fiftyone.utils.yolo")
        dataset.export(export_dir="./camera_data/YOLO_Dataset",
                       dataset_type=fo.types.YOLOv5Dataset,
                       label_field='ground_truth',
                       split='val' if split == 'validation' else split,
                       classes=classes,
                       overwrite=train)  # start clean on the first (train) split, then merge the others

Downloading split 'train' to './camera_data\train' if necessary
Only found 299 (<1000) samples matching your requirements
Necessary images already downloaded
Existing download of split 'train' is sufficient
Loading existing dataset 'open-images-v7-train-1000'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
 100% |█████████████████| 299/299 [5.9s elapsed, 0s remaining, 62.1 samples/s]      
Downloading split 'test' to './camera_data\test' if necessary
Only found 45 (<1000) samples matching your requirements
Necessary images already downloaded
Existing download of split 'test' is sufficient
Loading existing dataset 'open-images-v7-test-1000'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use
Directory './camera_data/YOLO_Dataset' already exists; export will be merged with existing files
 100% |███████████████████| 45/45 [871.5ms elapsed, 0s remaining, 51.6 samples/s]      
Downloading split

- The train/test/validation annotation CSVs need to be condensed to tennis-ball rows only, by filtering on the LabelName "/m/05ctyq". Each row gives the minimum and maximum x and y values needed to draw the bounding box; the remaining columns can be ignored.
- Images and other data are saved here: C:\Users\mvsue\fiftyone\open-images-v7
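
The filtering described in the first note could be sketched as follows. This is a minimal sketch, assuming the standard Open Images box-annotation column names (`ImageID`, `LabelName`, `XMin`, `XMax`, `YMin`, `YMax`) with coordinates normalised to [0, 1]. The function name `tennis_ball_boxes`, the label `/m/other` and the sample rows are made up for illustration.

```python
import csv
import io

TENNIS_BALL = "/m/05ctyq"  # Open Images LabelName for "Tennis ball"


def tennis_ball_boxes(csv_file, label=TENNIS_BALL):
    """Yield (image_id, xmin, ymin, xmax, ymax) for rows matching `label`.

    Assumes Open Images style column names; all other columns are ignored.
    """
    for row in csv.DictReader(csv_file):
        if row["LabelName"] == label:
            yield (
                row["ImageID"],
                float(row["XMin"]),
                float(row["YMin"]),
                float(row["XMax"]),
                float(row["YMax"]),
            )


# Made-up sample rows, only to show the expected shape of the data.
sample = io.StringIO(
    "ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax\n"
    "img_a,xclick,/m/05ctyq,1,0.25,0.50,0.40,0.60\n"
    "img_a,xclick,/m/other,1,0.10,0.90,0.05,0.95\n"
)
boxes = list(tennis_ball_boxes(sample))
print(boxes)
```

For a real annotation file, the same function can be given a file handle opened with `open(path, newline="")` in place of the in-memory sample.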

#### Training Code

Ultralytics and Open Images V7: https://docs.ultralytics.com/datasets/detect/open-images-v7/#citations-and-acknowledgments

Training: https://docs.ultralytics.com/modes/train/#key-features-of-train-mode

In [3]:
# Load a pretrained YOLOv8n detection model (yolov8n-oiv7 is a detection model, not a segmentation model).
# TODO: may need a softmax and possibly other layers on top for two classes: [tennis ball, not tennis ball].
# TODO: the desired output is a pixel-segmented image with binary tennis ball locations and depth.
# Open question: how would class output and depth be obtained from a softmax?
# Depth probably does not need to be predicted, since the depth camera already provides it.
from ultralytics import YOLO

# Load the model pretrained on the Open Images V7 dataset (downloaded to this path if it is missing).
model = YOLO('./camera_data/YOLO_Dataset/yolov8n-oiv7.pt')

# Train on the exported dataset via the dataset.yaml created in the previous section.
# imgsz is the size to which images are resized during training, given here as a single integer (the target side length in pixels).
# TODO: increase imgsz and make it match the pybullet simulation.
results = model.train(data='./camera_data/YOLO_Dataset/dataset.yaml', epochs=200, imgsz=640)

Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8n-oiv7.pt to 'camera_data\YOLO_Dataset\yolov8n-oiv7.pt'...


100%|██████████| 6.87M/6.87M [00:04<00:00, 1.48MB/s]


New https://pypi.org/project/ultralytics/8.2.11 available  Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.2.10  Python-3.8.19 torch-2.3.0+cpu CPU (AMD Ryzen 7 2700X Eight-Core Processor)
engine\trainer: task=detect, mode=train, model=./camera_data/YOLO_Dataset/yolov8n-oiv7.pt, data=./camera_data/YOLO_Dataset/dataset.yaml, epochs=200, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train4, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=

train: Scanning C:\Users\mvsue\OneDrive - UTS\UTS\Ai in Robotics\AI-In-Robotics-Project\camera_data\YOLO_Dataset\labels\train... 299 images, 0 backgrounds, 0 corrupt: 100%|██████████| 299/299 [00:00<00:00, 652.84it/s]
train: New cache created: C:\Users\mvsue\OneDrive - UTS\UTS\Ai in Robotics\AI-In-Robotics-Project\camera_data\YOLO_Dataset\labels\train.cache

val: Scanning C:\Users\mvsue\OneDrive - UTS\UTS\Ai in Robotics\AI-In-Robotics-Project\camera_data\YOLO_Dataset\labels\val... 11 images, 0 backgrounds, 0 corrupt: 100%|██████████| 11/11 [00:00<00:00, 314.24it/s]
val: New cache created: C:\Users\mvsue\OneDrive - UTS\UTS\Ai in Robotics\AI-In-Robotics-Project\camera_data\YOLO_Dataset\labels\val.cache

Plotting labels to runs\detect\train4\labels.jpg... 
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
optimizer: AdamW(lr=0.000204, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
TensorBoard: model graph visualization added 
Image sizes 640 train, 640 val
Using 0 dataloader workers
Logging results to runs\detect\train4
Starting training for 200 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)

      1/200         0G      1.307      4.944       1.32         79        640: 100%|██████████| 19/19 [02:14<00:00,  7.09s/it]
                   all         11         97          0          0          0          0
      2/200         0G      1.291      4.727      1.288         64        640: 100%|██████████| 19/19 [02:16<00:00,  7.21s/it]
                   all         11         97          0          0          0          0
      3/200         0G       1.32      4.218       1.29         96        640: 100%|██████████| 19/19 [02:17<00:00,  7.25s/it]
                   all         11         97     0.0532     0.0884     0.0677     0.0451
      4/200         0G      1.342      3.675      1.284        105        640: 100%|██████████| 19/19 [02:08<00:00,  6.78s/it]
                   all         11         97      0.624       0.11      0.156      0.117
      5/200         0G      1.304      3.188      1.276         57        640: 100%|██████████| 19/19 [02:19<00:00,  7.34s/it]
                   all         11         97     0.0909      0.331      0.221      0.164
      6/200         0G      1.317      2.926      1.279         74        640: 100%|██████████| 19/19 [01:59<00:00,  6.31s/it]
                   all         11         97      0.103       0.39      0.236      0.157
      7/200         0G      1.296      2.859      1.287         72        640: 100%|██████████| 19/19 [02:02<00:00,  6.45s/it]
                   all         11         97      0.771      0.178      0.271      0.166
      8/200         0G       1.22       2.71      1.269         84        640: 100%|██████████| 19/19 [02:07<00:00,  6.72s/it]
                   all         11         97      0.669      0.201      0.282      0.183
      9/200         0G      1.197      2.554      1.262         63        640: 100%|██████████| 19/19 [02:08<00:00,  6.74s/it]
                   all         11         97      0.655      0.229      0.292      0.202
     10/200         0G      1.268      2.467      1.243        119        640: 100%|██████████| 19/19 [02:14<00:00,  7.07s/it]
                   all         11         97      0.608      0.215      0.307      0.221
     11/200         0G      1.208      2.325      1.239         66        640: 100%|██████████| 19/19 [02:13<00:00,  7.00s/it]
                   all         11         97       0.55      0.229      0.315       0.22
     12/200         0G      1.235      2.322      1.225        144        640: 100%|██████████| 19/19 [02:12<00:00,  6.96s/it]
                   all         11         97      0.499      0.227      0.328      0.209
     13/200         0G      1.252      2.226      1.228        132        640:  42%|████▏     | 8/19 [00:55<01:21,  7.37s/it]
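
As a rough sanity check on the log above, the iterations per epoch and the total CPU training time can be estimated from the figures it reports: 299 training images, batch=16, roughly 7 s per iteration and 200 epochs. This is only back-of-the-envelope arithmetic. The 7 s figure is an approximate average read off the progress bars, validation time is ignored, and the helper name `estimate_training_hours` is illustrative.

```python
import math


def estimate_training_hours(num_images, batch_size, sec_per_iter, epochs):
    """Return (iterations per epoch, estimated total training hours)."""
    iters_per_epoch = math.ceil(num_images / batch_size)
    total_seconds = iters_per_epoch * sec_per_iter * epochs
    return iters_per_epoch, total_seconds / 3600


iters, hours = estimate_training_hours(299, 16, 7.0, 200)
print(iters, round(hours, 1))
```

The iteration count agrees with the 19/19 shown in each progress bar, and the estimate suggests that a full 200-epoch run on this CPU would take several hours.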

Run predictions on the test set to assess model performance.

Guide: https://docs.ultralytics.com/modes/predict/#key-features-of-predict-mode
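
When comparing predictions with the test labels, the usual measure of box agreement is intersection over union (IoU). The mAP50 column in the training log counts a prediction as a match when its IoU with a ground-truth box is at least 0.5. Below is a minimal sketch of that calculation for boxes in (left, top, right, bottom) form; the helper name `box_iou` and the example boxes are chosen here for illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (left, top, right, bottom)."""
    # Width and height of the overlap, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping by half their width.
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
print(round(iou, 3))
```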

In [None]:
import os
import random

from PIL import Image
from ultralytics import YOLO
from ultralytics.utils.plotting import Annotator
# https://stackoverflow.com/questions/75324341/yolov8-get-predicted-bounding-box 

# Load the trained weights if the model is not already in memory.
# Adjust the run directory as needed (the training run above logged to runs/detect/train4).
model = YOLO('./runs/detect/train3/weights/best.pt')

# Pick a random image from the test split.
# https://stackoverflow.com/questions/701402/best-way-to-choose-a-random-file-from-a-directory
test_dir = './camera_data/test/data'
image_path = os.path.join(test_dir, random.choice(os.listdir(test_dir)))

test_results = model.predict(image_path)

# Draw the predicted boxes and class names on the image.
annotator = Annotator(Image.open(image_path), pil=True)

boxes = test_results[0].boxes
for box in boxes:
    b = box.xyxy[0]  # Box coordinates: left, top, right, bottom.
    c = box.cls  # Predicted class index.
    annotator.box_label(b, model.names[int(c)])

annotator.show()

# img = annotator.result()
# img[0]
# cv2.imshow('YOLO V8 Test Sample Result', img)

# if cv2.waitKey(1) & 0xFF == ord(' '):
#     break


image 1/1 c:\Users\mvsue\OneDrive - UTS\UTS\Ai in Robotics\AI-In-Robotics-Project\camera_data\test\data\33f577e844eb3e26.jpg: 256x320 1 Human face, 1 Tennis ball, 1 Tennis racket, 52.0ms
Speed: 1.0ms preprocess, 52.0ms inference, 1.0ms postprocess per image at shape (1, 3, 256, 320)
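
The sample output lists other classes alongside the tennis ball. If only tennis-ball detections are wanted, one option is to look up the class index by name and pass it to `predict` through its `classes` argument. The sketch below assumes `names` is an id-to-name mapping like `model.names`. The helper `class_ids_for` and the example mapping are illustrative and are not taken from the trained model.

```python
def class_ids_for(names, wanted):
    """Return the sorted class ids whose name matches one of `wanted` (case-insensitive)."""
    wanted_lower = {w.lower() for w in wanted}
    return sorted(i for i, n in names.items() if n.lower() in wanted_lower)


# Illustrative mapping; the real one would come from model.names.
example_names = {0: "Human face", 1: "Tennis ball", 2: "Tennis racket"}
ball_ids = class_ids_for(example_names, ["Tennis ball"])
print(ball_ids)

# With a loaded model this could then be used as:
# results = model.predict(image_path, classes=class_ids_for(model.names, ["Tennis ball"]))
```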
