# How to Train Your Own Cone Detection Networks

<img src="https://user-images.githubusercontent.com/22118253/70957091-fe06a480-2042-11ea-8c06-0fcc549fc19a.png">

In this notebook, we demonstrate how to train your own YOLOv3-based traffic cone detection network and run inference on a video.

**[Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions](https://arxiv.org/abs/2007.13971)** is an accurate, low-latency visual perception system introduced by Kieran Strobel, Sibo Zhu, Raphael Chang, and Skanda Koppula.

## 1. Preparation
Let's first install all the required packages:

In [1]:
! sudo apt install unzip
print('Installing PyTorch...')
! pip3 install torch 
print('Installing torchvision...')
! pip3 install torchvision 
print('Installing numpy...')
! pip3 install numpy 
# tqdm is a package for displaying a progress bar.
print('Installing tqdm (progress bar) ...')
! pip3 install tqdm 
print('Installing matplotlib...')
! pip3 install matplotlib 
print('Installing Tensorboard')
! pip3 install tensorboardx
print('Installing all the other required packages once for all')
! sudo python3 setup.py install
print('Installing video editor')
! sudo apt install ffmpeg -y 

Reading package lists... Done
Building dependency tree       
Reading state information... Done
unzip is already the newest version (6.0-21ubuntu1.1).
The following packages were automatically installed and are no longer required:
  libnvidia-common-460 nsight-compute-2020.2.0
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 42 not upgraded.
Installing PyTorch...
Installing torchvision...
Installing numpy...
Installing tqdm (progress bar) ...
Installing matplotlib...
Installing Tensorboard
Collecting tensorboardx
  Downloading tensorboardX-2.5-py2.py3-none-any.whl (125 kB)
[K     |████████████████████████████████| 125 kB 7.0 MB/s 
Installing collected packages: tensorboardx
Successfully installed tensorboardx-2.5
Installing all the other required packages once for all
python3: can't open file 'setup.py': [Errno 2] No such file or directory
Installing video editor
Reading package lists... Done
Building dependency tree       
Reading state informa

Let's clone our repo first:

In [2]:
! git clone -b crop-image https://github.com/Imperial-Driverless/MIT-Driverless-CV-TrainingInfra.git

! mv MIT-Driverless-CV-TrainingInfra/CVC-YOLOv3/* .

Cloning into 'MIT-Driverless-CV-TrainingInfra'...
remote: Enumerating objects: 934, done.
remote: Counting objects: 100% (53/53), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 934 (delta 30), reused 15 (delta 8), pack-reused 881
Receiving objects: 100% (934/934), 9.90 MiB | 33.46 MiB/s, done.
Resolving deltas: 100% (610/610), done.


Before we start training, let's download the cone detection dataset, the corresponding labels, and the initial training weights.

In [None]:
"""
print("Downloading Training Dataset")
! wget https://storage.googleapis.com/mit-driverless-open-source/YOLO_Dataset.zip
! unzip -q YOLO_Dataset.zip
! mv YOLO_Dataset dataset/ && rm YOLO_Dataset.zip
print("Downloading YOLOv3 Sample Weights")
! wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/sample-yolov3.weights 
print("Downloading Training and Validation Label")
! cd dataset/ && wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/all.csv && cd ..
! cd dataset/ && wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/train_mini_yolo.csv && mv train_mini_yolo.csv train.csv && cd ..
! cd dataset/ && wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/validate_mini_yolo.csv && mv validate_mini_yolo.csv validate.csv && cd ..
"""

Downloading Training Dataset
--2022-05-06 17:28:42--  https://storage.googleapis.com/mit-driverless-open-source/YOLO_Dataset.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.137.128, 142.250.141.128, 2607:f8b0:4023:c03::80, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.137.128|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-05-06 17:28:42 ERROR 403: Forbidden.

unzip:  cannot find or open YOLO_Dataset.zip, YOLO_Dataset.zip.zip or YOLO_Dataset.zip.ZIP.
mv: cannot stat 'YOLO_Dataset': No such file or directory
Downloading YOLOv3 Sample Weights
--2022-05-06 17:28:42--  https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/sample-yolov3.weights
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.137.128, 142.250.141.128, 2607:f8b0:4023:c03::80, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.137.128|:443... connected.
HTTP request sent, awaiting respo

The public HTTPS links above return `403 Forbidden`, so we fetch the required data with `gsutil` instead:

In [None]:
! echo "Downloading Training Dataset"
! gsutil cp -p gs://mit-driverless-open-source/YOLO_Dataset.zip ./dataset/
! unzip dataset/YOLO_Dataset.zip -d ./dataset/
! rm dataset/YOLO_Dataset.zip

In [4]:
! echo "Downloading YOLOv3 pretrained Weights"
! gsutil cp -p  gs://mit-driverless-open-source/pretrained_yolo.weights ./yolo_weights/

Downloading YOLOv3 pretrained Weights
Copying gs://mit-driverless-open-source/pretrained_yolo.weights...
\ [1 files][236.5 MiB/236.5 MiB]                                                
Operation completed over 1 objects/236.5 MiB.                                    


In [5]:
!zip -r /content/yolo_weights.zip /content/yolo_weights

from google.colab import files
files.download("/content/yolo_weights.zip")

  adding: content/yolo_weights/ (stored 0%)
  adding: content/yolo_weights/pretrained_yolo.weights (deflated 7%)


In [6]:
! echo "Downloading Training and Validation Label"
! gsutil cp -p gs://mit-driverless-open-source/yolov3-training/all.csv ./dataset/
! gsutil cp -p gs://mit-driverless-open-source/yolov3-training/train.csv ./dataset/
! gsutil cp -p gs://mit-driverless-open-source/yolov3-training/validate.csv ./dataset/

Downloading Training and Validation Label
Copying gs://mit-driverless-open-source/yolov3-training/all.csv...
/ [1 files][  2.4 MiB/  2.4 MiB]                                                
Operation completed over 1 objects/2.4 MiB.                                      
Copying gs://mit-driverless-open-source/yolov3-training/train.csv...
/ [1 files][  1.8 MiB/  1.8 MiB]                                                
Operation completed over 1 objects/1.8 MiB.                                      
Copying gs://mit-driverless-open-source/yolov3-training/validate.csv...
/ [1 files][350.3 KiB/350.3 KiB]                                                
Operation completed over 1 objects/350.3 KiB.                                    


## 2. Using Pretrained YOLOv3 Weights File to Start Training


First, import all the packages used in this tutorial:

In [9]:
import os
import random
import tempfile
import time
import multiprocessing
import subprocess
import math
import shutil

from datetime import datetime
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

from models import Darknet
from utils.datasets import ImageLabelDataset
from utils.utils import model_info, print_args, Logger, visualize_and_save_to_local,xywh2xyxy
from yolo_tutorial_util import run_epoch
import validate
import warnings
import sys

##### section for all random seeds #####
# Seed Python and PyTorch once (seed 0) so runs are repeatable.
random.seed(0)
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True
########################################

warnings.filterwarnings("ignore")
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

cuda = torch.cuda.is_available()
device = torch.device('cuda:0' if cuda else 'cpu')
num_cpu = multiprocessing.cpu_count() if cuda else 0

if cuda:
    torch.cuda.synchronize()
    torch.cuda.manual_seed_all(0)
    # cuDNN autotuning speeds up training, but it can make results
    # differ slightly between runs despite the fixed seeds.
    torch.backends.cudnn.benchmark = True
    torch.cuda.empty_cache()

Successfully imported all packages and configured random seed to 17!

Training Config

In [10]:
# Training Related Config
batch_size = int(5)
optimizer_pick = "Adam"
model_cfg = "model_cfg/yolo_baseline.cfg"
weights_path = "sample-yolov3.weights"
output_path = "automatic"
dataset_path = "dataset/YOLO_Dataset/"
num_epochs = int(2) # Set this to 2048 when training on the full dataset
num_steps = 8388608
checkpoint_interval = int(1) # How often (in epochs) to save weights and compute evaluation metrics during training
val_tolerance = int(3)
min_epochs = int(3)

# Dataloader Related Config
data_aug = False # toggle for image augmentation
blur = False # Add blur to image
salt = False # Add "salt" noise to image
noise = False # Add noise to image
contrast = False # Add High Contrast to image
sharpen = False # Image Sharpen
ts = True # Tiling and Scaling
augment_affine = False # Affine
augment_hsv = False # HSV
lr_flip = False # left and right flip
ud_flip = False # up and down flip

# Training Hyperparameter Related Config
momentum = float(0.9)
gamma = float(0.95)
lr = float(0.001)
weight_decay = float(0.0)

xy_loss = float(2)
wh_loss= float(1.6)
no_object_loss = float(25)
object_loss = float(0.1)

# Debugging/Visualization Related Config
debug_mode = False
upload_dataset = False
vanilla_anchor = False
vis_batch = int(0)


Initializing Model

In [11]:
input_arguments = list(locals().items())

print("Initializing model")
model = Darknet(config_path=model_cfg,xy_loss=xy_loss,wh_loss=wh_loss,no_object_loss=no_object_loss,object_loss=object_loss,vanilla_anchor=vanilla_anchor)

Initializing model


Processing Training Config

In [None]:
if output_path == "automatic":
    current_month = datetime.now().strftime('%B').lower()
    current_year = str(datetime.now().year)
    # e.g. "model_cfg/yolo_baseline.cfg" -> "yolo_baseline"
    experiment_name = model_cfg.split('.')[0].split('/')[-1]
    output_uri = os.path.join('outputs/', current_month + '-' + current_year + '-experiments/' + experiment_name)
    os.makedirs(output_uri, exist_ok=True)
else:
    output_uri = output_path

img_width, img_height = model.img_size()
bw  = model.get_bw()
validate_uri, train_uri = model.get_links()
num_validate_images, num_train_images = model.num_images()
conf_thresh, nms_thresh, iou_thresh = model.get_threshs()
num_classes = model.get_num_classes()
loss_constant = model.get_loss_constant()
conv_activation = model.get_conv_activation()
anchors = model.get_anchors()
onnx_name = model.get_onnx_name()

start_epoch = 0

### Data Loaders

One of our main contributions on top of vanilla YOLOv3 is a custom data loader.

Each set of training images from a specific sensor/lens/perspective combination is uniformly rescaled so that its landmark size distribution matches that of the camera system on the vehicle. Each training image is then padded if it is too small or split into multiple images if it is too large.

<p align="center">
<img src="https://user-images.githubusercontent.com/22118253/69765465-09e90000-1142-11ea-96b7-370868a0033b.png" width="600">
</p>
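
The pad-or-split step can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the repository's `ImageLabelDataset` implementation: the helper name `plan_tiles`, the non-overlapping grid, and the 416x416 network input size are all assumptions made for the example. Given a rescaled image size and the network input size, it returns the crop boxes to cut; any part of a box that extends past the image border is the area that would be padded.

```python
import math

def plan_tiles(img_w, img_h, net_w, net_h):
    """Return (left, top, right, bottom) crop boxes covering the image.

    Hypothetical helper: an image smaller than the network input yields a
    single box (the part outside the image would be padded); a larger image
    is split into a grid of network-sized tiles.
    """
    cols = max(1, math.ceil(img_w / net_w))
    rows = max(1, math.ceil(img_h / net_h))
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * net_w, r * net_h
            boxes.append((left, top, left + net_w, top + net_h))
    return boxes

# A 300x200 image fits in one 416x416 tile and needs padding.
small = plan_tiles(300, 200, 416, 416)
# A 1000x500 image is split into a 3x2 grid of tiles.
large = plan_tiles(1000, 500, 416, 416)
print(len(small), len(large))
```

In the real loader the bounding-box labels would also have to be shifted and clipped per tile, which this sketch leaves out.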

In [None]:
with tempfile.TemporaryDirectory() as tensorboard_data_dir:
    print("Initializing data loaders")
    train_data_loader = torch.utils.data.DataLoader(
        ImageLabelDataset(train_uri, dataset_path=dataset_path, width=img_width, height=img_height, augment_hsv=augment_hsv,
                            augment_affine=augment_affine, num_images=num_train_images,
                            bw=bw, n_cpu=num_cpu, lr_flip=lr_flip, ud_flip=ud_flip,vis_batch=vis_batch,data_aug=data_aug,blur=blur,salt=salt,noise=noise,contrast=contrast,sharpen=sharpen,ts=ts,debug_mode=debug_mode, upload_dataset=upload_dataset),
        batch_size=(1 if debug_mode else batch_size),
        shuffle=(False if debug_mode else True),
        num_workers=(0 if vis_batch else num_cpu),
        pin_memory=cuda)
    print("Num train images: ", len(train_data_loader.dataset))

    validate_data_loader = torch.utils.data.DataLoader(
        ImageLabelDataset(validate_uri, dataset_path=dataset_path, width=img_width, height=img_height, augment_hsv=False,
                            augment_affine=False, num_images=num_validate_images,
                            bw=bw, n_cpu=num_cpu, lr_flip=False, ud_flip=False,vis_batch=vis_batch,data_aug=False,blur=False,salt=False,noise=False,contrast=False,sharpen=False,ts=ts,debug_mode=debug_mode, upload_dataset=upload_dataset),
        batch_size=(1 if debug_mode else batch_size),
        shuffle=False,
        num_workers=(0 if vis_batch else num_cpu),
        pin_memory=cuda)

Initialize Optimizer

In [None]:
if optimizer_pick == "Adam":
    print("Using Adam Optimizer")
    optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()),
                                lr=lr, weight_decay=weight_decay)
elif optimizer_pick == "SGD":
    print("Using SGD Optimizer")
    optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                            lr=lr, momentum=momentum, weight_decay=weight_decay)
else:
    raise Exception(f"Invalid optimizer name: {optimizer_pick}")
print("Loading weights")
model.load_weights(weights_path, model.get_start_weight_dim())

# Set scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=gamma)
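
With `step_size=1`, `StepLR` multiplies the learning rate by `gamma` on every scheduler step, so after `k` steps the rate is `lr * gamma**k`. The small pure-Python sketch below (the helper name `decayed_lr` is ours, purely for illustration) shows how the `lr = 0.001` and `gamma = 0.95` values from the config decay over time:

```python
def decayed_lr(base_lr, gamma, num_steps):
    """Learning rate after `num_steps` scheduler steps with step_size=1."""
    return base_lr * gamma ** num_steps

# Print the learning rate after a few selected numbers of scheduler steps.
for k in (0, 1, 10, 50):
    print(k, f"{decayed_lr(0.001, 0.95, k):.6f}")
```

This is worth checking before a long run: with these values the learning rate falls below a tenth of its starting value after roughly 50 scheduler steps.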

### Let's Dance (Training)

We first send the model to the GPU(s) if we are in GPU mode, then run the training loop:

In [None]:
if torch.cuda.device_count() > 1:
    print('Using ', torch.cuda.device_count(), ' GPUs')
    model = nn.DataParallel(model)
model = model.to(device, non_blocking=True)

val_loss = 999  # using a high number for validation loss
val_loss_counter = 0
step = [0]  # wrapping in an array so it is mutable
epoch = start_epoch
while epoch < num_epochs and step[0] < num_steps:
    epoch += 1
    model.train()
    run_epoch(label_prefix="train", data_loader=train_data_loader, epoch=epoch,
                step=step, model=model, num_epochs=num_epochs, num_steps=num_steps,
                optimizer=optimizer, device=device)
    # Decay the learning rate after the epoch's optimizer updates
    # (PyTorch >= 1.1 expects optimizer.step() before scheduler.step()).
    scheduler.step()
    print('Completed epoch: ', epoch)
    # Update best loss
    if epoch % checkpoint_interval == 0 or epoch == num_epochs or step[0] >= num_steps:
        # First, save the weights
        save_weights_uri = os.path.join(output_uri, "{epoch}.weights".format(epoch=epoch))
        model.save_weights(save_weights_uri)

        with torch.no_grad():
            print("Calculating loss on validate data")
            epoch_losses, epoch_time_total, epoch_num_targets = run_epoch(
                label_prefix="validate", data_loader=validate_data_loader, epoch=epoch,
                model=model, num_epochs=num_epochs, num_steps=num_steps, optimizer=None,
                step=step, device=device)
            avg_epoch_loss = epoch_losses[0] / epoch_num_targets
            print('Average Validation Loss: {0:10.6f}'.format(avg_epoch_loss))

            if avg_epoch_loss > val_loss and epoch > min_epochs:
                val_loss_counter += 1
                print(f"Validation loss did not decrease for {val_loss_counter}"
                        f" consecutive check(s)")
            else:
                print("Validation loss decreased. Yay!!")
                val_loss_counter = 0
                val_loss = avg_epoch_loss
                ##### updating best result for optuna study #####
                os.makedirs("logs", exist_ok=True)
                with open("logs/result.txt", "w") as result:
                    result.write(str(avg_epoch_loss))
                ###########################################
            validate.validate(dataloader=validate_data_loader, model=model, device=device, step=step[0], bbox_all=False,debug_mode=debug_mode)
            if val_loss_counter == val_tolerance:
                print("Validation loss stopped decreasing over the last " + str(val_tolerance) + " checkpoints, creating onnx file")
                with tempfile.NamedTemporaryFile() as tmpfile:
                    model.save_weights(tmpfile.name)
                    weights_name = tmpfile.name
                    cfg_name = os.path.join(tempfile.gettempdir(), model_cfg.split('/')[-1].split('.')[0] + '.tmp')
                    onnx_gen = subprocess.call(['python3', 'yolo2onnx.py', '--cfg_name', cfg_name, '--weights_name', weights_name])
                    save_weights_uri = os.path.join(output_uri, onnx_name)
                    os.rename(weights_name, save_weights_uri)
                    try:
                        os.remove(onnx_name)
                        os.remove(cfg_name)
                    except OSError:
                        pass
                break
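
The stopping rule inside the loop above is a patience counter: once the warm-up of `min_epochs` has passed, each validation check whose loss exceeds the stored reference loss increments a counter, and training stops when the counter reaches `val_tolerance`. The standalone sketch below isolates that logic so it can be tried without a GPU. The class name `EarlyStopper` and the loss values are hypothetical; the branch structure mirrors the cell above.

```python
class EarlyStopper:
    """Patience-based early stopping, mirroring the checks in the loop above."""

    def __init__(self, tolerance, min_epochs):
        self.tolerance = tolerance
        self.min_epochs = min_epochs
        self.ref_loss = float("inf")
        self.bad_checks = 0

    def update(self, epoch, loss):
        """Record one validation loss; return True when training should stop."""
        if loss > self.ref_loss and epoch > self.min_epochs:
            self.bad_checks += 1
        else:
            # As in the loop above, any check inside the warm-up window also
            # resets the reference loss, even if the loss went up.
            self.bad_checks = 0
            self.ref_loss = loss
        return self.bad_checks >= self.tolerance

stopper = EarlyStopper(tolerance=3, min_epochs=3)
losses = [1.0, 0.8, 0.9, 0.7, 0.75, 0.8, 0.72]  # made-up validation losses
stop_epoch = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.update(epoch, loss):
        stop_epoch = epoch
        break
print("stopped at epoch", stop_epoch)
```

Note that with `num_epochs = 2` and `min_epochs = 3`, as in the demo config, this rule can never trigger; it only matters for longer runs.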

Our full dataset accuracy metrics for detecting traffic cones on the racing track:

| mAP | Recall | Precision |
|----|----|----|
| 89.35% | 92.77% | 86.94% |
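
For readers new to these metrics: precision is the fraction of predicted cones that are correct, and recall is the fraction of ground-truth cones that were found. The sketch below computes both from true-positive, false-positive, and false-negative counts. The counts are made-up numbers for illustration only and are not the data behind the table above; the function name `precision_recall` is also ours.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts only: 90 correct detections, 10 false alarms, 30 misses.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f}")
```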

## 3. Inference

Download target video file for inference

In [12]:
! gsutil cp -p gs://mit-driverless-open-source/test_yolo_video.mp4 ./
# ! wget https://storage.googleapis.com/mit-driverless-open-source/test_yolo_video.mp4

! ffmpeg -i test_yolo_video.mp4 test.mp4 && rm test_yolo_video.mp4

Copying gs://mit-driverless-open-source/test_yolo_video.mp4...
/ [1 files][  2.9 MiB/  2.9 MiB]                                                
Operation completed over 1 objects/2.9 MiB.                                      
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy

In [None]:
from IPython.display import HTML
from base64 import b64encode

video_path = 'test.mp4'

with open(video_path, 'rb') as f:
    mp4 = f.read()
decoded_vid = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=400 controls><source src="{decoded_vid}" type="video/mp4"></video>')

Download pretrained weights for inference

In [None]:
"""
! wget https://storage.googleapis.com/mit-driverless-open-source/pretrained_yolo.weights
"""
# This should already have been downloaded from the cells above
! gsutil cp -p gs://mit-driverless-open-source/pretrained_yolo.weights ./yolo_weights/


Copying gs://mit-driverless-open-source/pretrained_yolo.weights...
\ [1 files][236.5 MiB/236.5 MiB]                                                
Operation completed over 1 objects/236.5 MiB.                                    


Import all packages for inference

In [13]:
import os
from os.path import isfile, join
import copy
import cv2
from tensorboardX import SummaryWriter
from PIL import Image, ImageDraw
import torchvision
from utils.nms import nms
from utils.utils import calculate_padding
from yolo_tutorial_util import single_img_detect, detect
from tqdm import tqdm

In [14]:
warnings.filterwarnings("ignore")
detection_tmp_path = "/tmp/detect/"

Set up config file for inference

In [15]:
target_path = "test.mp4"
output_path = "outputs/visualization/"
weights_path = "yolo_weights/pretrained_yolo.weights"
conf_thres = float(0.8)
nms_thres = float(0.25)
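
The two thresholds play different roles. Assuming the usual YOLO post-processing (an assumption on our part; the repository's own logic lives in `utils.nms` and `yolo_tutorial_util` and is not reproduced here), `conf_thres` discards low-confidence boxes, and `nms_thres` is the IoU above which a lower-scoring box that overlaps an already-kept box is suppressed. A minimal pure-Python sketch of greedy non-maximum suppression, with hypothetical function names `iou` and `filter_and_nms` and made-up boxes:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_and_nms(detections, conf_thres, nms_thres):
    """detections: list of (box, score) pairs. Returns the kept pairs."""
    # Drop low-confidence boxes, then visit the rest from best to worst score.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_thres),
        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(iou(box, k[0]) <= nms_thres for k in kept):
            kept.append((box, score))
    return kept

dets = [
    ((10, 10, 50, 90), 0.95),      # strong detection
    ((12, 12, 52, 92), 0.90),      # near-duplicate of the first box
    ((200, 40, 240, 120), 0.85),   # separate cone
    ((300, 300, 320, 340), 0.40),  # below the confidence threshold
]
kept = filter_and_nms(dets, conf_thres=0.8, nms_thres=0.25)
print(kept)
```

Under this reading, raising `conf_thres` trades recall for precision, while lowering `nms_thres` suppresses more overlapping boxes, which can merge cones that stand close together in the image.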

Optional: run detection on a single image instead of the video (skip this cell to process `test.mp4`):

In [16]:
target_path = "./dataset/YOLO_Dataset/vid_100_frame_101.jpg"

In [18]:
cuda = torch.cuda.is_available()
device = torch.device('cuda:0' if cuda else 'cpu')
random.seed(0)
torch.manual_seed(0)
if cuda:
    torch.cuda.manual_seed(0)
    torch.cuda.manual_seed_all(0)
    torch.backends.cudnn.benchmark = True
    torch.cuda.empty_cache()
model = Darknet(config_path=model_cfg,xy_loss=xy_loss,wh_loss=wh_loss,no_object_loss=no_object_loss,object_loss=object_loss,vanilla_anchor=vanilla_anchor)

# Load weights
model.load_weights(weights_path, model.get_start_weight_dim())
model.to(device, non_blocking=True)

detect(target_path, output_path, model, device=device, conf_thres=conf_thres, nms_thres=nms_thres, detection_tmp_path=detection_tmp_path)

Detection Mode is: image


AttributeError: ignored

In [None]:
! cd outputs/visualization/ && ffmpeg -i test.mp4 output.mp4 && rm test.mp4 && cd ../..

video_path = "outputs/visualization/output.mp4"

with open(video_path, 'rb') as f:
    mp4 = f.read()
decoded_vid = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=400 controls><source src="{decoded_vid}" type="video/mp4"></video>')

**Note:** You may be able to further improve the accuracy of the cone detection network by replacing the YOLOv3 backbone with the more recently published YOLOv4.

<p align="center">
<img src="https://user-images.githubusercontent.com/22118253/70950893-e2de6980-202f-11ea-9a16-399579926ee5.gif" width="600">
</p>

Congratulations! You've finished all the content of this tutorial!
We hope you enjoy playing with our object detection model. If you are interested, please refer to our paper and GitHub repo for further details.

## Reference
[1] Kieran Strobel, Sibo Zhu, Raphael Chang, and Skanda Koppula.
**Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions**. In *IROS* 2020.
[[paper]](https://arxiv.org/abs/2007.13971), [[code]](https://github.com/cv-core/MIT-Driverless-CV-TrainingInfra).