# How to Train Your Own Cone Detection Networks

![](https://user-images.githubusercontent.com/22118253/70957091-fe06a480-2042-11ea-8c06-0fcc549fc19a.png)

In this notebook, we demonstrate how to
- train your own YOLOv3-based traffic cone detection network, and
- run inference with it on a video.

**[Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions](https://arxiv.org/abs/2007.13971)** describes an accurate, low-latency visual perception system introduced by Kieran Strobel, Sibo Zhu, Raphael Chang, and Skanda Koppula.

## 1. Preparation
Let's first install all the required packages:

In [None]:
! sudo apt install unzip -y
print('Installing PyTorch...')
! pip3 install torch 
print('Installing torchvision...')
! pip3 install torchvision 
print('Installing numpy...')
! pip3 install numpy 
# tqdm is a package for displaying a progress bar.
print('Installing tqdm (progress bar) ...')
! pip3 install tqdm 
print('Installing matplotlib...')
! pip3 install matplotlib 
print('Installing all the other required packages...')
! sudo python3 setup.py install
print('Installing video editor')
! sudo apt install ffmpeg -y 

Before we start training, let's download the cone detection dataset, the corresponding labels, and the initial training weights.

In [None]:
print("Downloading Training Dataset")
! wget https://storage.googleapis.com/mit-driverless-open-source/YOLO_Dataset.zip
! unzip YOLO_Dataset.zip
! mkdir -p dataset && mv YOLO_Dataset dataset/ && rm YOLO_Dataset.zip
print("Downloading YOLOv3 Sample Weights")
! wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/sample-yolov3.weights 
print("Downloading Training and Validation Label")
! cd dataset/ && wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/all.csv && cd ..
! cd dataset/ && wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/train.csv && cd ..
! cd dataset/ && wget https://storage.googleapis.com/mit-driverless-open-source/yolov3-training/validate.csv && cd ..
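
Before training, it can help to confirm that everything landed where the rest of the notebook expects it. The minimal sketch below checks the paths produced by the commands above (`dataset/YOLO_Dataset`, the three label CSVs, and `sample-yolov3.weights`). The helper name `missing_files` is hypothetical, and the expected layout is an assumption based on those commands and on `dataset_path` in the training config.

```python
import os

# Assumed layout, taken from the download commands above; adjust it if you
# unpacked the files somewhere else.
EXPECTED_PATHS = [
    "sample-yolov3.weights",
    "dataset/YOLO_Dataset",
    "dataset/all.csv",
    "dataset/train.csv",
    "dataset/validate.csv",
]


def missing_files(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]


missing = missing_files()
if missing:
    print("Missing downloads:", ", ".join(missing))
else:
    print("All downloads are in place.")
```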

## 2. Using Pretrained YOLOv3 Weights File to Start Training


First, import all the packages used in this tutorial:

In [6]:
import os
import random
import tempfile
import time
import multiprocessing
import subprocess
import math
import shutil

from datetime import datetime
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

from models import Darknet
from utils.datasets import ImageLabelDataset
from utils.utils import model_info, print_args, Logger, visualize_and_save_to_local,xywh2xyxy
import validate
import warnings
import sys

warnings.filterwarnings("ignore")
os.environ['CUDA_LAUNCH_BLOCKING'] = "1"

cuda = torch.cuda.is_available()
device = torch.device('cuda:0' if cuda else 'cpu')
num_cpu = multiprocessing.cpu_count() if cuda else 0

##### section for all random seeds #####
# Seed every RNG once with the same value. cudnn.benchmark stays off because
# it lets cuDNN choose algorithms by timing, which can differ between runs.
SEED = 17
random.seed(SEED)
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
if cuda:
    torch.cuda.manual_seed_all(SEED)
    torch.cuda.empty_cache()
########################################

Successfully imported all packages and configured random seed to 17!

In [7]:
def run_epoch(label_prefix, data_loader, num_steps, optimizer, model, epoch, num_epochs, step):
    print(f"Model in {label_prefix} mode")
    epoch_losses = [0.0] * 7
    epoch_time_total = 0.0
    epoch_num_targets = 1e-12
    t1 = time.time()
    loss_labels = ["Total", "L-x", "L-y", "L-w", "L-h", "L-noobj", "L-obj"]
    for i, (img_uri, imgs, targets) in enumerate(data_loader):
        if step[0] >= num_steps:
            break
        imgs = imgs.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        targets.requires_grad_(False)
        step_num_targets = ((targets[:, :, 1:5] > 0).sum(dim=2) > 1).sum().item() + 1e-12
        epoch_num_targets += step_num_targets
        # Compute loss, compute gradient, update parameters
        if optimizer is not None:
            optimizer.zero_grad()
        losses = model(imgs, targets)
        if label_prefix == "train":
            losses[0].sum().backward()
        if optimizer is not None:
            optimizer.step()

        for j, (label, loss) in enumerate(zip(loss_labels, losses)):
            batch_loss = loss.sum().to('cpu').item()
            epoch_losses[j] += batch_loss
        finished_time = time.time()
        step_time_total = finished_time - t1
        epoch_time_total += step_time_total
        t1 = finished_time  # restart the timer so each step is timed on its own

        statement = label_prefix + ' Epoch: ' + str(epoch) + ', Batch: ' + str(i + 1) + '/' + str(len(data_loader))
        # .sum() also handles the per-GPU loss vectors returned under nn.DataParallel
        tot_loss = losses[0].sum().item()
        statement += ', Total: ' + '{0:10.6f}'.format(tot_loss / step_num_targets)
        for loss_label, loss in zip(loss_labels[1:], losses[1:]):
            statement += ',   ' + loss_label + ': {0:5.2f}'.format(loss.sum().item() / tot_loss * 100) + '%'
        print(statement)
        if label_prefix == "train":
            step[0] += 1
    return epoch_losses, epoch_time_total, epoch_num_targets
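
`run_epoch` logs two kinds of numbers per batch: `Total` is the summed loss divided by the number of ground-truth boxes in the batch, and every other column is that component's share of the total loss in percent (in the training log below, the six percentages on each complete line add up to roughly 100%). The pure-Python sketch reproduces that bookkeeping without PyTorch so it can be checked in isolation. `count_targets` and `format_losses` are hypothetical helpers, and the `(class, x, y, w, h)` target layout is an assumption inferred from the `targets[:, :, 1:5]` slice.

```python
LOSS_LABELS = ["Total", "L-x", "L-y", "L-w", "L-h", "L-noobj", "L-obj"]


def count_targets(targets):
    """Count ground-truth boxes with the same rule as run_epoch.

    `targets` is a nested list shaped [batch][max_boxes][5]; index 0 of each
    box is assumed to be the class id and indices 1-4 the box coordinates.
    Padding rows are all zeros. A box counts if more than one of its four
    coordinates is positive.
    """
    return sum(
        1
        for image in targets
        for box in image
        if sum(1 for v in box[1:5] if v > 0) > 1
    )


def format_losses(prefix, epoch, batch, num_batches, losses, num_targets):
    """Build one log line: Total is normalised by the number of targets
    (plus 1e-12, as in run_epoch, to avoid division by zero); every other
    component is shown as a percentage of the total loss."""
    total = losses[0]
    statement = f"{prefix} Epoch: {epoch}, Batch: {batch}/{num_batches}"
    statement += ", Total: {0:10.6f}".format(total / (num_targets + 1e-12))
    for label, value in zip(LOSS_LABELS[1:], losses[1:]):
        statement += ",   {0}: {1:5.2f}%".format(label, value / total * 100)
    return statement
```

For example, `format_losses("train", 1, 1, 10, [10.0, 1.0, 1.0, 4.0, 2.0, 1.0, 1.0], 2)` yields a line in the same shape as the notebook's log, with `Total:   5.000000` and `L-w: 40.00%`.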

Training Config

In [8]:
evaluate = False
batch_size = int(5)
optimizer_pick = "Adam"
model_cfg = "model_cfg/yolo_baseline.cfg"
weights_path = "sample-yolov3.weights"
output_path = "automatic"
dataset_path = "dataset/YOLO_Dataset/"
num_epochs = int(2048)
num_steps = 8388608
checkpoint_interval = int(1)
augment_affine = False
augment_hsv = False
lr_flip = False
ud_flip = False
momentum = float(0.9)
gamma = float(0.95)
lr = float(0.001)
weight_decay = float(0.0)
vis_batch = int(0)
data_aug = False
blur = False
salt = False
noise = False
contrast = False
sharpen = False
ts = True
debug_mode = False
upload_dataset = False
xy_loss = float(2)
wh_loss= float(1.6)
no_object_loss = float(25)
object_loss = float(0.1)
vanilla_anchor = False
val_tolerance = int(3)
min_epochs = int(3)
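
With `lr = 0.001` and `gamma = 0.95`, the training cell below wraps the optimizer in `torch.optim.lr_scheduler.StepLR` with `step_size=1`, which multiplies the learning rate by `gamma` every time `scheduler.step()` is called, i.e. once per epoch. The sketch evaluates the closed form of that schedule in plain Python so you can see how quickly the rate decays before choosing `gamma`; `step_lr` is a hypothetical helper.

```python
def step_lr(initial_lr, gamma, epoch, step_size=1):
    """Learning rate after `epoch` scheduler steps under the StepLR rule:
    lr = initial_lr * gamma ** (epoch // step_size)."""
    return initial_lr * gamma ** (epoch // step_size)


# Print the schedule implied by the config above (lr = 0.001, gamma = 0.95).
for e in (0, 1, 10, 45, 100):
    print(f"after {e:3d} scheduler steps: lr = {step_lr(0.001, 0.95, e):.6f}")
```

With `gamma = 0.95` the learning rate falls to about a tenth of its starting value after roughly 45 epochs, so a run that went anywhere near `num_epochs = 2048` would effectively stop learning; the early-stopping check on `val_tolerance` is intended to end training much sooner.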

In [9]:
input_arguments = list(locals().items())

print("Initializing model")
model = Darknet(config_path=model_cfg,xy_loss=xy_loss,wh_loss=wh_loss,no_object_loss=no_object_loss,object_loss=object_loss,vanilla_anchor=vanilla_anchor)
img_width, img_height = model.img_size()
bw  = model.get_bw()
validate_uri, train_uri = model.get_links()

if output_path == "automatic":
    current_month = datetime.now().strftime('%B').lower()
    current_year = str(datetime.now().year)
    experiment_name = model_cfg.split('.')[0].split('/')[-1]
    output_uri = os.path.join('outputs/', current_month + '-' + current_year + '-experiments/' + experiment_name)
    os.makedirs(output_uri, exist_ok=True)
else:
    output_uri = output_path

num_validate_images, num_train_images = model.num_images()
conf_thresh, nms_thresh, iou_thresh = model.get_threshs()
num_classes = model.get_num_classes()
loss_constant = model.get_loss_constant()
conv_activation = model.get_conv_activation()
anchors = model.get_anchors()
onnx_name = model.get_onnx_name()

with tempfile.TemporaryDirectory() as tensorboard_data_dir:
    print("Initializing data loaders")
    train_data_loader = torch.utils.data.DataLoader(
        ImageLabelDataset(train_uri, dataset_path=dataset_path, width=img_width, height=img_height, augment_hsv=augment_hsv,
                            augment_affine=augment_affine, num_images=num_train_images,
                            bw=bw, n_cpu=num_cpu, lr_flip=lr_flip, ud_flip=ud_flip,vis_batch=vis_batch,data_aug=data_aug,blur=blur,salt=salt,noise=noise,contrast=contrast,sharpen=sharpen,ts=ts,debug_mode=debug_mode, upload_dataset=upload_dataset),
        batch_size=(1 if debug_mode else batch_size),
        shuffle=(False if debug_mode else True),
        num_workers=(0 if vis_batch else num_cpu),
        pin_memory=cuda)
    print("Num train images: ", len(train_data_loader.dataset))

    validate_data_loader = torch.utils.data.DataLoader(
        ImageLabelDataset(validate_uri, dataset_path=dataset_path, width=img_width, height=img_height, augment_hsv=False,
                            augment_affine=False, num_images=num_validate_images,
                            bw=bw, n_cpu=num_cpu, lr_flip=False, ud_flip=False,vis_batch=vis_batch,data_aug=False,blur=False,salt=False,noise=False,contrast=False,sharpen=False,ts=ts,debug_mode=debug_mode, upload_dataset=upload_dataset),
        batch_size=(1 if debug_mode else batch_size),
        shuffle=False,
        num_workers=(0 if vis_batch else num_cpu),
        pin_memory=cuda)
    print("Num validate images: ", len(validate_data_loader.dataset))

    ##### additional configuration #####
    print("Training batch size: " + str(batch_size))
    
    print("Checkpoint interval: " + str(checkpoint_interval))

    print("Loss constants: " + str(loss_constant))

    print("Anchor boxes: " + str(anchors))

    print("Training image width: " + str(img_width))

    print("Training image height: " + str(img_height))

    print("Confidence Threshold: " + str(conf_thresh))

    print("Number of training classes: " + str(num_classes))

    print("Conv activation type: " + str(conv_activation))

    print("Starting learning rate: " + str(lr))

    if ts:
        print("Tile and scale mode [on]")
    else:
        print("Tile and scale mode [off]")

    if data_aug:
        print("Data augmentation mode [on]")
    else:
        print("Data augmentation mode [off]")

    ####################################

    start_epoch = 0

    if optimizer_pick == "Adam":
        print("Using Adam Optimizer")
        optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()),
                                    lr=lr, weight_decay=weight_decay)
    elif optimizer_pick == "SGD":
        print("Using SGD Optimizer")
        optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                                lr=lr, momentum=momentum, weight_decay=weight_decay)
    else:
        raise Exception(f"Invalid optimizer name: {optimizer_pick}")
    print("Loading weights")
    model.load_weights(weights_path, model.get_start_weight_dim())

    if torch.cuda.device_count() > 1:
        print('Using ', torch.cuda.device_count(), ' GPUs')
        model = nn.DataParallel(model)
    model = model.to(device, non_blocking=True)

    # Set scheduler
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=gamma)

    val_loss = 999  # using a high number for validation loss
    val_loss_counter = 0
    step = [0]  # wrapping in an array so it is mutable
    epoch = start_epoch
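    while epoch < num_epochs and step[0] < num_steps and not evaluate:
        epoch += 1
        model.train()
        run_epoch(label_prefix="train", data_loader=train_data_loader, epoch=epoch,
                    step=step, model=model, num_epochs=num_epochs, num_steps=num_steps,
                    optimizer=optimizer)
        # Decay the learning rate once per epoch, after this epoch's optimizer updates
        # (since PyTorch 1.1, stepping the scheduler first skips the initial lr value).
        scheduler.step()
        print('Completed epoch: ', epoch)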
        # Update best loss
        if epoch % checkpoint_interval == 0 or epoch == num_epochs or step[0] >= num_steps:
            # First, save the weights
            save_weights_uri = os.path.join(output_uri, "{epoch}.weights".format(epoch=epoch))
            model.save_weights(save_weights_uri)

            with torch.no_grad():
                print("Calculating loss on validate data")
                epoch_losses, epoch_time_total, epoch_num_targets = run_epoch(
                    label_prefix="validate", data_loader=validate_data_loader, epoch=epoch,
                    model=model, num_epochs=num_epochs, num_steps=num_steps, optimizer=None,
                    step=step)
                avg_epoch_loss = epoch_losses[0] / epoch_num_targets
                print('Average Validation Loss: {0:10.6f}'.format(avg_epoch_loss))

                if avg_epoch_loss > val_loss and epoch > min_epochs:
                    val_loss_counter += 1
                    print(f"Validation loss did not decrease for {val_loss_counter}"
                            f" consecutive check(s)")
                else:
                    print("Validation loss decreased. Yay!!")
                    val_loss_counter = 0
                    val_loss = avg_epoch_loss
                    ##### updating best result for optuna study #####
                    os.makedirs("logs", exist_ok=True)
                    with open("logs/result.txt", "w") as result:
                        result.write(str(avg_epoch_loss))
                    ###########################################
                validate.validate(dataloader=validate_data_loader, model=model, device=device, step=step[0], bbox_all=False,debug_mode=debug_mode)
                if val_loss_counter == val_tolerance:
                    print("Validation loss stopped decreasing over the last " + str(val_tolerance) + " checkpoints, creating onnx file")
                    with tempfile.NamedTemporaryFile() as tmpfile:
                        model.save_weights(tmpfile.name)
                        weights_name = tmpfile.name
                        cfg_name = os.path.join(tempfile.gettempdir(), model_cfg.split('/')[-1].split('.')[0] + '.tmp')
                        onnx_gen = subprocess.call(['python3', 'yolo2onnx.py', '--cfg_name', cfg_name, '--weights_name', weights_name])
                        save_weights_uri = os.path.join(output_uri, onnx_name)
                        os.rename(weights_name, save_weights_uri)
                        try:
                            os.remove(onnx_name)
                        except:
                            pass
                        os.remove(cfg_name)
                    break
    if evaluate:
        validation = validate.validate(dataloader=validate_data_loader, model=model, device=device, step=-1, bbox_all=False, tensorboard_writer=None,debug_mode=debug_mode)
print("Best validation loss: " + str(val_loss))


Initializing model
Initializing data loaders
Num train images:  15132
Num validate images:  3077
Training batch size: 5
Checkpoint interval: 1
Loss constants: [2.0, 1.6, 25.0, 0.1]
Anchor boxes: [[13.477251565102941, 11.575637337941961], [22.26137229153354, 17.66822633636613], [33.064472956296804, 25.977976230935123], [46.576946122512396, 36.33784567190777], [62.933468143405676, 48.888459798286966], [81.89252731448791, 65.28682918216337], [108.42925619967495, 83.93800920523023], [151.85350492775095, 108.38056296607171], [224.56576162251665, 148.38432899717495]]
Training image width: 800
Training image height: 800
Confidence Threshold: 0.8
Number of training classes: 80
Conv activation type: leaky
Starting learning rate: 0.001
Tile and scale mode [on]
Data augmentation mode [off]
Using Adam Optimizer
Loading weights
Model in train mode
train Epoch: 1, Batch: 1/3027, Total:   0.140322,   L-x:  1.75%,   L-y:  1.65%,   L-w: 39.55%,   L-h: 16.81%,   L-noobj:  5.19%,   L-obj: 35.06%
...
train Epoch: 1, Batch: 52/3027, Total:   0.054506,   L-x: 10.77%,   L-y:  7.58%,   L-w: 28.01%,   L-h: 23.82%,   L-noobj: 17.99%,   L-obj: 11.82%
train Epoch: 1, Batch: 53/3027, Total:   0.070273,   L-x: 10.07%,   L-y:  7.80%,   L-w: 30.77%,   L-h: 17.91%,   L-noobj: 16.89%,   L-obj: 16.56%
train Epoch: 1, Batch: 54/3027, Total:   0.048401,   L-x:  6.80%,   L-y:  4.91%,   L-w: 29.45%,   L-h: 29.46%,   L-noobj: 15.74%,   L-obj: 13.64%
train Epoch: 1, Batch: 55/3027, Total:   0.100840,   L-x:  3.98%,   L-y:  5.65%,   L-w: 23.96%,   L-h: 48.14%,   L-noobj: 10.84%,   L-obj:  7.43%
train Epoch: 1, Batch: 56/3027, Total:   0.036935,   L-x: 10.56%,   L-y:  7.83%,   L-w: 24.39%,   L-h: 15.10%,   L-noobj: 28.65%,   L-obj: 13.46%
train Epoch: 1, Batch: 57/3027, Total:   9.066699,   L-x:  0.03%,   L-y:  0.04%,   L-w: 54.93%,   L-h: 44.39%,   L-noobj:  0.49%,   L-obj:  0.11%
...
train Epoch: 1, Batch: 109/3027, Total:   0.082496,   L-x:  4.76%,   L-y:  5.46%,   L-w: 39.27%,   L-h: 26.11%,   L-noobj: 16.71%,   L-obj:  7.68%
train Epoch: 1, Batch: 110/3027, Total:   0.552456,   L-x:  0.72%,   L-y:  0.40%,   L-w: 44.09%,   L-h: 48.60%,   L-noobj:  5.31%,   L-obj:  0.89%
train Epoch: 1, Batch: 111/3027, Total:   0.044107,   L-x:  4.38%,   L-y:  5.63%,   L-w: 33.30%,   L-h: 26.69%,   L-noobj:  9.20%,   L-obj: 20.80%
train Epoch: 1, Batch: 112/3027, Total:   0.093454,   L-x:  8.43%,   L-y:  9.25%,   L-w: 30.52%,   L-h: 24.27%,   L-noobj: 16.99%,   L-obj: 10.55%
train Epoch: 1, Batch: 113/3027, Total:   1.241210,   L-x:  0.45%,   L-y:  0.36%,   L-w: 42.23%,   L-h: 46.20%,   L-noobj: 10.04%,   L-obj:  0.73%
train Epoch: 1, Batch: 114/3027, Total:   0.131975,   L-x:  3.08%,   L-y:  4.01%,   L-w: 30.57%,   L-h: 33.98%,   L-noobj: 24.34%,   L-obj:  4.02%
...
train Epoch: 1, Batch: 165/3027, Total:   0.374381,   L-x:  1.41%,   L-y:  1.37%,   L-w: 37.31%,   L-h: 41.14%,   L-noobj: 16.33%,   L-obj:  2.44%
train Epoch: 1, Batch: 166/3027, Total:   0.073205,   L-x: 12.83%,   L-y:  8.98%,   L-w: 28.66%,   L-h: 29.58%,   L-noobj: 12.20%,   L-obj:  7.74%
train Epoch: 1, Batch: 167/3027, Total:   0.140436,   L-x:  2.20%,   L-y:  2.82%,   L-w: 32.66%,   L-h: 43.86%,   L-noobj: 15.56%,   L-obj:  2.90%
train Epoch: 1, Batch: 168/3027, Total:   0.047532,   L-x: 10.34%,   L-y:  9.84%,   L-w: 27.53%,   L-h: 15.76%,   L-noobj: 25.22%,   L-obj: 11.29%
train Epoch: 1, Batch: 169/3027, Total:   0.060610,   L-x:  8.81%,   L-y:  9.53%,   L-w: 20.47%,   L-h: 20.42%,   L-noobj: 33.83%,   L-obj:  6.94%
train Epoch: 1, Batch: 170/3027, Total:   0.069860,   L-x:  5.45%,   L-y:  7.46%,   L-w: 30.62%,   L-h: 26.78%,   L-noobj: 20.78%,   L-obj:  8.91%
...
train Epoch: 1, Batch: 221/3027, Total:   0.050320,   L-x: 10.05%,   L-y: 11.43%,   L-w: 26.46%,   L-h: 19.44%,   L-noobj: 21.22%,   L-obj: 11.40%
train Epoch: 1, Batch: 222/3027, Total:   0.062805,   L-x: 11.81%,   L-y: 11.20%,   L-w: 23.86%,   L-h: 21.54%,   L-noobj: 22.86%,   L-obj:  8.74%
train Epoch: 1, Batch: 223/3027, Total:   0.029124,   L-x: 10.71%,   L-y: 11.22%,   L-w: 36.68%,   L-h: 16.40%,   L-noobj:  8.19%,   L-obj: 16.80%
train Epoch: 1, Batch: 224/3027, Total:   0.075955,   L-x:  7.28%,   L-y:  8.37%,   L-w: 34.97%,   L-h: 24.86%,   L-noobj: 17.52%,   L-obj:  6.99%
train Epoch: 1, Batch: 225/3027, Total:   0.043580,   L-x:  8.11%,   L-y: 11.80%,   L-w: 23.24%,   L-h: 23.97%,   L-noobj: 20.59%,   L-obj: 12.30%
train Epoch: 1, Batch: 226/3027, Total:   0.029883,   L-x:  8.10%,   L-y:  7.34%,   L-w: 27.09%,   L-h: 29.16%,   L-noobj: 13.96%,   L-obj: 14.35%
...
train Epoch: 1, Batch: 277/3027, Total:   0.030217,   L-x:  9.94%,   L-y: 10.15%,   L-w: 22.73%,   L-h: 12.82%,   L-noobj: 28.03%,   L-obj: 16.33%
train Epoch: 1, Batch: 278/3027, Total:   0.034928,   L-x: 12.72%,   L-y:  7.68%,   L-w: 22.94%,   L-h: 29.39%,   L-noobj: 16.72%,   L-obj: 10.56%
train Epoch: 1, Batch: 279/3027, Total:   0.041309,   L-x:  5.62%,   L-y: 11.11%,   L-w: 26.90%,   L-h: 27.61%,   L-noobj: 14.11%,   L-obj: 14.64%
train Epoch: 1, Batch: 280/3027, Total:   0.042718,   L-x:  9.29%,   L-y:  9.32%,   L-w: 25.54%,   L-h: 18.44%,   L-noobj: 19.28%,   L-obj: 18.11%
train Epoch: 1, Batch: 281/3027, Total:   0.031070,   L-x: 10.30%,   L-y:  6.69%,   L-w: 31.14%,   L-h: 20.21%,   L-noobj: 15.01%,   L-obj: 16.66%
train Epoch: 1, Batch: 282/3027, Total:   0.179202,   L-x:  3.63%,   L-y:  3.50%,   L-w: 32.42%,   L-h: 33.55%,   L-noobj: 23.56%,   L-obj:  3.34%
...
train Epoch: 1, Batch: 333/3027, Total:   0.046408,   L-x:  7.35%,   L-y:  7.81%,   L-w: 29.91%,   L-h: 24.66%,   L-noobj: 22.22%,   L-obj:  8.05%
train Epoch: 1, Batch: 334/3027, Total:   0.024104,   L-x: 10.51%,   L-y:  9.81%,   L-w: 21.40%,   L-h: 19.07%,   L-noobj: 29.22%,   L-obj:  9.99%
train Epoch: 1, Batch: 335/3027, Total:   0.064102,   L-x:  8.89%,   L-y: 13.71%,   L-w: 23.16%,   L-h: 14.88%,   L-noobj: 28.70%,   L-obj: 10.66%
train Epoch: 1, Batch: 336/3027, Total:  53.450890,   L-x:  0.01%,   L-y:  0.02%,   L-w: 49.44%,   L-h: 50.09%,   L-noobj:  0.43%,   L-obj:  0.02%
train Epoch: 1, Batch: 337/3027, Total:   2.868995,   L-x:  0.22%,   L-y:  0.27%,   L-w: 49.04%,   L-h: 49.37%,   L-noobj:  1.00%,   L-obj:  0.11%
train Epoch: 1, Batch: 338/3027, Total:   0.027051,   L-x:  8.71%,   L-y:  6.06%,   L-w: 25.32%,   L-h: 22.86%,   L-noobj: 28.36%,   L-obj:  8.68%
...
train Epoch: 1, Batch: 389/3027, Total:   0.027310,   L-x: 11.69%,   L-y: 10.31%,   L-w: 24.48%,   L-h: 16.01%,   L-noobj: 24.26%,   L-obj: 13.25%
train Epoch: 1, Batch: 390/3027, Total:   0.923151,   L-x:  0.64%,   L-y:  0.38%,   L-w: 47.14%,   L-h: 48.03%,   L-noobj:  3.43%,   L-obj:  0.38%
train Epoch: 1, Batch: 391/3027, Total:   0.025775,   L-x:  8.23%,   L-y:  9.45%,   L-w: 27.80%,   L-h: 28.34%,   L-noobj: 18.76%,   L-obj:  7.41%
train Epoch: 1, Batch: 392/3027, Total:   0.064494,   L-x:  5.93%,   L-y:  8.05%,   L-w: 23.65%,   L-h: 27.83%,   L-noobj: 27.27%,   L-obj:  7.28%
train Epoch: 1, Batch: 393/3027, Total:   1.070898,   L-x:  0.36%,   L-y:  0.39%,   L-w: 47.59%,   L-h: 48.62%,   L-noobj:  2.71%,   L-obj:  0.33%
train Epoch: 1, Batch: 394/3027, Total:   2.444996,   L-x:  0.30%,   L-y:  0.43%,   L-w: 47.55%,   L-h: 48.94%,   L-noobj:  2.49%,   L-obj:  0.29%
...
train Epoch: 1, Batch: 445/3027, Total:   0.044812,   L-x: 12.58%,   L-y: 16.76%,   L-w: 19.79%,   L-h: 15.04%,   L-noobj: 19.78%,   L-obj: 16.06%
train Epoch: 1, Batch: 446/3027, Total:   0.029001,   L-x:  9.79%,   L-y: 11.99%,   L-w: 26.69%,   L-h: 25.01%,   L-noobj: 13.54%,   L-obj: 12.97%
train Epoch: 1, Batch: 447/3027, Total:   0.062081,   L-x:  4.58%,   L-y:  5.51%,   L-w: 38.98%,   L-h: 34.73%,   L-noobj: 11.67%,   L-obj:  4.54%
train Epoch: 1, Batch: 448/3027, Total:   0.079656,   L-x: 10.97%,   L-y:  9.90%,   L-w: 19.39%,   L-h: 16.64%,   L-noobj: 38.16%,   L-obj:  4.93%
train Epoch: 1, Batch: 449/3027, Total:   0.022862,   L-x:  9.72%,   L-y:  9.21%,   L-w: 24.65%,   L-h: 18.48%,   L-noobj: 24.10%,   L-obj: 13.84%
train Epoch: 1, Batch: 450/3027, Total:   0.032049,   L-x: 14.05%,   L-y: 16.78%,   L-w: 25.03%,   L-h: 22.83%,   L-noobj: 12.08%,   L-obj:  9.22%
...
KeyboardInterrupt: 
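
Training stops early once the average validation loss has failed to improve for `val_tolerance` consecutive checkpoints, but only epochs after `min_epochs` can count as failures. The sketch replays that rule on a list of per-epoch validation losses, assuming `checkpoint_interval = 1` as in the config above; `early_stopping_trace` is a hypothetical helper, not part of the repository.

```python
def early_stopping_trace(val_losses, val_tolerance=3, min_epochs=3):
    """Replay the training loop's checkpoint logic on per-epoch validation
    losses (assumes a checkpoint every epoch).

    Returns (stop_epoch, best_loss); stop_epoch is None when training would
    run through all the given epochs without triggering early stopping.
    """
    best = 999  # same sentinel value the notebook uses for val_loss
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss > best and epoch > min_epochs:
            counter += 1          # no improvement after the warm-up epochs
        else:
            counter = 0           # improvement, or still within min_epochs
            best = loss
        if counter == val_tolerance:
            return epoch, best
    return None, best


print(early_stopping_trace([1.0, 0.8, 0.9, 0.7, 0.75, 0.76, 0.8]))
```

The replay makes one quirk visible: during the first `min_epochs` epochs the else-branch always runs, so `val_loss` is overwritten even when the loss went up (for example, `early_stopping_trace([1.0, 2.0])` reports a best loss of 2.0).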

## 3. Inference

Download the target video file for inference:

In [10]:
from IPython.display import Video

! wget https://storage.googleapis.com/mit-driverless-open-source/test_video.mp4

! ffmpeg -i test_video.mp4 test.mp4 && rm test_video.mp4

Video("test.mp4")

--2020-08-25 13:54:22--  https://storage.googleapis.com/mit-driverless-open-source/test_video.mp4
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.212.128, 172.217.214.128, 172.253.119.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.212.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12062655 (12M) [video/mp4]
Saving to: ‘test_video.mp4’


2020-08-25 13:54:22 (273 MB/s) - ‘test_video.mp4’ saved [12062655/12062655]


Download the pretrained weights for inference:

In [11]:
! wget https://storage.googleapis.com/mit-driverless-open-source/pretrained_yolo.weights

--2020-08-25 13:54:30--  https://storage.googleapis.com/mit-driverless-open-source/pretrained_yolo.weights
Resolving storage.googleapis.com (storage.googleapis.com)... 108.177.111.128, 172.253.119.128, 172.217.214.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|108.177.111.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 248007048 (237M) [application/octet-stream]
Saving to: ‘pretrained_yolo.weights’


2020-08-25 13:54:33 (105 MB/s) - ‘pretrained_yolo.weights’ saved [248007048/248007048]



Import the additional packages needed for inference:

In [12]:
import os
from os.path import isfile, join
import cv2
from PIL import Image, ImageDraw
import torchvision
from utils.nms import nms
from utils.utils import calculate_padding
from tqdm import tqdm

In [13]:
warnings.filterwarnings("ignore")
detection_tmp_path = "/tmp/detect/"

Set up the inference configuration:

In [14]:
target_path = "test.mp4"
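output_path = "outputs/visualization/"
os.makedirs(output_path, exist_ok=True)  # make sure the output folder exists before writing the video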
weights_path = "pretrained_yolo.weights"
conf_thres = float(0.8)
nms_thres = float(0.25)
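
`single_img_detect` keeps only detections whose objectness score exceeds `conf_thres`, converts them to corner boxes, and passes them to `utils.nms.nms` with `nms_thres`. The sketch below shows the textbook greedy non-maximum suppression those two thresholds control, in plain Python. It illustrates the standard algorithm under that assumption; the repository's `nms` may differ in details such as tie handling or whether the IoU comparison is strict. `iou` and `filter_and_nms` are hypothetical names.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, conf_thres=0.8, nms_thres=0.25):
    """Drop boxes scoring at or below conf_thres, then run greedy NMS:
    keep the highest-scoring box, discard the remaining boxes that overlap it
    with IoU > nms_thres, and repeat. Returns indices into `boxes`."""
    order = sorted((i for i, s in enumerate(scores) if s > conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= nms_thres]
    return keep
```

A low `nms_thres` such as 0.25 suppresses aggressively, which suits cones that rarely overlap much in the image; raising it keeps more boxes around tightly clustered cones.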

In [15]:
def single_img_detect(target_path,output_path,mode,model,device,conf_thres,nms_thres):

    img = Image.open(target_path).convert('RGB')
    w, h = img.size
    new_width, new_height = model.img_size()
    pad_h, pad_w, ratio = calculate_padding(h, w, new_height, new_width)
    img = torchvision.transforms.functional.pad(img, padding=(pad_w, pad_h, pad_w, pad_h), fill=(127, 127, 127), padding_mode="constant")
    img = torchvision.transforms.functional.resize(img, (new_height, new_width))

    bw = model.get_bw()
    if bw:
        img = torchvision.transforms.functional.to_grayscale(img, num_output_channels=1)

    img = torchvision.transforms.functional.to_tensor(img)
    img = img.unsqueeze(0)
    
    with torch.no_grad():
        model.eval()
        img = img.to(device, non_blocking=True)
        # output,first_layer,second_layer,third_layer = model(img)
        output = model(img)


        for detections in output:
            detections = detections[detections[:, 4] > conf_thres]
            box_corner = torch.zeros((detections.shape[0], 4), device=detections.device)
            xy = detections[:, 0:2]
            wh = detections[:, 2:4] / 2
            box_corner[:, 0:2] = xy - wh
            box_corner[:, 2:4] = xy + wh
            probabilities = detections[:, 4]
            nms_indices = nms(box_corner, probabilities, nms_thres)
            main_box_corner = box_corner[nms_indices]
            if nms_indices.shape[0] == 0:  
                continue
        img_with_boxes = Image.open(target_path)
        draw = ImageDraw.Draw(img_with_boxes)
        w, h = img_with_boxes.size

        for i in range(len(main_box_corner)):
            x0 = main_box_corner[i, 0].to('cpu').item() / ratio - pad_w
            y0 = main_box_corner[i, 1].to('cpu').item() / ratio - pad_h
            x1 = main_box_corner[i, 2].to('cpu').item() / ratio - pad_w
            y1 = main_box_corner[i, 3].to('cpu').item() / ratio - pad_h 
            draw.rectangle((x0, y0, x1, y1), outline="red")

        if mode == 'image':
            img_with_boxes.save(os.path.join(output_path,target_path.split('/')[-1]))
            return os.path.join(output_path,target_path.split('/')[-1])
        else:
            img_with_boxes.save(target_path)
            return target_path
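
Detections come out of the network in the padded, resized `img_size()` frame as center/size boxes, so `single_img_detect` converts them to corners and then maps them back with `x / ratio - pad_w` and `y / ratio - pad_h`. The sketch below walks through that round trip. `letterbox_params` is a hypothetical stand-in for `calculate_padding`: it assumes symmetric padding of the shorter side to the target aspect ratio followed by a uniform resize, which is consistent with the pad-then-resize order above but is not taken from the repository.

```python
def center_to_corners(x, y, w, h):
    """(center x, center y, width, height) -> (x0, y0, x1, y1), as in single_img_detect."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def letterbox_params(orig_h, orig_w, new_h, new_w):
    """Hypothetical stand-in for calculate_padding (assumption): pad the
    shorter side symmetrically to the target aspect ratio, then resize by
    `ratio` = target size / padded size. Returns (pad_h, pad_w, ratio)."""
    if orig_w / orig_h > new_w / new_h:      # too wide: pad top and bottom
        pad_w = 0
        pad_h = (orig_w * new_h / new_w - orig_h) / 2
    else:                                    # too tall: pad left and right
        pad_h = 0
        pad_w = (orig_h * new_w / new_h - orig_w) / 2
    ratio = new_w / (orig_w + 2 * pad_w)
    return pad_h, pad_w, ratio


def to_original(x0, y0, x1, y1, pad_h, pad_w, ratio):
    """Undo the resize, then the padding (same formula as single_img_detect)."""
    return (x0 / ratio - pad_w, y0 / ratio - pad_h,
            x1 / ratio - pad_w, y1 / ratio - pad_h)


pad_h, pad_w, ratio = letterbox_params(720, 1280, 800, 800)
print(to_original(*center_to_corners(400, 400, 50, 100), pad_h, pad_w, ratio))
```

For a 1280x720 frame and an 800x800 network input, this gives `pad_h = 280`, `pad_w = 0`, and `ratio = 0.625`, so a box centered at (400, 400) in network coordinates maps back to one centered at (640, 360) in the original frame.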

In [16]:
def detect(target_path,
           output_path,
           model,
           device,
           conf_thres,
           nms_thres):

        target_filepath = target_path

        img_formats = ['.jpg', '.jpeg', '.png', '.tif']
        vid_formats = ['.mov', '.avi', '.mp4']

        mode = None
        extension = os.path.splitext(target_filepath)[-1].lower()

        if extension in img_formats:
            mode = 'image'
        elif extension in vid_formats:
            mode = 'video'
        else:
            raise ValueError(f"Unsupported file type: {target_filepath}")

        print("Detection Mode is: " + mode)

        raw_file_name = target_filepath.split('/')[-1].split('.')[0].split('_')[-4:]
        raw_file_name = '_'.join(raw_file_name)
        
        if mode == 'image':
            detection_path = single_img_detect(target_path=target_filepath,output_path=output_path,mode=mode,model=model,device=device,conf_thres=conf_thres,nms_thres=nms_thres)

            print(f'Please check output image at {detection_path}')

        elif mode == 'video':
            if os.path.exists(detection_tmp_path):
                shutil.rmtree(detection_tmp_path)  # delete output folder
            os.makedirs(detection_tmp_path)  # make new output folder

            vidcap = cv2.VideoCapture(target_filepath)
            success,image = vidcap.read()
            count = 0

            

            while success:
                cv2.imwrite(detection_tmp_path + "/frame%d.jpg" % count, image)     # save frame as JPEG file      
                success,image = vidcap.read()
                count += 1

            # Find OpenCV major version (robust to version strings with more than three parts)
            major_ver = int(cv2.__version__.split('.')[0])

            if major_ver < 3:
                fps = vidcap.get(cv2.cv.CV_CAP_PROP_FPS)
                print("Frames per second using video.get(cv2.cv.CV_CAP_PROP_FPS): {0}".format(fps))
            else:
                fps = vidcap.get(cv2.CAP_PROP_FPS)
                print("Frames per second using video.get(cv2.CAP_PROP_FPS) : {0}".format(fps))
            vidcap.release()

            frame_array = []
            files = [f for f in os.listdir(detection_tmp_path) if isfile(join(detection_tmp_path, f))]
        
            #for sorting the file names properly
            files.sort(key = lambda x: int(x[5:-4]))
            for i in tqdm(files,desc='Doing Single Image Detection'):
                filename=detection_tmp_path + i
                
                detection_path = single_img_detect(target_path=filename,output_path=output_path,mode=mode,model=model,device=device,conf_thres=conf_thres,nms_thres=nms_thres)
                #reading each files
                img = cv2.imread(detection_path)
                height, width, layers = img.shape
                size = (width,height)
                frame_array.append(img)

            local_output_uri = output_path + raw_file_name + ".mp4"
            
            video_output = cv2.VideoWriter(local_output_uri,cv2.VideoWriter_fourcc(*'DIVX'), fps, size)

            for frame in tqdm(frame_array,desc='Creating Video'):
                # writing to a image array
                video_output.write(frame)
            video_output.release()
            shutil.rmtree(detection_tmp_path)
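
`detect` writes frames as `frame0.jpg`, `frame1.jpg`, and so on, and `os.listdir` returns them in arbitrary order, so the frames must be sorted numerically before the video is reassembled; a plain string sort would put `frame10.jpg` before `frame2.jpg`. The key `int(x[5:-4])` does this by slicing off the `frame` prefix and the `.jpg` suffix. The sketch shows an equivalent key that rejects unexpected file names instead of silently mis-parsing them; `frame_index` is a hypothetical name.

```python
import re


def frame_index(filename):
    """Extract N from 'frameN.jpg'. Matches int(x[5:-4]) for names of that
    exact shape, but raises ValueError for anything else."""
    match = re.fullmatch(r"frame(\d+)\.jpg", filename)
    if match is None:
        raise ValueError(f"unexpected frame file name: {filename!r}")
    return int(match.group(1))


names = ["frame10.jpg", "frame2.jpg", "frame1.jpg"]
print(sorted(names))                   # string order: frame10 lands before frame2
print(sorted(names, key=frame_index))  # numeric order: correct playback order
```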

In [17]:
cuda = torch.cuda.is_available()
device = torch.device('cuda:0' if cuda else 'cpu')
random.seed(0)
torch.manual_seed(0)
if cuda:
    torch.cuda.manual_seed(0)
    torch.cuda.manual_seed_all(0)
    torch.backends.cudnn.benchmark = True
    torch.cuda.empty_cache()
model = Darknet(config_path=model_cfg,xy_loss=xy_loss,wh_loss=wh_loss,no_object_loss=no_object_loss,object_loss=object_loss,vanilla_anchor=vanilla_anchor)

# Load weights
model.load_weights(weights_path, model.get_start_weight_dim())
model.to(device, non_blocking=True)

detect(target_path,
        output_path,
        model,
        device=device,
        conf_thres=conf_thres,
        nms_thres=nms_thres)

Detection Mode is: video


Doing Single Image Detection:   0%|          | 0/646 [00:00<?, ?it/s]

Frames per second using video.get(cv2.CAP_PROP_FPS) : 26.0


Doing Single Image Detection: 100%|██████████| 646/646 [01:15<00:00,  8.55it/s]
Creating Video: 100%|██████████| 646/646 [00:03<00:00, 184.07it/s]


In [18]:
! cd outputs/visualization/ && ffmpeg -i test.mp4 output.mp4 && rm test.mp4 && cd ../..


In [19]:
Video("outputs/visualization/output.mp4")

**Notice:** You may be able to further improve the accuracy of the cone detection network by replacing YOLOv3 with the more recent YOLOv4.

![](https://user-images.githubusercontent.com/22118253/70950893-e2de6980-202f-11ea-9a16-399579926ee5.gif)

Congratulations! You've finished all the content of this tutorial!
We hope you enjoy playing with our object detection model. If you are interested, please refer to our paper and GitHub repo for further details.

## Reference
[1] Kieran Strobel, Sibo Zhu, Raphael Chang and Skanda Koppula.
**Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions**. In *IROS* 2020.
[[paper]](https://arxiv.org/abs/2007.13971), [[code]](https://github.com/cv-core/MIT-Driverless-CV-TrainingInfra).