# DSI18 Capstone Project:
## Drone detection in images and videos using YOLOv5

Credits to <a href = "https://github.com/ultralytics/yolov5">ultralytics</a> and <a href ="https://roboflow.com/">Roboflow</a> for the YOLOv5 and Google Colab code, and to <a href = "https://github.com/chuanenlin/drone-net">Chuan En Lin</a> for the dataset.

---

# Problem Statement
There is an increasing availability of small, cheap unmanned aerial vehicles equipped with cameras. As such, there is a corresponding requirement for a low-cost, quick and easily deployable system to alert security stakeholders of the presence of such unmanned vehicles so that further actions can be taken.

# Executive Summary
A dataset of 2664 drone images was uploaded to Roboflow. The dataset was then examined: incorrect bounding boxes were corrected and inappropriate images removed, leaving 2492 images. These images were then augmented with transformed copies, resulting in a final dataset of 4984 images. A custom YOLOv5m model was trained on this dataset using Google Colab, achieving an mAP of 0.985. The best weights were then saved, downloaded and used to build a standalone deployment model.

---

# Part 1 - EDA / Data Cleaning

A drone image dataset was obtained from Chuan En Lin (credited at the start of the notebook). The dataset consisted of 2664 images along with associated labels that were normalized to image size, as per YOLO image labeling requirements.
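As a minimal sketch of what "normalized to image size" means here: a YOLO label line stores the class id followed by the box centre, width and height, each expressed as a fraction of the image width or height. The helper name `to_yolo_label` and the sample box below are hypothetical and only serve as an illustration.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    # YOLO stores the box centre and size as fractions of the image dimensions
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# hypothetical example: a 200x100 pixel box centred in a 400x200 image
label = to_yolo_label(0, (100, 50, 300, 150), (400, 200))
print(label)
```

Because every value is a fraction between 0 and 1, the same label stays valid if the image is resized.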

However, a quick examination of the dataset revealed several issues that required rectification:
- Several of the label text files had multiple bounding boxes that did not correspond to the locations of the drones.
- Several of the images had incorrect bounding boxes.
- Several images were not representative of the problem statement.
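One quick way to surface the first issue is to count the non-empty lines in each label file. The sketch below assumes one bounding box per line and builds a throwaway folder with made-up label files so that it runs on its own; `count_boxes` and `files_with_multiple_boxes` are hypothetical helper names, not part of the original notebook.

```python
import tempfile
from pathlib import Path


def count_boxes(label_dir):
    """Return {file name: number of non-empty label lines} for every .txt file."""
    counts = {}
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        lines = [ln for ln in label_file.read_text().splitlines() if ln.strip()]
        counts[label_file.name] = len(lines)
    return counts


def files_with_multiple_boxes(label_dir):
    """List the label files that describe more than one bounding box."""
    return [name for name, n in count_boxes(label_dir).items() if n > 1]


# made-up label files purely for demonstration
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("0 0.5 0.5 0.2 0.2\n")
    Path(tmp, "b.txt").write_text("0 0.5 0.5 0.2 0.2\n0 0.1 0.1 0.05 0.05\n")
    flagged = files_with_multiple_boxes(tmp)

print(flagged)
```

Pointing `files_with_multiple_boxes` at the real label folder would give a list of files to inspect before any lines are stripped.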

The images and labels were uploaded to <a href="https://roboflow.ai">Roboflow</a>, an online image store and labelling tool for object detection datasets.

The dataset was then manually examined; images were removed and bounding boxes corrected as required, leaving 2492 images. Roboflow also automated the creation of transformed images, as well as the train/validation/test splits.

This resulted in a final dataset of 4984 images (3508 train, 994 validation, 498 test).
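The final count is exactly twice the cleaned count (2 × 2492 = 4984), which is consistent with one transformed copy per image. The notebook does not say which transformations Roboflow applied, so the following is only an illustrative sketch of why labels must be transformed together with the image, using a horizontal flip as an assumed example. The function name `hflip_yolo_label` is hypothetical.

```python
def hflip_yolo_label(line):
    """Mirror one YOLO label line left-to-right; only x_center changes."""
    class_id, x_center, y_center, width, height = line.split()
    # a horizontal flip maps a normalized x coordinate to 1 - x
    flipped_x = 1.0 - float(x_center)
    return (
        f"{class_id} {flipped_x:.6f} {float(y_center):.6f} "
        f"{float(width):.6f} {float(height):.6f}"
    )


# hypothetical label: a box centred a quarter of the way across the image
flipped = hflip_yolo_label("0 0.25 0.40 0.10 0.20")
print(flipped)
```

Tools such as Roboflow handle this bookkeeping automatically, which is one reason the augmentation was done there rather than by hand.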

Unfortunately, due to size constraints for free accounts, the final dataset had to be removed from Roboflow, but it will be available in the repo.

## Multiple bounding boxes

The code below was written to strip the label txt files of additional bounding boxes.

In [None]:
# imports
from os import listdir
from os.path import isfile, join

import pandas as pd
from PIL import Image

In [None]:
# obtaining the file paths
mypath = "raw_data/drones/normalized-labels/"

In [None]:
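# creating a list of label files, skipping the ".DS_Store" file (a Mac only issue)
# filtering avoids the ValueError that list.remove() raises when the file is absent
filelist = [i for i in listdir(mypath) if isfile(join(mypath, i)) and i != '.DS_Store']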

In [None]:
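# stripping lines after the first line
for i in filelist:
    with open(join(mypath, i), "r+") as f:
        first_line = f.readline()  # keep only the first bounding box
        # replace the leading single-character class id with the "drone" label
        result = "drone" + first_line[1:]
        f.seek(0)
        f.write(result)
        f.truncate()  # discard everything after the rewritten first line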

---
## Incorrect bounding boxes

The images below are examples of images with incorrect bounding boxes. These bounding boxes were manually redrawn in Roboflow.
<br>
<br>
<img src = "images/bb1.png">
<img src = "images/bb2.png">

---
## Misrepresentative images

The images below are examples of images that misrepresented the data needed. Images in this category include extreme close-ups of drones or bounding boxes that include items other than drones. These images were selectively re-bounded or removed.
<br>
<br>
<img src = "images/wc1.png">
<img src = "images/wc2.png">

---

# Part 2 - Model Training and Results

The model was trained using Google Colab.

Please refer to the Colab workbook link <a href = "https://colab.research.google.com/drive/1KI_WCsNCKvP-nLvOPKc6DnrUh6_0aEPt?usp=sharing">here</a>. 

Do note that the direct link to the image dataset on Roboflow was removed due to security concerns, so the notebook cannot be run as-is.

---

# Part 3 - Deployment

The cloned YOLOv5 repo, while not large, contains a number of extraneous files (e.g. train.py, test.py), which were removed. A custom GUI, as well as new input and output locations, was then added to the production model's detect.py file. The default weights and values were amended accordingly, as was the requirements.txt file.

The final size of the application is 42.8 MB.

## Example detect.py code

In [None]:
import argparse
import time
from pathlib import Path

from gooey import Gooey

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import check_img_size, check_requirements, non_max_suppression, scale_coords, \
    xyxy2xywh, set_logging, increment_path
from utils.plots import plot_one_box
from utils.torch_utils import select_device, time_synchronized


def detect(opt, save_img=False):
    source, weights, view_img, save_txt, imgsz = opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(
        ('rtsp://', 'rtmp://', 'http://'))

    # Directories
    save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Initialize
    set_logging()
    device = select_device(opt.device)
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
    if half:
        model.half()  # to FP16

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = True
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz)
    else:
        save_img = True
        dataset = LoadImages(source, img_size=imgsz)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]

    # Run inference
    t0 = time.time()
    img = torch.zeros((1, 3, imgsz, imgsz), device=device)  # init img
    _ = model(img.half() if half else img) if device.type != 'cpu' else None  # run once
    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # Inference
        t1 = time_synchronized()
        pred = model(img)[0]

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t2 = time_synchronized()

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
            else:
                p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)

            p = Path(p)  # to Path
            save_path = str(save_dir / p.name)  # img.jpg
            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += f'{n} {names[int(c)]}s, '  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * len(line)).rstrip() % line + '\n')

                    if save_img or view_img:  # Add bbox to image
                        label = f'{names[int(cls)]} {conf:.2f}'
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

            # Print time (inference + NMS)
            print(f'{s}Done. ({t2 - t1:.3f}s)')

            # Stream results
            if view_img:
                cv2.imshow(str(p), im0)

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'image':
                    cv2.imwrite(save_path, im0)
                else:  # 'video'
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer

                        fourcc = 'mp4v'  # output video codec
                        fps = vid_cap.get(cv2.CAP_PROP_FPS)
                        w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                        h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*fourcc), fps, (w, h))
                    vid_writer.write(im0)

    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        print(f"Results saved to {save_dir}{s}")

    print(f'Done. ({time.time() - t0:.3f}s)')

@Gooey()
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='weights.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='../input', help='source')  # file/folder, 0 for webcam
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.70, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--project', default='../output', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    opt = parser.parse_args()
    print(opt)
    check_requirements()

    # run inference without tracking gradients to reduce memory use
    with torch.no_grad():
        detect(opt=opt)

if __name__ == '__main__':
    main()

---

# Part 4 - Conclusion / Discussion

## Misclassification examination

While the model was successful in classifying and localizing drones, several issues became apparent when examining the misclassifications. 

Firstly, the model seemed to have trouble with blurry images.

<img src = "images/pred1.jpg">
<br>
<br>
<br>
Secondly, the model had issues with non-DJI drones.

<img src = "images/predlow.jpg">
<br>
<br>
Further examination of the dataset suggests that these misclassifications were caused by an insufficient variety of data. While the dataset contained a large number of images, most showed DJI drones (mostly featuring 4 rotors and an underslung camera) shot at similar angles and ranges.



## Further Research / Production Tasks

While the production model works as a standalone system, more work will need to be done to tie in the detections with a customer's alerting system/workflow. 
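As a rough sketch of what such a tie-in could look like, the snippet below reads label files of the kind detect.py writes when run with `--save-txt --save-conf` (one line per detection: class, x, y, w, h, confidence) and hands any sufficiently confident detection to an alert hook. The function names, the sample files and the `send_alert` hook are all hypothetical; a real integration would replace the `print` call with the customer's own alerting mechanism.

```python
import tempfile
from pathlib import Path


def detections_above(label_dir, min_conf):
    """Collect (file name, confidence) for every saved detection at or above min_conf."""
    hits = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if len(fields) != 6:  # expects class, x, y, w, h, confidence
                continue
            conf = float(fields[5])
            if conf >= min_conf:
                hits.append((label_file.name, conf))
    return hits


def send_alert(hits):
    """Hypothetical hook: swap the print for the customer's alerting call."""
    for name, conf in hits:
        print(f"ALERT: drone detected in {name} (confidence {conf:.2f})")


# made-up label files standing in for a real output/exp/labels folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "frame_1.txt").write_text("0 0.5 0.5 0.1 0.1 0.72\n")
    Path(tmp, "frame_2.txt").write_text("0 0.3 0.4 0.2 0.1 0.91\n")
    hits = detections_above(tmp, min_conf=0.8)

send_alert(hits)
```

Reading the saved label files keeps the alerting logic decoupled from detect.py, at the cost of some latency; a tighter integration would call the hook directly from the detection loop.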

Additionally, the availability of several versions of YOLOv5 means that any such model might have to be retrained on bigger/smaller networks to suit the customer's hardware requirements and operational needs. Further tweaking of both the IOU and confidence thresholds might also be required.
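To make the effect of the two thresholds concrete, here is a simplified, pure-Python sketch of greedy non-maximum suppression. It is not YOLOv5's own `non_max_suppression` implementation; the helper names and the sample detections are hypothetical. The values 0.70 and 0.45 are the defaults set in the detect.py shown above.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(detections, conf_thres, iou_thres):
    """Greedy NMS over (box, confidence) pairs; returns the kept detections."""
    # drop low-confidence boxes, then visit the rest from most to least confident
    candidates = sorted(
        (d for d in detections if d[1] >= conf_thres),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, conf in candidates:
        # keep a box only if it does not overlap a kept box too strongly
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, conf))
    return kept


# made-up detections for illustration
detections = [
    ((10, 10, 110, 110), 0.95),
    ((20, 20, 120, 120), 0.80),    # overlaps the first box heavily
    ((200, 200, 260, 260), 0.75),
    ((300, 300, 340, 340), 0.40),  # low confidence
]
strict = simple_nms(detections, conf_thres=0.70, iou_thres=0.45)
loose = simple_nms(detections, conf_thres=0.30, iou_thres=0.90)
print(len(strict), len(loose))
```

A higher confidence threshold trades missed drones for fewer false alarms, while a higher IOU threshold allows more overlapping boxes to survive; the right balance depends on the customer's tolerance for each kind of error.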

Finally, if possible, a larger dataset should be gathered of drones at different ranges in different orientations, as well as non-DJI drones. This will increase the generalizability of the final model.