# Formatting labels as polygonal segments
We take the annotations exported from PlainSight in COCO format, convert them to YOLOv5 format, and plot the resulting annotations as a sanity check. We also combine the separate exports from Pete and Thomas.

In [1]:
import os
import sys

# Root directory
root_dir = '/Users/Holmes/Research/Projects/vespalert'
os.chdir(root_dir)  # Move to root_dir
sys.path.insert(0, root_dir)

# Data directory
data_dir = os.path.join(root_dir, 'datasets/polygons-21')
os.listdir(data_dir)

# Automatically reload imported modules when their source changes
%load_ext autoreload
%autoreload 2

## Create YOLOv5 annotations from JSON files
The first function reads each JSON file in the list and extracts the annotations attached to each image file. Its output is a list of dictionaries, one per image.

Next it creates a temporary directory `ann` in which to store the YOLOv5-formatted annotations. Optionally, the box sides are expanded by a proportion of their tight-fitting length, which helps to include the legs.
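
As a rough illustration of the optional expansion, the sketch below grows a tight-fitting box by a proportion of its side lengths and clips it to the image. The function name `expand_box`, its arguments, and the choice to split the padding evenly between opposite sides are all assumptions made for this example; the actual behaviour of `yolov5_polygons_from_json` may differ.

```python
def expand_box(x_min, y_min, x_max, y_max, proportion, img_w, img_h):
    """Grow a pixel box by `proportion` of each side length (hypothetical helper).

    Half of the extra length is added on each side, and the result is
    clipped so that it stays inside the image.
    """
    pad_x = 0.5 * proportion * (x_max - x_min)
    pad_y = 0.5 * proportion * (y_max - y_min)
    return (
        max(0.0, x_min - pad_x),
        max(0.0, y_min - pad_y),
        min(float(img_w), x_max + pad_x),
        min(float(img_h), y_max + pad_y),
    )


# A 100 x 50 box widened by 50% of its side lengths
expanded = expand_box(50, 20, 150, 70, proportion=0.5, img_w=640, img_h=480)
```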

In [None]:
from formatting.polygons import yolov5_polygons_from_json
json_files = [
    'train-kitchen-a.json',
    'train-pete-a.json',
    'train-thomas-a.json',
    'train-thomas-b.json',
    'test-kitchen-a.json',
    'test-pete-a.json',
    'test-thomas-a.json',
    'test-thomas-b.json',
    'validation-kitchen-a.json',
    'validation-pete-a.json',
    'validation-thomas-a.json',
    'validation-thomas-b.json',
]
dict_list = yolov5_polygons_from_json(data_dir, json_files)

In [None]:
dict_list[0]
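
To make the conversion concrete, here is a minimal sketch of how one COCO polygon could be turned into a YOLOv5 segmentation label line. It assumes the usual COCO layout (a flat list `[x1, y1, x2, y2, ...]` in pixels) and the YOLOv5 convention of a class index followed by coordinates normalised by image width and height. The function `coco_polygon_to_yolo_line` is hypothetical and is not taken from `formatting.polygons`.

```python
def coco_polygon_to_yolo_line(class_id, polygon, img_w, img_h):
    """Format one polygon as 'class x1 y1 x2 y2 ...' with normalised coordinates.

    `polygon` is a flat COCO-style list of pixel coordinates.
    `class_id` is assumed to be the zero-based YOLOv5 class index already.
    """
    xs = polygon[0::2]  # even positions hold x values
    ys = polygon[1::2]  # odd positions hold y values
    coords = []
    for x, y in zip(xs, ys):
        coords.append(f"{x / img_w:.6f}")
        coords.append(f"{y / img_h:.6f}")
    return " ".join([str(class_id)] + coords)


# A triangle in a 640 x 480 image
line = coco_polygon_to_yolo_line(1, [0, 0, 320, 0, 320, 240], 640, 480)
```

Note that COCO category ids often start at 1 while YOLOv5 class indices start at 0, so a real converter would typically need a mapping between the two.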

## Split into data subsets
Next we split the image and annotation files 80:10:10 into `train`, `val` and `test` subsets, stored as subdirectories of the new folders `images` and `labels`.

In [None]:
from formatting.boxes import split_train_val_test

split_train_val_test(data_dir)
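
For reference, an 80:10:10 split can be sketched as below. This is only an illustration on a list of file names: the helper `split_80_10_10` and its seeded shuffle are assumptions, and `split_train_val_test` may assign files differently (for example, by the prefix of the JSON file each image came from).

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle `items` reproducibly and split them 80:10:10 (hypothetical helper)."""
    items = sorted(items)               # fixed starting order
    random.Random(seed).shuffle(items)  # seeded, so the split is repeatable
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return {
        "train": items[:n_train],
        "val": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],  # the remainder goes to test
    }


names = [f"img-{i:03d}.jpg" for i in range(100)]
subsets = split_80_10_10(names)
```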

Optionally, store the locations of the `train`, `val` and `test` files in a local YAML file for the model to read.

In [None]:
from formatting.boxes import write_yaml

yaml_name = 'config-local.yaml'
write_yaml(data_dir, yaml_name)
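
The sketch below shows roughly what such a YOLOv5 dataset YAML might contain, using the standard `path`, `train`, `val`, `test`, `nc` and `names` keys. The helper `dataset_yaml_text`, the example path, and the class order are assumptions for illustration; the file actually produced by `write_yaml` may be laid out differently.

```python
def dataset_yaml_text(data_dir, class_names):
    """Build the text of a YOLOv5-style dataset YAML (hypothetical helper)."""
    quoted = ", ".join(f"'{name}'" for name in class_names)
    lines = [
        f"path: {data_dir}",      # dataset root
        "train: images/train",    # image folders, relative to `path`
        "val: images/val",
        "test: images/test",
        f"nc: {len(class_names)}",  # number of classes
        f"names: [{quoted}]",
    ]
    return "\n".join(lines) + "\n"


# Class order here is an assumption, not taken from the exported data
yaml_text = dataset_yaml_text(
    "datasets/polygons-21", ["Vespa crabro", "Vespa velutina"]
)
```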

## Plot example annotated images
Vespa crabro is segmented in *yellow*;
Vespa velutina in *red*.

In [None]:
import glob
import random
from formatting.polygons import check_polys
%matplotlib inline

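# Sort so that the seeded selection below is reproducible
annotation_files = sorted(glob.glob(os.path.join(data_dir, 'labels/train/*.txt')))

# Randomly choose annotation files (without repeats) to overlay onto images
random.seed(0)
selection = random.sample(annotation_files, k=min(10, len(annotation_files)))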


# Label drawing relies on PIL.ImageDraw: if it causes problems, set `print_labels=False`.
os.makedirs(os.path.join(data_dir, 'examples'), exist_ok=True)
for file in selection:
    print('File name:', file)
    fig, file_name = check_polys(file, print_labels=True)
    fig.savefig(os.path.join(
            data_dir, 'examples', 'labelled-' + os.path.basename(file_name)
        ))
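
When a plotted polygon looks wrong, it can help to inspect a label line by hand. The sketch below parses one YOLOv5 segmentation line back into a class index and pixel coordinates. It assumes the `class x1 y1 x2 y2 ...` layout with normalised coordinates; `yolo_line_to_pixels` is a hypothetical helper and not part of `check_polys`.

```python
def yolo_line_to_pixels(line, img_w, img_h):
    """Parse 'class x1 y1 x2 y2 ...' into (class_id, [(x, y), ...]) in pixels."""
    fields = line.split()
    class_id = int(fields[0])
    values = [float(v) for v in fields[1:]]
    # Pair up x and y values and scale back to pixel units
    points = [
        (x * img_w, y * img_h)
        for x, y in zip(values[0::2], values[1::2])
    ]
    return class_id, points


class_id, points = yolo_line_to_pixels("1 0.0 0.0 0.5 0.0 0.5 0.5", 640, 480)
```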
