# How to parse the data

This notebook parses the data from the video tracking files into image folders with their respective labels. The output has the following layout:
```
../data/fish_tracking
├── images
├── labels
├── test_per_vid.txt
├── train_per_vid.txt
└── val_per_vid.txt
```
The `images` and `labels` folders contain the images and the labels; each image and its label share the same base name. The text files list the image names that belong to each split.

The split is made per video folder: each folder contains three camera angles, and all three are assigned to the same split. The annotation is currently a single point on the head of the fish. To create a bounding box around this point, we take a fixed radius and define a square box centred on the point, with sides of twice the radius. This is far from ideal and should be revised in the future. The parser currently handles only the predator fish, because the task was first narrowed down to the predator. The prey could be added later by changing the data loader.
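The point-to-box step described above can be sketched as follows. This is a minimal sketch, not the notebook's own code: the helper name `point_to_bbox` is hypothetical, the 1920x1080 frame size is only an illustrative value, and clipping the box to the frame is an added assumption (the parsing loop further down writes the unclipped box).

```python
def point_to_bbox(x, y, radius, width, height):
    """Return (xmin, ymin, xmax, ymax) for a square box centred on (x, y).

    Hypothetical helper: clipping to the frame is an assumption and is
    not done by the parsing loop in this notebook.
    """
    cx, cy = int(x), int(y)
    xmin = max(cx - radius, 0)
    ymin = max(cy - radius, 0)
    xmax = min(cx + radius, width - 1)
    ymax = min(cy + radius, height - 1)
    return xmin, ymin, xmax, ymax


# A head annotation close to the left border of an (assumed) 1920x1080 frame
print(point_to_bbox(20.7, 500.2, radius=50, width=1920, height=1080))
```

Without clipping, a point near the border yields negative or out-of-frame coordinates, which some detection frameworks reject, so this may be worth considering when the box definition is revised.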

In [3]:
# IMPORTS
# Native
import os
import pickle
from datetime import date

# Third party
import numpy as np
import cv2
from sklearn.model_selection import train_test_split
from progressbar import progressbar

The config variables are defined in the next cell. Adapt them if needed, but check how your changes affect the rest of the notebook.

In [4]:
# General setting
today = date.today()
today = today.strftime('%Y_%m_%d')


# Directory settings
dir = "../data/tracked_fish_3_camera_angles"
destination_dir = f"../data/{today}_fish_tracking"
video_dir = "tracking_vids"
label_dir = "h5_tracking_files"
camera_angles = ["A","B","C"]

# Predator = 0
# Prey = 1
#classes = ["0", "1"]

# annotation setting
radius = 50

Initialise the output directories.

In [5]:
img_list = []

dest_img_path = os.path.join(destination_dir, "images")
dest_label_path = os.path.join(destination_dir, "labels")

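# Create the directories if they do not exist yet.
# os.makedirs also creates destination_dir, and exist_ok=True avoids
# skipping the subfolders when destination_dir is already present.
os.makedirs(dest_img_path, exist_ok=True)
os.makedirs(dest_label_path, exist_ok=True)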


## Data loading and parsing
This cell loops over the videos and writes each annotated frame and its label to the image and label directories. Splitting is done separately because it is subject to change. The labels are currently written in a coarse, custom text format that cannot be used directly by the mmdetection framework. A future task is to write them in the COCO format instead. Only the predator is parsed in this loop. To parse the prey as well, an extra loop would need to be added.
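As an illustration of the future COCO conversion mentioned above, the sketch below turns one label line, as written by the loop in the next cell (`name class xmin ymin xmax ymax width height`), into COCO-style `image` and `annotation` entries. The function name `label_line_to_coco`, the id arguments, and the example values are hypothetical. It keeps the class id from the label file as `category_id`, which is an assumption about how categories would be numbered.

```python
def label_line_to_coco(line, file_name, image_id, annotation_id):
    """Convert one label line from this notebook to COCO-style dicts.

    Hypothetical helper: a sketch of the conversion, not part of the parser.
    """
    name, class_id, xmin, ymin, xmax, ymax, width, height = line.split()
    xmin, ymin, xmax, ymax = int(xmin), int(ymin), int(xmax), int(ymax)
    box_w, box_h = xmax - xmin, ymax - ymin

    image = {
        "id": image_id,
        "file_name": file_name,
        "width": int(width),
        "height": int(height),
    }
    annotation = {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": int(class_id),
        # COCO stores boxes as [x, y, width, height] with (x, y) the top-left corner
        "bbox": [xmin, ymin, box_w, box_h],
        "area": box_w * box_h,
        "iscrowd": 0,
    }
    return image, annotation


# Example with made-up values in the label format used by this notebook
image, annotation = label_line_to_coco(
    "predator 0 150 250 250 350 1920 1080 ",
    file_name="325_A_00000.jpg",
    image_id=1,
    annotation_id=1,
)
print(image)
print(annotation)
```

A full COCO file would additionally need a `categories` list and would collect these entries into top-level `images` and `annotations` lists.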

In [6]:
# Loop over all subdir
for d in progressbar(sorted(os.listdir(dir))):

    # Define paths
    sub_d = os.path.join(dir, d)
    video_path = os.path.join(sub_d, video_dir)
    annotation_path = os.path.join(sub_d, label_dir)
    
    # loop over different angles
    for angle in camera_angles:

        # Load annotation file
        annotation_file = os.path.join(annotation_path, f"{d}Corr{angle}.pkl")
        # Use a context manager so the file handle is closed after loading
        with open(annotation_file, "rb") as pkl_file:
            annotation_dict = pickle.load(pkl_file)
        
        # Load video
        video_file = os.path.join(video_path, f"{d}_{angle}.mp4")
        cap = cv2.VideoCapture(video_file)
        
        # Loop over frames with annotation.
        # TODO: Need to be extended later on for prey
        i = 0
        for x,y in zip(annotation_dict["0"]["X"],annotation_dict["0"]["Y"]):
            
            ret, frame = cap.read()
            
            # Only save frames that were read successfully and have an annotation
            if ret and not (np.isnan(x) or np.isnan(y)):

                # Centre of the head annotation (used by the sanity check below)
                center_coordinates = (int(x), int(y))
                
                # Determine bbox
                xmin = int(x) - radius
                ymin = int(y) - radius
                xmax = int(x) + radius
                ymax = int(y) + radius

                # Get frame size
                width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                # fps =  cap.get(cv2.CAP_PROP_FPS)

                image = frame

                # UNCOMMENT FOR SANITY CHECK
                # Green color in BGR
                # color = (0, 255, 0)
                
                # Line thickness of 2 px
                # thickness = 2

                # Write onto frame (sanity check)
                # image = cv2.circle(image, center_coordinates, radius, color, thickness)
                
                # Naming scheme
                # Zero-padded counter of the frames saved for this video and angle
                idx = str(i).zfill(5)
                image_name = f"{d}_{angle}_{idx}"

                # Add img to list for data splitting
                img_list.append(image_name)
                
                # Write img
                # print(image_name)
                cv2.imwrite(f"{dest_img_path}/{image_name}.jpg", image)
                
                # write label
                label = ["predator", "0", xmin, ymin, xmax, ymax, width, height]
                with open(f'{dest_label_path}/{image_name}.txt', 'w') as f:
                    for item in label:
                        f.write("%s " % item)
                
                i += 1

        # Release the video file before moving on to the next camera angle
        cap.release()

100% (36 of 36) |########################| Elapsed Time: 0:08:55 Time:  0:08:55


In [8]:
vid_list = sorted(os.listdir(dir))
    
train, test = train_test_split(vid_list, test_size=0.2, random_state=1)
train, val = train_test_split(train, test_size=0.2, random_state=1)

print(f"Training videos: {train}")
print(f"Validation videos: {val}")
print(f"Testing videos: {test}")

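def create_anno_file(data, data_dir, set_list, anno_file):
    """Write the names of all images whose video is in set_list to <data_dir>/<anno_file>.txt."""
    # List the image directory once instead of once per video
    image_files = sorted(os.listdir(data))
    with open(f'{data_dir}/{anno_file}.txt', 'w') as f:
        for t in sorted(set_list):
            for d in image_files:
                # Image names start with the video folder name: <video>_<angle>_<index>
                if t == d.split('_')[0]:
                    f.write("%s\n" % os.path.splitext(d)[0])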

create_anno_file(dest_img_path, destination_dir, train, "train")
create_anno_file(dest_img_path, destination_dir, val, "val")
create_anno_file(dest_img_path, destination_dir, test, "test")

Training videos: ['325', '461', '131', '447', '453', '238', '204', '494', '413', '148', '248', '451', '297', '456', '239', '293', '243', '252', '306', '458', '419', '449']
Validation videos: ['283', '242', '144', '343', '373', '463']
Testing videos: ['457', '469', '455', '160', '400', '352', '417', '442']
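Because the three camera angles of a video must stay in the same split, it can be useful to verify that no video contributes images to more than one split file. The sketch below is one possible check, relying on the `<video>_<angle>_<index>` naming scheme used above. The helper names `video_ids` and `check_disjoint` are hypothetical, and the example names are made up.

```python
def video_ids(image_names):
    """Collect the video folder names from names of the form '<video>_<angle>_<index>'."""
    return {name.split("_")[0] for name in image_names}


def check_disjoint(train_names, val_names, test_names):
    """Raise ValueError if any video contributes images to more than one split."""
    splits = {
        "train": video_ids(train_names),
        "val": video_ids(val_names),
        "test": video_ids(test_names),
    }
    split_names = list(splits)
    for i, first in enumerate(split_names):
        for second in split_names[i + 1:]:
            shared = splits[first] & splits[second]
            if shared:
                raise ValueError(
                    f"Videos shared by {first} and {second}: {sorted(shared)}"
                )
    return True


# Made-up image names following the naming scheme of this notebook
train_names = ["325_A_00000", "325_B_00000", "461_C_00012"]
val_names = ["283_A_00003"]
test_names = ["457_B_00001", "469_C_00007"]
print(check_disjoint(train_names, val_names, test_names))
```

In practice the name lists would be read from the split text files written by `create_anno_file`, one name per line.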
