# Dataset Generation for Model Training

First, we import the dependencies needed for image collection and processing.  
Make sure you are in the root directory of the project when executing the following code cells.

In [1]:
import cv2
import time
import os
import uuid
from IPython.display import clear_output
import random
import shutil

We then define the path of the directory into which we will save and manipulate our dataset.  

In [2]:
imagePath = 'Tensorflow/Workspace/images/capturedimages'

Now we define _labels_, the list of gesture classes for which training data will be generated.  
The _nImages_ variable specifies how many images are captured for each gesture: the higher the value, the more variance the final dataset will have. Be warned, however, that every image in the dataset must be labeled manually, so choose wisely. We will use data augmentation to generate a larger dataset with more variance and thus improve the final model's ability to generalize.  

In [20]:
labels = ['LeftL', 'LeftA', 'LeftO', 'LeftV', 'LeftOpenHand', 'RIndexUp', 'RIndexDown', 'RIndexLeft', 'RIndexRight', 'RIndexFront', 'RThumbBack', 'RClearC']
nImages = 25
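
As a rough sizing aid, the number of files the capture step writes can be estimated up front. The sketch below assumes the capture loop shown later in this notebook, which saves each frame together with three augmented variants (vertical shift, horizontal shift, zoom). `VARIANTS_PER_FRAME` and `dataset_size` are illustrative names, not part of the project.

```python
# Illustrative sizing helper: these names are hypothetical, not project code.
VARIANTS_PER_FRAME = 4  # original frame + vertical shift + horizontal shift + zoom

def dataset_size(n_labels, n_images, variants=VARIANTS_PER_FRAME):
    """Return the total number of image files written by the capture step."""
    return n_labels * n_images * variants

# 12 gesture classes and 25 captures per class, as defined in the cell above
total = dataset_size(12, 25)
print(total)
```

Since every file has to be labeled by hand afterwards, this figure is worth checking before raising _nImages_.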

We define some data augmentation functions to produce multiple output images from a single input.  
The gestures we want to recognize are hand-specific: some are exclusive to the right hand and others to the left, so we cannot perform horizontal mirroring. Vertical mirroring and rotations don't make sense either, since all the gestures are direction-bound and such transformations could make two different classes overlap.  
The only useful augmentations are therefore vertical and horizontal shifts (or stretches) and zoom transformations.  

In [18]:
def vertical_shift(input_img, ratio=0.2):
    if ratio > 1 or ratio < 0:
        print('Value should be less than 1 and greater than 0')
        return input_img
    img = input_img.copy()
    ratio = random.uniform(-ratio, ratio)
    h, w = img.shape[:2]
    to_shift = h*ratio
    if ratio > 0:
        img = img[:int(h-to_shift), :, :]
    if ratio < 0:
        img = img[int(-1*to_shift):, :, :]
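    # Pass the interpolation flag by keyword: the third positional argument of cv2.resize is dst
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_CUBIC)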
    return img

def horizontal_shift(input_img, ratio=0.2):
    if ratio > 1 or ratio < 0:
        print('Value should be less than 1 and greater than 0')
        return input_img
    img = input_img.copy()
    ratio = random.uniform(-ratio, ratio)
    h, w = img.shape[:2]
    to_shift = w*ratio
    if ratio > 0:
        img = img[:, :int(w-to_shift), :]
    if ratio < 0:
        img = img[:, int(-1*to_shift):, :]
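    # Pass the interpolation flag by keyword: the third positional argument of cv2.resize is dst
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_CUBIC)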
    return img

def zoom(input_img, value=0.85):
    if value > 1 or value < 0:
        print('Value for zoom should be less than 1 and greater than 0')
        return input_img
    img = input_img.copy()
    value = random.uniform(value, 1)
    h, w = img.shape[:2]
    h_taken = int(value*h)
    w_taken = int(value*w)
    h_start = random.randint(0, h-h_taken)
    w_start = random.randint(0, w-w_taken)
    img = img[h_start:h_start+h_taken, w_start:w_start+w_taken, :]
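    # Pass the interpolation flag by keyword: the third positional argument of cv2.resize is dst
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_CUBIC)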
    return img
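
To make the effect of the shift functions concrete, the sketch below reproduces only the cropping arithmetic of `vertical_shift` for a fixed ratio, without touching any image. The helper name `vertical_crop_bounds` is hypothetical. Note that the functions above crop one side of the frame and then resize back to the original size, so the result is a stretch rather than a pure translation.

```python
def vertical_crop_bounds(h, ratio):
    """Return the (start, stop) row range that vertical_shift keeps for a fixed ratio."""
    to_shift = h * ratio
    if ratio > 0:
        return 0, int(h - to_shift)   # rows are dropped from the bottom
    if ratio < 0:
        return int(-to_shift), h      # rows are dropped from the top
    return 0, h                       # ratio == 0 leaves the frame untouched

# A 480-row frame with a fixed ratio of +/- 0.25
print(vertical_crop_bounds(480, 0.25))
print(vertical_crop_bounds(480, -0.25))
```

In the real functions the ratio is drawn at random from `[-ratio, ratio]`, so each call keeps a different slice of the frame.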

The following code cell tells you which gesture to perform. The delay between frame captures is 2 seconds, but it can be adjusted freely.  
Before capturing starts for each class, you are prompted to press Enter in an input box, which makes it possible to pause between classes.  

In [19]:
frame_capture_delay = 2 #seconds
for label in labels:
    # Create the class sub-directory (works on any OS, no error if it already exists)
    os.makedirs(os.path.join(imagePath, label), exist_ok=True)
    clear_output(wait=True)
    print('Collecting images for {}'.format(label))
    cap = cv2.VideoCapture(0)
    input()
    for imageNumber in range(nImages):
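        ret, frame = cap.read()
        if not ret:
            # The camera returned no frame: skip this iteration
            print('Frame capture failed, skipping')
            continue
        tmpUUID = str(uuid.uuid1())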
        imageName = os.path.join(imagePath, label, label+'_'+f'{tmpUUID}.jpg')
        cv2.imwrite(imageName, frame)
        print('Collected')
        cv2.imshow('frame', frame)
        cv2.imwrite(os.path.join(imagePath, label, label+'_vshift_'+f'{tmpUUID}.jpg'),
                    vertical_shift(input_img=frame))
        cv2.imwrite(os.path.join(imagePath, label, label+'_hshift_'+f'{tmpUUID}.jpg'),
                    horizontal_shift(input_img=frame))
        cv2.imwrite(os.path.join(imagePath, label, label+'_zoom_'+f'{tmpUUID}.jpg'),
                    zoom(input_img=frame))
        time.sleep(frame_capture_delay)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            cv2.destroyWindow('frame')
            break
    cap.release()
# Close any remaining window; safe even if 'frame' was already closed with 'q'
cv2.destroyAllWindows()

Collecting images for RClearC
Collected


At the end of the capture step, open the _capturedimages_ directory.  
Inspect the images generated in each sub-directory to make sure they are all usable: if not, repeat the capture process, setting _labels_ and _nImages_ to only the classes and number of images you need to replace.  
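
Before inspecting the images by eye, a quick file count per class can reveal an interrupted capture. The following is a minimal sketch, assuming the directory layout produced by the capture cell (one sub-directory per label containing `.jpg` files). The helper `count_images_per_class` is a hypothetical name, and the demo runs on a throwaway directory rather than on the real dataset.

```python
import os
import tempfile

def count_images_per_class(image_path, labels, extension='.jpg'):
    """Return {label: number of image files} for each class sub-directory."""
    counts = {}
    for label in labels:
        class_dir = os.path.join(image_path, label)
        if not os.path.isdir(class_dir):
            counts[label] = 0  # nothing was captured for this class
            continue
        counts[label] = sum(
            1 for name in os.listdir(class_dir) if name.lower().endswith(extension)
        )
    return counts

# Demo on a temporary directory: three files for 'LeftL', none for 'LeftA'
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, 'LeftL'))
    for i in range(3):
        open(os.path.join(demo_root, 'LeftL', f'LeftL_{i}.jpg'), 'w').close()
    demo_counts = count_images_per_class(demo_root, ['LeftL', 'LeftA'])
    print(demo_counts)
```

In the notebook this would be called as `count_images_per_class(imagePath, labels)`; with an uninterrupted capture, each class should hold four files per captured frame.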

The following cell then moves all the images from the class sub-directories into the _capturedimages_ directory.  
If you redefined the _labels_ list to capture some extra images, restore it to the full list of classes before executing the following cell, so that all files are moved correctly.

In [25]:
for label in labels:
    source_dir = os.path.join(imagePath, label)
    file_names = os.listdir(source_dir)
    for file_name in file_names:
        # Remove comment for verbose output
        #print(f"Moving {file_name} from {source_dir} to {imagePath}")
        shutil.move(os.path.join(source_dir, file_name), imagePath)
    os.rmdir(source_dir)

# Dataset labeling

All commands must be run from the root directory of the project, unless otherwise stated.

## Path setup

In [4]:
WORKSPACE_PATH = 'Tensorflow/Workspace'
SCRIPTS_PATH = 'Tensorflow/ConversionScripts'
APIMODEL_PATH = 'Tensorflow/models'
ANNOTATION_PATH = WORKSPACE_PATH+'/annotations'
IMAGE_PATH = WORKSPACE_PATH+'/images'
MODEL_PATH = WORKSPACE_PATH+'/trained-models'
PRETRAINED_MODEL_PATH = WORKSPACE_PATH+'/pre-trained-model'
CUSTOM_MODEL_NAME = 'custom_trained_model'
CONFIG_PATH = MODEL_PATH+'/'+CUSTOM_MODEL_NAME+'/pipeline.config'
CHECKPOINT_PATH = MODEL_PATH+'/'+CUSTOM_MODEL_NAME+'/'

The label map generated here lists all the classes we want to train our model on and assigns an integer id to each one.  
The map can be modified to account for a different number of labels; just be sure that it contains exactly the same classes defined earlier in the _labels_ list.  
The code cell below generates the _label_map.pbtxt_ file.  

In [9]:
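import os

# Create the annotations directory if it does not exist yet
os.makedirs(ANNOTATION_PATH, exist_ok=True)
label_map = []
# Assign consecutive integer ids to the classes, starting at 1
for class_id, label in enumerate(labels, start=1):
    label_map.append({'name': label, 'id': class_id})

# Build the file path with os.path.join so it is valid on every OS
with open(os.path.join(ANNOTATION_PATH, 'label_map.pbtxt'), 'w') as f: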
    for entry in label_map:
        f.write('item { \n')
        f.write('\tname:\'{}\'\n'.format(entry['name']))
        f.write('\tid:{}\n'.format(entry['id']))
        f.write('}\n')
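
To show what the generated file looks like, the sketch below builds the same text in memory for two of the labels instead of writing it to disk. `render_label_map` is a hypothetical helper that mirrors the `f.write` calls of the cell above; it is not part of the project.

```python
def render_label_map(labels):
    """Build the label map text with 1-based ids, in the layout the cell above writes."""
    lines = []
    for class_id, label in enumerate(labels, start=1):
        lines.append('item { ')
        lines.append(f"\tname:'{label}'")
        lines.append(f'\tid:{class_id}')
        lines.append('}')
    return '\n'.join(lines) + '\n'

# Preview the entries for the first two gesture classes
sample = render_label_map(['LeftL', 'LeftA'])
print(sample)
```

Each class becomes one `item` block holding its name and id, so a label list of twelve gestures yields twelve blocks with ids 1 to 12.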

## Label images with LabelImg

The code cell at the end of this section moves the notebook's shell to the LabelImg directory and runs LabelImg.  

In the LabelImg window that opens, make sure _View->Auto Save Mode_ is checked in the upper-left menu.  
Click _Open Dir_ in the left vertical menu bar and select the _capturedimages_ directory, then _Select directory_.  
Click _Change Save Dir_ in the left menu bar and select the same _capturedimages_ directory, then _Select directory_.  

You should now see all the images you previously captured.  
Press _w_ to activate the bounding box tool and select the gesture in the image, confirming by clicking with the left mouse button. A small input window will open, asking you to name the object class for the bounding box you just drew: for consistency's sake, we used the same names as in the _labels_ list.  
Repeat the process until all the images have been labeled with exactly one label. It is possible to draw multiple bounding boxes for multiple elements in a single image, but this would make the dataset more complex to manage. For this reason, each dataset entry here contains a single gesture.  

In [None]:
%cd "labelImg"
!pyrcc5 -o libs/resources.py resources.qrc # Solves "No module named 'libs.resources'" bug, if present
!python labelImg.py
%cd ".."
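
Once labeling is finished, it can be worth verifying that every image has exactly one bounding box with a known class name. The sketch below assumes LabelImg was left in its Pascal VOC mode, so that it wrote one `.xml` file next to each image with the same base name and one `<object><name>` element per box. `find_labeling_problems` is a hypothetical helper, and the demo uses a temporary directory with a minimal hand-written annotation.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def find_labeling_problems(image_dir, valid_labels):
    """Return (image name, problem) pairs for images that are not labeled exactly once."""
    problems = []
    for name in sorted(os.listdir(image_dir)):
        if not name.lower().endswith('.jpg'):
            continue
        xml_path = os.path.join(image_dir, os.path.splitext(name)[0] + '.xml')
        if not os.path.isfile(xml_path):
            problems.append((name, 'missing annotation'))
            continue
        # Collect the class name of every bounding box in the annotation
        root = ET.parse(xml_path).getroot()
        names = [obj.findtext('name') for obj in root.iter('object')]
        if len(names) != 1:
            problems.append((name, f'{len(names)} boxes'))
        elif names[0] not in valid_labels:
            problems.append((name, f'unknown label {names[0]}'))
    return problems

# Demo: 'a.jpg' is labeled correctly, 'b.jpg' has no annotation file
with tempfile.TemporaryDirectory() as demo_dir:
    for stem in ('a', 'b'):
        open(os.path.join(demo_dir, stem + '.jpg'), 'w').close()
    with open(os.path.join(demo_dir, 'a.xml'), 'w') as f:
        f.write('<annotation><object><name>LeftL</name></object></annotation>')
    demo_problems = find_labeling_problems(demo_dir, ['LeftL'])
    print(demo_problems)
```

In the notebook this would be called as `find_labeling_problems(imagePath, labels)`; an empty list means every image carries a single box with a valid class name.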

## Generate Tensorflow .record files