# 2 - Training

Before doing anything else, let us define the paths to our folders so that the work stays manageable and easy to follow.

In [None]:
PATH_TO_IMAGES_FOLDER='images'
PATH_TO_ANNOTATIONS_FOLDER='annotations'

## Clean XML files

The command below runs the xmlconversion.py script. This script creates a new directory called cleaned inside the images directory, which contains the cleaned XML files (all unnecessary spaces removed).
(You only need to run this command if you used a labeling tool other than the recommended one, 'labelimg'.)
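
As a rough illustration of what "removing unnecessary spaces" can mean for an annotation file, here is a minimal sketch using only the standard library. It is not the code of xmlconversion.py; the function name `strip_xml_whitespace` and the sample annotation are made up for this example.

```python
import xml.etree.ElementTree as ET

def strip_xml_whitespace(xml_text):
    """Return the XML with leading/trailing whitespace removed from every node.

    Hypothetical helper for illustration; xmlconversion.py may work differently.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # .text is the content before the first child, .tail the content after the closing tag.
        if element.text is not None:
            element.text = element.text.strip() or None
        if element.tail is not None:
            element.tail = element.tail.strip() or None
    return ET.tostring(root, encoding="unicode")

# Made-up annotation with stray spaces and indentation.
sample = (
    "<annotation>\n"
    "  <filename> bird_01.jpg </filename>\n"
    "  <object>\n"
    "    <name> Pica_pica </name>\n"
    "  </object>\n"
    "</annotation>"
)
cleaned = strip_xml_whitespace(sample)
print(cleaned)
```

Stray spaces inside a `<name>` tag matter because the class name is later matched against the label map, where `' Pica_pica '` and `'Pica_pica'` would count as different strings.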

## Train-Test Partition

To get an unbiased estimate of how well the model performs, we need to evaluate it on data it has not seen during training. For this purpose we divide the dataset into two parts, train and test. The training partition is used to train the model, and the test partition, which the model never sees during training, is used to evaluate its performance. The line below runs the partition_dataset.py script, which randomly selects a fraction of the images (10% in this case) together with their corresponding XML files from the images folder and copies them into a new subfolder named test. The remaining images are copied to a subfolder called train.

In [None]:
!python partition_dataset.py -x -i ./{PATH_TO_IMAGES_FOLDER} -r 0.1
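
The idea behind the split can be sketched in a few lines of plain Python. This is only an illustration of a random 90/10 split on a list of file names, not the code of partition_dataset.py. The function `split_train_test`, the `seed` argument, and the rounding rule are assumptions made for this example.

```python
import random

def split_train_test(filenames, test_ratio=0.1, seed=None):
    """Randomly split file names into (train, test) lists.

    Hypothetical helper; the real script also copies the images and XML files.
    """
    rng = random.Random(seed)      # a fixed seed makes the split reproducible
    shuffled = sorted(filenames)   # sort first so the result depends only on the seed
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Made-up file names standing in for the contents of the images folder.
images = [f"bird_{i:03d}.jpg" for i in range(50)]
train_files, test_files = split_train_test(images, test_ratio=0.1, seed=42)
print(len(train_files), len(test_files))
```

Each image must end up in exactly one of the two partitions. If an image appeared in both, the evaluation would no longer be on unseen data.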

## Create the TF Record

The Object Detection API requires the images and their annotations to be stored in binary form in a TFRecord file, which makes training faster and uses less memory.
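
Before anything can be written to a TFRecord, the bounding boxes have to be read from the XML annotations. The sketch below shows that first step only, assuming Pascal VOC style files as produced by labelimg. It does not write a TFRecord and is not the code of generate_tfrecord.py. The function `voc_boxes` and the sample annotation are made up, and scaling the coordinates to the 0 to 1 range is shown as an assumption about the conversion.

```python
import xml.etree.ElementTree as ET

def voc_boxes(xml_text):
    """Extract one row per labeled object, with box corners scaled to [0, 1].

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")) / width,
            "ymin": int(box.findtext("ymin")) / height,
            "xmax": int(box.findtext("xmax")) / width,
            "ymax": int(box.findtext("ymax")) / height,
        })
    return rows

# Made-up annotation for a 400x200 image with a single bird.
sample_annotation = """
<annotation>
  <filename>bird_01.jpg</filename>
  <size><width>400</width><height>200</height><depth>3</depth></size>
  <object>
    <name>Turdus_merula</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""
boxes = voc_boxes(sample_annotation)
print(boxes)
```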

### Create the .PBTXT file

The Object Detection API requires a numeric label for each object class to be detected, so we need to create a PBTXT file that contains these labels for the four bird species, namely Erithacus Rubecula, Periparus Ater, Pica Pica, and Turdus Merula. The cell below, when run, writes a numeric 'id' for each bird 'name'.

In [None]:
labels = [{'name':'Erithacus_Rubecula', 'id':1}, {'name':'Periparus_ater', 'id':2},
         {'name':'Pica_pica', 'id':3}, {'name':'Turdus_merula', 'id':4}]

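# Write one "item" block per label, using the annotations folder defined at the top
with open(PATH_TO_ANNOTATIONS_FOLDER + '/label_map.pbtxt', 'w') as f: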
    for label in labels:
        f.write('item { \n')
        f.write('\tname:\'{}\'\n'.format(label['name']))
        f.write('\tid:{}\n'.format(label['id']))
        f.write('}\n')

In [None]:
# Verify the created PBTXT file by opening it in VS Code
!code {PATH_TO_ANNOTATIONS_FOLDER}/label_map.pbtxt
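
If VS Code is not available, the label map can also be checked programmatically. The following is a minimal sketch: it rebuilds the text in the same layout the cell above writes, parses it back with a regular expression, and checks that the ids run from 1 to N without gaps (id 0 is reserved for the background class in the Object Detection API). The helpers `render_label_map` and `parse_label_map` are made up for this example, and the regular expression only handles this simple layout, not the full PBTXT format.

```python
import re

# Same labels as in the cell that creates the label map.
labels = [{'name': 'Erithacus_Rubecula', 'id': 1}, {'name': 'Periparus_ater', 'id': 2},
          {'name': 'Pica_pica', 'id': 3}, {'name': 'Turdus_merula', 'id': 4}]

def render_label_map(labels):
    """Build the label map text in the layout written by the cell above."""
    return "".join(
        "item { \n\tname:'%s'\n\tid:%d\n}\n" % (label['name'], label['id'])
        for label in labels
    )

def parse_label_map(text):
    """Return a {name: id} dict from simple 'item { name:... id:... }' blocks."""
    pattern = re.compile(r"item\s*\{\s*name:\s*'([^']+)'\s*id:\s*(\d+)\s*\}")
    return {name: int(class_id) for name, class_id in pattern.findall(text)}

parsed = parse_label_map(render_label_map(labels))
ids = sorted(parsed.values())
assert ids == list(range(1, len(labels) + 1)), "ids should be 1..N with no gaps"
print(parsed)
```

To check the file on disk instead of the rendered text, read it with `open(...).read()` and pass the result to `parse_label_map`.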

### Create the TF Record (Train)

In [None]:
!python generate_tfrecord.py -x {PATH_TO_IMAGES_FOLDER}/train -l {PATH_TO_ANNOTATIONS_FOLDER}/label_map.pbtxt -o {PATH_TO_ANNOTATIONS_FOLDER}/train.record

Successfully created the TFRecord file: annotations/train.record


### Create the TF Record (Test)

In [None]:
!python generate_tfrecord.py -x {PATH_TO_IMAGES_FOLDER}/test -l {PATH_TO_ANNOTATIONS_FOLDER}/label_map.pbtxt -o {PATH_TO_ANNOTATIONS_FOLDER}/test.record

Successfully created the TFRecord file: annotations/test.record
