# Label dataset and train

Resources used:
- generate_tfrecord.py script: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/_downloads/da4babe668a8afb093cc7776d7e630f3/generate_tfrecord.py
- TensorFlow Object Detection API setup guide: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html

All commands must be run from the root directory of the project, unless otherwise stated.

## Path setup

In [1]:
WORKSPACE_PATH = 'Tensorflow/workspace'
SCRIPTS_PATH = 'Tensorflow/scripts'
APIMODEL_PATH = 'Tensorflow/models'
ANNOTATION_PATH = WORKSPACE_PATH+'/annotations'
IMAGE_PATH = WORKSPACE_PATH+'/images'
MODEL_PATH = WORKSPACE_PATH+'/models'
PRETRAINED_MODEL_PATH = WORKSPACE_PATH+'/pre-trained-models'
CONFIG_PATH = MODEL_PATH+'/my_ssd_mobnet/pipeline.config'
CHECKPOINT_PATH = MODEL_PATH+'/my_ssd_mobnet/'
CUSTOM_MODEL_NAME = 'my_ssd_mobnet' 
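
The cells that follow assume these directories already exist. As a minimal sketch, assuming the same folder layout, the missing ones could be created up front; `ensure_dirs` is a hypothetical helper and not part of the original notebook.

```python
import os

def ensure_dirs(paths):
    """Create each directory in `paths` if it is missing; return those created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created

# Possible call with the variables defined above (run from the project root):
# ensure_dirs([ANNOTATION_PATH, IMAGE_PATH + '/train', IMAGE_PATH + '/test',
#              MODEL_PATH, PRETRAINED_MODEL_PATH, SCRIPTS_PATH])
```

Calling it a second time returns an empty list, so it is safe to re-run the cell.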

## Create Label Map

The label map used here contains the 21 letters of the Italian alphabet.
The map can be modified to account for any number of different labels.

The code cell below generates the label_map.pbtxt file.

In [2]:
labels = [
    {'name':'A', 'id':1},
    {'name':'B', 'id':2},
    {'name':'C', 'id':3},
    {'name':'D', 'id':4},
    {'name':'E', 'id':5},
    {'name':'F', 'id':6},
    {'name':'G', 'id':7},
    {'name':'H', 'id':8},
    {'name':'I', 'id':9},
    {'name':'L', 'id':10},
    {'name':'M', 'id':11},
    {'name':'N', 'id':12},
    {'name':'O', 'id':13},
    {'name':'P', 'id':14},
    {'name':'Q', 'id':15},
    {'name':'R', 'id':16},
    {'name':'S', 'id':17},
    {'name':'T', 'id':18},
    {'name':'U', 'id':19},
    {'name':'V', 'id':20},
    {'name':'Z', 'id':21},
]

with open(ANNOTATION_PATH + '/label_map.pbtxt', 'w') as f:
    for label in labels:
        f.write('item { \n')
        f.write('\tname:\'{}\'\n'.format(label['name']))
        f.write('\tid:{}\n'.format(label['id']))
        f.write('}\n')
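
Since the ids are simply consecutive, the list above could also be generated instead of typed by hand. The following is a sketch under that assumption; `LETTERS`, `build_labels` and `render_label_map` are hypothetical names introduced for illustration.

```python
# Letters of the Italian alphabet, in the same order as the list above.
LETTERS = 'ABCDEFGHILMNOPQRSTUVZ'

def build_labels(names):
    """Assign consecutive ids starting at 1, as in the hand-written list."""
    return [{'name': name, 'id': index} for index, name in enumerate(names, start=1)]

def render_label_map(labels):
    """Return the label map as text, one `item { ... }` block per label."""
    parts = []
    for label in labels:
        parts.append("item {\n\tname:'%s'\n\tid:%d\n}\n" % (label['name'], label['id']))
    return ''.join(parts)

labels = build_labels(LETTERS)
```

Adapting the map to a different set of gestures then only requires changing `LETTERS` (or passing any other list of names).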

## Label images with LabelImg

Move the shell to the LabelImg directory and run LabelImg.

In [12]:
%cd "Tensorflow\labelImg"
!python labelImg.py
%cd "..\.."

[WinError 3] The system cannot find the path specified: 'Tensorflow\\labelImg'
E:\PoliTO\Comp Vision\Computer-Vision-Project\Tensorflow\labelImg
E:\PoliTO\Comp Vision\Computer-Vision-Project


In the LabelImg window that just opened, make sure "View->Auto Save Mode" is checked in the upper-left menu.  
Click "Open dir" in the left menu bar and select the "collected images" directory, then "Select directory".  
Click "Change Save dir" in the left menu bar and select the "collected images" directory, then "Select directory".  

You should now see all the images you previously captured.  
Press "w" to activate the bounding box tool and draw a box around the gesture in the image, confirming with a left mouse click. A small dialog will open, asking you to name the label for the bounding box you just drew: enter the appropriate name.  
Repeat the process until every image has been labeled with at least one label.  
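
To confirm that no image was skipped, a small check can list the images that still lack an annotation. This is a sketch that assumes LabelImg is saving Pascal VOC annotations as `.xml` files with the same base name as the image, in the same directory; `find_unlabeled` and `IMAGE_EXTENSIONS` are hypothetical names.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def find_unlabeled(image_dir):
    """Return the image file names in `image_dir` with no same-named .xml file."""
    names = os.listdir(image_dir)
    annotated = {os.path.splitext(n)[0] for n in names if n.lower().endswith('.xml')}
    return sorted(
        n for n in names
        if n.lower().endswith(IMAGE_EXTENSIONS)
        and os.path.splitext(n)[0] not in annotated
    )
```

An empty list means every image in the directory has an annotation file next to it.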

## Create Tensorflow records

TensorFlow reads the training and test examples from these special-purpose record files.

In [3]:
!python {SCRIPTS_PATH + '/generate_tfrecord.py'} -x {IMAGE_PATH + '/train'} -l {ANNOTATION_PATH + '/label_map.pbtxt'} -o {ANNOTATION_PATH + '/train.record'}
!python {SCRIPTS_PATH + '/generate_tfrecord.py'} -x {IMAGE_PATH + '/test'} -l {ANNOTATION_PATH + '/label_map.pbtxt'} -o {ANNOTATION_PATH + '/test.record'}

Successfully created the TFRecord file: Tensorflow/workspace/annotations/train.record
Successfully created the TFRecord file: Tensorflow/workspace/annotations/test.record
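
The commands above expect the `train` and `test` folders under `IMAGE_PATH` to already contain the images together with their annotations. The notebook does not say how the split was made; the sketch below shows one possible way, assuming image/XML pairs that share a base name. The function name, the 20% test fraction and the fixed seed are all assumptions.

```python
import os
import random
import shutil

def split_dataset(source_dir, train_dir, test_dir, test_fraction=0.2, seed=0):
    """Copy each annotated image and its .xml file into train_dir or test_dir."""
    # Only files that share a base name with an .xml annotation are considered.
    stems = sorted(os.path.splitext(name)[0] for name in os.listdir(source_dir)
                   if name.lower().endswith('.xml'))
    random.Random(seed).shuffle(stems)
    test_count = int(len(stems) * test_fraction)
    test_stems = set(stems[:test_count])
    labeled = set(stems)
    for name in os.listdir(source_dir):
        stem = os.path.splitext(name)[0]
        source_file = os.path.join(source_dir, name)
        if stem not in labeled or not os.path.isfile(source_file):
            continue
        target_dir = test_dir if stem in test_stems else train_dir
        os.makedirs(target_dir, exist_ok=True)
        shutil.copy(source_file, target_dir)
    # Number of annotated samples placed in (train, test).
    return len(stems) - test_count, test_count
```

Files are copied rather than moved, so the original collection stays intact if the split has to be repeated.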


## Download or Update the Tensorflow Model Zoo

Clone the official Tensorflow Model Zoo library.

In [4]:
!cd Tensorflow && git clone https://github.com/tensorflow/models

Cloning into 'models'...
Updating files: 100% (2591/2591), done.


If it is not already present, download the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 model, which is used later for transfer learning.  
This can be done with the command below or manually via the following link:  
[coco17_tpu](http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz)  
Make sure that the archive is saved and then unpacked in the correct folder:  
\Tensorflow\workspace\pre-trained-models  

In [6]:
# Requires the third-party `wget` package (pip install wget)
import wget
wget.download('http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz')
!mv ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz {PRETRAINED_MODEL_PATH}
!cd {PRETRAINED_MODEL_PATH} && tar -zxvf ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
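
The `mv` and `tar` commands above are not available in every shell. As a hedged alternative, the same download-and-unpack step can be sketched with the Python standard library only; `download_and_extract` is a hypothetical helper, not part of the original notebook.

```python
import os
import tarfile
import urllib.request

def download_and_extract(url, dest_dir):
    """Download a .tar.gz archive into dest_dir (if missing) and unpack it there."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, os.path.basename(url))
    # Skip the download when the archive is already present.
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, 'r:gz') as archive:
        archive.extractall(dest_dir)
    return archive_path

# Possible call with the variables defined earlier:
# download_and_extract(
#     'http://download.tensorflow.org/models/object_detection/tf2/20200711/'
#     'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz',
#     PRETRAINED_MODEL_PATH)
```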

## Copy Model Config to Training Folder

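Copy the pipeline.config file of the pre-trained model into the custom model folder. This can also be done manually, which is necessary in a plain Windows shell, where the `cp` command is not available.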

In [9]:
!mkdir {'Tensorflow\\workspace\\models\\'+CUSTOM_MODEL_NAME}
!cp {PRETRAINED_MODEL_PATH+'/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/pipeline.config'} {MODEL_PATH+'/'+CUSTOM_MODEL_NAME}

A subdirectory or file Tensorflow\workspace\models\my_ssd_mobnet already exists.
"cp" is not recognized as an internal or external command,
 operable program or batch file.
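
The output above shows the `cp` command failing in a Windows shell. A platform-independent sketch of the same step, using only the Python standard library, could look like this; `copy_pipeline_config` is a hypothetical helper.

```python
import os
import shutil

def copy_pipeline_config(pretrained_dir, model_dir):
    """Copy pipeline.config from the pre-trained model folder into model_dir."""
    os.makedirs(model_dir, exist_ok=True)  # no error if the folder already exists
    return shutil.copy(os.path.join(pretrained_dir, 'pipeline.config'), model_dir)

# Possible call with the variables defined earlier:
# copy_pipeline_config(
#     PRETRAINED_MODEL_PATH + '/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8',
#     MODEL_PATH + '/' + CUSTOM_MODEL_NAME)
```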


## Update Config For Transfer Learning

The pipeline.config file must be updated with the local paths so that TensorFlow can find all the data required for training.

In [3]:
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.protos import pipeline_pb2
from google.protobuf import text_format

In [4]:
CONFIG_PATH = MODEL_PATH+'/'+CUSTOM_MODEL_NAME+'/pipeline.config'

In [5]:
config = config_util.get_configs_from_pipeline_file(CONFIG_PATH)

In [9]:
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.io.gfile.GFile(CONFIG_PATH, "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline_config)

At this point, make sure that the "pipeline_config.model.ssd.num_classes" variable in the code cell below equals exactly the number of labels you specified earlier. In our case it is 21.

In [10]:
pipeline_config.model.ssd.num_classes = 21
pipeline_config.train_config.batch_size = 4
pipeline_config.train_config.fine_tune_checkpoint = PRETRAINED_MODEL_PATH+'/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0'
pipeline_config.train_config.fine_tune_checkpoint_type = "detection"
pipeline_config.train_input_reader.label_map_path= ANNOTATION_PATH + '/label_map.pbtxt'
pipeline_config.train_input_reader.tf_record_input_reader.input_path[:] = [ANNOTATION_PATH + '/train.record']
pipeline_config.eval_input_reader[0].label_map_path = ANNOTATION_PATH + '/label_map.pbtxt'
pipeline_config.eval_input_reader[0].tf_record_input_reader.input_path[:] = [ANNOTATION_PATH + '/test.record']

In [11]:
config_text = text_format.MessageToString(pipeline_config)
with tf.io.gfile.GFile(CONFIG_PATH, "wb") as f:
    f.write(config_text)
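
Because a mismatch between `num_classes` and the label map is easy to overlook, a quick text-level check of the two files can help. The sketch below uses plain regular expressions rather than the Object Detection API, and assumes both files keep the simple text layout produced in this notebook; the function names are hypothetical.

```python
import re

def count_label_items(label_map_text):
    """Count the `item { ... }` blocks in the text of a label map."""
    return len(re.findall(r'\bitem\s*\{', label_map_text))

def read_num_classes(pipeline_text):
    """Return the first num_classes value found in a pipeline config, or None."""
    match = re.search(r'\bnum_classes\s*:\s*(\d+)', pipeline_text)
    return int(match.group(1)) if match else None

# Possible usage with the paths defined earlier:
# with open(ANNOTATION_PATH + '/label_map.pbtxt') as f:
#     n_labels = count_label_items(f.read())
# with open(CONFIG_PATH) as f:
#     assert read_num_classes(f.read()) == n_labels
```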

## Train the model

The following code cell prints the command to run in a shell at the root directory of the project in order to start the training. Configuring a GPU for training is highly recommended, as it speeds up the computation significantly, although it is not strictly necessary.

The number of training steps can be configured by changing the "--num_train_steps" value: more steps generally mean a more accurate model and a longer computation time.  
For reference, our model took 9 hours to complete the 40000 training steps on an Intel i7-4770 CPU at 3.4 GHz.

In [13]:
print("""python {}/research/object_detection/model_main_tf2.py --model_dir={}/{} --pipeline_config_path={}/{}/pipeline.config --num_train_steps=40000""".format(APIMODEL_PATH, MODEL_PATH,CUSTOM_MODEL_NAME,MODEL_PATH,CUSTOM_MODEL_NAME))

python Tensorflow/models/research/object_detection/model_main_tf2.py --model_dir=Tensorflow/workspace/models/my_ssd_mobnet --pipeline_config_path=Tensorflow/workspace/models/my_ssd_mobnet/pipeline.config --num_train_steps=40000
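
Instead of copying the printed command into a shell, the same arguments could be assembled as a list and launched from Python. This is only a sketch: `build_train_command` and `run_training` are hypothetical helpers, and the current Python interpreter is assumed to be the one with TensorFlow installed.

```python
import subprocess
import sys

def build_train_command(api_model_path, model_path, model_name, num_train_steps):
    """Return the training command as an argument list for subprocess."""
    model_dir = '{}/{}'.format(model_path, model_name)
    return [
        sys.executable,
        '{}/research/object_detection/model_main_tf2.py'.format(api_model_path),
        '--model_dir={}'.format(model_dir),
        '--pipeline_config_path={}/pipeline.config'.format(model_dir),
        '--num_train_steps={}'.format(num_train_steps),
    ]

def run_training(command):
    """Run the training command and raise an error if it exits with a failure."""
    subprocess.run(command, check=True)

# Possible usage with the variables defined earlier:
# run_training(build_train_command(APIMODEL_PATH, MODEL_PATH, CUSTOM_MODEL_NAME, 40000))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.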


At the end of this step, the model directory should contain several checkpoint files named "ckpt-xx.data-00000-of-00001", where "xx" is a number that depends on the "--num_train_steps" value specified earlier.
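
A quick way to see which checkpoints were written is to list the numbers found in those file names. The sketch below relies only on the naming convention described above; `checkpoint_numbers` and `CHECKPOINT_PATTERN` are hypothetical names.

```python
import os
import re

CHECKPOINT_PATTERN = re.compile(r'^ckpt-(\d+)\.data-00000-of-00001$')

def checkpoint_numbers(model_dir):
    """Return the sorted checkpoint numbers found in model_dir."""
    numbers = []
    for name in os.listdir(model_dir):
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)

# Possible usage with the variable defined earlier:
# print(checkpoint_numbers(CHECKPOINT_PATH))
```

The last number in the returned list identifies the most recent checkpoint.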