<a href="https://colab.research.google.com/github/AIWintermuteAI/aXeleRate/blob/master/resources/aXeleRate_person_detector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Person Detection Model Training and Inference

In this notebook we will use aXeleRate, a Keras-based framework for AI on the edge, to quickly set up model training and, once the training session is complete, convert the model to .tflite and .kmodel formats.

First, let's take care of some administrative details. 

1) Before we do anything, make sure you have chosen GPU as the runtime type (Runtime -> Change runtime type).

2) We need to mount Google Drive to save our model checkpoints and final converted model(s). Click the Mount Google Drive button in the Files tab on the left.

In the next cell we clone the aXeleRate GitHub repository and import it.

**It is possible to use pip install or python setup.py install, but in that case you will need to restart the environment.** Since I'm trying to make the process as streamlined as possible, I'm using sys.path.append for the import.

In [0]:
%tensorflow_version 1.x
!git clone https://github.com/AIWintermuteAI/aXeleRate.git
import sys
sys.path.append('/content/aXeleRate')
from axelerate import setup_training, setup_inference

At this step you typically need to get the dataset. You can use the !wget command to download it from somewhere on the Internet, or !cp to copy it from My Drive, as in this example:
```
!cp -r /content/drive/'My Drive'/pascal_20_segmentation.zip .
!unzip -qq pascal_20_segmentation.zip
```
For this notebook we'll use the gdown command-line tool to download the person detection dataset I shared on Google Drive, and then unzip it with the unzip command. It is based on the INRIA person detection dataset, which I converted to PASCAL-VOC annotation format.
https://dbcollection.readthedocs.io/en/latest/datasets/inria_ped.html
When training the model myself I added about 400 pictures of our office staff, which I cannot share online. I recommend that you also augment this dataset by taking and annotating pictures of your family and friends. The annotation tool I use is LabelImg:
https://github.com/tzutalin/labelImg

Let's visualize the validation part of our detection dataset. The images are in the imgs_validation folder, and the corresponding annotations in PASCAL-VOC format are in the anns_validation folder.
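
To make the annotation format concrete, here is a minimal sketch that reads one PASCAL-VOC annotation using only the standard library. The sample XML is written by hand for illustration (the file name, image size and box coordinates are hypothetical, not taken from the dataset), and `parse_voc` is a helper name introduced here, not part of aXeleRate.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, written by hand for illustration only.
SAMPLE_XML = """
<annotation>
    <filename>example.jpg</filename>
    <size><width>640</width><height>480</height><depth>3</depth></size>
    <object>
        <name>person</name>
        <bndbox><xmin>120</xmin><ymin>60</ymin><xmax>300</xmax><ymax>420</ymax></bndbox>
    </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) for one annotation."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        # Coordinates are pixel positions of the box corners.
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes

filename, boxes = parse_voc(SAMPLE_XML)
print(filename, boxes)
```

Each image has one such XML file, with one `object` element per annotated person.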


In [0]:
!gdown https://drive.google.com/uc?id=1UWwxlJm5JH_JiBY9PoLgGyHsRDzBqRGU #dataset
!gdown https://drive.google.com/uc?id=1-2fiBxykZVZBRcux9I6mKZaS3yAHq6hk #pre-trained model

!unzip -qq person_dataset.zip

from axelerate.networks.yolo.backend.utils.augment import visualize_dataset

visualize_dataset(img_folder='person_dataset/imgs_validation', ann_folder='person_dataset/anns_validation', img_size=None, jitter=None)

The next step is defining a config dictionary. Most entries are self-explanatory.

**Type** is the model frontend: Classifier, Detector or Segnet.

**Architecture** is the model backend (feature extractor):

- Full Yolo
- Tiny Yolo
- MobileNet1_0
- MobileNet7_5 
- MobileNet5_0 
- MobileNet2_5 
- SqueezeNet
- VGG16
- ResNet50

For more information on **anchors**, please read here:
https://github.com/pjreddie/darknet/issues/568
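
As a rough illustration, the flat `anchors` list used later in the config can be read as consecutive (width, height) pairs. The sketch below assumes the YOLOv2 convention of width-then-height ordering; the units and the way aXeleRate consumes the list are not confirmed by this notebook, and `anchor_pairs` is a helper name introduced here.

```python
anchors = [0.57273, 0.677385, 1.87446, 2.06253, 3.33843,
           5.47434, 7.88282, 3.52778, 9.77052, 9.16828]

def anchor_pairs(flat):
    """Group a flat anchor list into (width, height) pairs (assumed ordering)."""
    if len(flat) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    return list(zip(flat[0::2], flat[1::2]))

for width, height in anchor_pairs(anchors):
    # The aspect ratio hints at the box shape each anchor is meant to match.
    print(f"w={width:.3f}  h={height:.3f}  w/h={width / height:.2f}")
```

Ten values therefore describe five anchor boxes.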

**Labels** are the labels present in your dataset.
IMPORTANT: Please list all the labels present in the dataset.

**object_scale** determines how heavily to penalize wrong confidence predictions from object predictors.

**no_object_scale** determines how heavily to penalize wrong confidence predictions from non-object predictors.

**coord_scale** determines how heavily to penalize wrong position and size predictions (x, y, w, h).

**class_scale** determines how heavily to penalize wrong class predictions.
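
One way to picture these four scales is as weights on the separate terms of the detection loss. The sketch below is a simplified illustration, not aXeleRate's loss implementation: the per-term error values are hypothetical numbers, and `weighted_loss` is a helper introduced here. The default scales mirror the values used in the config later in this notebook.

```python
def weighted_loss(coord_err, obj_conf_err, noobj_conf_err, class_err,
                  coord_scale=1.0, object_scale=5.0,
                  no_object_scale=1.0, class_scale=1.0):
    """Combine the four error terms into one number using the scale weights."""
    return (coord_scale * coord_err
            + object_scale * obj_conf_err
            + no_object_scale * noobj_conf_err
            + class_scale * class_err)

# Hypothetical per-term errors for a single training batch.
errors = dict(coord_err=0.2, obj_conf_err=0.1, noobj_conf_err=0.3, class_err=0.05)

print(weighted_loss(**errors))                    # object confidence weighted 5x
print(weighted_loss(**errors, object_scale=1.0))  # all terms weighted equally
```

Raising one scale makes errors of that kind dominate the total, so the optimizer spends more effort reducing them.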

## Parameters for Person Detection

The K210, which is where we will run the network, has limited memory (5.5 MB of RAM), so with the MicroPython firmware the largest model you can run is about 2 MB. This limits our architecture choice to Tiny Yolo, MobileNet (up to alpha 0.75) and SqueezeNet. Of these three architectures, only MobileNet comes with a pre-trained model. So, to save training time, we will use MobileNet with alpha 0.75, which has ... parameters. For objects that do not have much variety, you can use MobileNet with a lower alpha, down to 0.25.
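
A back-of-the-envelope check of the 2 MB limit can be sketched as follows. The assumptions are mine, not the notebook's: the model size is approximated as parameter count times bytes per weight (ignoring any file overhead), 8-bit weights are assumed for the quantized case, and the parameter count is a placeholder rather than the real figure for any architecture listed above.

```python
BUDGET_MB = 2.0  # approximate limit quoted above for the MicroPython firmware

def estimated_size_mb(num_params, bytes_per_weight):
    """Approximate model size in MB, ignoring file-format overhead."""
    return num_params * bytes_per_weight / (1024 * 1024)

# Placeholder value for illustration, NOT the real count of any architecture.
hypothetical_params = 1_500_000

for label, nbytes in (("float32", 4), ("8-bit quantized", 1)):
    size = estimated_size_mb(hypothetical_params, nbytes)
    verdict = "fits" if size <= BUDGET_MB else "too large"
    print(f"{label}: {size:.2f} MB -> {verdict}")
```

Substitute the parameter count reported by your own model summary to see which side of the limit it falls on.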

In [0]:
config = {
    "model": {
        "type":                  "Detector",
        "architecture":          "MobileNet7_5",
        "input_size":            224,
        "anchors":               [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "labels":                ["person"],
        "coord_scale":           1.0,
        "class_scale":           1.0,
        "object_scale":          5.0,
        "no_object_scale":       1.0
    },
    "weights": {
        "full":                  "",
        "backend":               "imagenet"
    },
    "train": {
        "actual_epoch":          100,
        "train_image_folder":    "person_dataset/imgs",
        "train_annot_folder":    "person_dataset/anns",
        "train_times":           1,
        "valid_image_folder":    "person_dataset/imgs_validation",
        "valid_annot_folder":    "person_dataset/anns_validation",
        "valid_times":           1,
        "valid_metric":          "mAP",
        "batch_size":            10,
        "learning_rate":         1e-3,
        "saved_folder":          "/content/drive/My Drive/person_detector",
        "first_trainable_layer": "",
        "augumentation":         True,
        "is_only_detect":        False
    },
    "converter": {
        "type":                  ["k210", "tflite"]
    }
}

Let's check what GPU we have been assigned in this Colab session, if any.

In [0]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

Finally, we start the training by passing the config dictionary we defined earlier to the setup_training function. The function starts the training with Checkpoint, Reduce Learning Rate on Plateau and Early Stopping callbacks. After the training has stopped, it converts the best model into the format(s) you specified in the config and saves it to the project folder.
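
To give a feel for how the three callbacks interact, here is a simplified pure-Python simulation. It is not the Keras or aXeleRate implementation: the patience values, the reduction factor and the validation scores are all hypothetical, and `simulate_training` is a helper introduced here.

```python
def simulate_training(val_scores, lr=1e-3, factor=0.5,
                      reduce_patience=2, stop_patience=4):
    """Walk through per-epoch validation scores (higher is better)."""
    best_score, best_epoch = float("-inf"), None
    epochs_without_gain = 0
    last_epoch = None
    for epoch, score in enumerate(val_scores):
        last_epoch = epoch
        if score > best_score:
            # Checkpoint: remember the best epoch seen so far.
            best_score, best_epoch = score, epoch
            epochs_without_gain = 0
            continue
        epochs_without_gain += 1
        if epochs_without_gain % reduce_patience == 0:
            # Reduce learning rate on plateau.
            lr *= factor
        if epochs_without_gain >= stop_patience:
            # Early stopping: give up after too many stale epochs.
            break
    return {"best_epoch": best_epoch, "best_score": best_score,
            "final_lr": lr, "last_epoch": last_epoch}

# Hypothetical validation mAP per epoch.
history = [0.30, 0.50, 0.55, 0.54, 0.53, 0.52, 0.51, 0.60]
print(simulate_training(history))
```

In this made-up run the last score is never reached, because training stops after four epochs without improvement; the checkpoint from the best epoch is the one that would be converted.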

In [0]:
from keras import backend as K 
K.clear_session()
model_path = setup_training(config_dict=config)

After training, it is good to check the actual performance of your model by running inference on your validation dataset and visualizing the results. This is exactly what the next block does.

In [0]:
from keras import backend as K 
K.clear_session()
setup_inference(config, model_path)

Inference results with the pre-trained weights: {'fscore': 0.918918918918919, 'precision': 0.8947368421052632, 'recall': 0.9444444444444444}; final validation mAP: 0.5657894736842105
**Weights name: YOLO_best_mAP.h5**
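
For reference, precision, recall and F-score relate to detection counts as sketched below. The counts of 17 true positives, 2 false positives and 1 false negative are an assumption: the notebook does not state them, and they are chosen here only because they are consistent with the figures above. `detection_metrics` is a helper introduced for this sketch.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall and F-score from detection counts."""
    precision = tp / (tp + fp)   # share of predicted boxes that are correct
    recall = tp / (tp + fn)      # share of ground-truth boxes that were found
    fscore = 2 * precision * recall / (precision + recall)
    return {"fscore": fscore, "precision": precision, "recall": recall}

# Assumed counts, picked to be consistent with the reported results.
print(detection_metrics(tp=17, fp=2, fn=1))
```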

Good luck and happy training! Have a look at these articles, which will help you get the most out of Google Colab or connect to a local runtime if no GPUs are available:

https://medium.com/@oribarel/getting-the-most-out-of-your-google-colab-2b0585f82403

https://research.google.com/colaboratory/local-runtimes.html