- It is assumed here, that all the pre-requisites required for running Caffe-jacinto are met.
- caffe-jacinto should have been built using commands make -j16 and make pycaffe at the caffe-jacinto root directory.
- Set environment variable CAFFE_ROOT pointing to caffe-jacinto folder, e.g. export CAFFE_ROOT=/user/github/caffe-jacinto.
- Open a bash terminal and change directory into the scripts folder, as explained earlier.
We use the same LMDB format as used by original Caffe-SSD implementation.
-
The main training script is located ../scripts/train_image_object_detection.sh.
-
There are four example configurations provided in the script, two for PASCAL VOC0712 and other two for custom datasets.
-
Appropriate dataset can be set at this location.
-
Also solver params need to be set based on the size of one epoch in the dataset.
-
Look at gpus variable at this location. This should reflect the number of gpus that you have. For example, if you have two NVIDIA CUDA supported gpus, the gpus variable should be set to "0,1". If you have more GPUs, modify this field to reflect it so that the training will complete faster.
-
Execute the training by running the training script from $root/scripts folder,
-
[./train_image_object_detection.sh].
-
There are three stages in this training.
Stage-1: Initial stage with L2 regularization training
Uses imagenet pre-trained model and trains it for object detect task for the dataset set earlier. For PASCAL VCOC0712, this stage runs for 120k iteration which approximately takes 20 hrs on 2 GTX 1080 GPUs. The trained model is stored at ./training/dataset/model_name/folder_name/initial/. The folder_name is specified at this location. Similarly dataset and model_name are specified in the file, ./train_image_object_detection.sh
Stage-2: L1 regularization training This stage fine tunes stage-1 trained model to make CNN n/w amenable for sparsification.The trained model is stored at ./training/dataset/model_name/folder_name/l1reg/.
Stage-3: Sparsification training This stage starts with trained model in stage-2 and induces sparsity gradually. The config parameters can be adjusted to achieve desired level of sparsity at this location.The trained model is stored at ./training/dataset/model_name/folder_name/sparse/.
The validation accuracy in the form of mean average precision (mAP) is printed in the run.log in the respective folder for each stage.
Configuration-Dataset VOC0712 | mAP |
---|---|
Initial L2 regularized training | 68.66% |
L1 regularized fine tuning | 68.07% |
Sparse fine tuned(nearly 61% zero coefficients) | 65.77% |
Overall impact due to sparseness | 2.89% |
-
61.1% sparsity (i.e. zero coefficients in convolution weights) implies that the complexity of inference can be potentially reduced by 2.5x - by using a suitable sparse convolution implementation.
-
It is possible to change the value of sparsity applied - see the training script for more details.
*The pre-trained models are made available for PASCAL VOC0712.
-
PASCAL VOC0712:
The two custom configuration examples are provided. The dataset needs to be set to ti-custom-cfg1,ti-custom-cfg2 at this place For your own custom dataset the following parameters need to be set to appropriate values,
-
train_data, test_data, name_size_file, label_map_file, num_test_image and num_classes.
SSD generates best quality for PASCAL VOC0712 dataset when square input resolution like 512x512 or 300x300 is used. However sometimes there may be need of training with non-square input resolution for custom dataset. So for an illustration purpose we provide script to train for non-square input resolution using VOC0712 dataset by resizing input data. In the training file ./train_image_object_detection.sh, set voc0712_cfg_type="TYPE2" at this place. The trained model and weights can be downloaded at the following place.
PASCAL VOC0712 models for Non square resolution
The script to run trained model through video files can be executed by the following simple commands.
-
cd $root/scripts/
-
python ./infer_video_object.py
- Set caffe_root path to folder pointing to caffe_jacinto in ./infer_video_object.py.
- The path of the input video needs to be updated along with video names by updating dataset in the script.
- Output videos with detected objects are stored at the path provided by params.OpPath in the script.
- Detected outputs are stored in the text files too.