## SSD ReadToMe Model Training for DeepLens

#### You can run the following steps to train the model for the ReadToMe project, or you can replace the custom data with your own data set and train your own object detection model.

#### First, let's install some dependencies. I am running CUDA 9.1 on my system, so I am installing the MXNet build for CUDA 9.1 (`mxnet-cu91`). You may need to adjust this depending on the CUDA version installed on your machine.
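If you are not sure which build to pick, the MXNet GPU packages from this period are named after the CUDA release (`mxnet-cu90`, `mxnet-cu91`, and so on). The sketch below reads the release from `nvcc --version` output and builds the matching package name. The helper `mxnet_package_for_cuda` is hypothetical, and it does not check that a wheel for that CUDA version was actually published, so confirm the name on PyPI before installing.

```python
import re
import shutil
import subprocess


def mxnet_package_for_cuda(nvcc_output):
    """Build an MXNet pip package name (e.g. 'mxnet-cu91') from the
    'release X.Y' string printed by `nvcc --version`.

    Falls back to the CPU-only 'mxnet' package when no release is found.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return "mxnet"
    major, minor = match.groups()
    return "mxnet-cu{}{}".format(major, minor)


print(mxnet_package_for_cuda("Cuda compilation tools, release 9.1, V9.1.85"))  # mxnet-cu91

# On a machine with the CUDA toolkit on PATH, query the real version.
if shutil.which("nvcc"):
    result = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    print(mxnet_package_for_cuda(result.stdout))
```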

In [24]:
%%bash

pip install mxnet-cu91
pip install numpy
pip install opencv-python
pip install matplotlib





#### Next, we need to grab the MXNet repo from GitHub.

In [25]:
%%bash

echo checking for incubator-mxnet
DIR=incubator-mxnet
# remove any previous clone so we start from a fresh checkout
if [[ -d "$DIR" ]]; then
    echo found existing git repo
    echo deleting incubator-mxnet.git
    rm -rf "$DIR"
fi

git clone --recursive https://github.com/apache/incubator-mxnet.git




#### This is where we deviate from the example instructions on GitHub. The example tells us to grab a model from this [list](https://github.com/apache/incubator-mxnet/tree/master/example/ssd#map); however, these models do not work with the current version of MXNet, at least not without modifying the symbol names. Instead, let's grab the pretrained `vgg16_reduced` model, which does work, and rename its files to the `ssd_vgg16_reduced_300` checkpoint names that the example scripts look for.


In [None]:
%%bash

rm -rf incubator-mxnet/example/ssd/model/*
cd incubator-mxnet/example/ssd/model/
wget https://github.com/zhreshold/mxnet-ssd/releases/download/v0.2-beta/vgg16_reduced.zip
unzip vgg16_reduced.zip
rm vgg16_reduced.zip
mv vgg16_reduced-symbol.json ssd_vgg16_reduced_300-symbol.json
mv vgg16_reduced-0001.params ssd_vgg16_reduced_300-0001.params
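The two `mv` commands rely on MXNet's checkpoint naming convention: `mx.model.save_checkpoint` and `mx.model.load_checkpoint` use `<prefix>-symbol.json` and `<prefix>-<epoch>.params`, with the epoch zero-padded to four digits. Renaming the downloaded `vgg16_reduced` files to the `ssd_vgg16_reduced_300` prefix at epoch 1 is presumably what lets the later `--finetune 1` training step load them. A small sketch of that convention follows; `checkpoint_files` and `rename_checkpoint` are hypothetical helpers, not part of MXNet.

```python
import os


def checkpoint_files(prefix, epoch):
    """Return the (symbol, params) filenames for an MXNet checkpoint
    saved under `prefix` at `epoch`."""
    return "{}-symbol.json".format(prefix), "{}-{:04d}.params".format(prefix, epoch)


def rename_checkpoint(model_dir, old_prefix, new_prefix, epoch):
    """Rename an existing checkpoint so it loads under a new prefix."""
    old_names = checkpoint_files(old_prefix, epoch)
    new_names = checkpoint_files(new_prefix, epoch)
    for old, new in zip(old_names, new_names):
        os.rename(os.path.join(model_dir, old), os.path.join(model_dir, new))


print(checkpoint_files("ssd_vgg16_reduced_300", 1))
# ('ssd_vgg16_reduced_300-symbol.json', 'ssd_vgg16_reduced_300-0001.params')
```

With these helpers, the shell cell above corresponds roughly to `rename_checkpoint("incubator-mxnet/example/ssd/model", "vgg16_reduced", "ssd_vgg16_reduced_300", 1)`.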


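#### Now we need to organize our data into the Pascal VOC directory layout that the example scripts expect: images in `VOCdevkit/VOC2018/JPEGImages`, annotation XML files in `VOCdevkit/VOC2018/Annotations`, and the train/test split lists in `VOCdevkit/VOC2018/ImageSets/Main`. The code below assumes your `.jpg` images and matching `.xml` annotations sit together in a local `2018` directory. You can read more about this structure by searching for Pascal VOC. Other Pascal VOC-format datasets are available online that you can use to train models. If you plan to train a model on your own dataset, first check whether someone else has already made it available online.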
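Each image in `JPEGImages` should have a same-named XML file in `Annotations`, with a `<size>` element and one `<object>` per bounding box; the `<name>` of every object must match the class name (`text_block`) passed to the later steps. The sketch below reads such a file with the standard library, which can help you spot annotation mistakes before building the `.rec` files. The sample annotation and the `read_voc_objects` helper are illustrative, and real annotation tools may emit extra fields that this ignores.

```python
import xml.etree.ElementTree as ET


def read_voc_objects(xml_text):
    """Return (width, height, objects) from a Pascal VOC annotation string,
    where each object is (name, xmin, ymin, xmax, ymax) in pixels."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        objects.append((obj.find("name").text, *coords))
    return width, height, objects


# Hypothetical annotation for a 640x480 image with one text block.
example = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>text_block</name>
    <difficult>0</difficult>
    <bndbox><xmin>64</xmin><ymin>48</ymin><xmax>320</xmax><ymax>240</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_objects(example))
# (640, 480, [('text_block', 64.0, 48.0, 320.0, 240.0)])
```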


In [None]:
import os
import random
import shutil
from pathlib import Path

source_dir = '2018'  # folder holding your .jpg images and matching .xml annotations
destination_dir = 'incubator-mxnet/example/ssd/data/VOCdevkit/VOC2018/'
image_dir = os.path.join(destination_dir, 'JPEGImages')
annotation_dir = os.path.join(destination_dir, 'Annotations')
split_dir = os.path.join(destination_dir, 'ImageSets', 'Main')

# remove the training dir if it already exists
if os.path.exists(destination_dir):
    shutil.rmtree(destination_dir)

# create the Pascal VOC directory layout for the text-block custom dataset
os.makedirs(image_dir)
os.makedirs(annotation_dir)
os.makedirs(split_dir)

# copy images and annotations into their VOC folders
for file in os.listdir(source_dir):
    if file.endswith('.jpg'):
        shutil.copy(os.path.join(source_dir, file), image_dir)
    elif file.endswith('.xml'):
        shutil.copy(os.path.join(source_dir, file), annotation_dir)

# collect the image IDs (file names without the extension)
files = [Path(f).stem for f in os.listdir(image_dir) if f.endswith('.jpg')]

# take 10% of the data for validation; the rest goes to training
# (call random.seed(...) first if you want a reproducible split)
validationPercent = 10
k = len(files) * validationPercent // 100
validation_indices = set(random.sample(range(len(files)), k))
training = [f for i, f in enumerate(files) if i not in validation_indices]
validation = [f for i, f in enumerate(files) if i in validation_indices]

print(len(files))
print(len(training))
print(len(validation))

# write the split lists that prepare_dataset.py reads
with open(os.path.join(split_dir, 'trainval.txt'), 'w') as training_list:
    for row in training:
        training_list.write('{}\n'.format(row))

with open(os.path.join(split_dir, 'test.txt'), 'w') as validation_list:
    for row in validation:
        validation_list.write('{}\n'.format(row))
        

#### Converting the dataset into RecordIO (`.rec`) files is recommended, because MXNet can iterate over packed records much more efficiently than over individual image files.

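#### This next step calls a wrapper script, `tools/prepare_dataset.py`, which ultimately calls `im2rec.py`. For each call it writes a `.lst` file at the `--target` path and generates a matching `.rec` file next to it: one for our training dataset and one for our validation dataset.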
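An im2rec `.lst` file is tab-separated: an integer index, one or more label values, and the image path relative to the root. For detection, the packed label (stored in each record when im2rec runs with `--pack-label`) is, as I understand the layout documented for MXNet's `ImageDetIter`, a header width and a per-object width followed by one `[class_id, xmin, ymin, xmax, ymax]` group per box, with coordinates normalized to [0, 1]. The sketch below builds one such line from pixel boxes to make the layout concrete. It illustrates the format and is not a reproduction of `prepare_dataset.py`, which may, for example, add a "difficult" column per object; the helper names and the sample box are hypothetical.

```python
def detection_label(objects, width, height, class_names):
    """Flatten (name, xmin, ymin, xmax, ymax) pixel boxes into a packed label:
    [header_width, object_width, cls, xmin, ymin, xmax, ymax, ...]
    with coordinates normalized to [0, 1]."""
    label = [2, 5]  # header is 2 values wide; each object uses 5 values
    for name, xmin, ymin, xmax, ymax in objects:
        label += [class_names.index(name),  # raises ValueError for unknown classes
                  xmin / width, ymin / height, xmax / width, ymax / height]
    return label


def lst_line(index, label, relative_path):
    """One tab-separated .lst line: index, label values, image path."""
    fields = [str(index)] + ["{:g}".format(v) for v in label] + [relative_path]
    return "\t".join(fields)


objs = [("text_block", 64.0, 48.0, 320.0, 240.0)]
label = detection_label(objs, 640, 480, ["text_block"])
print(lst_line(0, label, "VOC2018/JPEGImages/example.jpg"))
```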


In [None]:
%%bash

# change directories to the example/ssd directory
cd incubator-mxnet/example/ssd
# update the names list to only include our single class name 'text_block'
echo text_block > dataset/names/pascal_voc.names

# generate the .rec files we will use to train with
python tools/prepare_dataset.py --dataset pascal --year 2018 --set test --target ./data/test.lst --root data/VOCdevkit/
python tools/prepare_dataset.py --dataset pascal --year 2018 --set trainval --target ./data/train.lst --root data/VOCdevkit/

#### Finally, we can train on our data. Training runs for 250 epochs by default, but you can change that by passing the `--end-epoch` flag with the epoch you would like to stop at.

#### You can view the progress of training by switching to a terminal and tailing the `train.log` file in the `incubator-mxnet/example/ssd` directory

##### e.g. `tail -f train.log`

#### Once the validation mAP reported in the log is acceptable, you can stop training.
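To pick an epoch from `train.log` without scrolling, you can scan it for validation scores. The sketch below assumes the usual MXNet `Module.fit` log line format, `Epoch[N] Validation-<metric>=<value>`, and that the SSD example reports its metric as `mAP`; the sample log lines are made up. One caveat to verify against the files in `model/`: MXNet's `do_checkpoint` callback saves the checkpoint for logged epoch `N` as epoch `N + 1`, so the `--epoch` you pass to the later scripts may be one higher than the logged number.

```python
import re

# Assumed log format, e.g. "Epoch[12] Validation-mAP=0.734512"
VAL_RE = re.compile(r"Epoch\[(\d+)\] Validation-mAP=([0-9.]+)")


def best_validation_epoch(log_lines):
    """Return (epoch, mAP) for the highest Validation-mAP seen, or None."""
    best = None
    for line in log_lines:
        match = VAL_RE.search(line)
        if match:
            epoch, score = int(match.group(1)), float(match.group(2))
            if best is None or score > best[1]:
                best = (epoch, score)
    return best


# Hypothetical log excerpt for illustration only.
sample_log = [
    "INFO:root:Epoch[1] Validation-text_block=0.612000",
    "INFO:root:Epoch[1] Validation-mAP=0.612000",
    "INFO:root:Epoch[2] Validation-mAP=0.701000",
    "INFO:root:Epoch[3] Validation-mAP=0.688000",
]
print(best_validation_epoch(sample_log))  # (2, 0.701)
```

Usage against the real log: `with open('train.log') as log: print(best_validation_epoch(log))`.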

In [None]:
%%bash

# change directories to the example/ssd directory
cd incubator-mxnet/example/ssd

# fine-tune the model using our custom data set
python train.py --train-path data/train.rec --val-path data/test.rec --class-names text_block --num-class 1 --finetune 1 --gpus 0

#### You can test the model by running the following command from a terminal in the `incubator-mxnet/example/ssd` directory

**(Note: You will need to change the path of the image to point to a sample image in your dataset.)**

`python demo.py --epoch 2 --network vgg16_reduced --images ./data/VOCdevkit/VOC2018/JPEGImages/MVIMG_20180129_210518.jpg --thresh 0.5 --data-shape 300 --class-names text_block --gpu 0`
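The `--thresh 0.5` flag is a confidence cutoff applied to the network's detections. Assuming the row layout of MXNet's `MultiBoxDetection` output, `[class_id, score, xmin, ymin, xmax, ymax]` with normalized coordinates and a class id of -1 for suppressed or background boxes, the filtering amounts to the sketch below. `filter_detections` and the sample rows are hypothetical; the same idea may be useful when post-processing results in your own inference code.

```python
def filter_detections(rows, thresh, width, height, class_names):
    """Keep rows [cls_id, score, xmin, ymin, xmax, ymax] (normalized coords)
    whose score is at least `thresh`, converting boxes to pixel coordinates."""
    kept = []
    for cls_id, score, xmin, ymin, xmax, ymax in rows:
        if cls_id < 0 or score < thresh:  # -1 marks background / suppressed boxes
            continue
        kept.append((class_names[int(cls_id)], score,
                     round(xmin * width), round(ymin * height),
                     round(xmax * width), round(ymax * height)))
    return kept


rows = [
    [0, 0.91, 0.10, 0.10, 0.50, 0.50],
    [0, 0.32, 0.60, 0.20, 0.90, 0.40],
    [-1, 0.00, 0.00, 0.00, 0.00, 0.00],
]
print(filter_detections(rows, 0.5, 640, 480, ["text_block"]))
```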

#### You should also evaluate the model against the test dataset using the following command

**(Note: You will need to change the epoch flag to point to the epoch you would like to evaluate.)**

`python evaluate.py --gpus 0 --network vgg16_reduced --epoch 184 --class-names text_block --num-class 1 --rec-path data/test.rec`
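Pascal VOC-style mAP counts a detection as correct when its intersection over union (IoU) with an unmatched ground-truth box of the same class reaches a threshold, conventionally 0.5; I am assuming `evaluate.py` follows that convention by default. IoU itself is simple to compute, as this sketch with hypothetical pixel boxes shows:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (64, 48, 320, 240)
prediction = (96, 48, 352, 240)  # same size, shifted 32 px to the right
print(round(iou(ground_truth, prediction), 3))  # 0.778, above the 0.5 cutoff
```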



#### Once you are finished, you will need to deploy the model
**Pick the best epoch and deploy the model using the following command**

`python deploy.py --network vgg16_reduced --epoch 172 --num-class 1 --data-shape 300`

#### To deploy your model to DeepLens, package the deployed `.params` and `.json` model files into a `.tar.gz` archive and create a new model in the DeepLens service in the AWS Console. You can then refer to the model by its model name in your project's Lambda function. To optimize the model using the DeepLens Model Optimizer package, follow the instructions [here](https://docs.aws.amazon.com/deeplens/latest/dg/deeplens-model-optimizer-api-methods.html)
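Packaging the deployed files can be scripted with Python's `tarfile` module. This sketch assumes `deploy.py` saved the model under a `deploy_ssd_vgg16_reduced_300` prefix in the `model/` directory; check the filenames actually present there before using it. `package_model` and the archive name are hypothetical.

```python
import os
import tarfile


def package_model(model_dir, prefix, epoch, out_path):
    """Bundle <prefix>-symbol.json and <prefix>-<epoch>.params into a .tar.gz,
    storing both files at the archive root. Raises if either file is missing."""
    names = ["{}-symbol.json".format(prefix), "{}-{:04d}.params".format(prefix, epoch)]
    with tarfile.open(out_path, "w:gz") as tar:
        for name in names:
            tar.add(os.path.join(model_dir, name), arcname=name)
    return names


# Hypothetical call; adjust the prefix and epoch to your deployed files:
# package_model("incubator-mxnet/example/ssd/model",
#               "deploy_ssd_vgg16_reduced_300", 172, "read_to_me_model.tar.gz")
```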