Single Shot Multibox Detector (SSD)
This example demonstrates how to train and test the Single Shot Multibox Detector (SSD) model on the PASCAL VOC and KITTI datasets. This object localization model learns to detect objects in natural scenes and provide bounding boxes and category information for each object.
"SSD: Single Shot MultiBox Detector" https://arxiv.org/abs/1512.02325
Installation of scipy is required:

. .venv/bin/activate
pip install scipy
First, download the PASCAL VOC 2007 training, testing, and PASCAL VOC 2012 training datasets to a local directory. These datasets consist of images of scenes and corresponding annotation files with bounding box and category information for each object in the scene.
Then, run the ingest_pascalvoc.py script to decompress and process the files into an output directory, which we refer to below with the shell variable $DATA:
python datasets/ingest_pascalvoc.py --input_dir <dir/containing/tar/files> --output_dir $DATA --height 300 --width 300
The above script will:
- Decompress the tar files into the output directory, inside the folder
- Convert the annotations from XML to the JSON format expected by our dataloader. The converted JSON files are saved to the folder
- Write manifest files for the training and testing sets. These are written to
- Write an SSD model config file, written to
- Write a configuration file to pass to neon, written to pascalvoc_300x300.cfg. This config file contains the paths to the manifest files, as well as some other dataset-specific settings. For example:
height = 300
epochs = 230
width = 300
manifest_root = /usr/local/data/VOCdevkit
manifest = [train:/usr/local/data/VOCdevkit/train_300x300.csv, val:/usr/local/data/VOCdevkit/val_300x300.csv]
ssd_config = [train:/usr/local/data/VOCdevkit/pascalvoc_ssd_300x300.cfg, val:/usr/local/data/VOCdevkit/pascalvoc_ssd_300x300_val.cfg]
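The XML-to-JSON annotation conversion step above can be sketched as follows. This is an illustrative example, not the actual ingest_pascalvoc.py code; the exact JSON schema expected by the dataloader is an assumption here.

```python
# Illustrative sketch: convert one PASCAL VOC XML annotation into a JSON
# record with per-object labels and bounding boxes. The real ingest script
# may use a different schema.
import json
import xml.etree.ElementTree as ET

def voc_xml_to_json(xml_string):
    root = ET.fromstring(xml_string)
    size = root.find("size")
    record = {
        "width": int(size.find("width").text),
        "height": int(size.find("height").text),
        "objects": [],
    }
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        record["objects"].append({
            "label": obj.find("name").text,
            "bbox": [int(box.find(t).text)
                     for t in ("xmin", "ymin", "xmax", "ymax")],
        })
    return json.dumps(record)
```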
Then, run the ingest_kitti.py script to decompress and process the files into an output directory specified by the --output_dir command-line option. This script also resizes the KITTI images from their original 375 x 1242 pixels to 300 x 994 pixels, which maintains the original aspect ratio while reducing the size for processing with the SSD model.
python ingest_kitti.py --input_dir <dir/containing/tar/files> --output_dir $DATA
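The 300 x 994 target size follows directly from scaling the height to 300 while preserving the aspect ratio; a minimal sketch of the arithmetic (the helper name is ours, not from the ingest script):

```python
# Sketch: derive the resized KITTI dimensions. Scaling height 375 -> 300
# and preserving aspect ratio gives width round(1242 * 300 / 375) = 994.
def resize_dims(orig_h, orig_w, target_h):
    scale = target_h / orig_h
    return target_h, round(orig_w * scale)

print(resize_dims(375, 1242, 300))  # (300, 994)
```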
The script will unzip the data into the folder $DATA/kitti/ and carry out a similar procedure as above for the PASCAL VOC dataset. The configuration file will be saved as
Note that the SSD model configuration is slightly different from the PASCAL VOC one, since the aspect ratio of the images is different. See the SSD configuration files for more details on the differences.
SpaceNet is a dataset of satellite imagery with corresponding building footprints. Instructions for downloading the dataset are found here. After downloading and extracting the archive files, you should have folders for each of the cities, located in the $DATA folder. Then, run the ingest script:
python ingest_spacenet.py --data_dir $DATA --height 512 --width 512
Note that the above ingest script only works on the 3-band images.
The script will preprocess the images and convert the building footprints into enclosing bounding boxes. Several pre-processing steps are applied:
- Resize images to 512x512, and re-save as
- Compute the enclosing bounding box for the building footprints.
- Shrink the bounding box to 80% of original size.
- Remove images with >50% blank pixels.
- Remove buildings smaller than 1% of the image width or height.
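Two of the steps above, computing the enclosing bounding box of a footprint polygon and shrinking it to 80% of its size about its center, can be sketched as follows. This is an illustrative example with hypothetical helper names, not the actual ingest_spacenet.py code.

```python
# Sketch of the bounding-box steps: enclose a building-footprint polygon
# in an axis-aligned box, then shrink that box to 80% of its original
# width and height, keeping the center fixed.
def enclosing_bbox(polygon):
    """polygon: list of (x, y) vertices -> (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def shrink_bbox(bbox, factor=0.8):
    """Scale a (xmin, ymin, xmax, ymax) box about its center."""
    xmin, ymin, xmax, ymax = bbox
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    hw = (xmax - xmin) * factor / 2
    hh = (ymax - ymin) * factor / 2
    return cx - hw, cy - hh, cx + hw, cy + hh
```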
The converted images and annotations are saved in each city's folder, for example $DATA/AOI_2_Vegas_Train/RGB-PanSharpen-512x512. The config file for the entire dataset (combined across all cities) is saved in
To train the model, use:
python train.py --config <path/to/config/file> --verbose --save_path model.prm -z 32
The above command will train the model and save the trained weights to model.prm, as specified by the --save_path option.
By default, the SSD has several convolution and linear layers that are initialized from a pre-trained VGG16 model. These are automatically downloaded by the script.
To evaluate the trained model using the mean Average Precision (mAP) metric, use the command below:
python inference.py --config <path/to/config/file> --model_file model.prm --output results.prm
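The mAP metric matches predicted boxes to ground-truth boxes by their intersection-over-union (IoU) overlap; under the PASCAL VOC protocol a detection typically counts as a true positive when IoU >= 0.5. A minimal sketch of the IoU computation (the actual matching logic in inference.py may differ in detail):

```python
# Minimal IoU sketch for boxes given as (xmin, ymin, xmax, ymax).
# Used when deciding whether a predicted box matches a ground-truth box.
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```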