Re-implementation of YOLO v2 detection using the torchvision VGG16-BN model.
A PyTorch implementation of the VGG16 version of YOLO v2, described in the paper "YOLO9000: Better, Faster, Stronger" by Joseph Redmon and Ali Farhadi. The goal of this repo is to re-implement a well-known one-stage object detector, YOLO v2, using torchvision models.
- Python 3.7
- PyTorch 1.5.0
- visdom
- numpy
- cv2
- matplotlib
1. Download the model weights from here and move them into ./saves
2. Prepare the PATH that contains the images you want to detect -- PATH (e.g. 'D:\image_path')
3. Check the TYPE of the images -- TYPE (e.g. 'jpg')
4. Enter the command: python demo.py --demo_img_path PATH --demo_img_type TYPE
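
As a quick sanity check before running the demo, the sketch below lists the files that match a given PATH and TYPE. It is only an illustration: the helper name `list_demo_images` is hypothetical, and it assumes that images are matched by file extension, which this README does not confirm for demo.py itself.

```python
import glob
import os


def list_demo_images(demo_img_path, demo_img_type):
    """Return the sorted paths in demo_img_path whose extension is demo_img_type.

    Hypothetical helper for checking PATH and TYPE; not part of demo.py.
    """
    pattern = os.path.join(demo_img_path, "*." + demo_img_type)
    return sorted(glob.glob(pattern))


# Example values taken from the steps above.
images = list_demo_images(r"D:\image_path", "jpg")
print(len(images), "image(s) found")
```

If the count is zero, PATH or TYPE is probably not what you intended.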
Method | Training Dataset | Testing Dataset | Resolution | mAP | FPS |
---|---|---|---|---|---|
original paper | VOC2007 train + VOC2012 train | VOC2007 test | 416 x 416 | 76.8 | 67 |
ours | VOC2007 train + VOC2012 train | VOC2007 test | 416 x 416 | 77.03 | 58 |
Detection results on VOC 2007.
First, create a dataset directory structure like the one below for VOC train and test.
The VOCtrainval data goes under TRAIN and the VOCtest data goes under TEST.

    root
    |-- TEST
    |   |-- VOC2007
    |       |-- Annotations
    |       |-- ImageSets
    |       |-- JPEGImages
    |       |-- SegmentationClass
    |       |-- SegmentationObject
    |-- TRAIN
        |-- VOC2007
        |   |-- Annotations
        |   |-- ImageSets
        |   |-- JPEGImages
        |   |-- SegmentationClass
        |   |-- SegmentationObject
        |-- VOC2012
            |-- Annotations
            |-- ImageSets
            |-- JPEGImages
            |-- SegmentationClass
            |-- SegmentationObject

To train, we used the VOC2007 trainval + VOC2012 trainval datasets.
To test, we used the VOC2007 test dataset.
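
The layout above can be checked before training with a small script. This is a minimal sketch, not part of the repo: the names `EXPECTED`, `SUBDIRS` and `missing_dirs` are hypothetical, and the root path is the default `--data_path` value listed in the usage section.

```python
import os

# Expected layout, transcribed from the directory tree above.
EXPECTED = {
    "TEST": ["VOC2007"],
    "TRAIN": ["VOC2007", "VOC2012"],
}
SUBDIRS = [
    "Annotations",
    "ImageSets",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
]


def missing_dirs(root):
    """Return every expected directory that does not exist under root."""
    missing = []
    for split, years in EXPECTED.items():
        for year in years:
            for sub in SUBDIRS:
                path = os.path.join(root, split, year, sub)
                if not os.path.isdir(path):
                    missing.append(path)
    return missing


# Assumed root; replace with your own data path.
for path in missing_dirs(r"D:\Data\VOC_ROOT"):
    print("missing:", path)
```

An empty result means all fifteen directories from the tree are present.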
Unlike the original YOLO v2, the backbone is VGG instead of Darknet-19, and the modules after it have been slightly modified.
01- what is the cell concept?
YOLO treats each position of the final-layer feature map as a cell.
For example, an image of 416 x 416 resolution becomes a 13 x 13 grid of cells.
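
The arithmetic behind the 416 to 13 example can be sketched as follows. The stride of 32 is an assumption inferred from that example (416 / 13 = 32) rather than something stated in this README, and the function names are hypothetical.

```python
def grid_size(resolution, stride=32):
    """Number of cells along one side of the final feature map.

    stride=32 is an assumption inferred from the 416 -> 13 example.
    """
    if resolution % stride != 0:
        raise ValueError("resolution must be a multiple of the stride")
    return resolution // stride


def cell_of(cx, cy, stride=32):
    """Return (column, row) of the cell containing the pixel point (cx, cy)."""
    return int(cx // stride), int(cy // stride)


print(grid_size(416))         # 13
print(cell_of(200.0, 100.0))  # (6, 3)
```

Under this assumption, an object whose center lies at pixel (200, 100) falls into the cell in column 6, row 3 of the 13 x 13 grid.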
02- make_target
make_target assigns gt bboxes to anchors.
An anchor is positive if iou(bbox, anchor) > 0.5.
For positive anchors, xy_gt is the position of the gt_bbox center within its cell, scaled to the range 0 to 1.
Also for positive anchors, wh_gt is the ratio of the gt_bbox size to the anchor box size.
gt_conf is the max iou(pred_bbox, gt_bbox) for each anchor in each cell.
no conf is 1 - gt_conf.
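
The positive-anchor rule and the xy_gt / wh_gt encoding can be illustrated with plain Python. This is a sketch under stated assumptions, not the repo's make_target: boxes are given as corner coordinates in cell units, the helper names are hypothetical, and how the anchors are placed on the grid is left to the caller. gt_conf and no conf are not covered here because they depend on the network's predictions.

```python
import math


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_anchors(gt_box, anchor_boxes, thres=0.5):
    """Indices of anchors whose IoU with the gt box exceeds thres."""
    return [i for i, anc in enumerate(anchor_boxes) if iou(gt_box, anc) > thres]


def encode_xy_wh(gt_box, anchor_wh):
    """Return (cell, xy_gt, wh_gt) for a gt box and an anchor size, in cell units."""
    cx = (gt_box[0] + gt_box[2]) / 2
    cy = (gt_box[1] + gt_box[3]) / 2
    w = gt_box[2] - gt_box[0]
    h = gt_box[3] - gt_box[1]
    col, row = math.floor(cx), math.floor(cy)
    xy_gt = (cx - col, cy - row)                    # offset inside the cell, 0 to 1
    wh_gt = (w / anchor_wh[0], h / anchor_wh[1])    # plain ratio, as described above
    return (col, row), xy_gt, wh_gt


gt = (4.0, 5.0, 7.0, 9.0)
anchors = [(4.0, 5.5, 7.0, 9.5), (5.0, 7.0, 6.0, 8.0)]
print(positive_anchors(gt, anchors))  # [0]
print(encode_xy_wh(gt, (3.0, 4.0)))   # ((5, 7), (0.5, 0.0), (1.0, 1.0))
```

In the example, only the first anchor overlaps the gt box enough to be positive; the gt center (5.5, 7.0) lies in cell (5, 7) with offset (0.5, 0.0).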
03- whole loss
The whole loss consists of xy center loss, wh ratio loss, confidence loss, no conf loss, and classification loss. In the original paper, each component is a sum of squared errors, except the wh ratio loss, which is a root SSE.
optimizer is SGD (weight_decay : 5e-4, momentum : 0.9)
train until convergence (about 150 epochs)
learning rate decay
Epochs | Learning rate |
---|---|
000-099 | 1e-4 |
100-149 | 1e-5 |
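
The schedule in the table can be written as a small function. This is a minimal sketch with a hypothetical name; the table only covers epochs 0 to 149, so keeping 1e-5 for later epochs is an assumption.

```python
def learning_rate(epoch):
    """Learning rate for a zero-based epoch index, following the table above."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # Epochs 0-99 use 1e-4; from epoch 100 on the rate drops to 1e-5.
    # Holding 1e-5 beyond epoch 149 is an assumption, not stated in the table.
    return 1e-4 if epoch < 100 else 1e-5


for epoch in (0, 99, 100, 149):
    print(epoch, learning_rate(epoch))
```

In PyTorch, an equivalent step could presumably be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.1)` on top of `torch.optim.SGD(params, lr=1e-4, momentum=0.9, weight_decay=5e-4)`; whether main.py does it this way is not stated here.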
Evaluation uses the VOC metric, mAP (IoU > 0.5), and is exactly the same as the Python mAP code at https://github.com/Cartucho/mAP
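
For reference, per-class average precision in the VOC style can be sketched as the area under the precision-recall curve after making precision monotonically non-increasing. This is an illustration of the all-point interpolation commonly used for VOC, assumed here rather than confirmed by this README; the function name and the input lists are hypothetical. mAP is then the mean of the per-class AP values.

```python
def voc_ap(recall, precision):
    """All-point interpolated AP from recall/precision lists sorted by rank."""
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Build the precision envelope: each value becomes the max to its right.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    ap = 0.0
    for i in range(1, len(mrec)):
        if mrec[i] != mrec[i - 1]:
            ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap


print(voc_ap([0.5, 1.0], [1.0, 0.5]))  # 0.75
```

In the example, the first half of the recall range is covered at precision 1.0 and the second half at 0.5, giving 0.5 * 1.0 + 0.5 * 0.5 = 0.75.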
- for training
# python main.py
usage: main.py [-h] [--batch_size] [--conf_thres]
[--epochs] [--lr] [--num_workers]
[--save_file_name] [--save_path] [--data_path]
[--start_epoch]
-h, --help show this help message and exit
--batch_size for training batch size, test batch is only 1.
--conf_thres for testing, confidence threshold, default 0.01
--epochs whole training epochs (default 200)
--lr initial learning rate (default 1e-4)
--num_workers dataset num_workers (default 4)
--save_file_name when you do experiment, you can change save_file_name to distinguish other pths.
--save_path the path to save .pth file
--data_path data path for training and testing refer to Implementations/dataset (default="D:\Data\VOC_ROOT")
--num_classes number of dataset classes (voc : 20, coco:80) (default=20)
--dataset_type which dataset you want to use, VOC or COCO (default='voc')
--start_epoch when you resume, set the start epochs.
Before running test and demo, you need a trained .pth file (weight params). You can download the 149-epoch weights at https://livecauac-my.sharepoint.com/:u:/g/personal/csm8167_cau_ac_kr/EWshHPoe9-tOgLUtreWDUeEBmMwMXaAA1VT1rniLf_x7Sg?e=0MwAUa
Then create the ./saves directory and place the weights in it.
- for testing
# python test.py
usage: test.py [-h] [--data_path] [--test_epoch]
[--save_path] [--save_file_name] [--conf_thres]
-h, --help show this help message and exit
--data_path for testing, voc 2007 path (because split =='TEST') (default="D:\Data\VOC_ROOT")
--test_epoch for testing, which epoch param do we get
--save_path for testing, params path (default './saves')
--save_file_name save_file_name to distinguish other params. (default 'yolo_v2_vgg_16')
--conf_thres for testing, confidence threshold; the detector keeps detections above the threshold (default 0.01)
- for demo
# python demo.py
usage: demo.py [-h] [--demo_img_path] [--demo_img_type]
[--vis] [--epoch] [--save_path]
[--save_file_name] [--conf_thres]
-h, --help show this help message and exit
--demo_img_path The path that contains the image you want to detect
--demo_img_type The type of images you want to detect
--vis Whether to visualize (default False)
--epoch for demo, which epoch param do we get
--save_path for demo, params path (default './saves')
--save_file_name save_file_name to distinguish other params. (default 'yolo_v2_vgg_16')
--conf_thres for demo, confidence threshold; the detector keeps detections above the threshold (default 0.35)