yolo v2 vgg16 pytorch

:octocat: re-implementation of yolo v2 detection using torchvision vgg16 bn model.


A PyTorch implementation of a VGG16 version of YOLO v2, the detector described in the paper "YOLO9000: Better, Faster, Stronger" by Joseph Redmon and Ali Farhadi. The goal of this repo is to re-implement the well-known one-stage object detector YOLO v2 using torchvision models.


  • Python 3.7
  • Pytorch 1.5.0
  • visdom
  • numpy
  • cv2
  • matplotlib

Quick start detection

1. Download the model weights from here and move them into ./saves

2. Prepare a PATH that contains the images you want to detect -- PATH (e.g. 'D:\image_path')

3. Check the TYPE of the images -- TYPE (e.g. 'jpg')

4. Enter the command : python --demo_img_path PATH --demo_img_type TYPE


| Method | Training dataset | Testing dataset | Resolution | mAP | FPS |
|---|---|---|---|---|---|
| Original paper | VOC2007 train + VOC2012 train | VOC2007 test | 416 x 416 | 76.8 | 67 |
| Ours | VOC2007 train + VOC2012 train | VOC2007 test | 416 x 416 | 77.03 | 58 |


detection result of voc 2007.


  • Dataset

First, create a dataset directory structure like the one below for VOC training and testing.

The TRAIN directory holds the VOCtrainval data and the TEST directory holds the VOCtest data.

root
    |-- TEST
        |-- VOC2007
            |-- Annotations
            |-- ImageSets
            |-- JPEGImages
            |-- SegmentationClass
            |-- SegmentationObject
    |-- TRAIN
        |-- VOC2007
            |-- Annotations
            |-- ImageSets
            |-- JPEGImages
            |-- SegmentationClass
            |-- SegmentationObject
        |-- VOC2012
            |-- Annotations
            |-- ImageSets
            |-- JPEGImages
            |-- SegmentationClass
            |-- SegmentationObject

To train, we used the VOC2007 trainval + VOC2012 trainval datasets.

To test, we used the VOC2007 test dataset.
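
Before training, it can help to confirm that the directory tree above is in place. The following is a minimal sketch, not part of this repo: the function name `missing_voc_dirs` and the constants are hypothetical, and it only checks that the folders exist, not their contents.

```python
from pathlib import Path

# Sub-directories expected inside each VOC year folder (taken from the tree above).
VOC_SUBDIRS = ["Annotations", "ImageSets", "JPEGImages",
               "SegmentationClass", "SegmentationObject"]

# Year folders expected under each split.
EXPECTED = {"TEST": ["VOC2007"], "TRAIN": ["VOC2007", "VOC2012"]}


def missing_voc_dirs(root):
    """Return the expected directories (relative to root) that do not exist."""
    root = Path(root)
    missing = []
    for split, years in EXPECTED.items():
        for year in years:
            for sub in VOC_SUBDIRS:
                path = root / split / year / sub
                if not path.is_dir():
                    missing.append(str(path.relative_to(root)))
    return missing
```

Calling `missing_voc_dirs(r"D:\Data\VOC_ROOT")` returns an empty list when every folder is present.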

  • Model

Unlike the original YOLO v2, the backbone is VGG16 instead of Darknet-19, and the modules after the backbone have been slightly modified.



  • Loss

01- what is the cell concept?

YOLO treats the size of the final-layer feature map as the cell grid size.

For example, a 416 x 416 image becomes a 13 x 13 grid of cells.
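
The 416 to 13 relationship can be checked with a little arithmetic. The sketch below assumes the backbone keeps all five max-pool layers of torchvision's VGG16 layout, which gives an overall stride of 32; that is an assumption about this repo, and the helper names are hypothetical.

```python
# torchvision's VGG16 layer layout ("D" configuration): numbers are conv
# output channels, "M" marks a 2x2 max-pool that halves the resolution.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def total_stride(cfg):
    """Overall downsampling factor: each max-pool halves width and height."""
    return 2 ** sum(1 for layer in cfg if layer == "M")


def grid_size(image_size, stride):
    """Number of cells along one side of the final feature map."""
    return image_size // stride


def cell_of(cx, cy, image_size, grid):
    """Cell (column, row) that contains the pixel-space point (cx, cy)."""
    col = min(int(cx / image_size * grid), grid - 1)
    row = min(int(cy / image_size * grid), grid - 1)
    return col, row


stride = total_stride(VGG16_CFG)   # 32 (assumes all five pools are kept)
grid = grid_size(416, stride)      # 13
print(stride, grid, cell_of(208, 100, 416, grid))
```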

02- make_target

This step assigns gt bboxes to anchors.

An anchor is positive if iou(bbox, anchor) > 0.5.

For positive anchors, xy_gt is the position of the gt_bbox center within its cell, scaled to the range 0 to 1.

Also for positive anchors, wh_gt is the ratio of the gt_bbox size to the anchor box size.

gt_conf is the max iou(pred_bbox, gt_bbox) for each anchor in each cell.

no conf is 1 - gt_conf.
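
The positive-anchor part of make_target can be sketched as follows. This is an illustration only, not the repo's code: it assumes the IoU is computed between shapes that share a center, that anchors are given as (w, h) in cell units, and the anchor values in the usage note are made up. The gt_conf and no conf targets depend on the network's predictions and are left out.

```python
def wh_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center, given only (w, h)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def make_target(gt_box, anchors, grid=13, iou_thresh=0.5):
    """Build targets for one gt box.

    gt_box is (cx, cy, w, h) normalised to [0, 1];
    anchors is a list of (w, h) pairs in cell units (hypothetical values).
    """
    cx, cy, w, h = (v * grid for v in gt_box)   # convert to cell units
    col = min(int(cx), grid - 1)                # cell that holds the center
    row = min(int(cy), grid - 1)
    xy_gt = (cx - col, cy - row)                # center offset inside the cell
    targets = []
    for idx, (aw, ah) in enumerate(anchors):
        if wh_iou((w, h), (aw, ah)) > iou_thresh:   # positive anchor
            targets.append({
                "cell": (col, row),
                "anchor": idx,
                "xy_gt": xy_gt,
                "wh_gt": (w / aw, h / ah),          # ratio of gt size to anchor size
            })
    return targets
```

For example, `make_target((0.5, 0.5, 0.25, 0.3), [(1.0, 1.0), (3.0, 4.0), (6.0, 6.0)])` marks only the second anchor as positive, in cell (6, 6) with an xy_gt of (0.5, 0.5).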

03- whole loss

The whole loss consists of the xy center loss, wh ratio loss, confidence loss, no conf loss, and classification loss. In the original paper, each component is a sum of squared errors (SSE), except the wh ratio loss, which is a root SSE.
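
A rough sketch of how the five components might be combined is shown below. It works on plain Python lists rather than tensors, reads "root SSE" as squared error between square roots (as in the YOLO v1 loss), and uses no per-component weights. All three choices are assumptions, not the repo's implementation.

```python
import math


def sse(pred, target):
    """Sum of squared errors between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def root_sse(pred, target):
    """Squared error between square roots (assumed meaning of 'root SSE')."""
    return sum((math.sqrt(p) - math.sqrt(t)) ** 2 for p, t in zip(pred, target))


def whole_loss(pred, target):
    """Return (total, parts) for dicts holding the five loss inputs."""
    parts = {
        "xy": sse(pred["xy"], target["xy"]),
        "wh": root_sse(pred["wh"], target["wh"]),
        "conf": sse(pred["conf"], target["conf"]),
        "no_conf": sse(pred["no_conf"], target["no_conf"]),
        "cls": sse(pred["cls"], target["cls"]),
    }
    return sum(parts.values()), parts
```

Returning the individual parts alongside the total makes it easy to log each component separately during training.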


  • Train

optimizer is SGD (weight_decay : 5e-4, momentum : 0.9)

train until convergence (about 150 epochs)

learning rate decay

| Epochs | Learning rate |
|---|---|
| 000-099 | 1e-4 |
| 100-149 | 1e-5 |

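
The decay schedule above amounts to a single step at epoch 100. A minimal sketch, with a hypothetical function name and parameters:

```python
def learning_rate(epoch, base_lr=1e-4, decay_epoch=100, gamma=0.1):
    """Learning rate for a 0-based epoch: base_lr, then base_lr * gamma from decay_epoch on."""
    return base_lr * gamma if epoch >= decay_epoch else base_lr


for epoch in (0, 99, 100, 149):
    print(epoch, learning_rate(epoch))
```

In PyTorch the same schedule could be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.1)`; whether this repo does it that way is not stated here.
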
  • Evaluation

Evaluation uses the VOC metric, mAP (IoU > 0.5), and is exactly the same as the official Python mAP code.
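
For reference, the per-class average precision in the VOC metric is computed from a precision/recall curve. The sketch below shows both common variants (11-point interpolation and area under the interpolated curve) in plain Python. It is an illustration of the metric, and which variant this repo uses is not stated here.

```python
def voc_ap(recall, precision, use_07_metric=False):
    """Average precision from recall/precision lists ordered by descending confidence."""
    if use_07_metric:
        # 11-point interpolation: average the best precision at recall >= t
        # for t = 0.0, 0.1, ..., 1.0.
        ap = 0.0
        for t in [i / 10 for i in range(11)]:
            candidates = [p for r, p in zip(recall, precision) if r >= t]
            ap += (max(candidates) if candidates else 0.0) / 11
        return ap

    # Area under the interpolated curve.
    mrec = [0.0] + list(recall) + [1.0]
    mpre = [0.0] + list(precision) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas between consecutive recall values.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))
```

mAP is then the mean of `voc_ap` over all classes, where a detection counts as a true positive when its IoU with an unmatched ground-truth box of the same class exceeds 0.5.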

Start Guide for Train / Test / Demo

  • for training
# python 
usage: [-h] [--batch_size] [--conf_thres] 
               [--epochs] [--lr] [--num_workers]
               [--save_file_name] [--save_path] [--data_path]

  -h, --help            show this help message and exit
  --batch_size          for training batch size, test batch is only 1.
  --conf_thres          for testing, confidence threshold, default 0.01 
  --epochs              total training epochs   (default 200)
  --lr                  initial learning rate (default 1e-4) 
  --num_workers         dataset num_workers (default 4)
  --save_file_name      when you do experiment, you can change save_file_name to distinguish other pths.
  --save_path           the path to save .pth file
  --data_path           data path for training and testing refer to Implementations/dataset (default="D:\Data\VOC_ROOT")
  --num_classes         number of dataset classes (voc : 20, coco:80) (default=20)
  --dataset_type        which dataset you want to use, VOC or COCO (default='voc')
  --start_epoch         when you resume, set the start epochs. 
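
The options listed above map naturally onto an `argparse` parser. The sketch below is a reconstruction from this help text, not the repo's actual code: the function name is hypothetical, the argument types are assumptions, and defaults are set only where this README states them.

```python
import argparse


def build_train_parser():
    """Parser mirroring the training options documented above (types are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int)        # default not stated in this README
    parser.add_argument("--conf_thres", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=200)
    parser.add_argument("--lr", type=float, default=1e-4)
    parser.add_argument("--num_workers", type=int, default=4)
    parser.add_argument("--save_file_name", type=str, default="yolo_v2_vgg_16")
    parser.add_argument("--save_path", type=str, default="./saves")
    parser.add_argument("--data_path", type=str, default=r"D:\Data\VOC_ROOT")
    parser.add_argument("--num_classes", type=int, default=20)
    parser.add_argument("--dataset_type", type=str, default="voc")
    parser.add_argument("--start_epoch", type=int)       # default not stated in this README
    return parser


args = build_train_parser().parse_args(["--epochs", "150", "--lr", "1e-4"])
print(args.epochs, args.lr, args.dataset_type)
```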

Before testing or running the demo, you need a trained .pth file (weight params). You can download the 149-epoch weights at

Then create a ./saves directory and place the weights in it.

  • for testing
# python 
usage: [-h] [--data_path] [--test_epoch] 
               [--save_path] [--save_file_name] [--conf_thres]

  -h, --help            show this help message and exit
  --data_path           for testing, voc 2007 path (because split =='TEST') (default="D:\Data\VOC_ROOT")
  --test_epoch          for testing, which epoch's params to load
  --save_path           for testing, params path (default './saves') 
  --save_file_name      save_file_name to distinguish other params. (default 'yolo_v2_vgg_16')
  --conf_thres          for testing, confidence threshold; the detector keeps detections above it (default 0.01) 
  • for demo
# python 
usage: [-h] [--demo_img_path] [--demo_img_type] 
               [--vis] [--epoch] [--save_path]
               [--save_file_name] [--conf_thres]

  -h, --help            show this help message and exit
  --demo_img_path       The path that contains the image you want to detect
  --demo_img_type       The type of images you want to detect
  --vis                 Whether to visualize (default False)
  --epoch               for demo, which epoch's params to load
  --save_path           for demo, params path (default './saves') 
  --save_file_name      save_file_name to distinguish other params. (default 'yolo_v2_vgg_16')
  --conf_thres          for demo, confidence threshold; the detector keeps detections above it (default 0.35)
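
To preview which files a given --demo_img_path and --demo_img_type pair would select, a small helper like the one below can be used. It is a hypothetical sketch: how the demo script itself collects images is not documented here, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path


def list_demo_images(demo_img_path, demo_img_type):
    """Return sorted image files in demo_img_path whose extension matches demo_img_type."""
    suffix = "." + demo_img_type.lstrip(".").lower()
    return sorted(
        p for p in Path(demo_img_path).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
```

For example, `list_demo_images(r"D:\image_path", "jpg")` lists every .jpg file directly inside that folder.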