
bobo0810/pytorchSSD


SSD: Single Shot MultiBox Object Detector, in PyTorch

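This repository is part of PytorchNetHub.

Note: my own additions to this repository are at the bottom.

The content here is the original author's; I have only read through the code and added extensive Chinese comments to make it easier to understand.

Recent work:

  • I am refactoring the original author's code (a gripe: brilliant as the author is, the code is quite messy, and the perfectionist in me has to try reorganizing it).
  • The training part of the project has been refactored so far; the address of the refactored code will be posted once everything is done.
  • The refactored version is strongly recommended!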

A PyTorch implementation of Single Shot MultiBox Detector from the 2016 paper by Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. The official and original Caffe code can be found here.


Installation

  • Install PyTorch by selecting your environment on the website and running the appropriate command.
  • Clone this repository.
    • Note: We currently only support Python 3+.
  • Then download the dataset by following the instructions below.
  • We now support Visdom for real-time loss visualization during training!
    • To use Visdom in the browser:
    # First install Python server and client
    pip install visdom
    # Start the server (probably in a screen or tmux)
    python -m visdom.server
    • Then (during training) navigate to http://localhost:8097/ (see the Train section below for training details).
  • Note: For training, we currently support VOC and COCO, and aim to add ImageNet support soon.

Datasets

To make things easy, we provide bash scripts to handle the dataset downloads and setup for you. We also provide simple dataset loaders that inherit torch.utils.data.Dataset, making them fully compatible with the torchvision.datasets API.

COCO

Microsoft COCO: Common Objects in Context

Download COCO 2014
# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/COCO2014.sh

VOC Dataset

PASCAL VOC: Visual Object Classes

Download VOC2007 trainval & test
# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2007.sh # <directory>
Download VOC2012 trainval
# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2012.sh # <directory>

Training SSD

  • First download the fc-reduced VGG-16 base network weights into a weights directory:
mkdir weights
cd weights
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
  • To train SSD, run the train script, either passing the parameters listed in train.py as flags or editing them in the file.
python train.py
  • Note:
    • For training, an NVIDIA GPU is strongly recommended for speed.
    • For instructions on Visdom usage/installation, see the Installation section.
    • You can resume training from a checkpoint by specifying its path as one of the training parameters (again, see train.py for options).

Evaluation

To evaluate a trained network:

python eval.py

You can set the parameters listed in eval.py either by passing them as flags or by editing them in the file.

Performance

VOC2007 Test

mAP

  • Original: 77.2 %
  • Converted weiliu89 weights: 77.26 %
  • From scratch w/o data aug: 58.12 %
  • From scratch w/ data aug: 77.43 %

FPS

GTX 1060: ~45.45 FPS

Demos

Use a pre-trained SSD network for detection

Download a pre-trained network

SSD results on multiple datasets

Try the demo notebook

  • Make sure you have jupyter notebook installed.
  • Two alternatives for installing jupyter notebook:
    1. If you installed PyTorch with conda (recommended), then you should already have it. (Just navigate to the ssd.pytorch cloned repo and run): jupyter notebook

    2. If using pip:

# make sure pip is upgraded
pip3 install --upgrade pip
# install jupyter notebook
pip install jupyter
# Run this inside ssd.pytorch
jupyter notebook

Try the webcam demo (tested; it currently has a bug)

  • Works on CPU (may have to tweak cv2.waitKey for optimal fps) or on an NVIDIA GPU
  • This demo currently requires opencv2+ w/ python bindings and an onboard webcam
    • You can change the default webcam in demo/live.py
  • Install the imutils package to leverage multi-threading on CPU:
    • pip install imutils
  • Running python -m demo.live opens the webcam and begins detecting!

TODO

We have accumulated the following to-do list, which we hope to complete in the near future

  • Still to come:
    • Support for the MS COCO dataset
    • Support for SSD512 training and testing
    • Support for training on custom datasets

Authors

Note: Unfortunately, this is just a hobby of ours and not a full-time job, so we'll do our best to keep things up to date, but no guarantees. That being said, thanks to everyone for your continued help and feedback as it is really appreciated. We will try to address everything as soon as possible.


My additions

  • Environment:
    • Python version: 3.5
    • PyTorch version: 0.3.0
  • Note:

Make sure the Visdom visualization server is running before you run train.py.
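
One way to check this before launching a long training run is to test whether anything is listening on the Visdom port. The sketch below is not part of the repository; it uses only the Python standard library, and the helper name `is_server_listening` is hypothetical. It assumes the server runs on localhost at port 8097, the address given in the Installation section.

```python
import socket


def is_server_listening(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_server_listening():
        print("A server is listening on localhost:8097; train.py can be started.")
    else:
        print("Nothing on localhost:8097; start it with: python -m visdom.server")
```

This only confirms that some process accepts connections on that port, not that the process is Visdom.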

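Overall approach

  • 1. Data preprocessing
  • 2. Building the network model
  • 3. Defining the loss function

1. Data preprocessing

  • Read each image and its corresponding XML annotation file, and return the processed image together with its ground-truth boxes and class labels.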
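
The annotation half of this step can be sketched with the standard library alone. The example below is an illustration, not the repository's loader: the function name `parse_voc_annotation`, the two-class mapping, and the sample XML are all made up for the example, and it assumes the usual PASCAL VOC layout (`size`, `object`, `name`, `bndbox`). Boxes are scaled to the range [0, 1] by the image width and height.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-class mapping; the real VOC label set has 20 classes.
CLASS_TO_INDEX = {"dog": 0, "person": 1}

# Hand-written sample in the PASCAL VOC annotation layout.
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>400</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>50</xmin><ymin>100</ymin><xmax>250</xmax><ymax>300</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>500</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text, class_to_index):
    """Return one [xmin, ymin, xmax, ymax, label] row per annotated object."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    width = float(size.find("width").text)
    height = float(size.find("height").text)

    targets = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        box = obj.find("bndbox")
        # Scale pixel coordinates to [0, 1] relative to the image size.
        xmin = float(box.find("xmin").text) / width
        ymin = float(box.find("ymin").text) / height
        xmax = float(box.find("xmax").text) / width
        ymax = float(box.find("ymax").text) / height
        targets.append([xmin, ymin, xmax, ymax, class_to_index[name]])
    return targets


targets = parse_voc_annotation(SAMPLE_XML, CLASS_TO_INDEX)
for row in targets:
    print(row)
```

Normalized coordinates keep the targets valid after the image is resized to the network's input resolution.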

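2. Building the network model

  • Overall structure

  • Detailed structure

  • Structure of each sub-network

VGG base network

extras (additional layers)

head (loc for localization, conf for classification)

loc (localization)

conf (classification)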

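  • Network details

During training, the network returns loc, conf, and priors.

For each image, the feature maps together generate 8732 anchors.

loc: localization predictions output by the network [32, 8732, 4]

conf: classification predictions output by the network [32, 8732, 21]

priors: anchors generated by formula from the different feature maps [8732, 4] (They are called anchors rather than predicted boxes because they are generated by a formula, not predicted by the network.)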
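
The figure of 8732 can be reproduced with a few lines of arithmetic. The sketch below assumes the SSD300 configuration from the SSD paper: six square feature maps of side 38, 19, 10, 5, 3, and 1, with 4 or 6 anchors per location. These numbers are not stated in this README, so treat them as an assumption; the function name is hypothetical. In the shapes above, 32 is presumably the batch size, and 21 is the 20 VOC classes plus background.

```python
# Assumed SSD300 settings (from the SSD paper, not from this README).
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]   # side length of each square feature map
BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]     # anchors generated at every cell


def priors_per_layer(sizes, boxes_per_location):
    """Number of anchors contributed by each feature map."""
    return [s * s * b for s, b in zip(sizes, boxes_per_location)]


per_layer = priors_per_layer(FEATURE_MAP_SIZES, BOXES_PER_LOCATION)
total = sum(per_layer)

print(per_layer)  # [5776, 2166, 600, 150, 36, 4]
print(total)      # 8732
```

Most anchors come from the 38 x 38 map, which is the one responsible for small objects.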

3. Defining the loss function

  • Classification loss

A multi-class softmax loss is used.
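
For a single anchor, the softmax loss is the negative log of the softmax probability assigned to the true class. The plain-Python sketch below is illustrative only (the function name is hypothetical); in PyTorch the same quantity is available as `torch.nn.functional.cross_entropy`, which operates on batches of logits.

```python
import math


def softmax_cross_entropy(logits, target):
    """Return -log(softmax(logits)[target]) for one anchor."""
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


# Four equally likely classes: the loss is log(4), about 1.3863.
print(softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], 0))

# A confident, correct prediction gives a loss close to 0.
print(softmax_cross_entropy([10.0, 0.0, 0.0], 0))
```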

  • Regression loss

A Smooth L1 loss is used.
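
Smooth L1 is quadratic for errors smaller than 1 and linear beyond that, so large localization errors do not dominate the gradient. The sketch below writes it out in plain Python with hypothetical function names; PyTorch provides it as `torch.nn.functional.smooth_l1_loss`. In the SSD paper the loss is applied only to the box offsets of positive (matched) anchors.

```python
def smooth_l1(x):
    """0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5


def smooth_l1_loss(pred, target):
    """Sum of Smooth L1 over the coordinates of one predicted box offset."""
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


print(smooth_l1(0.5))   # 0.125 (quadratic region)
print(smooth_l1(2.0))   # 1.5   (linear region)
print(smooth_l1_loss([0.5, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))  # 1.625
```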

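Matching strategy:

1. Match each ground-truth box to the anchor with the highest IoU, which guarantees that every ground-truth box is paired with an anchor.

2. Then match each anchor to a ground-truth box whenever the IoU between them exceeds a threshold; the threshold used in the paper is 0.5.

As a result, each ground-truth box can be matched to multiple anchors.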
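
The two matching steps can be sketched on plain lists. This is a simplified illustration, not the repository's tensor-based code: boxes are `[xmin, ymin, xmax, ymax]`, the function names `iou` and `match` are chosen for the example, and the result holds, for each anchor, the index of its ground-truth box or -1 for background.

```python
def iou(a, b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(truths, priors, threshold=0.5):
    """Return, per anchor, the matched ground-truth index or -1 (background)."""
    matched = [-1] * len(priors)
    if not truths:
        return matched
    overlaps = [[iou(t, p) for p in priors] for t in truths]

    # Step 2: an anchor takes its best ground truth if the IoU passes the threshold.
    for j in range(len(priors)):
        best_t = max(range(len(truths)), key=lambda i: overlaps[i][j])
        if overlaps[best_t][j] > threshold:
            matched[j] = best_t

    # Step 1, applied last so it always wins: every ground truth keeps
    # its single best anchor, even when that IoU is below the threshold.
    for i in range(len(truths)):
        best_p = max(range(len(priors)), key=lambda j: overlaps[i][j])
        matched[best_p] = i
    return matched


truths = [[0, 0, 10, 10], [20, 20, 24, 24]]
priors = [[0, 0, 10, 10], [0, 0, 10, 8], [20, 20, 30, 30], [0, 0, 4, 4]]
print(match(truths, priors))  # [0, 0, 1, -1]
```

In the example, the first ground-truth box receives two anchors (IoU 1.0 and 0.8), while the small second box keeps its best anchor even though the IoU is only 0.16.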

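Hard negative mining:

1. First, sort the anchors that are negative samples by their confidence loss, from highest to lowest.

2. Keep only the highest-ranked ones, so that the ratio of negatives to positives is at most 3:1.

This ratio leads to faster optimization and more stable training.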
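
The selection step can be sketched for a single image as follows. This is a simplified list-based illustration under the assumption that a per-anchor confidence loss has already been computed; the function name and the sample values are made up for the example.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return indices of the negatives kept for the classification loss."""
    num_pos = sum(1 for pos in is_positive if pos)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]

    # Hardest negatives first: highest confidence loss at the front.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)

    # Keep at most neg_pos_ratio negatives per positive.
    num_neg = min(neg_pos_ratio * num_pos, len(negatives))
    return sorted(negatives[:num_neg])


# One positive anchor (index 1), so at most three negatives are kept.
conf_losses = [0.1, 2.0, 0.3, 1.5, 0.05, 0.9, 0.7, 0.2]
is_positive = [False, True, False, False, False, False, False, False]
kept = hard_negative_mining(conf_losses, is_positive)
print(kept)  # [3, 5, 6]
```

The classification loss is then computed over the positives plus the kept negatives; the remaining easy negatives are ignored.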

Results

