Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add demo for SSD model (WIP) #137

Merged
merged 9 commits into from
Aug 11, 2017
Merged

Conversation

pkuyym
Copy link
Contributor

@pkuyym pkuyym commented Jun 28, 2017

resolves #136

@pkuyym pkuyym changed the title Init demo. Add demo for SSD model (WIP) Jun 28, 2017
@pkuyym pkuyym requested a review from luotao1 July 6, 2017 09:32
ssd/README.md Outdated
## 概述
SSD全称为Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具体参见论文\[[1](#引用)\]。SSD算法主要特点是检测速度快且检测精度高。PaddlePaddle已集成SSD算法,本示例旨在介绍如何使用PaddlePaddle的SSD模型进行目标检测。下文展开顺序为:首先简要介绍SSD原理,然后介绍示例包含文件及作用,接着介绍如何在PASCAL VOC数据集上训练、评估及检测,最后简要介绍如何在自有数据集上使用SSD。
## SSD原理
SSD使用一个卷积神经网络实现“端到端”的检测,所谓“端到端”指输入为原始图像,输出为检测结果,无需借助外部工具或流程进行特征提取、候选框生成等。论文中SSD的基础模型为VGG-16,其在VGG-16的某些层后面增加了一些额外的层进行候选框的提取,下图为模型的总体结构:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

其在->且在?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated

<p align="center">
<img src="images/ssd_network.png" width="600" hspace='10'/> <br/>
图1. SSD网络结构
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 图过小,里面的英文字都看不清楚。
  2. 图中的英文字能翻译成中文么,不然小白用户还是看不懂。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

重新梳理了这一部分

ssd/README.md Outdated
图1. SSD网络结构
</p>

如图所示,候选框的生成规则是预先设定的,比如Conv7输出的特征图每个像素点会对应6个候选框,这些候选框长宽比或面积有区分。在预测阶段模型会对这些提取出来的候选框做后处理,然后输出作为最终的检测结果。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

图的解释可以多一点。12行我看的不是很明白。

  1. 生成规则是预先设定的,从图中的哪儿可以看出来?
  2. Conv7对应的6个候选框分别是图中的哪儿?也可能是图比较小,我没找到Conv7
  3. 后处理是指什么?是一个专有名词么

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

重新梳理了这一部分

ssd/README.md Outdated
data/prepare\_voc\_data.py | 准备训练PASCAL VOC数据列表

</center>
<center>表1. 示例文件</center>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

表名一般放在表格的上方,图名一般放在图的下方,请修改下位置。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
</center>
<center>表1. 示例文件</center>

训练阶段需要对数据做预处理,包括裁剪、采样等,这部分操作在```image_util.py```和```data_provider.py```中完成;值得注意的是,```config/vgg_config.py```为参数配置文件,包括训练参数、神经网络参数等,本配置文件包含参数是针对PASCAL VOC数据配置的,当训练自有数据时,需要仿照该文件配置新的参数;```data/prepare_voc_data.py```脚本用来生成文件列表,包括切分训练集和测试集,使用时需要用户事先下载并解压数据,默认采用VOC2007和VOC2012。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32行中的分号都改成句号。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
### 数据准备
首先需要下载数据集,VOC2007\[[2](#引用)\]和VOC2012\[[3](#引用)\],VOC2007包含训练集和测试集,VOC2012只包含训练集,将下载好的数据解压,目录结构为```VOCdevkit/{VOC2007,VOC2012}```。进入```data```目录,运行```python prepare_voc_data.py```即可生成```trainval.txt```和```test.txt```,默认```prepare_voc_data.py```和```VOCdevkit```在相同目录下,且生成的文件列表也在该目录。需注意```trainval.txt```既包含VOC2007的训练数据,也包含VOC2012的训练数据,```test.txt```只包含VOC2007的测试数据。
### 预训练模型准备
下载预训练的VGG-16模型,我们提供了一个转换好的模型,具体下载地址为:,下载好模型后,放置路径为```vgg/vgg_model.tar.gz```。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

具体下载地址为,后面是空的,是以后补上么

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最后会补上这个链接,Mark一下

ssd/README.md Outdated
### 预训练模型准备
下载预训练的VGG-16模型,我们提供了一个转换好的模型,具体下载地址为:,下载好模型后,放置路径为```vgg/vgg_model.tar.gz```。
### 模型训练
直接执行```python train.py```即可进行训练。需要注意本示例仅支持CUDA GPU环境,无法在CPU上训练。```train.py```的一些关键执行逻辑:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

为什么无法在CPU上训练,能否简单讲一下原因。CPU部分是后续要集成么

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

主要是CPU的训练比较慢,图像任务一般都使用GPU,这里的实现采用硬编码方式使用cuDNN,后续不打算集成CPU版本

ssd/README.md Outdated
init_model_path='./vgg/vgg_model.tar.gz')
```

调用```paddle.init```指定使用4卡GPU训练;调用```data_provider.Settings```配置数据预处理所需参数,其中```cfg.IMG_HEIGHT```和```cfg.IMG_WIDTH```在配置文件```config/vgg_config.py```中设置,这里均为300;调用```train```执行训练,其中```train_file_list```指定训练数据列表,```dev_file_list```指定评估数据列表,```init_model_path```指定预训练模型位置。训练过程中会打印一些日志信息,每训练10个batch会输出当前的轮数、当前batch的cost及mAP,每训练一个pass,会保存一次模型,默认保存在```checkpoints```目录下(注:需事先创建)。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 为什么均为300,是和论文保持统一么?
  2. mAP是什么?
  3. 需事先创建,现在的代码是没有创建,就无法保存么 @qingqing01

Copy link
Contributor Author

@pkuyym pkuyym Jul 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 300x300是论文中比较典型的一个配置,也包含512*512,用户可自行修改配置实现
  2. mAP是指mean average precision,是物体检测领域常用的评测指标
  3. 应该是需要要求目录已经存在的

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可将回复的3点加入文中。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
另一个重要的事情就是根据图像大小及检测物体的大小等更改网络结构的配置,主要是仿照```config/vgg_config.py```创建自己的配置文件,参数设置经验请参照论文\[[1](#引用)\]。

## 引用
1. Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请放论文链接地址,论文作者可以多列几个。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated

## 引用
1. Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
2. http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里应该放 The PASCAL Visual Object Classes Challenge 2007,然后带155行的链接。下同。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@pkuyym pkuyym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow comments and reorganize SSD introduction section.

ssd/README.md Outdated
## 概述
SSD全称为Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具体参见论文\[[1](#引用)\]。SSD算法主要特点是检测速度快且检测精度高。PaddlePaddle已集成SSD算法,本示例旨在介绍如何使用PaddlePaddle的SSD模型进行目标检测。下文展开顺序为:首先简要介绍SSD原理,然后介绍示例包含文件及作用,接着介绍如何在PASCAL VOC数据集上训练、评估及检测,最后简要介绍如何在自有数据集上使用SSD。
## SSD原理
SSD使用一个卷积神经网络实现“端到端”的检测,所谓“端到端”指输入为原始图像,输出为检测结果,无需借助外部工具或流程进行特征提取、候选框生成等。论文中SSD的基础模型为VGG-16,其在VGG-16的某些层后面增加了一些额外的层进行候选框的提取,下图为模型的总体结构:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
data/prepare\_voc\_data.py | 准备训练PASCAL VOC数据列表

</center>
<center>表1. 示例文件</center>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
</center>
<center>表1. 示例文件</center>

训练阶段需要对数据做预处理,包括裁剪、采样等,这部分操作在```image_util.py```和```data_provider.py```中完成;值得注意的是,```config/vgg_config.py```为参数配置文件,包括训练参数、神经网络参数等,本配置文件包含参数是针对PASCAL VOC数据配置的,当训练自有数据时,需要仿照该文件配置新的参数;```data/prepare_voc_data.py```脚本用来生成文件列表,包括切分训练集和测试集,使用时需要用户事先下载并解压数据,默认采用VOC2007和VOC2012。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated

## PASCAL VOC数据集
### 数据准备
首先需要下载数据集,VOC2007\[[2](#引用)\]和VOC2012\[[3](#引用)\],VOC2007包含训练集和测试集,VOC2012只包含训练集,将下载好的数据解压,目录结构为```VOCdevkit/{VOC2007,VOC2012}```。进入```data```目录,运行```python prepare_voc_data.py```即可生成```trainval.txt```和```test.txt```,默认```prepare_voc_data.py```和```VOCdevkit```在相同目录下,且生成的文件列表也在该目录。需注意```trainval.txt```既包含VOC2007的训练数据,也包含VOC2012的训练数据,```test.txt```只包含VOC2007的测试数据。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
### 预训练模型准备
下载预训练的VGG-16模型,我们提供了一个转换好的模型,具体下载地址为:,下载好模型后,放置路径为```vgg/vgg_model.tar.gz```。
### 模型训练
直接执行```python train.py```即可进行训练。需要注意本示例仅支持CUDA GPU环境,无法在CPU上训练。```train.py```的一些关键执行逻辑:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

主要是CPU的训练比较慢,图像任务一般都使用GPU,这里的实现采用硬编码方式使用cuDNN,后续不打算集成CPU版本

ssd/README.md Outdated
另一个重要的事情就是根据图像大小及检测物体的大小等更改网络结构的配置,主要是仿照```config/vgg_config.py```创建自己的配置文件,参数设置经验请参照论文\[[1](#引用)\]。

## 引用
1. Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated

## 引用
1. Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
2. http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated

<p align="center">
<img src="images/ssd_network.png" width="600" hspace='10'/> <br/>
图1. SSD网络结构
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

重新梳理了这一部分

ssd/README.md Outdated
图1. SSD网络结构
</p>

如图所示,候选框的生成规则是预先设定的,比如Conv7输出的特征图每个像素点会对应6个候选框,这些候选框长宽比或面积有区分。在预测阶段模型会对这些提取出来的候选框做后处理,然后输出作为最终的检测结果。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

重新梳理了这一部分

ssd/README.md Outdated
### 数据准备
首先需要下载数据集,VOC2007\[[2](#引用)\]和VOC2012\[[3](#引用)\],VOC2007包含训练集和测试集,VOC2012只包含训练集,将下载好的数据解压,目录结构为```VOCdevkit/{VOC2007,VOC2012}```。进入```data```目录,运行```python prepare_voc_data.py```即可生成```trainval.txt```和```test.txt```,默认```prepare_voc_data.py```和```VOCdevkit```在相同目录下,且生成的文件列表也在该目录。需注意```trainval.txt```既包含VOC2007的训练数据,也包含VOC2012的训练数据,```test.txt```只包含VOC2007的测试数据。
### 预训练模型准备
下载预训练的VGG-16模型,我们提供了一个转换好的模型,具体下载地址为:,下载好模型后,放置路径为```vgg/vgg_model.tar.gz```。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最后会补上这个链接,Mark一下

ssd/README.md Outdated
## SSD原理
SSD使用一个卷积神经网络实现“端到端”的检测,所谓“端到端”指输入为原始图像,输出为检测结果,无需借助外部工具或流程进行特征提取、候选框生成等。论文中SSD的基础模型为VGG-16,其在VGG-16的某些层后面增加了一些额外的层进行候选框的提取,下图为模型的总体结构
SSD使用一个卷积神经网络实现“端到端”的检测,所谓“端到端”指输入为原始图像,输出为检测结果,无需借助外部工具或流程进行特征提取、候选框生成等。论文中SSD的基础模型为VGG16\[[2](#引用)\],不同于原始VGG16网络模型,SSD做了一些改变:1. 将最后的FC6、FC7全连接层变为卷积层,卷积层参数通过对原始FC6、FC7参数采样得到; 2. 将Pool5层的参数由2x2-s2(kernel大小为2x2,stride size为2)更改为3x3-s1-p1(kernel大小为3x3,stride size为1,padding size为1)3. 在conv4\_3、conv7、conv8\_2、conv9\_2、conv10\_2及pool11层后面接了priorbox层,priorbox层的主要目的是根据输入的feature map生成一系列的矩形候选框。关于SSD的更详细的介绍可以参考论文\[[1](#引用)\]。下图为模型(300x300)的总体结构
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1,2,3点写成如下格式,会更清爽

  1. XXX
  2. XXX
  3. XXX

ssd/README.md Outdated
VOCdevkit/VOC2007/JPEGImages/000019.jpg VOCdevkit/VOC2007/Annotations/000019.xml
VOCdevkit/VOC2007/JPEGImages/000020.jpg VOCdevkit/VOC2007/Annotations/000020.xml
VOCdevkit/VOC2007/JPEGImages/000021.jpg VOCdevkit/VOC2007/Annotations/000021.xml
VOCdevkit/VOC2007/JPEGImages/000023.jpg VOCdevkit/VOC2007/Annotations/000023.xml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里列的太多了,列个2-3行就够了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
3. http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html
1. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
2. Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
3. The PASCAL Visual Object Classes Challenge 2007. http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 改成这种格式,下同:The PASCAL Visual Object Classes Challenge 2007.
  2. 前面两篇也加下链接地址

ssd/README.md Outdated
### 预训练模型准备
下载预训练的VGG-16模型,我们提供了一个转换好的模型,具体下载地址为:,下载好模型后,放置路径为```vgg/vgg_model.tar.gz```。
### 模型训练
直接执行```python train.py```即可进行训练。需要注意本示例仅支持CUDA GPU环境,无法在CPU上训练。```train.py```的一些关键执行逻辑:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

主要是CPU的训练比较慢,图像任务一般都使用GPU,这里的实现采用硬编码方式使用cuDNN,后续不打算集成CPU版本

可将回复的这段话加入文中。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
init_model_path='./vgg/vgg_model.tar.gz')
```

调用```paddle.init```指定使用4卡GPU训练;调用```data_provider.Settings```配置数据预处理所需参数,其中```cfg.IMG_HEIGHT```和```cfg.IMG_WIDTH```在配置文件```config/vgg_config.py```中设置,这里均为300;调用```train```执行训练,其中```train_file_list```指定训练数据列表,```dev_file_list```指定评估数据列表,```init_model_path```指定预训练模型位置。训练过程中会打印一些日志信息,每训练10个batch会输出当前的轮数、当前batch的cost及mAP,每训练一个pass,会保存一次模型,默认保存在```checkpoints```目录下(注:需事先创建)。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可将回复的3点加入文中。

Copy link
Contributor Author

@pkuyym pkuyym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow comments.

ssd/README.md Outdated
VOCdevkit/VOC2007/JPEGImages/000019.jpg VOCdevkit/VOC2007/Annotations/000019.xml
VOCdevkit/VOC2007/JPEGImages/000020.jpg VOCdevkit/VOC2007/Annotations/000020.xml
VOCdevkit/VOC2007/JPEGImages/000021.jpg VOCdevkit/VOC2007/Annotations/000021.xml
VOCdevkit/VOC2007/JPEGImages/000023.jpg VOCdevkit/VOC2007/Annotations/000023.xml
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
### 预训练模型准备
下载预训练的VGG-16模型,我们提供了一个转换好的模型,具体下载地址为:,下载好模型后,放置路径为```vgg/vgg_model.tar.gz```。
### 模型训练
直接执行```python train.py```即可进行训练。需要注意本示例仅支持CUDA GPU环境,无法在CPU上训练。```train.py```的一些关键执行逻辑:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
init_model_path='./vgg/vgg_model.tar.gz')
```

调用```paddle.init```指定使用4卡GPU训练;调用```data_provider.Settings```配置数据预处理所需参数,其中```cfg.IMG_HEIGHT```和```cfg.IMG_WIDTH```在配置文件```config/vgg_config.py```中设置,这里均为300;调用```train```执行训练,其中```train_file_list```指定训练数据列表,```dev_file_list```指定评估数据列表,```init_model_path```指定预训练模型位置。训练过程中会打印一些日志信息,每训练10个batch会输出当前的轮数、当前batch的cost及mAP,每训练一个pass,会保存一次模型,默认保存在```checkpoints```目录下(注:需事先创建)。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
SSD使用一个卷积神经网络实现“端到端”的检测,所谓“端到端”指输入为原始图像,输出为检测结果,无需借助外部工具或流程进行特征提取、候选框生成等。论文中SSD的基础模型为VGG16\[[2](#引用)\],不同于原始VGG16网络模型,SSD做了一些改变:

1. 将最后的fc6、fc7全连接层变为卷积层,卷积层参数通过对原始fc6、fc7参数采样得到
2. 将pool5层的参数由2x2-s2(kernel大小为2x2,stride size为2)更改为3x3-s1-p1(kernel大小为3x3,stride size为1,padding size为1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1和2后面加句号。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
另一个重要的事情就是根据图像大小及检测物体的大小等更改网络结构的配置,主要是仿照```config/vgg_config.py```创建自己的配置文件,参数设置经验请参照论文\[[1](#引用)\]。

## 引用
1. [Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.](https://arxiv.org/abs/1512.02325)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single shot multibox detector. European conference on computer vision. Springer, Cham, 2016.
  2. Simonyan, Karen, and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
init_model_path='./vgg/vgg_model.tar.gz')
```

调用```paddle.init```指定使用4卡GPU训练;调用```data_provider.Settings```配置数据预处理所需参数,其中```cfg.IMG_HEIGHT```和```cfg.IMG_WIDTH```在配置文件```config/vgg_config.py```中设置,这里均为300,300x300是一个典型配置,兼顾效率和检测精度,也可以通过修改配置文件扩展到500x500;调用```train```执行训练,其中```train_file_list```指定训练数据列表,```dev_file_list```指定评估数据列表,```init_model_path```指定预训练模型位置。训练过程中会打印一些日志信息,每训练10个batch会输出当前的轮数、当前batch的cost及mAP(mean Average Precision),每训练一个pass,会保存一次模型,默认保存在```checkpoints```目录下(注:需事先创建)。
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

由于这段内容较多,分点进行描述会清楚,可改成如下:

  • 调用paddle.init指定使用4卡GPU训练。
  • 调用data_provider.Settings配置数据预处理所需参数,其中cfg.IMG_HEIGHTcfg.IMG_WIDTH在配置文件config/vgg_config.py中设置,这里均为300。300x300是一个典型配置,兼顾效率和检测精度,也可以通过修改配置文件扩展到500x500。
  • 调用train执行训练,其中train_file_list指定训练数据列表,dev_file_list指定评估数据列表,init_model_path指定预训练模型位置。
  • 训练过程中会打印一些日志信息,每训练10个batch会输出当前的轮数、当前batch的cost及mAP(mean Average Precision),每训练一个pass,会保存一次模型,默认保存在checkpoints目录下(注:需事先创建)。

mAP的括号里加个中文名词,是平均精度的意思么?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated

1. 将最后的fc6、fc7全连接层变为卷积层,卷积层参数通过对原始fc6、fc7参数采样得到
2. 将pool5层的参数由2x2-s2(kernel大小为2x2,stride size为2)更改为3x3-s1-p1(kernel大小为3x3,stride size为1,padding size为1)
3. 在conv4\_3、conv7、conv8\_2、conv9\_2、conv10\_2及pool11层后面接了priorbox层,priorbox层的主要目的是根据输入的feature map生成一系列的矩形候选框。关于SSD的更详细的介绍可以参考论文\[[1](#引用)\]。下图为模型(300x300)的总体结构:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

关于SSD的更详细的介绍可以参考论文[1]。下图为模型(300x300)的总体结构:

这句不在第3点内,应该换行写。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# SSD目标检测
## 概述
SSD全称为Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具体参见论文\[[1](#引用)\]。SSD算法主要特点是检测速度快且检测精度高。PaddlePaddle已集成SSD算法,本示例旨在介绍如何使用PaddlePaddle中的SSD模型进行目标检测。下文展开顺序为:首先简要介绍SSD原理,然后介绍示例包含文件及作用,接着介绍如何在PASCAL VOC数据集上训练、评估及检测,最后简要介绍如何在自有数据集上使用SSD。
## SSD原理
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. SSD原理部分够清楚了么 @qingqing01 @lcy-seso ,还需要更详细一点么?
  2. 概述中写了:SSD的检测速度快且检测精度高,文章中需要简单介绍为什么在VGG16上做了这些改变后,就速度快且精度高了呢?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@pkuyym pkuyym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow comments and add more introduction information for SSD.

ssd/README.md Outdated
SSD使用一个卷积神经网络实现“端到端”的检测,所谓“端到端”指输入为原始图像,输出为检测结果,无需借助外部工具或流程进行特征提取、候选框生成等。论文中SSD的基础模型为VGG16\[[2](#引用)\],不同于原始VGG16网络模型,SSD做了一些改变:

1. 将最后的fc6、fc7全连接层变为卷积层,卷积层参数通过对原始fc6、fc7参数采样得到
2. 将pool5层的参数由2x2-s2(kernel大小为2x2,stride size为2)更改为3x3-s1-p1(kernel大小为3x3,stride size为1,padding size为1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated

1. 将最后的fc6、fc7全连接层变为卷积层,卷积层参数通过对原始fc6、fc7参数采样得到
2. 将pool5层的参数由2x2-s2(kernel大小为2x2,stride size为2)更改为3x3-s1-p1(kernel大小为3x3,stride size为1,padding size为1)
3. 在conv4\_3、conv7、conv8\_2、conv9\_2、conv10\_2及pool11层后面接了priorbox层,priorbox层的主要目的是根据输入的feature map生成一系列的矩形候选框。关于SSD的更详细的介绍可以参考论文\[[1](#引用)\]。下图为模型(300x300)的总体结构:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
init_model_path='./vgg/vgg_model.tar.gz')
```

调用```paddle.init```指定使用4卡GPU训练;调用```data_provider.Settings```配置数据预处理所需参数,其中```cfg.IMG_HEIGHT```和```cfg.IMG_WIDTH```在配置文件```config/vgg_config.py```中设置,这里均为300,300x300是一个典型配置,兼顾效率和检测精度,也可以通过修改配置文件扩展到500x500;调用```train```执行训练,其中```train_file_list```指定训练数据列表,```dev_file_list```指定评估数据列表,```init_model_path```指定预训练模型位置。训练过程中会打印一些日志信息,每训练10个batch会输出当前的轮数、当前batch的cost及mAP(mean Average Precision),每训练一个pass,会保存一次模型,默认保存在```checkpoints```目录下(注:需事先创建)。
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
另一个重要的事情就是根据图像大小及检测物体的大小等更改网络结构的配置,主要是仿照```config/vgg_config.py```创建自己的配置文件,参数设置经验请参照论文\[[1](#引用)\]。

## 引用
1. [Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.](https://arxiv.org/abs/1512.02325)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# SSD目标检测
## 概述
SSD全称为Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具体参见论文\[[1](#引用)\]。SSD算法主要特点是检测速度快且检测精度高。PaddlePaddle已集成SSD算法,本示例旨在介绍如何使用PaddlePaddle中的SSD模型进行目标检测。下文展开顺序为:首先简要介绍SSD原理,然后介绍示例包含文件及作用,接着介绍如何在PASCAL VOC数据集上训练、评估及检测,最后简要介绍如何在自有数据集上使用SSD。
## SSD原理
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@luotao1 luotao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ssd/README.md Outdated

图中每个矩形盒子代表一个卷积层,最后的两个矩形框分别表示汇总各卷积层输出结果和后处理阶段。具体地,在预测阶段网络会输出一组候选矩形框,每个矩形包含两类信息:位置和类别得分,图中倒数第二个矩形框即表示网络的检测结果的汇总处理,由于候选矩形框数量较多且很多矩形框重叠严重,这时需要经过后处理来筛选出质量较高的少数矩形框,这里的后处理主要指非极大值抑制(Non-maximum Suppression)。

从SSD的网络结构可以看出,候选矩形框在多个feature map上生成,不同的feature map具有的感受野不同,这样可以在不同尺度扫描图像,相对于其他检测方法可以生成更丰富的候选框,从而提高检测精度;另一方面SSD对VGG16的扩展部分以较小的代价实现对候选框的位置和类别得分的计算,整个过程只需要一个卷积神经网络完成,所以速度较快。
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feature map -> 特征图(feature map)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
ftest.write(item[0] + ' ' + item[1] + '\n')
```

该函数首先对每个year的数据进行处理,然后将训练图像的文件路径列表进行随机打乱,最后保存训练文件列表和测试文件列表。默认```prepare_voc_data.py```和```VOCdevkit```在相同目录下,且生成的文件列表也在该目录。需注意```trainval.txt```既包含VOC2007的训练数据,也包含VOC2012的训练数据,```test.txt```只包含VOC2007的测试数据。我们这里提供```trainval.txt```前几行输入作为样例:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

每个year -> 每一年(year)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ssd/README.md Outdated
2. 调用```data_provider.Settings```配置数据预处理所需参数,其中```cfg.IMG_HEIGHT```和```cfg.IMG_WIDTH```在配置文件```config/vgg_config.py```中设置,这里均为300,300x300是一个典型配置,兼顾效率和检测精度,也可以通过修改配置文件扩展到500x500。
3. 调用```train```执行训练,其中```train_file_list```指定训练数据列表,```dev_file_list```指定评估数据列表,```init_model_path```指定预训练模型位置。
4. 训练过程中会打印一些日志信息,每训练10个batch会输出当前的轮数、当前batch的cost及mAP(mean Average Precision,平均精度均值),每训练一个pass,会保存一次模型,默认保存在```checkpoints```目录下(注:需事先创建)。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最好加上mAP曲线图~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ssd/README.md Outdated
```

其中```eval_file_list```指定图像路径列表;```save_path```指定预测结果保存路径;```data_args```如上;```batch_size```为每多少样本预测一次;```model_path```指模型的位置;```threshold```为置信度阈值,只有得分大于或等于该值的才会输出。示例还提供了一个可视化脚本,直接运行```python visual.py```即可,须指定输出检测结果路径及输出目录。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以给出一张可视化图~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

```python
infer(
eval_file_list='./data/infer.txt',
save_path='infer.res',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以给出infer.res结果样例~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

name=layer_name + "2",
input=conv1,
filter_size=3,
num_channels=num_filters,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这行可以去掉~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

封装了conv_group,简化配置

name=layer_name + "3",
input=conv2,
filter_size=3,
num_channels=num_filters,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这行可去掉~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

封装了conv_group,简化配置

pool = paddle.layer.img_pool(
input=conv3,
pool_size=pool_size,
num_channels=num_filters,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

去掉这行

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

封装了conv_group,简化配置

name="conv1_2",
input=conv1_1,
filter_size=3,
num_channels=64,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_channels除过第一个conv需要设置外,后面的conv可以不用设置的~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

封装了conv_group,简化配置

pool_type=paddle.pooling.CudnnMax(),
pool_size=2,
num_channels=128,
stride=2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

img_conv_group可以用来简化上面配置。 但是参数名字可能会变化,导致无法加载初始化模型。

但也可以封装下conv层,让上面配置简单一些,每个conv都写一堆参数,导致配置比较长,不够简洁~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

封装了conv_group,简化配置

@pkuyym pkuyym force-pushed the fix-136 branch 5 times, most recently from 61b769e to 125c2c3 Compare August 11, 2017 03:29
@pkuyym pkuyym force-pushed the fix-136 branch 12 times, most recently from 7d404e0 to f622467 Compare August 11, 2017 04:16
@qingqing01 qingqing01 merged commit e102c95 into PaddlePaddle:develop Aug 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add demo for SSD model (WIP)
3 participants