add scene_text_recognition #329

Merged 11 commits on Oct 29, 2017

peterzhang2029
Contributor

resolve #328


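In real life, text appears in many scenes, including street signs, menus, and building signage, and the text in photos of these scenes provides more information for understanding the image scene. \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.

This article addresses the **Scene Text Recognition (STR)** task and demonstrates how to use PaddlePaddle to implement an end-to-end CTC model, **CRNN (Convolutional Recurrent Neural Network)**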
Collaborator

This example demonstrates how to use PaddlePaddle to complete the scene text recognition task.

Contributor Author

DONE

## Training and inference with PaddlePaddle

### Model training
The training script refers to [./train.py](./train.py), sets the following command-line arguments:
Collaborator

  • What does "The training script refers to train.py, sets the following command-line arguments" mean? This sentence does not read smoothly.

Contributor Author

DONE

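- `test_file_list`: list file of the test data, same format as above

### Inference
Inference is done by infer.py. For CTC inference, this example uses the best path decoding algorithm (CTC greedy decoder), which selects the character with the highest probability at each time step. To use it, specify the model directory, the fixed image size, batch_size, and the list file of image files in infer.py. For example: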
Collaborator

Inference uses the best path decoding algorithm, namely:

Contributor Author

DONE
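
For readers unfamiliar with best path decoding, the following is a minimal sketch of the idea discussed in this thread: take the most probable class at each time step, merge consecutive repeats, then drop the blank label. It is plain Python and is not the code in `infer.py`; the function name `ctc_greedy_decode`, the `blank_id` argument, and the toy probabilities are assumptions made for illustration.

```python
from itertools import groupby


def ctc_greedy_decode(probs, blank_id):
    """Best path decoding for one sequence.

    probs: list of per-time-step probability lists, one entry per class.
    blank_id: index of the CTC blank label (assumed to be known by the caller).
    """
    # 1. Pick the most probable class at every time step.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]
    # 2. Merge consecutive repeats of the same class.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks; what remains is the decoded label sequence.
    return [label for label in collapsed if label != blank_id]


# Toy input: 3 classes (0, 1, and blank = 2) over 5 time steps.
probs = [
    [0.8, 0.1, 0.1],  # class 0
    [0.7, 0.2, 0.1],  # class 0 again, merged with the previous step
    [0.1, 0.1, 0.8],  # blank, separates the two runs of class 0
    [0.6, 0.3, 0.1],  # class 0
    [0.2, 0.7, 0.1],  # class 1
]
decoded = ctc_greedy_decode(probs, blank_id=2)
print(decoded)
```

The blank between the two runs of class 0 is what allows a doubled character (such as the "ee" in "keep") to survive the merge step.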


import os
from paddle.v2.image import load_image
import cv2
Collaborator

  • Please unify the docstring style, following the style of the Paddle repo.

Contributor Author

DONE

@with_bn: whether with batch normal
'''
assert num % 4 == 0

Collaborator

  • Use a for loop to define lines 17 ~ 56.
  • If you use a class to encapsulate the model definition, do not mix styles. Move this function into the Model class.

act=paddle.activation.Softmax())

# warp CTC to calculate cost for a CTC task.
if self.is_infer == False:
Collaborator

Please add the ctc error evaluator.
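
As background for this request, the sketch below shows the kind of metric such an evaluator usually reports: the edit distance between the decoded sequence and the ground-truth label, normalized by the label length. This is an assumption about the general technique, not Paddle's implementation; the function names `edit_distance` and `sequence_error_rate` are hypothetical.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two sequences (strings or id lists)."""
    # prev[j] holds the distance between the processed prefix of hyp and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            substitution = prev[j - 1] + (0 if h == r else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def sequence_error_rate(hyp, ref):
    """Edit distance normalized by the reference length."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


# A prediction that drops one character of the label "keep".
print(sequence_error_rate("kep", "keep"))
```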

@@ -0,0 +1,102 @@
import logging
import argparse
import paddle.v2 as paddle
Collaborator

Please put Python's own modules first, then a blank line, then the custom modules.

Contributor Author

DONE

args.model_output_prefix, event.pass_id, event.batch_id,
result.cost)
with gzip.open(path, 'w') as f:
params.to_tar(f)
Collaborator

Please write a main function for this script.

Contributor Author

DONE

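In real life, text appears in many scenes, including street signs, menus, and building signage, and the text in photos of these scenes provides more information for understanding the image scene. \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.

This article addresses the **Scene Text Recognition (STR)** task and demonstrates how to use PaddlePaddle to implement an end-to-end CTC model, **CRNN (Convolutional Recurrent Neural Network)**
\[[2](#参考文献)\]. Specifically, this article uses the following image for training, and needs to recognize the text corresponding to the text "keep".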
Collaborator

  • Please do not write "this article uses the following image for training, and needs to recognize the text corresponding to the text "keep"".
  • If you want to use this example, simply take this image as an instance to introduce what STR needs to do.

Contributor Author

DONE

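The most important arguments include:

- `image_shape`: size of the images
- `train_file_list`: list file of the training data, one path plus the corresponding text per line, in a format similar to: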
Collaborator

  • Do not write "in a format similar to:". The format is what it is, not "similar to".

Contributor Author

DONE
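
To make the list-file description in this thread concrete, here is a minimal parsing sketch. The excerpt does not show the actual separator, so the code assumes each line is an image path followed by whitespace and then the label text; the function name `parse_file_list` and the sample file names are hypothetical.

```python
def parse_file_list(lines):
    """Turn list-file lines into (image_path, label_text) pairs.

    Assumption: path and label are separated by whitespace, one sample per line.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        # Split only once so the path and the label are the two fields.
        path, text = line.split(None, 1)
        samples.append((path, text))
    return samples


# Hypothetical list-file content.
example_lines = ["word_1.png PROPER\n", "word_2.png FOOD\n", "\n"]
samples = parse_file_list(example_lines)
print(samples)
```

A line that contains only a path raises `ValueError` in this sketch, which surfaces malformed list files early instead of silently skipping them.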

@peterzhang2029
Contributor Author

@lcy-seso update the commit

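1. Download the data from the official website \[[2](#参考文献)\] (Task 2.3: Word Recognition (2013 edition)). There are three files: Challenge2_Training_Task3_Images_GT.zip, Challenge2_Test_Task3_Images.zip, and Challenge2_Test_Task3_GT.txt.
They correspond to the training images with their words, the test images, and the words for the test data, respectively. Then run the following commands to unzip the data and move it to the target folder: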

```
Contributor

```bash
The same change is needed in several places below.

Contributor Author

Done


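In real life, text appears in many scenes, including street signs, menus, and building signage, and the text in photos of these scenes provides more information for understanding the image scene. \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.

This example demonstrates how to use PaddlePaddle to complete the **Scene Text Recognition (STR)** task. Taking the image below as an example, given a scene image, STR needs to recognize the corresponding text "keep" from the image: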
Contributor

End the sentence with a period.

Contributor Author

Done

'''
image = load_image(path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# resize all images to a fixed shape
Contributor

A blank line may precede a comment, but do not leave a blank line after a comment.



if __name__ == "__main__":
model_path = "model.ctc-pass-9-batch-150-test.tar.gz"
Contributor

It would be more convenient to pass the specific inference parameters via the command line.

Contributor

The same applies to the several parameters here.

Contributor Author

Done
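
The suggestion in this thread (pass inference parameters on the command line and wrap the script in a main function) could look roughly like the sketch below. It uses only the standard `argparse` module; the flag names `--model_path`, `--image_file_list`, and `--batch_size`, and the file name `test.list`, are hypothetical and are not taken from the merged `infer.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse inference options; every flag name here is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Scene text recognition inference (illustrative sketch).")
    parser.add_argument("--model_path", required=True,
                        help="Path to the saved model parameters.")
    parser.add_argument("--image_file_list", required=True,
                        help="List file of the images to run inference on.")
    parser.add_argument("--batch_size", type=int, default=10,
                        help="Number of images per batch.")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real script would build the model and run inference here.
    return args


# Explicit argv for demonstration; a real script would call main() under
# an `if __name__ == "__main__":` guard and read sys.argv instead.
args = main([
    "--model_path", "model.ctc-pass-9-batch-150-test.tar.gz",
    "--image_file_list", "test.list",
])
print(args.batch_size)
```

Keeping parsing in its own function makes the defaults easy to check without touching the model code.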

:type input_image: LayerOutput
:param num: number of CONV filters.
:type num: int
:param with_bn: whether with batch normal.
Contributor

batch normalization

Contributor Author

Done

path = "{}-pass-{}-batch-{}-test.tar.gz".format(
args.model_output_prefix, event.pass_id, event.batch_id)
with gzip.open(path, 'w') as f:
params.to_tar(f)
Contributor

trainer.save_parameter_to_tar(f)

Contributor Author

Done

class Model(object):
def __init__(self, num_classes, shape, is_infer=False):
'''
:param num_classes: size of the character dict.
Contributor

Capitalize the first letter of each parameter description; the ones below need the same change.

Contributor Author

Done

fixed_shape=None,
is_infer=False):
'''
:param train_image_paths_generator:
Contributor

Capitalize the first letter of each parameter description; same below.

Contributor Author

Done


def load_image(self, path):
'''
load image and transform to 1-dimention vector
Contributor

Capitalize the first letter of the function description.


def word2ids(self, sent):
'''
transform a word to a list of ids.
Contributor

Capitalize the first letter of the function description.

Contributor Author

Done
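
For context on what a helper like `word2ids` does, the sketch below builds a character dictionary from label strings and maps a word to a list of ids. It is a standalone illustration, not the reader code from this PR; `build_label_dict`, the sorted-character ordering, and the sample labels are assumptions.

```python
def build_label_dict(labels):
    """Assign an integer id to every distinct character in the labels."""
    # Sorting makes the mapping reproducible across runs.
    chars = sorted(set("".join(labels)))
    return {ch: idx for idx, ch in enumerate(chars)}


def word2ids(word, label_dict):
    """Transform a word to a list of ids."""
    return [label_dict[ch] for ch in word]


# Hypothetical training labels.
label_dict = build_label_dict(["keep", "park"])
ids = word2ids("keep", label_dict)
print(label_dict)
print(ids)
```

A character that never appears in the training labels raises `KeyError` here; how the real reader treats unseen characters is not shown in this excerpt.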

# CNN output image features, 128 float matrixes
conv_features = self.conv_groups(self.image, 8, True)

# cutting CNN output into a sequence of feature vectors, which are
Contributor

cut ...

Contributor Author

Done

gru_backward = simple_gru(
input=sliced_feature, size=128, act=Relu(), reverse=True)

# map each step of RNN to character distribution.
Contributor

map the output of RNNs to ...

height=self.shape[0],
width=self.shape[1])

# label input as a ID list
Contributor

a -> an


def get_file_list(image_file_list):
'''
Generate the file list for train and test data.
Contributor

train -> training

Contributor Author

Done

```
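- `test_file_list`: list file of the test data, same format as above.
- `label_dict_path`: path where the label dictionary of the training data is stored. If the dictionary file does not exist at the given path, the program automatically generates the label dictionary from the labels in the training data.
- `model_save_dir`: the save directory directory that model parameters will, defaults to the `models` directory under the current directory.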
Contributor

Please fix "the save directory directory that model parameters will" (模型参数会的保存目录目录).

Contributor Author

Done

```
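- `test_file_list`: list file of the test data, same format as above.
- `label_dict_path`: path where the label dictionary of the training data is stored. If the dictionary file does not exist at the given path, the program automatically generates the label dictionary from the labels in the training data.
- `model_save_dir`: directory where the model parameters are saved, defaults to the `models` directory under the current path.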
Contributor

Defaults to ./models


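### Specify training configuration parameters

The `config.py` script contains the model configuration parameters and their detailed explanations. The code is as follows: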
Contributor

parameters related to model configuration and training


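## Introduction to the STR task

In real life, the text in many images provides rich semantic information for understanding the scene the image is in (for example: street signs, menus, street slogans). Meanwhile, the development of text recognition technology for scene images has also given rise to some new applications. For example, \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.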
Contributor

Write "scene image text recognition technology" (场景图片文字识别技术): remove "针对" ("for") and the first "的".

...
```

By modifying the `config.py` script, the parameters can be adjusted. For example, modify the `use_gpu` parameter to specify whether to use the GPU for training.
Contributor

Remove the first "通过" ("By").



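### Inference
Inference is done by `infer.py`, which uses the best path decoding algorithm, that is: at each time step, select the character with the highest probability. To use it, specify in `infer.py` the model save path, the fixed image size, batch_size (set to 10 by default), the label dictionary path, and the list file of image files. Run the following code: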
Contributor

The default value is 10


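- Because `warp CTC`, which the model depends on, only has a CUDA implementation, this model only supports running on GPU.
- This model has many parameters and uses a lot of GPU memory; in practice, adjust `batch_size` to control GPU memory usage.
- The dataset used in this sample is small; if needed, a larger dataset \[[3](#参考文献)\] can be used to train the model.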
Contributor

Keep the wording consistent: "本示例" -> "本例" (use the same term for "this example" throughout).

@@ -2,9 +2,9 @@

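## Introduction to the STR task

In real life, text appears in many scenes, including street signs, menus, and building signage, and the text in photos of these scenes provides more information for understanding the image scene. \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.
In real life, the text in many images provides rich semantic information for understanding the scene the image is in (for example: street signs, menus, street slogans). Meanwhile, the development of scene image text recognition technology has also given rise to some new applications. For example, \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.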
Contributor Author

Done

@@ -21,7 +21,7 @@ pip install -r requirements.txt

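### Specify training configuration parameters

Modify the training and model configuration parameters through the `config.py` script, which explains each configurable parameter in detail. An example follows: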
Contributor Author

Done

@@ -43,7 +43,8 @@ class ModelConfig(object):

...
```
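Modify `config.py` to adjust the parameters. For example, modify the `use_gpu` parameter to specify whether to use the GPU for training.

Modifying the `config.py` script adjusts the parameters. For example, modify the `use_gpu` parameter to specify whether to use the GPU for training.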
Contributor Author

Done

```
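4. During training, the model parameters are automatically backed up to the specified directory, by default under `./models`.


### Inference
Inference is done by `infer.py`, which uses the best path decoding algorithm, that is: at each time step, select the character with the highest probability. To use it, specify in `infer.py` the model directory, the fixed image size, batch_size (set to 10 by default), and the list file of image files. Run the following code:
Inference is done by `infer.py`, which uses the best path decoding algorithm, that is: at each time step, select the character with the highest probability. To use it, specify in `infer.py` the model save path, the fixed image size, batch_size (10 by default), the label dictionary path, and the list file of image files. Run the following code: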
Contributor Author

Done

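- `model_save_dir`: the save directory directory that model parameters will, defaults to the `models` directory under the current directory.
- `test_file_list`: list file of the test data, same format as above.
- `label_dict_path`: path where the label dictionary of the training data is stored. If the dictionary file does not exist at the given path, the program automatically generates the label dictionary from the labels in the training data.
- `model_save_dir`: directory where the model parameters are saved, defaults to `./models`.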
Contributor Author

Done

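- The dataset used by this model is small; a larger dataset \[[3](#参考文献)\] can be used to train the model you need
- Because `warp CTC`, which the model depends on, only has a CUDA implementation, this model only supports running on GPU
- This model has many parameters and uses a lot of GPU memory; in practice, adjust `batch_size` to control GPU memory usage.
- The dataset used in this example is small; if needed, a larger dataset \[[3](#参考文献)\] can be used to train the model.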
Contributor Author

Done

@@ -45,12 +45,11 @@ def __build_nn__(self):
'''
Build the network topology.
'''
# CNN output image features.
# Get the image features with CNN.
Contributor Author

Done

@@ -68,7 +67,7 @@ def __build_nn__(self):
act=Relu(),
reverse=True)

# Map each step of RNN to character distribution.
# Map the output of RNN to character distribution.
Contributor Author

Done

Collaborator

@lcy-seso left a comment

Almost LGTM.

else:
num_channels = None

tmp = img_conv_group(
Collaborator

return img_conv_group


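## Introduction to the STR task

In real life, the text in many images provides rich semantic information for understanding the scene the image is in (for example: street signs, menus, street slogans). Meanwhile, the development of scene image text recognition technology has also given rise to some new applications. For example, \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.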
Collaborator

  • "In real life, the text in many images provides rich semantic information for understanding the scene the image is in" does not read smoothly.
  • The second sentence: "Meanwhile, the development of scene image text recognition technology has also given rise to some new applications". Please define "scene image text recognition" first.


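In real life, the text in many images provides rich semantic information for understanding the scene the image is in (for example: street signs, menus, street slogans). Meanwhile, the development of scene image text recognition technology has also given rise to some new applications. For example, \[[1](#参考文献)\] uses a deep learning model to automatically recognize the text on street signs, helping street-view applications obtain more accurate address information.

This example demonstrates how to use PaddlePaddle to complete the **Scene Text Recognition (STR)** task. As shown in the figure below, given a scene image, `STR` needs to recognize the corresponding text "keep" from it.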
Collaborator

"As shown in the figure below," --> "The task is shown in the figure below"


<p align="center">
<img src="./images/503.jpg"/><br/>
Figure 1. Data example "keep"
Collaborator

Input data example

pip install -r requirements.txt
```

### Specify training configuration parameters
Collaborator

Modify configuration parameters


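### Specify training configuration parameters

The `config.py` script contains the parameters related to model configuration and training and their detailed explanations. The code is as follows: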
Collaborator

The code snippet is as follows.

...
```

Modifying the `config.py` script adjusts the parameters. For example, modify the `use_gpu` parameter to specify whether to use the GPU for training.
Collaborator

  • "通过修改" ("By modifying")

Contributor

Adding "通过" ("By") leaves the sentence without a subject @lcy-seso

Collaborator

An imperative sentence does not need a subject~ But either is fine here. I think it can be changed at your own discretion.


```

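- `train_file_list`: list file of the training data. Each line consists of the storage path of an image and the corresponding label text. The specific format is: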
Collaborator

"The format is:"

Collaborator

@lcy-seso left a comment

LGTM.

@lcy-seso lcy-seso merged commit d2d3b0e into PaddlePaddle:develop Oct 29, 2017
@peterzhang2029 peterzhang2029 deleted the str_dev branch October 31, 2017 05:02
Successfully merging this pull request may close these issues.

add scene_text_recognition
3 participants