Paper: Real-time Scene Text Detection with Differentiable Binarization
Label: Text Detection
In recent years, segmentation-based methods have become popular in scene text detection because segmentation results can more accurately describe text of arbitrary shapes, such as curved text. However, segmentation-based detection requires a binarization post-processing step that converts the probability map produced by the segmentation network into text bounding boxes or regions. DBNet proposes a module called differentiable binarization (DB), which performs binarization inside the segmentation network itself. A segmentation network optimized with the DB module can adaptively set the binarization threshold, which both simplifies post-processing and improves text detection performance.
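The approximate binarization at the heart of DB can be sketched in a few lines. This is a toy illustration of the formula from the paper, not this repository's implementation; the paper uses an amplification factor k of 50:

```python
import math

def hard_binarize(p: float, t: float) -> float:
    """Standard binarization: a step function, not differentiable at p == t."""
    return 1.0 if p >= t else 0.0

def differentiable_binarize(p: float, t: float, k: float = 50.0) -> float:
    """DB's approximate step function: B = 1 / (1 + exp(-k * (P - T))).

    p is a value of the probability map and t the corresponding value of
    the learned threshold map. The steep slope k makes the sigmoid behave
    like a step function while keeping gradients defined everywhere, so
    binarization can be trained jointly with the segmentation network.
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))

# Well above / below the threshold, the soft version matches the hard one.
print(differentiable_binarize(0.9, 0.3))  # close to 1
print(differentiable_binarize(0.1, 0.3))  # close to 0
```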
Datasets used: ICDAR2015

- Size: 132M
    - Training set:
        - images: 88.5M (1000 images)
        - labels: 157KB
    - Evaluation set:
        - images: 43.3M (500 images)
        - labels: 244KB
- Data format: image, label
- Hardware (Ascend/GPU/CPU)
    - Use Ascend, GPU, or CPU as the hardware environment. Refer to the MindSpore installation guide to set up the runtime environment.
- MindSpore >= 1.9
```shell
git clone https://gitee.com/mindspore/models.git
cd models/official/cv/DBNet
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Model | Pretrained Model | Config | Train Set | Test Set | Device Num | Epochs | Test Size | Recall | Precision | Hmean | Checkpoint | Graph Train Log |
---|---|---|---|---|---|---|---|---|---|---|---|---|
DBNet-R18 | R18 | cfg | ICDAR2015 Train | ICDAR2015 Test | 1 | 1200 | 736 | 78.63 | 84.21 | 81.32 | download | download |
DBNet-R50 | R50 | cfg | ICDAR2015 Train | ICDAR2015 Test | 1 | 1200 | 736 | 81.05 | 88.07 | 84.41 | download | download |
Device | Model | Dataset | Params (M) | PyNative train 1P bs=16 (ms/step) | PyNative train 8P bs=8 (ms/step) | PyNative infer (FPS) | Graph train 1P bs=16 (ms/step) | Graph train 8P bs=8 (ms/step) | Graph infer (FPS) |
---|---|---|---|---|---|---|---|---|---|
Ascend | DBNet-R18 | ICDAR2015 | 11.78 M | 370 | 530 | - | 224 | 195 | 40.62 |
GPU | DBNet-R18 | ICDAR2015 | 11.78 M | 710 | 880 | - | 560 | 435 | 30.97 |
Ascend | DBNet-R50 | ICDAR2015 | 24.28 M | 524 | 680 | - | 273 | 220 | 33.88 |
GPU | DBNet-R50 | ICDAR2015 | 24.28 M | 935 | 1054 | - | 730 | 547 | 23.95 |
This model is highly sensitive to data processing, so performance figures can fluctuate significantly across machines; the numbers above are for reference only. They were measured on:

- Ascend: 8 x Ascend 910 (32 GB); OS: EulerOS 2.8; memory: 756 GB; 96-core ARM CPU.
- GPU: 8 x V100 PCIe (32 GB); OS: Ubuntu 18.04; memory: 502 GB; 72-core x86 CPU.
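As a sanity check on the step times above, per-device throughput can be derived from the batch size and the time per step. This small helper is not part of the repository; the numbers in the comment come from the Graph-mode DBNet-R18 row:

```python
def throughput(batch_size: int, ms_per_step: float) -> float:
    """Per-device throughput in images/s: batch size divided by step time."""
    return batch_size / (ms_per_step / 1000.0)

# DBNet-R18 on Ascend, Graph mode, 1 device, batch size 16, 224 ms/step:
print(round(throughput(16, 224), 1))  # ~71.4 images/s
```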
Run standalone training:

```shell
bash run_standalone_train.sh [CONFIG_PATH] [DEVICE_ID] [LOG_NAME](optional)
```

Run distributed training:

```shell
bash run_distribution_train.sh [DEVICE_NUM] [CONFIG_PATH] [LOG_NAME](optional)
```

Run evaluation:

```shell
bash run_eval.sh [CONFIG_PATH] [CKPT_PATH] [DEVICE_ID] [LOG_NAME](optional)
```
If you need to modify the device or other configurations, please modify the corresponding items in the configuration file.
```shell
bash run_standalone_train.sh [CONFIG_PATH] [DEVICE_ID] [LOG_NAME](optional)
# CONFIG_PATH: path to the configuration file. The device target defaults to Ascend; to change it, modify the device field in the config file.
# DEVICE_ID: device id used for training
# LOG_NAME: name of the saved log and output folder (default: standalone_train)
```
The command above runs in the background; you can view the results in the [LOG_NAME].txt file. After training, the checkpoint files can be found in the [LOG_NAME] directory.
```shell
bash run_distribution_train.sh [DEVICE_NUM] [CONFIG_PATH] [LOG_NAME](optional)
# DEVICE_NUM: number of devices used for training
# CONFIG_PATH: path to the configuration file. The device target defaults to Ascend; to change it, modify the device field in the config file.
# LOG_NAME: name of the saved log and output folder (default: distribution_train)
```
The command above runs in the background; you can view the results in the [LOG_NAME].txt file.
- Configure the ModelArts parameters in the config file:
    - Set enable_modelarts=True
    - Set the OBS dataset path data_url
    - Set the OBS output path train_url
- Refer to ModelArts to run training.
```shell
bash run_eval.sh [CONFIG_PATH] [CKPT_PATH] [DEVICE_ID] [LOG_NAME](optional)
# CONFIG_PATH: path to the configuration file. The device target defaults to Ascend; to change it, modify the device field in the config file.
# CKPT_PATH: path to the checkpoint file to evaluate
# DEVICE_ID: device id used for evaluation
# LOG_NAME: name of the saved log and output folder (default: eval)
```
The command above runs in the background; you can view the results in the [LOG_NAME].txt file.
```shell
python export.py --config_path=[CONFIG_PATH] --ckpt_path=[CKPT_PATH]
```

The exported MINDIR file can be found in the output_dir specified in the config file.
Please refer to the MindSpore Inference with C++ Deployment Guide to set the environment variables.
```shell
bash scripts/run_cpp_infer.sh [MINDIR_PATH] [CONFIG_PATH] [OUTPUT_DIR] [DEVICE_TARGET] [DEVICE_ID]
# MINDIR_PATH: path to the MindIR file
# CONFIG_PATH: path to the configuration file
# OUTPUT_DIR: path for data preprocessing output and results
# DEVICE_TARGET: one of [Ascend, GPU, CPU]; for Ascend 310 inference, choose Ascend
# DEVICE_ID: device id
```
Models only provides scripts for downloading and preprocessing public datasets. We do not own these datasets and are not responsible for their quality or maintenance. Please make sure you are permitted to use these datasets under their licenses. Models trained on these datasets may be used only for non-commercial research and teaching purposes.

To the dataset owners: if you do not want your dataset included in MindSpore models, or want it updated in any way, we will delete or update all public content as requested. Please contact us through Gitee. Thank you for your understanding and contribution to the community.
This version of DBNet draws on several excellent open-source projects, including:
https://github.com/MhLiao/DB.git
https://gitee.com/yanan0122/dbnet-and-dbnet_pp-by-mind-spore.git
Please refer to the Models FAQ for answers to common questions.
Q: How do I resolve out-of-memory errors or "too many threads" warnings?
A: Adjust num_workers, prefetch_size, and max_rowsize in the configuration file. In general, reduce num_workers when CPU consumption is excessive; reduce num_workers, prefetch_size, and max_rowsize when memory consumption is excessive.
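For illustration only — the exact key names and nesting depend on the configuration files shipped with this repository — the dataset-pipeline settings mentioned above might look like this in a config file:

```yaml
# Hypothetical excerpt; match the key names to your actual config file.
dataset:
  num_workers: 4     # reduce first when CPU usage is too high
  prefetch_size: 2   # reduce when memory usage is too high
  max_rowsize: 16    # reduce together with prefetch_size to save memory
```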
Q: What should I do if the loss does not converge in a GPU environment?
A: Set mix_precision to False in the configuration file.
Q: Why does TotalText have a dataset interface but no configuration file?
A: TotalText requires parameters pretrained on the SynthText dataset, and no SynthText pretrained parameter file is currently provided.