This repo offers several solutions for scene text detection, based on
- CTPN (Connectionist Text Proposal Network). It is implemented in TensorFlow. The original paper can be found here.
- Alternative older approaches, such as
  - CRNN Model
  - DenseNet-OCR
Please note: I have to reconstruct this repo. It is written against older dependencies, such as
- tensorflow-gpu==1.4.0 (to be upgraded to the latest stable version, tensorflow-gpu==1.12.0, with cuda==9.0 and the compatible cuDNN version 7.1.4).
- The model is saved in HDF5, which is not an optimized format for model serving or serialization. I'll refactor the code with TF 2.0 and save the model in the protobuf (.pb) serialization format.
- It was trained on a GTX 1070 GPU. I'm planning to retrain it on a TPU. It is possible to run the program on CPU only, but it is extremely slow due to the non-optimal CPU implementation.
- I'll reconstruct this repo to serve the model using TF Serving, with quantization and pruning.
CTPN Model Architecture | NVIDIA-GPU Architecture |
---|---|
Please refer to CTPN Architecture explained | Please refer to NVIDIA-GPU |
VGGNet -> BLSTM of 256D -> FC of 512D -> output layer MLP for text/non-text scores | Check the CUDA version using `nvcc --version`, the cuDNN version using `cat /usr/include/cudnn.h \| grep CUDNN_MAJOR -A 2`, and the tensorflow-gpu version using `pip freeze \| grep tensorflow-gpu` |
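The cuDNN check in the table prints the `#define` lines for the major, minor, and patch numbers. As a minimal sketch, the snippet below turns that kind of output into a version tuple that can be compared against the pinned version. The function name `parse_cudnn_version` and the sample text are illustrative assumptions, not part of this repo.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn.h-style #define lines.

    Hypothetical helper for illustration; it only reads text and does not
    locate or open the header itself.
    """
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("missing " + name)
        parts.append(int(match.group(1)))
    return tuple(parts)


# Sample text shaped like the grep output; the values are illustrative.
sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 4
"""

version = parse_cudnn_version(sample)
print(".".join(str(part) for part in version))
```

Comparing the resulting tuple with `(7, 1, 4)` is one way to confirm the installed cuDNN matches the version this README pins.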
- CTPN Model
- DenseNet-OCR
- CRNN
The CTPN model is the latest of the three as of this writing and is the recommended approach, because
- CTPN is the latest and the most effective for scene text detection.
- Even though most datasets are based on English, CTPN performs well at locating Chinese text as well.
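For orientation, the CTPN pipeline named in the architecture table (VGGNet -> BLSTM of 256D -> FC of 512D -> text/non-text scores) can be traced as tensor shapes. This is a rough sketch, not this repo's code: the function name `ctpn_shapes` is hypothetical, and the feature stride of 16, the 512 VGG feature channels, and `k=10` anchors per position are assumptions taken from the CTPN paper as I understand it.

```python
def ctpn_shapes(height, width, stride=16, k=10):
    """Trace (rows, cols, channels) through the CTPN stages for one image.

    Assumptions (not verified against this repo): the VGG feature map is
    downsampled by `stride`, and each position scores `k` anchors as
    text / non-text, giving 2 * k score channels.
    """
    rows, cols = height // stride, width // stride
    return {
        "vgg_features": (rows, cols, 512),   # last VGG conv block
        "blstm": (rows, cols, 256),          # bidirectional LSTM along each row
        "fc": (rows, cols, 512),             # fully connected layer
        "text_scores": (rows, cols, 2 * k),  # text / non-text score per anchor
    }


for stage, shape in ctpn_shapes(600, 800).items():
    print(stage, shape)
```

The point of the trace is that the spatial grid stays fixed after the VGG stage; only the channel dimension changes as features pass through the recurrent and fully connected layers.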
- CTPN Model for Scene Text Detection
- DenseNet-OCR Model for Scene Text Detection
- CRNN Model for Scene Text Detection
- Vantage view of 2 older AI approaches
CRNN: VGGNet + BLSTM + BLSTM + CTC
DenseNet-OCR: DenseNet + CTC
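Both older models end in CTC, which maps a sequence of per-frame class scores to a shorter text string. A minimal sketch of the common greedy (best-path) decoding step is shown below: take the best class per frame, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the blank index of 0, and the toy alphabet are assumptions for illustration; this repo may decode differently.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: list mapping class index to character (the blank slot is unused).
    """
    # Best class index for each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    decoded = []
    previous = None
    for index in best:
        # Emit a character only when it is not blank and not a repeat
        # of the immediately preceding frame.
        if index != blank and index != previous:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


# Toy example: index 0 is the blank, 1 is "a", 2 is "b".
alphabet = ["", "a", "b"]
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two a's)
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.3, 0.6],    # b (repeat, merged)
]
print(ctc_greedy_decode(scores, alphabet))  # prints "aab"
```

The blank frame is what lets the decoder keep a genuine double letter: without it, the two runs of "a" would merge into one.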
- Result Comparison
Model Architecture | run time on GPU | Accuracy | Model Servable size |
---|---|---|---|
CRNN | 60ms | 0.972 | not recorded |
DenseNet-OCR | 8ms | 0.982 | 18.9MB |
NOTICE:
All the photos used below were collected from the internet. If this affects you, please contact me and I will delete them.
- The oriented text connector has been implemented. It is working, but still needs further improvement.
Interactive Live Demo of 'Scene Text Detection' Interactive demo reference
End to End Chinese OCR by CTPN Model End to End Chinese OCR - a forked repo