Tracking the latest progress in Scene Text Detection and Recognition: must-read papers, well organized
SceneTextPapers


Information about this repository

This repo serves as a complement to our working paper:

  • Scene Text Detection and Recognition: The Deep Learning Era. Shangbang Long, Xin He, Cong Yao. Draft Version

Papers

I. Other Survey Papers:

  1. Scene text detection and recognition: Recent advances and future trends. Zhu, Yingying and Yao, Cong and Bai, Xiang. Frontiers of Computer Science, 2016 [paper]
  2. Text detection, tracking and recognition in video: A comprehensive survey. Yin, Xu-Cheng and Zuo, Ze-Yu and Tian, Shu and Liu, Cheng-Lin. TIP, 2016 [paper]
  3. Text detection and recognition in imagery: A survey. Ye, Qixiang and Doermann, David. TPAMI, 2015 [paper]
  4. Text localization and recognition in images and video. Uchida, Seiichi. 2014 [paper]

II. Main: Scene Text Detection and Recognition

2.1 Detection

2.1.1 Pipeline Simplification
Anchor-based methods
  1. Single Shot Text Detector With Regional Attention. He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin. ICCV, 2017 [paper] [code]
  2. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. AAAI, 2017 [paper] [code]
  3. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection. Liu, Yuliang and Jin, Lianwen. CVPR, 2017 [paper]
  4. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. CVPR, 2017 [paper] [code]
  5. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. CVPR, 2017 [paper] [code]
Region proposal methods
  1. Detecting Curve Text in the Wild: New Dataset and New Solution. Liu, Yuliang and Jin, Lianwen and Zhang, Shuaitao and Zhang, Sheng. 2017 [paper] [code]
  2. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 [paper]
  3. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. Ma, Jianqi and Shao, Weiyuan and Ye, Hao and Wang, Li and Wang, Hong and Zheng, Yingbin and Xue, Xiangyang. T MULTIMEDIA, 2017 [paper] [code]
  4. Weakly supervised text attention network for generating text proposals in scene images. Rong, Li and MengYi, En and JianQiang, Li and HaiBin, Zhang. ICDAR, 2017 [paper]
  5. Rotation-Sensitive Regression for Oriented Scene Text Detection. Liao, Minghui and Zhu, Zhen and Shi, Baoguang and Xia, Gui-song and Bai, Xiang. CVPR, 2018 [paper] [code]
  6. Feature Enhancement Network: A Refined Scene Text Detector. Zhang, Sheng and Liu, Yuliang and Jin, Lianwen and Luo, Canjie. AAAI, 2018 [paper]
2.1.2 Different Prediction Units
Text instance level
  1. Detecting Curve Text in the Wild: New Dataset and New Solution. Liu, Yuliang and Jin, Lianwen and Zhang, Shuaitao and Zhang, Sheng. 2017 [paper] [code]
  2. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. AAAI, 2017 [paper] [code]
  3. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. CVPR, 2017 [paper] [code]
  4. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 [paper]
  5. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. Ma, Jianqi and Shao, Weiyuan and Ye, Hao and Wang, Li and Wang, Hong and Zheng, Yingbin and Xue, Xiangyang. T MULTIMEDIA, 2017 [paper] [code]
  6. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection. Liu, Yuliang and Jin, Lianwen. CVPR, 2017 [paper]
  7. Deep Direct Regression for Multi-Oriented Scene Text Detection. He, Wenhao and Zhang, Xu-Yao and Yin, Fei and Liu, Cheng-Lin. ICCV, 2017 [paper]
  8. Fused Text Segmentation Networks for Multi-oriented Scene Text Detection. Dai, Yuchen and Huang, Zheng and Gao, Yuting and Chen, Kai. 2017 [paper]
  9. Feature Enhancement Network: A Refined Scene Text Detector. Zhang, Sheng and Liu, Yuliang and Jin, Lianwen and Luo, Canjie. AAAI, 2018 [paper]
  10. Rotation-Sensitive Regression for Oriented Scene Text Detection. Liao, Minghui and Zhu, Zhen and Shi, Baoguang and Xia, Gui-song and Bai, Xiang. CVPR, 2018 [paper] [code]
Bottom-up (Pixel)
  1. Scene text detection via holistic, multi-channel prediction. Yao, Cong and Bai, Xiang and Sang, Nong and Zhou, Xinyu and Zhou, Shuchang and Cao, Zhimin. 2016 [paper]
  2. Multi-oriented text detection with fully convolutional networks. Zhang, Zheng and Zhang, Chengquan and Shen, Wei and Yao, Cong and Liu, Wenyu and Bai, Xiang. CVPR, 2016 [paper] [code]
  3. Self-organized Text Detection with Minimal Post-processing via Border Learning. Wu, Yue and Natarajan, Prem. CVPR, 2017 [paper]
  4. Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild. He, Dafang and Yang, Xiao and Liang, Chen and Zhou, Zihan and Ororbia, Alexander G and Kifer, Daniel and Giles, C Lee. CVPR, 2017 [paper]
  5. Single Shot Text Detector With Regional Attention. He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin. ICCV, 2017 [paper] [code]
  6. PixelLink: Detecting Scene Text via Instance Segmentation. Deng, Dan and Liu, Haifeng and Li, Xuelong and Cai, Deng. AAAI, 2018 [paper] [code]
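Bottom-up pixel-level methods like those above predict a per-pixel text/non-text score and then group positive pixels into text instances. The Python sketch below illustrates only that grouping step, under simplifying assumptions: the score map is a plain nested list, any two 8-neighboring pixels whose score reaches a hypothetical `threshold` of 0.5 are treated as part of the same word, and each instance is reported as an axis-aligned box. PixelLink, for example, additionally predicts explicit links between neighboring pixels, which this sketch omits; `group_text_pixels` is a hypothetical helper, not code from any listed paper.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coordinates


def group_text_pixels(score_map: List[List[float]], threshold: float = 0.5) -> List[Box]:
    """Threshold a per-pixel text score map, group positive pixels into
    8-connected components, and return one axis-aligned box per component.
    Assumption: adjacency alone decides grouping (no learned link prediction)."""
    h = len(score_map)
    w = len(score_map[0]) if h else 0
    visited = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if visited[y][x] or score_map[y][x] < threshold:
                continue
            # Breadth-first flood fill starting from an unvisited text pixel.
            queue = deque([(y, x)])
            visited[y][x] = True
            x_min, y_min, x_max, y_max = x, y, x, y
            while queue:
                cy, cx = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                # Visit all 8 neighbors that are inside the map and above threshold.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w and not visited[ny][nx]
                                and score_map[ny][nx] >= threshold):
                            visited[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


if __name__ == "__main__":
    score = [
        [0.9, 0.8, 0.1, 0.0, 0.0],
        [0.7, 0.9, 0.0, 0.0, 0.6],
        [0.0, 0.0, 0.0, 0.2, 0.8],
    ]
    print(group_text_pixels(score))  # [(0, 0, 1, 1), (4, 1, 4, 2)]
```

Plain connected components merge words whose pixels touch, which is exactly the failure mode that border-learning and link-prediction methods in this list try to address.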
Bottom-up (Components)
  1. Detecting text in natural image with connectionist text proposal network. Tian, Zhi and Huang, Weilin and He, Tong and He, Pan and Qiao, Yu. ECCV, 2016 [paper] [code]
  2. Aggregating local context for accurate scene text detection. He, Dafang and Yang, Xiao and Huang, Wenyi and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. ACCV, 2016 [paper]
  3. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. CVPR, 2017 [paper] [code]
  4. Scene Text Detection with Novel Superpixel Based Character Candidate Extraction. Wang, Cong and Yin, Fei and Liu, Cheng-Lin. 2017 [paper]
  5. Deep Residual Text Detection Network for Scene Text. Zhu, Xiangyu and Jiang, Yingying and Yang, Shuli and Wang, Xiaobing and Li, Wei and Fu, Pei and Wang, Hua and Luo, Zhenbo. ICDAR, 2017 [paper]
  6. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. Lyu, Pengyuan and Yao, Cong and Wu, Wenhao and Yan, Shuicheng and Bai, Xiang. CVPR, 2018 [paper]
2.1.3 Specific Targets
Long text
  1. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. CVPR, 2017 [paper] [code]
  2. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 [paper]
  3. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. Lyu, Pengyuan and Yao, Cong and Wu, Wenhao and Yan, Shuicheng and Bai, Xiang. CVPR, 2018 [paper]
Multi-oriented text
  1. R2CNN: rotational region CNN for orientation robust scene text detection. Jiang, Yingying and Zhu, Xiangyu and Wang, Xiaobing and Yang, Shuli and Li, Wei and Wang, Hua and Fu, Pei and Luo, Zhenbo. 2017 [paper]
  2. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. AAAI, 2017 [paper] [code]
  3. Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection. Liu, Yuliang and Jin, Lianwen. CVPR, 2017 [paper]
  4. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. Ma, Jianqi and Shao, Weiyuan and Ye, Hao and Wang, Li and Wang, Hong and Zheng, Yingbin and Xue, Xiangyang. T MULTIMEDIA, 2017 [paper] [code]
  5. Detecting Oriented Text in Natural Images by Linking Segments. Shi, Baoguang and Bai, Xiang and Belongie, Serge. CVPR, 2017 [paper] [code]
  6. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. CVPR, 2017 [paper] [code]
  7. Rotation-Sensitive Regression for Oriented Scene Text Detection. Liao, Minghui and Zhu, Zhen and Shi, Baoguang and Xia, Gui-song and Bai, Xiang. CVPR, 2018 [paper] [code]
  8. Geometry-Aware Scene Text Detection With Instance Transformation Network. Wang, Fangfang and Zhao, Liming and Li, Xi and Wang, Xinchao and Tao, Dacheng. CVPR, 2018 [paper] [code]
Irregular text
  1. Detecting Curve Text in the Wild: New Dataset and New Solution. Liu, Yuliang and Jin, Lianwen and Zhang, Shuaitao and Zhang, Sheng. 2017 [paper] [code]
  2. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. Lyu, Pengyuan and Liao, Minghui and Yao, Cong and Wu, Wenhao and Bai, Xiang. ECCV, 2018 [paper]
  3. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. Long, Shangbang and Ruan, Jiaqiang and Zhang, Wenjie and He, Xin and Wu, Wenhao and Yao, Cong. ECCV, 2018 [paper]
Speed up
  1. EAST: An Efficient and Accurate Scene Text Detector. Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun. CVPR, 2017 [paper] [code]
Easy instance segmentation
  1. Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting in the Wild. He, Dafang and Yang, Xiao and Liang, Chen and Zhou, Zihan and Ororbia, Alexander G and Kifer, Daniel and Giles, C Lee. CVPR, 2017 [paper]
  2. Self-organized Text Detection with Minimal Post-processing via Border Learning. Wu, Yue and Natarajan, Prem. CVPR, 2017 [paper]
  3. WordFence: Text Detection in Natural Images with Border Awareness. Polzounov, Andrei and Ablavatski, Artsiom and Escalera, Sergio and Lu, Shijian and Cai, Jianfei. ICIP, 2017 [paper]
  4. PixelLink: Detecting Scene Text via Instance Segmentation. Deng, Dan and Liu, Haifeng and Li, Xuelong and Cai, Deng. AAAI, 2018 [paper] [code]
Retrieving designated text
  1. Unambiguous text localization and retrieval for cluttered scenes. Rong, Xuejian and Yi, Chucai and Tian, Yingli. CVPR, 2017 [paper]
Against complex background
  1. Single Shot Text Detector With Regional Attention. He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin. ICCV, 2017 [paper] [code]

2.2 Recognition

2.2.1 CTC based methods
  1. Unconstrained on-line handwriting recognition with recurrent neural networks. Graves, Alex and Liwicki, Marcus and Bunke, Horst and Schmidhuber, Jurgen and Fernandez, Santiago. NIPS, 2008 [paper]
  2. Accurate scene text recognition based on recurrent neural network. Su, Bolan and Lu, Shijian. ACCV, 2014 [paper]
  3. STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition. Liu, Wei and Chen, Chaofeng and Wong, Kwan-Yee K and Su, Zhizhong and Han, Junyu. BMVC, 2016 [paper]
  4. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. Shi, Baoguang and Bai, Xiang and Yao, Cong. TPAMI, 2017 [paper] [code]
  5. Reading Scene Text with Attention Convolutional Sequence Modeling. Gao, Yunze and Chen, Yingying and Wang, Jinqiao and Lu, Hanqing. 2017 [paper]
  6. Scene Text Recognition with Sliding Convolutional Character Models. Yin, Fei and Wu, Yi-Chao and Zhang, Xu-Yao and Liu, Cheng-Lin. 2017 [paper]
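CTC-based recognizers emit, for every frame of the feature sequence, a probability distribution over the character set plus a special blank symbol. The simplest way to turn those frame-wise outputs into a string is best-path (greedy) decoding: take the most likely class per frame, merge consecutive repeats, then drop blanks. A minimal Python sketch, assuming class index 0 is the blank and class k >= 1 maps to `alphabet[k - 1]` (both conventions are assumptions for illustration; `ctc_greedy_decode` is a hypothetical name):

```python
from typing import Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.
    Assumed convention: index 0 is the blank, index k >= 1 is alphabet[k - 1]."""
    # Most likely class for each frame (ties resolve to the lowest index).
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for k in best_path:
        # Emit a character only when it differs from the previous frame's class
        # and is not the blank; a blank between two equal labels keeps both.
        if k != prev and k != 0:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


if __name__ == "__main__":
    probs = [
        [0.1, 0.8, 0.1],   # 'a'
        [0.2, 0.7, 0.1],   # 'a' (repeat, merged)
        [0.9, 0.05, 0.05], # blank
        [0.1, 0.6, 0.3],   # 'a' (new character after blank)
        [0.1, 0.2, 0.7],   # 'b'
        [0.3, 0.1, 0.6],   # 'b' (repeat, merged)
    ]
    print(ctc_greedy_decode(probs, "ab"))  # aab
```

Greedy decoding does not always return the most probable label sequence, because it ignores the many alignments that collapse to the same string; beam search or lexicon-constrained decoding is the usual alternative when accuracy matters more than speed.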
2.2.2 Attention based methods
  1. Robust scene text recognition with automatic rectification. Shi, Baoguang and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang. CVPR, 2016 [paper]
  2. Recursive recurrent nets with attention modeling for ocr in the wild. Lee, Chen-Yu and Osindero, Simon. CVPR, 2016 [paper]
  3. Visual attention models for scene text recognition. Ghosh, Suman K and Valveny, Ernest and Bagdanov, Andrew D. ICDAR, 2017 [paper]
  4. Focusing Attention: Towards Accurate Text Recognition in Natural Images. Cheng, Zhanzhan and Bai, Fan and Xu, Yunlu and Zheng, Gang and Pu, Shiliang and Zhou, Shuigeng. ICCV, 2017 [paper]
  5. Learning to Read Irregular Text with Attention Mechanisms. Yang, Xiao and He, Dafang and Zhou, Zihan and Kifer, Daniel and Giles, C Lee. IJCAI, 2017 [paper]
  6. Arbitrarily-Oriented Text Recognition. Cheng, Zhanzhan and Liu, Xuyang and Bai, Fan and Niu, Yi and Pu, Shiliang and Zhou, Shuigeng. CVPR, 2017 [paper]
  7. Edit Probability for Scene Text Recognition. Bai, Fan and Cheng, Zhanzhan and Niu, Yi and Pu, Shiliang and Zhou, Shuigeng. CVPR, 2018 [paper]
  8. SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network. Liu, Zichuan and Li, Yixing and Ren, Fengbo and Yu, Hao and Goh, Wangling. AAAI, 2018 [paper]
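Attention-based recognizers replace CTC's frame-wise alignment with a decoder that, at each output step, computes a weighting over the encoder's feature sequence and reads a context vector from it. One such step can be sketched as follows. For brevity it scores each feature with a plain dot product against the decoder query, which is an assumption: several of the papers above use additive (Bahdanau-style) scoring with learned weights instead. `attention_glimpse` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def attention_glimpse(query: Sequence[float],
                      features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """One attention step: score every encoder feature against the decoder query
    (dot product, an illustrative choice), softmax-normalize the scores, and
    return (attention weights, context vector = weighted sum of features)."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(attention_glimpse([0.0, 0.0], feats))  # ([0.5, 0.5], [0.5, 0.5])
```

In a full recognizer the query would come from the decoder's recurrent state, and the context vector would feed the prediction of the next character.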

2.3 End-to-End Text Spotting

2.3.1 Separately Trained Two-Stage Methods
  1. Reading text in the wild with convolutional neural networks. Jaderberg, Max and Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew. IJCV, 2016 [paper]
  2. Synthetic data for text localisation in natural images. Gupta, Ankush and Vedaldi, Andrea and Zisserman, Andrew. CVPR, 2016 [paper] [code]
  3. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. Liao, Minghui and Shi, Baoguang and Bai, Xiang and Wang, Xinggang and Liu, Wenyu. AAAI, 2017 [paper] [code]
2.3.2 Jointly Trained Two-Stage Methods
  1. SEE: Towards Semi-Supervised End-to-End Scene Text Recognition. Bartz, Christian and Yang, Haojin and Meinel, Christoph. 2017 [paper] [code]
  2. Deep TextSpotter: An End-To-End Trainable Scene Text Localization and Recognition Framework. Busta, Michal and Neumann, Lukas and Matas, Jiri. ICCV, 2017 [paper] [code]
  3. Towards End-To-End Text Spotting With Convolutional Recurrent Neural Networks. Li, Hui and Wang, Peng and Shen, Chunhua. ICCV, 2017 [paper]
  4. An End-to-End TextSpotter With Explicit Alignment and Attention. He, Tong and Tian, Zhi and Huang, Weilin and Shen, Chunhua and Qiao, Yu and Sun, Changming. CVPR, 2018 [paper]
  5. FOTS: Fast Oriented Text Spotting with a Unified Network. Liu, Xuebo and Liang, Ding and Yan, Shi and Chen, Dagui and Qiao, Yu and Yan, Junjie. CVPR, 2018 [paper]
  6. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. Lyu, Pengyuan and Liao, Minghui and Yao, Cong and Wu, Wenhao and Bai, Xiang. ECCV, 2018 [paper]

2.4 Auxiliary Techniques

2.4.1 Synthetic Data
  1. Synthetic data and artificial neural networks for natural scene text recognition. Jaderberg, Max and Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew. NIPS, 2014 [paper]
  2. Synthetic data for text localisation in natural images. Gupta, Ankush and Vedaldi, Andrea and Zisserman, Andrew. CVPR, 2016 [paper] [code]
  3. Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes. Zhan, Fangneng and Lu, Shijian and Xue, Chuhui. ECCV, 2018 [paper] [code]
2.4.2 Bootstrapping
  1. WeText: Scene text detection under weak supervision. Tian, Shangxuan and Lu, Shijian and Li, Chongshou. ICCV, 2017 [paper]
  2. Weakly supervised text attention network for generating text proposals in scene images. Rong, Li and MengYi, En and JianQiang, Li and HaiBin, Zhang. ICDAR, 2017 [paper]
  3. WordSup: Exploiting word annotations for character based text detection. Hu, Han and Zhang, Chengquan and Luo, Yuxuan and Wang, Yuzhuo and Han, Junyu and Ding, Errui. ICCV, 2017 [paper]
2.4.3 Deblurring
  1. Convolutional neural networks for direct text deblurring. Hradis, Michal and Kotera, Jan and Zemcik, Pavel and Sroubek, Filip. BMVC, 2015 [paper] [code]
  2. A blind deconvolution model for scene text detection and recognition in video. Khare, Vijeta and Shivakumara, Palaiahnakote and Raveendran, Paramesran and Blumenstein, Michael. PR, 2016 [paper]
2.4.4 Context Information
  1. Could scene context be beneficial for scene text detection? Zhu, Anna and Gao, Renwu and Uchida, Seiichi. PR, 2016 [paper]
2.4.5 Adversarial Attack
  1. Adaptive Adversarial Attack on Scene Text Recognition. Yuan, Xiaoyong and He, Pan and Li, Xiaolin Andy. 2018 [paper]

III. Datasets

| Dataset (Year) | Image Num (train/test) | Text Num (train/test) | Orientation | Language | Characteristics | Detection/Recognition |
|---|---|---|---|---|---|---|
| End2End | | | | | | |
| ICDAR03 (2003) | 509 (258/251) | 2276 (1110/1156) | Horizontal | En | - | ✓/✓ |
| ICDAR13 Scene Text (2013) | 462 (229/233) | - (848/1095) | Horizontal | En | - | ✓/✓ |
| ICDAR15 Incidental Text (2015) | 1500 (1000/500) | - (-/-) | Multi-Oriented | En | Blur, Small, Defocused | ✓/✓ |
| ICDAR17 / RCTW (2017) | 12263 (8034/4229) | - (-/-) | Multi-Oriented | Cn | - | ✓/✓ |
| Total-Text (2017) | 1555 (1255/300) | - (-/-) | Multi-Oriented, Curved | En, Cn | Irregular polygon label | ✓/✓ |
| SVT (2010) | 350 (100/250) | 904 (257/647) | Horizontal | En | - | ✓/✓ |
| KAIST (2010) | 3000 (-/-) | 5000 (-/-) | Horizontal | En, Ko | Distorted | ✓/✓ |
| NEOCR (2011) | 659 (-/-) | 5238 (-/-) | Multi-Oriented | 8 langs | - | ✓/✓ |
| CUTE (2014) | 80 (-/80) | - (-/-) | Curved | En | - | ✓/✓ |
| CTW (2017) | 32K (25K/6K) | 1M (812K/205K) | Multi-Oriented | Cn | Fine-grained annotation | ✓/✓ |
| CASIA-10K (2018) | 10K (7K/3K) | - (-/-) | Multi-Oriented | Cn | - | ✓/✓ |
| Detection Only | | | | | | |
| OSTD (2011) | 89 (-/-) | 218 (-/-) | Multi-Oriented | En | - | ✓/- |
| MSRA-TD500 (2012) | 500 (300/200) | 1719 (1068/651) | Multi-Oriented | En, Cn | Long text | ✓/- |
| HUST-TR400 (2014) | 400 (400/-) | - (-/-) | Multi-Oriented | En, Cn | Long text | ✓/- |
| ICDAR17 / RRC-MLT (2017) | 18000 (9000/9000) | - (-/-) | Multi-Oriented | 9 langs | - | ✓/- |
| CTW1500 (2017) | 1500 (1000/500) | - (-/-) | Multi-Oriented, Curved | En | Bounding box with 14 vertices | ✓/- |
| Recognition Only | | | | | | |
| Char74k (2009) | 74107 (-/-) | 74107 (-/-) | Horizontal | En, Kannada | Character label | -/✓ |
| IIIT 5K-Word (2012) | 5000 (-/-) | 5000 (2000/3000) | Horizontal | - | Cropped | -/✓ |
| SVHN (2010) | - (-/-) | 600000 (-/-) | Horizontal | - | House number digits | -/✓ |
| SVTP (2013) | 639 (-/639) | - (-/-) | - | En | Distorted | -/✓ |