cxfyxl/VIPTR

VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition

| paper | English datasets | Chinese datasets | pretrained model: Google Drive or Baidu Netdisk (passwd: 7npu) |

Getting Started

Dependency

  • This work was tested with PyTorch 1.8.0, CUDA 10.1, Python 3.6.13, and Ubuntu 18.04.
  • Requirements: lmdb, Pillow, torchvision, nltk, natsort, timm, mmcv

    pip install lmdb pillow torchvision nltk natsort timm mmcv
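After installing, a quick check that every dependency can be located may save a failed run later. The snippet below is a minimal sketch, not part of this repository: `REQUIRED_MODULES` and `find_missing` are hypothetical names, and the list assumes the usual import names (Pillow is imported as `PIL`, PyTorch as `torch`).

```python
import importlib.util

# Import names for the packages listed above (assumed, not taken from the repo).
# Pillow is imported as "PIL"; the remaining import names match the pip names.
REQUIRED_MODULES = ["torch", "torchvision", "lmdb", "PIL", "nltk", "natsort", "timm", "mmcv"]


def find_missing(modules):
    """Return the subset of `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only confirms that the modules can be found; it does not check that the installed versions match the tested configuration listed above.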

Download the LMDB datasets for training and evaluation from the following sources

English datasets:

  • Synthetic image datasets: MJSynth (MJ), SynthText (ST), and SynthAdd (password: 627x);
  • Real image datasets: the union of the training sets of IIIT5K, SVT, IC03, IC13, IC15, COCO-Text, SVTP, and CUTE80 (Baidu | Google);
  • Validation datasets: the union of IC13 (857), SVT, IIIT5K (3000), IC15 (1811), SVTP, and CUTE80;
  • Evaluation datasets: the English benchmark datasets, consisting of IIIT5K (3000), SVT, IC13 (857), IC15 (1811), SVTP, and CUTE80.

Chinese datasets:

  • Download the Chinese training, validation, and evaluation sets from here.
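Once a dataset is downloaded, it can help to confirm that the LMDB opens and contains labels before starting a long run. The sketch below is not taken from this repository. It assumes the key layout commonly used by LMDB-based scene text recognition datasets (`num-samples`, `image-%09d`, `label-%09d`, with 1-based indices); the datasets linked above may use a different layout, in which case the function raises a `KeyError`. `label_key`, `image_key`, and `peek_labels` are hypothetical helper names.

```python
def label_key(index):
    """Key for the label of sample `index` (1-based), under the assumed layout."""
    return ("label-%09d" % index).encode()


def image_key(index):
    """Key for the encoded image of sample `index` (1-based), under the assumed layout."""
    return ("image-%09d" % index).encode()


def peek_labels(lmdb_path, count=5):
    """Return (num_samples, first `count` labels) from an LMDB dataset directory."""
    import lmdb  # imported lazily so the key helpers work without lmdb installed

    env = lmdb.open(lmdb_path, readonly=True, lock=False, readahead=False, meminit=False)
    try:
        with env.begin(write=False) as txn:
            raw = txn.get(b"num-samples")
            if raw is None:
                raise KeyError("no 'num-samples' key; the dataset may use a different layout")
            num_samples = int(raw)
            labels = []
            for i in range(1, min(count, num_samples) + 1):
                value = txn.get(label_key(i))
                labels.append(None if value is None else value.decode("utf-8"))
            return num_samples, labels
    finally:
        env.close()
```

Calling `peek_labels` on one of the downloaded dataset directories would return the sample count and a few labels, which is usually enough to spot a wrong path or a character set mismatch.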

Run benchmark with pretrained model

  1. Download the pretrained model from Google Drive or Baidu Netdisk (passwd: 7npu);

  2. Set the model path, test set path, and character list;

  3. Run test_benchmark.py for the English benchmarks;

    CUDA_VISIBLE_DEVICES=0 python test_benchmark.py --benchmark_all_eval --Transformation TPS19 --FeatureExtraction VIPTRv1T --SequenceModeling None --Prediction CTC --batch_max_length 25 --imgW 96 --output_channel 192

  4. Run test_chn_benchmark.py for the Chinese benchmarks.

    CUDA_VISIBLE_DEVICES=0 python test_chn_benchmark.py --benchmark_all_eval --Transformation TPS19 --FeatureExtraction VIPTRv1T --SequenceModeling None --Prediction CTC --batch_max_length 64 --imgW 320 --output_channel 192
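The two commands above differ only in the script name, `--batch_max_length`, and `--imgW`. For scripted runs, they could be assembled from a small table, as in the sketch below. This wrapper is not part of the repository: `COMMON_FLAGS`, `BENCHMARKS`, `build_command`, and `run_benchmark` are hypothetical names, and the sketch assumes the scripts accept their flags in any order.

```python
import os
import subprocess

# Flags shared by both benchmark commands above.
COMMON_FLAGS = [
    "--benchmark_all_eval",
    "--Transformation", "TPS19",
    "--FeatureExtraction", "VIPTRv1T",
    "--SequenceModeling", "None",
    "--Prediction", "CTC",
    "--output_channel", "192",
]

# Per-language settings copied from the two commands above.
BENCHMARKS = {
    "english": {"script": "test_benchmark.py", "batch_max_length": 25, "imgW": 96},
    "chinese": {"script": "test_chn_benchmark.py", "batch_max_length": 64, "imgW": 320},
}


def build_command(language):
    """Return the argument list for the given benchmark ("english" or "chinese")."""
    cfg = BENCHMARKS[language]
    return [
        "python", cfg["script"], *COMMON_FLAGS,
        "--batch_max_length", str(cfg["batch_max_length"]),
        "--imgW", str(cfg["imgW"]),
    ]


def run_benchmark(language, gpu="0"):
    """Run the benchmark script on the chosen GPU; raises if the script fails."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.run(build_command(language), env=env, check=True)
```

For example, `run_benchmark("chinese")` would launch test_chn_benchmark.py with a maximum label length of 64 and an image width of 320, matching step 4.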

Results on benchmark datasets and comparison with SOTA

[Figure: VIPTR_SOTA]

Citation

Please consider citing this work in your publications if it helps your research.

@article{cheng2024viptr,
  title={VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition},
  author={Cheng, Xianfu and Zhou, Weixiao and Li, Xiang and Chen, Xiaoming and Yang, Jian and Li, Tongliang and Li, Zhoujun},
  journal={arXiv preprint arXiv:2401.10110},
  year={2024}
}

Acknowledgements
