VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition

Getting Started

Dependency

This work was tested with PyTorch 1.8.0, CUDA 10.1, Python 3.6.13, and Ubuntu 18.04.
Requirements: lmdb, Pillow, torchvision, nltk, natsort, timm, mmcv

pip install lmdb pillow torchvision nltk natsort timm mmcv
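Before running anything, it can help to confirm that every requirement is importable in the active environment. The sketch below is a hypothetical helper, not part of the repository. It assumes the usual import names for each pip package (for example, Pillow is imported as `PIL`). It also lists `torch`, which the tested setup uses but the pip line above does not install explicitly.

```python
import importlib.util

# pip package name -> top-level module name it is imported under.
# Assumption: these are the usual import names for each package.
REQUIRED_MODULES = {
    "torch": "torch",  # PyTorch itself (tested with 1.8.0)
    "lmdb": "lmdb",
    "pillow": "PIL",
    "torchvision": "torchvision",
    "nltk": "nltk",
    "natsort": "natsort",
    "timm": "timm",
    "mmcv": "mmcv",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the sorted pip names whose top-level module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None (without importing) when a top-level module is absent.
        if importlib.util.find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All requirements are importable.")
```

Checking with `find_spec` rather than importing avoids triggering slow or CUDA-dependent import side effects just to verify installation.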

Download the LMDB datasets for training and evaluation from the following sources:

English datasets:

Synthetic image datasets: MJSynth (MJ), SynthText (ST), and SynthAdd (password: 627x);
Real image datasets: the union of the training sets of IIIT5K, SVT, IC03, IC13, IC15, COCO-Text, SVTP, and CUTE80 (baidu|google);
Validation datasets: the union of the IC13 (857), SVT, IIIT5k (3000), IC15 (1811), SVTP, and CUTE80 sets;
Evaluation datasets: the English benchmark datasets, consisting of IIIT5k (3000), SVT, IC13 (857), IC15 (1811), SVTP, and CUTE80.
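Because the evaluation benchmarks differ in size (for example IIIT5k has 3000 images and IC13 857 in the versions listed above), an overall score can mean either the plain mean of per-dataset accuracies or an accuracy weighted by the number of test images. The sketch below computes both from per-dataset (correct, total) counts. The function name is hypothetical, and this is not necessarily how test_benchmark.py aggregates its numbers.

```python
from typing import Dict, Tuple


def summarize_accuracy(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, float]:
    """Return (plain mean of per-dataset accuracies, sample-weighted accuracy) in percent.

    `counts` maps a dataset name to (num_correct, num_samples).
    """
    if not counts:
        raise ValueError("no datasets given")
    for name, (correct, total) in counts.items():
        if total <= 0 or not 0 <= correct <= total:
            raise ValueError(f"invalid counts for {name}: {correct}/{total}")
    # Each benchmark counts equally, regardless of its size.
    per_dataset = [100.0 * correct / total for correct, total in counts.values()]
    plain_mean = sum(per_dataset) / len(per_dataset)
    # Every test image counts equally, so large sets such as IIIT5k dominate.
    total_correct = sum(correct for correct, _ in counts.values())
    total_samples = sum(total for _, total in counts.values())
    weighted = 100.0 * total_correct / total_samples
    return plain_mean, weighted
```

When comparing against other methods, it is worth checking which of the two conventions a reported average uses, since they can differ noticeably when a small set such as CUTE80 has much lower accuracy than a large one.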

Chinese datasets:

Download the Chinese training, validation, and evaluation sets from here.
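To inspect a downloaded LMDB dataset, the sketch below reads samples through any key-to-bytes lookup. It assumes the key layout used by clovaai's deep-text-recognition-benchmark: a `num-samples` count plus 1-based `image-%09d` and `label-%09d` entries holding encoded image bytes and UTF-8 labels. That layout is an assumption, so verify it against the repository's own dataset loader. The helper names are hypothetical.

```python
from typing import Callable, Iterator, Optional, Tuple

# Assumed key layout (deep-text-recognition-benchmark style).
NUM_SAMPLES_KEY = b"num-samples"


def sample_keys(index: int) -> Tuple[bytes, bytes]:
    """Image and label keys for a 1-based sample index."""
    return f"image-{index:09d}".encode(), f"label-{index:09d}".encode()


def iter_samples(
    get: Callable[[bytes], Optional[bytes]]
) -> Iterator[Tuple[int, bytes, str]]:
    """Yield (index, encoded_image_bytes, label) from a key -> bytes lookup.

    `get` can be the `get` method of an lmdb read transaction or of a dict.
    """
    raw = get(NUM_SAMPLES_KEY)
    if raw is None:
        raise KeyError("num-samples key not found")
    for index in range(1, int(raw.decode()) + 1):
        image_key, label_key = sample_keys(index)
        image_bytes, label = get(image_key), get(label_key)
        if image_bytes is None or label is None:
            continue  # skip incomplete entries
        yield index, image_bytes, label.decode("utf-8")


def print_first_samples(path: str, limit: int = 5) -> None:
    """Print index, image size in bytes, and label for the first `limit` samples."""
    import lmdb  # pip install lmdb

    env = lmdb.open(path, readonly=True, lock=False, readahead=False, meminit=False)
    try:
        with env.begin(write=False) as txn:
            for index, image_bytes, label in iter_samples(txn.get):
                print(index, len(image_bytes), label)
                if index >= limit:
                    break
    finally:
        env.close()
```

For example, `print_first_samples("path/to/lmdb_dir")` prints the first few labels of a dataset directory. Passing a lookup function instead of an open transaction keeps the parsing logic testable without an LMDB file.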

Run benchmark with pretrained model

Download the pretrained model from Google Drive or Baidu Netdisk (password: 7npu);
Set the model path, test set path, and character list;

Run test_benchmark.py:

CUDA_VISIBLE_DEVICES=0 python test_benchmark.py --benchmark_all_eval --Transformation TPS19 --FeatureExtraction VIPTRv1T --SequenceModeling None --Prediction CTC --batch_max_length 25 --imgW 96 --output_channel 192

Run test_chn_benchmark.py:

CUDA_VISIBLE_DEVICES=0 python test_chn_benchmark.py --benchmark_all_eval --Transformation TPS19 --FeatureExtraction VIPTRv1T --SequenceModeling None --Prediction CTC --batch_max_length 64 --imgW 320 --output_channel 192
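Both commands use `--Prediction CTC` with a maximum label length (`--batch_max_length` 25 for English, 64 for Chinese) and rely on a character list. The sketch below shows how a character list and greedy CTC decoding typically fit together. The class name, the convention that index 0 is the CTC blank, and the one-character-per-line file format are assumptions, not necessarily the converter the repository uses.

```python
from typing import Iterable, List, Sequence


class GreedyCTCDecoder:
    """Map characters to CTC indices and collapse greedy predictions.

    Assumptions: index 0 is the CTC blank and characters occupy
    indices 1..N in the order of the character list.
    """

    BLANK = 0

    def __init__(self, characters: Iterable[str]):
        self.characters: List[str] = list(characters)
        if len(set(self.characters)) != len(self.characters):
            raise ValueError("character list contains duplicates")
        self.index = {ch: i + 1 for i, ch in enumerate(self.characters)}

    @classmethod
    def from_file(cls, path: str) -> "GreedyCTCDecoder":
        # Assumes a UTF-8 file with one character per line; empty lines are skipped.
        with open(path, encoding="utf-8") as f:
            return cls(line.rstrip("\r\n") for line in f if line.rstrip("\r\n"))

    def encode(self, text: str, max_length: int) -> List[int]:
        """Label indices for `text`; rejects labels longer than max_length."""
        if len(text) > max_length:
            raise ValueError(f"label longer than batch_max_length={max_length}")
        return [self.index[ch] for ch in text]

    def decode(self, best_path: Sequence[int]) -> str:
        """Greedy CTC decoding: merge repeated indices, then drop blanks."""
        chars = []
        previous = None
        for idx in best_path:
            if idx != previous and idx != self.BLANK:
                chars.append(self.characters[idx - 1])
            previous = idx
        return "".join(chars)
```

Here `best_path` would be the per-timestep argmax over the model's output classes. A blank between two identical indices is what lets CTC emit a doubled letter, so `[1, 1, 0, 1]` decodes to two copies of the first character rather than one.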

Results on benchmark datasets and comparison with SOTA

Citation

Please consider citing this work in your publications if it helps your research.

@article{cheng2024viptr,
  title={VIPTR: A Vision Permutable Extractor for Fast and Efficient Scene Text Recognition},
  author={Cheng, Xianfu and Zhou, Weixiao and Li, Xiang and Chen, Xiaoming and Yang, Jian and Li, Tongliang and Li, Zhoujun},
  journal={arXiv preprint arXiv:2401.10110},
  year={2024}
}

