KD-STR

The official PyTorch implementation of the paper "One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer" (MM2023).

Authors: Hang Guo, Tao Dai, Mingyan Zhu, GuangHao Meng, Bin Chen, Zhi Wang, Shu-Tao Xia.

Motivation

This work focuses on text recognition for low-resolution images. We propose a novel knowledge distillation framework that directly adapts the text recognizer to low-resolution inputs. We hope that our work can inspire more studies on one-stage low-resolution text recognition.

Pipeline

The architecture of the proposed framework is as follows.

Pre-trained Weight

We refer to the student models adapted to low-resolution inputs as ABINet-LTR, MATRN-LTR and PARSeq-LTR, respectively. As pointed out in the paper, because the input images of the two branches have different resolutions, we modified the convolution stride (for CNN backbones) or the patch size (for ViT backbones) to keep the deep visual features consistent. The pre-trained weights can be downloaded as follows.

Model	ABINet-LTR	MATRN-LTR	PARSeq-LTR
Performance	72.45%	73.27%	78.23%
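
To illustrate the stride and patch-size adjustment mentioned above, here is a minimal sketch in plain Python. It only computes feature-map shapes and does not reproduce the code in this repository. The input sizes (32x128 for HR, 16x64 for LR), the 3x3 kernel, and the patch sizes are assumptions chosen for illustration; the helper names `conv_out` and `patch_grid` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_grid(height, width, patch_h, patch_w):
    """Number of ViT patches along each axis (requires exact divisibility)."""
    if height % patch_h or width % patch_w:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch_h, width // patch_w


# Assumed input sizes, for illustration only.
hr_size = (32, 128)  # high-resolution branch (teacher)
lr_size = (16, 64)   # low-resolution branch (student)

# CNN backbone: halving the stride on the half-resolution input
# gives a feature map of the same spatial size as the HR branch.
hr_feat = tuple(conv_out(s, kernel=3, stride=2, padding=1) for s in hr_size)
lr_feat = tuple(conv_out(s, kernel=3, stride=1, padding=1) for s in lr_size)

# ViT backbone: halving the patch size on the half-resolution input
# gives the same token grid as the HR branch.
hr_tokens = patch_grid(*hr_size, patch_h=4, patch_w=8)
lr_tokens = patch_grid(*lr_size, patch_h=2, patch_w=4)

print("CNN feature maps:", hr_feat, lr_feat)
print("ViT token grids:", hr_tokens, lr_tokens)
```

With matching shapes, the deep features of the two branches can be compared position by position, which is what a feature-level distillation loss needs.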

Please note that the pre-trained HR teacher model is still needed for both training and testing. You can download it from the corresponding official GitHub repositories, i.e., ABINet, MATRN and PARSeq.

Datasets

In this work, we use the STISR dataset TextZoom and five STR benchmarks, i.e., ICDAR2013, ICDAR2015, CUTE80, SVT and SVTP, for model comparison. All the datasets are in lmdb format and can be downloaded from the following table.

Datasets	TextZoom	IC13	IC15	CUTE80	SVT	SVTP
Download Link	link	link	link	link	link	link
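
If you want to check a downloaded lmdb dataset before training, the sketch below reads one sample. It is not taken from this repository. The key layout (`num-samples`, `image-%09d`, `label-%09d`) is the convention commonly used by STR lmdb datasets and is an assumption here; TextZoom-style datasets may instead store images under prefixes such as `image_lr` and `image_hr`. The function names `sample_keys` and `read_sample` are hypothetical, and the `lmdb` package must be installed separately.

```python
def sample_keys(index, image_prefix="image", label_prefix="label"):
    """Build the byte keys for the 1-based sample `index` (assumed key layout)."""
    image_key = f"{image_prefix}-{index:09d}".encode()
    label_key = f"{label_prefix}-{index:09d}".encode()
    return image_key, label_key


def read_sample(lmdb_dir, index, image_prefix="image"):
    """Return (num_samples, encoded image bytes, label string) for one sample."""
    import lmdb  # third-party package: pip install lmdb

    image_key, label_key = sample_keys(index, image_prefix)
    env = lmdb.open(lmdb_dir, readonly=True, lock=False,
                    readahead=False, meminit=False)
    try:
        with env.begin(write=False) as txn:
            raw_count = txn.get(b"num-samples")
            image = txn.get(image_key)
            label = txn.get(label_key)
    finally:
        env.close()

    # A missing key means the dataset does not follow the assumed layout.
    if raw_count is None or image is None or label is None:
        raise KeyError(f"expected keys not found in {lmdb_dir}")
    return int(raw_count), bytes(image), bytes(label).decode("utf-8")
```

For example, `read_sample("/path/to/IC13", 1)` would return the sample count, the encoded image bytes, and the ground-truth text of the first sample, provided the dataset follows the assumed key layout. The path is a placeholder.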

How to Run?

We have set default hyper-parameters in config.yaml and main.py, so you can run training and testing directly after modifying the paths of the datasets and the pre-trained models.

Training

python main.py

Testing

python main.py --go_test

Main Results

Quantitative Comparison

Qualitative Comparison

Robustness Comparison

Citation

If you find our work helpful, please consider citing us.

@inproceedings{guo2023one,
  title={One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer},
  author={Guo, Hang and Dai, Tao and Zhu, Mingyan and Meng, Guanghao and Chen, Bin and Wang, Zhi and Xia, Shu-Tao},
  booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
  pages={2189--2198},
  year={2023}
}
