LEMMA


An official PyTorch implementation of the paper "Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement" (IJCAI 2023).

Authors: Hang Guo, Tao Dai, Guanghao Meng, and Shu-Tao Xia

This work proposes the Location Enhanced Multi-ModAl network (LEMMA), which addresses the challenges posed by complex backgrounds in scene text images through explicit location enhancement. The architecture of LEMMA is shown below.

[Figure: LEMMA pipeline]

Pre-trained Model

As the previous code was a bit of a mess, we have re-organized it and retrained LEMMA. The performance of the retrained model is as follows (better than that reported in the paper).

Text Recognizer   Easy     Medium   Hard     avgAcc
CRNN              64.98%   59.89%   43.48%   56.73%
MORAN             76.90%   64.28%   46.84%   63.60%
ASTER             81.53%   67.40%   48.85%   66.93%

You can download this model via this link; the checkpoint contains the parameters of both the super-resolution branch and the guidance generation branch.

The training log file is also available via this link.
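
If you want to sanity-check the downloaded checkpoint before training, the minimal sketch below may help. It assumes the file is a standard PyTorch checkpoint loadable with torch.load; the filename and the key layout are placeholders, not confirmed by the repository.

import torch

# Load the checkpoint on CPU so no GPU is needed for inspection.
ckpt = torch.load("lemma_retrained.pth", map_location="cpu")  # placeholder filename

# Depending on how the checkpoint was saved, the top level may be a flat
# state dict or nested dicts for the SR and guidance generation branches.
if isinstance(ckpt, dict):
    for key in list(ckpt)[:10]:
        print(key)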

Prepare Datasets

In this work, we use the STISR dataset TextZoom and four STR benchmarks, i.e., ICDAR2015, CUTE80, SVT, and SVTP, for model comparison. All datasets are in LMDB format. You can download them from this link we have prepared. Please remember to set your own dataset paths in ./config.yaml, e.g., the parameters train_data_dir and val_data_dir, as sketched below.
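
For reference, the dataset entries in config.yaml might look like the following sketch. The key names train_data_dir and val_data_dir come from this README; the example paths are placeholders for your local layout.

train_data_dir: ./dataset/TextZoom/train   # placeholder path
val_data_dir: ./dataset/TextZoom/test      # placeholder path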

Text Recognizers

Following previous STISR works, we also use CRNN, MORAN, and ASTER as the downstream text recognizers.

Moreover, the code also supports some newer text recognizers, such as ABINet, MATRN, and PARSeq. You can find a detailed comparison using these three recognizers in the supplementary material we provide, and you can test LEMMA with them by modifying the command (e.g., --test_model='ABINet'). Please download these pre-trained text recognition models from the corresponding repositories linked above.

You also need to set the text recognizer model paths in ./config.yaml. Moreover, we employ the text focus loss proposed by STT during training; since this loss relies on a pre-trained transformer-based text recognizer, please download that recognition model here and set its checkpoint path as well.
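
A hypothetical sketch of these recognizer-related entries in config.yaml is shown below; none of the key names or paths here are confirmed by the repository, so check config.yaml for the actual ones.

# Hypothetical key names and placeholder paths.
rec_model_path: ./ckpt/aster.pth
stt_ckpt_path: ./ckpt/pretrain_transformer.pth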

How to Run?

We provide default hyper-parameters in config.yaml and main.py, so you can run training and testing directly once you have set the dataset and pre-trained model paths.

Training

python main.py

Testing

python main.py --test

NOTE: You can also customize other hyper-parameters in config.yaml and main.py, such as n_gpu.
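
For example, to evaluate with one of the newer recognizers, you can combine the two flags shown earlier in this README (assuming they can be passed together, which we have not verified):

python main.py --test --test_model='ABINet'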

Main Results

Quantitative Comparison

[Figure: quantitative comparison]

Qualitative Comparison

[Figure: qualitative comparison]

Citation

If you find our work helpful, please consider citing us.

@inproceedings{ijcai2023p87,
  title     = {Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement},
  author    = {Guo, Hang and Dai, Tao and Meng, Guanghao and Xia, Shu-Tao},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on
               Artificial Intelligence, {IJCAI-23}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  pages     = {782--790},
  year      = {2023},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2023/87},
  url       = {https://doi.org/10.24963/ijcai.2023/87},
}

Acknowledgement

The code of this work is based on TBSRN, TATT, and C3-STISR. Thanks to the authors for their contributions.
