SAN: Structure-Aware Network for Complex and Long-tailed Chinese Text Recognition

The official code for SAN (ICDAR 2023).

Runtime Environment

Install the dependencies:
```
pip install -r requirements.txt
```

Datasets

Test datasets can be downloaded from Datasets BaiduNetDisk (passwd: 7zmh).

Move these files to SAN/data/.

Model Result

Download the trained model results from Model Result BaiduNetDisk (passwd: 69th).

Move these files to SAN/workdir/.

Training

Pre-train vision model

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml

Pre-train language model

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml

Train ABINet

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml

Notice

The provided code can be trained and tested on the Web dataset as is. To train or test on the Scene dataset, you need to change the maximum radical length in the code from 33 to 39.

Specifically, make changes in the following three places:

SAN/configs/template.yaml line 22:

max_length_radical: 33 -> max_length_radical: 39

SAN/dataset.py line 221:

max_length:int=33 -> max_length:int=39

SAN/losses.py line 24:

self.max_length_radical = 33 -> self.max_length_radical = 39
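
The substitutions listed here can be sketched as a small helper that works on the text of each file. This is only an illustration and is not part of the repository: the name `set_radical_length`, the `RADICAL_LENGTH` table, and the regular expressions are assumptions derived from the three lines quoted in this section, and they only touch values that are currently 33 or 39.

```python
import re

# Radical lengths stated in this README for each dataset.
RADICAL_LENGTH = {"web": 33, "scene": 39}

# One pattern per location listed in this section (assumed from the quoted lines).
# Each pattern captures the text before the number and only matches 33 or 39.
PATTERNS = {
    "configs/template.yaml": r"(max_length_radical:\s*)(?:33|39)\b",
    "dataset.py": r"(max_length\s*:\s*int\s*=\s*)(?:33|39)\b",
    "losses.py": r"(self\.max_length_radical\s*=\s*)(?:33|39)\b",
}


def set_radical_length(source: str, filename: str, dataset: str) -> str:
    """Return `source` with the radical length for `filename` set for `dataset`."""
    new_value = RADICAL_LENGTH[dataset]
    patched, count = re.subn(
        PATTERNS[filename],
        lambda m: f"{m.group(1)}{new_value}",
        source,
    )
    if count == 0:
        # Fail loudly rather than silently leaving the file unchanged.
        raise ValueError(f"no radical length setting found for {filename}")
    return patched
```

To use it, read each of the three files, pass its text through `set_radical_length`, and write the result back. Review the diff before training, since the same pattern could match more than one line in a file.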

Evaluation

Web dataset:

Radical length: 33

ABINet-TreeSim(SAN):

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml --phase test --checkpoint=workdir/SAN-web-final/best-train-abinet.pth --test_root=data/web/web_val/ --model_eval=alignment --image_only

VM-TreeSim:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml --phase test --checkpoint=workdir/VisionTreeSim-web-final/best-pretrain-vision-model.pth --test_root=data/web/web_val/ --model_eval=vision --image_only

Scene dataset:

Radical length: 39

ABINet-TreeSim(SAN):

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml --phase test --checkpoint=workdir/SAN-scene-final/best-train-abinet.pth --test_root=data/scene/scene_val/ --model_eval=alignment --image_only

VM-TreeSim:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml --phase test --checkpoint=workdir/VisionTreeSim-scene-final/best-pretrain-vision-model.pth --test_root=data/scene/scene_val/ --model_eval=vision --image_only
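
The four evaluation commands differ only in the config, the checkpoint directory, and the `--model_eval` value. As a convenience, they could be generated as sketched below. The helper `eval_command` and the `RUNS` table are hypothetical and not part of the repository; the paths and flags are copied from the commands in this section, and the `CUDA_VISIBLE_DEVICES` prefix is left for the caller to add.

```python
import shlex

# Per-model settings copied from the evaluation commands in this section:
# (config file, checkpoint directory prefix, checkpoint file, --model_eval value)
RUNS = {
    "san": ("configs/train_abinet.yaml", "SAN",
            "best-train-abinet.pth", "alignment"),
    "vision": ("configs/pretrain_vision_model.yaml", "VisionTreeSim",
               "best-pretrain-vision-model.pth", "vision"),
}


def eval_command(model: str, dataset: str) -> str:
    """Build the test command for `model` ("san" or "vision") on `dataset` ("web" or "scene")."""
    config, prefix, checkpoint, model_eval = RUNS[model]
    args = [
        "python", "main.py",
        f"--config={config}",
        "--phase", "test",
        f"--checkpoint=workdir/{prefix}-{dataset}-final/{checkpoint}",
        f"--test_root=data/{dataset}/{dataset}_val/",
        f"--model_eval={model_eval}",
        "--image_only",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    for model in RUNS:
        for dataset in ("web", "scene"):
            print(eval_command(model, dataset))
```

Remember that the Scene runs also require the radical length of 39 described in the Notice section.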

Citation

If you find our method useful for your research, please cite:

@inproceedings{Zhang2023SANSN,
  title={SAN: Structure-Aware Network for Complex and Long-Tailed Chinese Text Recognition},
  author={Junyi Zhang and Chang Liu and Chun Yang},
  booktitle={IEEE International Conference on Document Analysis and Recognition},
  year={2023},
  url={https://api.semanticscholar.org/CorpusID:261102024}
}
