- This work was tested with PyTorch 1.10.1, CUDA 11.4, Python 3.6, and Ubuntu 18.04. You may need to run pip3 install torch==1.10.1.
- Requirements: lmdb, pillow, torchvision, nltk, natsort, jamo, fire, opencv-python
pip3 install lmdb pillow torchvision nltk natsort
pip3 install opencv-python
pip3 install jamo
pip3 install fire
pip3 install torch==1.10.1
Prepare the Dataset step by step
Extract the AIHUB dataset archive, then run crop_dataset.py, passing the extraction path as dataset_path.
python3 crop_dataset.py
/data/
ㄴ img
ㄴ label
ㄴ Validation
ㄴ img
ㄴ 01.총류
ㄴ책표지_총류_002109.jpg
⋮
ㄴ01.가로형간판
ㄴ02.철학
⋮
ㄴ label
ㄴ1.간판
ㄴ2.책표지
ㄴ01.총류
ㄴ책표지_총류_002109.json
⋮
ㄴ02.철학
⋮
python3 merge.py
1.4 For consonant (jaeum) fine-tuning, save only the image path-label pairs from the merged file whose label contains any of 'ㄲ', 'ㄸ', 'ㅃ', 'ㅆ', 'ㅉ', 'ㄳ', 'ㄵ', 'ㄶ', 'ㄺ', 'ㄻ', 'ㄼ', 'ㄽ', 'ㄾ', 'ㅀ', 'ㅄ'.
python3 extract_text.py
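The consonant filter described in step 1.4 could look roughly like the sketch below. It is not the code in extract_text.py. It assumes each ground-truth line is a tab-separated `path<TAB>label` pair and that labels may contain either precomposed Hangul syllables or standalone jamo. The names `has_target_consonant` and `filter_pairs` are hypothetical. Syllables are decomposed with the standard Unicode arithmetic (base 0xAC00, 21 vowels, 28 finals) so the sketch needs no third-party package.

```python
# Initial (choseong) and final (jongseong) consonants in Unicode syllable order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# Index 0 (a space) stands for "no final consonant".
JONGSEONG = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"
# The double and compound consonants listed in step 1.4.
TARGETS = set("ㄲㄸㅃㅆㅉㄳㄵㄶㄺㄻㄼㄽㄾㅀㅄ")


def consonants(text):
    """Yield the initial and final consonant of each composed syllable.

    Characters that are not composed Hangul syllables are yielded unchanged,
    so labels that are already split into jamo also work.
    """
    for ch in text:
        offset = ord(ch) - 0xAC00
        if 0 <= offset < 11172:  # 19 * 21 * 28 composed syllables
            yield CHOSEONG[offset // 588]  # 588 = 21 vowels * 28 finals
            final = JONGSEONG[offset % 28]
            if final != " ":
                yield final
        else:
            yield ch


def has_target_consonant(label):
    """Return True if the label contains any of the target consonants."""
    return any(c in TARGETS for c in consonants(label))


def filter_pairs(lines):
    """Keep only 'path<TAB>label' lines whose label has a target consonant."""
    kept = []
    for line in lines:
        _path, sep, label = line.rstrip("\n").partition("\t")
        if sep and has_target_consonant(label):
            kept.append(line)
    return kept
```

For example, `filter_pairs(["a.jpg\t닭\n", "b.jpg\t가나\n"])` keeps only the first line, because 닭 ends in the compound final ㄺ.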
python3 create_gt_competition.py
data
  ㄴ train
    ㄴ cropped images
  ㄴ validation
    ㄴ cropped images
  ㄴ train_competition
    ㄴ competition train images
  ㄴ gt_train.txt
  ㄴ gt_valid.txt
  ㄴ gt_merge.txt
  ㄴ gt_jaeum.txt
  ㄴ gt_competition.txt
python3 create_lmdb_dataset.py --inputPath data/ --gtFile data/gt_merge.txt --outputPath data_lmdb_training
python3 create_lmdb_dataset.py --inputPath data/ --gtFile data/gt_competition.txt --outputPath data_lmdb_validation
python3 create_lmdb_dataset.py --inputPath data/ --gtFile data/gt_jaeum.txt --outputPath data_lmdb_training_jaeum
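Before running the LMDB conversion, it can help to confirm that every ground-truth line points at an image that exists under the input path. The sketch below is a hypothetical helper, not part of this repository. It assumes each line of the gt file is `relative/image/path<TAB>label`, with paths relative to the directory given as --inputPath.

```python
import os


def check_gt_file(input_path, gt_file):
    """Return a list of (line_number, problem) tuples for a ground-truth file.

    Assumption: every non-empty line is a tab-separated image path and label,
    and image paths are relative to input_path.
    """
    problems = []
    with open(gt_file, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            parts = line.split("\t", 1)
            if len(parts) != 2 or not parts[1]:
                problems.append((line_number, "not a tab-separated path/label pair"))
                continue
            image_path = os.path.join(input_path, parts[0])
            if not os.path.isfile(image_path):
                problems.append((line_number, "missing image: " + image_path))
    return problems
```

Calling `check_gt_file("data/", "data/gt_merge.txt")` returns an empty list when every line parses and every image is present.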
python3 train.py --train_data data_lmdb_training --valid_data data_lmdb_validation --Transformation TPS --FeatureExtraction SENet --SequenceModeling BiLSTM --Prediction Attn --batch_size 52 --lr 1 --num_iter 67000 --manualSeed 1111
python3 train.py --train_data data_lmdb_training --valid_data data_lmdb_validation --Transformation TPS --FeatureExtraction SENetL --SequenceModeling BiLSTM --Prediction Attn --batch_size 44 --lr 1 --num_iter 55000 --manualSeed 6
python3 train.py --train_data data_lmdb_training_jaeum --valid_data data_lmdb_validation --saved_model SENetL.pth --Transformation TPS --FeatureExtraction SENetL --SequenceModeling BiLSTM --Prediction Attn --batch_size 44 --lr 0.3 --num_iter 1000 --manualSeed 6
To skip training, you can use the pretrained files.
python3 create_submission.py --exp_name result --model1 SENetL_Jaeum.pth --model2 SENet.pth --model3 SENetL.pth --Transformation TPS --SequenceModeling BiLSTM --Prediction Attn
Set --image_folder (default: 'test') to change the path of the input test data.
- --train_data: folder path to training lmdb dataset.
- --valid_data: folder path to validation lmdb dataset.
- --eval_data: folder path to evaluation (with test.py) lmdb dataset.
- --select_data: select training data.
- --data_filtering_off: skip data filtering when creating LmdbDataset.
- --Transformation: select Transformation module [None | TPS].
- --FeatureExtraction: select FeatureExtraction module [VGG | RCNN | ResNet | SENet | SENetL].
- --SequenceModeling: select SequenceModeling module [None | BiLSTM].
- --Prediction: select Prediction module [CTC | Attn].
- --saved_model: path to a saved model to evaluate or to fine-tune from.
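To see how the options above fit together, here is a minimal argparse sketch that accepts the first training command from this README. The flag names and the bracketed choices come from the list above. The types, the required/optional split, and the function name `build_parser` are assumptions, and the real train.py may define these options differently.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: flag names come from this README, but the
    # types, choices handling, and required/optional status are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the train.py options")
    parser.add_argument("--train_data", required=True,
                        help="folder path to training lmdb dataset")
    parser.add_argument("--valid_data", required=True,
                        help="folder path to validation lmdb dataset")
    parser.add_argument("--select_data", type=str, help="select training data")
    parser.add_argument("--data_filtering_off", action="store_true",
                        help="skip data filtering when creating LmdbDataset")
    parser.add_argument("--Transformation", required=True, choices=["None", "TPS"])
    parser.add_argument("--FeatureExtraction", required=True,
                        choices=["VGG", "RCNN", "ResNet", "SENet", "SENetL"])
    parser.add_argument("--SequenceModeling", required=True,
                        choices=["None", "BiLSTM"])
    parser.add_argument("--Prediction", required=True, choices=["CTC", "Attn"])
    parser.add_argument("--saved_model", default="", help="path to a saved model")
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--lr", type=float)
    parser.add_argument("--num_iter", type=int)
    parser.add_argument("--manualSeed", type=int)
    return parser


# The SENet training command from this README, minus "python3 train.py".
command = ("--train_data data_lmdb_training --valid_data data_lmdb_validation "
           "--Transformation TPS --FeatureExtraction SENet --SequenceModeling BiLSTM "
           "--Prediction Attn --batch_size 52 --lr 1 --num_iter 67000 --manualSeed 1111")
args = build_parser().parse_args(command.split())
print(args.FeatureExtraction, args.batch_size, args.lr)
```

Restricting the four module flags with `choices` makes a typo such as `--FeatureExtraction SENETL` fail at parse time instead of partway through model construction.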
This implementation is based on the following repositories: crnn.pytorch and clovaAI.
[1] Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee. What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis. In ICCV, 2019.
[2] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. In Workshop on Deep Learning, NIPS, 2014.
[3] A. Gupta, A. Vedaldi, and A. Zisserman. Synthetic data for text localisation in natural images. In CVPR, 2016.
[4] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. De Las Heras. ICDAR 2013 robust reading competition. In ICDAR, pages 1484–1493, 2013.
[5] D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, et al. ICDAR 2015 competition on robust reading. In ICDAR, pages 1156–1160, 2015.
[6] A. Mishra, K. Alahari, and C. Jawahar. Scene text recognition using higher order language priors. In BMVC, 2012.
[7] K. Wang, B. Babenko, and S. Belongie. End-to-end scene text recognition. In ICCV, pages 1457–1464, 2011.
[8] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young. ICDAR 2003 robust reading competitions. In ICDAR, pages 682–687, 2003.
[9] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recognizing text with perspective distortion in natural scenes. In ICCV, pages 569–576, 2013.
[10] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan. A robust arbitrary text detection system for natural scene images. In ESWA, volume 41, pages 8027–8048, 2014.
[11] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. In TPAMI, volume 39, pages 2298–2304, 2017.