RCLSTR

[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition

Introduction

This repository is an official implementation of RCLSTR.

Getting Started

1. Environment Setup

Base Environments

Python >= 3.8
CUDA == 11.0
PyTorch == 1.7.1

Step-by-step installation instructions

a. Create a conda virtual environment and activate it.

conda create -n RCLSTR python=3.8 -y
conda activate RCLSTR

b. Install PyTorch and torchvision following the official instructions.

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

c. Install other packages.

pip install lmdb pillow nltk natsort fire tensorboard tqdm imgaug einops
pip install numpy==1.22.3
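After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of the repository: the helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical, and the expected versions are copied from the install commands above.

```python
from importlib import metadata

# Versions pinned in the installation steps above.
EXPECTED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "torchaudio": "0.7.2",
    "numpy": "1.22.3",
}


def base_version(version):
    """Drop a local build tag such as '+cu110' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(expected, installed):
    """Return {package: (expected, found)} for every package that differs.

    `installed` maps package names to version strings, or None if missing.
    """
    mismatches = {}
    for name, want in expected.items():
        found = installed.get(name)
        if found is None or base_version(found) != want:
            mismatches[name] = (want, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions via package metadata (None if absent)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, found) in problems.items():
        print(f"{name}: expected {want}, found {found}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the activated RCLSTR environment; any line it prints names a package whose installed version differs from the pinned one.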

2. Data Preparation

Download datasets

We use the SynthText (ST) training dataset from STR-Fewer-Labels. Download the datasets from baiduyun (password: px16).

Data folder structure

data_CVPR2021
├── training
│   └── label
│       └── synth
├── validation
│   ├── 1.SVT
│   ├── 2.IIIT
│   ├── 3.IC13
│   ├── 4.IC15
│   ├── 5.COCO
│   ├── 6.RCTW17
│   ├── 7.Uber
│   ├── 8.ArT
│   ├── 9.LSVT
│   ├── 10.MLT19
│   └── 11.ReCTS
└── evaluation
    └── benchmark
        ├── CUTE80
        ├── IC03_867
        ├── IC13_1015
        ├── IC15_2077
        ├── IIIT5k_3000
        ├── SVT
        └── SVTP
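Before linking the data, the layout above can be checked with a short script. This is a sketch under the assumption that every entry in the tree exists as a path under `data_CVPR2021`; `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Entries from the folder structure above, relative to data_CVPR2021.
EXPECTED_DIRS = [
    "training/label/synth",
    "validation/1.SVT",
    "validation/2.IIIT",
    "validation/3.IC13",
    "validation/4.IC15",
    "validation/5.COCO",
    "validation/6.RCTW17",
    "validation/7.Uber",
    "validation/8.ArT",
    "validation/9.LSVT",
    "validation/10.MLT19",
    "validation/11.ReCTS",
    "evaluation/benchmark/CUTE80",
    "evaluation/benchmark/IC03_867",
    "evaluation/benchmark/IC13_1015",
    "evaluation/benchmark/IC15_2077",
    "evaluation/benchmark/IIIT5k_3000",
    "evaluation/benchmark/SVT",
    "evaluation/benchmark/SVTP",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_dirs("data_CVPR2021")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout looks complete.")
```

An empty result from `missing_dirs("/path/to/data_CVPR2021")` means every folder listed above was found.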

Link the dataset path as follows:

cd pretrain
ln -s /path/to/data_CVPR2021 data_CVPR2021
cd ../evaluation
ln -s /path/to/data_CVPR2021 data_CVPR2021
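The same links can also be created from the repository root with a small script, which avoids changing directories by hand. This is a sketch only: `link_dataset` is a hypothetical helper, and it assumes the `pretrain` and `evaluation` folders already exist under the repository root.

```python
import os
from pathlib import Path


def link_dataset(data_root, repo_root=".", subdirs=("pretrain", "evaluation"),
                 name="data_CVPR2021"):
    """Create <repo_root>/<subdir>/<name> symlinks pointing at data_root.

    Existing links or folders with that name are left untouched.
    Returns the list of links that were created.
    """
    target = Path(data_root).resolve()
    created = []
    for sub in subdirs:
        link = Path(repo_root) / sub / name
        if link.is_symlink() or link.exists():
            continue  # do not overwrite anything that is already there
        os.symlink(target, link, target_is_directory=True)
        created.append(link)
    return created
```

For example, `link_dataset("/path/to/data_CVPR2021")` run from the repository root creates both links and returns their paths.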

TPS model weights

For the TPS module, we use the pretrained TPS model weights from STR-Fewer-Labels. Download the weights from baiduyun (password: px16) and put them in pretrain/TPS_model.

3. Pretrain and decoder evaluation

Pretrain

The RCLSTR method includes a regularization module (reg), a hierarchical module (hier), and a cross-hierarchy consistency module (con).

Pretrain the SeqMoCo model:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--frame_weight 0 \
--frame_alpha 0 \
--word_weight 0 \
--word_alpha 0

Pretrain the SeqMoCo model with the reg module:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo_reg   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation \
--frame_weight 0 \
--frame_alpha 0 \
--word_weight 0 \
--word_alpha 0

Pretrain the SeqMoCo model with the reg and hier modules:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo_reg_hier   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation

Pretrain the SeqMoCo model with the reg, hier, and con modules:

cd pretrain
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_moco.py   \
--model_name TRBA  \
--exp_name SeqMoCo_reg_hier_con   \
--lr 0.0015   \
--batch-size 32   \
--dist-url 'tcp://localhost:10002' \
--multiprocessing-distributed \
--world-size 1 \
--rank 0   \
--data data_CVPR2021/training/label/synth  \
--data-format lmdb  \
--light_aug   \
--instance_map window   \
--epochs 5   \
--useTPS ./TPS_model/TRBA-Baseline-synth.pth \
--loss_setting consistent \
--permutation \
--multi_level_consistent global2local \
--multi_level_ins 0
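The four commands above share most of their flags and differ only in a few. The sketch below collects the shared flags once and adds the per-variant ones, which makes the differences easier to see. All flag names and values are copied from the commands above; the helper `pretrain_command` and the grouping into `COMMON_ARGS` and `VARIANT_ARGS` are hypothetical and not part of the repository. Which flag enables which module is inferred from comparing the commands, so treat that mapping as an assumption.

```python
import shlex

# Flags shared by all four pretraining commands above.
COMMON_ARGS = [
    "--model_name", "TRBA",
    "--lr", "0.0015",
    "--batch-size", "32",
    "--dist-url", "tcp://localhost:10002",
    "--multiprocessing-distributed",
    "--world-size", "1",
    "--rank", "0",
    "--data", "data_CVPR2021/training/label/synth",
    "--data-format", "lmdb",
    "--light_aug",
    "--instance_map", "window",
    "--epochs", "5",
    "--useTPS", "./TPS_model/TRBA-Baseline-synth.pth",
    "--loss_setting", "consistent",
]

# Frame- and word-level weights set to zero (used when hier is not enabled).
ZERO_HIER = [
    "--frame_weight", "0", "--frame_alpha", "0",
    "--word_weight", "0", "--word_alpha", "0",
]

# Per-variant flags, keyed by the --exp_name used above.
VARIANT_ARGS = {
    "SeqMoCo": ZERO_HIER,
    "SeqMoCo_reg": ["--permutation"] + ZERO_HIER,
    "SeqMoCo_reg_hier": ["--permutation"],
    "SeqMoCo_reg_hier_con": [
        "--permutation",
        "--multi_level_consistent", "global2local",
        "--multi_level_ins", "0",
    ],
}


def pretrain_command(exp_name):
    """Return the argument list for one of the four variants above."""
    if exp_name not in VARIANT_ARGS:
        raise ValueError(f"unknown variant: {exp_name}")
    return (["python", "main_moco.py", "--exp_name", exp_name]
            + COMMON_ARGS + VARIANT_ARGS[exp_name])


if __name__ == "__main__":
    for name in VARIANT_ARGS:
        print(shlex.join(pretrain_command(name)))
```

Each printed line can be run from the `pretrain` directory with `CUDA_VISIBLE_DEVICES=0,1,2,3` prefixed, as in the commands above.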

Feature representation evaluation

Train an attention-based decoder for feature representation evaluation:

cd evaluation
CUDA_VISIBLE_DEVICES=0 python train_new.py \
--model_name TRA \
--exp_name TRA_reg_hier_con \
--saved_model ../pretrain/SeqMoCo_reg_hier_con/checkpoint_0004.pth.tar \
--select_data synth \
--batch_size 256 \
--Aug light
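The `--saved_model` path above points at `checkpoint_0004.pth.tar` after pretraining with `--epochs 5`, which suggests that checkpoints are numbered from zero. Under that assumption (not confirmed here against the training script), the path for another experiment can be derived as follows; `last_checkpoint` is a hypothetical helper.

```python
from pathlib import Path


def last_checkpoint(exp_name, epochs, pretrain_dir="../pretrain"):
    """Path of the final checkpoint for a pretraining run.

    Assumes files are named checkpoint_<epoch>.pth.tar with a zero-based,
    four-digit epoch index, as the path in the command above suggests.
    """
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return Path(pretrain_dir) / exp_name / f"checkpoint_{epochs - 1:04d}.pth.tar"
```

For instance, `last_checkpoint("SeqMoCo_reg", 5)` gives the path to pass to `--saved_model` when evaluating the reg-only variant.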

Main Results

TODO

Support ViT

Acknowledgements

We thank these great works and open-source codebases:

Citation

If you find our method useful for your research, please cite:

@misc{zhang2023relational,
      title={Relational Contrastive Learning for Scene Text Recognition}, 
      author={Jinglei Zhang and Tiancheng Lin and Yi Xu and Kai Chen and Rui Zhang},
      year={2023},
      eprint={2308.00508},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
