Self-supervised Character-to-Character Distillation for Text Recognition （ICCV23）

This is the official code of "Self-supervised Character-to-Character Distillation for Text Recognition". For more details, please refer to our paper or 中文解读 or poster. If you have any questions please contact me by email (gtk0615@sjtu.edu.cn).

We also released CVPR23 work on scene text recognition:

Self-Supervised Implicit Glyph Attention for Text Recognition (SIGA) Paper and Code

Pipeline

Model architecture

Environments

# 3090 Ubuntu 16.04 Cuda 11
conda create -n CCD python==3.7
source activate CCD
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
# The following optional dependencies are necessary
please refer to requirement.txt
pip install tensorboard==1.15.0
pip install tensorboardX==2.2

Pretrain

# We recommend using multiple 3090s for training.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --config ./Dino/configs/CCD_pretrain_ViT_xxx.yaml

Finetune

#update model.pretrain_checkpoint in CCD_vision_model_xxx.yaml
CUDA_VISIBLE_DEVICES=0,1 python train_finetune.py --config ./Dino/configs/CCD_vision_model_xxx.yaml

Test

#update model.checkpoint in CCD_vision_model_xxx.yaml
CUDA_VISIBLE_DEVICES=0 python test.py --config ./Dino/configs/CCD_vision_model_xxx.yaml

Weights

Pretrain: CCD-ViT-Small,3090 Finetune: ARD,3090 and STD,3090

pretrain: CCD-ViT-Base,3090 Finetune:ARD,v100 and STD,3090

Data (please refer to DiG)

OCR-CC Link:

链接：https://pan.baidu.com/s/1PW7ef17AkwG27H_RaP0pDQ 
提取码：C2CD

    data_lmdb
    ├── charset_36.txt
    ├── Mask
    ├── TextSeg
    ├── Super_Resolution
    ├── training
    │   ├── label
    │   │   └── synth
    │   │       ├── MJ
    │   │       │   ├── MJ_train
    │   │       │   ├── MJ_valid
    │   │       │   └── MJ_test
    │   │       └── ST
    │   │── URD
    │   │    └── OCR-CC
    │   ├── ARD
    │   │   ├── Openimages
    │   │   │   ├── train_1
    │   │   │   ├── train_2
    │   │   │   ├── train_5
    │   │   │   ├── train_f
    │   │   │   └── validation
    │   │   └── TextOCR 
    ├── validation
    │   ├── 1.SVT
    │   ├── 2.IIIT
    │   ├── 3.IC13
    │   ├── 4.IC15
    │   ├── 5.COCO
    │   ├── 6.RCTW17
    │   ├── 7.Uber
    │   ├── 8.ArT
    │   ├── 9.LSVT
    │   ├── 10.MLT19
    │   └── 11.ReCTS
    └── evaluation
        └── benchmark
            ├── SVT
            ├── IIIT5k_3000
            ├── IC13_1015
            ├── IC15_2077
            ├── SVTP
            ├── CUTE80
            ├── COCOText
            ├── CTW
            ├── TotalText
            ├── HOST
            ├── WOST
            ├── MPSC
            └── WordArt

Mask preparation

optional, kmeans results of Synth and URD
if you don't want to generate mask, you can generate mask results online. please rewrite code1 and code2

cd ./mask_create
run generate_mask.py #parallelly process mask --> lmdb file
run merge.py #merge multiple lmdb files into single file

Citation

If you find our method useful for your reserach, please cite
@InProceedings{Guan_2023_ICCV,
    author    = {Guan, Tongkun and Shen, Wei and Yang, Xue and Feng, Qi and Jiang, Zekun and Yang, Xiaokang},
    title     = {Self-Supervised Character-to-Character Distillation for Text Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {19473-19484}
}

License

- This code are only free for academic research purposes and licensed under the 2-clause BSD License - see the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dino

Dino

graph

graph

mask_create

mask_create

.gitattributes

.gitattributes

README.md

README.md

requirement.txt

requirement.txt

test.py

test.py

train.py

train.py

train_finetune.py

train_finetune.py

Repository files navigation

Self-supervised Character-to-Character Distillation for Text Recognition （ICCV23）

Pipeline

Model architecture

Environments

Pretrain

Finetune

Test

Weights

Data (please refer to DiG)

Mask preparation

Citation

License

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
Dino		Dino
graph		graph
mask_create		mask_create
.gitattributes		.gitattributes
README.md		README.md
requirement.txt		requirement.txt
test.py		test.py
train.py		train.py
train_finetune.py		train_finetune.py

TongkunGuan/CCD

Folders and files

Latest commit

History

Repository files navigation

Self-supervised Character-to-Character Distillation for Text Recognition （ICCV23）

Pipeline

Model architecture

Environments

Pretrain

Finetune

Test

Weights

Data (please refer to DiG)

Mask preparation

Citation

License

About

Topics

Resources

Stars

Watchers

Forks

Languages