oCLIP

This repository is the official implementation for the following paper:

Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

Chuhui Xue, Wenqing Zhang, Yu Hao, Shijian Lu, Philip Torr, Song Bai, ECCV 2022 (Oral)

Part of the code is adapted from open_clip.

Models

  • English

| Backbone | Pre-train Data | Pre-train Model | Fine-tune Data | Fine-tune Model (PSENet) | Precision | Recall | F-score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | SynthText | Link | Total-Text | Link | 89.9 | 81.6 | 85.5 |
| ResNet-101 | SynthText | Link | Total-Text | Link | 89.9 | 82.2 | 85.9 |
| ResNet-50 | Web Image | Link | Total-Text | Link | 90.1 | 83.5 | 86.7 |

  • Chinese

| Backbone | Pre-train Data | Pre-train Model |
| --- | --- | --- |
| ResNet-50 | LSVT-Weak Annotation | Link |

Training oCLIP

Conda

conda create -n oclip python=3.7
conda activate oclip
pip install -r requirement.txt

git clone https://github.com/bytedance/oclip.git
cd oclip
export PYTHONPATH="$PYTHONPATH:$PWD/src"

Data

Download SynthText and put it in ./data.

You may use the provided script to generate the annotation for pre-training:

python tools/convert_synthtext_csv.py --data_dir data/SynthText/ --save_dir data/SynthText/
  • Note that we use [space] to represent masked characters. For customized datasets, you may modify the code in src/training/data.py and your annotation format accordingly.
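To make the [space] convention concrete, here is a hedged sketch of character masking: the function name, mask ratio, and seed are illustrative assumptions, not the repo's actual implementation in src/training/data.py.

```python
import random

def mask_caption(text, mask_ratio=0.15, seed=0):
    """Replace a random subset of non-space characters with spaces.

    Illustrative sketch only; the real masking logic lives in
    src/training/data.py of the repository.
    """
    rng = random.Random(seed)
    chars = list(text)
    candidates = [i for i, c in enumerate(chars) if not c.isspace()]
    n_mask = max(1, int(len(candidates) * mask_ratio))
    for i in rng.sample(candidates, n_mask):
        chars[i] = " "  # masked character becomes [space]
    return "".join(chars)

print(mask_caption("STIRLING"))
```

The masked caption keeps its original length, so character positions stay aligned with the image annotation.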

Train

Sample running code for training:

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -u src/training/main.py \
    --save-frequency 3 \
    --report-to tensorboard \
    --train-data="data/SynthText/train_char.csv"  \
    --char-dict-pth="data/SynthText/char_dict" \
    --csv-img-key filepath \
    --csv-caption-key title \
    --warmup 10000 \
    --batch-size=64 \
    --lr=1e-4 \
    --wd=0.1 \
    --epochs=100 \
    --workers=8 \
    --model RN50 \
    --logs='output/RN50_synthtext' 
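The flags --csv-img-key filepath and --csv-caption-key title imply a two-column training CSV. The sketch below shows what such a file might look like; the sample path and caption are assumptions, not the exact output of tools/convert_synthtext_csv.py.

```python
import csv
import io

def make_sample_csv(rows):
    """Build a CSV string with the column names expected by the training
    flags above (filepath / title). Illustrative only."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["filepath", "title"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(make_sample_csv([
    {"filepath": "data/SynthText/1/img_1.jpg", "title": "ST RLING"},
]))
```

Captions here already contain the [space] masks described in the Data section.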

Visualization

We also provide a script for visualizing the attention maps of the pre-trained model.

Download the pre-trained model to ./pretrained.

python3 tools/visualize_attn.py --model_path pretrained/RN50_synthtext.pt --char_dict_path data/SynthText/char_dict --model_config_file src/training/model_configs/RN50.json --im_fn demo/sample.jpg --text_list "ST LING" "STRLIN " "A GYLL'S" " ODGINGS" --demo_path demo/
For the input image, the script saves the image-level attention map and one character-level attention map per masked query ("ST LING", "STRLIN ", "A GYLL'S", " ODGINGS") to demo/.

Fine-tune in MMOCR

We provide a script for converting model parameter names so that the pre-trained weights can be used with the dev-1.x branch of MMOCR.

# first modify the model_path and save_path in tools/convert2mmocr.py
python tools/convert2mmocr.py
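Conceptually, such a conversion remaps checkpoint parameter names by prefix. The sketch below is a hedged illustration of that idea; the prefix map and key names are assumptions, not the actual mapping used in tools/convert2mmocr.py.

```python
def rename_keys(state_dict, prefix_map):
    """Return a new dict with keys rewritten according to prefix_map.

    Illustrative sketch of parameter-name conversion; see
    tools/convert2mmocr.py for the real mapping.
    """
    out = {}
    for key, value in state_dict.items():
        for old, new in prefix_map.items():
            if key.startswith(old):
                out[new + key[len(old):]] = value
                break
        else:
            out[key] = value  # keys matching no prefix are kept unchanged
    return out

# Hypothetical example: move visual-encoder weights under a "backbone." prefix.
converted = rename_keys(
    {"visual.conv1.weight": "w0", "transformer.resblocks.0.attn": "w1"},
    {"visual.": "backbone."},
)
print(sorted(converted))
```

In practice the values would be tensors loaded with torch.load; plain placeholders are used here to keep the sketch self-contained.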

Citation

@inproceedings{xue2022language,
  title={Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting},
  author={Xue, Chuhui and Zhang, Wenqing and Hao, Yu and Lu, Shijian and Torr, Philip and Bai, Song},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2022}
}
