
Self-supervised Pre-training of Text Recognizers

This repository contains the code for the paper Self-supervised Pre-training of Text Recognizers by Martin Kišš and Michal Hradiš, presented at the 2024 International Conference on Document Analysis and Recognition (ICDAR 2024). It includes implementations of the investigated methods, the models, and visualizations.

[arxiv]

Paper abstract

In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them. Therefore, methods utilizing unlabeled data are researched. We study self-supervised pre-training methods based on masked label prediction using three different approaches: Feature Quantization, VQ-VAE, and Post-Quantized AE. We also investigate joint-embedding approaches with VICReg and NT-Xent objectives, for which we propose an image shifting technique that prevents the model from collapsing to a solution that relies solely on positional encoding and completely ignores the input image. We perform our experiments on historical handwritten (Bentham) and historical printed datasets, mainly to investigate the benefits of the self-supervised pre-training techniques with different amounts of annotated target domain data. We use transfer learning as a strong baseline. The evaluation shows that self-supervised pre-training on data from the target domain is very effective, but it struggles to outperform transfer learning from closely related domains. This paper is one of the first studies exploring self-supervised pre-training in document text recognition, and we believe that it will become a cornerstone for future research in this area. We made our implementation of the investigated methods publicly available at https://github.com/DCGM/pero-pretraining.
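For orientation, the joint-embedding experiments mentioned in the abstract use an NT-Xent objective. The sketch below is a generic illustration of an NT-Xent-style contrastive loss over two views of the same batch, not the code from this repository; the function name, tensor shapes, and the use of shifted crops as the second view are assumptions.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) over N positive pairs.

    z1, z2: (N, D) embeddings of two views of the same N samples;
    row i of z1 and row i of z2 form a positive pair.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-length rows
    sim = z @ z.t() / temperature                        # (2N, 2N) scaled cosine similarities

    # Never treat a sample as its own negative.
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))

    # The positive of anchor i from the first view is i + n, and vice versa.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)


# Hypothetical usage: two views of the same batch of text-line crops,
# e.g. the original crop and a horizontally shifted one.
# z1 = encoder(crops)
# z2 = encoder(shifted_crops)
# loss = nt_xent_loss(z1, z2)
```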
