Trainer

Introduction

This trainer handles two types of models.

  • Text parser model
  • Inpaintor model

Training data config

The proposed model is trained on multiple datasets produced by the generator.
The dataset configuration is set in train.py.
The code below is a minimal example that uses a single dataset.

dataset_list=[
    DatasetConfig(
        data_dir='gen_data/color_eng_tmp',
        prior=1.0,
        noisy_bg_option=False,
    )
]

The DatasetConfig class has three attributes.

  • data_dir
    • Directory name of the generated dataset
  • prior
    • Float value of the prior; a dataset with a larger prior is used more frequently in training.
  • noisy_bg_option
    • Option that controls how the loss for text foreground prediction is computed. Set it to True for datasets whose background images contain text other than the text generated by the renderer.
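
To make the role of prior concrete, here is a minimal sketch; it is not the code in train.py. It assumes that DatasetConfig is a plain container with the three attributes above and that the priors act as relative sampling weights. The stand-in class, the function pick_dataset, and the second directory name are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class DatasetConfig:
    """Hypothetical stand-in for the DatasetConfig class in train.py."""
    data_dir: str
    prior: float = 1.0
    noisy_bg_option: bool = False


def pick_dataset(dataset_list, rng):
    """Pick one dataset, treating each prior as a relative sampling weight (assumption)."""
    weights = [cfg.prior for cfg in dataset_list]
    return rng.choices(dataset_list, weights=weights, k=1)[0]


dataset_list = [
    DatasetConfig(data_dir='gen_data/color_eng_tmp', prior=3.0),
    DatasetConfig(data_dir='gen_data/another_dataset', prior=1.0),  # hypothetical directory
]

# Count how often each dataset is drawn over many sampling steps.
rng = random.Random(0)
counts = {cfg.data_dir: 0 for cfg in dataset_list}
for _ in range(1000):
    counts[pick_dataset(dataset_list, rng).data_dir] += 1
print(counts)
```

Under this assumption, a dataset with prior 3.0 is drawn about three times as often as one with prior 1.0. Whether train.py normalizes or applies priors in exactly this way should be checked against the source.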

Text parser model

In the paper, we used five types of datasets (synth text, fmd, color, book cover, bam) to train the text parser model.
See generator for the details of the datasets.
To reproduce the model, configure the datasets in train.py as follows.

dataset_list=[
    DatasetConfig(
        data_dir='DIR_SYNTHTEXTBG_DATASET',
        prior=1.0,
        noisy_bg_option=False,
    ),
    DatasetConfig(
        data_dir='DIR_FMDBG_DATASET',
        prior=1.0,
        noisy_bg_option=False,
    ),
    DatasetConfig(
        data_dir='DIR_COLOR_DATASET',
        prior=1.0,
        noisy_bg_option=False,
    ),
    DatasetConfig(
        data_dir='DIR_BOOKCOVERBG_DATASET',
        prior=1.0,
        noisy_bg_option=True,
    ),
    DatasetConfig(
        data_dir='DIR_BAMBG_DATASET',
        prior=1.0,
        noisy_bg_option=True,
    ),
]

Note that we do not provide all of the datasets used in the paper because of their licenses.
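
Because not every dataset may be available locally, one option is to build dataset_list only from the directories that exist. The following is a minimal sketch under that assumption. The stand-in DatasetConfig class and the helper build_dataset_list are hypothetical and not part of train.py; the directory names are the placeholders from the configuration above.

```python
import os
from dataclasses import dataclass


@dataclass
class DatasetConfig:
    """Hypothetical stand-in for the DatasetConfig class in train.py."""
    data_dir: str
    prior: float = 1.0
    noisy_bg_option: bool = False


# (directory, noisy_bg_option) pairs taken from the text parser configuration.
CANDIDATES = [
    ('DIR_SYNTHTEXTBG_DATASET', False),
    ('DIR_FMDBG_DATASET', False),
    ('DIR_COLOR_DATASET', False),
    ('DIR_BOOKCOVERBG_DATASET', True),
    ('DIR_BAMBG_DATASET', True),
]


def build_dataset_list(candidates, exists=os.path.isdir):
    """Keep only the datasets whose directory passes the `exists` check."""
    return [
        DatasetConfig(data_dir=data_dir, prior=1.0, noisy_bg_option=noisy)
        for data_dir, noisy in candidates
        if exists(data_dir)
    ]


dataset_list = build_dataset_list(CANDIDATES)
print([cfg.data_dir for cfg in dataset_list])
```

Training on a subset of the datasets is not expected to reproduce the results reported in the paper.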

In the paper, we trained the model in two steps. First, we trained the proposed model with only the OCR branches. Second, we trained the overall model, initialized with the parameters of the model from the first step. The pre-trained model of the text parser is here.
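
The initialization in the second step can be pictured as copying every parameter that the first-step model shares with the overall model, and leaving the remaining parameters at their fresh values. The sketch below is framework-agnostic and purely illustrative: parameters are plain Python lists, and the function and parameter names are hypothetical. The actual checkpoint loading is whatever train.py implements.

```python
def init_from_first_step(full_params, first_step_params):
    """Return a copy of full_params in which every entry that also exists in
    first_step_params (same name, same length) takes the first-step value."""
    merged = {}
    for name, value in full_params.items():
        pretrained = first_step_params.get(name)
        if pretrained is not None and len(pretrained) == len(value):
            merged[name] = list(pretrained)  # reuse the first-step parameters
        else:
            merged[name] = list(value)  # keep the fresh initialization
    return merged


# Hypothetical parameter names, for illustration only.
first_step = {'backbone.w': [0.5, -0.2], 'ocr_branch.w': [1.0]}
full_model = {
    'backbone.w': [0.0, 0.0],
    'ocr_branch.w': [0.0],
    'other_branch.w': [0.0, 0.0],
}

merged = init_from_first_step(full_model, first_step)
print(merged)
```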

The training command for reproducing the text parser model is as follows.

python train.py --mode=0 --pret=[path for the above pre-trained model] --batch_size=8 

Inpaintor model

In the paper, we used three types of datasets (synth text, fmd, color) to train the inpaintor model.
To reproduce the model, configure the datasets in train.py as follows.

dataset_list=[
    DatasetConfig(
        data_dir='DIR_SYNTHTEXTBG_DATASET',
        prior=1.0,
        noisy_bg_option=False,
    ),
    DatasetConfig(
        data_dir='DIR_FMDBG_DATASET',
        prior=1.0,
        noisy_bg_option=False,
    ),
    DatasetConfig(
        data_dir='DIR_COLOR_DATASET',
        prior=1.0,
        noisy_bg_option=False,
    ),
]

The training command for reproducing the inpaintor model is as follows.

python train.py --mode=1 --batch_size=32