[Features] ICDAR17_MLT dataset train/val split #21

yehyunsuh · 2022-04-15T00:22:07Z

What

ICDAR17_MLT dataset train/val split completed

Why

To measure validation accuracy in our experiments

How

Prerequisites

#17 Download MLT and divide it by following the tagged issue
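Before you edit the script, you can confirm that the #17 prerequisite is in place. The sketch below assumes the raw folder names that the modified main() in convert_mlp.py reads (`raw/ch8_training_images`, `raw/ch8_training_gt`, `raw/ch8_validation_images`, `raw/ch8_validation_gt`). `missing_raw_dirs` and `REQUIRED_SUBDIRS` are hypothetical names and are not part of the script.

```python
import os.path as osp
from typing import List

# Hypothetical list: the raw sub-directories that the modified main() reads
# from SRC_DATASET_DIR (assumed to match the layout produced by #17).
REQUIRED_SUBDIRS = [
    'raw/ch8_training_images',
    'raw/ch8_training_gt',
    'raw/ch8_validation_images',
    'raw/ch8_validation_gt',
]


def missing_raw_dirs(src_dataset_dir: str) -> List[str]:
    """Return every required raw directory that does not exist under src_dataset_dir."""
    return [
        osp.join(src_dataset_dir, sub)
        for sub in REQUIRED_SUBDIRS
        if not osp.isdir(osp.join(src_dataset_dir, sub))
    ]


if __name__ == '__main__':
    missing = missing_raw_dirs('/opt/ml/input/data/ICDAR17_MLT')
    if missing:
        print('Missing raw MLT directories (see #17):')
        for path in missing:
            print('  ' + path)
    else:
        print('All raw MLT directories found.')
```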

Orders

The convert_mlp.py file needs to be modified as follows:

SRC_DATASET_DIR = '/opt/ml/input/data/ICDAR17_MLT'  # raw MLT download (see #17)
DST_DATASET_DIR = '/opt/ml/input/data/ICDAR17_Korean'  # converted output
...
def main():
    # Images of each split are copied into their own folder
    dst_image_dir_train = osp.join(DST_DATASET_DIR, 'train')
    dst_image_dir_val = osp.join(DST_DATASET_DIR, 'val')
    # The official MLT training set becomes our train split,
    # and the official MLT validation set becomes our val split
    mlt_train = MLT17Dataset(osp.join(SRC_DATASET_DIR, 'raw/ch8_training_images'),
                             osp.join(SRC_DATASET_DIR, 'raw/ch8_training_gt'),
                             copy_images_to=dst_image_dir_train)
    mlt_valid = MLT17Dataset(osp.join(SRC_DATASET_DIR, 'raw/ch8_validation_images'),
                             osp.join(SRC_DATASET_DIR, 'raw/ch8_validation_gt'),
                             copy_images_to=dst_image_dir_val)
    ## create train dataset
    anno = dict(images=dict())
    with tqdm(total=len(mlt_train)) as pbar:
        # collate_fn=lambda x: x keeps each batch as a plain list of samples
        for batch in DataLoader(mlt_train, num_workers=NUM_WORKERS, collate_fn=lambda x: x):
            image_fname, sample_info = batch[0]
            anno['images'][image_fname] = sample_info
            pbar.update(1)

    # Both splits write their annotations into the same ufo/ folder
    ufo_dir_train = osp.join(DST_DATASET_DIR, 'ufo')
    maybe_mkdir(ufo_dir_train)
    with open(osp.join(ufo_dir_train, 'train.json'), 'w') as f:
        json.dump(anno, f, indent=4)

    ## create val dataset
    anno = dict(images=dict())
    with tqdm(total=len(mlt_valid)) as pbar:
        for batch in DataLoader(mlt_valid, num_workers=NUM_WORKERS, collate_fn=lambda x: x):
            image_fname, sample_info = batch[0]
            anno['images'][image_fname] = sample_info
            pbar.update(1)

    ufo_dir_val = osp.join(DST_DATASET_DIR, 'ufo')
    maybe_mkdir(ufo_dir_val)
    with open(osp.join(ufo_dir_val, 'val.json'), 'w') as f:
        json.dump(anno, f, indent=4)
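The train and val halves of main() repeat the same collect-and-dump logic. They differ only in the dataset and the output file name. One possible refactor for the planned cleanup is sketched below. `write_ufo_split` is a hypothetical helper that uses `os.makedirs(..., exist_ok=True)` in place of the script's `maybe_mkdir`. It assumes each sample is an `(image_fname, sample_info)` pair, as in the loop above.

```python
import json
import os
import os.path as osp
from typing import Any, Dict, Iterable, Tuple


def write_ufo_split(samples: Iterable[Tuple[str, Dict[str, Any]]],
                    ufo_dir: str, split_name: str) -> str:
    """Collect (image_fname, sample_info) pairs into a UFO dict and dump it.

    Hypothetical helper: `samples` is any iterable yielding the same pairs
    that the loop in main() unpacks from batch[0].
    Returns the path of the written json file.
    """
    anno = dict(images=dict())
    for image_fname, sample_info in samples:
        anno['images'][image_fname] = sample_info
    os.makedirs(ufo_dir, exist_ok=True)  # stands in for maybe_mkdir
    out_path = osp.join(ufo_dir, f'{split_name}.json')
    with open(out_path, 'w') as f:
        json.dump(anno, f, indent=4)
    return out_path
```

Inside main(), you could call it as `write_ufo_split((batch[0] for batch in DataLoader(mlt_train, num_workers=NUM_WORKERS, collate_fn=lambda x: x)), osp.join(DST_DATASET_DIR, 'ufo'), 'train')`, and likewise with `mlt_valid` and `'val'`. This keeps the parallel DataLoader but drops the tqdm progress bar.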

With this change, the conversion runs normally, and you can confirm the result by checking the generated files.

(Here, baseline_train.json is the json file that was originally provided. Its name collided with train.json, so I renamed it in advance. If it gets deleted, you can download it again from aistages.)
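You can also check the output programmatically. This is a minimal sketch that assumes the layout produced by the code above: `ufo/train.json` and `ufo/val.json`, with images copied to `train/` and `val/` under DST_DATASET_DIR. It also assumes that MLT17Dataset copies each image under the same file name that serves as its key in the json. `check_split` and `load_image_names` are hypothetical names.

```python
import json
import os.path as osp
from typing import Dict, List, Set


def load_image_names(json_path: str) -> Set[str]:
    """Return the set of image file names listed in a UFO annotation file."""
    with open(json_path) as f:
        return set(json.load(f)['images'])


def check_split(dst_dataset_dir: str) -> Dict[str, List[str]]:
    """Report overlap between splits and annotated images missing on disk.

    Assumes images were copied to <dst>/train and <dst>/val under the same
    file name used as the key in ufo/train.json and ufo/val.json.
    """
    ufo_dir = osp.join(dst_dataset_dir, 'ufo')
    train = load_image_names(osp.join(ufo_dir, 'train.json'))
    val = load_image_names(osp.join(ufo_dir, 'val.json'))
    problems = {'overlap': sorted(train & val), 'missing_images': []}
    # Every annotated image should exist in its own split folder
    for split, names in (('train', train), ('val', val)):
        for name in sorted(names):
            if not osp.isfile(osp.join(dst_dataset_dir, split, name)):
                problems['missing_images'].append(osp.join(split, name))
    return problems
```

For example, if the split is clean, `check_split('/opt/ml/input/data/ICDAR17_Korean')` should return empty lists for both keys. The renamed baseline_train.json is not read.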

I will clean up these changes and submit them as a pull request. I will not delete the existing code, though; I will only comment it out.


yehyunsuh added the enhancement New feature or request label Apr 15, 2022

yehyunsuh mentioned this issue Apr 15, 2022

[Discussion] In the end, we have to add more data again. #20 (Closed, 2 tasks)

yehyunsuh self-assigned this Apr 15, 2022

yehyunsuh added this to To do in Data Annotation via automation Apr 15, 2022

seoulsky-field mentioned this issue Apr 15, 2022

[Features] append validation f1 score in train.py #22 (Closed, 1 task)

yehyunsuh closed this as completed Apr 18, 2022

Data Annotation automation moved this from To do to Done Apr 18, 2022

yehyunsuh added a commit that referenced this issue Apr 19, 2022

[Feat] split train/val dataset #21 (commit cb81b89)

yehyunsuh mentioned this issue Apr 19, 2022

[Feat] add train/valid split code #34 (Merged)

seoulsky-field changed the title ~~[Enhancement] ICDAR17_MLT dataset train/val split~~ [Features] ICDAR17_MLT dataset train/val split Apr 19, 2022

tnsgh9603 mentioned this issue Apr 19, 2022

[Discussion] train/val error #39 (Closed)
