[Features] ICDAR17_MLT dataset train/val split #21

Closed
yehyunsuh opened this issue Apr 15, 2022 · 0 comments
yehyunsuh commented Apr 15, 2022

What

ICDAR17_MLT dataset train/val split completed

Why

To check validation accuracy in our experiments

How

Prerequisites

Download the MLT dataset and complete the division by following the tagged issue #17.
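
Before running the conversion, it may help to confirm that the raw folders are in place. The sketch below assumes the four directory names that the script in this issue reads from; `missing_raw_dirs` and `EXPECTED_RAW_DIRS` are hypothetical names, not part of the repository.

```python
import os.path as osp

# Sub-directories that the conversion script in this issue reads from.
EXPECTED_RAW_DIRS = [
    'raw/ch8_training_images',
    'raw/ch8_training_gt',
    'raw/ch8_validation_images',
    'raw/ch8_validation_gt',
]


def missing_raw_dirs(src_dataset_dir):
    """Return the expected raw sub-directories that do not exist yet."""
    return [d for d in EXPECTED_RAW_DIRS
            if not osp.isdir(osp.join(src_dataset_dir, d))]


if __name__ == '__main__':
    missing = missing_raw_dirs('/opt/ml/input/data/ICDAR17_MLT')
    print('missing:', missing if missing else 'none')
```

An empty list means every folder the script expects is present.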

Steps

The convert_mlp.py file needs to be modified:

SRC_DATASET_DIR = '/opt/ml/input/data/ICDAR17_MLT' 
DST_DATASET_DIR = '/opt/ml/input/data/ICDAR17_Korean' 
...
def main():
    dst_image_dir_train = osp.join(DST_DATASET_DIR, 'train')
    dst_image_dir_val = osp.join(DST_DATASET_DIR, 'val')
    mlt_train = MLT17Dataset(osp.join(SRC_DATASET_DIR, 'raw/ch8_training_images'),
                             osp.join(SRC_DATASET_DIR, 'raw/ch8_training_gt'),
                             copy_images_to=dst_image_dir_train)
    mlt_valid = MLT17Dataset(osp.join(SRC_DATASET_DIR, 'raw/ch8_validation_images'),
                             osp.join(SRC_DATASET_DIR, 'raw/ch8_validation_gt'),
                             copy_images_to=dst_image_dir_val)
    # Create the train annotations and write them to ufo/train.json
    anno = dict(images=dict())
    with tqdm(total=len(mlt_train)) as pbar:
        for batch in DataLoader(mlt_train, num_workers=NUM_WORKERS, collate_fn=lambda x: x):
            image_fname, sample_info = batch[0]
            anno['images'][image_fname] = sample_info
            pbar.update(1)

    ufo_dir_train = osp.join(DST_DATASET_DIR, 'ufo')
    maybe_mkdir(ufo_dir_train)
    with open(osp.join(ufo_dir_train, 'train.json'), 'w') as f:
        json.dump(anno, f, indent=4)

    # Create the val annotations and write them to ufo/val.json
    anno = dict(images=dict())
    with tqdm(total=len(mlt_valid)) as pbar:
        for batch in DataLoader(mlt_valid, num_workers=NUM_WORKERS, collate_fn=lambda x: x):
            image_fname, sample_info = batch[0]
            anno['images'][image_fname] = sample_info
            pbar.update(1)

    ufo_dir_val = osp.join(DST_DATASET_DIR, 'ufo')
    maybe_mkdir(ufo_dir_val)
    with open(osp.join(ufo_dir_val, 'val.json'), 'w') as f:
        json.dump(anno, f, indent=4)

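With the changes above, the conversion runs normally:
image
Checking the output files confirms the result:
image
(Here, baseline_train.json is the JSON file that was originally provided. Its name collided with train.json, so I renamed it beforehand. Even if it gets deleted, you can download it again from aistages.)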
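
One way to double-check the result without relying on a screenshot is to load both annotation files and compare them. This is a minimal sketch, assuming the files sit at `<DST_DATASET_DIR>/ufo/train.json` and `val.json` and each has a top-level `images` dict, as in the code above. `summarize_split` is a hypothetical name.

```python
import json
import os.path as osp


def summarize_split(ufo_dir):
    """Count images in train.json and val.json and list names found in both."""
    names = {}
    for split in ('train', 'val'):
        with open(osp.join(ufo_dir, f'{split}.json')) as f:
            names[split] = set(json.load(f)['images'])
    overlap = sorted(names['train'] & names['val'])
    return len(names['train']), len(names['val']), overlap


if __name__ == '__main__':
    ufo_dir = '/opt/ml/input/data/ICDAR17_Korean/ufo'
    # Only run when both files are actually present.
    if all(osp.isfile(osp.join(ufo_dir, f'{s}.json')) for s in ('train', 'val')):
        n_train, n_val, overlap = summarize_split(ufo_dir)
        print(f'train: {n_train} images, val: {n_val} images, shared: {len(overlap)}')
```

A non-empty overlap would suggest that some image names appear in both splits, which is worth checking before trusting the validation numbers.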

I will clean up the code for this change and open a pull request. However, I will not delete the existing code; I will only comment it out.
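
Since the train and val blocks differ only in the dataset and the output filename, one possible cleanup is to move the shared steps into a helper. The following is only a sketch of that idea, not the code from the eventual pull request. `write_split_annotations` is a hypothetical name, and it takes any iterable of `(image_fname, sample_info)` pairs so that it runs without `MLT17Dataset` or `DataLoader`.

```python
import json
import os
import os.path as osp


def write_split_annotations(samples, ufo_dir, split_name):
    """Collect (image_fname, sample_info) pairs and dump them to <split_name>.json."""
    anno = dict(images=dict())
    for image_fname, sample_info in samples:
        anno['images'][image_fname] = sample_info
    # Stands in for maybe_mkdir, assumed here to create the directory if missing.
    os.makedirs(ufo_dir, exist_ok=True)
    out_path = osp.join(ufo_dir, f'{split_name}.json')
    with open(out_path, 'w') as f:
        json.dump(anno, f, indent=4)
    return out_path
```

In the script, `samples` could be a generator such as `(batch[0] for batch in DataLoader(...))`, with the helper called once for `'train'` and once for `'val'`.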

yehyunsuh added the enhancement (New feature or request) label Apr 15, 2022
yehyunsuh self-assigned this Apr 15, 2022
yehyunsuh added this to To do in Data Annotation via automation Apr 15, 2022
Data Annotation automation moved this from To do to Done Apr 18, 2022
yehyunsuh added a commit that referenced this issue Apr 19, 2022
seoulsky-field changed the title from "[Enhancement] ICDAR17_MLT dataset train/val split" to "[Features] ICDAR17_MLT dataset train/val split" Apr 19, 2022