Bubble Segmentation

Overview

The model segments speech bubbles within a comic cut. I referenced and implemented segmentation_models.pytorch to segment speech bubbles. In the previous task, speech bubble detection was performed; in this task, the detected speech bubbles are segmented precisely. Initially this was done with edge detection, such as Canny edge detection (if you are curious about the detection task, refer to the bubble detector repository). However, performance is limited when an edge detector is used (e.g., transparent or scatter-type bubbles). Therefore, masks for some speech bubbles were created with the edge detector, additional data were collected, and a segmentation model was trained, as sketched below.
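As a rough illustration of that edge-based masking step (the labeling code itself is not part of this repository; the thresholds and contour handling here are assumptions):

```python
import cv2
import numpy as np

def bubble_mask_from_edges(crop_bgr: np.ndarray) -> np.ndarray:
    """Illustrative only: fill the largest Canny contour of a detected bubble crop.
    This fails on transparent or scatter-type bubbles, which is why a learned
    segmentation model is trained instead."""
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)  # thresholds are assumed values
    # OpenCV 4.x: findContours returns (contours, hierarchy)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(gray)
    if contours:
        largest = max(contours, key=cv2.contourArea)
        cv2.drawContours(mask, [largest], -1, 255, thickness=cv2.FILLED)
    return mask
```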

[Figure: segmentation results]


Model Configuration and Performance

Model Configuration

| Model Component | Configuration |
| --- | --- |
| Base Network | Unet |
| Encoder | mobilenet_v2 |
| Pretrained | imagenet |
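In segmentation_models.pytorch, this configuration corresponds to roughly the following sketch (classes=1 and the sigmoid activation are assumptions for a binary bubble mask):

```python
import segmentation_models_pytorch as smp

# Unet base network, mobilenet_v2 encoder, ImageNet-pretrained encoder weights.
model = smp.Unet(
    encoder_name="mobilenet_v2",
    encoder_weights="imagenet",
    classes=1,             # assumed: single speech-bubble class
    activation="sigmoid",  # assumed: probability output for a binary mask
)
```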

Model Performance

Inference Images

  • size : 224 x 224
  • number of images : 25
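A hedged sketch of how these inference times can be measured under this setup (the original benchmark script is not included in this README; model construction and loop details are assumptions):

```python
import time

import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="mobilenet_v2", encoder_weights="imagenet", classes=1)
model.eval()

images = torch.rand(25, 3, 224, 224)  # stand-in for the 25 inference images

with torch.no_grad():
    start = time.perf_counter()
    for img in images:
        model(img.unsqueeze(0))  # one 224 x 224 image at a time
    print(f"CPU: {time.perf_counter() - start:.5f} sec")
```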

Compare Inference Time

| Speed | mobilenet_v2 | efficientnet-b0 | resnet34 |
| --- | --- | --- | --- |
| CPU | 5.58838 sec | 7.83775 sec | 9.11186 sec |
| CUDA | | | |
  • The comparison of the three encoders showed similar performance. Therefore, mobilenet_v2, which has the fewest parameters and the fastest inference time, was chosen.

  • Compare Encoder

    • Ten sample images (sample 1 through sample 10) compare the three encoders side by side: resnet34 (check_unet_epoch10), efficientnet-b0 (check_eff_epoch9), and mobilenet_v2 (check_mob_epoch8).


Pretrained

| Model | Link |
| --- | --- |
| Mobilenet_v2 | Link |
| Mobilenet_v2 + Simple Random Location | Link |
| Mobilenet_v2 + Transparent Random Location | Link |
| Mobilenet_v2 + Color Random Location | Link |
| Mobilenet_v2 + Color + Transparent Random Location | Link |

Data Generation

Overview

trdg is used to generate text data. When generating Korean image data, trdg generates the data one letter at a time: if you want to create a text image consisting of five Korean words, a total of five characters will be created, one by one. Therefore, I added Korean words to the word dictionary as a txt file. Also, trdg ships only one Korean font, so I added more Korean fonts; the font used for each sample is chosen at random.

[Figure: examples of generated Korean text data]

Arguments

  • Directory
    • --output_dir : Specify the directory in which to store the generated data.
    • --input_file : When set, this argument uses a specified text file as source for the text.
  • Text Generation
    • --language : The language to use, should be fr (French), en (English), es (Spanish), de (German), cn (Chinese), or hi (Hindi).
    • -c : The number of images to be created.
    • -rs : Use random sequences as the source text for the generation. Set -let, -num, -sym to use letters/numbers/symbols. If none is specified, all three are used.
    • -let : Define if random sequences should contain letters. Only works with -rs
    • -num : Define if random sequences should contain numbers. Only works with -rs
    • -sym : Define if random sequences should contain symbols. Only works with -rs
    • -t : Define the number of threads to use for image generation
    • -om : Define if the generator will return masks for the text
  • Data Format
    • -w : Define how many words should be included in each generated sample.
    • -r : Define if the produced string will have variable word count (with --length being the maximum).
    • -f : Define the height of the produced images if horizontal, else the width.
    • -e : Define the extension to save the image with.
    • -wd : Define the width of the resulting image. If not set, it will be the width of the text + 10. If the generated text is wider, that width will be used instead.
    • -al : Define the alignment of the text in the image. Only used if the width parameter is set. 0: left, 1: center, 2: right.
    • -or : Define the orientation of the text. 0: Horizontal, 1: Vertical.
    • -sw : Define the width of the spaces between words. 2.0 means twice the normal space width.
    • -cs : Define the width of the spaces between characters. 2 means two pixels.
    • -m : Define the margins around the text when rendered. In pixels.
    • -fi : Apply a tight crop around the rendered text.
    • -ca : Generate upper or lowercase only. arguments: upper or lower. Example: --case upper if you use en.
    • -ws : Split on words instead of on characters (preserves ligatures, no character spacing).
    • -stw : Define the width of the strokes.
    • -im : Define the image mode to be used. RGB is default, L means 8-bit grayscale images, 1 means 1-bit binary images stored with one pixel per byte, etc.
  • Text Augmentation
    • -k : Define skewing angle of the generated text. In positive degrees.
    • -rk : When set, the skew angle will be randomized between the value set with -k and its opposite.
    • -bl : Apply gaussian blur to the resulting sample. Should be an integer defining the blur radius.
    • -rbl : When set, the blur radius will be randomized between 0 and -bl.
    • -b : Define what kind of background to use. 0: Gaussian Noise, 1: Plain white, 2: Quasicrystal, 3: Image.
    • -na : Define how the produced files will be named. 0: [TEXT][ID].[EXT], 1: [ID][TEXT].[EXT] 2: [ID].[EXT] + one file labels.txt containing id-to-label mappings.
    • -d : Define a distortion applied to the resulting image. 0: None (Default), 1: Sine wave, 2: Cosine wave, 3: Random.
    • -do : Define the distortion's orientation. Only used if -d is specified. 0: Vertical (Up and down), 1: Horizontal (Left and Right), 2: Both.
    • -tc : Define the text's color, either a single hex color or a range given as two comma-separated hex colors.
    • -id : Define an image directory to use when background is set to image.
    • -stf : Define the color of the contour of the strokes, if the stroke width is bigger than 0.
  • Mask Generation
    • -save_dir : Specify the directory in which to store the mask image.
    • -sn : Define how the produced mask will be named.
    • -mt : Define how many images are used in a row.
    • -mw : Define the width of the mask image.
    • -mh : Define the height of the mask image.

How to Run

  • Use the arguments above to generate the data you want.
    python ./trdg/run.py -argument 
    

Data Augmentation

Overview

Data augmentation consists of four categories (a minimal sketch follows the figure below):

  • Copy to Simple Random Location : copy the generated speech bubble to a random location inside the cut.
  • Copy to Transparent Random Location : make the generated speech bubble transparent, then copy it to a random location inside the cut.
  • Copy to Color Random Location : color the generated speech bubble, then copy it to a random location inside the cut.
  • Copy to Color + Transparent Random Location : color the generated speech bubble, make it transparent, then copy it to a random location inside the cut.

[Figure: the four augmentation modes]
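A minimal sketch of the four paste modes, assuming PIL; the file names, tint color, and blending details are illustrative, not the repository's actual augmentation code:

```python
import random
from PIL import Image, ImageOps

def paste_bubble(cut, bubble, alpha=255, color=None):
    """Paste a speech bubble at a random location inside the cut.
    alpha < 255 gives the 'Transparent' modes; color gives the 'Color' modes."""
    bubble = bubble.convert("RGBA")
    if color is not None:
        a = bubble.getchannel("A")  # keep the bubble's original shape
        bubble = ImageOps.colorize(bubble.convert("L"), black=color, white="white").convert("RGBA")
        bubble.putalpha(a)
    if alpha < 255:
        # scale the existing alpha so only the bubble itself becomes translucent
        bubble.putalpha(bubble.getchannel("A").point(lambda v: v * alpha // 255))
    x = random.randint(0, cut.width - bubble.width)   # assumes the bubble fits in the cut
    y = random.randint(0, cut.height - bubble.height)
    out = cut.convert("RGBA")
    out.alpha_composite(bubble, (x, y))               # random location inside the cut
    return out

# Illustrative usage: "Color + Transparent Random Location" (hypothetical paths)
cut = Image.open("cut.png")
bubble = Image.open("bubble.png")
aug = paste_bubble(cut, bubble, alpha=160, color=(255, 210, 210))
```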


Install dependencies

  • Pytorch Version

    • PyTorch 1.7.0 or higher
  • Install Dependencies Code

    pip install torch torchvision albumentations numpy opencv-python pandas Pillow pretrainedmodels scipy segmentation-models-pytorch efficientnet-pytorch timm requests
    

    or

    pip install -r requirements.txt
    

Train

  • 1. Download weight

  • 2. Train

    • Argument

      • device option
        • -g_num : gpu number to use cuda
        • -device : Whether the device to be used is cpu or cuda
      • data option
        • -train_dir : The parent folder of the image and mask that you use for training
        • -valid_dir : The folder of the image and mask that you use for Validating
      • model option
        • -pretrained : pretrained model for the entire network
        • -encoder : Encoder to use for network. Refer to segmentation_models.pytorch for encoders.
        • -encoder_weight : pretrained model for encoder
        • -activation : activation function
      • augmentation option
        • -simple : Simply attach the speech bubble to a random location inside the cut.
        • -trans : Attach the transparent speech bubble to a random location inside the cut.
        • -color : Attach the colored speech bubble to a random location inside the cut.
        • -trans_color : Attach the colored, transparent speech bubble to a random location inside the cut.
    • How to Run

      • Use the arguments above to train on the data you want.
      python train.py -g_num gpu_id -train_dir 'data_dir' -pretrained 'pretrained_model.pth' ... 
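For reference, a minimal hedged sketch of the training step these options drive (plain PyTorch; the loss, optimizer, and dummy batch are assumptions standing in for train.py's real DataLoader over -train_dir):

```python
import torch
import segmentation_models_pytorch as smp

device = "cuda" if torch.cuda.is_available() else "cpu"
model = smp.Unet(encoder_name="mobilenet_v2", encoder_weights="imagenet", classes=1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed hyperparameters
criterion = torch.nn.BCEWithLogitsLoss()                   # assumed loss

# Dummy (image, mask) batch; replace with a DataLoader built from -train_dir.
loader = [(torch.rand(2, 3, 224, 224), torch.randint(0, 2, (2, 1, 224, 224)).float())]

model.train()
for epoch in range(2):
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), masks)  # raw logits + BCEWithLogits
        loss.backward()
        optimizer.step()
```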
      

Demo

  • 1. Download weight
  • 2. Demo
    python demo.py --weightfile pretrained_model.pth -imgfile image_dir 
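Conceptually, the demo step amounts to something like the following sketch (assumes the checkpoint pickles the full model and that preprocessing is a simple resize and rescale; the repository's demo script may differ):

```python
import cv2
import numpy as np
import torch

model = torch.load("pretrained_model.pth", map_location="cpu")  # assumed checkpoint format
model.eval()

img = cv2.imread("image.png")  # hypothetical input path
x = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0      # assumed preprocessing
x = torch.from_numpy(x.transpose(2, 0, 1)).unsqueeze(0)         # HWC -> 1xCxHW

with torch.no_grad():
    prob = model(x)[0, 0].numpy()  # assumes sigmoid activation inside the model

mask = (prob > 0.5).astype(np.uint8) * 255  # binary speech-bubble mask
cv2.imwrite("mask.png", mask)
```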
    

Result

| MobileNet_v2 | MobileNet_v2 + trans |
| --- | --- |
| train_model_mob7.pth (result image) | train_model_mob_trans_4.pth (result image) |

Future Works

Overview

The model has limitations on some speech bubbles. The biggest problem is that it fails to segment unusually shaped speech bubbles (hard cases). Next, performance on transparent speech bubbles is low: if there are letters around a transparent speech bubble, the model may predict the letters as part of the bubble; if the speech bubble is too transparent, the model cannot detect it at all; and if other elements within the cut are transparent, the model may predict them as speech bubbles. Finally, when the letters inside a speech bubble are distorted, the model fails to predict it.

Improvement points are summarized as follows.

Improvement Points

  • The model needs to recognize unusual speech-bubble cases (complicated decoration, gradation).
  • The model needs to separate transparent speech bubbles from nearby line-text letters when the two overlap.
  • The model needs to recognize speech bubbles with high transparency.
  • The model needs to recognize only speech bubbles, not other features (transparent backgrounds, text, etc.).
  • The model needs to recognize speech bubbles with distorted internal letters, such as sound-effect line text.

Reference

  1. qubvel, segmentation_models.pytorch
  2. Belval, TextRecognitionDataGenerator
