Bubble Segmentation

Overview

The model segments speech bubbles within a comic cut. I referenced and implemented segmentation_models.pytorch to segment speech bubbles. In the previous task, speech bubble detection was performed; in this task, the detected speech bubbles are segmented precisely. Initially this was done with edge detection, such as Canny edge detection (if you are curious about the detection task, refer to the bubble detector repository). However, performance is limited when an edge detector is used (e.g., transparent or scatter-type bubbles). Therefore, masks for some speech bubbles were created with the edge detector, additional data were collected, and a segmentation model was trained, as sketched below.
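As a rough illustration of that edge-based masking step (the labeling code itself is not part of this repository; the thresholds and contour handling here are assumptions):

```python
import cv2
import numpy as np

def bubble_mask_from_edges(crop_bgr: np.ndarray) -> np.ndarray:
    """Illustrative only: fill the largest Canny contour of a detected bubble crop.
    This fails on transparent or scatter-type bubbles, which is why a learned
    segmentation model is trained instead."""
    gray = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)  # thresholds are assumed values
    # OpenCV 4.x: findContours returns (contours, hierarchy)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros_like(gray)
    if contours:
        largest = max(contours, key=cv2.contourArea)
        cv2.drawContours(mask, [largest], -1, 255, thickness=cv2.FILLED)
    return mask
```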

[Figure: segmentation results]


Model Configuration and Performance

Model Configuration

| Model Component | Configuration |
| --- | --- |
| Base Network | Unet |
| Encoder | mobilenet_v2 |
| Pretrained | imagenet |
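In segmentation_models.pytorch, this configuration corresponds to roughly the following sketch (classes=1 and the sigmoid activation are assumptions for a binary bubble mask):

```python
import segmentation_models_pytorch as smp

# Unet base network, mobilenet_v2 encoder, ImageNet-pretrained encoder weights.
model = smp.Unet(
    encoder_name="mobilenet_v2",
    encoder_weights="imagenet",
    classes=1,             # assumed: single speech-bubble class
    activation="sigmoid",  # assumed: probability output for a binary mask
)
```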

Model Performance

Inference Images

  • size : 224 x 224
  • number of images : 25
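A hedged sketch of how these inference times can be measured under this setup (the original benchmark script is not included in this README; model construction and loop details are assumptions):

```python
import time

import torch
import segmentation_models_pytorch as smp

model = smp.Unet(encoder_name="mobilenet_v2", encoder_weights="imagenet", classes=1)
model.eval()

images = torch.rand(25, 3, 224, 224)  # stand-in for the 25 inference images

with torch.no_grad():
    start = time.perf_counter()
    for img in images:
        model(img.unsqueeze(0))  # one 224 x 224 image at a time
    print(f"CPU: {time.perf_counter() - start:.5f} sec")
```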

Compare Inference Time

| Speed | mobilenet_v2 | efficientnet-b0 | resnet34 |
| --- | --- | --- | --- |
| CPU | 5.58838 sec | 7.83775 sec | 9.11186 sec |
| CUDA | | | |
  • The comparison of the three encoders showed similar performance. Therefore, mobilenet_v2, which has the fewest parameters and the fastest inference time, was chosen.

  • Compare Encoder

    • Ten sample images (sample 1 through sample 10) compare the three encoders side by side: resnet34 (check_unet_epoch10), efficientnet-b0 (check_eff_epoch9), and mobilenet_v2 (check_mob_epoch8).


Pretrained

| Model | Link |
| --- | --- |
| Mobilenet_v2 | Link |
| Mobilenet_v2 + Simple Random Location | Link |
| Mobilenet_v2 + Transparent Random Location | Link |
| Mobilenet_v2 + Color Random Location | Link |
| Mobilenet_v2 + Color + Transparent Random Location | Link |

Data Generation

Overview

trdg is used to generate text data. When generating Korean image data, trdg generates the data one letter at a time: if you want to create a text image consisting of five Korean words, a total of five characters will be created, one by one. Therefore, I added Korean words to the word dictionary as a txt file. Also, trdg ships only one Korean font, so I added more Korean fonts; the font used for each sample is chosen at random.

[Figure: examples of generated Korean text data]

Arguments

  • Directory
    • --output_dir : Specify the directory in which to store the generated data.
    • --input_file : When set, this argument uses a specified text file as source for the text.
  • Text Generation
    • --language : The language to use, should be fr (French), en (English), es (Spanish), de (German), cn (Chinese), or hi (Hindi).
    • -c : The number of images to be created.
    • -rs : Use random sequences as the source text for the generation. Set -let, -num, -sym to use letters/numbers/symbols. If none is specified, all three are used.
    • -let : Define if random sequences should contain letters. Only works with -rs
    • -num : Define if random sequences should contain numbers. Only works with -rs
    • -sym : Define if random sequences should contain symbols. Only works with -rs
    • -t : Define the number of threads to use for image generation
    • -om : Define if the generator will return masks for the text
  • Data Format
    • -w : Define how many words should be included in each generated sample.
    • -r : Define if the produced string will have variable word count (with --length being the maximum).
    • -f : Define the height of the produced images if horizontal, else the width.
    • -e : Define the extension to save the image with.
    • -wd : Define the width of the resulting image. If not set, it will be the width of the text + 10. If the generated text is wider, that width will be used instead.
    • -al : Define the alignment of the text in the image. Only used if the width parameter is set. 0: left, 1: center, 2: right.
    • -or : Define the orientation of the text. 0: Horizontal, 1: Vertical.
    • -sw : Define the width of the spaces between words. 2.0 means twice the normal space width.
    • -cs : Define the width of the spaces between characters. 2 means two pixels.
    • -m : Define the margins around the text when rendered. In pixels.
    • -fi : Apply a tight crop around the rendered text.
    • -ca : Generate upper or lowercase only. arguments: upper or lower. Example: --case upper if you use en.
    • -ws : Split on words instead of on characters (preserves ligatures, no character spacing).
    • -stw : Define the width of the strokes.
    • -im : Define the image mode to be used. RGB is default, L means 8-bit grayscale images, 1 means 1-bit binary images stored with one pixel per byte, etc.
  • Text Augmentation
    • -k : Define skewing angle of the generated text. In positive degrees.
    • -rk : When set, the skew angle will be randomized between the value set with -k and its opposite.
    • -bl : Apply gaussian blur to the resulting sample. Should be an integer defining the blur radius.
    • -rbl : When set, the blur radius will be randomized between 0 and -bl.
    • -b : Define what kind of background to use. 0: Gaussian Noise, 1: Plain white, 2: Quasicrystal, 3: Image.
    • -na : Define how the produced files will be named. 0: [TEXT][ID].[EXT], 1: [ID][TEXT].[EXT] 2: [ID].[EXT] + one file labels.txt containing id-to-label mappings.
    • -d : Define a distortion applied to the resulting image. 0: None (Default), 1: Sine wave, 2: Cosine wave, 3: Random.
    • -do : Define the distortion's orientation. Only used if -d is specified. 0: Vertical (Up and down), 1: Horizontal (Left and Right), 2: Both.
    • -tc : Define the text's color, either a single hex color or a range given as two comma-separated hex colors.
    • -id : Define an image directory to use when background is set to image.
    • -stf : Define the color of the contour of the strokes, if the stroke width is bigger than 0.
  • Mask Generation
    • -save_dir : Specify the directory in which to store the mask image.
    • -sn : Define how the produced mask will be named.
    • -mt : Define how many images are used in a row.
    • -mw : Define the width of the mask image.
    • -mh : Define the height of the mask image.

How to Run

  • Use the arguments above to generate the data you want.
    python ./trdg/run.py -argument 
    

Data Augmentation

Overview

Data augmentation consists of four categories (a minimal sketch follows the figure below):

  • Copy to Simple Random Location : copy the generated speech bubble to a random location inside the cut.
  • Copy to Transparent Random Location : make the generated speech bubble transparent, then copy it to a random location inside the cut.
  • Copy to Color Random Location : color the generated speech bubble, then copy it to a random location inside the cut.
  • Copy to Color + Transparent Random Location : color the generated speech bubble, make it transparent, then copy it to a random location inside the cut.

[Figure: the four augmentation modes]
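A minimal sketch of the four paste modes, assuming PIL; the file names, tint color, and blending details are illustrative, not the repository's actual augmentation code:

```python
import random
from PIL import Image, ImageOps

def paste_bubble(cut, bubble, alpha=255, color=None):
    """Paste a speech bubble at a random location inside the cut.
    alpha < 255 gives the 'Transparent' modes; color gives the 'Color' modes."""
    bubble = bubble.convert("RGBA")
    if color is not None:
        a = bubble.getchannel("A")  # keep the bubble's original shape
        bubble = ImageOps.colorize(bubble.convert("L"), black=color, white="white").convert("RGBA")
        bubble.putalpha(a)
    if alpha < 255:
        # scale the existing alpha so only the bubble itself becomes translucent
        bubble.putalpha(bubble.getchannel("A").point(lambda v: v * alpha // 255))
    x = random.randint(0, cut.width - bubble.width)   # assumes the bubble fits in the cut
    y = random.randint(0, cut.height - bubble.height)
    out = cut.convert("RGBA")
    out.alpha_composite(bubble, (x, y))               # random location inside the cut
    return out

# Illustrative usage: "Color + Transparent Random Location" (hypothetical paths)
cut = Image.open("cut.png")
bubble = Image.open("bubble.png")
aug = paste_bubble(cut, bubble, alpha=160, color=(255, 210, 210))
```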


Install dependencies

  • Pytorch Version

    • PyTorch 1.7.0 or higher
  • Install Dependencies Code

    pip install torch torchvision albumentations numpy opencv-python pandas Pillow pretrainedmodels scipy segmentation-models-pytorch efficientnet-pytorch timm requests
    

    or

    pip install -r requirements.txt
    

Train

  • 1. Download weight

  • 2. Train

    • Argument

      • device option
        • -g_num : gpu number to use cuda
        • -device : Whether the device to be used is cpu or cuda
      • data option
        • -train_dir : The parent folder of the image and mask that you use for training
        • -valid_dir : The folder of the image and mask that you use for Validating
      • model option
        • -pretrained : pretrained model for the entire network
        • -encoder : Encoder to use for network. Refer to segmentation_models.pytorch for encoders.
        • -encoder_weight : pretrained model for encoder
        • -activation : activation function
      • augmentation option
        • -simple : Simply attach the speech bubble to a random location inside the cut.
        • -trans : Attach the transparent speech bubble to a random location inside the cut.
        • -color : Attach the colored speech bubble to a random location inside the cut.
        • -trans_color : Attach the colored, transparent speech bubble to a random location inside the cut.
    • How to Run

      • Use the arguments above to train on the data you want.
      python train.py -g_num gpu_id -train_dir 'data_dir' -pretrained 'pretrained_model.pth' ... 
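For reference, a minimal hedged sketch of the training step these options drive (plain PyTorch; the loss, optimizer, and dummy batch are assumptions standing in for train.py's real DataLoader over -train_dir):

```python
import torch
import segmentation_models_pytorch as smp

device = "cuda" if torch.cuda.is_available() else "cpu"
model = smp.Unet(encoder_name="mobilenet_v2", encoder_weights="imagenet", classes=1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed hyperparameters
criterion = torch.nn.BCEWithLogitsLoss()                   # assumed loss

# Dummy (image, mask) batch; replace with a DataLoader built from -train_dir.
loader = [(torch.rand(2, 3, 224, 224), torch.randint(0, 2, (2, 1, 224, 224)).float())]

model.train()
for epoch in range(2):
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), masks)  # raw logits + BCEWithLogits
        loss.backward()
        optimizer.step()
```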
      

Demo

  • 1. Download weight
  • 2. Demo
    python demo.py --weightfile pretrained_model.pth -imgfile image_dir 
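Conceptually, the demo step amounts to something like the following sketch (assumes the checkpoint pickles the full model and that preprocessing is a simple resize and rescale; the repository's demo script may differ):

```python
import cv2
import numpy as np
import torch

model = torch.load("pretrained_model.pth", map_location="cpu")  # assumed checkpoint format
model.eval()

img = cv2.imread("image.png")  # hypothetical input path
x = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0      # assumed preprocessing
x = torch.from_numpy(x.transpose(2, 0, 1)).unsqueeze(0)         # HWC -> 1xCxHW

with torch.no_grad():
    prob = model(x)[0, 0].numpy()  # assumes sigmoid activation inside the model

mask = (prob > 0.5).astype(np.uint8) * 255  # binary speech-bubble mask
cv2.imwrite("mask.png", mask)
```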
    

Result

| MobileNet_v2 | MobileNet_v2 + trans |
| --- | --- |
| train_model_mob7.pth (result image) | train_model_mob_trans_4.pth (result image) |

Future Works

Overview

The model has limitations on some speech bubbles. The biggest problem is that it fails to segment unusually shaped speech bubbles (hard cases). Next, performance on transparent speech bubbles is low: if there are letters around a transparent speech bubble, the model may predict the letters as part of the bubble; if the speech bubble is too transparent, the model cannot detect it at all; and if other elements within the cut are transparent, the model may predict them as speech bubbles. Finally, when the letters inside a speech bubble are distorted, the model fails to predict it.

Improvement points are summarized as follows.

Improvement Points

  • The model needs to recognize unusual speech-bubble cases (complicated decoration, gradation).
  • The model needs to separate transparent speech bubbles from nearby line-text letters when the two overlap.
  • The model needs to recognize speech bubbles with high transparency.
  • The model needs to recognize only speech bubbles, not other features (transparent backgrounds, text, etc.).
  • The model needs to recognize speech bubbles with distorted internal letters, such as sound-effect line text.

Reference

  1. qubvel, segmentation_models.pytorch
  2. Belval, TextRecognitionDataGenerator
