Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'


SSTDNet

An implementation of 'Single Shot Text Detector with Regional Attention' (ICCV 2017 Spotlight) in PyTorch.

This code works for the general object detection problem, not for the (oriented) text detection problem. I will probably update it to handle oriented bounding boxes as soon as possible :)

[How to use]

  1. You need a dataset.
  • The dataset structure is:

    /train/0.jpg, /train/0.txt, /valid/0.jpg, /valid/0.txt, ....

  • 0.txt contains the position and label of each object, one object per line, like below:

    (xmin, ymin, xmax, ymax, label)

    1273.0 935.0 1407.0 1017.0 v1

    911.0 893.0 979.0 953.0 v1

    984.0 889.0 1053.0 948.0 v1

  • To encode label names as integers, define the labels in 'class_label_map.xlsx':

    v1 1

    v2 2

    ....

    * Labels start from 1, not from 0. 0 is reserved for background (see loss.py).
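The annotation format and label mapping above can be sketched in plain Python. This is only an illustration, not the reader used in datagen.py: `build_label_map` and `parse_annotation` are hypothetical helper names, and the label map is given as an in-memory list instead of being read from class_label_map.xlsx.

```python
def build_label_map(pairs):
    """Build a name -> integer id mapping (hypothetical helper).

    Ids must start at 1 because 0 is reserved for background.
    """
    label_map = {}
    for name, idx in pairs:
        if idx < 1:
            raise ValueError(f"label id for {name!r} must be >= 1, got {idx}")
        label_map[name] = idx
    return label_map


def parse_annotation(text, label_map):
    """Parse lines of 'xmin ymin xmax ymax label' into tuples with integer labels."""
    boxes = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        xmin, ymin, xmax, ymax = (float(v) for v in fields[:4])
        boxes.append((xmin, ymin, xmax, ymax, label_map[fields[4]]))
    return boxes


# Same rows as the example 0.txt above.
sample = """1273.0 935.0 1407.0 1017.0 v1
911.0 893.0 979.0 953.0 v1
984.0 889.0 1053.0 948.0 v1
"""

label_map = build_label_map([("v1", 1), ("v2", 2)])
boxes = parse_annotation(sample, label_map)
for box in boxes:
    print(box)
```

A label that is missing from the map raises a KeyError here, which makes a mismatch between the .txt files and the label map easy to spot before training.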
  2. Configure the dataset reader.

    - See train.py, which contains the code for reading the dataset:

    
      trainset = ListDataset(root="../train", gt_extension=".txt", labelmap_path="class_label_map.xlsx", is_train=True, transform=transform, input_image_size=512, num_crops=n_crops, original_img_size=2048)
      
    • Set 'input_image_size' and 'original_img_size'. 'input_image_size' is the size of the (cropped) image used for training, and 'original_img_size' is the size of the original image. I added these parameters to handle high-resolution images. If you don't need the crop function, set num_crops to -1.
  3. Train with your dataset!

    Define the training parameters yourself, such as the learning rate, the optimizer, and the batch size.
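One way to keep these settings in one place is a small argparse block. This is a minimal sketch, not what train.py actually does: the flag names and default values below are hypothetical placeholders, so adjust them to your dataset and hardware.

```python
import argparse


def build_arg_parser():
    """Collect training settings in one place (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Training settings sketch")
    parser.add_argument("--lr", type=float, default=1e-3,
                        help="learning rate (placeholder default)")
    parser.add_argument("--optimizer", choices=["sgd", "adam"], default="sgd",
                        help="which optimizer to use")
    parser.add_argument("--batch-size", type=int, default=8,
                        help="number of images per batch (placeholder default)")
    parser.add_argument("--epochs", type=int, default=100,
                        help="number of training epochs (placeholder default)")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_arg_parser().parse_args(["--lr", "0.01", "--optimizer", "adam"])
print(args.lr, args.optimizer, args.batch_size, args.epochs)
```

The parsed values would then be passed to whatever optimizer and data loader you construct in your training script.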