Script Identification in Natural Scene Image and Video Frames using an Attention-based Convolutional-LSTM Network [PDF][Arxiv]

https://www.sciencedirect.com/science/article/pii/S0031320318302590

Published in Pattern Recognition 2019

This repository contains the code and instructions for using the trained models on all four datasets described in the paper.


Introduction

Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video scripts. Because of low image quality, complex backgrounds and the similar layout of characters shared by some scripts such as Greek and Latin, text recognition in those cases becomes challenging.

Pre-requisites

  • Python 2.7
  • TensorFlow
  • OpenCV
  • matplotlib
  • Jupyter Notebook

Training

  • Network Architecture

In this paper, we propose a novel method that extracts local and global features using a CNN-LSTM framework and weights them dynamically for script identification. First, we convert the images into patches and feed them into the CNN-LSTM framework. Attention-based patch weights are calculated by applying a softmax layer after the LSTM. These weights are then multiplied patch-wise with the corresponding CNN features to yield the local features. Global features are also extracted from the last cell state of the LSTM. Finally, we employ a fusion technique that dynamically weights the local and global features for each individual patch.
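
The following is a minimal sketch of this idea written with tf.keras; it is not the repository's Python 2.7 / TF 1.x code. The patch count, feature sizes and the sigmoid fusion gate are illustrative assumptions standing in for the paper's dynamic weighting scheme.

  # Minimal sketch of the attention-based CNN-LSTM idea described above.
  # Patch count, feature sizes and the fusion gate are illustrative assumptions.
  import tensorflow as tf

  NUM_PATCHES, PATCH_H, PATCH_W, NUM_CLASSES = 8, 32, 32, 10   # assumed values

  def patch_cnn():
      # Shared CNN that maps each patch to a fixed-length feature vector.
      return tf.keras.Sequential([
          tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
          tf.keras.layers.MaxPool2D(),
          tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
          tf.keras.layers.GlobalAveragePooling2D(),
      ])

  patches = tf.keras.Input(shape=(NUM_PATCHES, PATCH_H, PATCH_W, 3))

  # Per-patch CNN features: (batch, num_patches, feat_dim).
  feats = tf.keras.layers.TimeDistributed(patch_cnn())(patches)

  # LSTM over the patch sequence; keep per-step outputs and the final cell state.
  seq, last_h, last_c = tf.keras.layers.LSTM(
      128, return_sequences=True, return_state=True)(feats)

  # Attention: one score per patch, softmax over the patch axis.
  scores = tf.keras.layers.Dense(1)(seq)                      # (batch, P, 1)
  alpha = tf.keras.layers.Softmax(axis=1)(scores)             # patch weights

  # Local feature: attention-weighted sum of the patch CNN features.
  context = tf.keras.layers.Lambda(
      lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([alpha, feats])
  local = tf.keras.layers.Dense(128)(context)

  # Global feature: the last LSTM cell state.
  global_feat = last_c

  # Dynamic fusion (illustrative): a learned gate mixes local and global features.
  gate = tf.keras.layers.Dense(128, activation='sigmoid')(
      tf.keras.layers.Concatenate()([local, global_feat]))
  fused = tf.keras.layers.Lambda(
      lambda t: t[0] * t[1] + (1.0 - t[0]) * t[2])([gate, local, global_feat])

  logits = tf.keras.layers.Dense(NUM_CLASSES)(fused)
  model = tf.keras.Model(patches, logits)

In this sketch the sigmoid gate merely stands in for the paper's dynamic weighting; the repository's actual computation graph can be inspected with the TensorBoard command below.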


You can also interactively visualize the tensorboard graph by running the following:

tensorboard --logdir='logfiles'

  • Training curves

TensorBoard scalar plots of the loss function and accuracy over the number of epochs on the CVSI-15 dataset


  • Visualize Attention maps


We can visualize the relative importance of the patches from the attention maps. High attention is shown in white and low attention is shown in black.
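
Below is a small sketch (assumed, not the repository's plotting code) of how per-patch attention weights could be rendered as such a grayscale strip with matplotlib; the weight values are made-up examples.

  # Render per-patch attention weights as a grayscale strip
  # (white = high attention, black = low attention).
  import numpy as np
  import matplotlib.pyplot as plt

  weights = np.array([0.02, 0.05, 0.30, 0.40, 0.15, 0.05, 0.02, 0.01])  # example values

  fig, ax = plt.subplots(figsize=(6, 1))
  ax.imshow(weights[np.newaxis, :], cmap='gray', vmin=0.0,
            vmax=weights.max(), aspect='auto')
  ax.set_yticks([])
  ax.set_xticks(range(len(weights)))
  ax.set_xlabel('patch index')
  plt.show()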

Datasets & Trained models

We evaluated our model on four datasets: CVSI-2015, SIW-13, ICDAR-2017 and MLe2e. To facilitate further research, we are making the trained models for all four datasets available.

| Dataset  | Trained Model | Accuracy |
|----------|---------------|----------|
| CVSI-15  | Download      | 97.75%   |
| SIW-13   | Download      | 96.5%    |
| ICDAR-17 | Download      | 90.23%   |
| MLe2e    | Download      | 96.7%    |

Usage

Clone or download this repository. Download the pre-trained models and unzip them into the Models/<dataset_name> folder. Then open Jupyter Notebook, open the main.ipynb file and execute the code.
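
For reference, here is a hypothetical sketch of restoring one of the downloaded TF 1.x checkpoints outside the notebook. The checkpoint directory and the tensor names are placeholders; the actual names are defined in main.ipynb and the saved graph.

  # Hypothetical sketch of restoring a downloaded checkpoint outside the notebook.
  # The checkpoint directory and tensor names are placeholders -- use the names
  # actually defined in main.ipynb and the saved graph.
  import tensorflow as tf

  ckpt_dir = 'Models/CVSI-15'                      # after unzipping a trained model
  ckpt = tf.train.latest_checkpoint(ckpt_dir)

  with tf.Session() as sess:
      saver = tf.train.import_meta_graph(ckpt + '.meta')
      saver.restore(sess, ckpt)
      graph = tf.get_default_graph()
      x = graph.get_tensor_by_name('input:0')        # placeholder name
      logits = graph.get_tensor_by_name('logits:0')  # placeholder name
      # predictions = sess.run(logits, feed_dict={x: batch_of_patches})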

Citations

If you find this code useful in your research, please consider citing:

  @article{BHUNIA2018,
  title = "Script Identification in Natural Scene Image and Video Frames using an Attention based Convolutional-LSTM Network",
  journal = "Pattern Recognition",
  year = "2018",
  issn = "0031-3203",
  doi = "https://doi.org/10.1016/j.patcog.2018.07.034",
  url = "http://www.sciencedirect.com/science/article/pii/S0031320318302590",
  author = "Ankan Kumar Bhunia and Aishik Konwer and Ayan Kumar Bhunia and Abir Bhowmick and Partha P. Roy and Umapada Pal",
  keywords = "Script Identification, Convolutional Neural Network, Long Short-Term Memory, Local feature, Global feature, Attention Network, Dynamic Weighting"
  }

Authors
