Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network [PDF][Arxiv]
https://www.sciencedirect.com/science/article/pii/S0031320318302590
Published in Pattern Recognition 2019
This repository contains the code and instructions for using the trained models for all four datasets described in the paper.
Script identification plays a significant role in analysing documents and videos. In this paper, we focus on the problem of script identification in scene text images and video scripts. Because of low image quality, complex backgrounds and the similar layout of characters shared by some scripts such as Greek and Latin, text recognition in those cases becomes challenging.
- Python 2.7
- TensorFlow
- OpenCV
- matplotlib
- Jupyter Notebook
In this paper, we propose a novel method that extracts local and global features using a CNN-LSTM framework and weights them dynamically for script identification. First, we convert the images into patches and feed them into the CNN-LSTM framework. Attention-based patch weights are calculated by applying a softmax layer after the LSTM. These weights are then multiplied patch-wise with the corresponding CNN features to yield the local features. Global features are also extracted from the last cell state of the LSTM. Finally, we employ a fusion technique that dynamically weights the local and global features for each individual patch.
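The following is a minimal NumPy sketch of the attention-weighted local/global fusion described above. The shapes, the random stand-in features, the attention vector `w_att` and the gating form `w_gate` are illustrative assumptions, not the exact implementation in this repository.

```python
# Sketch of attention-weighted fusion of local (CNN) and global (LSTM) features.
# All parameters below are random placeholders; in the real model they are learned.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

num_patches, feat_dim = 8, 256                       # assumed patch count / feature size

cnn_feats = np.random.randn(num_patches, feat_dim)   # per-patch CNN features F_1..F_N
lstm_outs = np.random.randn(num_patches, feat_dim)   # per-patch LSTM outputs h_1..h_N
last_cell = np.random.randn(feat_dim)                # last LSTM cell state -> global feature

# Attention: one score per patch from its LSTM output, normalised with softmax.
w_att  = np.random.randn(feat_dim)                   # assumed learned attention vector
alpha  = softmax(lstm_outs @ w_att)                  # patch attention weights

# Local feature: attention-weighted (patch-wise) combination of the CNN features.
local_feat  = (alpha[:, None] * cnn_feats).sum(axis=0)
global_feat = last_cell

# Dynamic weighting: a learned gate decides how much each stream contributes.
w_gate = np.random.randn(2 * feat_dim)               # assumed gate parameters
gate   = 1.0 / (1.0 + np.exp(-(np.concatenate([local_feat, global_feat]) @ w_gate)))
fused  = gate * local_feat + (1.0 - gate) * global_feat
print(fused.shape)                                   # (256,) -> fed to the classifier
```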
You can also interactively visualize the TensorBoard graph by running the following:

```
tensorboard --logdir='logfiles'
```
TensorBoard scalar plots of the loss function and accuracy over the number of epochs on the CVSI-15 dataset
We can visualize the relative importance of the patches from the attention maps. High attention is shown in white and low attention is shown in black.
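Below is a minimal matplotlib sketch of rendering per-patch attention as a grayscale map (white = high, black = low). The patch grid shape and the random attention values are placeholders, not outputs of the trained model.

```python
# Render per-patch attention weights as a grayscale strip.
import numpy as np
import matplotlib.pyplot as plt

alpha = np.random.rand(1, 8)                                  # assumed: one weight per horizontal patch
alpha = (alpha - alpha.min()) / (alpha.max() - alpha.min() + 1e-8)  # normalise to [0, 1]

plt.imshow(alpha, cmap='gray', vmin=0.0, vmax=1.0, aspect='auto')
plt.xticks([]); plt.yticks([])
plt.title('Patch attention (white = high, black = low)')
plt.show()
```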
We evaluated our model on four datasets: CVSI-2015, SIW-13, ICDAR-2017 and MLe2e. To facilitate further research, we are making the trained models for all four datasets available.
Dataset | Trained Model | Accuracy |
---|---|---|
CVSI-15 | Download | 97.75% |
SIW-13 | Download | 96.5% |
ICDAR-17 | Download | 90.23% |
MLe2e | Download | 96.7% |
Clone or download this repository. Download the pre-trained models and unzip them into the `Models/<dataset_name>` folder. Then launch Jupyter Notebook, open the `main.ipynb` file and execute the code.
If you find this code useful in your research, please consider citing:
@article{BHUNIA2018,
title = "Script Identification in Natural Scene Image and Video Frames using an Attention based Convolutional-LSTM Network",
journal = "Pattern Recognition",
year = "2018",
issn = "0031-3203",
doi = "https://doi.org/10.1016/j.patcog.2018.07.034",
url = "http://www.sciencedirect.com/science/article/pii/S0031320318302590",
author = "Ankan Kumar Bhunia and Aishik Konwer and Ayan Kumar Bhunia and Abir Bhowmick and Partha P. Roy and Umapada Pal",
keywords = "Script Identification, Convolutional Neural Network, Long Short-Term Memory, Local feature, Global feature, Attention Network, Dynamic Weighting"
}
*Both authors contributed equally.