Visual Re-ranking with Natural Language Understanding for Text Spotting. ACCV18

ahmedssabir/Visual-Semantic-Relatedness-with-Word-Embedding

Visual Semantic Relatedness with Word Embedding (SWE)

Improved implementation of the paper Visual Re-ranking with Natural Language Understanding for Text Spotting. Sabir et al. ACCV 2018.

Introduction

Many scene text recognition approaches are based on purely visual information and ignore the semantic relation between scene and text. In this paper, we tackle this problem from a natural language processing perspective to fill the gap between language and vision. We propose a post-processing approach to improve scene text recognition accuracy by using occurrence probabilities of words (unigram language model) and the semantic correlation between scene and text. For this, we initially rely on an off-the-shelf deep neural network, already trained with a large amount of data, which provides a series of text hypotheses per input image. These hypotheses are then re-ranked using word frequencies and semantic relatedness with objects or scenes in the image. As a result of this combination, the performance of the original network is boosted with almost no additional cost. We validate our approach on the ICDAR'17 dataset.

Model

Count-based word embedding visual re-ranker

Requirements

conda create -n Visual_w2v python=3.8 anaconda
conda activate Visual_w2v
pip install gensim==4.1.0

Data

Download the GloVe pre-trained word vectors glove.6B.300d.txt. Bigger is better: the 840B pre-trained word vectors are recommended. We use GloVe as the main embedding in this work. The advantage of GloVe over Word2Vec is that it does not rely only on local word-context information but also incorporates global co-occurrence statistics.

For w2v, download GoogleNews-vectors-negative300.bin

For fastText, download crawl-300d-2M.vec
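As a quick sanity check after downloading, the text-format vectors can be read and compared with a few lines of plain Python. This is a minimal sketch, not the repository's loader: `load_text_vectors` and `cosine` are hypothetical helper names, and the code assumes every line is a token followed by space-separated floats (the layout of glove.6B.300d.txt).

```python
import math


def load_text_vectors(path):
    """Read 'word v1 v2 ...' lines into a dict mapping word -> list of floats."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 3:
                # Skip blank lines and a "count dim" header line, if present.
                continue
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With gensim 4.x, `KeyedVectors.load_word2vec_format(path, binary=False, no_header=True)` should read the same GloVe text file, and `binary=True` is the usual setting for the GoogleNews .bin file.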

Quick Start

Familiarize yourself with the model architecture by running it in Colab

How to run

To use w2v/GloVe as a visual re-ranker, we need the following information:

  • The spotted text text_spotted.txt: word candidates from the baseline
  • The original hypothesis scores baseline.txt: softmax output from the baseline
  • The hypothesis LM.txt: initialized by common observation (i.e., the unigram LM)
  • Visual information from the image visual-context_label.txt: the visual context label predicted by the classifier
  • Visual information confidence visual-context_prob.txt: confidence from the classifier (i.e., ResNet152)
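To make the role of these five inputs concrete, here is a rough sketch of how they could be combined into a single re-ranking score. The exact formula used by the scripts is not stated in this README, so everything below is an assumption: `revise` is a hypothetical belief-revision-style conversion from similarity to probability, `rerank` multiplies the result by the baseline score, and all numeric inputs are toy values.

```python
def revise(prior, sim, context_conf):
    """Assumed update: prior ** alpha, with alpha = ((1 - sim) / (1 + sim)) ** (1 - context_conf).

    alpha shrinks as sim grows, which pushes the revised probability toward 1.
    With sim == 0 the prior is returned unchanged.
    """
    sim = max(0.0, min(sim, 0.999999))  # clamp so alpha stays in (0, 1]
    alpha = ((1.0 - sim) / (1.0 + sim)) ** (1.0 - context_conf)
    return prior ** alpha


def rerank(candidates, similarity, context_conf):
    """Sort (word, baseline_score, unigram_prob) triples by baseline * revised prior."""
    scored = []
    for word, baseline, unigram in candidates:
        visual = revise(unigram, similarity[word], context_conf)
        scored.append((word, baseline * visual))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy inputs: the unigram probabilities and similarities are made up for illustration.
toy_candidates = [("quartos", 0.060192, 1e-7), ("quarters", 0.03037, 1e-5)]
toy_similarity = {"quartos": 0.0, "quarters": 0.5}  # similarity to the visual context label
ranking = rerank(toy_candidates, toy_similarity, context_conf=0.5)
```

In this sketch a candidate with a lower baseline score can overtake the top hypothesis when it is both more frequent and more related to the visual context, which is the effect the re-ranker aims for.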

Once all the required information is available, run the re-ranker as shown in Example 1 (below)

For GloVe

quarters-example/python glove-visual.py --ulm LM.txt --bl baseline.txt --text spotted-text.txt --vis visual-context_label.txt --vis_prob visual-context_prob.txt

For w2v

quarters-example/python w2v-visual.py --ulm LM.txt --bl baseline.txt --text spotted-text.txt --vis visual-context_label.txt --vis_prob visual-context_prob.txt

For fastText

quarters-example/python fastext-visual.py --ulm LM.txt --bl baseline.txt --text spotted-text.txt --vis visual-context_label.txt --vis_prob visual-context_prob.txt
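The three commands share the same five flags. A minimal sketch of how such flags could be parsed and the files read with `argparse` is shown below. It is illustrative only, not the scripts' actual code: `build_parser`, `read_lines`, and `load_inputs` are hypothetical names, and the one-entry-per-line, line-aligned file layout is an assumption.

```python
import argparse


def build_parser():
    """Declare the five flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Inputs for the visual re-ranker (sketch)")
    parser.add_argument("--ulm", required=True, help="unigram LM file, e.g. LM.txt")
    parser.add_argument("--bl", required=True, help="baseline softmax scores, e.g. baseline.txt")
    parser.add_argument("--text", required=True, help="spotted word candidates")
    parser.add_argument("--vis", required=True, help="visual context labels")
    parser.add_argument("--vis_prob", required=True, help="visual context confidences")
    return parser


def read_lines(path):
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def load_inputs(argv=None):
    """Parse the flags and load each file, checking that all files have the same length."""
    args = build_parser().parse_args(argv)
    inputs = {name: read_lines(path) for name, path in vars(args).items()}
    lengths = {name: len(rows) for name, rows in inputs.items()}
    if len(set(lengths.values())) != 1:
        # Assumption: line i of every file describes the same candidate.
        raise ValueError(f"input files are not aligned: {lengths}")
    return inputs
```

Checking alignment up front avoids silently pairing a candidate word with the score or visual context of a different line.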

Example 1

Original baseline softmax score

quartos  0.060192
quotas   0.040944
quarters 0.03037

After visual re-ranking (visual_glove_result.txt)

quarters 7.040899415659617e-06
quotas   4.0903987856408736e-07
quartos  2.0644119047556385e-09

Example 2

Original baseline softmax score

stook 0.4865732956
sioux 0.0919743552
stock 0.0703927792

After visual re-ranking (visual_glove_result.txt)

stock 0.00018136249963338343
sioux 7.23838175424e-06
stook 8.07711670696e-07

Citation

Please use the following bibtex entry:

@inproceedings{sabir2018visual,
  title={Visual re-ranking with natural language understanding for text spotting},
  author={Sabir, Ahmed and Moreno-Noguer, Francesc and Padr{\'o}, Llu{\'\i}s},
  booktitle={Asian Conference on Computer Vision},
  pages={68--82},
  year={2018},
  organization={Springer}
}
