This dataset has been used in the following papers:
- Visual Re-ranking with Natural Language Understanding for Text Spotting (paper, code)
- Semantic Relatedness Based Re-ranker for Text Spotting (paper, code)
Text Spotting in the Wild is a Computer Vision (CV) task that consists of detecting and recognizing text appearing in images (e.g. signboards, traffic signals, or brands on clothing or objects). It remains an unsolved problem due to the complexity of the contexts in which text appears (uneven backgrounds, shading, occlusions, perspective distortions, etc.). Only a few CV approaches try to exploit the relation between text and its surrounding environment to better recognize text in the scene. In this work, we propose a visual context dataset for Text Spotting in the wild, in which the publicly available COCO-text dataset (Veit et al., 2016) has been extended with information about the scene (such as objects and places appearing in the image), to enable researchers to include semantic relations between text and scene in their Text Spotting systems, and to offer a common framework for such approaches.
This dataset is based on COCO-text; please visit https://github.com/andreasveit/coco-text. COCO-text is in turn based on Microsoft COCO; please visit http://mscoco.org/ for more information on the COCO dataset, including the image data, object annotations, and caption annotations.
- COCO-text official API (Python 2.7)
- Run coco_api_modified.py to extract the full images with their ground truth (JSON files)
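A minimal Python sketch of this step, assuming the standard coco_text API from the repository above (the annotation-file path and train2014 image names are placeholders for your local setup):

```python
# List training images that contain legible text, together with their
# ground-truth transcriptions, using the COCO-Text API.
import coco_text

ct = coco_text.COCO_Text('COCO_Text.json')  # placeholder path

# Training images that have at least one legible text annotation.
img_ids = ct.getImgIds(imgIds=ct.train, catIds=[('legibility', 'legible')])

for img in ct.loadImgs(img_ids[:5]):
    anns = ct.loadAnns(ct.getAnnIds(imgIds=[img['id']]))
    words = [a.get('utf8_string', '') for a in anns]
    print(img['file_name'], words)
```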
- Matlab 2018 (you only need to run it once)
- MatConvNet, an open-source deep learning framework
- Download the most recent pre-trained SOTA object classifier, or ResNet152 (this code)
- Run Extract_BBox.m (file 1: the bounding boxes; file 2: the full images)
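If you prefer Python over MatConvNet, here is a hedged stand-in sketch that classifies an image with torchvision's pretrained ResNet152 instead of the MATLAB model; the image path is a placeholder, and this is not the repository's own pipeline:

```python
# Stand-in for the MatConvNet step: classify a full image (or a text-box
# crop) with a pretrained ResNet152 via torchvision.
import torch
from PIL import Image
from torchvision import models, transforms

model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('image.jpg').convert('RGB')).unsqueeze(0)  # placeholder path
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)
top_prob, top_class = probs.max(dim=1)
print(top_class.item(), top_prob.item())  # ImageNet class index and confidence
```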
- word level
- sentence level
- Image_id, spotted word (gt), objects, places
- Example:
COCO_train2014_000000000081.jpg,airfracne, airliner, airfield
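A minimal sketch for reading rows in this format; it assumes four comma-separated fields per row, and the file name visual_context_word.csv is a placeholder, not the released file name:

```python
# Read the word-level visual context file:
# image_id, spotted word (gt), object, place
import csv

with open('visual_context_word.csv', newline='') as f:  # placeholder name
    for image_id, word, obj, place in csv.reader(f):
        print(image_id, word.strip(), obj.strip(), place.strip())
```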
- Learning the similarity/distance between two objects/places can be useful to filter out duplicated cases and false-positive examples. Load the visual-pairs data and visual-pairs model, then query the precomputed model:
M = containers.Map(pairsobject, cosine_sim)
M('airfield airliner'), M('crosswalk plaza'), etc.
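In Python, the same lookup can be mirrored with a plain dict; the pair strings and scores below are illustrative placeholders, not the released model:

```python
# Map each "object place" pair string to its precomputed cosine similarity,
# mirroring the MATLAB containers.Map lookup above.
pairs = ['airfield airliner', 'crosswalk plaza']
cosine_sim = [0.54, 0.31]  # placeholder scores, not the released values

M = dict(zip(pairs, cosine_sim))
print(M['airfield airliner'])
```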
- For your own dataset, run sim.py with GloVe (840B tokens) for a better similarity score, or sim-fastText with fastText (600B tokens), which can handle out-of-vocabulary (OOV) words; see the Python sketch below. Also, you can visualize the word vectors using the notebook visualization-embedding.
* You can find the places model at Places365-CNNs.
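As a rough sketch of what sim.py computes (not its actual code), assuming gensim and GloVe vectors converted to word2vec text format; the vector file name is a placeholder:

```python
# Cosine similarity between two visual-context labels using pretrained
# word vectors (GloVe converted with gensim's glove2word2vec).
from gensim.models import KeyedVectors

vecs = KeyedVectors.load_word2vec_format('glove.840B.300d.w2v.txt')  # placeholder
print(vecs.similarity('airliner', 'airfield'))
print(vecs.similarity('crosswalk', 'plaza'))
```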
- word level
- sentence level
- This dataset is from the ICDAR2017 Robust Reading Challenge on COCO-Text, Task 3 (End-to-End Recognition)
- Image_id, spotted word (baseline), object_1, object_2, place
- Example:
COCO_train2014_000000273358.jpg,barber,street,ticket_booth, barbershop
- word level
- sentence level
- Image_id, spotted word (gt/baseline), caption, object, place
- Example:
COCO_train2014_000000000081.jpg, airfracne, a large jetliner flying through the sky with a sky background, airliner, airfield
- word level
- sentence level
- This dataset is from the ICDAR2017 Robust Reading Challenge on COCO-Text, Task 3 (End-to-End Recognition)
- Image_id, spotted word (baseline), object_1, object_2, place, caption
- word level
- sentence level
- spotted word (w) and places/objects (c): co-occurrence information between text and objects
- The conditional probability P(w|c) of a spotted word and an object/place occurring together in COCO-text (as in the example above, the sports channel kt with a racket): object-text-co-occurrence-(P(w|c).csv
- Run counting_pairs.py to count the pairs (spotted text, object/place) that occur together
- To get P(w|c) for pairs that occur together in COCO-text, load M = containers.Map(Pairs, Pairs_prob), then query the pairs: M(' kt racket'), M(' pay parking'), etc.
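A minimal Python sketch of the estimate behind this file, P(w|c) = count(w, c) / count(c), over (spotted word, object/place) pairs such as those produced by counting_pairs.py; the pairs below are toy placeholders:

```python
# Estimate P(w|c) from co-occurrence counts of (spotted word, context) pairs.
from collections import Counter

pairs = [('kt', 'racket'), ('kt', 'racket'), ('pay', 'parking'),
         ('stop', 'street'), ('pay', 'parking')]  # placeholder pairs

pair_count = Counter(pairs)                     # count(w, c)
context_count = Counter(c for _, c in pairs)    # count(c)

def p_w_given_c(w, c):
    return pair_count[(w, c)] / context_count[c]

print(p_w_given_c('kt', 'racket'))  # 1.0 in this toy sample
```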
- Matlab 2018
- Load the pre-trained dictionary: A) OpenSubtitles, or B) an enhanced version with Google n-grams
runMap = containers.Map(T3w, T3N) % A) dictionary
runMap = containers.Map(opensub_google_ngram_W, opensub_google_ngram_N) % B) dictionary
word = runMap('barcelona') % look up the word
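A rough Python equivalent of this lookup, treating the dictionary as a word-to-count map; the entries below are placeholders, not the OpenSubtitles/Google n-gram values:

```python
# Word -> count map used as a unigram frequency prior,
# e.g. for re-ranking text hypotheses.
freq = {'barcelona': 123456, 'barber': 98765}  # placeholder counts

total = sum(freq.values())
print(freq['barcelona'])           # raw count, like runMap('barcelona')
print(freq['barcelona'] / total)   # relative frequency
```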
🙋♂️ Suggestions and opinions about this dataset (both positive and negative) are greatly welcome 🙇♂️. Please contact the author by sending an email to asabir◎cs。upc。edu.
Please use the following BibTeX entry:
@inproceedings{sabir2020textual,
title={Textual visual semantic dataset for text spotting},
author={Sabir, Ahmed and Moreno-Noguer, Francesc and Padr{\'o}, Llu{\'\i}s},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
pages={542--543},
year={2020}
}