GitHub - Naosekpam/EMBiL-English-Manipuri-Benchmark-for-scene-text-detection-and-language-identification: A curated bi-lingual scene text detection and language identification benchmark dataset named EMBiL, comprising English and Manipuri texts embedded in the scene images. The paper is accepted at CAIP '23.

EMBiL-Dataset

EMBiL is a curated bi-lingual benchmark dataset for scene text detection and language identification. It comprises English and Manipuri (Meitei Mayek / Meetei Mayek) text embedded in scene images. The paper has been accepted at CAIP '23, to be held in Limassol, Cyprus, in September.

The Manipuri language (written in the "Meetei Mayek" script) is one of India's officially recognized scheduled languages. Statistically, it is used by only 0.15% (3.6 million out of 1.4 billion) of India's total population.

The dataset includes a variety of naturally occurring visual noise and distortions, collected from diverse scenarios such as local markets, billboards, navigation and traffic signs, graffiti, and shop banners. Owing to differences in language, culture, and history, scene text images from Manipur have distinctive features that combine English and Meetei Mayek text.

We describe the diversity of EMBiL at three levels: (1) image-level diversity, (2) scene-level diversity, and (3) text instance-level diversity.

EMBiL contains a total of 720 bi-lingual scene text images, divided into a 70% training set, a 20% validation set, and a 10% test set.
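As a rough illustration of what a 70/20/10 split of 720 images works out to, the sketch below partitions a list of image names into 504 training, 144 validation, and 72 test items. The file names, the `split_dataset` helper, and the seeded random shuffle are all assumptions made for illustration; the README does not describe how images were actually assigned to each subset, so the released split should be used where available.

```python
import random


def split_dataset(items, train_pct=70, val_pct=20, seed=0):
    """Shuffle items and split them into train/val/test lists.

    Hypothetical helper: the test set receives whatever remains after
    the train and validation portions are taken.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names; the real dataset's naming scheme may differ.
image_names = [f"img_{i:04d}.jpg" for i in range(720)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # 504 144 72
```

Integer percentages are used instead of floating-point fractions so that the subset sizes are exact for any dataset size.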

To request the complete dataset, email veronica.naosekpam@iiitg.ac.in.

Baseline architecture:

Please cite the following papers if you use the code or any part of it:

@inproceedings{naosekpam2023embil, 
  title={EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification}, 
  author={Naosekpam, Veronica and Islam, Mushtaq and Chourasia, Amul and Sahu, Nilkanta}, 
  booktitle={International Conference on Computer Analysis of Images and Patterns},
  pages={65--75},
  year={2023}, 
  organization={Springer} 
}

Naosekpam, Veronica, and Nilkanta Sahu. "Multi-label Indian scene text language identification." Intelligent Systems and Applications in Computer Vision (2023).
