bhuiyanmobasshir94/Bengali.AI-Handwritten-Grapheme-Classification

Bengali is the 5th most spoken language in the world, with hundreds of millions of speakers. It’s the official language of Bangladesh and the second most spoken language in India. Given its reach, there’s significant business and educational interest in developing AI that can optically recognize images of handwritten Bengali. This challenge hopes to improve on approaches to Bengali recognition.

Optical character recognition is particularly challenging for Bengali. While Bengali has 49 letters in its alphabet (11 vowels and 38 consonants), there are also 18 potential diacritics, or accents. This means there are many more graphemes, the smallest units of a written language. The added complexity results in ~13,000 different grapheme variations (compared to English’s 250 graphemic units).

Bangladesh-based non-profit Bengali.AI is focused on helping to solve this problem. They build and release crowdsourced, metadata-rich datasets and open source them through research competitions. Through this work, Bengali.AI hopes to democratize and accelerate research in Bengali language technologies and to promote machine learning education.

For this competition, you’re given the image of a handwritten Bengali grapheme and are challenged to separately classify three constituent elements in the image: grapheme root, vowel diacritics, and consonant diacritics.
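
As a rough illustration of what "separately classify three constituent elements" means in practice, the sketch below treats each image as having three independent labels and scores each component on its own. The field names, the class IDs in the sample data, and the use of plain accuracy are assumptions made for illustration; they are not taken from this repository, and accuracy is not necessarily the competition's official metric.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed names for the three targets described above.
COMPONENTS = ("grapheme_root", "vowel_diacritic", "consonant_diacritic")


@dataclass(frozen=True)
class GraphemeLabel:
    """One label per component for a single handwritten grapheme image."""
    grapheme_root: int
    vowel_diacritic: int
    consonant_diacritic: int


def per_component_accuracy(
    y_true: List[GraphemeLabel], y_pred: List[GraphemeLabel]
) -> Dict[str, float]:
    """Return the fraction of correct predictions for each component separately."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    scores = {}
    for name in COMPONENTS:
        hits = sum(
            getattr(t, name) == getattr(p, name) for t, p in zip(y_true, y_pred)
        )
        scores[name] = hits / len(y_true)
    return scores


# Arbitrary class IDs, used only to exercise the function.
truth = [GraphemeLabel(15, 9, 5), GraphemeLabel(159, 0, 0), GraphemeLabel(22, 3, 5)]
preds = [GraphemeLabel(15, 9, 0), GraphemeLabel(159, 0, 0), GraphemeLabel(53, 3, 5)]
scores = per_component_accuracy(truth, preds)
print(scores)
```

Scoring the components separately makes it visible when a model handles diacritics well but struggles with the much larger set of grapheme roots, or vice versa.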

By participating in the competition, you’ll hopefully accelerate Bengali handwritten optical character recognition research and help enable the digitalization of educational resources. Moreover, the methods introduced in the competition will also empower cousin languages in the Indian subcontinent.

A summary of the challenge and dataset is available at https://arxiv.org/abs/2010.00170

Resources

  1. iterative-stratification
  2. pretrained-models.pytorch

Issues

  1. Regarding parquet
  2. Solving the parquet issue
  3. python-snappy had to be installed for the fastparquet engine to work:
conda install python-snappy

To install PyTorch

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch

Hacks

  • Run a standalone Python file that lives inside a package as a module
python -m module.file_name
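
The main reason to prefer `python -m` over running the file by path is that relative imports inside the package keep working. The following is a minimal, self-contained sketch of that behaviour; the package name `demo_pkg` and its files are hypothetical and are created in a temporary directory just for the demonstration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_module(root: Path, dotted_name: str) -> str:
    """Run `python -m dotted_name` with `root` as the working directory."""
    result = subprocess.run(
        [sys.executable, "-m", dotted_name],
        cwd=root,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = root / "demo_pkg"  # hypothetical package, not part of this repository
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text(
        "def greet():\n    return 'hello from demo_pkg'\n"
    )
    # train.py uses a relative import, which only resolves when the file
    # is executed as part of its package (python -m demo_pkg.train).
    (pkg / "train.py").write_text(
        "from .helpers import greet\n\n"
        "if __name__ == '__main__':\n"
        "    print(greet())\n"
    )
    output = run_as_module(root, "demo_pkg.train")

print(output)
```

Note that the command is run from the directory that contains the package, and the module is given in dotted form without the `.py` suffix.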
