Cyrillic-oriented MNIST. A dataset of Latin and Cyrillic letter images for text recognition.

Cyrillic-oriented MNIST

CoMNIST services

A repository of images of hand-written Cyrillic and Latin alphabet letters for machine learning applications.

The repository currently contains 28,000+ PNG images of 278x278 pixels, covering all 33 letters of the Russian alphabet and the 26 letters of the English alphabet. The images were drawn by hand on touch screens and collected through crowd-sourcing.
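As a quick sanity check after downloading, the dimensions of an image can be read straight from the PNG header without any imaging library. The following is a minimal sketch using only the Python standard library; the function name `png_size` is hypothetical and not part of CoMNIST.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file signature


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk at the start of a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
    if data[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    # Width and height are stored as two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Reading the first 24 bytes of a file in binary mode and passing them to `png_size` is enough; for an image from this dataset the result should match the 278x278 size stated above.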

The dataset will be regularly extended with more data as the collection progresses.

An API that reads words in images

CoMNIST also provides a web service that reads a drawing and identifies the word or letter you have drawn. Along with the image, you can submit an expected word and get back the original image with mismatches highlighted (for educational purposes).

The API is available at http://35.187.34.5:5002/api/word. It accepts a POST request with the following input:

{
    'img': Mandatory b64 encoded image, with letters in black on a white background
    'word': Optional string, the expected word to be read
    'lang': Mandatory string, either 'en' or 'ru', for the Latin or Cyrillic (Russian) alphabet respectively
    'nb_output': Mandatory integer, the "tolerance" of the engine
}
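The request described above could be assembled along these lines. This is a sketch only: the helper names `build_payload` and `post_word` are hypothetical, and sending the fields as a JSON body is an assumption, since the README does not say whether the service expects JSON or form-encoded data.

```python
import base64
import json
import urllib.request

API_URL = "http://35.187.34.5:5002/api/word"  # address given in this README


def build_payload(image_bytes, lang, nb_output, word=None):
    """Build the input dictionary with the fields listed above."""
    if lang not in ("en", "ru"):
        raise ValueError("lang must be 'en' or 'ru'")
    payload = {
        "img": base64.b64encode(image_bytes).decode("ascii"),
        "lang": lang,
        "nb_output": int(nb_output),
    }
    if word is not None:  # 'word' is optional, so it is only sent when given
        payload["word"] = word
    return payload


def post_word(payload, url=API_URL, timeout=10):
    """POST the payload and return the decoded reply.

    Assumption: the service accepts and returns JSON; it may instead
    expect form fields, in which case the encoding below must change.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `post_word(build_payload(png_bytes, "ru", 1, word="да"))`, where `png_bytes` holds the raw bytes of an image with black letters on a white background.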

The response contains the following:

{
    'img': b64 encoded image; if a word was supplied as input, a modified version of that image highlighting mismatches
    'word': string, the word that was read by the API
}
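Handling the reply might look like the sketch below. It assumes the reply has already been decoded into a Python dictionary with the two keys listed above; the function name `parse_response` is hypothetical.

```python
import base64


def parse_response(response):
    """Split a decoded reply into (word, image_bytes).

    Assumption: 'response' is a dict with a 'word' string and a
    base64-encoded 'img' string, as described above.
    """
    word = response["word"]
    image_bytes = base64.b64decode(response["img"])
    return word, image_bytes
```

The returned bytes can be written to a file opened in binary mode to inspect the highlighted image.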

Participate

The objective is to gather at least 1,000 images of each class, so your contribution is more than welcome! One minute of your time is enough, and don't hesitate to ask your friends and family to participate as well.

English version - Draw Latin-only letters plus letters common to Cyrillic and Latin

French version - Draw Latin-only letters plus letters common to Cyrillic and Latin

Russian version - Draw Cyrillic-only letters

Find out more about CoMNIST on my blog

Credits and license

A big thanks to all the contributors!

These images were crowd-sourced thanks to the great web design by Anna Migushina, available on her GitHub.

CoMNIST logo by Sophie Valenina

Creative Commons License
CoMNIST by Gregory Vial is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
