Mongolian Script OCR

Synthetic Dataset

To generate a synthetic dataset from Mongolian song lyrics and a dictionary, first install all the fonts in the fonts folder. Then run the following commands:

mkdir images
./generate_from_dictionary.py > synthetic.csv
./generate_from_lyrics.py >> synthetic.csv
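
The same steps can also be driven from Python, which makes a failing generator stop the run instead of leaving a silently truncated CSV. This is only a sketch: `build_synthetic_csv` is a hypothetical helper that is not part of this repository, and it assumes both scripts run under the current Python interpreter and write their CSV rows to standard output, as the shell redirection above suggests.

```python
import subprocess
import sys
from pathlib import Path


def build_synthetic_csv(scripts, out_path="synthetic.csv", image_dir="images",
                        runner=subprocess.run):
    """Run each generator script in order and collect their stdout in one file.

    Hypothetical helper, not part of the repository.
    """
    # Same effect as `mkdir images`, but safe to repeat.
    Path(image_dir).mkdir(exist_ok=True)
    with open(out_path, "wb") as out:
        for script in scripts:
            # Every child process writes to the same open file, so later
            # scripts append after earlier ones (like `>` followed by `>>`).
            # check=True raises CalledProcessError on a non-zero exit code.
            runner([sys.executable, script], stdout=out, check=True)
    return Path(out_path)
```

Calling `build_synthetic_csv(["generate_from_dictionary.py", "generate_from_lyrics.py"])` from the repository folder should then produce the same `synthetic.csv` as the two shell commands.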

Alternatively, you can download a pre-generated synthetic dataset from here.

Train

To be released.

Eval

Download a pre-trained model from here. To run OCR on an image, execute:

python ocr.py --checkpoint image2bichig-epoch-0157.pth test.jpg
ᠮᠢᠨᠦ ᠨᠤᠲᠠᠭ
ᠬᠡᠨᠲᠡᠢ ᠂ ᠬᠠᠩᠭᠠᠢ᠂ ᠱᠣᠶᠣᠨ ᠤ ᠥᠨᠳᠥᠷ ᠰᠠᠶ᠋ᠢᠬᠠᠨ ᠨᠢᠷᠤᠭᠤᠨ ᠤᠳᠨ
ᠬᠣᠶᠢᠲᠤ ᠵᠦᠭ ᠦᠨ ᠴᠢᠮᠡᠭ ᠪᠣᠯᠤᠭᠰᠠᠨ ᠣᠢ ᠬᠥᠪᠴᠢ ᠶᠢᠨ ᠠᠭᠤᠯᠠᠨ ᠤᠳ
ᠮᠠᠨᠠᠨ  ᠮᠠᠷᠭ᠎ᠠ᠂ ᠨᠣᠮᠢᠨ ᠤ ᠥᠷᠭᠡᠨ ᠶᠡᠬᠡ ᠭᠣᠪᠢ ᠤᠳᠨ
ᠡᠮᠦᠨ᠎ᠡ ᠵᠦᠭ ᠦᠨ ᠮᠠᠩᠯᠠᠢ ᠪᠣᠯᠤᠭᠰᠠᠨ ᠡᠯᠡᠯᠡᠳ ᠮᠠᠩᠬᠠᠨ ᠳᠠᠯᠠᠢ ᠤᠳ
 ᠡᠨᠡ ᠪᠣᠯ ᠮᠢᠨᠦ ᠲᠦᠷᠦᠭᠰᠡᠨ ᠨᠤᠲᠤᠭ ᠮᠣᠩᠭᠣᠯ ᠤᠨ ᠰᠠᠶ᠋ᠢᠬᠠᠨ ᠣᠷᠣᠨ
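
To recognize a whole folder of images, the command above can be wrapped in a short loop. The sketch below is an illustration only: `ocr_command` and `ocr_folder` are hypothetical helpers that are not part of this repository, and it assumes `ocr.py` prints the recognized text to standard output as UTF-8, as in the example above.

```python
import subprocess
import sys
from pathlib import Path

# Checkpoint file name taken from the example command above.
DEFAULT_CHECKPOINT = "image2bichig-epoch-0157.pth"


def ocr_command(image, checkpoint=DEFAULT_CHECKPOINT):
    """Build the command line from the example above for one image."""
    return [sys.executable, "ocr.py", "--checkpoint", str(checkpoint), str(image)]


def ocr_folder(folder, checkpoint=DEFAULT_CHECKPOINT, pattern="*.jpg",
               runner=subprocess.run):
    """Run ocr.py on every matching image; return {file name: recognized text}.

    Hypothetical helper, not part of the repository.
    """
    results = {}
    for image in sorted(Path(folder).glob(pattern)):
        # Assumption: ocr.py writes the recognized text to stdout as UTF-8.
        done = runner(ocr_command(image, checkpoint), capture_output=True,
                      encoding="utf-8", check=True)
        results[image.name] = done.stdout.strip()
    return results
```

For example, `ocr_folder("scans")` would call `ocr.py` once per `.jpg` file in a `scans` folder. The model is reloaded for every image, so this is convenient rather than fast.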

You can also try it online on Colab here.