Sinhala Optical Character Recognition (OCR) using Neural Network

Main Library:

TensorFlow 1.14
OpenCV 4.5.2

Dataset

All images in the dataset are extracted from the following Sinhala fonts:

BhashitaComplex
Iskoola Potha
LK-LUG
Nirmala UI
Noto Sans Sinhala
Noto Sans Sinhala Bold

classes = ['ක', 'ඛ', 'ග', 'ඝ', 'ඟ', 'ච', 'ඡ', 'ජ', 'ට', 'ඩ', 'න', 'ණ', 'ත', 'ථ', 'ද',
           'ධ', 'ප', 'ඵ', 'බ', 'භ', 'ම', 'ඹ', 'ය', 'ර', 'ල', 'ව', 'ශ', 'ෂ', 'ස', 'හ', 'ළ', 'ෆ']

There are 32 classes (letters). The training dataset has 6 examples per class, for a total of 6 x 32 = 192 examples. The testing dataset has 1 example per class, for a total of 1 x 32 = 32 examples.
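
The dataset sizes follow directly from the class list. A small check, using the `classes` list from this README; the names `TRAIN_PER_CLASS` and `TEST_PER_CLASS` are illustrative and are not taken from the notebook:

```python
# Class list copied from this README.
classes = ['ක', 'ඛ', 'ග', 'ඝ', 'ඟ', 'ච', 'ඡ', 'ජ', 'ට', 'ඩ', 'න', 'ණ', 'ත', 'ථ', 'ද',
           'ධ', 'ප', 'ඵ', 'බ', 'භ', 'ම', 'ඹ', 'ය', 'ර', 'ල', 'ව', 'ශ', 'ෂ', 'ස', 'හ', 'ළ', 'ෆ']

TRAIN_PER_CLASS = 6  # training examples per letter (hypothetical constant name)
TEST_PER_CLASS = 1   # testing examples per letter (hypothetical constant name)

num_classes = len(classes)
train_total = TRAIN_PER_CLASS * num_classes
test_total = TEST_PER_CLASS * num_classes

print(num_classes, train_total, test_total)  # 32 192 32
```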

Each example is a 70x70-pixel image, which is resized to 28x28.

Training

The model uses the basic ANN architecture from the TensorFlow v1 tutorials: https://github.com/tensorflow/docs/tree/master/site/en/r1/tutorials
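
A minimal Keras sketch of such a model, adapted to 32 output classes. The layer sizes match the model summary in this section. The ReLU and softmax activations, the dropout rate of 0.2, and the optimizer and loss are carried over from the TensorFlow beginner tutorial as assumptions, since this README does not state them.

```python
import tensorflow as tf

NUM_CLASSES = 32  # one output unit per Sinhala letter

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),             # 28x28 image -> 784 values
    tf.keras.layers.Dense(512, activation='relu'),             # activation is an assumption
    tf.keras.layers.Dropout(0.2),                              # rate is an assumption
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),  # class probabilities
])

# Optimizer and loss follow the tutorial, not this README.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```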

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                16416     
=================================================================
Total params: 418,336
Trainable params: 418,336
Non-trainable params: 0
_________________________________________________________________
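
The parameter counts in the summary above can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output unit. The helper `dense_params` is introduced here only for illustration.

```python
def dense_params(n_in, n_out):
    """Number of parameters in a dense layer: weights plus one bias per output unit."""
    return n_in * n_out + n_out

flat = 28 * 28                    # Flatten output size
hidden = dense_params(flat, 512)  # first Dense layer
output = dense_params(512, 32)    # final Dense layer, one unit per class
total = hidden + output           # Flatten and Dropout add no parameters

print(flat, hidden, output, total)  # 784 401920 16416 418336
```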

| Accuracy : 96.88% |
| Loss : 0.9942 |

Results

Limitations: Each letter must have enough margin around it. During contour detection, a character's height and width must each be between 32 and 320 pixels.

Future work: Extend the dataset with more characters.

chamara96/sinhala-letters-OCR
