Given an image of text written in some language, can computer vision tools identify the script of that language? We explore this question by extracting features from the images, building a bag-of-visual-words model from those features, and classifying the resulting representations.
This project is implemented in MATLAB.
The entry point of the project is the manager.m file.
The dataset is stored in the finalDataset folder. The file createDataset > generateMatFile.m generates .mat files for the entire dataset in the dataMatlabFormat folder. The filterDataset.m file trims the dataset. To change which languages the model is trained on, select a subset of languages in manager.m by changing the number of languages and the language names.
This project extracts SIFT features from the dataset images. The code for this is in the sift folder.
The bag-of-words model is created in the clusterFeatures > getClusterFeatures.m file.
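To illustrate the idea behind a bag-of-visual-words model, here is a minimal Python sketch. It is not the project's MATLAB code. It assumes the visual vocabulary is built with k-means clustering, which is the usual choice but is not confirmed by this README. The function names, the evenly spaced center initialization, and the toy 2-D descriptors (real SIFT descriptors have 128 dimensions) are all hypothetical.

```python
def squared_distance(a, b):
    # Squared Euclidean distance between two descriptors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_center(descriptor, centers):
    # Index of the visual word (cluster center) closest to the descriptor.
    return min(range(len(centers)),
               key=lambda i: squared_distance(descriptor, centers[i]))

def kmeans(descriptors, k, iterations=20):
    # Simplified k-means; centers start at evenly spaced descriptors
    # (an assumption made here to keep the sketch deterministic).
    centers = [list(descriptors[i * len(descriptors) // k]) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_center(d, centers)].append(d)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def bow_histogram(descriptors, centers):
    # Normalized count of how often each visual word occurs in one image.
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest_center(d, centers)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy descriptors pooled from all training images (hypothetical data).
pool = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
        [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
vocabulary = kmeans(pool, k=2)

# Descriptors from a single image, turned into a fixed-length feature vector.
hist = bow_histogram([[0.0, 0.05], [5.0, 5.05], [5.05, 5.0]], vocabulary)
```

Each image ends up with a histogram of the same length as the vocabulary, regardless of how many descriptors were extracted from it, which is what lets a standard classifier consume the result.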
This project uses two classifiers: a linear classifier and a random forest classifier.
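As a rough illustration of how a linear classifier can separate scripts from bag-of-visual-words histograms, here is a small Python sketch. This README does not say which linear classifier the project uses, so a multiclass perceptron is used as a stand-in. The function names, the toy histograms, and the labels 0 and 1 (two hypothetical scripts) are assumptions. The random forest classifier is not sketched here.

```python
def score(weights_row, x):
    # Linear score w . x + b; the bias is stored as the last weight.
    return sum(w * v for w, v in zip(weights_row, x)) + weights_row[-1]

def predict(weights, x):
    # Class with the highest linear score (ties go to the lowest index).
    return max(range(len(weights)), key=lambda c: score(weights[c], x))

def train_perceptron(samples, labels, num_classes, epochs=50):
    # Multiclass perceptron: a stand-in linear classifier, not
    # necessarily the one used in the project.
    dim = len(samples[0])
    weights = [[0.0] * (dim + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            guess = predict(weights, x)
            if guess != y:
                # Pull the true class toward x, push the wrong guess away.
                for j, v in enumerate(x):
                    weights[y][j] += v
                    weights[guess][j] -= v
                weights[y][-1] += 1.0
                weights[guess][-1] -= 1.0
    return weights

# Toy two-word histograms for two hypothetical scripts.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = [0, 0, 1, 1]
weights = train_perceptron(train_x, train_y, num_classes=2)

label_a = predict(weights, [0.7, 0.3])
label_b = predict(weights, [0.3, 0.7])
```

With this toy data, `label_a` is 0 and `label_b` is 1, since each test histogram is dominated by the visual word that characterizes the corresponding training class.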