Language Classifier

Install chardet, the library used to detect the character encoding of the Unicode input files

$ pip3 install chardet

$ python3 build_stat.py <LANGUAGE DATA> <TARGET DATA>

Run this command once for each of the two languages, SINHALA and TAMIL.
Then run it once more for the TEST SAMPLE whose language needs to be detected.
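
This README does not describe what build_stat.py computes. As a rough sketch only, assuming the statistics are relative frequencies of adjacent character pairs within each word, a stats-building step could look like the following. The function name `pair_frequencies` is hypothetical and does not come from the repository.

```python
from collections import Counter


def pair_frequencies(text):
    """Return the relative frequency of each adjacent character pair.

    Assumption: a "combination" is two neighbouring characters inside
    one whitespace-separated word. The real build_stat.py may differ.
    """
    counts = Counter()
    for word in text.split():
        # zip(word, word[1:]) yields every pair of neighbouring characters.
        counts.update(a + b for a, b in zip(word, word[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


if __name__ == "__main__":
    # Works on any Unicode string, including Sinhala or Tamil text.
    for pair, freq in sorted(pair_frequencies("abab ab").items()):
        print(pair, freq)
```

Normalising by the total count keeps the statistics comparable between a large language corpus and a short test sample.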

$ python3 detector.py <TARGET STATS> <SINHALA STATS> <TAMIL STATS>

Language detected: SINHALA

MAE: 1.0206944116 < 1.70093005005

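MAE: Mean Absolute Error between the target stats and the stats of each language. The language with the lower error is reported.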
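
The comparison in the sample output can be sketched as below. This is an illustration under assumptions, not the code in detector.py: the stats are assumed to be dictionaries that map a combination to a number, and the names `mean_absolute_error` and `detect_language` are hypothetical.

```python
def mean_absolute_error(target, reference):
    """Average of |target - reference| over every key seen in either dict.

    A key missing from one side is treated as 0.0 (an assumption).
    """
    keys = set(target) | set(reference)
    if not keys:
        return 0.0
    total = sum(abs(target.get(k, 0.0) - reference.get(k, 0.0)) for k in keys)
    return total / len(keys)


def detect_language(target, candidates):
    """Return (best_language, errors), where best has the lowest MAE."""
    errors = {
        name: mean_absolute_error(target, stats)
        for name, stats in candidates.items()
    }
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    # Toy numbers for illustration only, not taken from the repository.
    target = {"a": 1.0, "b": 2.0}
    candidates = {
        "SINHALA": {"a": 1.5, "b": 2.5},
        "TAMIL": {"a": 3.0, "b": 0.0},
    }
    best, errors = detect_language(target, candidates)
    print("Language detected:", best)
    print("MAE:", errors)
```

Picking the minimum error mirrors the `1.0206944116 < 1.70093005005` line above, where the smaller value belongs to the detected language.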

Stat analysis process

Install matplotlib to draw the plots

$ pip3 install matplotlib

$ cd stat_analysis

$ python3 plotter.py

Plot the variation of the combinations for each language
Identify the critical combinations to be tested, that is, those that demonstrate a significant difference in distribution between the languages
Define tolerance values in the detector.py file
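
One possible way to shortlist critical combinations before inspecting the plots is sketched here. It assumes each language's stats are a dictionary from combination to frequency and that a single absolute tolerance is used. Both the function name `critical_combinations` and the tolerance value are hypothetical, and detector.py may define its tolerances differently.

```python
def critical_combinations(stats_a, stats_b, tolerance):
    """List combinations whose frequencies differ by more than `tolerance`.

    A combination missing from one language counts as frequency 0.0
    (an assumption made for this sketch).
    """
    keys = set(stats_a) | set(stats_b)
    return sorted(
        k for k in keys
        if abs(stats_a.get(k, 0.0) - stats_b.get(k, 0.0)) > tolerance
    )


if __name__ == "__main__":
    # Toy frequencies for illustration only.
    sinhala = {"ka": 0.30, "ma": 0.10}
    tamil = {"ka": 0.05, "ma": 0.12, "ta": 0.20}
    print(critical_combinations(sinhala, tamil, tolerance=0.1))
```

A larger tolerance keeps only the combinations that separate the two languages most clearly, at the cost of having fewer features to test.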
