From 40f43111e05b3dd2f2f8aeae3aba33016523c881 Mon Sep 17 00:00:00 2001
From: Shreeshrii
Date: Sat, 24 Feb 2018 14:07:25 +0530
Subject: [PATCH] Add list of scripts to manpage for tesseract (#1347)

---
 doc/tesseract.1.asc | 49 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/doc/tesseract.1.asc b/doc/tesseract.1.asc
index 053fca8da7..029a06050e 100644
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -244,6 +244,55 @@ To use a non-standard language pack named *foo.traineddata*, set the
 *TESSDATA_PREFIX*/tessdata/*foo*.traineddata and give Tesseract the
 argument '-l foo'.
 
+SCRIPTS
+-------
+
+The traineddata files for the following scripts for tesseract 4.00
+are also in https://github.com/tesseract-ocr/tessdata_fast.
+
+In most cases, each of these contains all the languages that use that script PLUS English.
+So it is possible to recognize a language that has not been specifically trained for
+by using traineddata for the script it is written in.
+
+Arabic,
+Armenian,
+Bengali,
+Canadian Aboriginal,
+Cherokee,
+Cyrillic,
+Devanagari,
+Ethiopic,
+Fraktur,
+Georgian,
+Greek,
+Gujarati,
+Gurmukhi,
+Han - Simplified,
+Han - Simplified (vertical),
+Han - Traditional,
+Han - Traditional (vertical),
+Hangul,
+Hangul (vertical),
+Hebrew,
+Japanese,
+Japanese (vertical),
+Kannada,
+Khmer,
+Lao,
+Latin,
+Malayalam,
+Myanmar,
+Oriya (Odia),
+Sinhala,
+Syriac,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Vietnamese.
+
+
 CONFIG FILES AND AUGMENTING WITH USER DATA
 ------------------------------------------
 