From 035325dfd0e72607472e11f374d73aa25fe68536 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Zdenko=20Podobn=C3=BD?=
Date: Fri, 23 Feb 2018 11:19:18 +0100
Subject: [PATCH] Update language list based on tessdata_fast; fix #1343

---
 doc/tesseract.1.asc | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/doc/tesseract.1.asc b/doc/tesseract.1.asc
index 6832ea0c79..489584c555 100644
--- a/doc/tesseract.1.asc
+++ b/doc/tesseract.1.asc
@@ -115,8 +115,9 @@ SINGLE OPTIONS
 
 LANGUAGES
 ---------
-There are currently language packs available for the following languages
-(in https://github.com/tesseract-ocr/tessdata):
+The currently available traineddata files for tesseract 4.00
+for the following languages are in
+(in https://github.com/tesseract-ocr/tessdata_fast):
 
 *afr* (Afrikaans)
 *amh* (Amharic)
@@ -176,26 +177,33 @@ There are currently language packs available for the following languages
 *khm* (Central Khmer)
 *kir* (Kirghiz; Kyrgyz)
 *kor* (Korean)
+*kor_vert* (Korean (vertical))
 *kur* (Kurdish)
+*kur_ara* (Kurdish (Arabic))
 *lao* (Lao)
 *lat* (Latin)
 *lav* (Latvian)
 *lit* (Lithuanian)
+*ltz* (Luxembourgish)
 *mal* (Malayalam)
 *mar* (Marathi)
 *mkd* (Macedonian)
 *mlt* (Maltese)
+*mon* (Mongolian)
+*mri* (Maori)
 *msa* (Malay)
 *mya* (Burmese)
 *nep* (Nepali)
 *nld* (Dutch; Flemish)
 *nor* (Norwegian)
+*oci* (Occitan (post 1500))
 *ori* (Oriya)
 *osd* (Orientation and script detection module)
 *pan* (Panjabi; Punjabi)
 *pol* (Polish)
 *por* (Portuguese)
 *pus* (Pushto; Pashto)
+*que* (Quechua)
 *ron* (Romanian; Moldavian; Moldovan)
 *rus* (Russian)
 *san* (Sanskrit)
@@ -203,20 +211,24 @@ There are currently language packs available for the following languages
 *slk* (Slovak)
 *slk_frak* (Slovak - Fraktur)
 *slv* (Slovenian)
+*snd* (Sindhi)
 *spa* (Spanish; Castilian)
 *spa_old* (Spanish; Castilian - Old)
 *sqi* (Albanian)
 *srp* (Serbian)
 *srp_latn* (Serbian - Latin)
+*sun* (Sundanese)
 *swa* (Swahili)
 *swe* (Swedish)
 *syr* (Syriac)
 *tam* (Tamil)
+*tat* (Tatar)
 *tel* (Telugu)
 *tgk* (Tajik)
 *tgl* (Tagalog)
 *tha* (Thai)
 *tir* (Tigrinya)
+*ton* (Tonga)
 *tur* (Turkish)
 *uig* (Uighur; Uyghur)
 *ukr* (Ukrainian)
@@ -225,6 +237,7 @@ There are currently language packs available for the following languages
 *uzb_cyrl* (Uzbek - Cyrilic)
 *vie* (Vietnamese)
 *yid* (Yiddish)
+*yor* (Yoruba)
 
 To use a non-standard language pack named *foo.traineddata*, set the
 *TESSDATA_PREFIX* environment variable so the file can be found at