Mayitzin/Taan

T'aan

This project aims to make it easy to recognize characters in old writings and to translate them automatically into any other language.

Usage

python img2txt.py <input_file> <output_file>

Requirements

  • T'aan uses Tesseract by Google to perform the OCR. Download it, or clone it from its GitHub repository, and follow the installation instructions for your OS. The latest version already includes models for the most common writing systems and languages.

  • The pdf2image module, a wrapper around poppler, is required to convert a given PDF into images, which are then passed to Tesseract.
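
The pipeline implied by these requirements (PDF pages rendered to images, then OCR on each image) could look roughly like the sketch below. It is not the code of img2txt.py. The function names tesseract_command and pdf_to_text are hypothetical, and the sketch assumes that the tesseract executable is on the PATH, that the language data named by lang is installed, and that pdf2image and poppler are available.

```python
import subprocess
import tempfile
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the Tesseract CLI call; Tesseract appends '.txt' to output_base."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def pdf_to_text(pdf_path, lang="eng"):
    """Render each PDF page to an image, OCR it, and return the joined text."""
    # Imported lazily so this module loads even where pdf2image is missing.
    from pdf2image import convert_from_path

    page_texts = []
    with tempfile.TemporaryDirectory() as tmp:
        pages = convert_from_path(str(pdf_path))  # one PIL image per page
        for number, page in enumerate(pages, start=1):
            image_path = Path(tmp) / f"page_{number}.png"
            output_base = Path(tmp) / f"page_{number}"
            page.save(image_path)
            # check=True raises CalledProcessError if Tesseract fails.
            subprocess.run(
                tesseract_command(image_path, output_base, lang), check=True
            )
            text_file = output_base.with_suffix(".txt")
            page_texts.append(text_file.read_text(encoding="utf-8"))
    return "\n".join(page_texts)
```

Under these assumptions, pdf_to_text("scan.pdf", lang="spa") would return the recognized text of every page as one string, where scan.pdf is a placeholder file name. Shelling out to the tesseract executable keeps the sketch limited to the two requirements listed above.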

What is T'aan?

T'aan means "language" in Mayan. Pre-Hispanic Mesoamerica was home to dozens of languages and hundreds of dialects, which hindered the rapid integration of its many nations. To ease communication between them, specialized translators and interpreters were established. These interpreters could read the various scripts and codices and thus connect their communities.

The Mayans of Yucatán used the word T'aan for language, conversation, reading aloud, word, or voice. In short, everything that has to do with communication in any language belongs to the realm of T'aan.
