Commit

Include ALTO in list of supported output formats
jakesebright authored and stweil committed Dec 15, 2018
1 parent 1f5fb15 commit e398601
Showing 2 changed files with 2 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -24,7 +24,7 @@ and GitHub's log of [contributors](https://github.com/tesseract-ocr/tesseract/gr

Tesseract has **unicode (UTF-8) support**, and can **recognize more than 100 languages** "out of the box".

-Tesseract supports **various output formats**: plain-text, hocr(html), pdf, tsv, invisible-text-only pdf.
+Tesseract supports **various output formats**: plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV. The master branch also has experimental support for ALTO (XML) output.

You should note that in many cases, in order to get better OCR results, you'll need to **[improve the quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) of the image** you are giving Tesseract.

1 change: 1 addition & 0 deletions doc/tesseract.1.asc
@@ -90,6 +90,7 @@ OPTIONS
contains a list of variables and their values, one per line, with a
space separating variable from value. Interesting config files
include: +
+* `alto` - Output in ALTO format (file extension `.xml`).
* `hocr` - Output in hOCR format (file extension `.hocr`).
* `pdf` - Output PDF (file extension `.pdf`).
* `tsv` - Output TSV (file extension `.tsv`).
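
The config names listed in this hunk are passed as the trailing argument of the `tesseract IMAGE OUTPUTBASE [CONFIGFILE...]` command line. The following is a minimal sketch, not part of the commit, of how a caller might select one of them and predict the resulting file name. The helper names `build_command` and `expected_output` are hypothetical, and the extension table simply mirrors the man page entries above.

```python
from pathlib import Path
from typing import List

# Extensions exactly as listed in doc/tesseract.1.asc after this commit.
CONFIG_EXTENSIONS = {
    "alto": ".xml",
    "hocr": ".hocr",
    "pdf": ".pdf",
    "tsv": ".tsv",
}


def build_command(image: str, output_base: str, config: str) -> List[str]:
    """Build the argv for `tesseract IMAGE OUTPUTBASE CONFIG` (hypothetical helper)."""
    if config not in CONFIG_EXTENSIONS:
        raise ValueError(f"unknown config file: {config!r}")
    return ["tesseract", image, output_base, config]


def expected_output(output_base: str, config: str) -> Path:
    """Return the file name the chosen config is documented to produce."""
    return Path(output_base + CONFIG_EXTENSIONS[config])


if __name__ == "__main__":
    # Only prints the command; running it requires a tesseract build with ALTO support.
    print(" ".join(build_command("page.png", "out", "alto")))
    print(expected_output("out", "alto"))
```

The list returned by `build_command` can be handed to `subprocess.run`. Whether the `alto` config is available depends on the installed build, since the README change describes ALTO support as experimental and limited to the master branch.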