Update tesseract man page
- move Tesseract 4 release note to other release notes
- format command line options in text
- add link to release notes (wiki)
- add link to contributors (GitHub)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
stweil committed Oct 4, 2018
1 parent a86292b commit 3e9b0ac
Showing 1 changed file with 15 additions and 10 deletions.
25 changes: 15 additions & 10 deletions doc/tesseract.1.asc
@@ -17,12 +17,6 @@ between 1985 and 1995. In 1995, this engine was among the top 3 evaluated by
UNLV. It was open-sourced by HP and UNLV in 2005, and has been developed
at Google since then.

-Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused
-on line recognition, but also still supports the legacy Tesseract OCR engine of
-Tesseract 3 which works by recognizing character patterns. Compatibility with
-Tesseract 3 is enabled by --oem 0. It also needs traineddata files which support
-the legacy engine, for example those from the tessdata repository.
-

IN/OUT ARGUMENTS
----------------
Expand Down Expand Up @@ -97,7 +91,7 @@ OPTIONS
* hocr - Output in hOCR format instead of as a text file.
* pdf - Output in pdf instead of a text file.

-*Nota Bene:* The options '-l lang' and '--psm N' must occur
+*Nota Bene:* The options `-l lang` and `--psm N` must occur
before any 'configfile'.


@@ -116,7 +110,7 @@ SINGLE OPTIONS
Returns the current version of the tesseract(1) executable.
'--list-langs'::
-List available languages for tesseract engine. Can be used with --tessdata-dir.
+List available languages for tesseract engine. Can be used with `--tessdata-dir`.
'--print-parameters'::
Print tesseract parameters.
Expand Down Expand Up @@ -251,7 +245,7 @@ for the following languages are in
To use a non-standard language pack named *foo.traineddata*, set the
*TESSDATA_PREFIX* environment variable so the file can be found at
*TESSDATA_PREFIX*/tessdata/*foo*.traineddata and give Tesseract the
-argument '-l foo'.
+argument `-l foo`.

SCRIPTS
-------
Expand Down Expand Up @@ -377,7 +371,15 @@ language data.
Tesseract 3.02 adds BiDirectional text support, the ability to recognize
multiple languages in a single image, and improved layout analysis.
-For further details, see the file ReleaseNotes included with the distribution.
+Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused
+on line recognition, but also still supports the legacy Tesseract OCR engine of
+Tesseract 3 which works by recognizing character patterns. Compatibility with
+Tesseract 3 is enabled by `--oem 0`. It also needs traineddata files which
+support the legacy engine, for example those from the tessdata repository.
+For further details, see the file ReleaseNotes in the Tesseract wiki
+(<https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes>).
RESOURCES
---------
@@ -402,6 +404,9 @@ Pingping Xiu, Pong Eksombatchai (Chantat), Ranjith Unnikrishnan, Raquel
Romano, Ray Smith, Rika Antonova, Robert Moss, Samuel Charron, Sheelagh
Lloyd, Shobhit Saxena, and Thomas Kielbus.

+For a list of contributors see
+<https://github.com/tesseract-ocr/tesseract/blob/master/AUTHORS>.
+
COPYING
-------
Licensed under the Apache License, Version 2.0
