UB-Mannheim/tesseract forked from tesseract-ocr/tesseract
Tesseract at UB Mannheim
The Mannheim University Library (UB Mannheim) uses Tesseract to perform OCR of historical German newspapers (Allgemeine Preußische Staatszeitung, Deutscher Reichsanzeiger). The latest OCR results, covering more than 360,000 scans, are available online.
We normally run Tesseract on Debian GNU/Linux, but we also needed a Windows version, so we built a Tesseract installer for Windows.
The latest installers can be downloaded here: tesseract-ocr-setup-3.05.01.exe and tesseract-ocr-setup-4.0.0-alpha.20170804.exe (experimental). There are also older versions available.
We also provide documentation generated by Doxygen.
Hint: Old versions of the installer had an option to add Tesseract to the PATH environment variable. That option was disabled by default. If it was enabled and PATH was very long, the new PATH could end up empty. We recommend not using that option, and we have disabled it in our latest version.
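Since the installer no longer touches PATH, scripts that call Tesseract may need to locate the executable themselves. The following is a minimal Python sketch of one way to do that, not something shipped with the installer. The helper names `find_tesseract` and `tesseract_version` are hypothetical, and the candidate install directories are assumptions that should be adjusted to wherever Tesseract was actually installed.

```python
import os
import shutil
import subprocess

# Assumed install locations (hypothetical); adjust to your actual install directory.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(name="tesseract", candidates=DEFAULT_CANDIDATES):
    """Return the path to the Tesseract executable, or None if it is not found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Otherwise fall back to the explicit candidate locations.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def tesseract_version(executable):
    """Run `<executable> --version` and return the first line of its output."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Depending on the Tesseract version, the banner may go to stdout or stderr.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("Tesseract executable not found; pass its full path explicitly.")
    else:
        print(exe, tesseract_version(exe))
```

Calling the executable by its full path this way avoids any need to modify PATH at all.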
History:
- 2017-08-04 Update Tesseract 4. Now supports best traineddata.
- 2017-06-02 Update Tesseract 3.05.01.
- 2017-05-10 Update Tesseract 3.05.00 (+ later fixes). Removed buggy setting of PATH.
- 2017-05-10 Update Tesseract 4. Now includes AVX support.
- 2017-02-16 Update Tesseract 4. Fixed broken AVX support.
- 2017-02-02 Update Tesseract 4. Removed non-working AVX support.
- 2017-01-30 Update Tesseract 4, added new training tools. AVX support not working.
- 2016-11-29 First version with LSTM (still experimental).
- 2016-11-11 Update with latest bug fixes.
- 2016-08-31 Update with latest bug fixes for text2image.
- 2016-08-28 Update with latest bug fixes.
- 2016-07-11 TIFF warnings are now shown on the console instead of in disruptive message windows.
- 2016-05-13 The new installer now includes the executables needed for training, too. It is based on the latest Tesseract sources.