Home

Stefan Weil edited this page Jun 21, 2018 · 34 revisions

Tesseract at UB Mannheim

The Mannheim University Library (UB Mannheim) uses Tesseract to perform OCR of historical German newspapers (Allgemeine Preußische Staatszeitung, Deutscher Reichsanzeiger). The latest results with OCR from more than 360,000 scans are available online.

Normally we run Tesseract on Debian GNU/Linux, but we also needed a Windows version, so we built a Tesseract installer for Windows.

The latest installers can be downloaded here: tesseract-ocr-setup-3.05.02-20180621.exe, tesseract-ocr-w32-setup-v4.0.0-beta.1.20180608.exe and tesseract-ocr-w64-setup-v4.0.0-beta.1.20180608.exe (new, 64-bit, experimental). Older versions are also available.

We also provide documentation generated by Doxygen.

History:

  • 2018-06-21 Update Tesseract 3.05.02. Also updates the DLL files.
  • 2018-06-08 Update Tesseract 4.0.0. Fix ICU DLL files for 64-bit installer.
  • 2018-04-14 Update Tesseract 4.0.0. Also updates some DLL files. Now also with 64-bit installer.
  • 2018-01-09 Update Tesseract 4. Also updates some DLL files.
  • 2017-08-04 Update Tesseract 4. Now supports best traineddata.
  • 2017-06-02 Update Tesseract 3.05.01.
  • 2017-05-10 Update Tesseract 3.05.00 (+ later fixes). Removed buggy setting of PATH.
  • 2017-05-10 Update Tesseract 4. Now includes AVX support.
  • 2017-02-16 Update Tesseract 4. Fixed broken AVX support.
  • 2017-02-02 Update Tesseract 4. Removed broken AVX support.
  • 2017-01-30 Update Tesseract 4, added new training tools. AVX support is not working.
  • 2016-11-29 First version with LSTM (still experimental).
  • 2016-11-11 Update with latest bug fixes.
  • 2016-08-31 Update with latest bug fixes for text2image.
  • 2016-08-28 Update with latest bug fixes.
  • 2016-07-11 TIFF warnings are now shown on the console (no longer disturbing message windows).
  • 2016-05-13 The new installer now includes the executables needed for training, too. It is based on the latest Tesseract sources.

Hint: Older versions of the installer offered an option to add Tesseract to the PATH environment variable. That option was disabled by default. If it was enabled and PATH was very long, the resulting PATH could end up empty. We recommend not using that option, and we have disabled it in our latest version.
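Since the installer no longer touches PATH, a script that calls Tesseract has to locate the executable itself. The following is a minimal Python sketch of one way to do that, not part of the installer. The directories in CANDIDATE_DIRS and the helper names find_tesseract and tesseract_version are assumptions made for illustration; replace the directories with the location chosen during installation.

```python
import os
import shutil
import subprocess

# Hypothetical install locations (assumption): adjust them to the
# directory that was selected when running the installer.
CANDIDATE_DIRS = [
    r"C:\Program Files\Tesseract-OCR",
    r"C:\Program Files (x86)\Tesseract-OCR",
]


def find_tesseract(candidate_dirs=CANDIDATE_DIRS,
                   exe_name="tesseract.exe",
                   search_path=True):
    """Return the full path of the Tesseract executable, or None."""
    # Honour PATH first, in case Tesseract was added there manually.
    if search_path:
        on_path = shutil.which("tesseract")
        if on_path:
            return on_path
    # Otherwise check explicit directories and leave PATH untouched.
    for directory in candidate_dirs:
        full_path = os.path.join(directory, exe_name)
        if os.path.isfile(full_path):
            return full_path
    return None


def tesseract_version(exe):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run([exe, "--version"],
                            capture_output=True, text=True, check=True)
    # Depending on the release, the version text may arrive on
    # stdout or on stderr, so look at both.
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("Tesseract not found; edit CANDIDATE_DIRS.")
    else:
        print(exe)
        print(tesseract_version(exe))
```

Calling the executable by its full path keeps the system PATH unchanged, which avoids the problem described in the hint above.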