Stefan Weil edited this page Mar 28, 2020 · 67 revisions

Tesseract at UB Mannheim

The Mannheim University Library (UB Mannheim) uses Tesseract to perform OCR (optical character recognition) of historical German newspapers (Allgemeine Preußische Staatszeitung, Deutscher Reichsanzeiger). The latest results with OCR from more than 360,000 scans are available online.

Normally we run Tesseract on Debian GNU/Linux, but we also needed a Windows version, so we built a Tesseract installer for Windows.

WARNING: Install Tesseract either in the directory suggested by the installer or in a new directory. The uninstaller removes the whole installation directory. If you install Tesseract in an existing directory, that directory will be removed together with all its subdirectories and files.
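The check implied by this warning can be sketched in Python. This is only an illustration and not part of the installer: the function name `is_safe_install_dir` is hypothetical, and it covers only the "new directory" case, treating a target as safe if it does not exist yet or is an empty directory.

```python
from pathlib import Path


def is_safe_install_dir(target: str) -> bool:
    """Return True if `target` does not exist yet or is an empty directory.

    Hypothetical helper: anything already stored in a non-empty target
    could be lost when the uninstaller removes the installation directory.
    """
    path = Path(target)
    if not path.exists():
        return True   # a new directory will be created
    if not path.is_dir():
        return False  # the target is an existing file
    # An existing directory is only safe if it contains nothing.
    return not any(path.iterdir())
```

Running such a check before choosing a custom installation directory would flag a folder that already holds unrelated files.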

The latest installers can be downloaded here:

We don't provide an installer for Tesseract 4.1.0 because we think that the latest version 5.0.0-alpha is better for most Windows users in many respects (functionality, speed, stability). Version 4.1 is only needed by people who develop software based on the Tesseract API and need 100 % API compatibility with version 4.0.

There are also older versions available.

We also provide documentation generated by Doxygen.

History

  • 2020-03-28 Update Tesseract 5.0.0 (alpha).
  • 2020-02-23 Update Tesseract 5.0.0 (alpha).
  • 2019-10-30 Update Tesseract 5.0.0 (alpha). Added support for OCR from URL. Fixed installation for Lao traineddata.
  • 2019-10-10 Update Tesseract 5.0.0 (alpha). Uninstall no longer recursively removes the installation directory.
  • 2019-07-08 Update Tesseract 5.0.0 (alpha). Supports result output on Windows command line.
  • 2019-06-23 Update Tesseract 5.0.0 (alpha). Supports Windows XP again. Much faster (removed OpenMP).
  • 2019-05-26 Update Tesseract 5.0.0 (alpha).
  • 2019-05-09 Special edition for #elag2019. Training executables which require ICU fail.
  • 2019-03-17 Special edition for #bibtag19.
  • 2019-03-14 Update Tesseract 4.1.0 (RC1). Added support for ALTO output. Missing ICU DLL for training.
  • 2018-10-30 Update Tesseract 4.0.0.
  • 2018-10-24 Update Tesseract 4.0.0 (RC4).
  • 2018-10-14 Update Tesseract 4.0.0 (RC3).
  • 2018-10-10 Update Tesseract 4.0.0 (RC2).
  • 2018-10-02 Update Tesseract 4.0.0 (RC1).
  • 2018-09-17 Fixed the previous 64-bit installer by adding two missing DLL files.
  • 2018-09-12 Update Tesseract 4.0.0. Mainly bug fixes, see list of commits. In the 64-bit installation, some executables don't work because of missing DLL files.
  • 2018-06-21 Update Tesseract 3.05.02. Also updates the DLL files.
  • 2018-06-08 Update Tesseract 4.0.0. Fixed ICU DLL files for the 64-bit installer.
  • 2018-04-14 Update Tesseract 4.0.0. Also updates some DLL files. Now also available as a 64-bit installer.
  • 2018-01-09 Update Tesseract 4. Also updates some DLL files.
  • 2017-08-04 Update Tesseract 4. Now supports best traineddata.
  • 2017-06-02 Update Tesseract 3.05.01.
  • 2017-05-10 Update Tesseract 3.05.00 (+ later fixes). Removed buggy setting of PATH.
  • 2017-05-10 Update Tesseract 4. Now includes AVX support.
  • 2017-02-16 Update Tesseract 4. Fixed non-working AVX support.
  • 2017-02-02 Update Tesseract 4. Removed non-working AVX support.
  • 2017-01-30 Update Tesseract 4, added new training tools. AVX support is not working.
  • 2016-11-29 First version with LSTM (still experimental).
  • 2016-11-11 Update with latest bug fixes.
  • 2016-08-31 Update with latest bug fixes for text2image.
  • 2016-08-28 Update with latest bug fixes.
  • 2016-07-11 TIFF warnings are now shown on the console (no longer in disruptive message windows).
  • 2016-05-13 The new installer now includes the executables needed for training, too. It is based on the latest Tesseract sources.

Hint: Old versions of the installer had an option to add Tesseract to the PATH environment variable. That option was disabled by default. If it was enabled and PATH was very long, the new PATH could end up empty. We recommend not using that option and have disabled it in our latest version.
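Since the installer does not modify PATH, scripts that call Tesseract need another way to find the executable. The following Python sketch shows one possible approach: it checks a list of explicit locations first and only then falls back to a PATH lookup. The function name `find_tesseract` and the candidate directories are assumptions for illustration; adjust them to the directory chosen during installation.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Assumed installation locations (hypothetical defaults for illustration);
# replace them with the directory chosen during installation.
DEFAULT_CANDIDATES = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the path of a tesseract executable, or None if none is found."""
    # Prefer explicit locations so the result does not depend on PATH.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    # Fall back to PATH in case the user has set it up manually.
    return shutil.which("tesseract")
```

The returned path can then be passed as the first element of the argument list to `subprocess.run`, so no change to the PATH environment variable is required.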
