Welcome to the Reichsanzeiger wiki!
Digital edition of the Reichsanzeiger
Here we report on changes to the digital presentation.
FESS had a configuration value that told it to delete any index older than 300 days, and it has now done so. It is therefore currently building a new index. Until that process is finished, search results from the FESS search will be incomplete.
OCR for the Reichsanzeiger had just finished when Ray Smith published new trained OCR models for Tesseract. These new models are very promising because they improve OCR for Fraktur a lot, although there are also some regressions (missing paragraph character, bugs like ß/B confusion in the word list). As soon as those bugs are fixed, there will be a new round of OCR. Compare some new results with the old ones.
OCR for all images is finished! We now have more than 360,000 text files produced by Tesseract.
For a couple of days now there has also been a new experimental search index which allows fuzzy searches that tolerate some of the errors made by OCR. It uses the Fess Enterprise Search Server. (2017-08-02: updated URL)
We now have nearly 340,000 scans processed by OCR, and most of them are already in our search index.
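To illustrate what "fuzzy" means here, the following is a minimal sketch of matching by edit distance, which is one common way to tolerate single-character OCR errors such as ß/B. It is not the Fess implementation; the function names `edit_distance` and `fuzzy_matches` and the default tolerance of one edit are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query: str, words: list[str], max_distance: int = 1) -> list[str]:
    """Return the words within max_distance edits of the query (case-insensitive).

    Hypothetical helper for illustration; a real search server works on an
    index and does not compare the query against every word like this.
    """
    query = query.lower()
    return [w for w in words if edit_distance(query, w.lower()) <= max_distance]
```

With a tolerance of one edit, a query for "Reichsanzeiger" would also find a misrecognized "Reichsanzeigcr", while an unrelated word such as "Anzeiger" is still rejected.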
A week ago we started OCR again with an improved Fraktur model for Tesseract (still based on Tesseract 3.05 technology, so not using LSTM). We use four compute servers with 72 (32+16+16+8) Tesseract processes simultaneously. That increased the number of scans covered by OCR from 54,000 to more than 120,000 up to now, and hopefully we'll have processed all scans in a few weeks.
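Running many Tesseract processes side by side can be sketched roughly as follows. This is an illustration for a single machine, not the setup actually used on the four compute servers; the function names, the model name passed as `lang`, and the directory layout are all assumptions. It relies only on the standard `tesseract IMAGE OUTPUTBASE -l LANG` command-line form, which writes `OUTPUTBASE.txt`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image: Path, lang: str) -> list[str]:
    """Build the tesseract call for one scan; the text goes next to the image."""
    output_base = image.with_suffix("")  # scans/0001.tif -> scans/0001(.txt)
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_one(image: Path, lang: str) -> int:
    """Run one Tesseract process and return its exit code (0 means success)."""
    return subprocess.run(build_command(image, lang), check=False).returncode


def ocr_all(images, lang: str, workers: int, run=ocr_one) -> dict:
    """Process all scans with at most `workers` Tesseract processes at a time.

    Threads are sufficient here because each one only waits for an
    external process. `run` can be replaced for testing.
    """
    images = list(images)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        exit_codes = list(pool.map(lambda image: run(image, lang), images))
    return dict(zip(images, exit_codes))
```

A call such as `ocr_all(sorted(Path("scans").glob("*.tif")), lang="deu_frak", workers=16)` would then process a directory; `scans` and `deu_frak` are placeholders, and the worker count should match the cores available on the machine.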
The search service offered by digi.bib.uni-mannheim.de had to be stopped because increased usage, together with a more than doubled amount of OCR data, required too many server resources for the approximate search.
Problems (bad scans, missing journal issues) and other notes related to the digital presentation are documented here.