Stefan Weil edited this page Jun 11, 2018 · 10 revisions

Welcome to the Reichsanzeiger wiki!

Digital edition of the Reichsanzeiger

The digital edition of the Reichsanzeiger is prepared by Mannheim University Library. Report any issues here.

News

Here we report changes to the digital presentation.

2018-06-11

FESS was configured to delete any index older than 300 days, and it has now done so. It is therefore building a new index. Until that process has finished, results from the FESS search will be incomplete.

2017-08-02

OCR for the Reichsanzeiger had just been finished when Ray Smith published new trained OCR models for Tesseract. These new models are very promising because they improve OCR for Fraktur considerably, although there are also some regressions (a missing paragraph character, and bugs such as ß/B confusion in the word list). As soon as those bugs are fixed, there will be a new OCR run. Compare some new results with the old ones.

2017-08-01

OCR for all images is finished! We now have more than 360,000 text files produced by Tesseract.

2017-07-29

For a couple of days now, there has also been a new experimental search index which allows fuzzy searches that tolerate some of the errors made by OCR. It uses the Fess Enterprise Search Server. (2017-08-02: updated URL)

We now have nearly 340,000 scans processed by OCR, and most of them are already in our search index.

2017-07-20

Now more than 250,000 scans can be searched locally. The search is based on Xapian Omega.

2017-07-09

A week ago we started OCR again with an improved Fraktur model for Tesseract (still based on Tesseract 3.05 technology, so not using LSTM). We use four compute servers running 72 (32+16+16+8) Tesseract processes simultaneously. That has increased the number of scans covered by OCR from 54,000 to more than 120,000 so far, and hopefully we'll have processed all scans in a few weeks.


2017-07-08

The search service offered by digi.bib.uni-mannheim.de had to be stopped because increased usage, combined with more than twice as much OCR data, required too many server resources for the approximate search.

Notes

Problems (bad scans, missing journal issues) and other notes related to the digital presentation are documented here.
