The CIS language-aware OCR document error profiler
C++ CMake C Objective-C Perl Makefile
AbbyyXmlParser
AlphaScore
Alphabet
AltoXML
Candidate
CorrectionSystem
CreateXML
DictSearch
DocEvaluation
DocXML
Document
FBDic
FBDicString
Getopt
GlobalProfile
GtDoc
Hash
IBMGroundtruth
INIConfig
LevDEA
LevDistance
LevFilter
LevenshteinWeights
MSMatch
MinDic
Pattern
PatternCounter
Profiler
ResultSet
SimpleEnrich
TXTReader
Token
TransTable
Utils
Vaam
Val
cmake_modules
gsm
iDictionary
jni
markup
tools
.gitignore
.gitmodules
CMakeLists.txt
Doxyfile
Exceptions.h
Global.h
Mainpage.h
Makefile
Makefile.win
Makefile~
README.md
Stopwatch.h


Profiler

Source code for the language-aware OCR document error profiler. See the Profiler Manual for a description.

References

The profiler was originally written by Uli Reffle as part of his PhD thesis in computational linguistics at CIS during the IMPACT project (2008-2011).

It was further developed by Florian Fink at CIS as a CLARIN-D curation project (Kurationsprojekt).

Its underlying technology is described in the following publications:

Mihov, Stoyan, and Klaus U. Schulz. 2004. “Fast Approximate Search in Large Dictionaries.” Computational Linguistics 30 (4). MIT Press: 451–77.

Reffle, Ulrich. 2011. Algorithmen und Methoden zur dokumentenspezifischen Analyse historischer und OCR-erfasster Texte. Verlag Dr. Hut.

Reffle, Ulrich, and Christoph Ringlstetter. 2013. “Unsupervised Profiling of OCRed Historical Documents.” Pattern Recognition 46 (5): 1346–57. doi:10.1016/j.patcog.2012.10.002.

Schulz, Klaus U., and Stoyan Mihov. 2002. “Fast String Correction with Levenshtein Automata.” International Journal on Document Analysis and Recognition 5 (1). Springer: 67–85.