Material for improving OCR output
Long s preservation OCR Corrections
Modern Font OCR Corrections
c18th Font OCR Corrections
.gitignore
18th-century-statute-vocabulary.txt
English-corrections.txt
Googles.txt
README.md
Roman-numerals-1-200.txt
cyrillic-character-corrections.txt
ending-corrections.txt
hyphenated-corrections.txt
latin-corrections.txt
modern-corrections.txt
names-corrections.txt
obsolete-spellings.txt
phrase-corrections.txt
places-corrections.txt
punctuation-cleaning.txt
roman-numerals.txt
splitting-words.txt
standardization.txt
stem-list.txt
stoplist.txt
unescapable-words.txt
y-for-comma-pairs.txt
years.txt

OCR-Support

This repository contains various lists of OCR errors and their corrections.

The files are:

- English: English-language words and their corrections.
- Googles: Misreadings of the Google signature at the bottom of Google-scanned books. No correction is given.
- Latin: A small selection of Latin words, mainly derived from statutes.
- Names: Personal names, both forenames and surnames.
- Places: One-word place names, mainly English and Welsh.
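As a rough illustration of how lists like these might be applied, here is a minimal Python sketch. The README does not document the file layout, so the format used here (one `error<TAB>correction` pair per line for the correction lists, and one misreading per line for the Googles list) is an assumption, as are the function names and the sample entries. Check the actual files and adjust the separator before relying on it.

```python
import re


def load_corrections(lines, sep="\t"):
    """Build an {error: correction} dict from lines of 'error<sep>correction'.

    The tab-separated layout is an ASSUMPTION, not a documented format.
    Blank lines and lines without the separator are skipped.
    """
    table = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or sep not in line:
            continue
        wrong, right = line.split(sep, 1)
        table[wrong.strip()] = right.strip()
    return table


def apply_corrections(text, table):
    """Replace each whole-word OCR error in text with its correction."""
    if not table:
        return text
    # Try longer errors first so they win over shorter overlapping ones.
    alternatives = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in alternatives) + r")\b"
    )
    return pattern.sub(lambda m: table[m.group(0)], text)


def drop_signature_lines(text, misreadings):
    """Remove lines that exactly match a known signature misreading.

    Intended for a list like Googles, which gives no correction.
    """
    bad = {m.strip() for m in misreadings if m.strip()}
    return "\n".join(l for l in text.splitlines() if l.strip() not in bad)


# Hypothetical sample entries, not taken from the repository files.
sample_pairs = ["tbe\tthe\n", "aud\tand\n"]
sample_signatures = ["Digitized by Goo^lc\n"]

table = load_corrections(sample_pairs)
page = "tbe cat aud tbe dog\nDigitized by Goo^lc"
cleaned = drop_signature_lines(apply_corrections(page, table), sample_signatures)
print(cleaned)  # the cat and the dog
```

To use a real list, pass an open file in place of the sample data, for example `load_corrections(open("English-corrections.txt", encoding="utf-8"))`. The UTF-8 encoding is also an assumption. Whole-word matching is a deliberate choice in this sketch, so that a short error string is not rewritten inside a longer, correctly read word.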

Unless stated otherwise, each file is in the public domain and may be used freely without encumbrance.