Workflow Guide: OCR evaluation

Robert Sachunsky edited this page Feb 4, 2022 · 4 revisions

In this processing step, the text output of OCR or post-correction is evaluated by aligning it with the ground truth text and measuring the error rates.
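
As a rough illustration of what "measuring the error rates" means, the following minimal sketch computes a character error rate (CER) and a word error rate (WER) from a plain Levenshtein distance. It is not the code used by either processor; the function names and the choice to divide by the ground truth length are assumptions made for this example.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute: cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(gt, ocr):
    """Edit distance divided by the ground truth length (an assumed convention)."""
    if not gt:
        return 0.0 if not ocr else float("inf")
    return levenshtein(gt, ocr) / len(gt)


def cer(gt, ocr):
    """Character error rate, computed on code points."""
    return error_rate(gt, ocr)


def wer(gt, ocr):
    """Word error rate, computed on whitespace-separated tokens."""
    return error_rate(gt.split(), ocr.split())


gt_line = "the quick brown fox"
ocr_line = "the quiek brovvn fox"
print(f"CER: {cer(gt_line, ocr_line):.3f}")  # 3 edits over 19 characters
print(f"WER: {wer(gt_line, ocr_line):.3f}")  # 2 wrong words out of 4
```

Real evaluations differ in tokenization, normalization and the denominator used, so values from this sketch are not directly comparable with the reports of the processors below.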

Available processors

| Processor | Parameter | Remarks | Call |
| --- | --- | --- | --- |
| ocrd-dinglehopper | -P textequiv_level region | For page-wise visual comparison (2 file groups). The first input group should point to the ground truth. | ocrd-dinglehopper -I OCR-D-GT,OCR-D-OCR -O OCR-D-EVAL |
| ocrd-cor-asv-ann-evaluate | -P metric historic_latin -P gt_level 2 -P confusion 20 -P histogram true | For document-wide aggregation (N file groups). The first input group should point to the ground truth. | ocrd-cor-asv-ann-evaluate -I OCR-D-GT,OCR-D-OCR -O OCR-D-EVAL |

Comparison

| | ocrd-dinglehopper | ocrd-cor-asv-ann-evaluate |
| --- | --- | --- |
| goal | CER/WER and visualization | CER/WER (mean + stddev) |
| granularity | single pages only | single pages + aggregated |
| input arity | 2 fileGrps | N fileGrps |
| input constraints | segmentations may deviate | segments must have the same IDs |
| input level | region or textline | textline |
| output | HTML + JSON report per page | JSON report per page and overall |
| alignment | rapidfuzz.string_metric.levenshtein_editops | difflib.SequenceMatcher |
| Unicode | uses uniseg.graphemeclusters to get distances on graphemes | calculates alignment on codepoints, but post-processes combining characters |
| charset | NFC plus a set of normalizations that (roughly) map OCR-D GT transcription guidelines level 3 to level 2 | NFC or NFKC or a custom normalization (called historic_latin) with setting gt_level 1/2/3 |
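
To make the alignment, Unicode and charset rows of the comparison more concrete, here is a minimal standard-library sketch. It shows how NFC and NFKC normalization change what counts as an error, how difflib.SequenceMatcher aligns two strings on code points, and how per-page rates could be aggregated into mean and standard deviation. The helper names, the sample strings and the per-page values are hypothetical, and the use of the population standard deviation is an assumption; neither processor is claimed to work exactly this way.

```python
import difflib
import unicodedata
from statistics import mean, pstdev


def normalize(text, form="NFC"):
    """Apply a Unicode normalization form (e.g. "NFC" or "NFKC")."""
    return unicodedata.normalize(form, text)


def differences(gt, ocr):
    """Align two strings on code points and return the non-matching spans."""
    matcher = difflib.SequenceMatcher(None, gt, ocr, autojunk=False)
    return [
        (tag, gt[i1:i2], ocr[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# The same visible word as two different code point sequences:
decomposed = "Bu\u0308cher"   # "u" followed by a combining diaeresis
precomposed = "B\u00fccher"   # single precomposed code point for "ü"
print(decomposed == precomposed)                        # False without normalization
print(normalize(decomposed) == normalize(precomposed))  # True after NFC

# NFKC additionally folds compatibility characters such as the long s (U+017F).
long_s = "Wa\u017f\u017fer"
print(normalize(long_s, "NFC") == long_s)  # True: NFC keeps the long s
print(normalize(long_s, "NFKC"))           # Wasser

# Code point alignment of a ground truth string against an OCR string.
print(differences("Wasser", "Wa55er"))     # [('replace', 'ss', '55')]

# Hypothetical per-page CER values, aggregated document-wide.
page_cers = [0.02, 0.04, 0.06]
print(f"mean={mean(page_cers):.3f} stddev={pstdev(page_cers):.3f}")
```

Without normalization, the decomposed and precomposed spellings would be counted as errors against each other even though they render identically, which is why the normalization setting has to match the transcription level of the ground truth.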

Notes on parameter usage

E.g.

  • which parameters do you use with what values?
  • which parameters are insufficiently documented?
  • which aspects of a processor should be parameterizable but are not?

Notes on document-specific usage

E.g. which processors worked best with what material? -- feel free to post sample images here, too.
