v0.0.7
Fixed:
- recognize: regression from changed network initialization
- recognize: also load uncompressed models (Python 3 port)
- re/segment: avoid creating invalid polygon coordinates
- ocrolib scale estimation: make DPI-dependent, add fallback for empty/noise pages
- ocrolib morphology: avoid rounding artifacts and asymmetry
Changed:
- ocrolib / all ocropy processors: boost performance via OpenCV/PIL instead of SciPy
- binarize: expose threshold parameter
- re/segment: require images to be binarized already instead of ad-hoc binarization
- re/segment: much faster line segmentation, better separation of neighbouring lines
- segment: much more robust fg h/v-line and bg column detection, new image detection
- segment: add AlternativeImage with h/v-line or image non-text clipped to background
- segment: rewrite of region aggregation via hybrid recursive X-Y cut
- segment: also annotate detected lines (at detected regions) after page segmentation
- segment: expose many new parameters
- segment: add all new lines/regions in proper (but only top-down left-right) reading order
- segment: add table level (like page segmentation, but horizontal-first split strategy)
- segment: also recurse into table cells for region level
- segment: incremental annotation (ignore and re-order existing text/image/separator regions)
- profile: remove this CLI, but keep integrated via API in new pure OCR-D CLI postcorrect
- wer, training: remove these CLI
- postcorrect, align: update to latest JAR
- remove non-OCR-D scripts from installation
- add uninstall target
- update documentation (esp. training, testing and postcorrection)
- improve/extend automatic tests