
v0.0.7

@bertsky released this 30 Jun 13:47

Fixed:

  • recognize: regression from changed network initialization
  • recognize: also load uncompressed models (Python 3 port)
  • re/segment: avoid creating invalid polygon coordinates
  • ocrolib scale estimation: make DPI-dependent, add fallback for empty or noise-only pages
  • ocrolib morphology: avoid rounding artifacts and asymmetry
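
The idea behind the scale-estimation fix can be illustrated with a minimal sketch. It assumes the page scale is the median size (square root of bounding-box area) of connected components within a plausible pixel range, with the range and the fallback value expressed in pixels at an assumed reference resolution of 300 DPI and rescaled to the actual DPI. The function name, parameters and default values are hypothetical and not the actual ocrolib API.

```python
import statistics


def estimate_scale(sizes, dpi=300.0, lo=3.0, hi=100.0, fallback=20.0):
    """Estimate a page's text scale in pixels from connected-component sizes.

    sizes: per-component size, e.g. sqrt(width * height) of each bounding box.
    lo, hi, fallback: hypothetical values given in pixels at 300 DPI; they
    are rescaled to the actual resolution instead of being used verbatim.
    """
    factor = dpi / 300.0
    lo_px, hi_px = lo * factor, hi * factor
    # Ignore specks (noise) and huge blobs (images, separators).
    plausible = [s for s in sizes if lo_px < s < hi_px]
    if not plausible:
        # Empty or noise-only page: fall back to a DPI-scaled default
        # instead of failing on the median of an empty list.
        return fallback * factor
    return float(statistics.median(plausible))
```

For example, a lone 4 px component counts as text at 300 DPI but as noise at 600 DPI, where the lower bound becomes 6 px and the fallback applies.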

Changed:

  • ocrolib / all ocropy processors: boost performance by using OpenCV/PIL instead of SciPy
  • binarize: expose threshold parameter
  • re/segment: require already-binarized images instead of binarizing ad hoc
  • re/segment: much faster line segmentation, better separation of neighbouring lines
  • segment: much more robust detection of foreground h/v-lines and background columns, new image detection
  • segment: add AlternativeImage with non-text (h/v-lines or images) clipped to background
  • segment: rewrite region aggregation as a hybrid recursive X-Y cut
  • segment: after page segmentation, also annotate the lines detected within each detected region
  • segment: expose many new parameters
  • segment: add all new lines/regions in proper reading order (though only top-down, left-right)
  • segment: add table level (like page segmentation, but horizontal-first split strategy)
  • segment: at region level, also recurse into table cells
  • segment: incremental annotation (ignore and re-order existing text/image/separator regions)
  • profile: remove this CLI, but keep it integrated via the API in the new pure OCR-D CLI postcorrect
  • wer, training: remove these CLIs
  • postcorrect, align: update to the latest JAR
  • remove non-OCR-D scripts from installation
  • add uninstall target
  • update documentation (esp. training, testing and postcorrection)
  • improve/extend automatic tests
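
For readers unfamiliar with recursive X-Y cut, the following minimal sketch shows the plain (non-hybrid) technique on a binary NumPy array: trim to the ink bounding box, cut at the widest empty band of rows or columns, and recurse. Emitting top/left parts before bottom/right parts yields a top-down, left-right reading order. This is not the implementation shipped in this release; the hybrid variant there adds logic not shown here, and the names `xy_cut`, `_zero_runs` and `min_gap` are hypothetical.

```python
import numpy as np


def _zero_runs(profile, min_gap):
    """Return (start, end) index pairs of zero runs at least min_gap long.

    Assumes the profile was taken from an ink-trimmed box, so it starts and
    ends with non-zero values and every run found here is interior.
    """
    runs, start = [], None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            if i - start >= min_gap:
                runs.append((start, i))
            start = None
    return runs


def xy_cut(img, min_gap=1, offset=(0, 0)):
    """Recursive X-Y cut on a binary image (non-zero = ink).

    Returns (top, left, bottom, right) boxes in top-down, left-right order.
    """
    ys, xs = np.nonzero(img)
    if ys.size == 0:
        return []  # no ink: nothing to segment
    # Trim to the ink bounding box and track absolute coordinates.
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    box = img[top:bottom, left:right]
    oy, ox = offset[0] + top, offset[1] + left

    row_gaps = _zero_runs(box.sum(axis=1), min_gap)  # empty rows
    col_gaps = _zero_runs(box.sum(axis=0), min_gap)  # empty columns
    if not row_gaps and not col_gaps:
        # Leaf: no gap left to cut along.
        return [(int(oy), int(ox), int(oy + box.shape[0]), int(ox + box.shape[1]))]

    def width(gaps):
        return max((end - start for start, end in gaps), default=-1)

    if width(row_gaps) >= width(col_gaps):
        # Horizontal cut at the start of the widest empty row band (ties favour rows).
        start, _ = max(row_gaps, key=lambda g: g[1] - g[0])
        parts = [(box[:start], (oy, ox)), (box[start:], (oy + start, ox))]
    else:
        # Vertical cut at the start of the widest empty column band.
        start, _ = max(col_gaps, key=lambda g: g[1] - g[0])
        parts = [(box[:, :start], (oy, ox)), (box[:, start:], (oy, ox + start))]

    boxes = []
    for part, part_offset in parts:
        boxes.extend(xy_cut(part, min_gap, part_offset))
    return boxes
```

On a page with two short blocks side by side above one wide block, this returns the top-left, top-right and bottom boxes in that order. Preferring the widest gap in either direction is one simple split strategy; fixing the first split direction instead (as the table level does with its horizontal-first strategy) would be a small change to the comparison.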