
Releases: HazyResearch/pdftotree

v0.5.0

13 Oct 21:52
20bbd8d

0.5.0 - 2020-10-13

Added

Changed

  • Switch the output format from "HTML-like" to hOCR. (#62, @HiromuHota)
  • Loosen Keras' version restriction, which had become unnecessarily strict. (#68, @HiromuHota)
  • Greedily extract content from a PDF even if it appears to be scanned. (#71, @HiromuHota)
  • Upgrade Keras to 2.4.0 or later (and TensorFlow 2.2 or later). (#86, @HiromuHota)
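
For users moving from the old "HTML-like" output to hOCR, the sketch below shows one way to read word text and bounding boxes back out of an hOCR document. It assumes only the general hOCR convention that a word element carries class="ocrx_word" and a title attribute such as "bbox x0 y0 x1 y1"; it is not part of pdftotree, the names HocrWordParser and parse_bbox are hypothetical, and the exact classes pdftotree emits should be checked against real output.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    # hOCR title properties are separated by ";", e.g. "bbox 10 20 30 40; x_wconf 95".
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(float(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for every element with class "ocrx_word"."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0    # nesting depth inside the current word element (0 = outside)
        self._bbox = None  # bbox of the word element currently open
        self._text = []    # text fragments collected for the current word

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # A tag nested inside a word (e.g. <b>); track it so its end tag
            # does not close the word early.
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._depth = 1
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.words.append(("".join(self._text).strip(), self._bbox))


sample = (
    '<div class="ocr_page" title="bbox 0 0 612 792">'
    '<span class="ocrx_word" title="bbox 72 90 110 102; x_wconf 100">Hello</span>'
    '<span class="ocrx_word" title="bbox 115 90 160 102">world</span>'
    "</div>"
)
parser = HocrWordParser()
parser.feed(sample)
parser.close()
print(parser.words)
```

Only the standard library is used, so the sketch works on any hOCR string, whether it was produced by pdftotree or by an OCR engine. Void elements written without a closing slash (such as a bare `<br>`) inside a word are not handled.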

Removed

Fixed

  • Fix a bug where an HTML file was not created at the given path. (#64, @HiromuHota)
  • Extract LTChar objects even if they are not children of LTTextLine. (#79, @HiromuHota)

v0.4.1

21 Sep 18:37

This release marks the end of development for the v0.4.x version of pdftotree. Going forward, we plan to make pdftotree conform to hOCR starting with v0.5. For this process, we welcome @HiromuHota as a new maintainer.

If you would like to give feedback on this refactor, we invite you to comment in #62.