This repository has been archived by the owner on Jan 6, 2024. It is now read-only.

Releases: deajan/pmOCR

Committing the git crime again and again, with a spoon

16 May 17:16
  • Limited preprocessor/transformation threads to the configuration-defined NUMBER_OF_PROCESSES
  • Tesseract PDF intermediary transformation
    • Added an intermediary transformation suffix to make sure earlier files are not overwritten
    • Fixed intermediary transformation failing
    • Disabled intermediary transformation when a preprocessor is used
  • Tesseract preprocessor
    • Improved tesseract preprocessor settings
    • Made the general preprocessing/transformation DPI a variable
    • Files are now always preprocessed to TIFF format, so no intermediary transformation is needed when a preprocessor is used

Committing the git crime again

08 Mar 12:04

This release adds some nifty features:

  • A configurable directory poller interval
  • Service recovery when the monitored directory is not writable or absent

It also fixes upgrades with newer configuration files, as well as errors with preprocessed images when using the new poller.

As already said, this should be the last pmOCR v1 release.
It will be maintained until pmOCR v2, written in Python, shows up; that version should be considerably easier to maintain than a 2.5K-line bash script ;)

Committing the git crime

25 Feb 15:20
8000237

This release adds a new inotifywait emulation which uses polling instead of waiting for inotify signals from the kernel, allowing pmOCR to be used on Samba / NFS shares.
It also speeds up the file detection process by using pre-determined file lists.
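The polling idea can be illustrated with a minimal Python sketch (Python being the language announced for pmOCR v2). This is not pmOCR's actual bash implementation; the function names `snapshot`, `new_files` and `poll`, and the `max_cycles` parameter, are hypothetical. It lists the monitored directory, sleeps for a polling interval, lists it again and reports files that appeared in between, so it does not depend on kernel inotify events.

```python
import os
import time
from typing import Callable, Set


def snapshot(directory: str) -> Set[str]:
    """Return the set of regular file names currently present in `directory`."""
    with os.scandir(directory) as entries:
        return {entry.name for entry in entries if entry.is_file()}


def new_files(previous: Set[str], current: Set[str]) -> Set[str]:
    """Files present in the current listing that were absent from the previous one."""
    return current - previous


def poll(directory: str, interval: float,
         on_new: Callable[[Set[str]], None], max_cycles: int = 0) -> None:
    """Poll `directory` every `interval` seconds and pass newly seen files to `on_new`.

    max_cycles is a hypothetical parameter: 0 means poll forever,
    any positive value stops after that many polling cycles.
    """
    previous = snapshot(directory)
    cycles = 0
    while max_cycles == 0 or cycles < max_cycles:
        time.sleep(interval)
        current = snapshot(directory)
        added = new_files(previous, current)
        if added:
            on_new(added)
        # The new listing becomes the baseline for the next comparison.
        previous = current
        cycles += 1
```

Because only the difference between two listings is reported, each file is reported once when it appears; a file that is still being copied may be reported before it is complete, which is the kind of case the in-use file deferral mentioned in a later release deals with.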

As we're hitting 2022, this will be the last pmOCR release coded in bash.
Bash is a wonderful but complicated beast, heavily error-prone, and was never designed to be used in such complex ways.

I wish to continue maintaining this wrapper, but I definitely need to shift to a better programming language, and have chosen Python since it allows coding pmOCR with simple existing tools, without the need to reinvent (recode) the wheel.

Until pmOCR v2 is released, support for pmOCR v1.x is guaranteed.

Happy OCRring

poor man's 4 tesseracts

11 Jul 08:43
8d9ac25

pmOCR v1.6.1 maintenance release

This release brings the following features:

  • Tesseract 4.x support (it actually already worked, but it is now tested and allows selecting different OCR engines)
  • Files currently in use are deferred in service mode for later OCR processing

Other fixes went into this release:

  • Fixed automatic service shutdown on RHEL / CentOS 6/7 after 10 days (automatic /tmp directory cleanup removed the run file)
  • Many minor improvements and fixes that came with ofunctions developed on osync/obackup

Long time no see

21 Dec 18:36

A brand new pmOCR release with lots of bugfixes and more sanity checks.

IMPORTANT: Configuration file syntax has changed with version 1.6.0 in order to simplify new deployments.
Please make sure to use the new format.

See the changelog for more details.

Urgent bugfix release

21 Apr 17:53

Bugfix release addressing an issue, introduced in the earlier v1.5.6 release, where the new cleanup behavior stopped the service monitor after its first run.

Bugfix & test framework release

20 Apr 19:15

This release mainly introduces some unit and functional testing, which helped resolve a couple of issues and also allows running on the Travis CI platform:

  • Since v1.5.4, the service run file was created in root because of some merge modifications
  • CSV transformation didn't work anymore (nasty typo)
  • Fixed a low severity security issue where log & run files were world-readable

For more details, see the changelog file.

Small improvement release

13 Mar 12:14

The main feature of this release is the ability to move files upon successful / failed OCR recognition in order to keep the folder structure clean.
For other minor fixes see changelog.

A small improvements & bugfix release :)

06 Feb 16:55

New release of the 1.5 branch, including the following:

Improvements

  • Service now makes a 'forced' run every MAX_WAIT seconds (defaults to an hour)
  • An OCR run is now also made on service start
  • Moving files into monitored directories also triggers a run
  • Improved mail functions, parallel execution and logging

Bugfixes

  • Prevented multiple failed files from being overwritten when sources produce the same filename

poor man's OCR tool just got less poor

21 Oct 13:49

This should be a pretty mature release, including the following highlights:

  • Ownership preservation possibility
  • Parallelization of OCR runs
  • Support for image preprocessors
  • New config file support
  • Better tesseract support