Skip to content

Perform OCR operations on PDFs and then compress them with the Internet Archive's code.

License

Notifications You must be signed in to change notification settings

TDavLinguist/OCRmyIA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

OCRmyIA

Perform OCR operations on PDFs and then compress them with the Internet Archive's code.

ABout the code

I am terrible at BASH and even worse at Python so I welcome and encourage any and all pull requests to improve my code. It's essentially a workflow recipe that depends on other pieces of code to work. I will probably add more scripts here later that do different things. I originally intended to license it as GPL-2, but I've decided to release it as AGPL-3.

Prerequisites

  • GNU/Linux OS with GNU Core Utils
  • OCRmyPDF (Tested on >= 1.4.0)
  • archive-pdf-tools Note: Due to a bug you should really modify the requirements.txt to install pymupdf v1.21.0 and not the latest version. This bug appears to only affect the archive-pdf-tools script.
  • archive-hocr-tools(these should install when installing archive-pdf-tools with pip)

About

Perform OCR operations on PDFs and then compress them with the Internet Archive's code.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages