Use pdfminer for OSX; retain ocrmypdf for Linux #17

billfitzgerald · 2022-01-24T12:35:19Z

Pdfminer is already used to extract metadata, and ocrmypdf is not behaving well in testing with OSX (although that's likely due to my human error).

In any case, pdfminer has the ability to extract text from pdfs, and it is working without issue in OSX (so far, anyways).

billfitzgerald · 2022-01-25T15:25:13Z

Updating the title to reflect the status of the issue.

OCRmyPDF does a better job than pdfminer across a broader range of pdf types.

The temporary fix will be to check the OS type: OSX users will clean pdfs with pdfminer, and Linux users will clean them with OCRmyPDF.

This is not ideal; in the future I want a single mechanism for all OSes, but it will have to do for now.
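
For reference, the OS check described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `choose_backend` and `clean_pdf` are hypothetical, and raising an error on any other OS is an assumption, since the issue only covers OSX and Linux. It relies on `platform.system()` from the standard library, `extract_text` from `pdfminer.high_level` (pdfminer.six), and `ocrmypdf.ocr` from OCRmyPDF's Python API.

```python
import platform


def choose_backend(system_name=None):
    """Pick a PDF backend by OS (hypothetical helper, not from the project)."""
    # platform.system() reports "Darwin" on OSX and "Linux" on Linux.
    system_name = system_name or platform.system()
    if system_name == "Darwin":
        return "pdfminer"
    if system_name == "Linux":
        return "ocrmypdf"
    # Assumption: other systems are out of scope for this issue.
    raise RuntimeError(f"Unsupported OS: {system_name}")


def clean_pdf(input_path, output_path, system_name=None):
    """Route the file to the backend chosen for this OS (hypothetical helper)."""
    backend = choose_backend(system_name)
    if backend == "pdfminer":
        # Imported lazily so Linux users do not need this code path loaded.
        from pdfminer.high_level import extract_text

        # pdfminer yields plain text, so the output here is a text file.
        text = extract_text(input_path)
        with open(output_path, "w", encoding="utf-8") as fh:
            fh.write(text)
    else:
        # Imported lazily so OSX users do not need OCRmyPDF installed.
        import ocrmypdf

        # OCRmyPDF writes a new PDF with an OCR text layer.
        ocrmypdf.ocr(input_path, output_path)
    return backend
```

Note that the two branches produce different kinds of output (plain text versus an OCR'd pdf), which is part of why a single mechanism for all systems would be preferable.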

billfitzgerald · 2022-01-26T13:36:49Z

Closing.

billfitzgerald mentioned this issue Jan 24, 2022

Document setup procedure for OSX #7

Closed

billfitzgerald changed the title ~~Evaluate replacing ocrmypdf with pdfminer~~ Use pdfminer for OSX; retain ocrmypdf for Linux Jan 25, 2022

billfitzgerald closed this as completed Jan 26, 2022
