I hear that unpaper is the go-to tool for this sort of thing, but integrating it would have to be done at the command-line level (as we're already doing for Tesseract). I'm not opposed to this, but given that all of my scans so far have been really good quality, I'm not prioritising it.
However, if someone wants to write a PR to make this work, I'd probably merge it :-)
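For illustration, here is a minimal Python sketch of what that command-line-level integration could look like: run unpaper on the scan, then hand the cleaned image to Tesseract, both through `subprocess`. This is not Paperless's or @pitkley's actual code. The function names (`unpaper_command`, `tesseract_command`, `ocr_with_unpaper`) are hypothetical. The sketch assumes both binaries are on `PATH` and that the input is already a PNM (pbm/pgm/ppm) file, the format unpaper has traditionally worked with. It calls plain `unpaper INPUT OUTPUT` with no tuning options.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path


def unpaper_command(src: Path, dst: Path, unpaper_bin: str = "unpaper") -> list[str]:
    """Build the argument list for a plain run: ``unpaper INPUT OUTPUT``."""
    return [unpaper_bin, str(src), str(dst)]


def tesseract_command(src: Path, out_base: Path, lang: str = "eng",
                      tesseract_bin: str = "tesseract") -> list[str]:
    """Build ``tesseract IMAGE OUTBASE -l LANG``; Tesseract writes OUTBASE.txt."""
    return [tesseract_bin, str(src), str(out_base), "-l", lang]


def ocr_with_unpaper(scan: Path, lang: str = "eng") -> str:
    """Clean a scan with unpaper, then OCR the cleaned image with Tesseract.

    Assumption: ``scan`` is a PNM file that unpaper can read directly.
    """
    for tool in ("unpaper", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        cleaned = Path(tmp) / "cleaned.pnm"
        out_base = Path(tmp) / "ocr"
        # check=True raises CalledProcessError if either tool exits non-zero.
        subprocess.run(unpaper_command(scan, cleaned), check=True)
        subprocess.run(tesseract_command(cleaned, out_base, lang), check=True)
        # Tesseract appends ".txt" to the output base name.
        return out_base.with_suffix(".txt").read_text(encoding="utf-8")
```

Passing argument lists instead of a shell string sidesteps quoting problems with odd filenames, and `check=True` turns a failed unpaper or Tesseract run into an exception rather than silently OCRing an empty or missing image.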
Referenced this issue on Feb 16, 2016.
Just to keep you guys in the loop: I have a working version of Paperless with
I did some limited testing (mainly because I don't have any bad scan samples at hand), but from what I have seen, the OCR results at least didn't get worse for decent scans.
If somebody either (a) has a bad scan for me to test with or (b) can test it themselves, some feedback would be great!
Edit: whoops, I just realised that you're the guy who made unpaper! Nifty! Well, in that case, maybe you can take a look at @pitkley's fork and see if there's anything you'd change, for example whether there's a Pythonic way to interface with unpaper that hasn't been tried yet?
Sorry, I should have been clearer: I meant that as a current unpaper
Although I'd be happy to try this out once I have some spare time!
On Wed, Feb 17, 2016 at 8:36 AM Daniel Quinn email@example.com