Deskew/Despeckle #20

Closed
rileytg opened this Issue Feb 11, 2016 · 12 comments

6 participants
@rileytg

rileytg commented Feb 11, 2016

Oftentimes, when scanning documents (especially in bulk), you run into situations where the document is slightly skewed and/or has speckles that throw off OCR.

[Example images: skew, speckle]

@rileytg rileytg changed the title from Deskew/Despeckle to Deskewe/Despeckle Feb 11, 2016

@rileytg rileytg changed the title from Deskewe/Despeckle to Deskew/Despeckle Feb 11, 2016

@rileytg

rileytg commented Feb 11, 2016

Just opening this issue to gather thoughts on how to handle this automatically. There are a number of commercial products, but I haven't found any OSS tool that I've gotten to work well.

@danielquinn

Owner

danielquinn commented Feb 11, 2016

I hear that unpaper is the go-to tool for this sort of thing, but integrating it would have to be done on the command-line level (like we're already doing for Tesseract). I'm not opposed to this, but given that all of my scans so far have been really good quality, I'm not prioritising it.

However, if someone wants to write a PR to make this work, I'd probably merge it :-)
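Shelling out to unpaper the same way Paperless already shells out to Tesseract could look roughly like this. It is a minimal sketch, not the code from any Paperless branch or PR: the function names are hypothetical, and it assumes both binaries are on `PATH` and that each page has already been converted to a PNM image, a format unpaper reads. unpaper deskews and filters noise by default, so the only extra flag passed is `--overwrite`.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_unpaper_command(src: Path, dst: Path, binary: str = "unpaper") -> list[str]:
    # Hypothetical helper. unpaper deskews and removes noise by default;
    # --overwrite lets a rerun replace an existing output file.
    return [binary, "--overwrite", str(src), str(dst)]


def build_tesseract_command(image: Path, out_base: Path, lang: str = "eng",
                            binary: str = "tesseract") -> list[str]:
    # Hypothetical helper. tesseract writes its text result to <out_base>.txt
    return [binary, str(image), str(out_base), "-l", lang]


def clean_and_ocr(page: Path, workdir: Path, lang: str = "eng") -> str:
    """Clean one PNM page with unpaper, then OCR the cleaned page (sketch)."""
    for tool in ("unpaper", "tesseract"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    cleaned = workdir / (page.stem + ".unpaper.pnm")
    # check=True raises CalledProcessError if either tool exits non-zero
    subprocess.run(build_unpaper_command(page, cleaned), check=True)
    out_base = workdir / page.stem
    subprocess.run(build_tesseract_command(cleaned, out_base, lang), check=True)
    return (workdir / (page.stem + ".txt")).read_text()
```

Keeping the argument lists in small builder functions makes them easy to check without either tool installed, and `check=True` turns a failed unpaper or tesseract run into an exception instead of silently empty OCR text.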

@dimitrieh

dimitrieh commented Feb 15, 2016

This would be incredible!

@pitkley

Contributor

pitkley commented Feb 16, 2016

Just to keep you guys in the loop: I have a working version of Paperless with unpaper which you can find over at feature/unpaper.
This work is currently blocked by #34, but as soon as that is resolved I will open a PR for unpaper.

I did some limited testing (mainly because I don't have any bad scan samples at hand), but from what I have seen, the OCR results at least didn't get worse for decent scans.

If somebody either (a) has a bad scan for me to test with or (b) can test it themselves, some feedback would be great!
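Lacking real bad scans, one stopgap is to degrade a clean page synthetically. This is a hypothetical helper, not part of Paperless or unpaper: it rotates a grayscale page (a list of rows, 0 = black, 255 = white) a few degrees about its centre using nearest-neighbour sampling, sprinkles random black speckles, and encodes the result as a binary PGM that can be fed to unpaper. Synthetic damage is only a rough stand-in for real scanner artefacts, so it complements rather than replaces real samples.

```python
import math
import random


def degrade(pixels, angle_deg=3.0, speckle_rate=0.01, seed=0):
    """Rotate a grayscale page and add black speckles (hypothetical helper)."""
    h, w = len(pixels), len(pixels[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    rng = random.Random(seed)  # seeded so test pages are reproducible
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Map each output pixel back into the source image (nearest neighbour)
            dx, dy = x - cx, y - cy
            sx = round(cos_a * dx + sin_a * dy + cx)
            sy = round(-sin_a * dx + cos_a * dy + cy)
            # Pixels rotated in from outside the page become white paper
            value = pixels[sy][sx] if 0 <= sx < w and 0 <= sy < h else 255
            if rng.random() < speckle_rate:
                value = 0  # a black speckle
            row.append(value)
        out.append(row)
    return out


def to_pgm(pixels):
    """Encode 8-bit grayscale rows as a binary (P5) PGM file."""
    h, w = len(pixels), len(pixels[0])
    header = f"P5\n{w} {h}\n255\n".encode("ascii")
    return header + bytes(v for row in pixels for v in row)
```

For example, `Path("bad.pgm").write_bytes(to_pgm(degrade(page, angle_deg=2.5, speckle_rate=0.005)))` writes a skewed, speckled test page (with `Path` from `pathlib`).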

@Flameeyes

Flameeyes commented Feb 17, 2016

If there is any help I can lend with unpaper please don't hesitate to tag me :)

@danielquinn

Owner

danielquinn commented Feb 17, 2016

@Flameeyes as @pitkley suggested, you could post a link to a low-quality scan you'd like to see work, or even try out his fork and test some stuff yourself :-)

Edit: whoops, I just realised that you're the guy who made unpaper! Nifty! Well in that case, maybe you can take a look at @pitkley's fork and see if there's anything you'd change, like if there's a Pythonic way to interface with unpaper that hasn't been tried yet?

@Flameeyes

Flameeyes commented Feb 17, 2016

Sorry, I should have been clearer: I meant that as a current unpaper maintainer :)

Although I'd be happy to try this out once I have some spare time!

Diego Elio Pettenò (aka Flameeyes)

@Cyber1000

Cyber1000 commented Mar 5, 2016

Will there be a PR for paperless with unpaper or are there any problems with it right now?

@danielquinn

Owner

danielquinn commented Mar 6, 2016

I see that @pitkley has a branch where he's started the unpaper integration, but it's fallen out of sync with master for now. My priority right now is the UI, but when that's ready, I'll be looking at this issue -- that is if @pitkley hasn't already submitted a PR.

@pitkley referenced this issue Mar 6, 2016 in "Add unpaper #74" (Merged)

@pitkley

Contributor

pitkley commented Mar 6, 2016

@Cyber1000 I have opened PR #74 which adds unpaper if you want to test it.

@danielquinn

Owner

danielquinn commented Mar 6, 2016

I've merged @pitkley's unpaper integration PR, so I'm going to go ahead and close this.

@danielquinn danielquinn closed this Mar 6, 2016

@Cyber1000

Cyber1000 commented Mar 7, 2016

Thanks, I'll take a look at it.
