Built in scanner does not set image DPI #23
Comments
Hmm, this sounds tricky. I didn't even know you could store DPI information in JPEG files. Some phones do provide depth information for camera images, but I think this would be overkill. In the general case we could just assume the scanned document to be of A4 size (I guess Americans use a similarly sized paper format) and use that to estimate the DPI. Of course this would be way off if the user scans a large billboard.
Well, this is required so that paperless can make PDF documents from images with selectable text in them (which I believe is a good thing). It needs to know how big the pages should be. Since we're talking about paper documents, I'd just default to A4 and calculate the DPI from the image height. This will cover about 99% of all usage.
Good, then that should be simple 👍
After some thinking, I think I'll add that calculation to paperless itself for images: if no DPI information is available, assume A4. I just tried adding an image of a document created with the Office Lens app, and it does not work, so this fallback should make paperless more compatible with such apps.
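For illustration, the A4 fallback discussed above could be sketched roughly like this. This is not the code that went into paperless; the function name `estimate_dpi_assuming_a4` is hypothetical, and using the longer image side (so that landscape photos also work) is an assumption on top of the "calculate from the image height" idea.

```python
# Minimal sketch: estimate a DPI value for an image that carries no DPI
# metadata, assuming it shows one full A4 page (210 mm x 297 mm).
A4_LONG_SIDE_MM = 297.0
MM_PER_INCH = 25.4


def estimate_dpi_assuming_a4(width_px: int, height_px: int) -> int:
    """Return the DPI at which the image's long side spans 297 mm."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    # Assumption: the longer pixel side corresponds to the long side of A4.
    long_side_px = max(width_px, height_px)
    long_side_inches = A4_LONG_SIDE_MM / MM_PER_INCH
    return round(long_side_px / long_side_inches)


if __name__ == "__main__":
    # A 2480 x 3508 pixel image comes out at roughly 300 DPI.
    print(estimate_dpi_assuming_a4(2480, 3508))
```

As noted earlier in the thread, the estimate is only as good as the A4 assumption: a photo of a billboard would get the same DPI as a photo of a letter.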
Sounds good, then we need less work here 😄
Well, that was easy. Edit: I'm kinda amazed by how well this entire setup works with these scanner apps, OCR and text embedding and all that.
Thanks guys, that was fast
New release with this fix is out.
Another subset of #18 I suppose. Recently paperless-ng has started using OCRmyPDF (tested w/0.9.6), which fails when trying to process an image missing DPI.
Since each image from the app's scanner would be a different size, I'm not sure there's a way to set `PAPERLESS_OCR_IMAGE_DPI=<num>` to make this work properly (without ruining OCR of other scanned docs). Or perhaps there is; maybe @jonaswinkler could chime in here. Otherwise, do you know if there is a way to add DPI info with edge_detection?
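To make the concern concrete, here is a small sketch of what a single fixed DPI implies for images of different pixel sizes. It assumes the fixed value is applied to every image that lacks DPI metadata; the helper `page_size_mm` is hypothetical and not part of paperless-ng or OCRmyPDF.

```python
# Sketch: physical page size implied by interpreting an image at a fixed DPI.
MM_PER_INCH = 25.4


def page_size_mm(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Return (width_mm, height_mm) of the page when rendered at `dpi`."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi * MM_PER_INCH, height_px / dpi * MM_PER_INCH)


if __name__ == "__main__":
    fixed_dpi = 300
    # Two photos of the same A4 sheet, taken at different resolutions.
    for size in [(2480, 3508), (1240, 1754)]:
        w_mm, h_mm = page_size_mm(*size, fixed_dpi)
        print(f"{size}: {w_mm:.0f} mm x {h_mm:.0f} mm")
```

At 300 DPI the larger image maps to about A4, while the half-resolution image of the same sheet maps to roughly A6, which is why one fixed value cannot size every scan correctly.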
Logs when uploading an image (used my PR #22 for the moment):