Enhance image before sending to Tesseract for better OCR accuracy #6
Tesseract does seem to do Otsu binarization (we are pretending this is a word) on the input image, but as far as I can tell that is all it does. In simple test cases on the sample image in #5, running the above code decreased the number of false positives on noisy backgrounds. This implies that either Tesseract is not performing the pre-processing (it is accidentally turned off), or it does not do it as well. The above code is a stopgap to increase effectiveness for the data I was using. Ideally though, as you said, a full pipeline should be implemented. To clean up an image you often want to:
Depending on the purpose of the cleanup... you may swap the order around, or run some steps multiple times. For something like Japanese, which has many small dots and dashes as part of the language... it may be difficult to de-noise without first identifying potential characters and removing outliers.
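The kind of cleanup pipeline described above can be sketched in plain Java. This is a minimal, illustrative sketch (the class and method names are mine, not from the code shared in this thread): a 3x3 median filter to knock out isolated specks, followed by a fixed global threshold to binarize. A real pipeline would compute the threshold (e.g. with Otsu) rather than hard-coding it.

```java
import java.util.Arrays;

// Illustrative cleanup sketch: 3x3 median denoise, then a fixed global
// threshold. The "image" is a grayscale int[][] with values 0..255.
public class CleanupPipeline {

    // Replace each pixel with the median of its 3x3 neighborhood
    // (edges are clamped). Removes isolated single-pixel noise.
    static int[][] medianDenoise(int[][] img) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int[] window = new int[9];
                int i = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = Math.min(Math.max(y + dy, 0), h - 1);
                        int nx = Math.min(Math.max(x + dx, 0), w - 1);
                        window[i++] = img[ny][nx];
                    }
                }
                Arrays.sort(window);
                out[y][x] = window[4]; // median of 9 samples
            }
        }
        return out;
    }

    // Turn the image into pure black/white around a threshold.
    static int[][] binarize(int[][] img, int threshold) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                out[y][x] = img[y][x] >= threshold ? 255 : 0;
        return out;
    }

    public static void main(String[] args) {
        // A mostly-dark image with one bright speck of noise at (1,1).
        int[][] img = {
            {10, 12, 11, 13},
            {12, 255, 10, 12},
            {11, 10, 12, 11},
            {13, 12, 11, 10},
        };
        int[][] cleaned = binarize(medianDenoise(img), 128);
        for (int[] row : cleaned)
            System.out.println(Arrays.toString(row));
    }
}
```

As the comment above notes, order matters: denoising before thresholding keeps the speck from surviving as a false "character" pixel, which is exactly the kind of false positive discussed for noisy backgrounds.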
Ah, oops. Would the correct term be Otsu thresholding then? I'm not particularly familiar with image processing, unfortunately. We could possibly even display the pre-processed image in the capture box (instead of having it be transparent) before OCR, if pre-processing can be done in a fairly short time (<500ms?), then have some sort of simple interface to let users adjust the threshold values through a simple mechanism (e.g., tap & hold the capture box, then drag to increase/decrease the threshold). This sort of manual intervention could be advantageous, seeing that there usually isn't a one-size-fits-all solution to these sorts of problems.
I wasn't calling you out on anything. I just don't think "binarization" is a word, but it got the concept across that I wanted it to. "Thresholding" does not imply a binary image... so I don't think that word is correct either. "Using the Otsu algorithm to create a binary image" is just a mouthful, so I am all for making up words that fit the situation.

I am going to roughly define some terms, so forgive me if you know them already. Otsu thresholding is the concept of taking a grayscale image and reading its histogram to determine the optimal threshold level for the image. A histogram is a graph showing the relative percentage of each gray value in the image. So if an image is mostly black with a few white dots, the histogram will show a line graph that is taller on the "dark" side. If the image is split fairly evenly between almost white and almost black, the graph might look like a U.

What you do with the threshold is up to you, and technically has no bearing on the algorithm itself. We are choosing to create a binary image... but we could just as well turn everything above the threshold white and leave everything below it grayscale. Or apply it to the original color image to make everything below the threshold grayscale (which would create pops of color).

What this means is that Otsu IS the algorithm for a one-size-fits-all approach, because it calculates the best values for you (in the simplest sense). Of course it can be improved upon in various ways... one such way finds a different threshold for every pixel based on the surrounding pixels. This means that if the image is partially shadowed, it gets normalized to remove the shadow and then turned black/white (more or less).

You are correct, we absolutely could allow the user to change the threshold value, but in theory, unless they are doing it per character, something like a local Otsu would be more effective. (Possibly... needs testing, but I like the idea for power users.)
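The histogram-based computation described above can be written compactly. This is a minimal sketch of global Otsu (class and method names are mine): it walks every candidate threshold, splits the histogram into "background" and "foreground" classes, and keeps the threshold that maximizes the between-class variance — which is equivalent to minimizing the within-class variance.

```java
// Global Otsu: pick the threshold that maximizes between-class variance,
// computed from a 256-bin grayscale histogram.
public class Otsu {

    static int threshold(int[] histogram) {
        int total = 0;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            total += histogram[i];
            sumAll += (long) i * histogram[i];
        }

        long sumBack = 0;   // weighted sum of background bins so far
        int weightBack = 0; // pixel count of background class
        double bestVar = -1;
        int bestT = 0;

        for (int t = 0; t < 256; t++) {
            weightBack += histogram[t];
            if (weightBack == 0) continue;        // no background yet
            int weightFore = total - weightBack;
            if (weightFore == 0) break;           // no foreground left
            sumBack += (long) t * histogram[t];

            double meanBack = (double) sumBack / weightBack;
            double meanFore = (double) (sumAll - sumBack) / weightFore;

            // Between-class variance (up to a constant factor).
            double betweenVar = (double) weightBack * weightFore
                    * (meanBack - meanFore) * (meanBack - meanFore);
            if (betweenVar > bestVar) {
                bestVar = betweenVar;
                bestT = t;
            }
        }
        return bestT; // pixels <= bestT are treated as "background"
    }

    public static void main(String[] args) {
        // A bimodal "U-shaped" histogram like the one described above:
        // dark text pixels clustered at 20, bright background at 220.
        int[] hist = new int[256];
        hist[20] = 300;
        hist[220] = 700;
        System.out.println("threshold = " + threshold(hist));
    }
}
```

For the "different threshold for every pixel" variant mentioned above, the same scoring would be run per pixel on a histogram of its local window (or approximated with a local mean), which is what lets a partially shadowed image normalize out.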
Hey o/ I could try a couple of different techniques. However, to compare them it would be nice to have a set of test images. Is there such a set? Or some examples where the current implementation fails? |
You could use manga_ocr; that would make Kaku very accurate. Idk why, but it's currently so inaccurate that I can't even use it.
@Shimizoki graciously sent over some Otsu thresholding code in Java that should clean up the image a little:
I didn't have time to test for accuracy, but here's performance roughly (Axon 7):
For now, I don't think I'll be committing this code though - rather than applying a quick fix now, I'd rather wait until a good image processing pipeline can be designed and implemented. It also seems like Tesseract already uses the Otsu method for binarizing images, so I don't think this code would be the most effective way to process the capture either.
Leptonica, which is also included in the tess-two library, seems to have some nice image processing functions we might be able to take advantage of (it also runs as C code, so it should be faster than a Java implementation).
https://groups.google.com/forum/#!topic/tesseract-ocr/JRwIz3xL45U