There are several articles and GitHub repos dedicated to document segmentation; however, I didn't find one that worked right out of the box, so I created this one. It can be used to preprocess document images for subsequent text recognition, or simply to save them in a proper format.
The description below explains the whole cropping process. Maybe it will give you ideas for making it work even better.
This package can be easily installed via pip:
pip install document-cropper
import document_cropper as dc
To crop an image, use:
dc.crop_image("path_to_image.jpg", "name_for_the_result.jpg")
If you want to continue processing the photo and save the cropping result to a variable:
cropped = dc.crop_image("path_to_image.jpg")
You can also provide the input as an np.ndarray:
from skimage import io
image = io.imread("path_to_image.jpg")
cropped = dc.crop_image(image=image)
If you want to see all the stages of processing, you can use:
# Save the stages as image
dc.crop_image_pipeline("path_to_image.jpg", "name_for_the_result.jpg")
# Show the stages as matplotlib figure
dc.crop_image_pipeline("path_to_image.jpg")
The main.py file contains examples of using these methods.
There are also demonstration() methods in this repository, which can be uncommented in main.py. They show how the code from each file works, making it easier to figure out what the methods are doing. These demonstration() methods are present only in this repository and do not come with the package when it is installed via pip.
# segmentation.demonstration()
# corner_detection.demonstration()
# image_cropper.demonstration()
# cropper_pipeline.demonstration()
This implementation was based on this Inovex article. The code from the article didn't work out of the box, so I reworked part of it and implemented my own corner detection algorithm.
The algorithm consists of several steps:
1) preprocessing; 2) corner detection; 3) cropping.
I chose the best-performing methods and their hyperparameters by testing each on a dataset of 200 photos.
- Convert the initial RGB image into a monochrome one
Documents are usually white and stand out strongly in the photo, so we can use contrast filters for our needs. Such filters work well with monochrome images.
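As a minimal sketch (not the package's actual code), the conversion can be done with plain NumPy using the same luminance weights as skimage's rgb2gray:

```python
import numpy as np

def to_monochrome(image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to a single-channel grayscale image."""
    # ITU-R BT.709 luminance weights, the same ones skimage.color.rgb2gray uses
    weights = np.array([0.2125, 0.7154, 0.0721])
    return image @ weights

# A pure-red image maps to its red luminance weight
red = np.zeros((2, 2, 3))
red[..., 0] = 1.0
gray = to_monochrome(red)
```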
- Apply a Gaussian filter
The Gaussian filter blurs the image, thereby removing some artifacts. Tests showed that this filter improves the segmentation results.
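For illustration, such a blur can be applied with scipy; the sigma value here is an assumption, not the package's tuned value:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
gray = rng.random((100, 100))

# Blur with an assumed sigma; smoothing suppresses small, noisy artifacts
blurred = gaussian_filter(gray, sigma=1.0)
```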
- Thresholding
At this step I apply thresholding as the first binarization step. The article didn't specify which thresholding method to use, so I tested all the thresholding methods available in skimage and chose the best one. It turned out to be Otsu thresholding with a disc size of 8 pixels.
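A disc size given in pixels suggests skimage's local (rank) Otsu filter; here is a sketch of how such a threshold could be applied — the radius matches the text, the rest is illustrative:

```python
import numpy as np
from skimage.filters.rank import otsu
from skimage.morphology import disk

rng = np.random.default_rng(0)
img = (rng.random((64, 64)) * 255).astype(np.uint8)

# Otsu threshold computed locally over a disc-shaped neighbourhood of radius 8
local_thresh = otsu(img, disk(8))
binary = img >= local_thresh
```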
- Document selection
This is the most important step of the algorithm. Mistakes on this step cause the whole cropping process to fail.
After Otsu thresholding we get a picture with several distinct white zones; one of them is our document. At this step I try to keep only the document's white zone. There is a method in skimage that labels the connected components of disjoint white zones. The biggest component is our document, so I keep it and remove all the other regions.
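skimage's connected-component labelling can do this clustering; the following is a sketch of keeping only the largest white region (a guess at the approach, not the package's exact code):

```python
import numpy as np
from skimage.measure import label

def keep_largest_region(mask: np.ndarray) -> np.ndarray:
    """Keep only the biggest connected white zone of a binary mask."""
    labels = label(mask)                 # 0 is the background label
    if labels.max() == 0:                # no white zones at all
        return mask
    counts = np.bincount(labels.ravel())
    counts[0] = 0                        # never pick the background
    return labels == counts.argmax()

mask = np.zeros((6, 6), dtype=bool)
mask[0:4, 0:4] = True                    # big zone: the "document"
mask[5, 5] = True                        # small background blob
largest = keep_largest_region(mask)
```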
This part of the algorithm could be improved: there are two cases in which the extraction works incorrectly. The first is when some background white region is connected to the document's region. The second is when a background region is bigger than the document. Examples of both problems are shown below:
These issues should be handled in the future.
- Fill holes
At this step we remove the holes left by the text.
Two different binary hole-filling methods, from skimage and scipy, were used. I swept the hyperparameter values and found the best ones, and also tested the order in which the two methods are applied.
One more note: in the original article, this binary closing step was performed before extracting the document region (the previous step). Swapping the order of these two steps gave a significant increase in quality, because performing binary closing first glues the background to the document.
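A sketch combining the two libraries' operations; the footprint radius and the order shown here are assumptions, since the text only says that both were tuned:

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.morphology import binary_closing, disk

def fill_text_holes(mask: np.ndarray) -> np.ndarray:
    # Close small gaps first (assumed radius), then fill fully enclosed holes
    closed = binary_closing(mask, disk(3))
    return binary_fill_holes(closed)

mask = np.ones((20, 20), dtype=bool)
mask[8:12, 8:12] = False                 # a "hole" left by text
filled = fill_text_holes(mask)
```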
- Corner detection
Now that we have the segmentation mask of the document, we can find the document's corners. For this purpose I extract a one-pixel edge from the obtained mask. There is a problem with edge selection when the document extends beyond the image (as in the example below): in that case the edge-extraction method loses some of the edges, so I pad the mask with False values before extracting the edge. This solved the problem.
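A sketch of the padded edge extraction, using scipy's erosion (the package's actual implementation may differ):

```python
import numpy as np
from scipy.ndimage import binary_erosion

def mask_edge(mask: np.ndarray) -> np.ndarray:
    """One-pixel edge of a binary mask.

    Padding with False keeps the parts of the edge where the
    document touches (or leaves) the image border.
    """
    padded = np.pad(mask, 1, constant_values=False)
    edge = padded & ~binary_erosion(padded)
    return edge[1:-1, 1:-1]              # strip the padding again

mask = np.zeros((5, 8), dtype=bool)
mask[0:4, 2:7] = True                    # document cut off at the top border
edge = mask_edge(mask)
```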
Now we have the edge pixels. Some of them belong to the sides of the mask, others to its corners. Consider a side pixel: if we look at its surroundings, we will see that about half of its neighbors are white and half are black. Corner pixels, by contrast, have far fewer than half white neighbors. In this way we can guess which pixels are more likely to belong to the corners.
Finally, we have to decide which 4 pixels to take as the corners. I go through the obtained list of candidate pixels and, for each corner of the image, select the closest one.
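The whole neighbor-counting idea can be sketched as follows; the window size and the white-fraction threshold are assumed hyperparameters, and the edge test is simplified to 4-neighbours:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def find_corners(mask: np.ndarray, window: int = 5, max_white: float = 0.45):
    """Guess the four (row, col) document corners of a binary mask."""
    # Fraction of white pixels in a window around every pixel
    white_frac = uniform_filter(mask.astype(float), size=window)
    # Edge pixels: white pixels with at least one black 4-neighbour
    p = np.pad(mask, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    edge = mask & ~interior
    # Corner candidates: edge pixels whose neighbourhood is mostly black
    ys, xs = np.nonzero(edge & (white_frac < max_white))
    candidates = np.stack([ys, xs], axis=1)
    h, w = mask.shape
    image_corners = np.array([[0, 0], [0, w - 1], [h - 1, 0], [h - 1, w - 1]])
    # For every image corner, take the closest candidate pixel
    picked = []
    for c in image_corners:
        d = ((candidates - c) ** 2).sum(axis=1)
        picked.append(tuple(int(v) for v in candidates[d.argmin()]))
    return picked

mask = np.zeros((20, 20), dtype=bool)
mask[4:16, 5:15] = True                  # an upright rectangular "document"
corners = find_corners(mask)
```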
The algorithm described above is entirely my own idea. In other articles and repos the authors use the Hough line transform: they find straight lines in the segmentation mask, compute their intersections, and choose the corners from those intersections. I tested some variants of this approach and it did not give good results.
- Cutting out
Once the coordinates of the corners are found, we can finally cut the document out of the image and rescale it to the correct shape.
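A sketch of this final step using skimage's projective transform; the output size and the corner ordering are assumptions:

```python
import numpy as np
from skimage.transform import ProjectiveTransform, warp

def crop_to_corners(image, corners, out_h=12, out_w=10):
    """Warp the quadrilateral `corners` (row, col) onto an upright rectangle.

    corners must be ordered: top-left, top-right, bottom-left, bottom-right.
    """
    # skimage transforms work in (x, y) = (col, row) coordinates
    src = np.array([[0, 0], [out_w - 1, 0],
                    [0, out_h - 1], [out_w - 1, out_h - 1]], dtype=float)
    dst = np.array([(c, r) for r, c in corners], dtype=float)
    tform = ProjectiveTransform()
    tform.estimate(src, dst)             # maps output coords -> input coords
    return warp(image, tform, output_shape=(out_h, out_w))

image = np.zeros((20, 20))
image[4:16, 5:15] = 1.0                  # a bright "document"
cropped = crop_to_corners(image, [(4, 5), (4, 14), (15, 5), (15, 14)])
```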