# Deskewing pages
This notebook was designed to introduce one method for automatically deskewing page images in preparation for OCR. For our purposes in this iteration of the class, the details of the process are less important than seeing an example of the ways that images can be computationally changed after initial photography: when you're looking at a digital surrogate, you're seeing an image that has likely been through several processes that attempt to optimize it for the task at hand without a human having to check it at each step of the way.

For today, there's **really** no need to try to figure out any of the details of the code: the more important thing is to get a general sense of what's happening, and then observe the differences that those changes make to the image in the end.

The code in this notebook is drawn from a blog post by Leo Ertuna at [Becoming Human](https://becominghuman.ai/how-to-automatically-deskew-straighten-a-text-image-using-opencv-a0c30aed83df). This notebook breaks Ertuna's code up into interactive steps to show what's happening along the way.

### A note before we start: this may not be the only problem to solve
The code in this notebook assumes that the problem with the page image is that it's skewed and needs to be straightened. In other words, it treats the page as a two-dimensional plane.

That works pretty well if the images are of reasonably flat pages of the kind that we can often get from the sort of imaging labs that many libraries have. But books don't necessarily lie flat, and, depending on the condition of the binding, it may not always be possible to flatten the pages for imaging. So the lines of text in some images will appear not just skewed, but actually curved, due to the curvature of the pages. And pages in a book can curl in a number of ways all at once (recall, for instance, how much more and how differently the pages in the middle of a thick book curl compared to pages at the beginning or end).

It's possible to reduce or eliminate the appearance of curvature in the lines of a page image by incorporating some of the techniques we'll see in this deskewing routine (and adding some others). That's a more complicated problem that we won't take on, but there's a great [blog post by Mark Zucker](https://mzucker.github.io/2016/08/15/page-dewarping.html) that walks through a solution. The blog post offers visualizations of what's happening at each step, so it's an instructive read even if you're not examining the details of the code. In the interests of time, I'll urge you **not** to examine the details of the code today, but to skim Zucker's description of his approach to the problem and look at the illustrations that visualize what the code is *doing*.

## 1 - Connect to Google Drive, copy files, and install packages

In [None]:
#Code cell #1
#Get access to Google Drive
from google.colab import drive
drive.mount('/gdrive')

In [None]:
#Code cell #2
%cp -r /gdrive/MyDrive/L-100a/page_images.zip /content/page_images.zip
%cd /content/
!unzip page_images.zip
%cd /content/page_images/
!unzip penn_pr3732_t7_1730b.zip

In [None]:
#Code cell #3
#Import IPyWidgets to provide widgets for experimenting with some variables later
import ipywidgets as widgets
from ipywidgets import interact

#Import necessary Python packages for use in our code. 

#Note that opencv-python is installed by default in Google Colaboratory. If you 
#were working in a different environment, you'd need to be sure it was installed
#using pip

#(The second import is specific to Google Colaboratory and provides a workaround
#to get OpenCV's imshow command to work properly in a Colab notebook.)
import cv2
from google.colab.patches import cv2_imshow
import numpy as np

## 2 - Opening the image
Run the next cell to get a drop-down list of different images for processing, all with varying levels of skewing. (You only need to run that cell once. Thereafter, you can change the image you're working with by choosing a different image from the select menu.)

In [None]:
#Code cell #4
image_select = widgets.Dropdown(
    description='Choose image',\
    options = ['PR3732_T7_1730b_body00' + i for i in ['04.tif', '11.tif', '13.tif', '21.tif', '36.tif', '63.tif', '78.tif', '82.tif']],\
    value = 'PR3732_T7_1730b_body0004.tif',
    style={'description_width': 'initial'})
display(image_select)

In [None]:
#Code cell #5
#Identify the skewed image and have OpenCV read it. (This can take a little 
#while, so give it time to complete.)

source_directory = '/content/page_images/penn_pr3732_t7_1730b/'
skewed_image = source_directory + image_select.value
im = cv2.imread(skewed_image, cv2.IMREAD_COLOR)
#Let's see what the image looks like: an excellent image, but a little skewed.
cv2_imshow(im)

## 3 - Manipulating the image
OCR software like Tesseract might well be able to handle an image like this, but recognition of text lines will be better if we can straighten it. Ertuna's script offers a nice example of a workflow for figuring out exactly *how* skewed the image is, then using that measurement to straighten the image. As we proceed, you'll see some of the ways that images that are good for *us* are not as useful for the computer, and vice versa.

In [None]:
#Code cell #6
#Make a copy of the image
newImage = im.copy()
#Convert to grayscale
gray = cv2.cvtColor(newImage, cv2.COLOR_BGR2GRAY)
#Apply a Gaussian blur to reduce the effect of any noise in the image
blur = cv2.GaussianBlur(gray, (9, 9), 0)
#Convert the image to inverted black and white (i.e., white text on a 
#black background). Note that Ertuna's script uses Otsu's method for 
#thresholding to black and white.
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
cv2_imshow(thresh)
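
If you're curious what "Otsu's method" is doing in the cell above, here is a minimal sketch in plain NumPy. It is not OpenCV's implementation, and the function name `otsu_threshold` and the tiny synthetic "page" are made up for illustration. The idea is simply to try every possible cutoff and keep the one that best separates the dark pixels from the light ones.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the cutoff (0-255) that best separates dark from light pixels."""
    # Count how many pixels have each of the 256 possible gray values
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    levels = np.arange(256)
    best_t, best_score = 0, -1.0
    for t in range(256):
        w_dark = hist[:t + 1].sum()      # pixels at or below the cutoff
        w_light = total - w_dark         # pixels above the cutoff
        if w_dark == 0 or w_light == 0:
            continue
        mean_dark = (hist[:t + 1] * levels[:t + 1]).sum() / w_dark
        mean_light = (hist[t + 1:] * levels[t + 1:]).sum() / w_light
        # Between-class variance (up to a constant factor)
        score = w_dark * w_light * (mean_dark - mean_light) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Synthetic "page": light paper (value 200) with one dark stroke (value 40)
page = np.full((6, 8), 200, dtype=np.uint8)
page[2:4, 1:7] = 40

t = otsu_threshold(page)
# Inverted binary image: ink becomes white (255), paper becomes black (0)
inverted = np.where(page > t, 0, 255).astype(np.uint8)
```

The inversion matters for what comes next: the later steps treat white pixels as the "something" to be measured, so the ink needs to be white.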

### 3.a - Set some variables
Run the next cell to create a few sliders that will allow you to adjust the variables used in the next steps. (You only need to run that cell once. Thereafter, changing the sliders will change the values of the variables used in the subsequent cells.)

In [None]:
#Code cell #7
kernel_width = widgets.IntSlider(description = 'Kernel width', \
                                               min=10, max=50, step=5, value=30)
kernel_height = widgets.IntSlider(description='Kernel height', \
                                                 min=1, max=10, step=1, value=5)
num_iterations = widgets.IntSlider(description='Iterations', min=1, \
                      max=10, step=1, value=5)
display(kernel_width) 
display(kernel_height)
display(num_iterations)

### 3.b - Now things start to get strange...
Ertuna implements a common approach that may seem counterintuitive at first: we're going to `dilate` the white pixels of the text until they run together to form solid blocks of white. 
The sliders you created in Code cell #7 let you play with the values that determine the size of the `kernel` used to dilate those pixels and the number of iterations for dilation (the defaults follow Ertuna's script). 
NOTE: Don't re-run Code cell #7: that will just reset the values to their defaults. Instead, make any adjustments to the sliders and then run Code cell #8.

In [None]:
#Code cell #8
#The kernel variable defines a shape to use for dilating the pixels: a 
#rectangle whose width and height come from the sliders (by default, 30 pixels
#wide and 5 pixels high). These proportions ensure that the text will run 
#together while more or less maintaining the vertical dimensions of the text 
#lines. Experiment with the sliders to see how the output changes.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_width.value, kernel_height.value))
#We dilate the pixels using the shape defined by kernel, repeating the operation
#as many times as the Iterations slider specifies (five by default). Try 
#increasing or decreasing the number of iterations to see how the output changes.
dilate = cv2.dilate(thresh, kernel, iterations=num_iterations.value)
cv2_imshow(dilate)
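
To see why a wide, short kernel makes letters run together *along* a line without merging one line into the next, here is a toy version of dilation written in plain NumPy. This is only a sketch of the idea, not how OpenCV implements `dilate`; the helper name `dilate_once` and the miniature "page" are invented for the example.

```python
import numpy as np

def dilate_once(binary, kernel_w, kernel_h):
    """A pixel becomes white if any pixel under the kernel window is white."""
    pad_y, pad_x = kernel_h // 2, kernel_w // 2
    # Surround the image with black so the window can hang over the edges
    padded = np.pad(binary, ((pad_y, pad_y), (pad_x, pad_x)), constant_values=0)
    out = np.zeros_like(binary)
    rows, cols = binary.shape
    # Slide the image under every position of the kernel and keep the maximum
    for dy in range(kernel_h):
        for dx in range(kernel_w):
            out = np.maximum(out, padded[dy:dy + rows, dx:dx + cols])
    return out

# Two "text lines", each made of isolated white "letters"
lines = np.zeros((5, 12), dtype=np.uint8)
lines[1, [1, 4, 7, 10]] = 255
lines[3, [2, 5, 8]] = 255

# A kernel 5 pixels wide and 1 pixel high smears sideways only
merged = dilate_once(lines, kernel_w=5, kernel_h=1)
```

After this one pass, the separate letters on each row have fused into a single horizontal bar, while the blank row between the two lines is still black. A taller kernel, or more iterations, would eventually close that gap too, which is the trade-off the sliders let you explore.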

This cell finds the boundaries of the dilated white blocks that used to be our text lines and determines their contours. 
I've made one adjustment to Ertuna's script here: I use the `RETR_EXTERNAL` method rather than the `RETR_LIST` method that Ertuna used. `RETR_LIST` retrieves *all* contours that are detected, whereas `RETR_EXTERNAL` ignores contours that are found *within other contours*. Though I can't say I've tested it entirely systematically, this approach seems to do a better job of detecting text blocks in this eighteenth-century text, where what may be wandering in the baseline of the set type creates some gaps that end up being detected as contours.

In [None]:
#Code cell #9
#Determine contours
contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

#These next steps are not really part of the deskewing sequence. I've included 
#them simply so we can see what's happening. We first create a color version
#of our black-and-white image, then draw a bright green line connecting the
#contour points so we can see the outlines of the detected shapes
show_contours = cv2.cvtColor(dilate, cv2.COLOR_GRAY2BGR)
draw_contours = cv2.drawContours(show_contours, contours, -1, (115,255,105), 3)
cv2_imshow(draw_contours)

### 3.c - Finding the rectangle that fits this contour
Ertuna's script acts on only the largest of the detected areas on the not-unreasonable premise that the skew angle of the largest text block will be a good proxy for the skew angle of the entire page of text. (He notes, though, that other approaches are possible. One might find that the angle of a different block yielded better results, or the average of multiple blocks.)

Having selected the largest contour, the code in the next cell then determines the smallest possible rectangle that could contain the entire contour using `minAreaRect`: basically, we're drawing a straight-sided box around the irregular contour of the text block.



(The code in the following cell is, again, not really part of the deskewing procedure, but I have added it to show what's happening.)

Actually, what `minAreaRect` produces is not exactly a rectangle, but rather some *instructions* for making a rectangle. We get:
- the x, y coordinates of the center point;
- the width and height of the rectangle; and
- the angle of the rectangle (for more on how `minAreaRect` treats this angle, see [this post at *The AI Learner*](https://theailearner.com/tag/cv2-minarearect/).)

To actually draw the rectangle described by `minAreaRect`, the following cell uses OpenCV's `boxPoints` to get the corner points, then draws a series of lines to connect those corners.
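
To make those "instructions" a little more concrete, here is a rough sketch of how a center point, a size, and an angle can be turned into four corner points with some basic trigonometry. The function `rect_corners` is hypothetical, and the order of the corners and the direction in which the angle is measured are my own assumptions, so don't expect the output to match `boxPoints` point for point.

```python
import math

def rect_corners(center, size, angle_degrees):
    """Corner points of a rectangle described as (center, (width, height), angle)."""
    cx, cy = center
    w, h = size
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Start from an upright rectangle centered on (0, 0), then rotate each
    # corner by the angle and shift it to the real center point
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners

# An unrotated rectangle: its corners are just center plus or minus half the size
upright = rect_corners((50, 20), (100, 10), 0)

# The same rectangle tilted by 3 degrees: same center, same side lengths
tilted = rect_corners((50, 20), (100, 10), 3)
```

Whatever the angle, the corners still average out to the center point and the sides keep their lengths; only the tilt changes.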

In [None]:
#Code cell #10
#This sorts the detected contours by area in descending order: the contour 
#enclosing the largest area will be first in the sorted list, which makes it 
#easy to select in the next step.
sorted_contours = sorted(contours, key = cv2.contourArea, reverse = True)
#Select the largest detected contour
largest_contour = sorted_contours[0]
#Determine the minimum-area rectangle that would contain that contour
largest_min_area_rect = cv2.minAreaRect(largest_contour)
print(largest_min_area_rect)
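
What does it mean to sort contours "by area"? A contour is just a list of points tracing an outline, and the area enclosed by such an outline can be computed with what's sometimes called the shoelace formula. The sketch below shows that idea in plain Python. I'm not making any claims about OpenCV's internals here, and the function `polygon_area` and the labeled outlines are invented for the example.

```python
def polygon_area(points):
    """Area enclosed by a closed outline, using the shoelace formula."""
    twice_area = 0
    for i, (x1, y1) in enumerate(points):
        # Pair each point with the next one, wrapping around at the end
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

# Three made-up outlines of the kind we might detect on a page
outlines = {
    "catchword": [(0, 0), (20, 0), (20, 5), (0, 5)],
    "text block": [(0, 10), (100, 10), (100, 60), (0, 60)],
    "running head": [(0, 70), (60, 70), (60, 75), (0, 75)],
}

# Sort the names from largest enclosed area to smallest, then take the first
by_size = sorted(outlines, key=lambda name: polygon_area(outlines[name]), reverse=True)
largest = by_size[0]
```

Here the main text block encloses far more area than the running head or the catchword, so it ends up first in the sorted list.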

In [None]:
#Code cell #11
def draw_min_area_rect(cv2minimumarearectangle, base_image) :
  #Make a color copy of the single-channel image so we can draw on it in color
  annotated = cv2.cvtColor(base_image, cv2.COLOR_GRAY2BGR)
  #Accept either a single minAreaRect result or a list of them
  if isinstance(cv2minimumarearectangle, list) :
    print(len(cv2minimumarearectangle))
    rects = cv2minimumarearectangle
  else :
    rects = [cv2minimumarearectangle]
  for rect in rects :
    #boxPoints returns the four corners as floats; drawing needs whole numbers.
    #(astype(int) replaces np.int0, which newer versions of NumPy no longer offer.)
    min_area_box = cv2.boxPoints(rect).astype(int)
    #Connect each corner to the next, wrapping around to close the box
    for i in range(4) :
      pt1 = (int(min_area_box[i][0]), int(min_area_box[i][1]))
      pt2 = (int(min_area_box[(i + 1) % 4][0]), int(min_area_box[(i + 1) % 4][1]))
      cv2.line(annotated, pt1, pt2, (0, 30, 255), 3)
    #Label the box with its angle, a little to the left of its center point
    cv2.putText(annotated, str(rect[-1]), 
                (int(rect[0][0]) - 100, int(rect[0][1])), 
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 30, 255), 3)

  return annotated

In [None]:
#Code cell #12
largest_min_area = draw_min_area_rect(largest_min_area_rect, dilate)
cv2_imshow(largest_min_area)

### 3.d - Back to the actual deskewing routine
Getting the angle of the rectangle's skew is actually easier than drawing the rectangle: we just need the last number produced by `minAreaRect`, but because of the way `minAreaRect` measures the angle, we do need to have a bit of math to make sure the value comes out usable.

In [None]:
#Code cell #13
#Determine the angle. When minAreaRect reports a value below -45, it is 
#measuring from the rectangle's other side, so adding 90 converts it to the 
#small skew angle we want.
largest_rect_angle = largest_min_area_rect[-1]
if largest_rect_angle < -45:
    largest_rect_angle = 90 + largest_rect_angle
print(largest_rect_angle)
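
One caution: as far as I'm aware, the range of angles that `minAreaRect` reports has not been the same across all OpenCV releases, and some newer versions report positive values (up to 90) where older ones reported negative values. If the angle printed above is something like 88 rather than -2, the check in the cell above won't catch it. The sketch below is one possible way to cover both cases, on the assumption that a real page skew is never more than 45 degrees in either direction. The function name `normalize_skew_angle` is my own and is not part of Ertuna's script.

```python
def normalize_skew_angle(angle):
    """Fold a rectangle angle into the range -45 to 45 degrees."""
    # A rectangle looks the same after a quarter turn, so adding or
    # subtracting 90 describes the same box measured from a different side
    while angle < -45:
        angle += 90
    while angle > 45:
        angle -= 90
    return angle

# The case handled by the original check: -88 becomes a small positive skew
from_old_style = normalize_skew_angle(-88.0)
# The case the original check misses: 88 becomes a small negative skew
from_new_style = normalize_skew_angle(88.0)
# Small angles pass through unchanged
already_small = normalize_skew_angle(-3.0)
```

For values below -45 this gives the same answer as the `if` statement in the cell above; the only difference is that large positive values are folded back as well.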

### 3.e - Let's see the deskewed image.
The code in the next cell takes a few steps to rotate our image:
1. First, we make a copy of our original image.
2. Next, we determine the size of the image by getting its height and width (the first two items returned by `shape`).
3. Then, we determine the center of the image by dividing its width and height by two.
4. Next, we construct the rotation we want to happen: rotating the image around its center point by the angle (`largest_rect_angle`) we determined in the previous cell.
5. Finally, we apply that rotation to the copy with `warpAffine`, which produces the straightened image.

Note how we're using information that we calculated by using what is to us a very strange-looking image, and applying it to our original color image.

In [None]:
#Code cell #14
largest_rect_deskew = im.copy()
(h, w) = largest_rect_deskew.shape[:2]
center = (w // 2, h // 2)
# M = cv2.getRotationMatrix2D(center, angle, 1.0)
M = cv2.getRotationMatrix2D(center, largest_rect_angle, 1.0)
deskewed_largest_rect = cv2.warpAffine(largest_rect_deskew, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
cv2_imshow(deskewed_largest_rect)
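
The variable `M` in the cell above is a small grid of six numbers that describes the rotation. If you'd like to see what's inside it, the sketch below builds the same kind of matrix by hand in NumPy, following the formula given in OpenCV's documentation for `getRotationMatrix2D`. The helper names `rotation_matrix` and `apply_to_point` are made up for this example, and the point coordinates are arbitrary.

```python
import numpy as np

def rotation_matrix(center, angle_degrees, scale=1.0):
    """Build a 2x3 matrix that rotates points around a chosen center."""
    cx, cy = center
    theta = np.radians(angle_degrees)
    alpha = scale * np.cos(theta)
    beta = scale * np.sin(theta)
    # The third column shifts everything so that the center stays where it is
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

def apply_to_point(M, point):
    """Move a single (x, y) point according to the matrix."""
    x, y = point
    return M @ np.array([x, y, 1.0])

M_example = rotation_matrix(center=(50, 40), angle_degrees=2.0)

# The center of rotation does not move
fixed = apply_to_point(M_example, (50, 40))
# A point to the right of the center swings slightly upward (smaller y),
# because image coordinates count downward from the top-left corner
moved = apply_to_point(M_example, (100, 40))
```

Rotating the whole image amounts to doing this for every pixel, which is the job `warpAffine` takes care of, along with deciding what color to use for the corners that the rotation leaves empty.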

Those results probably represent an improvement, though not necessarily a huge one. (In a longer version of this notebook, I offered an adjustment to Ertuna's method that seemed like it yielded some small but noticeable improvements with early print.) Try going back and changing some of the earlier variables to see how the final output changes. You could also try working on a different image to see how different images respond to the same process.
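
As noted earlier, relying on the single largest block is only one option. Purely as an illustration (this is not Ertuna's method, nor the adjustment mentioned above), here is a sketch of taking the median angle of the few largest rectangles instead. The rectangles are written out by hand in the same `(center, (width, height), angle)` form that `minAreaRect` produces; the function names, the choice of `top_n`, and the use of width times height as the measure of size are all assumptions of mine.

```python
from statistics import median

def fold_angle(angle):
    """Fold a rectangle angle into the range -45 to 45 degrees."""
    while angle < -45:
        angle += 90
    while angle > 45:
        angle -= 90
    return angle

def median_skew(rects, top_n=3):
    """Median skew angle of the top_n largest rectangles."""
    # Rank rectangles by width times height, largest first
    largest = sorted(rects, key=lambda r: r[1][0] * r[1][1], reverse=True)[:top_n]
    return median(fold_angle(r[-1]) for r in largest)

# Hand-written stand-ins for minAreaRect results
rects = [
    ((500, 300), (900, 40), -2.0),   # a long text line, skewed by -2 degrees
    ((500, 400), (42, 880), 88.0),   # the same skew, measured from the other side
    ((500, 500), (860, 40), -1.5),   # another text line
    ((120, 700), (60, 30), -30.0),   # something small and oddly angled, like an ornament
]

page_skew = median_skew(rects, top_n=3)
```

With these made-up numbers, the small ornament is left out of the calculation and the three text lines agree on a skew of about -2 degrees. Whether this kind of approach actually improves on the largest-block method for a given set of page images is something you'd have to test.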

## 4 - Clear Google Colab environment

In [None]:
#Code cell #15
%cd /content/
!rm -r ./*