# Re-process bad images
If you examine the binarized images, you may well find some where the process we used in the last notebook didn't yield the best results: perhaps Otsu's method didn't yield the best binarization, or perhaps the deskewing routine didn't quite do the trick for a particular page. (I noticed that page 86 fared pretty badly, for instance, and there may be others I'm missing.)

This notebook offers an interactive way to tweak the binarization and deskewing methods in order to come up with a better result for any given image. When you have a result that looks better, you can save a new binarized file for preliminary OCR.

If, after checking out the images, you don't see any that need fixing, then you can just skip this altogether. If you do see some that need tweaking, I'd recommend only doing one or two to get a feel for the kinds of adjustments you'd make—in the time we have, there's no need to go for perfect results for all of the images.

(**Note:** Because this notebook mostly repackages things we've already done, there are very few comments. There are also some differences here that I introduced to solve little snags along the way. I haven't tested this exhaustively, so some things might not work as expected.)

In [None]:
#Code cell #1
#Connect to Google Drive
from google.colab import drive
drive.mount('/gdrive')

In [None]:
#Code cell #2
!pip install pytesseract
import ipywidgets as widgets
from ipywidgets import interact, interact_manual, interactive
from PIL import Image, ImageDraw
import cv2
from google.colab.patches import cv2_imshow
import matplotlib.pyplot as plt
import numpy as np
import pytesseract

In [None]:
#Code cell #3
#I'm assuming that you'll be working with the black and white page images
#produced by the prior notebook.
%cp /gdrive/MyDrive/L-100\ Digital\ Approaches\ to\ Bibliography\ \&\ Book\ History-2023/penn_pr3732_t7_1730b.zip /content/penn_pr3732_t7_1730b.zip
%cp /gdrive/MyDrive/rbs_digital_approaches_2023/output/penn_pr3732_t7_1730b-bw.zip /content/penn_pr3732_t7_1730b-bw.zip
%cd /content/
!unzip penn_pr3732_t7_1730b.zip
!unzip penn_pr3732_t7_1730b-bw.zip
!mv bw/ penn_pr3732_t7_1730b-bw/


In [None]:
#Code cell #4
image_source_directory = '/content/penn_pr3732_t7_1730b/'

In [None]:
#@title 1 - Choose Image to Reprocess {display-mode: "form"}
#@markdown Run this cell to generate a dropdown menu to select an image that needs to be reprocessed.
#The page images are numbered 0001 through 0086
image_select = widgets.Dropdown(
    description='Choose image',
    options=[f'PR3732_T7_1730b_body{page:04d}.tif' for page in range(1, 87)],
    value='PR3732_T7_1730b_body0001.tif',
    style={'description_width': 'initial'})
display(image_select)

In [None]:
#Code cell #5
source_image = image_source_directory + image_select.value
cv2color_image = cv2.imread(source_image, cv2.IMREAD_COLOR)
cv2gray_image = cv2.cvtColor(cv2color_image, cv2.COLOR_BGR2GRAY)
cv2_imshow(cv2gray_image)

In [None]:
#@title Set values for Gaussian blur {display-mode: "form"}
#@markdown Try adjusting the value that will be used for blurring in the next cell.

#@markdown (You only need to run this cell once—re-running it will simply reset it to the default value. After changing the value of the slider, try re-running the cell below this one.)
blur = widgets.IntSlider(min=1, max=31, step=2, value=5, description='Blur')
display(blur)

In [None]:
#Code cell #6
cv2blurred_image = cv2.GaussianBlur(cv2gray_image, (blur.value, blur.value), 0)
cv2_imshow(cv2blurred_image)

## 2 - Cropping
If the image is properly cropped, you can skip this step and move on to Step 3 to try adaptive thresholding. If our automatic cropping routine didn't give us a good result, though, we can try again.

In [None]:
#Code cell #7
invert_image = cv2.threshold(cv2blurred_image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
cv2_imshow(invert_image)
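
For intuition about what `cv2.THRESH_OTSU` is choosing in the cell above, here is a minimal pure-Python sketch of Otsu's method: it tries every possible cutoff and keeps the one that best separates dark pixels from light ones (the one that maximizes the between-class variance). The function name `otsu_threshold` and the sample pixel values are made up for illustration; OpenCV's own implementation is optimized and may differ in details such as how ties are broken.

```python
def otsu_threshold(pixels):
    """Return the cutoff (0-255) that maximizes between-class variance.

    Pixels <= cutoff form the "dark" class; pixels > cutoff form the "light" class.
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_cutoff, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to compare
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_cutoff, best_variance = level, variance
    return best_cutoff


# Hypothetical page: dark ink around 20-30, light paper around 200-210
sample_pixels = [20] * 50 + [30] * 50 + [200] * 50 + [210] * 50
threshold = otsu_threshold(sample_pixels)
# Inverted binarization: ink becomes white (255), paper becomes black (0)
inverted = [255 if value <= threshold else 0 for value in sample_pixels]
```

Because the cutoff is computed from the whole image at once, a page with uneven lighting or heavy show-through can end up with a cutoff that suits one region and not another, which is one reason a page might need reprocessing here.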

In [None]:
#@title Set kernel size for dilation {display-mode: "form"}

#@markdown **Run the code** in this cell to create a set of
#@markdown slider widgets for changing the values of the
#@markdown "kernel" used to dilate the white pixels in the
#@markdown image. You can change the height and width of the
#@markdown kernel (i.e., the amount of vertical and horizontal
#@markdown dilation to be applied) as well as the number of
#@markdown iterations (how many times the dilation operation
#@markdown will be applied.)

#@markdown You only need to run this cell once (re-running
#@markdown it will just re-set the values to their defaults).
#@markdown You can change the values of the sliders and
#@markdown then re-run the next code cell to see the different
#@markdown effects that different values have.
kernel_width = widgets.IntSlider(description = 'Kernel width', \
                                               min=1, max=25, step=1, value=10)
kernel_height = widgets.IntSlider(description='Kernel height', \
                                                 min=1, max=25, step=1, value=20)
num_iterations = widgets.IntSlider(description='Iterations', min=1, \
                      max=10, step=1, value=5)
display(kernel_width)
display(kernel_height)
display(num_iterations)

In [None]:
#Code cell #8
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_width.value, kernel_height.value))
#Create a new image by dilating the prior image using the kernel shape we've set
dilate_image = cv2.dilate(invert_image, kernel, iterations=num_iterations.value)
cv2_imshow(dilate_image)
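
To see how the kernel width, kernel height, and iteration count interact, here is a small pure-Python sketch of binary dilation on a grid of 0s and 1s. It is only an illustration: the `dilate` function and the toy `page` grid are hypothetical, and the sketch uses odd kernel sizes so that the kernel is centered on each cell.

```python
def dilate(grid, kernel_width, kernel_height, iterations=1):
    """Binary dilation of a grid of 0/1 values with a rectangular kernel.

    An output cell becomes 1 if any cell under the kernel (centered on it)
    is 1. Cells outside the grid are treated as 0.
    """
    rows, cols = len(grid), len(grid[0])
    half_w, half_h = kernel_width // 2, kernel_height // 2
    for _ in range(iterations):
        result = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                row_range = range(max(0, r - half_h),
                                  min(rows, r - half_h + kernel_height))
                col_range = range(max(0, c - half_w),
                                  min(cols, c - half_w + kernel_width))
                if any(grid[rr][cc] for rr in row_range for cc in col_range):
                    result[r][c] = 1
        grid = result
    return grid


# Two "letters" separated by a gap on the middle row
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
# A wide, short kernel spreads white pixels sideways but not up or down
once = dilate(page, kernel_width=3, kernel_height=1, iterations=1)
twice = dilate(page, kernel_width=3, kernel_height=1, iterations=2)
```

After one pass the two letters are still separate; after two passes they have merged into a single blob. That is the trade-off the sliders control: enough dilation to fuse the text into one block, but not so much that it fuses with marginal marks.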

In [None]:
h, w = dilate_image.shape[:2]
mask = np.zeros((h+2, w+2), np.uint8)

# Floodfill from point (0, 0) with black (0) to erase any white region
# connected to the top-left corner
floodfill_image = dilate_image.copy()
cv2.floodFill(floodfill_image, mask, (0,0), 0)
cv2_imshow(floodfill_image)
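
The flood fill above can be pictured with a short pure-Python sketch: starting from one cell, it repaints every cell of the same value that can be reached by stepping up, down, left, or right. The `flood_fill` helper and the toy `page` grid are hypothetical, and the sketch assumes the simple case where only pixels exactly equal to the starting pixel are filled.

```python
from collections import deque


def flood_fill(grid, start_row, start_col, new_value):
    """Return a copy of grid with the 4-connected region that contains the
    start cell repainted as new_value."""
    rows, cols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    old_value = filled[start_row][start_col]
    if old_value == new_value:
        return filled  # nothing to do
    filled[start_row][start_col] = new_value
    queue = deque([(start_row, start_col)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and filled[nr][nc] == old_value:
                filled[nr][nc] = new_value
                queue.append((nr, nc))
    return filled


# 255 = white. A white frame touches the corner; a separate white "text"
# blob sits in the middle of the page.
page = [
    [255, 255, 255, 255, 255],
    [255,   0,   0,   0, 255],
    [255,   0, 255,   0, 255],
    [255,   0,   0,   0, 255],
    [255, 255, 255, 255, 255],
]
cleaned = flood_fill(page, 0, 0, 0)
```

Filling from the corner with black removes the white frame but leaves the blob in the middle untouched, since the two are not connected. If the corner pixel is already black, the fill simply changes nothing.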

In [None]:
#@title Set methods for identifying contours
#@markdown **Run this cell** to create two dropdowns
#@markdown to set the methods OpenCV will use for
#@markdown detecting contours.

retrieval_select = widgets.Dropdown(
    description='Choose contour retrieval method',\
    # options = ['cv2.RETR_' + op for op in ['EXTERNAL', 'LIST', 'CCOMP', 'TREE', 'FLOODFILL']],\
    # options = [cv2.RETR_EXTERNAL, cv2.RETR_LIST, cv2.RETR_CCOMP, cv2.RETR_TREE, cv2.RETR_FLOODFILL],\
    options = ['RETR_EXTERNAL', 'RETR_LIST', 'RETR_CCOMP', 'RETR_TREE', 'RETR_FLOODFILL'],\
    value = 'RETR_EXTERNAL',\
    style={'description_width': 'initial'})
contour_approx = widgets.Dropdown(
    description = 'Choose contour approximation method',\
    options = ['CHAIN_APPROX_NONE', 'CHAIN_APPROX_SIMPLE', 'CHAIN_APPROX_TC89_L1', 'CHAIN_APPROX_TC89_KCOS'],\
    value = 'CHAIN_APPROX_SIMPLE',\
    style = {'description_width': 'initial'}
)
display(retrieval_select)
display(contour_approx)

In [None]:
#Code cell #9
#Identify the contours and their hierarchy
#Look up OpenCV's own constants by name rather than hard-coding their integer
#values (the CHAIN_APPROX_* constants start at 1, not 0).
#Note: RETR_FLOODFILL expects a 32-bit integer image, so it may raise an error
#with the 8-bit image used here.
retrieval_method = getattr(cv2, retrieval_select.value)
approximation_method = getattr(cv2, contour_approx.value)
contours, hierarchy = cv2.findContours(floodfill_image, retrieval_method, approximation_method)

#These lines are just to visualize what we have. We make a copy of the
#floodfilled image, converting it from grayscale to color (so we can see
#colored lines on it), then draw all of the contours on that new image in green.
show_contours = cv2.cvtColor(floodfill_image.copy(), cv2.COLOR_GRAY2BGR)
show_contours = cv2.drawContours(show_contours, contours, -1, (0,255,0), 3)
cv2_imshow(show_contours)

In [None]:
#Code cell 14
#Get the height and width of the image
height = np.shape(floodfill_image)[0]
width = np.shape(floodfill_image)[1]

#Divide the width by 8
eighth = int(width/8)
#Find the midpoint on the x-axis
midpoint_x = int(width/2)
#Create a tuple with the left-most and right-most x-axis for this zone
middle_zone = (midpoint_x - eighth, midpoint_x + eighth)

#This code is just to display what's going on. We make a copy of the image
#that already has our contours drawn in green...
show_middle_zone = show_contours.copy()
#...then draw two blue lines to show the edges of the middle zone
show_middle_zone = cv2.line(show_middle_zone, (middle_zone[0],0),
                            (middle_zone[0], height), (255,0,0), 3)
show_middle_zone = cv2.line(show_middle_zone, (middle_zone[1],0),
                            (middle_zone[1],height), (255,0,0), 3)
cv2_imshow(show_middle_zone)

In [None]:
#Create an empty list
middle_zone_contours = []
#Iterate through the list of contours
for contour in contours :
  #https://learnopencv.com/find-center-of-blob-centroid-using-opencv-cpp-python/
  M = cv2.moments(contour)
  #Skip degenerate contours (zero area), which would cause a division by zero
  if M["m00"] == 0 :
    continue
  contour_x = int(M["m10"] / M["m00"])
  #If the x-axis value of the centroid is in range for the x-axis values of the
  #middle_zone, then add it to the list of middle_zone_contours
  if middle_zone[0] <= contour_x <= middle_zone[1] :
    middle_zone_contours.append(contour)

#This code just shows what we've done
show_middle_contours = show_middle_zone.copy()

#Iterate through the list of middle_zone_contours, outlining them in purple
for middle_contour in middle_zone_contours :
  show_middle_contours = cv2.drawContours(show_middle_contours, [middle_contour], -1, (255, 0, 255), 3)
cv2_imshow(show_middle_contours)
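
The middle-zone test can be sketched without OpenCV. The example below computes the x-coordinate of a polygon's centroid with the shoelace formula (a stand-in for the `m10 / m00` ratio of image moments) and keeps only the shapes whose centroid falls within one eighth of the page width on either side of the midpoint. The function names and the sample shapes are hypothetical.

```python
def polygon_centroid_x(points):
    """x-coordinate of the area centroid of a closed polygon.

    Returns None for degenerate (zero-area) shapes, where the centroid
    formula would divide by zero.
    """
    twice_area = 0.0
    weighted_x = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        weighted_x += (x0 + x1) * cross
    if twice_area == 0:
        return None
    return weighted_x / (3.0 * twice_area)


def middle_zone_polygons(polygons, image_width):
    """Keep the polygons whose centroid lies in the central quarter of the page."""
    eighth = image_width // 8
    midpoint_x = image_width // 2
    zone = (midpoint_x - eighth, midpoint_x + eighth)
    kept = []
    for polygon in polygons:
        centroid_x = polygon_centroid_x(polygon)
        if centroid_x is not None and zone[0] <= centroid_x <= zone[1]:
            kept.append(polygon)
    return kept


# Hypothetical shapes on a page 1000 pixels wide
text_block = [(400, 0), (600, 0), (600, 100), (400, 100)]  # centroid x = 500
margin_note = [(0, 0), (100, 0), (100, 100), (0, 100)]     # centroid x = 50
stray_line = [(10, 10), (20, 10)]                           # zero area
kept = middle_zone_polygons([text_block, margin_note, stray_line], 1000)
```

Only the text block survives: the marginal note sits outside the zone, and the zero-area line is skipped rather than causing an error.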

In [None]:
#Code cell 16

show_rectangles = cv2.cvtColor(floodfill_image.copy(), cv2.COLOR_GRAY2BGR)
#Put the detected contours back on the image
for contour in contours :
  #All contours in green (drawContours expects a list of contours)
  show_rectangles = cv2.drawContours(show_rectangles, [contour], -1, (0,255,0), 3)
for middle_zone_contour in middle_zone_contours :
  #Middle zone contours in purple
  show_rectangles = cv2.drawContours(show_rectangles, [middle_zone_contour], -1, (255, 0, 255), 3)

rectangles = [cv2.boundingRect(contour) for contour in middle_zone_contours]
for rectangle in rectangles :
  start_point = (rectangle[0], rectangle[1])
  end_point = (rectangle[0] + rectangle[2], rectangle[1] + rectangle[3])
  show_rectangles = cv2.rectangle(show_rectangles, start_point, end_point, (0, 0, 255), 3)
  show_rectangles = cv2.putText(show_rectangles, str(rectangle[0]) + ',' + str(rectangle[1]),
                              (rectangle[0], rectangle[1]),
                              cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 2)

cv2_imshow(show_rectangles)




In [None]:
#Code cell 17
#Construct lists of x- and y-axis coordinates for each rectangle
leftx_coords = [rectangle[0] for rectangle in rectangles]
rightx_coords = [rectangle[0] + rectangle[2] for rectangle in rectangles]
topy_coords = [rectangle[1] for rectangle in rectangles]
bottomy_coords = [rectangle[1]  + rectangle[3] for rectangle in rectangles]

#Get the left-, right-, top-, and bottom-most x- and y-axis values by getting
#the minima and maxima of the values in the lists we just made, then
#padding them a little bit so that we're not cropping right against the text.
#The left and top values are clamped at 0 so that padding near the edge of the
#page can't produce negative coordinates (which would break the crop below).
leftmost = max(min(leftx_coords) - 100, 0)
rightmost = max(rightx_coords) + 100
topmost = max(min(topy_coords) - 50, 0)
bottommost = max(bottomy_coords) + 50

#Construct coordinates for the four corners of the imaginary rectangle using the
#left-, right-, top-, and bottom-most x- and y-axis values
upper_left = (leftmost, topmost)
upper_right = (rightmost, topmost)
lower_right = (rightmost, bottommost)
lower_left = (leftmost, bottommost)
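
As a compact illustration of the step above, the sketch below merges several `(x, y, width, height)` rectangles into one padded crop box and keeps that box inside the image. Clamping matters because a negative index in a NumPy slice counts from the end of the array rather than stopping at the edge. The helper name `padded_text_block`, its default padding, and the sample rectangles are assumptions made for this example.

```python
def padded_text_block(rectangles, image_width, image_height, pad_x=100, pad_y=50):
    """Return (left, top, right, bottom) enclosing all rectangles plus padding,
    clamped to the image bounds. Returns None if there are no rectangles."""
    if not rectangles:
        return None
    left = min(x for x, y, w, h in rectangles) - pad_x
    top = min(y for x, y, w, h in rectangles) - pad_y
    right = max(x + w for x, y, w, h in rectangles) + pad_x
    bottom = max(y + h for x, y, w, h in rectangles) + pad_y
    return (max(left, 0), max(top, 0),
            min(right, image_width), min(bottom, image_height))


# Two hypothetical bounding rectangles close to the top-left corner
rectangles = [(50, 30, 200, 40), (80, 100, 300, 60)]
crop_box = padded_text_block(rectangles, image_width=500, image_height=400)
```

Here the unclamped left and top edges would have been -50 and -20; clamping pins them to 0 so the crop starts at the page edge.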

In [None]:
#Code cell 18
#Make a copy of the show_rectangles image with the red rectangles already drawn
text_block = show_rectangles.copy()
#Draw a rectangle on the image, using the upper_left and lower_right coordinates
text_block = cv2.rectangle(text_block, upper_left, lower_right, (255, 255, 0), 3)

cv2_imshow(text_block)


In [None]:
#Code cell #19
text_block_cropped = cv2color_image.copy()
#NumPy slices are [rows, columns], i.e. [top:bottom, left:right]
text_block_cropped = text_block_cropped[topmost:bottommost, leftmost:rightmost]

In [None]:
#@title Which image to proceed with?

#@markdown **Run this cell** to create a dropdown
#@markdown menu to determine how to proceed
#@markdown with attempting to binarize the image.

binarize_select = widgets.Dropdown(
    description='Did you re-crop the image?',\
    options = ['Yes', 'No'],\
    value = 'No',
    style={'description_width': 'initial'})
display(binarize_select)



## 3 - Try Adaptive Thresholding
If you get good results with adaptive thresholding in this step, you can proceed to number 4 (Deskew or Save?).

In [None]:
#Code cell #6
if binarize_select.value == 'No' :
  rebinarize_image = cv2blurred_image
else :
  text_block_gray = cv2.cvtColor(text_block_cropped, cv2.COLOR_BGR2GRAY)
  text_block_blurred = cv2.GaussianBlur(text_block_gray, (blur.value, blur.value), 0)
  rebinarize_image = text_block_blurred
cv2_imshow(rebinarize_image)

In [None]:
#Code cell #8
cv2binary_adaptive_image = cv2.adaptiveThreshold(rebinarize_image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 101, 30)
cv2_imshow(cv2binary_adaptive_image)
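
Adaptive thresholding compares each pixel with its own neighborhood instead of with one cutoff for the whole page. The pure-Python sketch below is a simplified version of that idea: it uses a plain (unweighted) mean of the surrounding block, whereas the cell above uses a Gaussian-weighted mean, and the function name and toy images are hypothetical. The last two arguments play the same roles as `101` (block size) and `30` (the constant subtracted from the local mean) in the call above.

```python
def adaptive_threshold_mean(grid, block_size, c):
    """Set a pixel to 255 if it is brighter than (local mean - c), else 0.

    The local mean is taken over a block_size x block_size neighborhood;
    pixels beyond the border are filled in by repeating the edge pixels.
    """
    rows, cols = len(grid), len(grid[0])
    half = block_size // 2
    result = []
    for r in range(rows):
        out_row = []
        for col in range(cols):
            neighborhood = [
                grid[min(max(rr, 0), rows - 1)][min(max(cc, 0), cols - 1)]
                for rr in range(r - half, r + half + 1)
                for cc in range(col - half, col + half + 1)
            ]
            local_cutoff = sum(neighborhood) / len(neighborhood) - c
            out_row.append(255 if grid[r][col] > local_cutoff else 0)
        result.append(out_row)
    return result


# A dim page (paper = 100) with one spot of ink (20) in the middle
dim_page = [[100] * 5 for _ in range(5)]
dim_page[2][2] = 20
# The same page under brighter lighting: every pixel is 100 higher
bright_page = [[value + 100 for value in row] for row in dim_page]

dim_result = adaptive_threshold_mean(dim_page, block_size=3, c=10)
bright_result = adaptive_threshold_mean(bright_page, block_size=3, c=10)
```

Both versions of the page give the same output, with only the ink spot turned black. A single global cutoff of 150 would instead turn the whole dim page black, which is why adaptive thresholding can help with unevenly lit or discolored pages.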

## 3 - Try Manual Thresholding
If you don't like the results you're getting with adaptive thresholding, you can try manual thresholding instead. When the image looks good to you, move on to number 4 (Deskew or Save?).

In [None]:
#Code cell #9
if binarize_select.value == 'No' :
  pilcolor_image = Image.open(source_image)
else :
  pilcolor_image = Image.fromarray(text_block_cropped)

pilgray_image = pilcolor_image.convert('L')

In [None]:
#@title Set a threshold value {display-mode: "form"}
 #@markdown Run this cell, then use the slider that will appear to adjust the threshold point for our image in the cell below.

 #@markdown You only need to run this cell once (re-running it will just set things back to the default value). Try adjusting the slider and then re-running the *next* cell a few times to see the difference that different threshold values make.
thresh_value_slider = widgets.IntSlider(
    min=0,
    max=255,
    step=1,
    description='Threshold:',
    value=150
)
display(thresh_value_slider)

In [None]:
#Code cell #10
thresh = thresh_value_slider.value
fn = lambda x : 255 if x > thresh else 0
pilbinary_image = pilgray_image.convert('L').point(fn, mode='1')
pilbinary_image

## 4 - Deskew or Save?

In [None]:
#@title Deskew or save? {display-mode: "form"}
#@markdown (Run this cell to create some widgets for this step.)

#@markdown Do we need to deskew? If so, which thresholding method produced the better result?

#@markdown If you're ready to save the image, select "No" and choose which
#@markdown thresholded image to save, then skip to the "Save" section and
#@markdown proceed to re-OCR.

#@markdown If the image needs deskewing, select "Yes" and indicate which of
#@markdown the thresholded images should be used for deskewing.
proceed_to_deskew = widgets.Dropdown(
    description='Deskew?',\
    options = ['Yes', 'No'],\
    value = 'Yes',
    style = {'description_width': 'initial'}
    )
thresholded_image = widgets.Dropdown(
    description='Binarization Method',\
    options = ['Adaptive Threshold', 'Manual Threshold'],\
    value = 'Adaptive Threshold',
    style={'description_width': 'initial'}
    )
display(proceed_to_deskew)
display(thresholded_image)

### 4.a - Deskew

In [None]:
#Code cell #11
if thresholded_image.value == 'Adaptive Threshold' :
  image_to_deskew = cv2binary_adaptive_image
else :
  pass_to_cv2 = np.array(pilbinary_image)
  image_to_deskew = pass_to_cv2.astype(np.uint8) * 255
thresh = cv2.threshold(image_to_deskew, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

In [None]:
#@title Set dilation variables { display-mode: "form" }
#@markdown (Run this cell to create a slider for setting the dilation amount.)
kernel_width = widgets.IntSlider(description = 'Kernel width', \
                                               min=10, max=50, step=5, value=30)
kernel_height = widgets.IntSlider(description='Kernel height', \
                                                 min=1, max=10, step=1, value=1)
num_iterations = widgets.IntSlider(description='Iterations', min=1, \
                      max=10, step=1, value=5)
display(kernel_width)
display(kernel_height)
display(num_iterations)

In [None]:
#Code cell #12
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_width.value, kernel_height.value))
#We dilate the pixels using the shape defined by kernel, repeating the operation
#as many times as the Iterations slider specifies. Try increasing or decreasing
#the number of iterations to see how the output changes.
dilate = cv2.dilate(thresh, kernel, iterations=num_iterations.value)
cv2_imshow(dilate)

In [None]:
#Code cell #13
contours, hierarchy = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
sorted_contours = sorted(contours, key = cv2.contourArea, reverse = True)

In [None]:
#Code cell #14
def draw_min_area_rect(cv2minimumarearectangle, base_image) :
  #Accept either a single minAreaRect or a list of them
  if isinstance(cv2minimumarearectangle, list) :
    rects_to_draw = cv2minimumarearectangle
    print(len(rects_to_draw))
  else :
    rects_to_draw = [cv2minimumarearectangle]
  #Convert the single-channel image to color so the colored lines are visible
  annotated_image = cv2.cvtColor(base_image, cv2.COLOR_GRAY2BGR)
  for rect in rects_to_draw :
    #Get the four corners of the rectangle as integers (np.int0 was removed in
    #NumPy 2.0, so we convert explicitly)
    min_area_box = cv2.boxPoints(rect).astype(np.int32)
    #Draw the four sides of the rectangle as a closed polygon
    annotated_image = cv2.polylines(annotated_image, [min_area_box], True, (0, 30, 255), 3)
    #Label the rectangle with its angle
    cv2.putText(annotated_image, str(rect[-1]),
                (int(rect[0][0]) - 100, int(rect[0][1])), cv2.FONT_HERSHEY_SIMPLEX,
                1, (0, 30, 255), 3)

  return annotated_image

In [None]:
#@title Angle Calculation Method {display-mode: "form"}
#@markdown (Run this cell to create a widget for use in this step.)

#@markdown Do you want to use all minAreaRect angles for deskewing, or
#@markdown just the angles from a subset of the largest contours?
angle_method = widgets.Dropdown(
    description='Select method',\
    options = ['All Rects', 'Selected'],\
    value = 'All Rects',
    style={'description_width': 'initial'})
num_rects = widgets.IntSlider(description='Top rects', min=1, \
                      max=5, step=1, value=1)

display(angle_method)
display(num_rects)


In [None]:
#Code cell #15
rects = []
if angle_method.value == 'All Rects' :
  for contour in contours :
    minAreaRect = cv2.minAreaRect(contour)
    if minAreaRect[1][1] > 60 :
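      #Skip axis-aligned rectangles; depending on the OpenCV version,
      #minAreaRect reports these with an angle of 0, -90, or 90
      if minAreaRect[-1] not in [-0.0, 0.0, -90.0, 90.0] :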
        rects.append(minAreaRect)
else :
  for contour in sorted_contours[0:num_rects.value] :
    minAreaRect = cv2.minAreaRect(contour)
    rects.append(minAreaRect)

draw_all_rects = draw_min_area_rect(rects, dilate)
cv2_imshow(draw_all_rects)

In [None]:
#Code cell #16
angle_corrections = []
for rect in rects :
  points = cv2.boxPoints(rect)
  point_tuples = [(point[0], point[1]) for point in points]
  sorted_point_tuples = sorted(point_tuples, key = lambda x: x[1])
  if -200 < sorted_point_tuples[-1][0] - sorted_point_tuples[-2][0] < 200 :
    sorted_point_tuples = [sorted_point_tuples[0], sorted_point_tuples[2], \
                           sorted_point_tuples[1], sorted_point_tuples[3]]
  if sorted_point_tuples[-1][0] < sorted_point_tuples[-2][0] :
    angle_corrections.append((90 - rect[-1], 1))
  else :
    angle_corrections.append((rect[-1], -1))
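#Average the angles once all of the rectangles have been examined. If no
#rectangles were found, fall back to 0 (no rotation).
if angle_corrections :
  average_angle = np.mean([angle_tuple[0] for angle_tuple in angle_corrections])
  plus_or_minus = sum(angle_tuple[1] for angle_tuple in angle_corrections)
  if plus_or_minus > 0 :
    average_angle = -1.0 * average_angle
else :
  average_angle = 0.0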

In [None]:
#Code cell #17
average_angle_deskew = image_to_deskew.copy()
(h, w) = average_angle_deskew.shape[:2]
center = (w // 2, h // 2)
# M = cv2.getRotationMatrix2D(center, angle, 1.0)
M = cv2.getRotationMatrix2D(center, average_angle, 1.0)
deskewed_average_angle = cv2.warpAffine(average_angle_deskew, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
cv2_imshow(deskewed_average_angle)
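
To make the rotation step less of a black box, here is a small sketch of the 2x3 matrix that `cv2.getRotationMatrix2D` is documented to produce, written with only the standard library. The helper names `rotation_matrix` and `apply_affine` are hypothetical, and this is an illustration of the formula rather than a replacement for the OpenCV call.

```python
import math


def rotation_matrix(center, angle_degrees, scale=1.0):
    """Build a 2x3 affine matrix that rotates around `center`.

    Positive angles rotate counter-clockwise as seen on screen, where the
    origin is the top-left corner and y increases downward.
    """
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_degrees))
    beta = scale * math.sin(math.radians(angle_degrees))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(matrix, point):
    """Map a single (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Rotating around the center of a hypothetical 100 x 80 image
center = (50, 40)
M_example = rotation_matrix(center, 30)
center_after = apply_affine(M_example, center)

# A quarter turn around the origin sends (1, 0) to (0, -1), i.e. "up" on screen
quarter_turn = apply_affine(rotation_matrix((0, 0), 90), (1, 0))
```

The center of rotation stays where it is while every other point swings around it, which is why the deskewing cell rotates around the middle of the image rather than a corner.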

### 4.b - Save
Save the reprocessed image before re-OCR'ing.

In [None]:
#Code cell #18
import os
output_directory = '/content/penn_pr3732_t7_1730b-bw/'

#Use os.path.splitext to drop the extension; rstrip('.tif') strips any trailing
#'.', 't', 'i', or 'f' characters rather than the '.tif' suffix itself
outname = os.path.splitext(image_select.value)[0] + '-bw.tif'
if proceed_to_deskew.value == 'Yes' :
  final_image = Image.fromarray(deskewed_average_angle)
elif thresholded_image.value == 'Adaptive Threshold' :
  final_image = Image.fromarray(cv2binary_adaptive_image)
else :
  #Manual threshold: convert PIL's 1-bit image to an 8-bit (0 or 255) array
  pass_to_cv2 = np.array(pilbinary_image)
  intermediate_image = pass_to_cv2.astype(np.uint8) * 255
  final_image = Image.fromarray(intermediate_image)

print('Saving ' + output_directory + outname)
final_image.save(output_directory + outname)
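
A note on deriving the output filename: `str.rstrip` removes a set of characters from the end of a string, not a suffix, so it only behaves as expected when the name happens to end in a character outside that set (as the page images here do, since they end in digits). The sketch below shows the difference using a made-up filename; `bw_output_name` is a hypothetical helper.

```python
import os


def bw_output_name(filename):
    """Replace the file extension with '-bw.tif'."""
    stem, _extension = os.path.splitext(filename)
    return stem + '-bw.tif'


# The page images in this notebook end in digits, so either approach works
page_name = bw_output_name('PR3732_T7_1730b_body0086.tif')

# A made-up filename ending in letters from the set {'.', 't', 'i', 'f'}
tricky = 'titlepage_left.tif'
stripped = tricky.rstrip('.tif')    # also eats the trailing 'f' and 't' of "left"
safe_name = bw_output_name(tricky)  # keeps the whole stem
```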

## Move output files back to Google Drive

In [None]:
#Code cell #19
%cd /content/
!zip -r penn_pr3732_t7_1730b-bw.zip penn_pr3732_t7_1730b-bw/
!mv penn_pr3732_t7_1730b-bw.zip /gdrive/MyDrive/rbs_digital_approaches_2023/output/penn_pr3732_t7_1730b-bw.zip


## Clear Colaboratory environment

In [None]:
#Code cell #20
%cd /content/
!rm -r ./*

## Moving on to preliminary OCR to get hOCR output
The next notebook will have you moving files back into the Colaboratory environment to perform preliminary OCR to get hOCR output and then slice up your page images into line level images. That will be the last step for now!