### Week 12 | Image processing with OpenCV and Pytesseract
<hr>
## Learning Objectives
At the end of this lesson, you will be able to:

- Read an image using `imread`

- Convert an image to grayscale using `cv2.COLOR_BGR2GRAY`

- Apply a threshold to the grayscale image using `cv2.THRESH_BINARY_INV` to produce binary images

- Apply morphological transformations to the binary image produced by thresholding, using a structuring element/kernel such as `cv2.MORPH_RECT` and `cv2.morphologyEx` operations such as `cv2.MORPH_DILATE` and `cv2.MORPH_OPEN`

- Find and use the contours of the image produced by the morphological transformations, using a retrieval mode such as `cv2.RETR_EXTERNAL` and a contour approximation method such as `cv2.CHAIN_APPROX_SIMPLE` or `cv2.CHAIN_APPROX_NONE`

- Visualise the contours found by drawing their bounding rectangles with `cv2.rectangle`

- Use Pytesseract to read the text embedded in the contours found from the images

### Reading the image
The image is read using `cv2.imread`, smoothed with a Gaussian blur, and then converted to grayscale.
![gray](gray.jpg)

In [2]:
import cv2
import numpy as np
import pytesseract
from PIL import Image, ImageOps
# https://stackoverflow.com/questions/61199573/how-to-detect-the-text-above-lines-using-opencv-in-python

image_path = "Questions.jpg"

# Read the image
img = cv2.imread(image_path)

# Blurring
imgBlur = cv2.GaussianBlur(img, (7, 7), 1)

# Convert image to grayscale, which is needed for thresholding
gray = cv2.cvtColor(imgBlur, cv2.COLOR_BGR2GRAY)
cv2.imshow("gray", cv2.resize(gray,(700,850)))
cv2.waitKey(0)
cv2.destroyAllWindows()

### Threshold the grayscale image
- `cv2.threshold` compares each pixel with a threshold value. With the basic `cv2.THRESH_BINARY` type, a pixel greater than the threshold is assigned one value (typically white), otherwise it is assigned another value (typically black).

    1. The first argument is the source image, which should be a grayscale image
    2. The second argument is the threshold value used to classify the pixel values
    3. The third argument is `maxVal`, the value given to a pixel that passes the test (for `cv2.THRESH_BINARY`, a pixel greater than the threshold value)

- In this case, `cv2.THRESH_BINARY_INV` is used, which sets a pixel to 0 (black) if it is greater than the threshold value (second argument), and to `maxVal` (third argument) otherwise. Adding `cv2.THRESH_OTSU` means the threshold is determined automatically from the image, so the value passed as the second argument (0 here) is not used. `cv2.threshold` returns both the threshold it used and the thresholded image, which is why the code takes element `[1]`.

- For the given image, the white background is brighter than the automatically chosen threshold, so it is set to black. The dark text is below the threshold, so it is set to white. The purpose of this thresholding is to get white text on a black background.
https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html

![thresh](thresh.jpg)

In [3]:
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
cv2.imshow("thresh", cv2.resize(thresh,(700,850)))
cv2.waitKey(0)
cv2.destroyAllWindows()
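
To make the thresholding rule concrete, here is a minimal pure-Python sketch of inverse binary thresholding with an Otsu-style automatic threshold. The helper names `otsu_threshold` and `threshold_binary_inv` and the tiny sample image are hypothetical, made up for illustration. The sketch mirrors the idea only and is not OpenCV's implementation, so the exact threshold OpenCV picks may differ.

```python
def otsu_threshold(pixels):
    """Pick the threshold t that maximises the between-class variance
    of the two groups {p <= t} and {p > t} (Otsu's criterion)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one group is empty, so there is no split to evaluate
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def threshold_binary_inv(gray, thresh, max_val=255):
    """Inverse binary rule: pixels above thresh become 0, the rest max_val."""
    return [[0 if p > thresh else max_val for p in row] for row in gray]


# Hypothetical 2x3 grayscale image: bright background (~200), dark text (~10)
gray = [[200, 10, 205],
        [11, 210, 12]]

t = otsu_threshold([p for row in gray for p in row])
binary = threshold_binary_inv(gray, t)
print(t, binary)
```

The bright background pixels end up as 0 and the dark "text" pixels as 255, which is the white-on-black result the later steps rely on.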

### Morphological transformations (Part 1)
- Morphological transformations are simple operations based on the image shape, normally performed on binary images. They need two inputs: the image to transform, and a structuring element (kernel) which decides the nature of the operation.
- `cv2.MORPH_RECT` is used to obtain a rectangular kernel.
- For `cv2.MORPH_DILATE` (dilation), an output pixel is '1' (white) if at least one pixel under the kernel is '1'. So it enlarges the white region in the image, i.e. the foreground object grows.
- This step applies a dilation with a wide horizontal kernel (200 wide, 3 high) to merge the text in a line together. The white text is "dilated" horizontally until it joins the other text on the same line.
![morph1](morph1.jpg)

In [4]:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (200, 3)) #250,3
morph = cv2.morphologyEx(thresh, cv2.MORPH_DILATE, kernel)
cv2.imshow("morph1", cv2.resize(morph,(700,850)))
cv2.waitKey(0)
cv2.destroyAllWindows()
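
The dilation rule can be sketched in pure Python on a tiny binary image. The function `dilate` and the sample data below are hypothetical illustrations, assuming an odd-sized rectangular kernel centred on each pixel and ignoring pixels outside the image. This is not how OpenCV implements it, only the same rule written out.

```python
def dilate(binary, kernel_w, kernel_h):
    """Output pixel is 1 if at least one pixel under the
    kernel_w x kernel_h rectangle centred on it is 1."""
    rows, cols = len(binary), len(binary[0])
    half_w, half_h = kernel_w // 2, kernel_h // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            hit = any(
                binary[rr][cc]
                for rr in range(max(0, r - half_h), min(rows, r + half_h + 1))
                for cc in range(max(0, c - half_w), min(cols, c + half_w + 1))
            )
            out[r][c] = 1 if hit else 0
    return out


# Two hypothetical "words" on the same text line, separated by a gap
line = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

# A wide, flat kernel (5 wide, 1 high) grows the white pixels sideways only
merged = dilate(line, 5, 1)
for row in merged:
    print(row)
```

The gap between the two words is filled while the rows above and below stay black, which is the same effect the (200, 3) kernel has on whole lines of text.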

### Morphological transformations (Part 2)
- For erosion, a pixel in the output is 1 only if all the pixels under the kernel are 1, otherwise it is eroded (set to zero). Hence, the boundaries of the foreground object erode away.
- `cv2.MORPH_OPEN` (opening) is simply erosion followed by dilation, which is useful for removing noise.
- This step applies an opening with a tall vertical kernel (3 wide, 17 high) to remove the thin lines left by the dotted lines. White regions too thin for the kernel to fit inside are erased by the erosion, while the taller text-line regions survive and are restored by the dilation.
![morph2](morph2.jpg)

In [5]:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 17)) #3,17
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
cv2.imshow("morph2", cv2.resize(morph,(700,850)))
cv2.waitKey(0)
cv2.destroyAllWindows()
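
As a rough illustration of why opening removes thin lines, the sketch below writes erosion and dilation out in pure Python and applies them in sequence. The helper names and the sample image are hypothetical; the kernel is assumed to be odd-sized and centred, and pixels outside the image are ignored.

```python
def _scan(binary, kernel_w, kernel_h, combine):
    """Slide a centred kernel_w x kernel_h window and combine its pixels."""
    rows, cols = len(binary), len(binary[0])
    half_w, half_h = kernel_w // 2, kernel_h // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                binary[rr][cc]
                for rr in range(max(0, r - half_h), min(rows, r + half_h + 1))
                for cc in range(max(0, c - half_w), min(cols, c + half_w + 1))
            ]
            out[r][c] = 1 if combine(window) else 0
    return out


def erode(binary, kernel_w, kernel_h):
    return _scan(binary, kernel_w, kernel_h, all)   # every pixel must be 1


def dilate(binary, kernel_w, kernel_h):
    return _scan(binary, kernel_w, kernel_h, any)   # one pixel is enough


def morph_open(binary, kernel_w, kernel_h):
    """Opening: erosion followed by dilation with the same kernel."""
    return dilate(erode(binary, kernel_w, kernel_h), kernel_w, kernel_h)


# Hypothetical image: a 1-pixel-tall line (row 0) and a 3-pixel-tall block
image = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# A vertical kernel (1 wide, 3 high) does not fit inside the thin line
opened = morph_open(image, 1, 3)
for row in opened:
    print(row)
```

The thin line disappears because the erosion wipes it out completely, while the block shrinks to its middle row and is then grown back to its original size by the dilation.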

### Contours generation
- Contours can be explained simply as curves joining all the continuous points along a boundary that have the same color or intensity. Contours are a useful tool for shape analysis and for object detection and recognition.
- In OpenCV, finding contours is like finding white objects on a black background. So remember, the object to be found should be white and the background should be black. Here, the retrieval mode (second argument) used is `cv2.RETR_EXTERNAL`, which retrieves only the extreme outer contours.
- A contour stores the (x, y) coordinates of the boundary of a shape. But does it store all the coordinates? That is specified by the contour approximation method (third argument). Here we use `cv2.CHAIN_APPROX_SIMPLE`, which compresses horizontal, vertical and diagonal segments and keeps only their end points, so a perfect rectangle is stored using only its four corners. This saves memory compared to `cv2.CHAIN_APPROX_NONE`, which stores every point on the outline.
- `cv2.findContours` returns `(image, contours, hierarchy)` in OpenCV 3.x but `(contours, hierarchy)` in OpenCV 2.x and 4.x, which is why the code checks the length of the result before picking the contour list.
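
The difference between storing every outline point and storing only segment end points can be sketched in pure Python. The functions `rectangle_outline` and `keep_segment_ends` below are hypothetical helpers written for this illustration; they imitate the idea behind the two approximation methods and are not OpenCV's algorithm.

```python
def rectangle_outline(x0, y0, x1, y1):
    """Every pixel on the outline of a rectangle, walked clockwise
    (in image coordinates) from the top-left corner."""
    top = [(x, y0) for x in range(x0, x1)]
    right = [(x1, y) for y in range(y0, y1)]
    bottom = [(x, y1) for x in range(x1, x0, -1)]
    left = [(x0, y) for y in range(y1, y0, -1)]
    return top + right + bottom + left


def keep_segment_ends(points):
    """Drop every point at which the walking direction does not change,
    so each straight run is reduced to its end points."""
    kept = []
    n = len(points)
    for i, (x, y) in enumerate(points):
        px, py = points[i - 1]            # previous point (wraps around)
        nx, ny = points[(i + 1) % n]      # next point (wraps around)
        step_in = (x - px, y - py)
        step_out = (nx - x, ny - y)
        if step_in != step_out:
            kept.append((x, y))
    return kept


full_outline = rectangle_outline(0, 0, 4, 2)   # every boundary point
corners = keep_segment_ends(full_outline)      # end points only
print(len(full_outline), corners)
```

For this rectangle the full outline has 12 points, while the compressed version keeps only the 4 corners. Real contours are rarely perfect rectangles, so they usually keep more than four points even after compression.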

In [6]:
cntrs = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cntrs = cntrs[0] if len(cntrs) == 2 else cntrs[1]
cntrs

[array([[[ 712, 2279]],
 
        [[ 712, 2300]],
 
        [[ 739, 2300]],
 
        [[ 740, 2301]],
 
        [[ 942, 2301]],
 
        [[ 942, 2285]],
 
        [[ 941, 2285]],
 
        [[ 940, 2284]],
 
        [[ 940, 2279]]], dtype=int32), array([[[ 758, 2245]],
 
        [[ 757, 2246]],
 
        [[ 756, 2246]],
 
        [[ 755, 2247]],
 
        [[ 627, 2247]],
 
        [[ 626, 2248]],
 
        [[ 626, 2249]],
 
        [[ 624, 2251]],
 
        [[ 560, 2251]],
 
        [[ 560, 2267]],
 
        [[ 628, 2267]],
 
        [[ 629, 2268]],
 
        [[ 669, 2268]],
 
        [[ 670, 2269]],
 
        [[ 670, 2272]],
 
        [[ 671, 2273]],
 
        [[ 901, 2273]],
 
        [[ 902, 2272]],
 
        [[ 902, 2269]],
 
        [[ 903, 2268]],
 
        [[1030, 2268]],
 
        [[1031, 2267]],
 
        [[1055, 2267]],
 
        [[1055, 2251]],
 
        [[ 960, 2251]],
 
        [[ 959, 2250]],
 
        [[ 959, 2245]]], dtype=int32), array([[[ 164, 2012]],
 
        [[ 164

### Using Pytesseract on the contours
- Loop through the contour list and, for each contour, extract the text from the corresponding region of the image using Pytesseract.
- `cv2.rectangle` draws the bounding rectangle of each contour on the image for display. `(0, 0, 255)` is red in BGR and `2` is the line thickness.
- Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in images.
- `ROI` refers to the Region of Interest, which is the bounding rectangle of each contour found above.
- As the contours found are not sorted, we append each piece of text together with its y-coordinate to `ordered_value_tuples` (a list of tuples).
- After that, we sort `ordered_value_tuples` by the y-coordinate. Hence, it will contain the text found in the image in order from top to bottom.
![result](result.jpg)
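
The bookkeeping in this step (bounding box, cropping the region, sorting by y) can be sketched without OpenCV or Tesseract. Everything below is a hypothetical illustration: `bounding_rect` and `crop` are made-up helpers, the contours and the "page" are toy data, and the cropped region stands in for the OCR text that the real cell stores. The `+ 1` in the box size assumes the box includes both end pixels.

```python
def bounding_rect(contour):
    """Axis-aligned bounding box (x, y, w, h) of a list of (x, y) points."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def crop(image, x, y, w, h):
    """Pure-Python equivalent of the NumPy slice image[y:y+h, x:x+w]."""
    return [row[x:x + w] for row in image[y:y + h]]


# Hypothetical contours, deliberately not in top-to-bottom order
contours = [
    [(2, 6), (2, 7), (5, 7), (5, 6)],   # lowest line on the page
    [(1, 0), (1, 1), (6, 1), (6, 0)],   # top line
    [(0, 3), (0, 4), (3, 4), (3, 3)],   # middle line
]

# Hypothetical 8x8 "page" where each pixel stores its own row index
page = [[row] * 8 for row in range(8)]

regions = []
for contour in contours:
    x, y, w, h = bounding_rect(contour)
    roi = crop(page, x, y, w, h)
    regions.append((roi, y))   # the real cell stores the OCR text instead

# Sort by the y-coordinate so the regions read from top to bottom
regions.sort(key=lambda tup: tup[1])
print([y for _, y in regions])
```

Sorting on the second tuple element is what turns the arbitrary contour order into reading order; text on the same line would additionally need the x-coordinate as a tie-breaker.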

In [7]:
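# Point pytesseract at the Tesseract executable (Windows install path; adjust for your system)
pytesseract.pytesseract.tesseract_cmd = "C:/Program Files/Tesseract-OCR/tesseract.exe"

# Re-read the original image in grayscale and binarise it once, outside the loop.
# 255 - (inverse binary) gives black text on a white background for OCR.
image = cv2.imread(image_path, 0)
thresh = 255 - cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

ordered_value_tuples = []
for c in cntrs:
    # Bounding rectangle of the contour, drawn in red on the original image
    x, y, w, h = cv2.boundingRect(c)
    cv2.rectangle(img, (x, y), (x+w, y+h), (0, 0, 255), 2)

    # Crop the region of interest and run OCR on it
    ROI = thresh[y:y+h, x:x+w]
    data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6')

    # Keep the y-coordinate so the text can be sorted top to bottom
    ordered_value_tuples.append((data, y))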

ordered_value_tuples.sort(key = lambda tup: tup[1])

for value in ordered_value_tuples:
    print(value)
    
cv2.imshow("result", cv2.resize(img,(700,850)))
cv2.waitKey(0)
cv2.destroyAllWindows()

('For each question from 1 to 10, four options are given. One of them is the correct', 295)
('answer. Make your choice (1, 2, 3 or 4) and shade your answer on the Optical Answer', 332)
('Sheet.', 365)
('(10 marks)', 378)
('1. Had | known you were il, would not you.', 474)
('(1) — disturb', 546)
('{2) disturbed', 582)
('{3} had disturbed', 618)
('{4) have disturbed', 653)
('2. “She\'d be upset about her poor results, : ?”" asked Munah.', 795)
('-{1) could she', 868)
('(2) would she', 903)
("(3) — couldn't she", 939)
('{4) — wouldn’t she', 975)
('3. My best friend made me .', 1118)
('for him so that we could have lunch together.', 1124)
('(1) wait', 1190)
('(2) waits', 1226)
('{3} waited', 1261)
('(4) — waiting', 1297)
('4. The twins, accompanied by their father,', 1440)
('travelling to Vietnam this', 1452)
('weekend.', 1476)
('(1) ss', 1547)
('(2) are', 1583)
('(3) was', 1619)
('(4) were', 1654)
('5. “', 1796)
('it rain later, the match will be postponed,” the coach reminded the', 1797)