# Promotional Code Optical Character Recognition and Selenium Automation 
Utilizing Python to automate the boring stuff by Jaume Clave   
June 15th, 2020


### The Project
Every morning I wake up and start my morning routine to limber up and approach the day head on. I shower, make a fantastic cup of coffee and sit down in front of my computer to look at my notes and see what I have planned for the day. Later on I make breakfast, and when I am in front of my computer I tend to have yogurt and some muesli because it is easy, fast and delicious. I always have different flavored yogurts from Yeo Valley. Yeo Valley is a British family-owned farming and dairy company based in the village of Blagdon, in the Yeo Valley, Somerset, and in Cannington, near Bridgwater, Somerset. Yeo Valley runs an ongoing offer involving what they call "Yeokens". Yeokens are points that are credited to your account when you enter a Yeoken code on Yeo Valley's website. They can be spent on the website: for example, you can redeem your Yeokens for special items, experiences in Yeo Valley, charity donations and various other services and products. Because I eat a lot of yogurt, I have a lot of these codes, but I haven't been logging into the website and entering them because I could never be bothered to do so...

This project aims to make use of all the codes I have collected and will continue to collect as my breakfasts roll by. From start to finish, it completes all the steps involved in redeeming Yeokens by submitting the promotional code. At the time of completing this project, Yeo Valley has partnered with Octopus Energy to offer a Tesla Model 3 for a lucky promotional code. Now, not only are you rewarded with Yeokens when you submit a code, you are also given the chance to win a Tesla.

The project has been broken down into four main steps to make the problem easier to tackle. As is usually the case, it is easier and faster to break a project down into more manageable tasks and complete each one individually before tying it all together.

1. The first step in this problem is to extract the image from my phone which I use to take a picture of the yogurt lid and code and load the file into computer memory. 
2. Secondly, the image must be processed and transformed to highlight the 14 character code and hide all the noise around it.
3. The third step is to input the processed image into an OCR engine in order to actually extract the code in such a way that is readable for a machine.  
4. Finally, once the code is read and stored as a variable the code submission website may be accessed, details entered and the form submitted. At that point Yeokens will be added to my Yeo Valley account and hopefully I also walk away with a winning Tesla!

How each step works, its objective and the code needed to complete each stage of the puzzle is explained in detail.


## Index
[Create Directory](#Create-Directory)  

[WhatsApp Web Image Download](#WhatsApp-Web-Image-Download)  
i. [Selenium](#Selenium)  
ii. [WhatsApp Web Image Download In Action](#WhatsApp-Web-Image-Download-In-Action)  

[Optical Character Recognition (OCR)](#Optical-Character-Recognition-(OCR))  
i. [Image Preprocessing](#Image-Preprocessing)  
ii. [OpenCV-Python (cv2)](#OpenCV-Python-(cv2))  
iii. [Circle Detection](#Circle-Detection)  
iv. [Geometric Transformations](#Geometric-Transformations)  
v. [Thresholding](#Thresholding)  
vi. [Morphological Transformations](#Morphological-Transformations)  
vii. [Image Blurring](#Image-Blurring)  

[Optical Character Recognition With the OCRSpace API](#Optical-Character-Recognition-With-the-OCRSpace-API)  

[Yeo Valley Competition](#Yeo-Valley-Competition)  
i. [Simulating Human-Like Typing](#Simulating-Human-Like-Typing)  
ii. [Octopus Energy / Yeo Valley Form Submission In Action](#Octopus-Energy-/-Yeo-Valley-Form-Submission-In-Action)  

[Conclusion](#Conclusion)  
[Further Reading](#Further-Reading)  

## Create Directory 
This project is continuous: it is an ongoing process that requires the full code to run every 2 to 3 days. Every time I open a Yeo yogurt to have with my breakfast muesli this code will need to run. The process involved is relatively complex and there are various moving parts in the system. At different stages of the process different files need to be downloaded, saved and opened. It is therefore imperative that files be organised and easily accessible, not only to this Notebook and script but also to me as the human who needs to check on events if something goes wrong.

In order to keep it all clean and organised, a folder called "form_automation" will hold a growing number of folders called "submission_YYYY-MM-DD", where YYYY-MM-DD is the day the code runs (the format produced by date.today()) and therefore the day that specific folder gets created. This folder will contain the original image that is sent and downloaded through WhatsApp, the images that are saved at different stages of the image processing step, and a screenshot of the page after a successful submission.

In [14]:
## Create new directory for "today"
import os
from datetime import date
today = date.today()

newpath = rf'C:\Users\Jaume\Documents\Python Projects\form_automation\submission_{today}' 
## makedirs also creates missing parent folders and does nothing if the folder already exists
os.makedirs(newpath, exist_ok=True)

## WhatsApp Web Image Download
### Selenium
Selenium WebDriver is one of the most popular tools for Web UI automation. The *selenium* package is used to automate web browser interaction from Python. Web UI automation refers to the automatic execution of actions performed in a web browser window, such as navigating to a website, filling in forms with text boxes and radio buttons, downloading images and files, submitting forms and general website interactions. With Selenium a user is able to automate all this stuff; automate the boring stuff! 

In order to automate Google Chrome with selenium, chromedriver.exe must be downloaded, and the path to this executable must be passed to selenium. WebDriver is an open source tool for automated testing of web apps across many browsers. It provides capabilities for navigating to web pages, user input, JavaScript execution, and more. ChromeDriver is a standalone server that implements the W3C WebDriver standard.

This section utilizes Selenium to access WhatsApp Web, find a specific chat and download the most recent image sent in that chat. This is the first step in the four step process needed to complete this project.


In [15]:
## Import packages
from selenium import webdriver
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import os
import shutil
import time
import warnings
warnings.filterwarnings('ignore')

#### Selenium to Access WhatsApp Web
WhatsApp has been chosen as the application to send and receive the image of the yogurt lid simply because it is the messaging system I use the most, and I wanted everything from image capture to form submission to be a hassle-free process. I want to automate something and make it simpler, not harder, so sticking with something I use seemed fit.

There are quite a few steps required to download and store an image from WhatsApp Web using selenium. In order to facilitate everything, a group chat called "Win Me a Tesla" was created. This is the chat that will receive the yogurt lid images I take and send. Once the chat has received the image, we can access it with our code. 

The first step is to load WhatsApp Web using the chromedriver. WhatsApp requires you to scan a QR code with your phone the first time the WhatsApp Web service is accessed in a browser in order to verify and load the user's conversations and media. This was completed manually once. From then on it does not have to be repeated, because a Chrome user profile that remembers the session is loaded through the 'user-data-dir' argument added to the chromedriver options. The page can now be accessed using the .get() method, which loads the URL. The time.sleep() function is used to add a delay to the execution of the program. It halts the program for a given time in seconds and therefore gives the web page enough time to load before moving on to the next line of code.


In [16]:
## Load Profile
options = webdriver.ChromeOptions()

## Load Chrome profile and download directory
options.add_argument(r'user-data-dir=C:\Users\Jaume\AppData\Local\Google\Chrome\User Data\Python')
options.add_argument("--disable-notifications")
options.add_experimental_option("prefs", {
  "download.default_directory": newpath,
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": True
})

## Chrome with profile
browser = Chrome(r'C:\Users\Jaume\chromedriver.exe', options = options)  ## 'chrome_options' is deprecated in favour of 'options'

## Load WhatsApp Web
browser.get('https://web.whatsapp.com/')
time.sleep(20)
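
A fixed 20 second sleep either wastes time or is too short on a slow connection. An alternative is to poll for a condition and continue as soon as it holds. The sketch below shows the idea in plain Python; `wait_until` and `page_ready` are hypothetical names introduced here for illustration and are not part of the notebook or of selenium.

```python
import time

def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call `condition` repeatedly until it is truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)

# Stand-in for "has the page loaded?": becomes true on the third check.
checks = {"count": 0}

def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3

wait_until(page_ready, timeout=5.0, poll_interval=0.01)
print(checks["count"])
```

Selenium offers the same pattern through WebDriverWait and expected_conditions, both of which are already imported at the top of this section, for example WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@title="Win Me a Tesla"]'))).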

#### Finding Elements With Selenium
There are many ways to find an element on a page using selenium, and all require some basic understanding of HTML. XPath is the language used for locating nodes in an XML document. As HTML can be an implementation of XML (XHTML), we can leverage this powerful language to target elements in the WhatsApp Web application. XPath supports the simple methods of locating by id or name attributes and extends beyond them, opening up all sorts of new possibilities such as locating the third checkbox on the page.

One of the main reasons for using XPath is when you don’t have a suitable id or name attribute for the element you wish to locate. You can use XPath to either locate the element in absolute terms (not advised), or relative to an element that does have an id or name attribute. XPath locators can also be used to specify elements via attributes other than id and name. 

Absolute XPaths contain the location of all elements from the root (HTML) and as a result are likely to fail with only the slightest adjustment to the application. By finding a nearby element with an id or name attribute (ideally a parent element) you can locate your target element based on the relationship. This is much less likely to change and can make your tests more robust. 
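
The difference between the two styles can be sketched without a browser. The example below uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath, on a made-up XHTML fragment; the markup and the two path strings are assumptions for illustration and do not reflect WhatsApp's real page structure.

```python
import xml.etree.ElementTree as ET

# A tiny, invented chat list standing in for a real page.
PAGE = """<html>
  <body>
    <div id="pane-side">
      <div><span title="Family">Family</span></div>
      <div><span title="Win Me a Tesla">Win Me a Tesla</span></div>
    </div>
  </body>
</html>"""

# Absolute-style path: spells out every step down from the root element.
ABSOLUTE = "./body/div/div[2]/span"
# Relative path: any element with the right title attribute, wherever it sits.
RELATIVE = ".//*[@title='Win Me a Tesla']"

root = ET.fromstring(PAGE)
before_change = (root.find(ABSOLUTE).text, root.find(RELATIVE).text)

# Simulate a small layout change: a new row is added at the top of the pane.
root.find("./body/div").insert(0, ET.Element("div"))
after_change = (root.find(ABSOLUTE).text, root.find(RELATIVE).text)

print(before_change)
print(after_change)
```

After the layout change the absolute path silently points at the wrong chat, while the attribute-based path still finds the intended one.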

The first element that must be found by selenium is the chat that contains the image. We know that the image is sent to "Win Me a Tesla", so by inspecting the HTML using the Chrome DevTools a unique attribute that relates to that chat may be found. Once found, the element can be located and clicked using browser.find_element_by_xpath('//*[@title="Win Me a Tesla"]').click(). Once that is done, we delay the execution of the script slightly to allow new elements to load and populate the HTML.

In [17]:
## Find chat
browser.find_element_by_xpath('//*[@title="Win Me a Tesla"]').click()
time.sleep(3)

## Click image
picture = browser.find_element_by_xpath('//*[@class="_39rvu"]')
picture.click()
time.sleep(2)

#### Downloading and Structuring the Image
The chat is now open and the picture is visible and ready to download. The next steps are to click on the image, wait for it to load, click the download arrow, wait for it to download, and exit the full screen image view. Before the download is requested, os.listdir() is called; it returns a list, in arbitrary order, containing the names of the entries in the directory given by path. Here it returns all the file names found in the 'C:\Users\Jaume\Documents\Python Projects\form_automation\submission_{today}' path. This snapshot can be compared to the same path after the file has been downloaded: the listing of the 'newpath' directory will then have one more file in it than the snapshot taken before requesting the download. The extra file can then be identified and renamed in order to keep everything organised. In the cell below the new file is picked out as the most recently created file in the folder.

Once the file has been downloaded and renamed to something more memorable that includes the date it was downloaded, the image preview is closed. The message itself is deleted in a later step.


In [18]:
## Start download process
## Initialize directory snapshot for file name change
before = os.listdir(newpath)

## Click download
browser.find_element_by_xpath('//*[@title="Download"]').click()
time.sleep(2)

## Finalize file name change
initial_path = newpath
filename = max([initial_path + "\\" + f for f in os.listdir(initial_path)], key = os.path.getctime)
image_name = f"WhatsApp_Yeokens_{today}.PNG"
shutil.move(filename, os.path.join(initial_path, image_name)) ## dynamic filename

## Close picture preview
browser.find_element_by_xpath('//*[@title="Close"]').click()
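
The before/after comparison described above can also be written out explicitly. The following is a minimal sketch, not the code the notebook runs: `wait_for_new_file` is a hypothetical helper, and it assumes that Chrome marks unfinished downloads with a ".crdownload" suffix so that they can be skipped. It is demonstrated on a temporary folder rather than the real download directory.

```python
import os
import tempfile
import time

def wait_for_new_file(folder, snapshot, timeout=30.0, poll_interval=0.5):
    """Return the name of a finished file in `folder` that is not in `snapshot`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = [
            name for name in set(os.listdir(folder)) - set(snapshot)
            if not name.endswith(".crdownload")  # skip partial Chrome downloads
        ]
        if new_files:
            return new_files[0]
        time.sleep(poll_interval)
    raise TimeoutError(f"no new file appeared in {folder} within {timeout} seconds")

# Demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as folder:
    snapshot = os.listdir(folder)                      # listing before the "download"
    with open(os.path.join(folder, "lid.jpeg"), "w") as handle:
        handle.write("placeholder")                    # pretend a file was downloaded
    downloaded = wait_for_new_file(folder, snapshot, timeout=2.0, poll_interval=0.05)

print(downloaded)
```

Compared with picking the most recently created file, the set difference does not depend on file timestamps and waits until the download has actually finished.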

An example image of the yogurt lid can be found below. The code we are after is the text at the bottom of the lid that starts with "PLR...".

<img src="http://drive.google.com/uc?export=view&id=1qaB5nvacYPowd8i9nsczCflR6QrEaYsT" style="height: 700px;"/>

#### Selenium's Powerful ActionChains
In WhatsApp Web, in order to delete, forward or star a message, or to use any further functionality, a user must hover over the image, which triggers a 'v'-shaped icon to appear at the top right hand side of the image. The user then clicks that 'v'-shaped icon and is presented with further options. The icon does not appear in the HTML until a user hovers over the image, as the hover action triggers the '< span >' tag to open up and reveal the element's actual attributes. The code has to mimic this action, so the move_to_element() method needs to be used. This is provided by ActionChains, which are a way to automate low-level interactions such as mouse movements, mouse button actions, key presses, and context menu interactions. They are useful for more complex actions like hovering and drag and drop, and are used by advanced scripts that need to drag an element, click an element, double click, etc. This allows the code to hover over the picture and wait for the attributes involved with the 'v'-shaped icon to populate. From there the image may be deleted by finding the desired element through an XPath and clicking it.


In [19]:
## Hover on image in order for triangle dropdown
action = ActionChains(browser)
action.move_to_element(picture).perform()   
time.sleep(2)

## Click dropdown arrow for message options
browser.find_element_by_xpath('//span/div[@class="_4tndQ _1vTsI _1ohds"]/div[@class="huqNi"]').click()
time.sleep(1)

#### Deleting the Message
The 'Delete message' button, when clicked, can trigger different option panels depending on when it is clicked. WhatsApp introduced a feature called 'Delete For All' which deletes the message that you sent for everyone in the chat, while the 'Delete Message' option only deletes the message for the sender (or for whoever individually deletes the message). The 'Delete For All' option is only available for an hour after the message is sent; after that the message is part of the chat and each individual member must delete it in order for it to disappear from that individual's device. The last step of this process is to have selenium delete the image/message in order to make it easier to find and extract a new image once it is sent. 

Because there are various delete message options, depending on when the delete call is made, the code below has been written so that it tries each of the possible delete button locations in turn. This programming logic is known as Easier to Ask for Forgiveness than Permission (EAFP). This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The try and except clauses are nested so that if one delete element is not found, the code continues with the following attempt. Finally the browser and webpage are closed.


In [20]:
## Delete message
browser.find_element_by_xpath('//*[@title="Delete message"]').click()
time.sleep(1)

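## Try each known position of the confirmation button in turn (EAFP)
try: 
    browser.find_element_by_xpath('//*[@id="app"]/div/span[2]/div/span/div/div/div/div/div/div[3]/div[2]').click()
except Exception:  ## first position not found, try the alternative
    try:    
        browser.find_element_by_xpath('//*[@id="app"]/div/span[2]/div/span/div/div/div/div/div/div[3]/div/div[3]').click()
    except Exception:  ## neither position found, leave the message in place
        pass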

time.sleep(5)

browser.close()
browser.quit()
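
As more fallback locations are added, nested try and except blocks become hard to read. One possible alternative, sketched here with plain Python callables instead of real selenium calls, is to loop over the candidates and stop at the first one that succeeds. The names `click_first` and `missing_button` are hypothetical and exist only for this illustration.

```python
def click_first(candidates):
    """Try each zero-argument callable in order; return the index of the first success."""
    for index, attempt in enumerate(candidates):
        try:
            attempt()
            return index
        except Exception:  # in the notebook this would be a selenium lookup failure
            continue
    return None  # nothing worked, mirroring the final `pass` above

def missing_button():
    # Stand-in for a find_element call whose target is not on the page.
    raise LookupError("element not found")

clicked = []
winner = click_first([missing_button, lambda: clicked.append("delete for me")])
print(winner, clicked)
```

In the notebook each candidate would wrap one find_element_by_xpath(...).click() call, so a new button location becomes one more list entry rather than one more level of nesting.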

### WhatsApp Web Image Download In Action
Below is a .gif that is a screen recording of the whole process coded and described above. WhatsApp Web is loaded and the picture is found, downloaded and deleted before closing the browser.


<img src="http://drive.google.com/uc?export=view&id=1oW6La8O3B9ovJc7G4KunwlCt1zbJ2pRS"/>

## Optical Character Recognition (OCR)
Optical character recognition is the conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, or any other image containing text such as a sign. OCR is widely used as a form of data entry in order to go from paper data records to electronic records without having to type the records out. These documents tend to be invoices, bank statements, printouts of static data and many other things. It helps digitize printed text so that it can be electronically searched, stored and displayed. It is commonly used in machine processes such as cognitive computing, machine translation, text-to-speech and text mining. OCR is an important field of study in pattern recognition, artificial intelligence and computer vision. 

OCR engines are software tools used to convert images of printed or handwritten text into machine-readable and editable formats. An OCR engine intended for industrial-strength, corporate-volume scanning needs robust functionality and configuration options for speed, volume and automation. The most common and powerful type of OCR engine can read the more stylized fonts commonly available on a desktop PC. Some of these engines do not perform well on fonts that are designed specifically for recognition, such as OCR-A, because those fonts have peculiarities that set them apart from more standard fonts. Other OCR engines are trained specifically to read fonts such as OCR-A, OCR-B, and MICR as used on checks. A large dictionary, despeckling, format retention, batch retention, and easy error correction are the features to look out for in a good OCR engine. 

This project will use an OCR engine provided and supported by OCR.space which is a service of a9t9 software GmbH, a software company founded in 2016 by two robotic process automation (RPA) industry veterans. a9t9’s company goal is to convert the recent advances in computer vision into usable automation products and they help achieve this through their OCR.space API. 

### Image Preprocessing 
Text recognition depends on many factors in order to produce good quality output, above all the quality of the image being processed. The image's contrast, saturation, lighting, surroundings, blur, pixel density and quality, to name a few, have a severe impact on the engine's ability to recognize the text. OCR engines thus provide guidelines regarding the quality and size of the input image in order to increase the engine's accuracy. Image processing helps structure and tweak an image in order to improve the input into the OCR engine.

Digital image processing is the use of computer algorithms to perform image processing on digital images. As a subfield of digital signal processing, digital image processing has many advantages over analogue image processing. It allows a much wider range of algorithms to be applied to the input data. The aim of digital image processing is to improve the image data (features) by suppressing unwanted distortions and/or enhancing important image features so that computer vision models can benefit from the improved data. 

The processing of digital images can be divided into several classes: image enhancement, image restoration, image analysis, and image compression. In image enhancement, an image is manipulated, mostly by heuristic techniques, so that a human viewer can extract useful information from it. Image restoration techniques aim at processing corrupted images from which there is a statistical or mathematical description of the degradation so that it can be reverted. Image analysis techniques permit that an image be processed so that information can be automatically extracted from it. Examples of image analysis are image segmentation, edge extraction, and texture and motion analysis. An important characteristic of images is the huge amount of information required to represent them. Even a gray-scale image of moderate resolution, say $512 \times 512$, needs $512 \times 512 \times 8 \approx 2 \times 10^6$ bits for its representation. Therefore, to be practical to store and transmit digital images, one needs to perform some sort of image compression, whereby the redundancy of the images is exploited for reducing the number of bits needed in their representation.
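
The storage figure quoted above is simple to verify. The short sketch below works through the arithmetic for an uncompressed image; `raw_size_bits` is a hypothetical helper name, and 8 bits per sample is the assumption used in the text.

```python
def raw_size_bits(height, width, channels=1, bits_per_sample=8):
    """Number of bits needed to store an uncompressed image."""
    return height * width * channels * bits_per_sample

gray_bits = raw_size_bits(512, 512)             # single-channel grayscale
rgb_bits = raw_size_bits(512, 512, channels=3)  # three colour channels

print(gray_bits, "bits =", gray_bits // 8 // 1024, "KiB")
print(rgb_bits, "bits =", rgb_bits // 8 // 1024, "KiB")
```

A colour version of the same image needs three times as much space, which is one reason converting to grayscale early keeps later processing steps cheap.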

This section creates multiple functions using the OpenCV-Python module which will allow for the image of the yogurt lid to be analyzed, processed and optimized for OCR engine input.


In [21]:
import cv2
import numpy as np

## get grayscale image
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    
## noise removal
def remove_noise(image):
    return cv2.medianBlur(image,5)
 
## thresholding
def thresholding(image):
    # threshold the image, setting all foreground pixels to
    # 255 and all background pixels to 0
    # (Otsu's method picks the threshold; use the function argument, not a global)
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

## dilation
def dilate(image):
    kernel = np.ones((5,5),np.uint8)
    return cv2.dilate(image, kernel, iterations = 1)
    
## erosion
def erode(image):
    kernel = np.ones((5,5),np.uint8)
    return cv2.erode(image, kernel, iterations = 1)

## opening - erosion followed by dilation
def opening(image):
    kernel = np.ones((5,5),np.uint8)
    return cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)

## canny edge detection
def canny(image):
    return cv2.Canny(image, 100, 200)

## skew correction
def deskew(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)
    thresh = cv2.threshold(gray, 0, 255,
        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(thresh > 0))
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h),
        flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)    
    return rotated

## template matching
def match_template(image, template):
    return cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)

### OpenCV-Python (cv2)
OpenCV was started at Intel in 1999 by Gary Bradski, and the first release came out in 2000. OpenCV now supports a multitude of algorithms related to Computer Vision and Machine Learning and is expanding day by day. OpenCV supports a wide variety of programming languages such as C++, Python, Java, etc., and is available on different platforms including Windows, Linux, OS X, Android, and iOS. 

Compared to languages like C/C++, Python is slower. That said, Python can be easily extended with C/C++, which allows OpenCV to write computationally intensive code in C/C++ and create Python wrappers that can be used as Python modules. This gives two advantages: first, the code is as fast as the original C/C++ code (since it is the actual C++ code working in the background) and second, it is easier to code in Python than in C/C++. OpenCV-Python is a Python wrapper for the original OpenCV C++ implementation. OpenCV-Python makes use of Numpy, which is a highly optimized library for numerical operations with a MATLAB-style syntax. All the OpenCV array structures are converted to and from Numpy arrays. This also makes it easier to integrate with other libraries that use Numpy such as SciPy and Matplotlib.

Before beginning the image processing it is always a smart idea to standardize the size of the image. Each image I take of a yogurt lid will be taken in the same location from roughly the same height, but of course no two pictures will ever be exactly the same. Sometimes I will be slightly closer to the yogurt lid and other times I won't be taking the picture straight on, because it is impossible to position myself in exactly the same way each time I take a picture. This is where we need to begin to be creative... Since the yogurt lid is circular, a first step is to detect the yogurt lid by searching for circles in the image. This is called circle detection.

### Circle Detection
The general standard equation for the circle centered at $(a, b)$ with radius $r$ is:

$$ (x - a)^2 + (y - b)^2 = r^2 $$ 

To detect circles, a point $(x, y)$ may be fixed. Three parameters, $a$, $b$ and $r$, must then be found, so the problem lives in a 3-dimensional search space. To find possible circles, the algorithm uses a 3D matrix called the "Accumulator Matrix" to store votes for potential $a$, $b$ and $r$ values. The value of $a$ (x-coordinate of the center) may range from 1 to rows, $b$ (y-coordinate of the center) may range from 1 to cols, and $r$ may range from 1 to $maxRadius = \sqrt{rows^2 + cols^2}$.

The algorithm used to detect a circle may be broken down into the following six steps:

1. Initialize the Accumulator Matrix: Initialize the matrix of dimensions rows * cols * maxRadius with zeros.
2. Pre-processing the image: Apply blurring, grayscale and an edge detector on the image. This is done to ensure the circles show as darkened edges.
3. Looping through the points: Pick a point $x_i$ on the image.
4. Fixing $r$ and looping through $a$ and $b$: use a double nested loop to find a value of $r$, varying $a$ and $b$ in the given ranges.
5. Voting: Pick the points in the accumulator matrix with the maximum value. These are strong points which indicate the existence of a circle with $a$, $b$ and $r$ parameters. This gives us the Hough space of circles.
6. Finding Circles: Finally, using the above circles as candidate circles, vote according to the image. The maximum voted circle in the accumulator matrix gives us the desired circle.
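
The voting idea in the steps above can be sketched in a few lines of NumPy. This is a naive illustration under simplifying assumptions and not the gradient-based method behind cv2.HoughCircles: the edge points are given directly instead of coming from an edge detector, the radius range is small, and `hough_circle` is a hypothetical function name. Here $a$ indexes columns (x) and $b$ indexes rows (y).

```python
import numpy as np

def hough_circle(edge_points, shape, radii, n_angles=360):
    """Vote in an (a, b, r) accumulator and return the best circle as (a, b, r)."""
    rows, cols = shape
    radii = np.asarray(radii)
    accumulator = np.zeros((cols, rows, len(radii)), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for x, y in edge_points:
        for k, r in enumerate(radii):
            # Every centre at distance r from (x, y) is a candidate.
            a = np.rint(x - r * np.cos(thetas)).astype(int)
            b = np.rint(y - r * np.sin(thetas)).astype(int)
            inside = (a >= 0) & (a < cols) & (b >= 0) & (b < rows)
            if not inside.any():
                continue
            # Each edge point votes at most once per candidate centre.
            votes = np.unique(np.stack([a[inside], b[inside]], axis=1), axis=0)
            accumulator[votes[:, 0], votes[:, 1], k] += 1
    a_best, b_best, k_best = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    return int(a_best), int(b_best), int(radii[k_best])

# Synthetic edge map: 60 points on a circle centred at (30, 25) with radius 10.
angles = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
points = [(30 + 10 * np.cos(t), 25 + 10 * np.sin(t)) for t in angles]
best_circle = hough_circle(points, shape=(50, 60), radii=range(8, 13))
print(best_circle)
```

The triple loop over points, radii and angles is why the full 3D search is expensive, and why practical implementations restrict the radius range, as the minRadius and maxRadius arguments do in the next cell.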


In [22]:
## FIND CIRCLE 
# Read image. 
img = cv2.imread(os.path.join(initial_path, image_name), cv2.IMREAD_COLOR) 
  
# Convert to grayscale. 
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
  
# Blur using 3 * 3 kernel. 
gray_blurred = cv2.blur(gray, (3, 3)) 
  
# Apply Hough transform on the blurred image. 
detected_circles = cv2.HoughCircles(gray_blurred,  
                   cv2.HOUGH_GRADIENT, 1, 20, param1 = 50, 
               param2 = 30, minRadius = 100, maxRadius = 500) 
  
# Draw circles that are detected. 
if detected_circles is not None: 
  
    # Convert the circle parameters a, b and r to integers. 
    detected_circles = np.uint16(np.around(detected_circles)) 
  
    for pt in detected_circles[0][:1]: 
        # Plain Python ints avoid uint16 wrap-around in later arithmetic.
        a, b, r = int(pt[0]), int(pt[1]), int(pt[2]) 
  
        # Draw the circumference of the circle. 
        cv2.circle(img, (a, b), r, (0, 255, 0), 2) 
  
        # Draw a small circle (of radius 1) to show the center. 
        cv2.circle(img, (a, b), 1, (0, 0, 255), 3) 
        cv2.imshow("Detected Circle", img) 
        cv2.waitKey(0) 
        cv2.destroyAllWindows()
        #print(a, b, r)

Once the circle is found and detected the image can be cropped to outline a square box around the circle. This is done to help standardize the different images studied which helps with the further processing transformations.

In [23]:
## Crop circle
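## Cast to int and clamp at 0 so the crop cannot wrap around at the image edge
rectA = max(int(a) - int(r), 0)
rectB = max(int(b) - int(r), 0)
circle_crop_img = img[rectB:(rectB + 2 * int(r)), rectA:(rectA + 2 * int(r))]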
cv2.imshow('result', circle_crop_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

The above code has identified the circle in the original image; in other words, it has identified the yogurt lid. The dimensions of the circle have been used to crop out a square that encloses the yogurt lid. From here the image will be resized so that the lid in every processed image has identical dimensions. This will facilitate all future transformations. The code in question, the one we want the OCR engine to extract, is "PLRTXKFXPFLMHR".

<img src="http://drive.google.com/uc?export=view&id=1ar4xdrX3YdWN78Z6g81RTGvT2GjFUKdY" style="width: 400px;"/>

### Geometric Transformations
One important constraint that exists in some machine learning algorithms, such as Convolutional Neural Networks, is the need to resize the images in your dataset to a unified dimension. This implies that the images must be preprocessed and scaled to have identical widths and heights before being fed to the learning algorithm. The same applies here: once each image has been cropped to the circle, the images need to be the same size so that the next crops happen at identical locations in the image. If any image were bigger or smaller than another, the crops used to zoom in further on the code before applying more transformations might be slightly off, and this would cause an issue. Therefore the image above is resized to 536 pixels for both height and width. The channels are, so far, maintained.

Channels are the different dimensions of an image that hold a value for every pixel, mostly independently of the values in the other channels. For example, in an RGB image every pixel is represented by a Red, a Green and a Blue value, and there is no single summed value for a pixel. A grayscale image, however, has only one channel.
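
To make the idea of channels concrete, the sketch below builds a made-up 2 x 2 colour image as a NumPy array in OpenCV's blue, green, red channel order and collapses it to a single channel. The weighted sum uses the standard luma weights (0.299 R + 0.587 G + 0.114 B) that OpenCV documents for its colour-to-gray conversion; the pixel values themselves are invented for illustration.

```python
import numpy as np

# Four pixels: pure blue, pure green, pure red and white (B, G, R order).
bgr = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

# Each channel is its own 2 x 2 array of values.
blue, green, red = bgr[..., 0], bgr[..., 1], bgr[..., 2]

# A weighted sum of the three channels gives one gray value per pixel.
gray_values = np.rint(0.114 * blue + 0.587 * green + 0.299 * red).astype(np.uint8)

print(bgr.shape, gray_values.shape)
print(gray_values)
```

The colour array has shape (height, width, 3) while the grayscale array has shape (height, width), which is why unpacking .shape into three names only works on the colour image.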


In [24]:
## Circle image size
height, width, channels = circle_crop_img.shape
print(f'The current height of the image is {height}, the current width is {width}. The image has {channels} channels')

The current height of the image is 550, the current width is 550. The image has 3 channels


In [25]:
## Resize image 
resized_image = cv2.resize(circle_crop_img, (536, 536)) # for example

In [26]:
## Sanity check
height, width, channels = resized_image.shape
print(f'The new height of the image is {height}, the new width is {width}. The image has {channels} channels')

The new height of the image is 536, the new width is 536. The image has 3 channels


In [27]:
## Initial rough crop
y1 = 280
y2 = 490
x1 = 35
x2 = 455

## Crop the standardized 536 x 536 image so the coordinates line up for every photo
roi = resized_image[y1:y2, x1:x2].copy()
cv2.imshow("cropped", roi)
cv2.waitKey(0)
cv2.destroyAllWindows()

The code is now much more centered and the noise in the image has been reduced slightly. There is less overall text in the image because it has been cropped out, as have unnecessary objects. These are the first few steps involved in optimizing for code legibility.

<img src="http://drive.google.com/uc?export=view&id=1a4yM1iDqp5A12xTaFgq2G4rqnEEcemGQ" />

### Thresholding
Thresholding is a process of dividing an image into two (or more) classes of pixels, i.e. “foreground” and “background”. It is mostly used in various Image processing tasks, such as eliminating noise in the OCR process which allows greater image recognition accuracy and segmentation. In order to obtain a thresholded image, usually, we convert the original image into a grayscale image and then apply the thresholding technique. This method is also known as Binarization as we convert the image into a binarized form. A grayscale image is a rectangular tiling of fundamental elements called pixels. A pixel (short for picture element) is a small block that represents the amount of gray intensity to be displayed for that particular portion of the image. For most images, pixel values are integers that range from 0 (black) to 255 (white). 

Here, the matter is straightforward. If a pixel value is greater than the threshold value, it is assigned one value (for example white); otherwise it is assigned another value (for example black). In other words, it creates a binary image in which each pixel is either pure white or pitch black. The function used is cv2.threshold. The first argument is the source image, which should be a grayscale image. The second argument is the threshold value used to classify the pixel values. The third argument is maxVal, the value given to pixels that are above (or, for the inverted types, below) the threshold. The fourth argument selects the thresholding type.

The code below uses a threshold of 160 (out of 255): every pixel with a value of 160 or under is converted to 0, or black, and every pixel above 160 is converted to 255, or white. Basic thresholding as described above is done by using the type cv2.THRESH_BINARY.
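As a minimal, dependency-free sketch of the rule described above, the function below applies binary thresholding to a nested list of grayscale values. The name `binary_threshold` and the sample row are illustrative assumptions; in the project itself the work is done by cv2.threshold.

```python
def binary_threshold(pixels, thresh=160, max_val=255):
    """Return a copy of `pixels` where every value above `thresh` becomes
    `max_val` and every other value becomes 0 (the THRESH_BINARY rule)."""
    return [[max_val if value > thresh else 0 for value in row] for row in pixels]


# One illustrative row of grayscale values (not taken from the yogurt lid image)
row = [[0, 100, 160, 161, 255]]
print(binary_threshold(row))  # [[0, 0, 0, 255, 255]]
```

Note that a pixel equal to the threshold (160) ends up black, because only values strictly above the threshold are set to white.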

### Morphological Transformations
Morphological transformations are simple operations based on the image shape. They are normally performed on binary images (such as the output of thresholding). They need two inputs: the original image, and the structuring element or kernel, which decides the nature of the operation. The two most basic morphological operators are erosion and dilation. Variants such as opening, closing and gradient build on these two. The closing transformation is used in this section because it is useful for closing small holes inside the foreground objects, or small black points on the object. Closing is dilation followed by erosion.

#### Dilation
Here, a pixel is '1' if at least one pixel under the kernel is '1'. This increases the white region in the image, so the size of the foreground object increases. Normally, in cases like noise removal, erosion is followed by dilation: erosion removes white noise but also shrinks the object, so it is dilated afterwards. Since the noise is gone it won't come back, but the object area increases. Dilation is also useful for joining broken parts of an object.

#### Erosion 
The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object (always try to keep the foreground in white). The kernel slides through the image (as in 2D convolution). A pixel in the original image (either 1 or 0) is kept as 1 only if all the pixels under the kernel are 1; otherwise it is eroded (set to zero).
As a result, the pixels near the boundary are discarded depending on the size of the kernel, so the thickness of the foreground object, i.e. the white region in the image, decreases.
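The two operators and their combination into closing can be sketched without OpenCV on a small grid of 0s and 1s. This is only an illustration under stated assumptions: the helper names are hypothetical, the kernel is a square of side `2 * radius + 1`, and pixels outside the image are simply ignored. The project itself relies on cv2.morphologyEx.

```python
def _slide(grid, radius, reducer):
    """Apply `reducer` (max or min) to the square window around every pixel,
    looking only at neighbours that fall inside the image."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out_row.append(reducer(window))
        result.append(out_row)
    return result


def dilate(grid, radius=1):
    # A pixel becomes 1 if at least one pixel under the kernel is 1
    return _slide(grid, radius, max)


def erode(grid, radius=1):
    # A pixel stays 1 only if every pixel under the kernel is 1
    return _slide(grid, radius, min)


def close_holes(grid, radius=1):
    # Closing: dilation followed by erosion
    return erode(dilate(grid, radius), radius)


# A white (1) block with a single black (0) hole in the middle
blob = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
print(close_holes(blob)[2])            # [1, 1, 1, 1, 1] -> the hole is filled
print(close_holes(blob, radius=0)[2])  # [1, 1, 0, 1, 1] -> a 1x1 kernel changes nothing
```

The second call shows why kernel size matters: with a window that covers only the pixel itself, both dilation and erosion return the input unchanged.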

In [28]:
## Deskew
dskw = deskew(roi)

## Grayscale and threshold
crop_gray = get_grayscale(dskw)
thresh = cv2.threshold(crop_gray, 160, 255, cv2.THRESH_BINARY)[1]

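## Create custom kernel
## (a 1x1 rectangle leaves the image unchanged; a larger size such as (3, 3) is needed for closing to fill gaps)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))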
## Perform closing (dilation followed by erosion)
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

cv2.imshow("gray", crop_gray)
cv2.imshow("thresh", thresh)
cv2.imshow("close", close)
cv2.waitKey(0)
cv2.destroyAllWindows()

The image below shows the three transformations in order (grayscale, threshold and close) that have been programmed above.

<img src="http://drive.google.com/uc?export=view&id=1LQ0QYqmQOsD5qsSjsM0QQCyXQtFDA9nv" />


In [29]:
## Second crop
y1 = 75
y2 = 160
x1 = 47
x2 = 285

roi_2 = close[y1:y2, x1:x2].copy()
cv2.imshow("second_crop", roi_2)
cv2.waitKey(0)
cv2.destroyAllWindows()

### Image Blurring
Like any other signal, images can contain different types of noise, especially from the source (camera sensor). Image smoothing techniques help reduce this noise. In OpenCV, image smoothing (also called blurring) can be done in many ways.

As with one-dimensional signals, images can be filtered with various low-pass filters (LPF), high-pass filters (HPF), etc. An LPF helps in removing noise or blurring the image. An HPF helps in finding edges in an image. OpenCV provides a function, cv2.filter2D(), to convolve a kernel with an image. As an example, consider an averaging filter. A 2x2 averaging filter kernel can be defined as follows:

$$ K = \frac{1}{4} \begin{bmatrix}1 & 1\\1 & 1\end{bmatrix}$$

Filtering with the above kernel results in the following being performed: for each pixel, a 2x2 window is placed over this pixel, all pixels falling within this window are summed up, and the result is then divided by 4. This equates to computing the average of the pixel values inside that window. This operation is performed for all the pixels in the image to produce the output filtered image.

Image blurring is achieved by convolving the image with a low-pass filter kernel. It is useful for removing noise. It removes high-frequency content (e.g. noise, edges) from the image, so edges are blurred when this filter is applied (there are also blurring techniques which do not blur edges). The "Averaging" technique is used in this transformation. This is done by convolving the image with a normalized box filter: it takes the average of all the pixels under the kernel area and replaces the central element with this average. This is done by the function cv2.blur().

Blurring in this case is used to create a distortion effect on the characters. The blur is visualised below, and the image is then transformed with another thresholding step that makes the characters thicker and more legible.

In [30]:
## Blur image
blurImg = cv2.blur(roi_2,(2, 2))  
cv2.imshow('blurred_image',blurImg)  
cv2.waitKey(0) 
cv2.destroyAllWindows() 

In [31]:
## Threshold image
threshed = cv2.threshold(blurImg, 230, 255, cv2.THRESH_BINARY)[1]
cv2.imshow('threshed', threshed)  
cv2.waitKey(0) 
cv2.destroyAllWindows() 

The image below shows, from left to right, the first thresholded image, the blurred image and the thresholded blurred image. Blurring spreads the dark strokes, changing white (255) pixels next to the letters in the leftmost image to darker values. The last thresholding takes advantage of this: with a threshold value of 230, any pixel with a value of 230 or less is transformed to pitch black (0). This makes the letters thicker and in some cases, usually when the letter has a straight edge, completes the line. The code on the right is easier to read than the code on the left, both for a human and for the OCR engine.
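Why blurring followed by a high threshold thickens the strokes can be checked on a toy example. The sketch below is a simplification and not the notebook's exact pipeline: it uses a centered 3x3 averaging window (the notebook calls cv2.blur with a 2x2 kernel), ignores pixels outside the image, and the helper names and the tiny test image are made up for illustration.

```python
def box_blur(pixels, radius=1):
    """Replace each pixel with the integer average of its in-bounds
    neighbours inside a square window of side 2 * radius + 1."""
    rows, cols = len(pixels), len(pixels[0])
    blurred = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                pixels[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out_row.append(sum(window) // len(window))
        blurred.append(out_row)
    return blurred


def binary_threshold(pixels, thresh, max_val=255):
    """Values above `thresh` become `max_val`, everything else becomes 0."""
    return [[max_val if value > thresh else 0 for value in row] for row in pixels]


# A one-pixel-wide black vertical stroke on a white background
stroke = [[255, 255, 0, 255, 255] for _ in range(3)]

blurred = box_blur(stroke)
thick = binary_threshold(blurred, thresh=230)

print(blurred[1])  # [255, 170, 170, 170, 255]
print(thick[1])    # [255, 0, 0, 0, 255]
```

The white neighbours of the stroke are pulled down to 170 by the averaging, which is below 230, so the second threshold turns them black and the stroke grows from one pixel to three.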


<img src="http://drive.google.com/uc?export=view&id=1wy00wm3nMJYef5tGo5yvv9NXMVZkEfqe" />


The latest thresholded image is saved to disk and given a name based on today's date. This image is stored as a checkpoint: in case something goes wrong, it can be checked whether things looked "right" up until this point. The image is then loaded back up and centered on a white 300 by 300 square.

This process is facilitated with the PIL module. Python Imaging Library (PIL) is a free and open-source additional library for the Python programming language that adds support for opening, manipulating, and saving many different image file formats.
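The centering itself is simple integer arithmetic: the top-left paste position is half of the leftover space in each direction. A small sketch, assuming the second crop keeps its full size of 238 by 85 pixels (from `x2 - x1` and `y2 - y1`); `center_offset` is a hypothetical helper name used only here.

```python
def center_offset(background_size, image_size):
    """Top-left (x, y) position that centers an image on a background.
    Both sizes are (width, height) tuples, as returned by PIL's Image.size."""
    bg_w, bg_h = background_size
    img_w, img_h = image_size
    return ((bg_w - img_w) // 2, (bg_h - img_h) // 2)


# Size of the second crop: x runs from 47 to 285, y from 75 to 160
crop_size = (285 - 47, 160 - 75)

print(crop_size)                             # (238, 85)
print(center_offset((300, 300), crop_size))  # (31, 107)
```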


In [32]:
## Save "final_crop"
from PIL import Image
image_name = f'final_crop_{today}.PNG'
cv2.imwrite(os.path.join(initial_path, image_name), threshed)

True

In [33]:
## Add background to image
img = Image.open(os.path.join(initial_path, image_name), 'r')
img_w, img_h = img.size

## Create white background
background = Image.new('RGBA', (300, 300), (255, 255, 255, 255))
bg_w, bg_h = background.size
offset = ((bg_w - img_w) // 2, (bg_h - img_h) // 2)

## Place text on background
background.paste(img, offset)
image_name = f'ocr_image_{today}.PNG'
background.save(os.path.join(initial_path, image_name))

In [34]:
## View background image
ocr_image = np.array(background)

cv2.imshow('ocr image', ocr_image)
  
cv2.waitKey(0) 
cv2.destroyAllWindows() 

The image below is the transformed image that will be processed by the OCR engine. It is fun to remember that the initial image downloaded by the selenium chromedriver was a yogurt lid on a glass table with a mousepad underneath; it looked like [this](#Downloading-and-Structuring-the-Image). All the transformations have turned that image into a black and white shot of 14 characters.

<img src="http://drive.google.com/uc?export=view&id=1v7aL6G_wuzi403oPr3f1ZkOzCLIulDFH" />

## Optical Character Recognition With the OCRSpace API
The free OCR API provides a simple way of parsing images and multi-page PDF documents (PDF OCR) and getting the extracted text returned in JSON format. The free OCR plan has a request limit of 25,000 per month, which comfortably covers this project given that I have maybe 10 or so yogurt pots, and thus codes, to photograph every month.

The OCR API offers two different OCR engines with different processing logic. Engine 2 has repeatedly been tested with these types of images and has been the more successful of the two. Engine 1 at times was not able to detect any text at all in the image above, while Engine 2 read it correctly.

This project was made especially difficult by the nature of the font the code is printed in. The font used is called "Dot Matrix". A dot matrix printed character is composed of discrete dots printed in a specific order. Dot matrix printing is an extremely cheap alternative to high quality inkjet printing, and is generally used when the printed content is more important than the print quality. Because it is so cost-effective, it is used extensively in packaging industries all over the world to print the package contents on cartons. Thousands of cartons/lids/covers are processed every day in industry, and the information printed on them varies a lot according to the contents. Manually classifying such a high number of cartons and keeping track of them is a tedious job. A robust OCR system is needed to segment and recognize the sparse dot matrix text printed on these lids and read out the code. Because Engine 2 is better at single number OCR and alphanumeric OCR, both of which are involved in dot matrix text recognition, it is the engine the API call uses.

This section takes the final processed image and uses the OCR.space API to process and read the image. The result, "text_detected", is printed below.


In [35]:
## OCR Space API Code
import pandas as pd
api = pd.read_csv('api_key_ocrspace.csv', header = None)[1].iloc[0]

In [36]:
## Optical Character Recognition 
import cv2
import numpy as np
import requests
import io
import json

## Ocr api
url_api = "https://api.ocr.space/parse/image"
_, compressedimage = cv2.imencode(".PNG", ocr_image)
file_bytes = io.BytesIO(compressedimage)

result = requests.post(url_api,
              files = {"screenshot.jpg": file_bytes},
              data = {"apikey": api,
                      "scale" : True,
                      "OCREngine": 2})

result = result.content.decode()
result = json.loads(result)

parsed_results = result.get("ParsedResults")[0]
text_detected = parsed_results.get("ParsedText")
print(text_detected)

PLRTXKFXPFLMHR
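The cell above assumes the request succeeded and that `ParsedResults` is present. A slightly more defensive way to read the response could look like the sketch below. The function name is hypothetical, the `IsErroredOnProcessing` and `ErrorMessage` fields are used as I understand the OCR.space response format, and the sample payloads are hand-written stand-ins rather than real API output.

```python
def extract_parsed_text(payload):
    """Return the recognised text from an OCR.space-style JSON payload,
    raising a RuntimeError when the payload reports a failure or is empty."""
    if payload.get("IsErroredOnProcessing"):
        raise RuntimeError(f"OCR failed: {payload.get('ErrorMessage')}")

    results = payload.get("ParsedResults") or []
    if not results:
        raise RuntimeError("OCR response contained no ParsedResults")

    # Strip the trailing line break the engine may append to the text
    return (results[0].get("ParsedText") or "").strip()


# Hand-written example payload, shaped like a successful response
sample = {
    "IsErroredOnProcessing": False,
    "ParsedResults": [{"ParsedText": "PLRTXKFXPFLMHR\r\n"}],
}
print(extract_parsed_text(sample))  # PLRTXKFXPFLMHR
```

In the notebook this would be called on the dictionary produced by `json.loads`, so that a failed request stops the run before anything is typed into the competition form.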


The length of the code will always be 14 characters, and those 14 characters will always be capital letters of the English alphabet. The code below is a quick sanity check to ensure that the output of the OCR engine is a 14 letter string. The code shown in the image above has now been converted to a string and stored in a variable by the OCR step. This variable is what will be passed on to selenium in the next section, which automates the code input and form submission.

In [38]:
## Sanity check
delimited_text = text_detected.split('\n')

for text in delimited_text:
    print(f'There are {len(text)} characters in the output string')

There are 14 characters in the output string
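The length check can be tightened into a format check. The sketch below is one possible approach, not part of the original notebook: it accepts a string only if it is exactly 14 characters long and uses only the fourteen letters that Yeo Valley says appear in codes (C, E, F, H, K, L, M, P, R, T, W, X, Y, Z). The names `ALLOWED_LETTERS`, `CODE_PATTERN` and `is_valid_code` are illustrative.

```python
import re

# Letters that Yeo Valley states are used in Yeoken codes
ALLOWED_LETTERS = "CEFHKLMPRTWXYZ"

# Exactly 14 characters, each one of the allowed letters
CODE_PATTERN = re.compile(rf"[{ALLOWED_LETTERS}]{{14}}")


def is_valid_code(text):
    """Return True if `text` looks like a well-formed promotional code."""
    return CODE_PATTERN.fullmatch(text.strip()) is not None


print(is_valid_code("PLRTXKFXPFLMHR"))  # True
print(is_valid_code("PLRTXKFXPFLMH"))   # False (13 characters)
print(is_valid_code("PLRTXKFXPFLMHA"))  # False ('A' is not an allowed letter)
```

A check like this catches typical OCR slips, such as a digit read in place of a letter, before the code is submitted.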


## Yeo Valley Competition 
We are finally at a point where we can enter and automate the final step in the process. The promotional code has now been extracted and it can now be entered along with my details to https://octopusenergy.yeovalley.co.uk/ in order to enter the draw.

For their new promotion, Yeo Valley and renewable energy company Octopus Energy have gone green – and teamed up to give away a Tesla electric car worth up to £40,000. In addition, there’s the chance to instantly win one of 10,000 young trees. 

Unfortunately the Tesla competition is the type of algorithmic promotion where the prizes are only ‘available’ to be won. Every code entered has the same chance of winning: for example, if there are 20 million promotional packs and 10,001 prizes to be won, then only approximately 1 in every 2,000 codes entered will be a winner. Unlike a winning moment promotion, the time you enter your code makes no difference to your chance of winning a prize in this promotion. Because so few people bother to input codes for this type of promotion, it’s likely that fewer than 4% of the 10,000 trees will be won. And of course, with millions of promotional Yeo Valley products on the shelves, the chance of the single winning code for the car prize being entered is absolutely tiny, **but** we will try and hope to succeed. The process here makes it easier for sure!
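The arithmetic behind the "1 in 2,000" figure can be reproduced in a few lines. The 20 million packs figure is the hypothetical number from the paragraph above, not a published one, and the variable names are illustrative.

```python
from fractions import Fraction

packs = 20_000_000   # hypothetical number of promotional packs
prizes = 10_001      # 1 car + 10,000 trees

# Chance that any single code is a winning code
win_chance = Fraction(prizes, packs)
print(f"Any prize: about 1 in {round(1 / win_chance):,}")  # about 1 in 2,000

# Only one of the codes wins the car
print(f"Car: 1 in {packs:,}")  # 1 in 20,000,000

# Upper bound on trees claimed if fewer than 4% of them are won
print(10_000 * 4 // 100)  # 400
```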

However, like the 2019 Yeo Valley birthday promotion, the good news is that if the Tesla isn’t won instantly before the promotion ends on 13th August 2020, it will be given away in a prize draw from every entry received. So getting these codes inputted under my login is the right way to bolster up my chances of winning. 

### Simulating Human-Like Typing
When filling in the text fields, a function enters the characters one at a time with a delay instead of populating the field all at once. The function sends one character to the field and then waits 0.3 seconds to mimic human behavior. Pauses of a second or two are also placed between some of the fields to reduce the chance of being detected as a bot. We are also using a default profile that helps to mimic the behavior of a human who is visiting the website.

The section below takes the detected text from the output of the OCR engine and first opens the competition website page. It then slowly types the promotional code into its field and populates the name, last name, email and postcode fields on the website. Next, the required terms and conditions checkbox is clicked. Finally, the form is submitted and a screenshot of the message shown after a successful submission is taken in order to record the outcome.

In [39]:
## Function for slow (human-like) typing
import time

def slow_typing(element, text):
    ## Send one character at a time, pausing after each keystroke
    for character in text:
        element.send_keys(character)
        time.sleep(0.3)
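A fixed 0.3 second gap between keystrokes is very regular. If a less uniform rhythm is wanted, one option is to draw each pause at random from a range. The sketch below is a hypothetical variant, not the function used in this project: the name `human_typing`, the delay bounds and the injectable `sleep` argument (handy for testing without waiting) are all assumptions.

```python
import random
import time


def human_typing(element, text, min_delay=0.1, max_delay=0.4, sleep=time.sleep):
    """Type `text` into `element` one character at a time, pausing for a
    random duration between `min_delay` and `max_delay` seconds after each key."""
    for character in text:
        element.send_keys(character)
        sleep(random.uniform(min_delay, max_delay))
```

It is called exactly like `slow_typing`, with any object that exposes a `send_keys` method, for example `human_typing(code, promo_code)`.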

In [292]:
## Promotional code (first line of the OCR output)
promo_code = delimited_text[0].strip()

## Check length of code is 14
assert len(promo_code) == 14

## yeovalley.co.uk
browser.get('https://octopusenergy.yeovalley.co.uk/')

time.sleep(2)

## Fill code
code = browser.find_element_by_id('entrycode')
slow_typing(code, promo_code)

## First name
username = browser.find_element_by_id('firstname')
slow_typing(username, 'Jaume')

time.sleep(1)

## Last name
lastname = browser.find_element_by_id('lastname')
slow_typing(lastname, 'Clave Domenech')

time.sleep(2)

## Email
email = browser.find_element_by_id('email')
slow_typing(email, 'j.clavedomenech@gmail.com')

## Postcode
postcode = browser.find_element_by_id('postcode')
slow_typing(postcode, 'SW6 1BQ')

## Click T&Cs
browser.find_element_by_id('terms').click()

## Submit
browser.find_element_by_id('onpack-submit-button').click()

time.sleep(5)

## Screenshot and save
image_name = f'submission_screenshot_{today}.PNG'
browser.save_screenshot(os.path.join(initial_path, image_name))

time.sleep(1)

browser.close()

### Octopus Energy / Yeo Valley Form Submission In Action
Below is a .gif screen recording of the whole process coded and described above. The Yeo Valley Octopus Energy page is loaded, the input fields for the code and the personal information are located and filled in, and the form is submitted before a screenshot is taken and the browser is closed.


<img src="http://drive.google.com/uc?export=view&id=1z3YFOmHRzSG0yi9VPJyWXXdCzTvjpaQI" />

## Conclusion
This project has been extremely interesting to think about and complete from start to finish. It has been fun because I've been able to automate something that will help me save time. I'll be able to collect Yeokens and enter myself in a draw to win a Tesla just by eating yogurts and by sending myself a picture of the yogurt lid containing the code on WhatsApp. 

The project has introduced me to website automation through selenium. It is an incredible library that can help automate mundane, repeatable tasks. This is fantastic for everyday life and office/work life. Optical Character Recognition is a vast field which is extremely difficult to master and understand because of how complex the task of reading images in a machine-understandable way truly is. There are so many variables, such as size, font, lighting and spacing, that each recognition task is different from the rest, which ultimately makes it extremely hard to have one solution that reads many kinds of images.

The code from this project also needs to be more flexible and dynamic, especially the image processing part. What I experienced during this project was that taking pictures of the yogurt lid at different times of the day or even in different weather conditions led to different OCR engine outputs. Different lighting conditions **severely** impacted how the image needed to be processed. For this project, I must take the picture of the lid in an environment that only has artificial lighting in order to ensure that each image is as similar to another image as possible. Much work is needed in the image processing stage in order to make the transformation dynamic if I have the need to take the pictures at any place in the house and in any lighting conditions. 

While completing this project other ideas came to me regarding the codes. It would be extremely interesting and potentially rewarding if it were possible to reverse-engineer how the codes are made: in other words, to find patterns based on many actual yogurt lid codes and be able to generate a working code that has not yet been redeemed. Yeoken codes contain letters only, to make it easier for the user to enter them on tablets and smartphones, and they are 14 letters long. Codes use only the following letters: C, E, F, H, K, L, M, P, R, T, W, X, Y, Z; no other letters are used. If enough yogurt codes were collected then maybe a model could be trained to identify patterns in letter arrangement and sequence. Using a mixture of a probabilistic approach that predicts the next letter based on hundreds of other samples and a brute force approach powered by Python, "real" codes could be generated and submitted on the website.


## Further Reading
#### Selenium 
https://selenium-python.readthedocs.io/  
https://www.guru99.com/selenium-python.html   
https://towardsdatascience.com/web-scraping-using-selenium-python-8a60f4cf40ab   

#### Image Preprocessing
https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html#thresholding  
https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_houghcircles/py_houghcircles.html#hough-circles  
https://freecontent.manning.com/the-computer-vision-pipeline-part-3-image-preprocessing/ 

#### Optical Character Recognition (OCR)
https://en.wikipedia.org/wiki/Optical_character_recognition  
https://towardsdatascience.com/a-gentle-introduction-to-ocr-ee1469a201aa  
https://ocr.space/OCRAPI  

#### Yeo Valley / Octopus Energy
https://www.yeovalley.co.uk/  
https://octopus.energy/  
https://octopusenergy.yeovalley.co.uk/  