## Handwriting Recognition with HOG

In [1]:
# Import all the needed libraries
import cv2
import joblib
import mahotas
import numpy as np
from skimage import feature
from sklearn.svm import LinearSVC

In [2]:
# Define the HOG class to extract the features from the Handwritten images
class HOG:
    def __init__(self, orientations=9, pixelsPerCell=(8, 8), cellsPerBlock=(3, 3), transform=False):
        self.orientations = orientations
        self.pixelsPerCell = pixelsPerCell
        self.cellsPerBlock = cellsPerBlock
        self.transform = transform
    
    def describe(self, image):    
        hist = feature.hog(image,
                           orientations = self.orientations,
                           pixels_per_cell = self.pixelsPerCell,
                           cells_per_block = self.cellsPerBlock,
                           transform_sqrt = self.transform)
        return hist

The `__init__` constructor takes four parameters. The first, orientations, defines how many **gradient orientations** will be in each histogram (i.e., the number of bins). The **pixelsPerCell** parameter defines the size, in pixels, of each cell. When computing the HOG descriptor over an image, the image is partitioned into multiple cells, each of size pixelsPerCell (for example, 8 × 8 pixels).

A histogram of gradient orientations, weighted by gradient magnitude, is then computed for each cell. HOG then normalizes the histograms over blocks of neighboring cells; the **cellsPerBlock** argument sets how many cells fall into each block.

Optionally, HOG can apply **power law compression** (here, taking the square root of the input image before the gradients are computed, which is what the transform flag enables through transform_sqrt). This can improve the accuracy of the descriptor.
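To make the effect of these parameters concrete, the sketch below estimates how long the resulting feature vector will be. It assumes the layout used by `skimage.feature.hog` when it returns a flattened vector (cells are counted with integer division and blocks slide one cell at a time). `hog_feature_length` is a hypothetical helper written for illustration, not part of scikit-image.

```python
def hog_feature_length(image_shape, orientations, pixels_per_cell, cells_per_block):
    """Estimate the length of a flattened HOG feature vector (illustrative helper)."""
    (height, width) = image_shape
    (cell_h, cell_w) = pixels_per_cell
    (block_h, block_w) = cells_per_block

    # Number of whole cells that fit along each axis
    cells_y = height // cell_h
    cells_x = width // cell_w

    # Blocks slide over the cell grid one cell at a time
    blocks_y = cells_y - block_h + 1
    blocks_x = cells_x - block_w + 1

    # Each block contributes one histogram per cell it contains
    return blocks_y * blocks_x * block_h * block_w * orientations


# Defaults of the HOG class applied to a raw 28 x 28 digit
length_default = hog_feature_length((28, 28), 9, (8, 8), (3, 3))
# 18 orientations, 10 x 10 cells, 1 x 1 blocks on a 20 x 20 image
length_small = hog_feature_length((20, 20), 18, (10, 10), (1, 1))
# Same settings on a 50 x 50 image
length_large = hog_feature_length((50, 50), 18, (10, 10), (1, 1))

print(length_default, length_small, length_large)
```

Because the length depends on the image size, every image passed to a trained model has to be normalized to the same size that was used during training.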

In [3]:
def load_digits(datasetPath):
    # Parse the CSV text data into a NumPy array (one row per digit)
    data = np.genfromtxt(datasetPath, delimiter=",", dtype="uint8")
    target = data[:, 0]
    data = data[:, 1:].reshape(data.shape[0], 28, 28)
    return (data, target)

Next, we need a dataset of digits from which to extract features and train the machine learning model. Let's use a sample of the **MNIST digit recognition dataset**.

The sample of the dataset consists of **5000 data points**, each with a feature vector of length 784, corresponding to the **28 × 28 grayscale pixel** intensities of the image. In the CSV file, the first column of each row holds the digit label and the remaining 784 columns hold the pixel values.
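The row layout described above can be checked on a tiny in-memory example. This is only a sketch: the two rows below are made up for illustration (labels 3 and 7, a single bright pixel each) and stand in for the real CSV file.

```python
import io

import numpy as np

# Build two fake CSV rows: a label followed by 784 pixel values
rows = []
for label in (3, 7):
    pixels = [0] * 784
    pixels[0] = 255  # one bright pixel in the top-left corner
    rows.append(",".join(str(v) for v in [label] + pixels))
csv_text = "\n".join(rows)

# Same parsing steps as load_digits, reading from memory instead of a file
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype="uint8")
target = data[:, 0]
images = data[:, 1:].reshape(data.shape[0], 28, 28)

print(images.shape, target)
```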

In [4]:
# Function used to resize the image according to width or height
def resize_image(image, width=None, height=None, inter=cv2.INTER_AREA):
    (h, w) = image.shape[:2]
    dim = None
    
    if width is None and height is None:
        return image
    
    if width is None:
        r = height / float(h)
        dim = (int(r * w), height)
    else:
        r = width / float(w)
        dim = (width, int(r * h))
        
    resized = cv2.resize(image, dim, interpolation=inter)
    return resized

In [5]:
def deskew(image, width):
    (h, w) = image.shape[:2]
    # find moments
    moments = cv2.moments(image)
    
    skew = moments["mu11"] / moments["mu02"]
    M = np.float32([[1, skew, -0.5 * w * skew], [0, 1, 0]])
    image = cv2.warpAffine(image, M, (w, h), flags=cv2.WARP_INVERSE_MAP|cv2.INTER_LINEAR)
    image = resize_image(image, width=width)
    
    return image

This **deskew** function takes two arguments. The first is the image of the digit that is going to be deskewed. The second is the width that the image is going to be resized to.

This function grabs the height and width of the image, then computes the moments of the image. These **moments** contain statistical information regarding the distribution of the location of the white pixels in the image. The skew is computed from the moments and is then used to build the warping matrix M, which will be used to deskew the image.

The actual deskewing of the image takes place when we call the cv2.warpAffine function. The first argument is the image that is going to be deskewed, the second is the matrix M that defines the “direction” in which the image is going to be deskewed, and the third parameter is the resulting width and height of the deskewed image.

Finally, the flags parameter controls how the image is going to be deskewed: cv2.WARP_INVERSE_MAP tells OpenCV to treat M as the inverse transformation (from destination to source), and cv2.INTER_LINEAR selects linear interpolation. The deskewed image is then resized and returned to the caller.
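To see what the skew value measures, the sketch below recomputes the two central moments by hand with NumPy instead of calling cv2.moments. `skew_from_moments` and `shear_matrix` are hypothetical helpers for illustration; the guard against a zero `mu02` is an addition that the deskew function above does not have.

```python
import numpy as np


def skew_from_moments(image):
    """Return mu11 / mu02 computed from the pixel intensities (illustrative)."""
    (ys, xs) = np.mgrid[:image.shape[0], :image.shape[1]]
    weights = image.astype("float64")
    total = weights.sum()
    if total == 0:
        return 0.0  # empty image: nothing to deskew

    # Intensity-weighted centroid
    x_bar = (xs * weights).sum() / total
    y_bar = (ys * weights).sum() / total

    # Central moments: mu11 mixes x and y, mu02 is the spread along y
    mu11 = ((xs - x_bar) * (ys - y_bar) * weights).sum()
    mu02 = ((ys - y_bar) ** 2 * weights).sum()
    if abs(mu02) < 1e-9:
        return 0.0  # assumption: treat a flat stroke as not skewed
    return mu11 / mu02


def shear_matrix(skew, width):
    """Build the same 2 x 3 warping matrix M that deskew passes to cv2.warpAffine."""
    return np.float32([[1, skew, -0.5 * width * skew], [0, 1, 0]])


# A perfectly vertical stroke has no skew
upright = np.zeros((8, 8), dtype="uint8")
upright[:, 4] = 255

# A diagonal stroke (x grows with y) is strongly skewed
slanted = np.zeros((8, 8), dtype="uint8")
slanted[np.arange(8), np.arange(8)] = 255

skew_upright = skew_from_moments(upright)
skew_slanted = skew_from_moments(slanted)
M = shear_matrix(skew_slanted, 8)
print(skew_upright, skew_slanted)
```

A skew of zero leaves M as the identity shear, so an upright digit passes through unchanged, while a slanted digit gets a shear proportional to its slant.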

In [6]:
def center_extent(image, size):
    (eW, eH) = size
    
    if image.shape[1] > image.shape[0]:
        image = resize_image(image, width = eW)
    else:
        image = resize_image(image, height = eH)
        
    extent = np.zeros((eH, eW), dtype = "uint8")

    offsetX = (eW - image.shape[1]) // 2
    offsetY = (eH - image.shape[0]) // 2
    extent[offsetY:offsetY+image.shape[0], offsetX:offsetX+image.shape[1]] = image

    CM = mahotas.center_of_mass(extent)
    (cY, cX) = np.round(CM).astype("int32")
    
    (dX, dY) = ((size[0] // 2) - cX, (size[1] // 2) - cY)
    M = np.float32([[1, 0, dX], [0, 1, dY]])
    
    extent = cv2.warpAffine(extent, M, size)
    
    return extent

This function first checks whether the width of the image is greater than its height. If this is the case, the image is resized based on its width. Otherwise, the height is greater than or equal to the width, so the image is resized based on its height.

These are important checks to make. If they were not made and the image were always resized based on its width, then the height could end up larger than the width, and the image would not fit into the “extent”.
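A small numeric example shows why the check matters. The sketch below reproduces only the dimension arithmetic of resize_image, without touching any pixels; `resized_dims` is a hypothetical helper introduced for illustration.

```python
def resized_dims(w, h, width=None, height=None):
    """Return the (width, height) that resize_image would produce (illustrative)."""
    if width is None and height is None:
        return (w, h)
    if width is None:
        r = height / float(h)
        return (int(r * w), height)
    r = width / float(w)
    return (width, int(r * h))


# A tall, narrow digit: 10 pixels wide and 40 pixels high, target extent 20 x 20
by_width = resized_dims(10, 40, width=20)    # height overflows the extent
by_height = resized_dims(10, 40, height=20)  # fits inside the extent

print(by_width, by_height)
```

Resizing the tall digit by its width would give a 20 × 80 image, which cannot be pasted into a 20 × 20 extent, while resizing by its height gives 5 × 20, which fits.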

Next, the offsets are computed. These offsets indicate the starting (x, y) coordinates of where the image will be placed in the extent; the placement itself is done with NumPy array slicing, which indexes in (y, x) order. The next step is to translate the digit so that it is placed at the center of the image. To do that, the function computes the weighted mean of the white pixels in the image using the center_of_mass function of the mahotas package. This function returns the weighted center in (y, x) order; the coordinates are rounded and converted to integers, and the digit is then translated so that its center of mass lands at the center of the image.
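The centering step can be sketched with NumPy alone. Two substitutions are made here for illustration: `center_of_mass_yx` is a hypothetical stand-in for mahotas.center_of_mass, and np.roll replaces cv2.warpAffine for the translation (np.roll wraps pixels around the border, which is harmless for the small shift in this example).

```python
import numpy as np


def center_of_mass_yx(image):
    """Intensity-weighted center of the image in (y, x) order (illustrative)."""
    (ys, xs) = np.mgrid[:image.shape[0], :image.shape[1]]
    weights = image.astype("float64")
    total = weights.sum()
    return ((ys * weights).sum() / total, (xs * weights).sum() / total)


size = (20, 20)
extent = np.zeros((size[1], size[0]), dtype="uint8")
extent[2:5, 4:7] = 255  # a 3 x 3 blob near the top-left corner

# Same arithmetic as center_extent: round the center, then measure the shift
(cY, cX) = np.round(center_of_mass_yx(extent)).astype("int32")
(dX, dY) = ((size[0] // 2) - cX, (size[1] // 2) - cY)

# Translate the blob by (dX, dY); rows are axis 0 and columns are axis 1
centered = np.roll(extent, shift=(int(dY), int(dX)), axis=(0, 1))
new_center = center_of_mass_yx(centered)

print((int(dX), int(dY)), new_center)
```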

In [7]:
# Load the data using our helper function called load_digits
(digits, target) = load_digits("data/train.csv")

In [7]:
# Creating the object hog for the HOG class 
hog = HOG(orientations=18,
          pixelsPerCell=(10, 10),
          cellsPerBlock=(1, 1),
          transform=True)

In [9]:
data = []

# Loop over the images in the dataset
for image in digits:
    # deskew the image
    image = deskew(image, 20)
    # place the image at the center
    image = center_extent(image, (20, 20))
    
    # Extract the feature of the image
    hist = hog.describe(image)
    data.append(hist)

In [10]:
# Calling the LinearSVC
model = LinearSVC(random_state=42)
# Fit the data
model.fit(data, target)

LinearSVC(random_state=42)

In [11]:
# Save the model 
joblib.dump(model, "models/svm3.cpickle")

['models/svm3.cpickle']

In [8]:
# Load the model saved above
model = joblib.load("models/svm3.cpickle")

In [16]:
# Read the image
image = cv2.imread("images/num.jpeg")
# Convert BGR to grayscale (cv2.imread loads images in BGR order)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Blur the image using Gaussian Blur
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# Edge Detection using Canny 
edged = cv2.Canny(blurred, 30, 150)

In [17]:
# Find the contours of the digits in the image
(cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Sort the contours from left to right by the x-coordinate of their bounding boxes
cnts = sorted([(c, cv2.boundingRect(c)[0]) for c in cnts], key=lambda x: x[1])

In [18]:
for (c, _) in cnts:
    # Unpack the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)

    # Skip regions that are too small to be a digit
    if w >= 7 and h >= 20:
        # Crop the digit in the image
        roi = gray[y:y + h, x:x + w]
        # Copy the image
        thresh = roi.copy()
        # Find the threshold value using Otsu's method
        T = mahotas.thresholding.otsu(roi)
        # Set pixels that exceed the threshold value to 255
        thresh[thresh > T] = 255
        # Change black to white and white to black
        thresh = cv2.bitwise_not(thresh)
        
        # Deskew the digit, using the same width as in training
        thresh = deskew(thresh, 20)
        # Place it at the center of a 20 x 20 extent, as in training
        thresh = center_extent(thresh, (20, 20))
        
        # Show the digit 
        #cv2.imshow("thresh", thresh)
        # Extract the HOG features for the digit
        hist = hog.describe(thresh)
        # Predict the digit using the trained model
        digit = model.predict([hist])[0]
        print("I think that number is: {}".format(digit))
        
        # Draw a rectangle around the digit
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)
        # Draw the predicted digit at the top left of the rectangle
        cv2.putText(image, str(digit), (x - 10, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        # Show the image
        cv2.imshow("image", image)
        cv2.waitKey(0)

Note that each digit must be normalized to the same 20 × 20 extent that was used during training. With a 50 × 50 extent, the HOG settings above yield 5 × 5 × 18 = 450 features instead of the 2 × 2 × 18 = 72 that the model was trained on, and model.predict fails with `ValueError: X has 450 features per sample; expecting 72`.