# Case Study: Support Vector Machines in Object Detection



## What we will discuss today...

### 1. Basic Introduction of Support Vector Machines

- #### a. What is an SVM?

- #### b. Applications of SVMs

### 2. Case Study: Digit Recognition in Images
  
   

- #### a. What is the object detection problem?

- #### b. How do we prepare the dataset?

- #### c. What kind of features are required?

- #### d. Histogram of Oriented Gradients (HOG) as the features

- #### e. Preparing the feature set for training our model

- #### f. Training our SVM model to classify digits in images

- #### g. Testing the classifier on images

# What is a Support Vector Machine?

Support Vector Machines (SVMs) are among the most popular machine learning algorithms. An SVM is a supervised learning algorithm that can be used for both classification and regression, although it is mostly used for classification. Depending on the formulation, you will also see the names maximal margin classifier, soft margin classifier, linear SVM and kernel-based SVM.

## What does it actually do?

Support vectors are the coordinates of individual observations, specifically the ones that lie closest to the frontier between the classes. Let's understand this with the help of an example.
We have a population that is 50% male and 50% female. Using a sample of this population, we want to create a set of rules that tells us the gender class for the rest of the population. With this algorithm, we intend to build a robot that can identify whether a person is male or female. This is a sample classification problem: using a set of rules, we will try to classify the population into two segments. For simplicity, let's assume the two differentiating factors are the height of the individual and their hair length. Following is a scatter plot of the sample.

<img src="SVM_2.png">

As mentioned earlier, support vectors are the coordinates of individual observations; for instance, (45,150) is a support vector which corresponds to a female. The Support Vector Machine is the frontier that best segregates the males from the females. In this case the two classes are well separated from each other, so it is easier to find an SVM.

Now the question is how to find the frontier. For the current example, the following figure shows three possible frontiers:

<img src="SVM_3.png">

So what do you think: how do we decide which is the best frontier for this particular problem?
The easiest way to interpret the objective function of an SVM is this: for each frontier, find its distance from the closest support vector (which can belong to either class). For instance, the orange frontier is closest to the blue circles, and the closest blue circle is 2 units away from it. Once we have these distances for all the frontiers, we simply choose the frontier with the maximum distance from its closest support vector. Of the three frontiers shown, the black one is farthest from its nearest support vector (15 units), so it is the best choice.
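The selection rule described above can be sketched in a few lines of Python. The sample points and the three candidate lines are made-up values for illustration (they are not read from the figure), and each frontier is written as a line `a*x + b*y + c = 0`; the helper names `distance` and `margin` are likewise assumptions.

```python
import math

# Hypothetical 2-D sample: (hair length, height) pairs for the two classes.
points = [(5, 180), (8, 175), (10, 185), (40, 155), (45, 150), (50, 160)]

# Hypothetical candidate frontiers, each a line a*x + b*y + c = 0.
frontiers = {
    "orange": (1.0, 0.0, -12.0),  # vertical line x = 12
    "green": (1.0, 0.0, -35.0),   # vertical line x = 35
    "black": (1.0, 0.0, -25.0),   # vertical line x = 25
}

def distance(point, line):
    """Perpendicular distance from a point to the line a*x + b*y + c = 0."""
    (x, y), (a, b, c) = point, line
    return abs(a * x + b * y + c) / math.hypot(a, b)

def margin(line):
    """Distance from a frontier to its closest observation."""
    return min(distance(p, line) for p in points)

# Compute the margin of every candidate and keep the largest one.
margins = {name: margin(line) for name, line in frontiers.items()}
best = max(margins, key=margins.get)

print(margins)
print("best frontier:", best)
```

With these toy numbers the orange line ends up 2 units from its nearest point and the black line 15 units, so the black line is chosen. A real SVM does not compare a handful of hand-picked lines; it solves an optimization problem over all possible frontiers, but the quantity being maximized is the same margin.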

## What is a hyperplane?

Geometry tells us that a hyperplane is a subspace of one dimension less than its ambient space. For instance, a hyperplane of an n-dimensional space is a flat subset with dimension n − 1. By its nature, it separates the space into two half-spaces. For machine learning tasks we can restate this as follows:

- A hyperplane is a linear decision surface that splits the space into two parts.
- A hyperplane therefore acts as a binary classifier.

The following figure shows some hyperplanes:

<img src="Hyper-Planes.png">
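As a minimal sketch of how a hyperplane acts as a binary classifier: a hyperplane can be written as `w · x + b = 0`, and the sign of `w · x + b` tells us which half-space a point falls in. The function name `side_of_hyperplane` and the example weights are assumptions chosen for illustration.

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which half-space the point x falls in."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane in 2-D: x1 + x2 - 10 = 0
w, b = (1.0, 1.0), -10.0

print(side_of_hyperplane(w, b, (8.0, 7.0)))  # 8 + 7 - 10 = 5  -> 1
print(side_of_hyperplane(w, b, (2.0, 3.0)))  # 2 + 3 - 10 = -5 -> -1
```

The same function works unchanged in any number of dimensions, which is why the idea carries over from the 2-D scatter plot to feature vectors with many entries.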


# Applications of SVMs

1. Image segmentation and categorization
2. Geographic image processing
3. Handwriting recognition
4. Healthcare: for example, analyzing a group of over a million people to predict myocardial infarction within a period of 10 years.
5. Text classification: for example, predicting whether a person is depressed based on a bag-of-words representation built from a text corpus.

# Case Study: Object Detection and Classification using SVM

## What is an image?

An image is just another numerical matrix, like the ones you have seen in your maths classes. You can apply any algebraic operation to it that you can apply to any other matrix. These operations may be simple arithmetic such as addition, subtraction or multiplication, or more complex analysis such as singular value decomposition or principal component analysis. We can represent an image as follows:

<img src="lincoln_pixel_values.png">
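To make the "image is a matrix" idea concrete, here is a small NumPy sketch. The 4x4 array is a made-up stand-in for a grayscale image, not data from the tutorial; it shows one simple arithmetic operation and one matrix-level analysis.

```python
import numpy as np

# A hypothetical 4x4 grayscale "image": each entry is a pixel intensity (0-255).
image = np.array([
    [0, 50, 100, 150],
    [50, 100, 150, 200],
    [100, 150, 200, 250],
    [150, 200, 250, 255],
], dtype=np.uint8)

# Simple arithmetic: invert the image (dark becomes bright and vice versa).
inverted = 255 - image

# Matrix-level analysis: the singular values of the pixel matrix.
singular_values = np.linalg.svd(image.astype(np.float64), compute_uv=False)

print(image.shape)      # (4, 4)
print(inverted[0, 0])   # 255
print(singular_values)
```

A 28x28 digit works exactly the same way, only with more rows and columns.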

## MNIST Digit Recognition Problem in Image processing

This problem is known as the "Hello World!" of the machine learning world. Whether you are working with conventional ML algorithms like the one we are using here or with state-of-the-art deep learning, nearly everything related to classification starts here. The reason for this popularity is that the problem is simple to understand, so we will use the same application to start our journey.

Let's see what the data set looks like and what we expect from the classifier.

<img src="DigitData.png">

### What do we want to do?
Well, we want to build a classifier that can do this:

<img src="DigitRecognition.png">


### So how are we going to do that?

Well, the answer is quite simple: we will train a classifier to do this work for us!

But before moving forward, let's see how to approach this problem within a machine learning framework.


## Generalized Machine Learning framework
<img src="GeneralizedML.png">

We will start our journey with data preparation.

# Data Preparation

Since this is one of the most widely published and well-studied problems, we can easily download its data set from an online repository. The original data set is quite large: it has around 60,000 handwritten digits from different people. We will not work with all 60k, though; we will load only 5,000 instances and try to come up with a solution.

Let's write some code to fetch the data set and see what it really looks like:

In [None]:
import numpy as np

def load_digits(datasetPath):
    # load the CSV; each row is one digit: the label followed by
    # 784 pixel values
    data = np.genfromtxt(datasetPath, delimiter = ",", dtype = "uint8")
    # the first column holds the labels
    target = data[:, 0]
    # the remaining columns are reshaped into 28x28 images
    data = data[:, 1:].reshape(data.shape[0], 28, 28)
    print(type(data))
    print(data[0])

    # return a tuple of the data and targets
    return (data, target)

Let's plot some of the loaded instances.

In [None]:
data,label = load_digits('digits.csv')

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

for i in range(9):
  plt.subplot(3,3,i+1)
  plt.tight_layout()
  plt.imshow(data[i], cmap='gray', interpolation='none')
  plt.title("Digit: {}".format(label[i]))
  plt.xticks([])
  plt.yticks([])
plt.show()

So this is what our data set looks like. What do we do with it now? Well, this problem is not as simple as it looks. Before implementing a classifier we need to put a fair amount of work into pre-processing the data so that we can build an effective classifier. We will apply the following pre-processing operations to our data set:

- a. Deskew images.
- b. Re-center image content.

We will write the following code to deskew an image.

In [3]:
import cv2
import numpy as np

def deskew(image, width):
    # grab the width and height of the image and compute
    # moments for the image
    (h, w) = image.shape[:2]
    moments = cv2.moments(image)

    # estimate the skew; skip the shear when mu02 is close to zero
    # (e.g. a blank image) to avoid dividing by zero
    if abs(moments["mu02"]) < 1e-2:
        skew = 0.0
    else:
        skew = moments["mu11"] / moments["mu02"]

    # deskew the image by applying an affine transformation
    M = np.float32([
        [1, skew, -0.5 * w * skew],
        [0, 1, 0]])
    image = cv2.warpAffine(image, M, (w, h),
        flags = cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)

    # resize the image to a constant width x width square
    image = cv2.resize(image, (width, width))

    # return the deskewed image
    return image

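To see what the `mu11 / mu02` ratio in `deskew` measures, here is a small NumPy sketch that computes the same two central moments by hand. The helper name `skew_from_moments` and the toy 5x5 strokes are made up for illustration; `cv2.moments` remains what the tutorial code actually uses.

```python
import numpy as np

def skew_from_moments(image):
    """Estimate horizontal shear as mu11 / mu02 from central image moments."""
    img = image.astype(np.float64)
    # pixel coordinate grids: ys holds row indices, xs holds column indices
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    total = img.sum()
    # intensity-weighted centroid of the image
    x_bar = (xs * img).sum() / total
    y_bar = (ys * img).sum() / total
    # mu11: how x and y vary together; mu02: spread along y
    mu11 = ((xs - x_bar) * (ys - y_bar) * img).sum()
    mu02 = ((ys - y_bar) ** 2 * img).sum()
    return mu11 / mu02

# Hypothetical upright stroke: a vertical line in the middle column.
upright = np.zeros((5, 5))
upright[:, 2] = 1

# Hypothetical slanted stroke: a diagonal line (x shifts by 1 for every row).
slanted = np.eye(5)

print(skew_from_moments(upright))  # 0.0
print(skew_from_moments(slanted))  # 1.0
```

An upright stroke has no correlation between x and y, so the skew is zero; the diagonal stroke shifts one pixel sideways per row, which is exactly the shear that the affine matrix in `deskew` is built to undo.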

Let's see what a deskewed image looks like:

In [4]:
image = deskew(data[0],28)

plt.subplot(1,2,1)
plt.tight_layout()
plt.imshow(data[0], cmap='gray', interpolation='none')
plt.title("Skewed Image")
plt.subplot(1,2,2)
plt.tight_layout()
plt.imshow(image, cmap='gray', interpolation='none')
plt.title("De-Skewed Image")

plt.show()


Now let's see the effect of centering the image extent. The code for that looks like this:

In [None]:
import cv2
import mahotas
import numpy as np

def center_extent(image, size):
    # grab the extent width and height
    (eW, eH) = size

    # current image shape
    (h, w) = image.shape[:2]

    # handle when the width is greater than the height:
    # scale to the extent width, preserving the aspect ratio
    if w > h:
        r = eW / float(w)
        dim = (eW, int(h * r))

    # otherwise, the height is greater than (or equal to) the width:
    # scale to the extent height, preserving the aspect ratio
    else:
        r = eH / float(h)
        dim = (int(w * r), eH)

    # interpolation must be passed by keyword; the third positional
    # argument of cv2.resize is the destination array
    image = cv2.resize(image, dim, interpolation = cv2.INTER_AREA)

    # allocate memory for the extent of the image and paste the
    # resized image into its middle (slice offsets must be integers)
    extent = np.zeros((eH, eW), dtype = "uint8")
    offsetX = (eW - image.shape[1]) // 2
    offsetY = (eH - image.shape[0]) // 2
    extent[offsetY:offsetY + image.shape[0], offsetX:offsetX + image.shape[1]] = image

    # compute the center of mass of the image and then
    # move the center of mass to the center of the image
    (cY, cX) = np.round(mahotas.center_of_mass(extent)).astype("int32")
    (dX, dY) = ((size[0] // 2) - cX, (size[1] // 2) - cY)
    M = np.float32([[1, 0, dX], [0, 1, dY]])
    extent = cv2.warpAffine(extent, M, size)

    # return the extent of the image
    return extent

Let's see how it works.

In [None]:
im = cv2.imread('Decenter.png')
im = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
image = center_extent(im,(28,28))

plt.subplot(1,2,1)
plt.tight_layout()
plt.imshow(im, cmap='gray', interpolation='none')
plt.title("De-Centered")
plt.subplot(1,2,2)
plt.tight_layout()
plt.imshow(image, cmap='gray', interpolation='none')
plt.title("Centered Image")

plt.show()

So our pre-processing modules are ready. Our next task is to extract meaningful features from our images so that we can use them to train a classifier.

# Histogram of Oriented Gradients (HOG) features

As its name suggests, it contains three key terms:

- Histogram
- Oriented
- Gradients

A histogram is nothing but a frequency map that shows how many times each value of a variable occurs; orientation is directly associated with angles; and gradients signify transitions (changes) in a variable, which in an image means edges. So HOG gives us a frequency map of the edges (gradients) of an image across different orientations.

Let's try to understand it with an example.

Suppose we have an image containing different shapes, like the following:

<img src="HOG_1.png">


So how can we calculate such histograms for our images? In practice, the HOG of an image is always calculated in the following manner:

<img src="HOG_2.png">
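The core step, building an orientation histogram for one cell, can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not what a full HOG implementation does: it handles a single cell, uses unsigned angles (0 to 180 degrees), weights each pixel by its gradient magnitude, and skips block normalization. The function name `cell_orientation_histogram` is hypothetical.

```python
import numpy as np

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient angles (0-180 degrees)."""
    cell = cell.astype(np.float64)
    # derivatives along the rows (gy) and along the columns (gx)
    gy, gx = np.gradient(cell)
    magnitude = np.hypot(gx, gy)
    # fold angles into [0, 180): opposite directions count as the same edge
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, 180.0),
                           weights=magnitude)
    return hist

# Hypothetical 8x8 cell with a vertical edge: dark left half, bright right half.
cell = np.zeros((8, 8))
cell[:, 4:] = 255

hist = cell_orientation_histogram(cell)
print(hist)
```

For this cell the intensity changes only in the horizontal direction, so all of the gradient weight falls into the first bin (angles near 0 degrees) and the other eight bins stay empty. A cell containing a horizontal edge would instead fill the bin around 90 degrees.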

So how are we going to do it in our case? The skimage library has a function that extracts HOG features from images, and we will use it for our images too. We will write a method for calculating the HOG of our images.

The definition goes as follows:

In [None]:
from skimage import feature

def HOG_describe(image):
    # compute HOG for the image
    hist = feature.hog(image, orientations = 9,
        pixels_per_cell = (8, 8),
        cells_per_block = (3, 3)
        )
    return hist

Let's try out our function on an image to calculate its HOG features:

In [None]:
hist = HOG_describe(data[0])

print(np.shape(hist))

As you can see, 81 features were extracted from the image: with these settings a 28x28 image is divided into 9 cells of 8x8 pixels (3 along each axis), and each cell contributes a histogram with 9 bins.
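The count of 81 can be checked with a little arithmetic. The helper below is a sketch written for this tutorial, not part of skimage; it assumes that only whole cells are used and that blocks slide across the image one cell at a time.

```python
def hog_feature_length(image_shape, orientations, pixels_per_cell, cells_per_block):
    """Length of a HOG vector when blocks slide one cell at a time."""
    # number of whole cells that fit along each axis
    cells_y = image_shape[0] // pixels_per_cell[0]
    cells_x = image_shape[1] // pixels_per_cell[1]
    # number of block positions along each axis
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    # every block holds cells_per_block cells, each with one histogram
    cells_in_block = cells_per_block[0] * cells_per_block[1]
    return blocks_y * blocks_x * cells_in_block * orientations

print(hog_feature_length((28, 28), 9, (8, 8), (3, 3)))  # 81
```

Here 28 // 8 gives 3 cells per axis, a 3x3 block fits in exactly one position, and 1 block x 9 cells x 9 bins gives 81 values. Smaller blocks, for example (2, 2), would fit in 4 positions and produce 4 x 4 x 9 = 144 values instead.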

Now it is time to combine everything we have learned and create a digit classifier. We will complete the following steps:

- Build a data set.
- Pre-process the data set (deskew and center each image).
- Extract HOG features from each image.
- Train a classifier on the data set.

The following method does this for us:

In [None]:
from sklearn.svm import LinearSVC
import pickle  # cPickle exists only in Python 2

def train():
    # load the dataset and initialize the data matrix
    path2model = 'svm_cs.cpickle'
    path2data = 'digits.csv'

    # image size
    factor = 28
    (digits, target) = load_digits(path2data)
    data = []

    # loop over the images
    for image in digits:
        # deskew the image, center it
        image = deskew(image, factor)
        image = center_extent(image, (factor, factor))

        # describe the image and update the data matrix
        hist = HOG_describe(image)
        data.append(hist)

    # train the model
    model = LinearSVC()
    print(model)
    model.fit(data, target)

    # dump the model to file (pickle data is binary, so open in "wb" mode)
    with open(path2model, "wb") as f:
        pickle.dump(model, f)

Let's call the above method on our data set and train an SVM classifier:

In [None]:
train()

So far so good. Now it is time to test our classifier on real images.
Let's roll!

In [1]:
import pickle  # cPickle exists only in Python 2

def test():
    path2model = 'svm_cs.cpickle'
    path2im =  'cellphone.png'
    factor = 28

    # load the model (pickle data is binary, so read in "rb" mode)
    with open(path2model, "rb") as f:
        model = pickle.load(f)

    # load the image and convert it to grayscale
    image = cv2.imread(path2im)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # blur the image, find edges, and then find contours along
    # the edged regions
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(blurred, 30, 150)
    # findContours returns 3 values in OpenCV 3 and 2 in OpenCV 4;
    # the contours are the second-to-last item in both cases
    cnts = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    # sort the contours by their x-axis position, ensuring
    # that we read the numbers from left to right
    cnts = sorted([(c, cv2.boundingRect(c)[0]) for c in cnts], key = lambda x: x[1])

    # loop over the contours
    for (c, _) in cnts:
        # compute the bounding box for the contour
        (x, y, w, h) = cv2.boundingRect(c)

        # if the width is at least 7 pixels and the height
        # is at least 20 pixels, the contour is likely a digit
        if w >= 7 and h >= 20:
            # crop the ROI and then threshold the grayscale
            # ROI to reveal the digit
            roi = gray[y:y + h, x:x + w]
            thresh = roi.copy()
            T = mahotas.thresholding.otsu(roi)
            thresh[thresh > T] = 255
            thresh = cv2.bitwise_not(thresh)

            # deskew the image and center its extent
            thresh = deskew(thresh, factor)
            thresh = center_extent(thresh, (factor, factor))

            # extract features from the image and classify it
            hist = HOG_describe(thresh)
            digit = model.predict(np.array([hist]))[0]

            # draw a rectangle around the digit, then show what the
            # digit was classified as
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)
            cv2.putText(image, str(digit), (x - 10, y - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)

    # OpenCV stores images as BGR, while matplotlib expects RGB
    plt.figure(figsize=(15,10))
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.show()

In [2]:
test()
