Austin Feydt (apf31)

EECS 531 - A1

21 February 2018
# Exercise 3: Template Matching

# Mathematical Background:

One strategy for identifying features in an image is to match a template against it.  As with a convolutional kernel, we slide the template over the entire image.  However, rather than performing a convolution, we define a metric that measures how "different" the template is from the current patch.

Let $A$ be our template and $B$ be a patch of the image, both of size $n \times n$:
$$A=[a_{ij}],\quad B=[b_{ij}],\quad 1\leq i,j\leq n$$

Then, our distance function is defined as:
$$D(A,B)=\sum_{i=1}^{n}\sum_{j=1}^{n}(a_{ij}-b_{ij})^2$$

When $A=B$, $D(A,B)=0$. Otherwise, $D(A,B)>0$, so any non-zero result means that the current patch differs from the template, whether because of noise or because the patch is a true negative. We therefore compute this distance for every patch in the image, and then classify each patch as a positive or a negative by comparing its distance against a threshold.
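As a small worked example of the distance function, the sketch below evaluates $D(A,B)$ on a $2 \times 2$ template. The helper name `ssd` and the pixel values are illustrative assumptions, not part of the assignment code.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences D(A, B) between two equally sized arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))

# Illustrative 2x2 template and two candidate patches
template = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
identical_patch = template.copy()
noisy_patch = np.array([[0.0, 1.0],
                        [0.5, 0.0]])  # one pixel differs by 0.5

d_same = ssd(template, identical_patch)   # 0.0, since A = B
d_noisy = ssd(template, noisy_patch)      # 0.5**2 = 0.25

# With a threshold of 0.5, both patches would be classified as positives
threshold = 0.5
is_match = [d < threshold for d in (d_same, d_noisy)]
```

The noisy patch is still accepted here only because the chosen threshold tolerates a squared error of up to 0.5; a stricter threshold would reject it.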


# Letter Detection:

I started by recreating the original letter detection problem, in which we try to identify 'h's in a block of typed text:
&nbsp;
Paragraph:
![paragraph:](A1_Images/paragraph.png)

&nbsp;
Template:
![template:](A1_Images/h_template.png)



# Implementation:

In [8]:
from skimage import io, color
from skimage.data import camera
from skimage.filters import sobel
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [83]:
# Calculates the "distance" (sum of squared differences) between the template
# and each patch in the image
def get_distances(image, template):
    # Get appropriate dimensions
    par_height = image.shape[0]
    par_width = image.shape[1]
    template_height = template.shape[0]
    template_width = template.shape[1]
    # Number of top-left offsets scanned along each axis.
    # Note: the final valid offset (x == rh, y == rw) is not scanned;
    # draw_boxes relies on this when it draws the box edges at
    # x + template_height and y + template_width.
    rh = par_height - template_height
    rw = par_width - template_width

    distances = np.zeros((rh, rw))

    # Scan over the image
    for x in range(0, rh):
        for y in range(0, rw):
            patch = image[x:x + template_height, y:y + template_width]
            # Compute using the distance function D(A,B)
            distances[x, y] = np.sum(np.power(patch - template, 2))
    return distances

# Draws bounding boxes around all positively classified objects (based on threshold)
# and returns the positive count and total patch count needed for ROC
def draw_boxes(image, template, distances, threshold, saveImage):
    # Work on a copy so repeated calls do not accumulate boxes
    new_image = np.copy(image)
    par_height = image.shape[0]
    par_width = image.shape[1]
    template_height = template.shape[0]
    template_width = template.shape[1]
    rh = par_height - template_height
    rw = par_width - template_width

    pos_count = 0
    total = 0
    for x in range(0, rh):
        for y in range(0, rw):
            total += 1
            if distances[x, y] < threshold:
                pos_count += 1
                # Red top and bottom edges
                new_image[x, y:y + template_width] = [1, 0, 0]
                new_image[x + template_height, y:y + template_width] = [1, 0, 0]
                # Red left and right edges
                new_image[x:x + template_height, y] = [1, 0, 0]
                new_image[x:x + template_height, y + template_width] = [1, 0, 0]

    # So we don't save a bunch of pictures in exercise 4 :D
    if saveImage:
        io.imsave("A1_Images/threshold" + str(threshold) + ".png", new_image)
    return pos_count, total
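The nested loops in `get_distances` can be slow on large images. A possible vectorized alternative is sketched below, assuming NumPy 1.20 or later for `sliding_window_view`. The function name `get_distances_vectorized` is hypothetical, and unlike the loop version it also scores the final offset along each axis, so its output is one row and one column larger.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def get_distances_vectorized(image, template):
    """Sum of squared differences for every template-sized window.

    Returns an array of shape (H - h + 1, W - w + 1), which includes the
    last valid offset that the loop-based get_distances skips.
    """
    # windows has shape (H - h + 1, W - w + 1, h, w); it is a view, not a copy
    windows = sliding_window_view(image, template.shape)
    # The template broadcasts over the two leading (offset) axes
    return np.sum((windows - template) ** 2, axis=(2, 3))

# Small synthetic check: cut the template out of a random image
rng = np.random.default_rng(0)
image = rng.random((6, 7))
template = image[2:5, 1:4].copy()
distances = get_distances_vectorized(image, template)
```

The subtraction materializes a temporary array with one template-sized block per offset, so for large images this trades memory for speed; processing the image in horizontal strips would be one way to bound that cost.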

In [86]:
text = io.imread("A1_Images/paragraph.png")
text = color.rgb2gray(text)

template = io.imread("A1_Images/h_template.png") 
template = color.rgb2gray(template)

distances = get_distances(text,template)
new_image = color.gray2rgb(text)

draw_boxes(new_image, template, distances, 5, True)
draw_boxes(new_image, template, distances, 20, True)
draw_boxes(new_image, template, distances, 30, True)

(2328, 372360)

# Results:
&nbsp;
Threshold=5
![Threshold=5](A1_Images/threshold5.png)

&nbsp;
Threshold=20
![Threshold=20](A1_Images/threshold20.png)

&nbsp;
Threshold=30
![Threshold=30](A1_Images/threshold30.png)


Above are a few images showing what the classifier identified as positive at different distance thresholds.  As we increase the threshold, we allow the classifier to make more mistakes: it starts treating n's and m's as h's, since their bottom halves have similar shapes.  If we raise the threshold even further, almost all of the letters are classified as positives, and almost all of those are false positives.  Although this example seems trivial, we will see in a later example how important it is to tune this threshold parameter, especially if we are interested in getting as many true positives as possible.
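The effect of the threshold can be reproduced on a toy scale. The sketch below uses hand-drawn $5 \times 3$ binary glyphs (an illustrative assumption, not the pixels from `paragraph.png`) and lists which glyphs fall below each threshold when matched against the "h" glyph.

```python
import numpy as np

# Hand-drawn 5x3 binary glyphs, for illustration only
glyphs = {
    "h": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1], [1, 0, 1]], dtype=float),
    "n": np.array([[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 0, 1], [1, 0, 1]], dtype=float),
    "o": np.array([[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float),
    "x": np.array([[0, 0, 0], [0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=float),
}

template = glyphs["h"]

# Distance D(template, glyph) for every glyph
distances = {name: float(np.sum((g - template) ** 2)) for name, g in glyphs.items()}

def positives(threshold):
    """Glyphs whose distance to the template is strictly below the threshold."""
    return sorted(name for name, d in distances.items() if d < threshold)

low = positives(1)    # only the exact match
mid = positives(3)    # "n" joins, since it shares the bottom half of "h"
high = positives(7)   # every glyph is accepted
```

In this toy setting the "n" glyph differs from "h" only in the two ascender pixels, which is why it is the first false positive to appear as the threshold grows.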

# A New Feature Detector (Shape Detection):

To explore template matching further, I decided to make my own image.  It consists of rectangles, circles, and a few diamonds/polygons.  The goal for the classifier is to correctly identify the rectangles.  However, a few of the rectangles are obstructed by other shapes, so we want to find a threshold value that still identifies these obstructed rectangles as positive examples.

&nbsp;
Original Image:
![Image:](A1_Images/shapes.png)

&nbsp;
Template:
![Template](A1_Images/box_template.png)


# Implementation:

In [84]:
text = io.imread("A1_Images/shapes.png")
text = color.rgb2gray(text)

template = io.imread("A1_Images/box_template.png") 
template = color.rgb2gray(template)

distances = get_distances(text,template)
new_image = color.gray2rgb(text)

draw_boxes(new_image, template, distances, 1, True)
draw_boxes(new_image, template, distances, 9, True)
draw_boxes(new_image, template, distances, 15, True)

(6, 8064)

# Exploratory Results:
&nbsp;
Different Threshold Results:
![different threshold results:](A1_Images/result_shape.png)


With this example, we were interested in finding a threshold value that would correctly classify the obstructed rectangles.  Starting the threshold at a low value, we can see that the classifier correctly identifies all unobstructed rectangles.  As we increase the threshold, the distances of the obstructed rectangles begin to fall below it, and they get correctly classified as well.  The more obstructed the object, the higher the threshold must be to classify it correctly.  However, this raises the question: do we actually want to classify an object as a positive when it is 25% or 50% covered by another shape?  How confident can we be that this obstructed object is even what we think it is?
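The link between occlusion and the required threshold can be made concrete with a synthetic patch. The sketch below assumes an $8 \times 8$ all-white template and a black occluder covering the leftmost columns; the helper `occluded_distance` and these pixel values are hypothetical and are not taken from `shapes.png`.

```python
import numpy as np

def occluded_distance(template, fraction, occluder_value=0.0):
    """Distance between a template and a copy whose left `fraction` is covered."""
    patch = template.copy()
    covered_cols = int(round(fraction * template.shape[1]))
    # Overwrite the covered columns with the occluder's intensity
    patch[:, :covered_cols] = occluder_value
    return float(np.sum((patch - template) ** 2))

# Illustrative template: a uniformly white 8x8 rectangle
template = np.ones((8, 8))

# Distance for 0%, 25%, and 50% occlusion
occlusion_distances = {
    fraction: occluded_distance(template, fraction)
    for fraction in (0.0, 0.25, 0.5)
}
```

Under these assumptions the distance grows linearly with the covered area (each covered pixel contributes a squared error of 1), so the threshold needed to accept a rectangle is proportional to how much of it is hidden.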