<a href="https://colab.research.google.com/github/dionny/ai-tutorial-notebooks/blob/main/template_matching.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Template Matching for Element Localization and Classification

Template matching is a technique for image-based element localization and classification. A template matching algorithm takes two images as input: a template image (e.g., a screen element) and a query image (e.g., a screenshot). If the template appears as a sub-image of the query image, the algorithm returns the coordinates at which it appears; some algorithms also return a confidence score or match percentage. Many template matching algorithms exist. This notebook uses the naive template matching algorithms provided by OpenCV.
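
To make the idea concrete, here is a minimal NumPy sketch of naive template matching on a synthetic grayscale image. It is an illustration only, not OpenCV's implementation: the function name `match_template_ncc` and the random test image are assumptions made for this example. The function slides the template over every position in the query image, scores each window with zero-mean normalized cross-correlation, and returns the best position together with its score.

```python
import numpy as np


def match_template_ncc(query, template):
    """Slide `template` over `query`; return ((x, y), score) of the best match.

    The score is the zero-mean normalized cross-correlation, which lies in [-1, 1].
    Hypothetical helper for illustration; it is far slower than an optimized library.
    """
    query = np.asarray(query, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    th, tw = template.shape
    qh, qw = query.shape

    # Center the template once; its norm is reused for every window.
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())

    best_score, best_xy = -np.inf, (0, 0)
    for y in range(qh - th + 1):
        for x in range(qw - tw + 1):
            w = query[y:y + th, x:x + tw]
            w = w - w.mean()
            denom = t_norm * np.sqrt((w ** 2).sum())
            # A featureless window has zero variance; treat it as "no correlation".
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score


# Synthetic "screenshot" and a template cropped from it at row 12, column 25.
rng = np.random.default_rng(0)
query = rng.integers(0, 256, size=(40, 60)).astype(np.float64)
template = query[12:20, 25:35].copy()

(x, y), score = match_template_ncc(query, template)
print((x, y), round(score, 3))
```

The returned coordinates are the top-left corner of the best-matching window, given as (x, y), which is the same convention the OpenCV-based cell later in this notebook relies on.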

After running the cell below, use the widget to upload a query image (e.g., a screenshot).

In [None]:
import ipywidgets

screenshot_uploader = ipywidgets.FileUpload(
    accept='.png',
    multiple=False,
)
display(screenshot_uploader)

After running the cell below, use the widget to upload a template image (e.g., a screen element cropped from the screenshot image). The template must have the same scale as the query image (i.e., do not rescale the template after cropping); otherwise, the algorithms used below will be unable to localize the template in the query image.

In [None]:
template_uploader = ipywidgets.FileUpload(
    accept='.png',
    multiple=False,
)
display(template_uploader)

Visualize the template and query images by running the cell below.

In [None]:
import io

import PIL.Image

import notebook_utils

# Get uploaded images.
screenshot_image = notebook_utils.get_uploaded_image(screenshot_uploader)
template_image = notebook_utils.get_uploaded_image(template_uploader)

# Populate grid with titles and images.
num_columns = 2
grid_gap = '30px'
title_grid = ipywidgets.GridspecLayout(1, num_columns, grid_gap=grid_gap)
column_titles = ['Screenshot Image', 'Template Image']
for i, column_title in enumerate(column_titles):
    title_grid[0, i] = ipywidgets.HTML(value='<h1>{text}</h1>'.format(text=column_title))
grid = ipywidgets.GridspecLayout(1, num_columns, grid_gap=grid_gap)
grid[0, 0] = ipywidgets.Image(value=notebook_utils.convert_image_to_bytes(screenshot_image), max_width='50%')
grid[0, 1] = ipywidgets.Image(value=notebook_utils.convert_image_to_bytes(template_image), max_width='50%')

# Display grid.
display(title_grid)
display(grid)

The cell below uses a template matching algorithm from OpenCV to create an element classifier that detects whether the template is present in the screenshot image and, if so, localizes it within the screenshot. You can experiment by adjusting the confidence threshold, changing the [template matching comparison method](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_template_matching/py_template_matching.html), using [Canny edges](https://docs.opencv.org/master/da/d22/tutorial_py_canny.html), and converting to grayscale.

In [None]:
import cv2
from IPython.display import Markdown
import numpy


class TemplateMatcher(object):

    def __init__(
        self,
        threshold=0.9,
        comparison_method=cv2.TM_CCOEFF_NORMED,
        use_canny=False,
        use_grayscale=False,
    ):
        self.threshold = threshold
        self.comparison_method = comparison_method
        self.use_canny = use_canny
        self.use_grayscale = use_grayscale

    @staticmethod
    def _convert_to_numpy_array_if_necessary(image):
        if isinstance(image, PIL.Image.Image):
            image = numpy.array(image)
        return image

    def _convert_to_canny_if_option_set(self, image):
        if self.use_canny:
            image = cv2.Canny(image, 50, 200)
        return image

    def _convert_to_grayscale_if_option_set(self, image):
        if self.use_grayscale:
            image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        return image

    def _preprocess_image(self, image):
        image = self._convert_to_numpy_array_if_necessary(image)
        image = self._convert_to_grayscale_if_option_set(image)
        image = self._convert_to_canny_if_option_set(image)
        return image

    def fit(self, query_image):
        self.query_image = self._preprocess_image(query_image)

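    def predict(self, template_image):
        template_image = self._preprocess_image(template_image)
        match = cv2.matchTemplate(self.query_image, template_image, self.comparison_method)
        # minMaxLoc returns (min_val, max_val, min_loc, max_loc). For the TM_CCOEFF* and
        # TM_CCORR* methods the best match is the maximum; max_loc is its top-left (x, y) corner.
        _, confidence_score, _, coordinates = cv2.minMaxLoc(match)
        match_found = confidence_score >= self.threshold
        return match_found, confidence_score, coordinates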


# Experiment by changing the threshold, comparison method, Canny and grayscale settings below.
threshold = 0.95  # Confidence threshold; values between 0 and 1 are meaningful for the *_NORMED methods.
use_canny = False  # Specify whether to match on Canny edges.
use_grayscale = False  # Specify whether to convert the images to grayscale.

# The following are valid values for the comparison method.
# Note: the non-normalized methods (TM_CCOEFF, TM_CCORR) produce scores that are not
# confined to [0, 1], so the threshold above would need to be rescaled for them.
#     cv2.TM_CCOEFF
#     cv2.TM_CCOEFF_NORMED
#     cv2.TM_CCORR
#     cv2.TM_CCORR_NORMED
comparison_method = cv2.TM_CCOEFF_NORMED

# Run the classifier with the uploaded screenshot and template images.
template_matcher = TemplateMatcher(
    threshold=threshold,
    comparison_method=comparison_method,
    use_canny=use_canny,
    use_grayscale=use_grayscale,
)
template_matcher.fit(screenshot_image)
match_found, confidence_score, coordinates = template_matcher.predict(template_image)

# Display results of classification.
if match_found:
    # Populate grid.
    grid = ipywidgets.GridspecLayout(1, num_columns, grid_gap=grid_gap)
    title_grid = ipywidgets.GridspecLayout(1, num_columns, grid_gap=grid_gap)
    column_titles = ['Screenshot Image', 'Confidence Score']
    for i, column_title in enumerate(column_titles):
        title_grid[0, i] = ipywidgets.HTML(value='<h1>{text}</h1>'.format(text=column_title))
    top_left = coordinates
    bottom_right = notebook_utils.get_bottom_right(template_image, top_left)
    screenshot_with_bounding_box = notebook_utils.draw_bounding_box(screenshot_image, top_left, bottom_right)
    grid[0, 0] = ipywidgets.Image(value=notebook_utils.convert_image_to_bytes(screenshot_with_bounding_box), max_width='50%')
    grid[0, 1] = ipywidgets.Label('Score: {confidence_score}'.format(confidence_score=confidence_score))

    # Display results.
    display(title_grid)
    display(grid)
else:
    # Display message indicating no match found.
    message = 'No match found: the confidence score {confidence_score} is below the threshold. Try adjusting the values above or uploading new images.'.format(confidence_score=confidence_score)
    display(Markdown(message))
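
The comparison methods listed in the cell above differ in how their scores behave, which is one reason to experiment with them. The NumPy sketch below contrasts plain cross-correlation (roughly the quantity behind `cv2.TM_CCORR`) with its zero-mean normalized form (roughly the quantity behind `cv2.TM_CCOEFF_NORMED`) on a synthetic image that contains a bright, featureless block. The helper names and the test image are assumptions made for this illustration, and the code is not OpenCV's implementation.

```python
import numpy as np


def sliding_scores(query, template, score_fn):
    """Evaluate `score_fn(window, template)` at every template position."""
    th, tw = template.shape
    qh, qw = query.shape
    out = np.empty((qh - th + 1, qw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = score_fn(query[y:y + th, x:x + tw], template)
    return out


def raw_correlation(window, template):
    # Unnormalized: grows with pixel brightness, so it has no fixed upper bound.
    return float((window * template).sum())


def zero_mean_normalized(window, template):
    # Subtracting the means and dividing by the norms confines the score to [-1, 1].
    w = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((w ** 2).sum() * (t ** 2).sum())
    return float((w * t).sum() / denom) if denom > 0 else 0.0


def argmax_xy(scores):
    """Return the (x, y) position of the highest score."""
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(x), int(y)


rng = np.random.default_rng(1)
query = rng.integers(0, 100, size=(30, 40)).astype(np.float64)
query[20:30, 0:12] = 255.0             # bright, featureless block
template = query[5:11, 20:28].copy()   # true location: x=20, y=5

raw_xy = argmax_xy(sliding_scores(query, template, raw_correlation))
norm_xy = argmax_xy(sliding_scores(query, template, zero_mean_normalized))
print('raw correlation peak:', raw_xy)
print('normalized peak:', norm_xy)
```

In this synthetic setup the raw score peaks on the bright block rather than at the true location, while the normalized score peaks where the template was cropped. This is why a fixed threshold between 0 and 1 pairs naturally with the normalized methods.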

The naive template matching algorithms used above are computationally inexpensive and have excellent [precision](https://en.wikipedia.org/wiki/Precision_and_recall#Precision) when used as element classifiers. However, they require the template and query images to have the same scale; otherwise, they cannot localize the template within the query image. This is a major drawback in the context of AI-based testing, since screenshots taken on different devices typically differ in scale. Scale-invariant template matching algorithms, which generalize across varying screen resolutions, also exist.
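
One simple way to tolerate scale differences is a brute-force multi-scale search: resize the template over a range of candidate scales, run naive matching at each scale, and keep the best result. The sketch below illustrates that idea in plain NumPy on synthetic data. It is not a truly scale-invariant algorithm and not necessarily what the approaches mentioned above do; the helper names, the candidate scales, and the nearest-neighbor resizing (a stand-in for a proper image resize) are all assumptions made for this example.

```python
import numpy as np


def resize_nearest(image, scale):
    """Resize a 2-D array by `scale` using nearest-neighbor sampling."""
    h, w = image.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[np.ix_(rows, cols)]


def best_match(query, template):
    """Return ((x, y), score) using zero-mean normalized cross-correlation."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_xy = -np.inf, (0, 0)
    for y in range(query.shape[0] - th + 1):
        for x in range(query.shape[1] - tw + 1):
            w = query[y:y + th, x:x + tw]
            w = w - w.mean()
            denom = t_norm * np.sqrt((w ** 2).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score


def multi_scale_match(query, template, scales):
    """Try each candidate scale; return (scale, (x, y), score) of the best one."""
    best = None
    for scale in scales:
        resized = resize_nearest(template, scale)
        if resized.shape[0] > query.shape[0] or resized.shape[1] > query.shape[1]:
            continue  # the resized template no longer fits inside the query image
        xy, score = best_match(query, resized)
        if best is None or score > best[2]:
            best = (scale, xy, score)
    return best


# Synthetic setup: the "screenshot" is rendered at twice the template's resolution.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(20, 25)).astype(np.float64)
query = np.repeat(np.repeat(base, 2, axis=0), 2, axis=1)  # 40 x 50 image
template = base[5:11, 7:15].copy()                         # element at 1x scale

scale, (x, y), score = multi_scale_match(query, template, scales=(1.0, 1.5, 2.0, 2.5))
print(scale, (x, y), round(score, 3))
```

The cost grows linearly with the number of candidate scales, and the search only succeeds when the true scale is close to one of the candidates, so the scale list is a trade-off between speed and coverage.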