[Open in Google Colab](https://github.com/MagicShoebox/vt-cs4664-tiny-towns-scorer/blob/main/scorer.ipynb)

# Scorer

Tiny Towns Scorer\
CS 4664: Data-Centric Computing Capstone

### Authors
Alex Owens, Daniel Schoenbach, Payton Klemens

### Acknowledgements

Portions of this project were adapted from tutorials and examples available
on the [OpenCV](https://opencv.org/) and [Keras](https://keras.io/) websites.

In particular, significant portions of code from the tutorials [Feature Detection and Description](https://docs.opencv.org/4.6.0/db/d27/tutorial_py_table_of_contents_feature2d.html) and [Train an Object Detection Model on Pascal VOC 2007 using KerasCV](https://keras.io/guides/keras_cv/retina_net_overview/) were copied entirely or used as the basis for several cells in this notebook.

# Dependencies

This notebook uses some additional libraries not installed on Colab by default:
- `keras-cv` extends Keras with computer vision-focused tools.\
We use the RetinaNet model, bounding box utilities, and more.

- `luketils` is used in, and written by the author of, the tutorial referenced above.\
We use some of its visualization functions.

In [None]:
!pip install keras-cv luketils

Import the major packages we'll be using. Other imports will be done as needed.

In [None]:
import tensorflow as tf
from tensorflow import keras
import keras_cv
import matplotlib.pyplot as plt
import cv2 as cv
import numpy as np

# Data

Global parameters that were used to construct the network portion of the model.

`IMAGE_SIZE` - The square image size for the neural network portion of the model. Images are resized to `IMAGE_SIZE`x`IMAGE_SIZE` without respect for their original aspect ratio. Note that once the network has made predictions, other parts of the model will use the original image.

In [None]:
IMAGE_SIZE = 512
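Because the resize ignores aspect ratio, the x and y axes are scaled by different factors, and any box predicted on the resized image has to be scaled back per axis before it can be used on the original photo. The sketch below illustrates that arithmetic with plain NumPy. `rescale_box` is a hypothetical helper written for this example and is not part of the notebook, which relies on KerasCV's format conversion instead.

```python
import numpy as np

IMAGE_SIZE = 512  # same value as the notebook's global

def rescale_box(box_xywh, original_shape, size=IMAGE_SIZE):
    """Map an [x, y, width, height] box from the size x size network input
    back to the original image. Hypothetical helper, for illustration only."""
    orig_h, orig_w = original_shape[:2]
    sx, sy = orig_w / size, orig_h / size  # independent x and y scale factors
    x, y, w, h = box_xywh
    return np.array([x * sx, y * sy, w * sx, h * sy])

# A photo that is 3000 pixels high and 4000 pixels wide:
# x is stretched by 7.8125 and y by 5.859375.
box = rescale_box([256, 256, 128, 128], (3000, 4000))
print(box)
```

A box that is square on the resized image becomes a 1000x750 rectangle on this photo, which is why later steps work in the original image's coordinates.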

The collected data has been made available on a shared Google Drive.\
To access the data, we need to connect the drive to the notebook.

In [None]:
# TODO: Figure out how to share dataset publicly and connect to notebook easily
from os import path
PROJECT_FOLDER = '/content/drive/Shareddrives/DCC Capstone'
IMAGES_FOLDER = path.join(PROJECT_FOLDER, 'Images')
MODEL_FOLDER = path.join(PROJECT_FOLDER, 'Model', 'model')

from google.colab import drive
drive.mount('/content/drive')

We'll also want the list of classes, simply hardcoded here:

In [None]:
class_ids = [
  'brick',
  'chapel',
  'cottage',
  'farm',
  'tavern',
  'theater',
  'wheat',
  'wood',
  'board',
  'factory',
  'stone',
  'well',
  'glass',
]
class_mapping = dict(enumerate(class_ids))
print(class_mapping)

# Visualization

Define a function to help us visualize detections.

In [None]:
from luketils import visualization

def visualize_detections(image, predictions):
  visualization.plot_bounding_box_gallery(
          [image],
          value_range=(0, 255),
          bounding_box_format='xywh',
          y_true=None,
          y_pred=[predictions],
          pred_color=(128,255,128),
          scale=12,
          rows=1,
          cols=1,
          show=True,
          thickness=12,
          font_scale=4,
          class_mapping=class_mapping,
    )

# Model Creation

## RetinaNet

Reconstruct the RetinaNet neural network created in `training.ipynb`. The networks must match in order to load the saved weights. There is no need to load the ImageNet weights, since they were included in the saved weights.

(In an application environment, model creation could be extracted into a shared module. Alternatively, Keras has tools for serializing the model structure in addition to the weights. In this setting, however, duplicating the code is much simpler and suffices for now.)

In [None]:
model = keras_cv.models.RetinaNet(
    # number of classes to be used in box classification
    classes=len(class_ids),
    # For more info on supported bounding box formats, visit
    # https://keras.io/api/keras_cv/bounding_box/
    bounding_box_format="xywh",
    # KerasCV offers a set of pre-configured backbones
    backbone="resnet50",
    # include_rescaling tells the model whether your input images are in the default
    # pixel range (0, 255) or if you have already rescaled your inputs to the range
    # (0, 1).  In our case, we feed our model images with inputs in the range (0, 255).
    include_rescaling=True,
    # Typically, you'll want to set this to False when training a real model.
    # evaluate_train_time_metrics=True makes `train_step()` incompatible with TPU,
    # and also causes a massive performance hit.  It can, however, be useful to produce
    # train time metrics when debugging your model training pipeline.
    evaluate_train_time_metrics=False,
)

Load the weights saved to `MODEL_FOLDER` when `training.ipynb` was run.

In [None]:
model.load_weights(MODEL_FOLDER)

Configure the network's prediction decoder, then wrap the network in a function that resizes an input image, gets the predictions, and then returns predictions resized to the original image's proportions.

In [None]:
from keras_cv import bounding_box

model.prediction_decoder = keras_cv.layers.NmsPredictionDecoder(
    bounding_box_format="xywh",
    anchor_generator=keras_cv.models.RetinaNet.default_anchor_generator(
        bounding_box_format="xywh"
    ),
    suppression_layer=keras_cv.layers.NonMaxSuppression(
        iou_threshold=0.25, # 0.25?
        bounding_box_format="xywh",
        classes=len(class_ids),
        confidence_threshold=0.33, # 0.6?
    ),
)

# Get the network's object predictions for an image
# Each prediction has the format:
# [x, y, width, height, class_id, confidence]
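def get_predictions(image):
  # Resize to the network's input size, ignoring the original aspect ratio
  img = cv.resize(image, (IMAGE_SIZE, IMAGE_SIZE))
  preds = model.predict(np.array([img]))[0]
  # Convert boxes to coordinates relative to the resized image,
  # then back to absolute pixel coordinates in the original image
  rel_pred = bounding_box.convert_format(preds, 'xywh', 'rel_xyxy', img)
  preds = bounding_box.convert_format(rel_pred, 'rel_xyxy', 'xywh', image)
  return preds.numpy()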
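The `iou_threshold` passed to the suppression layer is a limit on intersection over union (IoU), the overlap measure that non-max suppression conventionally uses to decide whether two boxes describe the same object. The following is a minimal sketch of IoU for `xywh` boxes, assuming that standard definition. `iou_xywh` is a name invented for this example and is not a KerasCV function.

```python
def iou_xywh(a, b):
    """Intersection over union of two [x, y, width, height] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

# Two boxes that mostly cover the same piece (80x100 overlap)
same_piece = iou_xywh([0, 0, 100, 100], [20, 0, 100, 100])
# Two boxes on neighbouring pieces that barely touch (10x100 overlap)
neighbours = iou_xywh([0, 0, 100, 100], [90, 0, 100, 100])
print(round(same_piece, 3), round(neighbours, 3))
```

With a threshold of 0.25, the first pair would count as duplicates, so only the higher-confidence box would be expected to survive, while the second pair would both be kept.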

## ORB

Load the reference scan of the game board and compute its features using the ORB algorithm.

In [None]:
board_ref_file = path.join(IMAGES_FOLDER, 'board_scan_transparent.png')
board_ref_img = cv.cvtColor(cv.imread(board_ref_file), cv.COLOR_BGR2RGB)

orb = cv.ORB_create()
orb.setMaxFeatures(5000)
board_ref_kp, board_ref_des = orb.detectAndCompute(board_ref_img,None)
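ORB descriptors are binary strings, stored by OpenCV as rows of `uint8` bytes (32 bytes per keypoint with the default settings), and they are compared by Hamming distance, which is why the matcher later in this notebook uses `cv.NORM_HAMMING`. The sketch below shows that distance on made-up descriptors. The random bytes are stand-ins, not real ORB output, and `hamming` is a helper defined only for this example.

```python
import numpy as np

def hamming(d1, d2):
    """Number of differing bits between two uint8 descriptor rows."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=32, dtype=np.uint8)  # stand-in for one 32-byte descriptor
near = ref.copy()
near[0] ^= 0b00000111                                 # flip three bits of the first byte
far = np.bitwise_not(ref)                             # flip every bit

print(hamming(ref, ref), hamming(ref, near), hamming(ref, far))
```

A small distance means two keypoints look alike, and the largest possible distance for a 32-byte descriptor is 256.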

Define a function that takes an image of a board, computes its features using ORB, and then matches those features against the reference features. If at least four good matches are found, the function estimates a homography, that is, a projective transformation from the perspective of the input board to the perspective of the reference board. Otherwise, it returns `None`.

In [None]:
# Creates projective transformation from input board perspective to reference board perspective
def find_board_homography(board_img, verbose=False):
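  board_kp, board_des = orb.detectAndCompute(board_img,None)
  # With no descriptors (or only one), knnMatch cannot return two neighbours per query
  if board_des is None or len(board_des) < 2:
    return None
  bf = cv.BFMatcher(normType=cv.NORM_HAMMING)
  matches = bf.knnMatch(board_ref_des,board_des,k=2)
  # Ratio test: keep a match only if it is clearly better than the second-best candidate
  good = [m for m,n in matches if m.distance < 0.7*n.distance]
  # A homography needs at least 4 point correspondences
  if len(good) < 4:
    return None
  src_pts = np.float32([board_ref_kp[m.queryIdx].pt for m in good]).reshape(-1,1,2)
  dst_pts = np.float32([board_kp[m.trainIdx].pt for m in good]).reshape(-1,1,2)
  # M maps reference coordinates to input-board coordinates
  M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 5.0)
  if M is None:
    return None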
  if verbose:
    draw_params = {'matchColor': (128,255,128),
                  'singlePointColor': None,
                  # 'matchesMask': mask.ravel().tolist(),
                  'flags': cv.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS | cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS}
    fig,ax = plt.subplots(figsize=(12,12))
    ax.axis('off')
    ax.imshow(cv.drawMatches(board_ref_img,board_ref_kp,board_img,board_kp,good,None,**draw_params))
    plt.show()
  # Invert M so the result maps input-board coordinates to reference coordinates
  return np.linalg.inv(M)
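To make the role of the returned matrix concrete, here is a small NumPy sketch of how a 3x3 homography maps points, and how its inverse maps them back. The matrix `M` below is made up for illustration and does not come from a real board, and `apply_homography` is a helper written for this example. The notebook itself uses `cv.perspectiveTransform` for this step.

```python
import numpy as np

def apply_homography(H, pts):
    """Map an (N, 2) array of points through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y) -> (x, y, 1)
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # divide by the third coordinate

# Illustrative matrix only: scale by 2, shift, plus a small perspective term
M = np.array([[2.0,   0.0, 10.0],
              [0.0,   2.0, 20.0],
              [0.001, 0.0,  1.0]])

ref_pts = np.array([[0.0, 0.0], [100.0, 50.0]])
photo_pts = apply_homography(M, ref_pts)                # reference -> photo
back = apply_homography(np.linalg.inv(M), photo_pts)    # photo -> reference
print(photo_pts)
print(back)
```

The round trip recovers the original reference points, which is the property the pipeline relies on when it moves piece centers from the photo into the reference perspective.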

## Utilities

A series of utility functions for manipulating images and bounding boxes. These will be used in the next section.

In [None]:
# Combination of min and max.
# Given x, returns the value closest to x that lies in the range mn <= value <= mx.
def clamp(x, mn, mx):
  return mn if x < mn else (mx if x > mx else x)

In [None]:
from itertools import filterfalse, tee

# https://docs.python.org/dev/library/itertools.html#itertools-recipes
# Splits an iterable according to a predicate.
def partition(pred, iterable):
    "Use a predicate to partition entries into false entries and true entries"
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)
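Note the order of the results: the entries for which the predicate is false come first. The short example below restates `partition` so that it runs on its own and applies it to a few made-up predictions in the `[x, y, width, height, class_id, confidence]` layout, where class 8 is `'board'` in the class list above.

```python
from itertools import filterfalse, tee

def partition(pred, iterable):
    # Restated from the cell above so this example is self-contained
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

# Toy predictions, not real model output
predictions = [
    [10, 10, 5, 5, 2, 0.9],     # cottage
    [0, 0, 100, 100, 8, 0.8],   # board
    [40, 40, 5, 5, 6, 0.7],     # wheat
]

pieces, boards = partition(lambda b: b[4] == 8, predictions)
pieces, boards = list(pieces), list(boards)
print(len(pieces), len(boards))
```

Both results are lazy iterators, so they are wrapped in `list` here before being inspected.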

In [None]:
from math import sqrt

# Increases the area of a prediction box by a factor.
# The expanded box is clipped to stay inside (0,0) and maxdim.
def expand(box, factor, maxdim):
  xmin, ymin, width, height = box[:4]
  sqrtf = sqrt(factor)
  xmin = max(0, xmin - width*(sqrtf-1)/2)
  ymin = max(0, ymin - height*(sqrtf-1)/2)
  width = min(maxdim[1]-xmin, width * sqrtf)
  height = min(maxdim[0]-ymin, height * sqrtf)
  return [xmin, ymin, width, height, *box[4:]]
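Since `factor` applies to the area, each side grows by its square root. The worked example below restates `expand` so that it runs on its own and checks two made-up boxes inside a 1000x1000 image: one in the interior and one close to the bottom-right corner.

```python
from math import sqrt

def expand(box, factor, maxdim):
    # Restated from the cell above so this example is self-contained
    xmin, ymin, width, height = box[:4]
    sqrtf = sqrt(factor)
    xmin = max(0, xmin - width*(sqrtf-1)/2)
    ymin = max(0, ymin - height*(sqrtf-1)/2)
    width = min(maxdim[1]-xmin, width * sqrtf)
    height = min(maxdim[0]-ymin, height * sqrtf)
    return [xmin, ymin, width, height, *box[4:]]

# Interior box: area grows by 1.25 and the center stays at (200, 200)
x, y, w, h = expand([100, 100, 200, 200], 1.25, (1000, 1000))
area_ratio = (w * h) / (200 * 200)
center_x = x + w / 2

# Box near the corner: the right and bottom edges stop at the image border
cx, cy, cw, ch = expand([900, 900, 100, 100], 1.25, (1000, 1000))
right_edge = cx + cw
print(area_ratio, center_x, right_edge)
```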

In [None]:
# Converts a prediction box to a prediction point.
def center(box):
  return [box[0]+box[2]/2, box[1]+box[3]/2, *box[4:]]

In [None]:
# Given a prediction box, returns a predicate
# that tests if a point is inside that box.
def inside(board_pred):
  xmin, ymin, width, height = board_pred[:4]
  def apply(pt):
    x,y = pt[:2]
    return xmin <= x <= xmin+width and ymin <= y <= ymin + height
  return apply

In [None]:
# Given a prediction box, returns a function
# that translates point coordinates to be relative to the box.
def translate(board_pred):
  xmin, ymin = board_pred[:2]
  def apply(pt):
    x,y = pt[:2]
    return [x-xmin, y-ymin, *pt[2:]]
  return apply

In [None]:
# Given an image and prediction box,
# returns the portion of the image contained by the box.
def extract(image, box):
  xmin = round(box[0])
  ymin = round(box[1])
  xmax = round(box[0]+box[2])
  ymax = round(box[1]+box[3])
  return image[ymin:ymax,xmin:xmax]

In [None]:
# Given an area, returns a function
# that classifies points in that area into a 4x4 grid.
def gridify(width, height):
  def apply(pt):
    x,y = pt[:2]
    return [clamp(int(x // (width/4)),0,3), clamp(int(y // (height/4)),0,3), *pt[2:]]
  return apply
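The helpers above are meant to be chained: box to center point, bounds check, shift to board coordinates, then grid cell. The sketch below walks one made-up piece through that arithmetic inline. It deliberately skips the perspective transformation and treats the board box as if it were already axis-aligned, which is a simplifying assumption for this example only.

```python
# Made-up boxes in photo coordinates: [x, y, width, height]
board = [100, 200, 400, 400]
piece = [330, 420, 60, 60]

# center: box -> point
cx, cy = piece[0] + piece[2] / 2, piece[1] + piece[3] / 2

# inside: is the point within the board box?
on_board = (board[0] <= cx <= board[0] + board[2]
            and board[1] <= cy <= board[1] + board[3])

# translate: make the board's top-left corner the origin
bx, by = cx - board[0], cy - board[1]

# gridify: split the board into 4x4 cells and clamp to 0..3
col = min(3, max(0, int(bx // (board[2] / 4))))
row = min(3, max(0, int(by // (board[3] / 4))))

print((cx, cy), on_board, (bx, by), (col, row))
```

The piece's center lands at (260, 250) relative to the board, which falls in column 2, row 2 of the grid (counting from zero).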

## Get Game State

Combine all the functions into a single model that takes an image and returns the predicted grid of objects. If `verbose` is `True`, the function also returns the input image transformed into the reference perspective.

In [None]:
from operator import itemgetter

# Given an image, get its predicted game state.
def get_game_state(image, verbose=False):

  # Get object predictions from neural network
  predictions = get_predictions(image)
  if verbose:
    visualize_detections(image, predictions)
  piece_preds, board_preds = partition(lambda b: class_mapping[b[4]] == 'board', predictions)

  # Select highest-confidence board prediction
  board_pred = next(iter(sorted(board_preds, key=itemgetter(5), reverse=True)), None)
  if board_pred is None:
    raise ValueError('Board could not be found')

  # Isolate the board area
  board_pred = expand(board_pred, 1.25, image.shape[:2])
  board_img  = extract(image, board_pred)

  # Use board area to compute projective transformation to reference perspective
  homography = find_board_homography(board_img, verbose)
  if homography is None:
    raise ValueError('Board edges could not be found')

  # Replace piece boxes with center points
  piece_preds = map(center, piece_preds)
  
  # Get rid of piece predictions outside of board
  piece_preds = filter(inside(board_pred), piece_preds)
  
  # Shift center points to use board as origin
  piece_preds = map(translate(board_pred), piece_preds)
  
  # Make it a list so we can check if it's empty or not
  piece_preds = list(piece_preds)

  # Transform piece center points to reference perspective
  if piece_preds:
    transformed = cv.perspectiveTransform(np.float32([p[:2] for p in piece_preds]).reshape(-1,1,2), homography)
    transformed = map(itemgetter(0), transformed)
    piece_preds = [[t[0], t[1], *p[2:]] for t,p in zip(transformed, piece_preds)]

  height, width = board_ref_img.shape[:2]

  if verbose:
    # warpPerspective expects the output size as (width, height)
    warped = cv.warpPerspective(board_img, homography, (width, height))
    fig, ax = plt.subplots(1,1,figsize=(8,8))
    ax.axis('off')
    ax.imshow(warped)
    ax.scatter([p[0] for p in piece_preds], [p[1] for p in piece_preds], s=70, c='#80ff80')
    ax.vlines([x*(width/4) for x in range(1,4)], 0, height, colors='#80ff80', linewidth=3)
    ax.hlines([y*(height/4) for y in range(1,4)], 0, width, colors='#80ff80', linewidth=3)
    plt.show()

  # Partition piece predictions into grid,
  # keeping only the highest-confidence one in each spot.
  piece_preds = map(gridify(width, height), piece_preds)
  grid = [[(None, 0) for _ in range(4)] for _ in range(4)]
  for p in piece_preds:
    cls, conf = grid[p[1]][p[0]]
    if p[3] > conf:
      grid[p[1]][p[0]] = (p[2], p[3])
  
  # Return either the grid, or the grid and transformed image.
  if verbose:
    return grid, warped
  return grid
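The final step of `get_game_state`, keeping only the highest-confidence prediction per grid cell, can be tried in isolation. The sketch below uses made-up grid-space predictions in the `[column, row, class_id, confidence]` layout produced by `gridify`.

```python
# Toy grid-space predictions: [column, row, class_id, confidence]
preds = [
    [0, 0, 2, 0.55],
    [0, 0, 3, 0.80],   # same cell as above, higher confidence: this one wins
    [3, 1, 11, 0.40],
]

# Every cell starts empty with confidence 0
grid = [[(None, 0) for _ in range(4)] for _ in range(4)]
for col, row, cls, conf in preds:
    if conf > grid[row][col][1]:
        grid[row][col] = (cls, conf)

print(grid[0][0], grid[1][3], grid[2][2])
```

The grid is indexed as `grid[row][column]`, so the prediction in column 3, row 1 ends up at `grid[1][3]`, and cells with no prediction keep `(None, 0)`.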

## Testing Function

Define a testing function to call the model on any image and report its results.

In [None]:
from matplotlib import patheffects

def test_image(file, verbose=False):
  image = cv.cvtColor(cv.imread(file), cv.COLOR_BGR2RGB)
  fig, ax = plt.subplots(1,1,figsize=(10,10))
  ax.axis('off')
  ax.imshow(image)
  plt.show()
  if verbose:
    grid, warped = get_game_state(image, verbose)
  else:
    grid = get_game_state(image, verbose)
  for y in range(4):
    cell = lambda p: 'None' if p[0] is None else f'{class_mapping[p[0]]}-{p[1]:.2}'
    line = ''.join(f'{cell(grid[y][x]):^15}' for x in range(4))
    print(line)
  
  if verbose:
    fig, ax = plt.subplots(1,1,figsize=(8,8))
    ax.axis('off')
    ax.imshow(warped)
    for y in range(4):
      for x in range(4):
        height, width = warped.shape[:2]
        x_pos, y_pos = (x+0.4) * width / 4.5, (y+0.5) * height / 4.5
        ax.annotate(
            class_mapping.get(grid[y][x][0]),
            (x_pos, y_pos),
            fontsize=18,
            weight='bold',
            c='#80ffff',
            path_effects=[patheffects.withStroke(linewidth=2, foreground="b")])
    plt.show()

# Try It Out

These are the 10 images in the dataset that were taken at the conclusion of the played game.

In [None]:
final_game_states = [
  'frontal/IMG_0278.jpeg', 
  'frontal/IMG_4556.JPG', 
  'frontal/IMG_6211.jpg', 
  'side_angle/IMG_0280.jpeg', 
  'side_angle/IMG_4557.JPG', 
  'side_angle/IMG_4558.JPG', 
  'side_angle/IMG_6212.jpg', 
  'top_down/IMG_0279.jpeg',
  'top_down/IMG_4555.JPG',
  'top_down/IMG_6210.jpg'
]

We can test the model on these or any of the image files and see how it does.

In [None]:
image_file = final_game_states[5]
test_image(path.join(IMAGES_FOLDER, image_file), True)