# Overview

In this notebook, we will explore computer vision techniques and a simple conceptualization of transfer learning. Our specific goal is to take a dataset of digits (numbers 0-9) and use it to train a model to read speed limit signs. We will try not to touch the training digits dataset, but we will attempt to transform the test signs dataset in order to improve performance. This emulates the situation of having only a pretrained model to work with.

This could be very similar to a problem a business faces. For example, Google could face this problem if they were trying to add speed limit detection to their road-mapping technology. In that situation, the goal would be to take a street image and output the speed limits posted on that street. A huge problem in many deep learning (DL) tasks, and even more so in DL computer vision (CV) tasks, is labeling: it is expensive, extremely tedious, hard to standardize, and hard to quality-check. It would save a lot of money, and perhaps even give a better result, to use existing high-quality datasets and models on the new test set. This is the task we will be attempting.

The reality is that transfer learning is an extremely complex, heavily researched field, and it often turns out to be an unreachable dream. Still, we will see what we can do by building a model to detect digits and then transforming speed limit sign images to improve test accuracy, measured with custom metrics.

(To emphasize: **We will be transforming the test set (street sign images) and not the train set (digits)**).

# Data


## Training Data

The MNIST dataset is often called the "hello world" of image classification machine learning. It contains 70,000 examples of handwritten digit images paired with classifications (i.e., the corresponding digit 0-9). It has the right data for our task, but our task is not classification, it is object detection. Therefore, we will use an existing preprocessed derivative called YYMNIST, which generates images with multiple digits each, with a bounding box annotation for each digit. This data is suitable for object detection.

The use of YYMNIST over MNIST corresponds to a functional difference between **classification** and **object detection**: object detection can find multiple instances in an image (`vector of pixels -> vector of (class, bbox)`) whereas classification cannot (`vector of pixels -> single class`). A good summary of this can be found at https://medium.com/analytics-vidhya/image-classification-vs-object-detection-vs-image-segmentation-f36db85fe81. This is helpful because speed limit signs have two- and three-digit numbers. Not too long ago, one alternative might have been to partition the sign image into its constituent digits, but this is no longer necessary and may in fact be much harder than using today's highly sophisticated object detection models.
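To make the difference in output shape concrete, here is a minimal sketch of the two kinds of result for a hypothetical "50" sign. The `Detection` class, the type aliases, and all coordinates and scores are illustrative assumptions, not part of Detectron 2 or YYMNIST.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: int                       # digit class, 0-9
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    score: float                     # model confidence in [0, 1]


# Classification: one label for the whole image.
ClassificationOutput = int

# Object detection: any number of labelled boxes per image.
DetectionOutput = List[Detection]

# A hypothetical "50" sign (all numbers are made up).
# Classification can only name one digit; detection reports both and where they are.
as_classification: ClassificationOutput = 5
as_detection: DetectionOutput = [
    Detection(label=5, bbox=(10, 12, 30, 44), score=0.91),
    Detection(label=0, bbox=(32, 12, 52, 44), score=0.88),
]

print(as_classification)
print([d.label for d in as_detection])
```

Reading the sign then becomes a matter of ordering the detected digits, which is what the evaluation code later in this notebook does.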

MNIST source: http://yann.lecun.com/exdb/mnist/

YYMNIST source: https://github.com/YunYang1994/yymnist

## Test Data
Our test data is a subset of a traffic sign dataset that contains many different sign types. We only need the signs that contain a speed limit, though it could be interesting to see if our model has false detections on signs that do not have any numbers. The speed limits we have are: 20, 30, 50, 60, 70, 80, 100, 120.

This dataset is from https://www.kaggle.com/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign/notebooks. However, it is quite large (~600MB) and we need only a subset, so it seems justifiable to slightly deviate from the normal notebook style and instead chop it outside of the notebook. Note that **this is permitted under the dataset's license** ("You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission."; CC0 1.0). A script of the form below works (copies 30 samples of each relevant sign class; sampling is arguably necessary as there are multiple images of many signs slightly shifted).

```
import os
from random import sample
from shutil import copyfile

os.makedirs('../speedlimit_signs', exist_ok=True)

for cid in range(9):
    files = os.listdir(f'Train/{cid}/')
    for f in sample(files, 30):
        copyfile(f'Train/{cid}/{f}', f'../speedlimit_signs/{f}')
```

Please download as below.

## Imports

You cannot "Run All" until this cell has been run.

In [None]:
try:
  import detectron2
except ImportError:
  !pip install pyyaml==5.1
  !pip install git+https://github.com/facebookresearch/fvcore.git
  import torch, torchvision
  assert torch.__version__.startswith("1.7")
  !pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.7/index.html
  import os
  os.kill(os.getpid(), 9)
print('ok')

In [None]:
import os
import json
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()

# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2
from google.colab.patches import cv2_imshow
from random import sample
import pandas as pd

# import some common detectron2 utilities
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.engine import DefaultTrainer
from detectron2.data.datasets import register_coco_instances
from detectron2 import model_zoo
from PIL import Image
from collections import namedtuple

## Retrieve Train Data

YYMNIST generates the data on the fly via a script and the original MNIST dataset.

In [None]:
!git clone https://github.com/YunYang1994/yymnist.git
print('Generating train data...')
!python yymnist/make_data.py &>/dev/null
print('Train data done')

## Retrieve Test Data

This is our preprocessed signs dataset subset.

In [None]:
!wget 'https://docs.google.com/uc?export=download&id=1E1-yxHAj-Jvf_NUAfweEMjhlP_YhaJoP' -O speedlimit_signs.zip
!unzip -q speedlimit_signs.zip
print('Test data done')

If everything went well, there should be a folder `speedlimit_signs` and `yymnist/Images`.

In [None]:
assert set(['speedlimit_signs', 'yymnist']).issubset(os.listdir()),\
'missing top level directory, did you forget to run a cell?'

assert 'Images' in os.listdir('yymnist'),\
'missing yymnist images dir, make_data.py must have failed'

print('All data is there, excellent.')

If those assertions passed, everything should be correct. Let's clean up the filesystem a little bit.

In [None]:
!rm *.zip
!mkdir test
!mv speedlimit_signs/* test
!mkdir train
!mv yymnist/Images train/
!mv yymnist/labels.txt train/labels.txt
#!rm -r yymnist/

Now our data structure is like:
- train
  - Images
    - *.jpg
  - labels.txt
- test
  - *.png

# Exploratory Data Analysis

## Train data

Our train data is dynamically generated, so technically that could influence performance significantly, but it is unlikely. Essentially, the repo has the original MNIST digit dataset and generates a set of images with the digits randomly overlaid on a white background. The annotations are rectangular bounds for each digit along with the digit it shows, i.e. 0-9.

The best way to get a feel for this data is to directly visualize it.

In [None]:
# partial source: https://github.com/YunYang1994/yymnist/blob/master/show_image.py
with open('train/labels.txt') as f:
  lines = f.readlines()
  print(lines)

  for x in range(5):
    # file names are 000001.jpg, 000002.jpg, ...; check that line x describes image x+1
    assert lines[x][lines[x].find('.jpg')-1] == f'{x+1}', 'bad labels'
    boxes = lines[x][lines[x].find(' ')+1:].split(' ')
    img = cv2.imread(f'train/Images/00000{x+1}.jpg')
    print(f'train/Images/00000{x+1}.jpg')
    for box in boxes:
      vals = box.split(',')
      img = cv2.rectangle(img, (int(vals[0]), int(vals[1])), (int(vals[2]), int(vals[3])), (0,0,255), 2)
    cv2_imshow(img)


It's great that the annotations are in a simple format (though, as we will see later, it is also a bit of a drawback). The images look pretty straightforward: several digits on each image, scaled randomly and positioned randomly.
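As a minimal sketch of that format, the function below parses one annotation line of the form `path x0,y0,x1,y1,class ...`, which is the layout the cells in this notebook rely on. The `DigitBox` type, the function name, and the example line are assumptions made for illustration; the example line is not taken from the real `labels.txt`.

```python
from typing import List, NamedTuple, Tuple


class DigitBox(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int
    digit: int


def parse_label_line(line: str) -> Tuple[str, List[DigitBox]]:
    """Split 'path x0,y0,x1,y1,cls ...' into the image path and its boxes."""
    parts = line.strip().split(' ')
    path, raw_boxes = parts[0], parts[1:]
    boxes = [DigitBox(*(int(v) for v in raw.split(','))) for raw in raw_boxes if raw]
    return path, boxes


# Made-up example line, not copied from the generated labels file
example = '/content/yymnist/Images/000001.jpg 10,20,38,48,7 100,150,156,206,3\n'
path, boxes = parse_label_line(example)
print(path)
print(boxes)
```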

A couple observations, partly from the future:
* The bounding boxes are pretty loose. Most likely, the original MNIST images have a bit of margin around them and the script just creates a box around the whole image. This matters because digits can be very close to each other on speed limit signs, so the model will not see the uniform background it expects around each digit.
* The digits are never rotated at all. This may not matter at all, but it's something to keep in mind.

We plan to use the Detectron 2 framework for training a model, and that pretty much requires MS COCO format (basically, a big json dict with everything laid out in a very particular way), so it will be somewhat of a pain to get these annotations to work, but at least we know everything is correct.

**The code below deviates from our claim to not modify the training set, but it corrects for a bug in YYMNIST where it generates empty bounding boxes, polluting the training data.**


In [None]:
print('Processing bounding boxes ...')

with open(f'train/labels.txt') as f,\
    open(f'train/labels2.txt', 'w') as nf:
  lines = f.readlines()
  newlines = []
  print(lines)

  for x in range(len(lines)):
    items = lines[x].split(' ')
    newitems = [items[0]]

    boxes = items[1:]
    #print(boxes)
    img = cv2.imread(f'train/Images/{x+1:06d}.jpg')
    for box in boxes:
      #print(box)
      vals = [v.strip() for v in box.split(',')]
      newvals = []
      #print(vals)
      x0 = int(vals[0])
      y0 = int(vals[1])
      x1 = int(vals[2])
      y1 = int(vals[3])
      sl = img[y0:y1+1,x0:x1+1]
      #cv2_imshow(sl)
      gray = cv2.cvtColor(sl, cv2.COLOR_BGR2GRAY)
      gray = 255 - gray #cv2.GaussianBlur(255 - gray, , 0)
      retval,thresh = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)
      mask=Image.fromarray(thresh)
      box = mask.getbbox()
      if box is None:
        #cv2_imshow(sl)
        print('Warning: empty box')
      else:
        newitems.append(','.join(vals))
    newlines.append(' '.join(newitems))

  nf.write('\n'.join(newlines))
print('Ok done')

## Test data

As stated, the sign dataset has many classes, so we took 30 random samples from each speed limit class and uploaded the result to Google Drive. Class information can be found at the data source, but in brief, the class ids are as follows:
* 0 -> 20
* 1 -> 30
* 2 -> 50
* 3 -> 60
* 4 -> 70
* 5 -> 80
* 6 -> 80 (striped)
* 7 -> 100
* 8 -> 120

One detail to note is that since two classes show 80, there are 60x 80 signs and 30x all other speeds. Let's briefly look at a few of each class to get a feel for how these images are. We can also easily verify the counts for each class.


In [None]:
imgs = os.listdir('test')
plt.figure(figsize=(10,10))
for x in range(9):
  clsimgs = list(filter(lambda i : i.startswith(f'0000{x}'), imgs))
  assert len(clsimgs) == 30, 'Unexpected number of images'
  for sidx in range(5):
    sample_fn = clsimgs[sidx]  # avoid shadowing random.sample imported earlier
    plt.subplot(9, 5, x*5 + sidx + 1)
    plt.axis('off')
    plt.imshow(cv2.imread(f'test/{sample_fn}')[:,:,::-1])

These signs look pretty bad. Their brightness varies tremendously, many of them are quite blurry, and they are pretty tiny (though that is not visible here). It's hard to predict how the stripe on the striped 80 signs may affect things, but it seems likely that accuracy will be lower on the striped signs than on the normal 80s.

# Model

Detectron 2 is a Python object detection and segmentation framework created by Facebook AI Research (FAIR). We will use Detectron 2 due to familiarity and relative simplicity compared to other DL CV frameworks. The primary difficulty with it is getting our data in the right format, since Detectron 2 mostly uses MS COCO, the format of the dataset named COCO (Common Objects in COntext).

**Note, for full clarity, that this is the only distinct "model" we will use throughout the notebook.** Specifically what we are doing is training a Detectron 2 model on YYMNIST images, and then optimizing its performance on a street sign test dataset using custom metrics and various transformations.

The specific type of model we will use is called Faster R-CNN (the CNN being the same CNN we have discussed in class). It is a powerful object detection model, introduced in 2015, that has achieved great performance, both in speed and accuracy, in many areas in the past few years. It works by extracting image features using a CNN architecture and then making region proposals over the features. These "proposals" are potential object detections, each associated with a probability. For example, the model could output a bounding box over a dog that means, "the model is ~85% confident this bounding box is a dog." Ideally, all detections would have 100% probability/confidence, but this is far from the case in practice. If the confidence threshold is too low, you get tons of garbage detections, and if it is too high, you filter out too many real detections. We will require detections to have at least 60% confidence, a value we determined experimentally.

60% is low but this is logical as a YYMNIST object detection model is inherently extremely overfit in the context of evaluation on a street sign dataset.
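The trade-off around the confidence threshold can be illustrated with a small sketch. In this notebook the filtering is actually done inside Detectron 2 via `cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST`; the helper `filter_by_score` and the detection values below are hypothetical and only mimic that behavior on plain dictionaries.

```python
def filter_by_score(detections, threshold=0.6):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d['score'] >= threshold]


# Hypothetical raw detections for one image (values are made up)
raw = [
    {'label': 5, 'score': 0.92},
    {'label': 0, 'score': 0.61},
    {'label': 8, 'score': 0.35},  # likely a garbage detection
]

for t in (0.3, 0.6, 0.95):
    kept = filter_by_score(raw, t)
    print(t, [d['label'] for d in kept])
```

With the low threshold the spurious 8 survives, and with the very high threshold even the real digits are discarded, which is the balance described above.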

The internals of our object detection model are not the focus of this notebook, but it is nonetheless difficult to convey rationale behind what is going on without giving a slight background. A good overview can be found at https://towardsdatascience.com/faster-rcnn-object-detection-f865e5ed7fc4.

The code below reads the labels2.txt we got earlier and transforms it into MS COCO format.

In [None]:
cocoDict = {
    'info': {
        'description': 'YYMNIST Generated Dataset',
        'url': 'https://github.com/YunYang1994/yymnist',
        'version': '1.0',
        'year': 2020
    },
    
    'images': [],
    'annotations': [],
    'categories': [
                   { 'id': x, 'name': str(x) } for x in range(10)
    ]
}

imgIdCounter = 0
annoIdCounter = 0

with open('train/labels2.txt', 'r') as f:
  for line in f.readlines():
    blocks = line.split(' ')
    imgfn = blocks[0]
    imgfn = imgfn[imgfn.rfind('/')+1:] # /content/yymnist/Images/x -> x

    # insert image
    cocoDict['images'].append({
        'id': imgIdCounter,
        'width': 416,
        'height': 416,
        'file_name': imgfn
    })

    for anno in blocks[1:]:
      parts = anno.split(',')
      cocoDict['annotations'].append({
          'id': annoIdCounter,
          'category_id': int(parts[-1]),
          'image_id': imgIdCounter,
          'bbox': [
                   int(parts[0]),
                   int(parts[1]),
                   int(parts[2]) - int(parts[0]),
                   int(parts[3]) - int(parts[1])
          ]
      })
      annoIdCounter += 1

    imgIdCounter += 1

with open('train/labels.json', 'w') as fo:
  json.dump(cocoDict, fo)

assert len(cocoDict['categories']) == 10, 'Something went horribly wrong'
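The one geometric step in the conversion above is the change of box convention: the YYMNIST labels store corner coordinates `x0,y0,x1,y1`, while COCO stores `[x, y, width, height]`. A minimal sketch of that step in isolation follows; the function names and the example box are illustrative assumptions.

```python
def xyxy_to_coco_xywh(x0, y0, x1, y1):
    """Convert corner coordinates to COCO's [x, y, width, height]."""
    return [x0, y0, x1 - x0, y1 - y0]


def coco_xywh_to_xyxy(x, y, w, h):
    """Inverse conversion, handy when drawing boxes from corner points."""
    return [x, y, x + w, y + h]


box = (10, 20, 38, 48)  # made-up corner box
coco = xyxy_to_coco_xywh(*box)
print(coco)  # [10, 20, 28, 28]

# Round trip back to corner coordinates
assert coco_xywh_to_xyxy(*coco) == list(box)
```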

Detectron works by defining a COCO dataset and then "registering" it in its internal database so that it can then be referenced by name as a train or test dataset for a model. The utility of this is that it makes things much more portable, but it's not of much concern for this project.

In [None]:
register_coco_instances('yymnist', {}, 'train/labels.json', 'train/Images')

Now for training. After some experimentation, 2000 iterations seems like a pretty good training number, although we did not determine this with proper training curves.

We trained a model for this long and uploaded it to Google Drive, but a new one can be trained easily.

Credit to
* https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5#scrollTo=Ya5nEuMELeq8 (official detectron 2; though tutorial uses mask r-cnn)
* https://www.flagly.org/projects/4/notes/41/sections/41/

(2000 iterations takes something like 22 minutes, so it's not that long, but it's far too long to wait in a presentation)

## Download or train

Detectron 2 models are described by a YAML-based config object containing their parameters, which is then used to train or evaluate the model.

In [None]:
train_new = False

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("yymnist",)
cfg.DATASETS.TEST = ()   # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.SOLVER.IMS_PER_BATCH = 4
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.MAX_ITER = 3000
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 10

if not train_new:
  !gdown https://drive.google.com/uc?id=1Esa1vz7JCS4K1JDZdvIv0feIL2sJGDjd
  !mkdir output
  !mv model_final.pth output
  cfg.MODEL.WEIGHTS = "output/model_final.pth"
  print('Loaded')
else:
  cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")

  os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
  trainer = DefaultTrainer(cfg)
  trainer.resume_or_load(resume=False)
  trainer.train()
  print('Trained')

Let's do a little sanity check. Note that this is NOT "testing": it runs the model on a **train** image. However, if the result looks OK, then everything is in the right format and the training went reasonably well.

The metadata is something we could have registered properly with the dataset, but it doesn't really matter. We are just using it to make Detectron label the bounding boxes so we don't have to do so manually.

In [None]:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.6
predictor = DefaultPredictor(cfg)
im = cv2.imread('train/Images/000001.jpg')
eval_metadata = { 'thing_classes': [str(x) for x in range(10)] } # might have been better to register the dataset with this but doesn't really matter

def detect_image(im):
  # weird indexing notation is for RGB vs BGR
  v = Visualizer(im[:,:,::-1], eval_metadata, scale=1.5)
  outputs = predictor(im)
  out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
  return outputs['instances'], out.get_image()[:, :, ::-1]

instances, img = detect_image(im)
cv2_imshow(img)

## Testing

Now let's try evaluation on street signs. One caveat is that we lack a precise way of evaluating predictions, because we do not have bounding box labels for the street signs. One option, which we will use, is to take the expected speed limit based on the sign image's class and match it to the bounding boxes. Intuitively, if you have a '5' box to the left of a '0' box, then that represents '50', and if the sign is class 2 (i.e. 50 kph), then the prediction was correct.

But there are problems with this. For one, any extraneous bounding boxes result in an incorrect result. For example, if there is a '2' detection outside of the sign (for whatever reason; it could be erroneous or there could be another sign or something) then the result might be '250' which is completely incorrect. In the other direction, it's possible to be correct when the prediction was really incorrect. For example, a sign of 100 kph could have the first 0 double-detected and the second 0 not detected and that would result in a false correct prediction.

However, it suffices here. As we will be able to see by eye, the vast majority of the time this metric says "correct", it really is correct.
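The metric and its two failure modes can be sketched on plain data, independent of the model. Each detection below is a hypothetical `(left_edge_x, digit)` pair with made-up coordinates; `read_number` and `is_correct` are illustrative names, not the functions used in the evaluation cell.

```python
def read_number(detections):
    """Sort (x_left, digit) pairs left to right and join the digits."""
    return ''.join(str(digit) for _, digit in sorted(detections, key=lambda d: d[0]))


def is_correct(detections, expected):
    """True if the digits read left to right spell the expected speed limit."""
    return read_number(detections) == expected


# All coordinates below are made up for illustration.
clean_50 = [(32, 0), (10, 5)]         # '5' to the left of '0'
stray_2 = [(2, 2), (10, 5), (32, 0)]  # extra '2' detected outside the sign
double_0 = [(5, 1), (20, 0), (22, 0)] # 100 sign: first 0 seen twice, second 0 missed

print(read_number(clean_50))        # 50
print(read_number(stray_2))         # 250
print(is_correct(double_0, '100'))  # True, even though the detections are wrong
```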

One thing to note is that even if we did have bounding box annotations for the speed limit signs (though that would invalidate this whole scenario), evaluation still would not necessarily be straightforward; we could use something like the Hungarian algorithm to match predicted boxes to labeled boxes for a somewhat stricter metric. Unfortunately, this is far outside of the scope of this project.
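For a sense of what such box matching would involve, here is a rough sketch that pairs labeled boxes with predicted boxes so that total IoU (intersection over union) is maximized. Note that this is a brute-force search over permutations, used here as a stand-in because a sign has at most three digits; it is not the Hungarian algorithm itself, and all names and coordinates are assumptions for illustration.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_matching(truth, preds):
    """Return ([(truth_idx, pred_idx), ...], total_iou) with maximal total IoU.

    Brute force: only sensible for a handful of boxes per image.
    """
    if len(truth) > len(preds):
        # Permute over the longer list by swapping roles, then swap back.
        pairs, score = best_matching(preds, truth)
        return [(t, p) for p, t in pairs], score
    best, best_score = (), -1.0
    for perm in permutations(range(len(preds)), len(truth)):
        score = sum(iou(truth[i], preds[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return list(enumerate(best)), best_score


# Made-up boxes: two labeled digits, three predictions (one far away)
truth = [(10, 10, 30, 40), (32, 10, 52, 40)]
preds = [(33, 11, 53, 41), (9, 10, 29, 40), (80, 80, 90, 90)]
pairs, total = best_matching(truth, preds)
print(pairs, round(total, 3))
```

A stricter metric could then require each matched pair to exceed some IoU cutoff and to agree on the digit class.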

In [None]:
speed_map = {
    '0': '20',
    '1': '30',
    '2': '50',
    '3': '60',
    '4': '70',
    '5': '80',
    '6': '80',
    '7': '100',
    '8': '120'
}

results_dict = {}

def eval_transform(tname, transform):
  imgs = os.listdir('test')

  num_correct = 0
  num_incorrect = 0
  num_nodet = 0

  for x in imgs:
    answer = speed_map[x[4]]
    im = transform(cv2.imread(f'test/{x}'))
    instances, img = detect_image(im)
    boxes = [x.cpu().numpy() for x in list(instances.get('pred_boxes'))]
    classes = [int(x.cpu().numpy()) for x in list(instances.get('pred_classes'))]
    if len(boxes) > 0:
      # https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list
      # sort detected classes by left edge, and then concat the digits
      # detected_num is the number the model thinks is on the sign
      detected_num = ''.join([str(x) for _,x in sorted(zip(boxes, classes), key=lambda pair : pair[0][0])])
      if detected_num == answer:
        print(f'{answer}... correct!')
        num_correct += 1
      else:
        print(f'{answer}... incorrect ({detected_num})!')
        num_incorrect += 1
      cv2_imshow(img)
    else:
      #print(f'{answer}... no detections!')
      num_nodet += 1

  print(f'{num_correct} correct')
  print(f'{num_incorrect} incorrect')
  print(f'{num_nodet} no detections')
  results_dict[tname] = [num_correct/len(imgs), num_incorrect/len(imgs), num_nodet/len(imgs)]
  return results_dict[tname]

### Baseline

In [None]:
# Evaluate images w/o changing them
eval_transform('Baseline', lambda x:x)

We have a 6.3% accuracy.

This isn't too bad because the chance of accidental correctness is very low. It's not guessing which class the sign is among 8; it's more like guessing the right number from 0 to 1000 (since the model can guess an arbitrary number of digits).

Also, many of the "incorrect" ones are partially correct. Commonly, the 0 gets detected but not the other digit. We suspect this is because the digits are close enough together that the overly loose YYMNIST-style boxes for each digit contain parts of neighboring digits, making them unrecognizable.

We did a separate experiment using OpenCV thresholds to fully tighten the bounding boxes on the digits, but shockingly, this caused accuracy to drop further. New classes of errors were introduced such as two 0s being detected on the two circles of an 8. This is logically most likely because some distinct background is necessary context to determine where a digit starts and where it stops. Even now, some errors include things like a 0 being detected on the circle of a 6.

### Black & white + Resize
(better than either individually)

In [None]:
# turn to grayscale and resize
def grayscale_resize_transform(im):
  gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
  im2 = np.zeros_like(im)
  im2[:,:,0] = gray
  im2[:,:,1] = gray
  im2[:,:,2] = gray
  return cv2.resize(im2, (200,200))

eval_transform('Grayscale+Resize', grayscale_resize_transform)

By making the images black and white, accuracy went up slightly to 6.7%. It makes sense that black and white training data does not work well with RGB test data. It seems surprising "incorrect" (aka, often partially correct) went down because you'd think when the test data becomes more similar to the training data there would be more detections in general, even in the background, but that does not seem to be the case here. One can only really guess at the cause of this.

### Contrast

#### Summarized
We tried several combinations and orderings of transformations involving contrast. Resizing the test images first, then converting them to grayscale, then boosting contrast produced the best results with our model: accuracy rose from the 6.3% baseline to 10.4%.

#### Contrast Boosting

Pure contrast boosting performs poorly on its own, so we also test it in combination with other image transformations. The intuition behind contrast boosting is that it could make the strokes of the digits more distinct from the background, as more prominent features are detected more easily.

In [None]:
import cv2
def contrast_boost(im): # from https://stackoverflow.com/questions/39308030/how-do-i-increase-the-contrast-of-an-image-in-python-opencv
  #-----Reading the image-----------------------------------------------------
  # images are already in correct format
  # img = cv2.imread(im, 1)

  #-----Converting image to LAB Color model----------------------------------- 
  lab= cv2.cvtColor(im, cv2.COLOR_BGR2LAB)

  #-----Splitting the LAB image to different channels-------------------------
  l, a, b = cv2.split(lab)

  #-----Applying CLAHE to L-channel-------------------------------------------
  clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
  cl = clahe.apply(l)

  #-----Merge the CLAHE enhanced L-channel with the a and b channel-----------
  limg = cv2.merge((cl,a,b))

  #-----Converting image from LAB Color model to BGR model--------------------
  final = cv2.cvtColor(limg, cv2.COLOR_LAB2BGR)

  #_____END_____#
  return final

print(eval_transform('Contrast', contrast_boost))

Interestingly, this had the exact same result (on RGB) as going RGB -> B&W. This is most likely a coincidence, but the result is the same: small effect.

#### Contrast Boost and Resize

We tried both orderings (resize then boost, and boost then resize). Resizing the image first and then boosting the contrast produces the better result.

In [None]:
def contrast_resize(im):
  return cv2.resize(contrast_boost(im), (200, 200))

def resize_contrast(im):
  return contrast_boost(cv2.resize(im, (200, 200)))

print(eval_transform('Resize+Contrast', resize_contrast))

10% accuracy is getting pretty great in the grand scheme of things.

#### Contrast, Resize, and Grayscale

From these tests, resizing first, then turning the image into grayscale, then boosting the contrast produced the best results.

In [None]:
def resize_contrast_grayscale(im):
  return grayscale_resize_transform(contrast_boost(cv2.resize(im, (200, 200))))

def resize_grayscale_contrast(im):
  return contrast_boost(grayscale_resize_transform(cv2.resize(im, (200, 200))))

eval_transform('Resize+Grayscale+Contrast', resize_grayscale_contrast)

Accuracy went up just a tiny bit more; while it doesn't entirely make intuitive sense, it seems the benefits of contrast boosting and B&W overlap a fair bit.

### Edge Detection

Edge detection ended up resulting in very poor performance, but very high detection rates. Unfortunately, what got detected was not necessarily the numbers on the signs. Our hypothesis is that edge detection made the background as prominent as the digits on the sign, and thus confused the model. This also revealed a secondary flaw in our model: it cannot weed out false detections. For instance, a sign cannot have more than 3 digits, but the model will happily detect far more digits than that.

One can also see that the edges for the speed limit digits generally trace around the number's strokes on both the inside and outside, so most of the "filled" part is actually white, just like the background. While performance is not great here, edge detection removes a lot of noise while still keeping valuable digit features, so we suspect it has the potential to help a lot.

#### Canny Edge Detection and Inverted Canny Edge Detection
Both edge detection and inverted edge detection appear to bring very high detection rates, but thus far no correct detections.

In [None]:
def edges(im):
  edges = cv2.Canny(im, 100, 200)
  im2 = np.zeros_like(im)
  im2[:,:,0] = edges
  im2[:,:,1] = edges
  im2[:,:,2] = edges
  return im2

def invert(im):
  return cv2.bitwise_not(im)

def inverse_edges(im):
  return invert(edges(im))

eval_transform('Inverse Edges', inverse_edges)

#### Resizing and Canny Edge Detection

Now we add in resizing to the mix.

In [None]:
def edge_resize(im):
  return cv2.resize(edges(im), (200, 200))

def resize_edge(im):
  return edges(cv2.resize(im, (200, 200)))

def inverse_edge_resize(im):
  return invert(edge_resize(im))

def inverse_resize_edge(im):
  return invert(resize_edge(im))

eval_transform('Inverse Resize Edge', inverse_resize_edge)

It is interesting how destructive this small change is to the number of lines in the images. Intuitively, upscaling an image makes everything thicker, so perhaps many lines lose their qualification as edges due to being too thick.

#### Automatic Parameter Tuning Canny Edge Detection

In [None]:
# from https://www.pyimagesearch.com/2015/04/06/zero-parameter-automatic-canny-edge-detection-with-python-and-opencv/
def auto_canny(image, sigma=0.33):
  # compute the median of the single channel pixel intensities
  v = np.median(image)
  # apply automatic Canny edge detection using the computed median
  lower = int(max(0, (1.0 - sigma) * v))
  upper = int(min(255, (1.0 + sigma) * v))
  edged = cv2.Canny(image, lower, upper)
  # return the edged image
  im2 = np.zeros_like(image)
  im2[:,:,0] = edged
  im2[:,:,1] = edged
  im2[:,:,2] = edged
  return im2

def inverse_auto_canny(im):
  return invert(auto_canny(im))

eval_transform('Auto Canny', inverse_auto_canny)

Still no luck.

Let's try with resizing.

In [None]:
def resize_auto_canny(im):
  return auto_canny(cv2.resize(im, (200, 200)))

def auto_canny_resize(im):
  return cv2.resize(auto_canny(im), (200, 200))

def inverse_resize_auto_canny(im):
  return invert(resize_auto_canny(im))

def inverse_auto_canny_resize(im):
  return invert(auto_canny_resize(im))

eval_transform('Resizing Auto Canny', inverse_resize_auto_canny)

In spite of all of the 0s, we still believe Canny has potential, but it is clear that simple transformations like inversion, resizing, etc. will not make its output compatible with the model. Ultimately, some specialized algorithm is likely needed to either reinforce the digits or get rid of some of the background.

Even if edge detection is not useful with a regular MNIST-style model, edge images are simple yet lose few digit features, so it would probably be productive to train a model on edge-detected sign images rather than regular sign images.

## Summary

Evaluation metrics are complex here because it is hard to define correctness.
* A prediction of "150" for 50 means there was probably an extraneous 1 somewhere.
* A prediction of "5" for 50 means the 5 got detected but not the 0.
* A prediction of "1234" for 50 means it was completely wrong.

Yet all of these cases are grouped together: there is a prediction, but it is incorrect. This must be kept in mind; except for the edge detection evaluations, most incorrect predictions were actually partially correct. Clearly, though, all "no detection" cases are fully incorrect.

It would be possible to add heuristics to the accuracy metric to infer the real correct score. For example, the largest (up to) 3 boxes could be considered to eliminate some of the extraneous small boxes. We did not do this because it opens up a difficult balancing act.
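As a rough illustration of the "largest (up to) 3 boxes" idea, which we did not use in our evaluation, the sketch below keeps the three largest boxes by area and then reads them left to right. The dictionary layout, the function name, and the example detections are hypothetical.

```python
def keep_largest(detections, max_digits=3):
    """Keep the `max_digits` largest boxes by area, returned in left-to-right order."""
    def area(d):
        x0, y0, x1, y1 = d['bbox']
        return (x1 - x0) * (y1 - y0)

    largest = sorted(detections, key=area, reverse=True)[:max_digits]
    return sorted(largest, key=lambda d: d['bbox'][0])


# Made-up detections on a 120 sign, plus one tiny stray '1' in the corner
dets = [
    {'digit': 1, 'bbox': (2, 2, 6, 8)},      # stray box, area 24
    {'digit': 1, 'bbox': (10, 10, 25, 40)},  # area 450
    {'digit': 2, 'bbox': (27, 10, 45, 40)},  # area 540
    {'digit': 0, 'bbox': (47, 10, 65, 40)},  # area 540
]

unfiltered = ''.join(str(d['digit']) for d in sorted(dets, key=lambda d: d['bbox'][0]))
filtered = ''.join(str(d['digit']) for d in keep_largest(dets))
print(unfiltered)  # 1120
print(filtered)    # 120
```

The balancing act mentioned above shows up immediately: the same rule would also throw away a genuine but small digit whenever a large false detection is present.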

In [None]:
df = pd.DataFrame(results_dict, index=['Accuracy', 'Incorrect', 'No prediction'])
df.head()

One thing we will add is a pseudo-recall measuring "when the model predicted something, how often was it completely correct?" (Strictly speaking, this ratio is closer to precision, but we label it "Recall" here.)

In [None]:
df.loc['Recall'] = df.loc['Accuracy'] / (df.loc['Accuracy'] + df.loc['Incorrect'])
df.head()

Now we can plot. We will separate regular and edge/canny transformations to reduce clutter.

### Non-Edge Transformations

In [None]:
df.iloc[:,:5].plot.bar()

Resize + Grayscale + Contrast Boost is the winner here. While its accuracy is only slightly over 10%, its "recall" is around 25% and it has one of the lower "no prediction" rates. This is far from great, but given that most of "Incorrect" is partially correct (or correct w/ extraneous digits), it's not actually that bad.

### Edge Transformations

Edge transformations gave very poor objective accuracy, but their "Incorrect" vs "No prediction" rates did vary significantly. Part of this is because of the destructive effect pre-resizing has on the edges.

In [None]:
df.iloc[:,5:].plot.bar()

Resizing Auto Canny would likely be the one we would look more into because high "Incorrect" as opposed to "No prediction" means there are at least some features being detected.

# Conclusion
In this notebook, we explored computer vision as a means of experimenting with primitive ideas of transfer learning. We took a simple handwritten dataset, and trained a model to try to read speed limit signs of varying quality, brightness, and other traits.

We started at 6.3% accuracy, but through choosing and applying image transformations to the signs, we achieved 10.4% overall accuracy. This is low, but a big improvement and still far higher than randomly guessing a number.

The most natural next step for this project would be to perform **data augmentation** on the input, i.e. modifying the training images rather than just the test images. This can be tricky and increases training time a lot, but extremely dark or badly degraded signs are really hard to clean, whereas similar degradation could be applied to the training set so that the model could possibly learn to read them. This does break part of our scenario.
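As a minimal sketch of what such training-side degradation might look like, the function below darkens an image and adds Gaussian noise using NumPy only. We did not run this in the notebook; the function name, the parameter values, and the blank stand-in image are assumptions for illustration.

```python
import numpy as np


def degrade(img, brightness=0.4, noise_std=10.0, seed=0):
    """Darken an image and add Gaussian noise; returns a uint8 array of the same shape."""
    rng = np.random.default_rng(seed)
    out = img.astype(np.float32) * brightness
    out = out + rng.normal(0.0, noise_std, size=img.shape)
    return np.clip(out, 0, 255).astype(np.uint8)


# Stand-in for a 416x416 training image: a plain white background
clean = np.full((416, 416, 3), 255, dtype=np.uint8)
dark = degrade(clean)
print(clean.mean(), dark.mean())
```

Bounding box annotations would stay valid under this kind of change, since only pixel intensities are modified and nothing is moved or resized.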

Another possible consideration would be using a different training set altogether. It is possible that the handwritten digits in the YYMNIST dataset caused the model to be poorly trained for the very uniform digits on the road signs. Perhaps a training dataset made up of images of digits in many fonts would produce better results. On the other hand, the irregularities of the YYMNIST dataset could also have produced a better model, because the model might have been better prepared to handle skewed or rotated images by having broader criteria for what each digit might look like.

Overall, this was a really fun project that has near infinite room for further improvement. We learned a lot about computer vision (specifically object detection) and image transformations (both generally and OpenCV functionality), and we scratched the surface of transfer learning. Given our poor overall accuracy at the end, it becomes clear why this is a difficult area with tons of research and specialized techniques of its own.