# Digit Detection Validation - Street View House Numbers
---
Using the provided RBNR annotations, this notebook first crops out the bibs, then feeds each bib into the digit detector that we trained on the SVHN dataset.  During the cropping process, a text file is created that lists the image name of each cropped bib along with its true racing bib number (RBN).  A similar list is created for the predicted RBNs during the digit detection step.  The two lists are then compared in the validation section.

Set1 and Set2 of the RBNR dataset will be used to train the bib detection model. Neither set has been used to train the digit detection model yet, so all three sets serve as validation for this step. The goal is to see how well a model trained solely on SVHN data performs when applied to the RBNR datasets.

### Credits

This notebook is an adaptation of the notebook provided by Roboflow located [here](https://blog.roboflow.com/train-yolov4-tiny-on-custom-data-lighting-fast-detection/) and Eric Bayless' implementation [here](https://github.com/ericBayless/bib-detector).  Thank you Roboflow & Eric!!

### Details:
The annotations for the RBNR dataset are provided as Matlab formatted files named `<image name>.mat`.  There is one file for each image.  A description of the format can be found in the `readme.txt` located in each set folder.  In this project, I will be using Darknet to train custom Yolo models with this dataset, and Darknet requires annotations to be in the Darknet TXT format.  

More information about Darknet annotation format can be found [here](https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects).
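
As a quick illustration of the target format: a Darknet TXT file holds one line per object, `<class> <x_center> <y_center> <width> <height>`, with all four box values expressed relative to the image width and height. The sketch below converts a pixel-space box into such a line. The function name `to_darknet_line` and the sample numbers are made up for this example; they are not taken from the RBNR `readme.txt` or from `utils.py`.

```python
def to_darknet_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box into one Darknet TXT annotation line.

    Hypothetical helper for illustration only.
    """
    # Box centre and size, normalised to the 0-1 range by the image size
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x100 px box centred in a 400x200 px image, class 0
line = to_darknet_line(0, 100, 50, 300, 150, img_w=400, img_h=200)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```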

## Grabbing Content 
---
Pulling repos, data, and organizing directories to prep for YOLOv4 training.

***IMPORTANT!!*** Colab does not have the latest opencv-python library installed (as of April 2022), so make sure to upgrade opencv-python before running the detections. This Stack Overflow question helped me understand the problem: [Stack Overflow Question](https://stackoverflow.com/questions/66007373/how-to-read-yolov3-yolov4-in-opencv-to-get-the-detections)

In [1]:
!pip install --upgrade opencv-python

Collecting opencv-python
  Downloading opencv_python-4.5.5.64-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (60.5 MB)
     |████████████████████████████████| 60.5 MB 66 kB/s 
Installing collected packages: opencv-python
  Attempting uninstall: opencv-python
    Found existing installation: opencv-python 4.1.2.30
    Uninstalling opencv-python-4.1.2.30:
      Successfully uninstalled opencv-python-4.1.2.30
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
Successfully installed opencv-python-4.5.5.64


In [2]:
# imports
import cv2 as cv
import numpy as np
import scipy.io as sio
import os
import pandas as pd

In [3]:
%mkdir /usr/validation/
%mkdir /usr/validation/data/
%mkdir /usr/validation/data/bibs/
%cd /usr/validation/data/

/usr/validation/data


In [4]:
#Mount google drive 
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [5]:
#Grab the Racing Bib Number Recognition Data
!wget https://people.csail.mit.edu/talidekel/Data/RBNR/RBNR_Datasets.zip 
!unzip -q RBNR_Datasets.zip

--2022-05-15 15:47:21--  https://people.csail.mit.edu/talidekel/Data/RBNR/RBNR_Datasets.zip
Resolving people.csail.mit.edu (people.csail.mit.edu)... 128.30.2.133
Connecting to people.csail.mit.edu (people.csail.mit.edu)|128.30.2.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56925300 (54M) [application/zip]
Saving to: ‘RBNR_Datasets.zip’


2022-05-15 15:47:26 (11.8 MB/s) - ‘RBNR_Datasets.zip’ saved [56925300/56925300]



In [6]:
%rm /usr/validation/data/RBNR_Datasets.zip

# Import Functions to Crop Bibs
---

In [7]:
%cp "/content/drive/MyDrive/bib-project/clean code/utils.py" /content
import utils as ut

In [8]:
# set input and output info for set1
images_path = '/usr/validation/data/datasets/set1_org/'
images = [file for file in os.listdir(images_path) if file[-3:]=='JPG']

output_path = '/usr/validation/data/bibs/'

In [9]:
#check for existing bib_numbers.txt and remove if exists
if os.path.exists(output_path + 'bib_numbers.txt'):
    os.remove(output_path + 'bib_numbers.txt')

In [10]:
for image in images:
    ut.get_cropped_bib(image, images_path, output_path)

In [11]:
# repeat process for set2
images_path = '/usr/validation/data/datasets/set2_org/'
images = [file for file in os.listdir(images_path) if file[-3:]=='JPG']

for image in images:
    ut.get_cropped_bib(image, images_path, output_path)

In [12]:
# repeat process for set3
images_path = '/usr/validation/data/datasets/set3_org/'
images = [file for file in os.listdir(images_path) if file[-3:]=='JPG']

for image in images:
    ut.get_cropped_bib(image, images_path, output_path)
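
The contents of `utils.py` are not shown in this notebook, so the following is only a rough sketch of what a crop-and-record step like `get_cropped_bib` might involve: clip the annotated box to the image, slice out the bib, and build the `image,rbn` line that ends up in `bib_numbers.txt`. The helper names (`clamp_box`, `crop_rows`, `label_line`) and the assumption that a source image is named like `set1_02.JPG` are mine. The image here is a plain list of rows to keep the example dependency-free.

```python
def clamp_box(x1, y1, x2, y2, img_w, img_h):
    """Clip a box to the image bounds so slicing never runs out of range."""
    return max(0, x1), max(0, y1), min(img_w, x2), min(img_h, y2)


def crop_rows(image_rows, box):
    """Crop a row-major image (list of rows) to box = (x1, y1, x2, y2).

    With an OpenCV/NumPy image the equivalent slice is img[y1:y2, x1:x2].
    """
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image_rows[y1:y2]]


def label_line(image_name, bib_index, rbn):
    """Build an 'image,rbn' record using the <stem>_bib_<k>.JPG crop name."""
    stem = image_name.rsplit(".", 1)[0]
    return f"{stem}_bib_{bib_index}.JPG,{rbn}"


# Tiny 4x6 stand-in image whose pixel value encodes (row, column)
image = [[r * 10 + c for c in range(6)] for r in range(4)]
box = clamp_box(-1, 1, 3, 10, img_w=6, img_h=4)
patch = crop_rows(image, box)
record = label_line("set1_02.JPG", 1, 3637)
print(box, patch, record)
```

Clipping first matters because a negative start index in a Python or NumPy slice wraps around to the other side of the image instead of raising an error.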

# Digit Detection
---

In [16]:
# Copy config & weights to local
%cp /content/drive/MyDrive/bib-project/SVHN/custom-yolov4-tiny-detector.cfg /usr/local/
%cp /content/drive/MyDrive/bib-project/SVHN/custom-yolov4-tiny-detector_best.weights /usr/local/

In [17]:
# get random colors for boxes
np.random.seed(42)
colors = np.random.randint(0, 255, size=(10, 3), dtype='int64')

In [18]:
# Give the configuration and weight files for the model to load into the network.
configPath = '/usr/local/custom-yolov4-tiny-detector.cfg'
weightsPath = '/usr/local/custom-yolov4-tiny-detector_best.weights'

net = cv.dnn.readNetFromDarknet(configPath, weightsPath)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)

# determine the output layer(s)
ln = net.getLayerNames()
ln = [ln[i - 1] for i in net.getUnconnectedOutLayers()]
ln

['yolo_30', 'yolo_37']

In [19]:
# set input and output info for detections
images_path = '/usr/validation/data/bibs/'
images = [file for file in os.listdir(images_path) if file[-3:]=='JPG']
%mkdir /usr/validation/data/nums/
output_path = '/usr/validation/data/nums/'

In [20]:
# check for existing rbn_preds.txt and remove if exists
if os.path.exists(output_path + 'rbn_preds.txt'):
    os.remove(output_path + 'rbn_preds.txt')

In [21]:
# run detections on all images in input directory
for image in images:
    ut.create_labeled_image(image, images_path, output_path, configPath, weightsPath)
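
The digit detector returns one box per digit, so the individual digits still have to be assembled into a single bib number. How `create_labeled_image` does this is not shown here. One plausible approach, sketched below under that assumption, is to drop low-confidence detections and read the remaining digits left to right by their x position. The function name `assemble_number`, the `(digit, confidence, x_left)` tuple layout, and the 0.5 threshold are all illustrative choices.

```python
def assemble_number(detections, min_conf=0.5):
    """Join per-digit detections into a bib number string.

    detections: list of (digit, confidence, x_left) tuples.
    min_conf:   assumed confidence cut-off, not taken from utils.py.
    """
    # Keep confident detections only, then order them left to right
    kept = [d for d in detections if d[1] >= min_conf]
    kept.sort(key=lambda d: d[2])
    return "".join(str(digit) for digit, _, _ in kept)


# Unordered detections for a bib reading 3637, plus one weak spurious "3"
dets = [(7, 0.91, 58), (3, 0.97, 4), (6, 0.88, 22), (3, 0.35, 24), (3, 0.93, 40)]
number = assemble_number(dets)
print(number)  # 3637
```

Returning a string rather than an integer keeps any leading zeros on a bib intact.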

# Validation
---

## Training Validation

In [22]:
true_df = pd.read_csv('/usr/validation/data/bibs/bib_numbers.txt', delimiter=',', 
                      index_col=0, names=['image', 'rbn'])
true_df.head()

image,rbn
set1_02_bib_1.JPG,3637
set1_56_bib_1.JPG,2692
set1_79_bib_1.JPG,3331
set1_19_bib_1.JPG,2078
set1_72_bib_1.JPG,1478


In [23]:
true_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, set1_02_bib_1.JPG to set3_28_bib_1.JPG
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   rbn     290 non-null    int64
dtypes: int64(1)
memory usage: 4.5+ KB


In [24]:
pred_df = pd.read_csv('/usr/validation/data/nums/rbn_preds.txt', delimiter=',', 
                      index_col=0, names=['image', 'pred_rbn'])
pred_df.head()

image,pred_rbn
set2_36_bib_1.JPG,10190
set2_52_bib_2.JPG,19078
set1_16_bib_1.JPG,23
set3_15_bib_2.JPG,3608
set3_43_bib_1.JPG,2074


In [25]:
pred_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, set2_36_bib_1.JPG to set1_71_bib_1.JPG
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   pred_rbn  290 non-null    int64
dtypes: int64(1)
memory usage: 4.5+ KB


In [26]:
all_df = pd.merge(true_df, pred_df, on='image', how='left')

In [27]:
all_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 290 entries, set1_02_bib_1.JPG to set3_28_bib_1.JPG
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   rbn       290 non-null    int64
 1   pred_rbn  290 non-null    int64
dtypes: int64(2)
memory usage: 6.8+ KB


#### Accurate Predictions

In [28]:
all_df.loc[all_df['rbn'] == all_df['pred_rbn']]

image,rbn,pred_rbn
set1_02_bib_1.JPG,3637,3637
set1_56_bib_1.JPG,2692,2692
set1_79_bib_1.JPG,3331,3331
set1_19_bib_1.JPG,2078,2078
set1_72_bib_1.JPG,1478,1478
...,...,...
set3_01_bib_4.JPG,4407,4407
set3_09_bib_1.JPG,2830,2830
set3_26_bib_1.JPG,1466,1466
set3_26_bib_2.JPG,2464,2464


#### Inaccurate Predictions

In [29]:
all_df.loc[all_df['rbn'] != all_df['pred_rbn']]

image,rbn,pred_rbn
set1_36_bib_1.JPG,2475,15
set1_71_bib_1.JPG,560,56
set1_28_bib_1.JPG,311,31
set1_57_bib_2.JPG,3521,3527
set1_62_bib_1.JPG,941,94
set1_16_bib_1.JPG,1463,23
set1_17_bib_1.JPG,1463,4
set2_51_bib_1.JPG,11191,1119
set2_31_bib_2.JPG,80653,80635
set2_59_bib_1.JPG,10991,10997


#### Accuracy

In [30]:
true_positives = len(all_df.loc[all_df['rbn'] == all_df['pred_rbn']])
total = len(true_df)

true_positives / total

0.8551724137931035
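
For reference, the same exact-match accuracy can be computed without pandas. The sketch below parses two `image,number` lists, joins them on the image name, and counts a bib with no prediction as a miss. The function names are illustrative, and the three sample rows are copied from the tables above.

```python
import csv
import io


def read_pairs(text):
    """Parse 'image,number' lines into a dict keyed by image name."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text)) if row}


def exact_match_accuracy(true_rbn, pred_rbn):
    """Share of bibs whose predicted number equals the true number.

    A bib that is missing from pred_rbn counts as incorrect.
    """
    correct = sum(1 for image, rbn in true_rbn.items() if pred_rbn.get(image) == rbn)
    return correct / len(true_rbn)


true_txt = "set1_02_bib_1.JPG,3637\nset1_71_bib_1.JPG,560\nset1_19_bib_1.JPG,2078\n"
pred_txt = "set1_19_bib_1.JPG,2078\nset1_02_bib_1.JPG,3637\nset1_71_bib_1.JPG,56\n"

accuracy = exact_match_accuracy(read_pairs(true_txt), read_pairs(pred_txt))
print(round(accuracy, 3))  # 0.667
```

On the full set, the reported 0.855 corresponds to 248 exact matches out of 290 bibs.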

# Conclusions
---

While the overall accuracy of the bib number detection is ~85%, many of the inaccurate predictions are off by only a single digit and/or are missing 1-2 digits from the bib.  Given that this model will be used in real time, where an athlete will be moving in a natural scene, we will continue to train the model using additional bib images from Set1 & Set2.  
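
To make the "off by a digit" versus "missing digits" distinction concrete, here is a small sketch that buckets each wrong prediction by error type. It is an illustrative analysis rather than part of the original pipeline. The category names are made up, and the ten pairs are the rows printed in the inaccurate predictions table above.

```python
from collections import Counter


def is_subsequence(short, full):
    """True if the characters of short appear in full in the same order."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)


def classify_error(true_rbn, pred_rbn):
    """Bucket a prediction by how it differs from the true bib number."""
    t, p = str(true_rbn), str(pred_rbn)
    if t == p:
        return "correct"
    if len(t) == len(p):
        # Same length: count positions where the digits disagree
        diffs = sum(a != b for a, b in zip(t, p))
        return "one_digit_wrong" if diffs == 1 else "several_digits_wrong"
    if len(p) < len(t) and is_subsequence(p, t):
        # Shorter prediction whose digits all appear in order: dropped digits
        return f"missing_{len(t) - len(p)}"
    return "other"


pairs = [(2475, 15), (560, 56), (311, 31), (3521, 3527), (941, 94),
         (1463, 23), (1463, 4), (11191, 1119), (80653, 80635), (10991, 10997)]
counts = Counter(classify_error(t, p) for t, p in pairs)
print(counts)
```

For these ten rows, four predictions drop exactly one digit and two get a single digit wrong, while the rest fall into the harder categories.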

A similar & AWESOME study (thank you, [Dylan Seychell](https://www.linkedin.com/in/dylanseychell/)) leverages Convolutional Neural Networks (CNN) to segment bib numbers, with a second stage consisting of a Convolutional Recurrent Neural Network (CRNN) that recognizes the detected bib numbers. Further information on that study can be found [here](https://ieeexplore.ieee.org/document/8868768).