# REAL-TIME HANDWRITING RECOGNITION USING TENSORFLOW AND RASPBERRY PI
Prepared By: Luis Rivera

In partial fulfillment of the requirements for CPE 4903

May 02, 2023

Kennesaw State University

## Table of Contents
1. Introduction
2. Theory
    1. MNIST Dataset
    2. Training
    3. Testing
3. Procedure/Design
    1. Training
    2. Testing
4. Data and Analysis
    1. Training Results
5. Conclusion
6. Evidence

## Introduction

This report details my work to create and test a convolutional neural network (CNN) that identifies handwritten digits captured in real time with a Raspberry Pi camera. It describes the theory behind each component and then the specific implementation.

## Theory

### 1. MNIST Dataset

To train a neural network to recognize handwritten digits, we need a large set of labeled examples. Our starting point is the MNIST database (Modified National Institute of Standards and Technology database), a collection of 70,000 grayscale images of handwritten digits, each 28 x 28 pixels, split into 60,000 training and 10,000 test images. It is a standard benchmark for image classifiers, and using it saves us the work of collecting and labeling our own samples.

### 2. Training

To train this model, we use a convolutional neural network (CNN), an architecture well suited to 2-dimensional image data because its filters respond to local patterns wherever they appear in the image. That makes it a good fit for the size-normalized handwritten digits that MNIST provides. We use Keras to build, train, and analyze a fairly simple model. To judge the training, the dataset is split into a training set and a test set: the model is fit on the training set, and its accuracy is then measured on the test set, which it has never seen. The goal is a test accuracy in the upper 90% range.
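To make the idea of a convolutional filter concrete, here is a minimal pure-Python sketch of the sliding-window operation a Conv2D layer applies (no padding, stride 1, no kernel flip). The toy image, the `edge_kernel` values, and the function name `conv2d_valid` are illustrative assumptions, not part of the project code; in the real model the kernel values are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4 x 4 "image" with a vertical edge between columns 1 and 2 (assumed data).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2 x 2 kernel that responds where brightness increases left to right.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(toy_image, edge_kernel)
for row in feature_map:
    print(row)
```

The 4 x 4 input shrinks to a 3 x 3 feature map, and only the middle column, where the window straddles the edge, is nonzero. The same filter would fire on that edge wherever it sat in the image, which is the property that makes convolutions useful for handwriting.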

### 3. Testing

We plan to test our model on real handwritten digits in real time with a camera, in our case a Raspberry Pi with a camera. In general terms, we initialize a video capture and, for every frame, process the image into the form the model expects, run the prediction, and report the predicted digit and the model's confidence in that prediction. We are aiming for a confidence greater than 50%.

        

## Procedure/Design

### 1. Training

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

To start, we load and preprocess the MNIST dataset.

In [None]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

We one-hot encode the labels and define the model and its layers here. We use two Conv2D layers, each followed by max pooling and dropout. I originally had several large dense ReLU layers after the Conv2D layers, but I found late in the project that large ReLU layers before the softmax output resulted in artificially large weights that distorted the confidence of the prediction. Even without these dense layers, we achieve >98% accuracy.

In [None]:
# Convert class vectors to binary class matrices (one-hot encoding)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Two Conv2D + MaxPooling2D + Dropout stages, then a softmax classifier
model = keras.models.Sequential()
# Only the first layer needs input_shape; later layers infer their input shape
model.add(keras.layers.Conv2D(32, (2, 2), input_shape=input_shape, activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Conv2D(32, (2, 2), activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.2))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(units=num_classes, activation='softmax'))
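As a sanity check on the layer stack above, the following sketch works out by hand how the 28 x 28 input shrinks through the network and how many trainable parameters each layer contributes. It assumes the Keras defaults that the cell relies on (no padding and stride 1 for Conv2D, pool stride equal to the pool size); the helper names are hypothetical and exist only for this illustration.

```python
def conv_output(size, kernel, stride=1):
    """Output width/height of a convolution with no padding."""
    return (size - kernel) // stride + 1


def pool_output(size, pool):
    """Output width/height of max pooling with stride equal to the pool size."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


side, channels = 28, 1   # MNIST input: 28 x 28 pixels, 1 channel
total_params = 0

for stage in (1, 2):     # two Conv2D + MaxPooling2D stages, 32 filters of 2 x 2
    total_params += conv_params(2, channels, 32)
    side = conv_output(side, 2)
    channels = 32
    print(f"after conv {stage}: {side} x {side} x {channels}")
    side = pool_output(side, 2)
    print(f"after pool {stage}: {side} x {side} x {channels}")

flat_features = side * side * channels
dense_params = (flat_features + 1) * 10   # 10 output classes, one bias each
total_params += dense_params

print("flattened features:", flat_features)
print("total trainable parameters:", total_params)
```

If the assumptions hold, these figures should match what `model.summary()` reports. Most of the parameters sit in the final dense layer, which is still small compared to the large dense layers that were removed.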

Here we compile and train the model.

In [None]:
batch_size = 128
epochs = 5

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()  # summary() prints the table itself and returns None
history = model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.2, verbose=1)
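The `fit` call above holds back part of the training data for validation. The arithmetic below is a rough sketch of what those settings imply, assuming the standard 60,000-image MNIST training set; the variable names other than `batch_size`, `epochs`, and `validation_split` are made up for the illustration.

```python
import math

train_samples = 60000      # size of the MNIST training set
validation_split = 0.2     # fraction held back by model.fit
batch_size = 128
epochs = 5

# Keras takes the validation samples from the end of the data, before shuffling.
val_samples = round(train_samples * validation_split)
fit_samples = train_samples - val_samples

# One weight update per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(fit_samples / batch_size)
total_updates = steps_per_epoch * epochs

print("samples used for fitting:   ", fit_samples)
print("samples used for validation:", val_samples)
print("batches per epoch:          ", steps_per_epoch)
print("weight updates in total:    ", total_updates)
```

The validation samples are never used for weight updates, so the validation accuracy reported after each epoch gives an early estimate of how the model handles unseen digits before the separate test set is touched.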

Here we evaluate the model on the test set and plot the training loss and accuracy over the epochs.

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])

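# Plot training loss and accuracy per epoch, with a legend to tell them apart
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['accuracy'], label='training accuracy')
plt.title('model loss and accuracy')
plt.ylabel('loss & accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()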
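To show what the two numbers returned by `model.evaluate` mean, here is a small pure-Python sketch that computes categorical cross-entropy and accuracy for three made-up predictions over three classes. The toy data, the `eps` clipping value, and the function names are assumptions for illustration; Keras performs the equivalent computation over the full test set.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(true * log(predicted)) over all samples."""
    losses = []
    for true_row, pred_row in zip(y_true, y_pred):
        # Clip predictions away from zero so that log() is always defined.
        loss = -sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
        losses.append(loss)
    return sum(losses) / len(losses)


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    hits = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if true_row.index(max(true_row)) == pred_row.index(max(pred_row)):
            hits += 1
    return hits / len(y_true)


# One-hot labels and softmax-style outputs for three toy samples (assumed data).
toy_labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
toy_outputs = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]

toy_loss = categorical_crossentropy(toy_labels, toy_outputs)
toy_accuracy = accuracy(toy_labels, toy_outputs)
print("loss:", toy_loss)
print("accuracy:", toy_accuracy)
```

The third sample is misclassified, so accuracy drops to two out of three, and because the model gave the correct class only 0.2 probability, that sample also dominates the loss. Accuracy counts only right or wrong, while the loss also penalizes low confidence in the correct answer.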

Since we will use the model in another script and do not want to retrain it every time, we save the model to a file.

In [None]:
model.save('/home/pi/handrecog/keras_convnet_adam')


### 2. Testing


In [None]:
import os
import numpy as np
import cv2
import tensorflow as tf
from tensorflow import keras
from sense_hat import SenseHat

print(tf.version.VERSION)

We begin by initializing the Sense HAT, which displays our prediction later, loading our saved model, and starting a video capture. The count variable counts the frames in the following loop for which no confident prediction is made; it is used to alternate the placeholder symbol shown on the Sense HAT.

In [None]:
sense = SenseHat()

loaded_model = tf.keras.models.load_model('/home/pi/handrecog/keras_convnet_adam')

loaded_model.summary()

cap = cv2.VideoCapture(0)

count = 0

This is the main loop of the program. In each iteration, a frame of video is captured and displayed so that you can view the feed in real time. The frame is then inverted with a binary threshold (so that dark ink on light paper becomes light strokes on a dark background, as in MNIST), resized to 28 x 28, converted from BGR to RGB, converted to a tensor, scaled from the 0-255 range to 0-1, converted to grayscale, and given a leading batch dimension for model input. After all this processing, we call the model's predict function with our single image as the input. This returns an array with one probability per digit; the probabilities sum to 1.0. The predicted digit is the one with the greatest probability, and the model's confidence is that highest probability. I chose to display a prediction only if the confidence is greater than or equal to 70%.

In [None]:
while True:
    ret, raw_bgr = cap.read()
    if not ret:
        # Stop if the camera did not deliver a frame
        break
    gray = cv2.cvtColor(raw_bgr, cv2.COLOR_BGR2GRAY)
    cv2.imshow('frame', gray)
    # Inverted binary threshold: values above 50 become 0, the rest become 255
    _, bgr_inverted = cv2.threshold(raw_bgr, 50, 255, cv2.THRESH_BINARY_INV)
    resized_bgr = cv2.resize(bgr_inverted, (28, 28), interpolation=cv2.INTER_CUBIC)
    rgb = cv2.cvtColor(resized_bgr, cv2.COLOR_BGR2RGB)
    # Scale to [0, 1] to match the training data
    rgb_tensor = tf.convert_to_tensor(rgb, dtype=tf.float32) / 255.0
    gray_tensor = tf.image.rgb_to_grayscale(rgb_tensor)
    # Save the model input for inspection, scaled back to 0-255
    cv2.imwrite("preview.jpg", (gray_tensor.numpy() * 255).astype(np.uint8))
    # Add a batch dimension: (28, 28, 1) -> (1, 28, 28, 1)
    gray_tensor = tf.expand_dims(gray_tensor, axis=0)

    y_hat = loaded_model.predict(gray_tensor)

    # Most probable digit and the probability assigned to it
    prediction = int(np.argmax(y_hat))
    score = float(y_hat[0, prediction])

    if score >= 0.7:
        print('Number is a {} with a certainty of {:.2%}'.format(prediction, score))
        sense.show_letter(str(prediction))
    else:
        # No confident prediction: alternate '/' and '\' as an activity indicator
        if count % 2:
            sense.show_letter('/')
        else:
            sense.show_letter('\\')
        count = count + 1

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
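The decision step at the end of the loop above can be isolated and tried without a camera or a trained model. The sketch below applies the same rule (take the most probable digit and accept it only at 70% confidence or more) to two made-up probability vectors; the function name `decide` and the example values are assumptions for illustration.

```python
def decide(probabilities, threshold=0.7):
    """Return (digit, confidence), with digit set to None below the threshold."""
    # Index of the largest probability, equivalent to np.argmax on one row.
    best_digit = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = probabilities[best_digit]
    if confidence >= threshold:
        return best_digit, confidence
    return None, confidence


# Hypothetical softmax outputs for digits 0-9; each vector sums to 1.0.
confident_frame = [0.01, 0.02, 0.01, 0.85, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01]
uncertain_frame = [0.05, 0.40, 0.05, 0.05, 0.05, 0.05, 0.05, 0.20, 0.05, 0.05]

for name, frame in (("confident", confident_frame), ("uncertain", uncertain_frame)):
    digit, confidence = decide(frame)
    if digit is None:
        print(f"{name}: no prediction shown (best confidence {confidence:.2%})")
    else:
        print(f"{name}: number is a {digit} with a certainty of {confidence:.2%}")
```

In the second case the most likely digit still has less than half of the probability mass, so nothing is displayed. Raising or lowering `threshold` trades missed readings against wrong ones.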

## Data and Analysis

The important quantifiable metrics for our model are the loss and accuracy of the model.

|Name|Value|
|----|-----|
|Model Training Loss|0.0703|
|Model Training Accuracy|0.9798|
|Model Test Loss|0.0632|
|Model Test Accuracy|0.9808|

## Conclusion

In conclusion, I am very pleased with the result of this project. I was able to build a CNN model and a Raspberry Pi application that quickly and accurately identify handwritten digits. Beyond producing working software, I worked across multiple platforms and successfully integrated hardware and software into a cohesive system. I believe this project to be a resounding success.

## Evidence

![Proof of installation](Screenshot%202023-05-02%20170241.png)

![Proof of Training](Screenshot%202023-05-02%20171857.png)