# Machine Learning-based Highway Information Panel Reading

## Fundamental implementation of an Optical Character Recognition (OCR) system using Python, OpenCV and scikit-learn

    Original publish date: Jun, 2023
    Version: v1.1
    License: MIT
    Author: Alejandro Asensio Pérez
    Tags: Computer Vision, Optical Character Recognition, Machine Learning, Python

**Disclaimer:** This work was created as part of my academic coursework at King Juan Carlos University. While I claim copyright for the selection and arrangement of the content, the copyright for the original materials used in this work is held by the university and the instructors, Victoria Ruiz and Ángel Sánchez.

### Summary

The notebook provides a foundation for building a system capable of automatically extracting information from highway panels.

### Connect and Engage

Please feel free to comment on typos, propose improvements, or expand the content.

![](https://drive.google.com/uc?export=view&id=1s5p0uid3hLLs9mc4QNs07nNxzLVA8aBX)
*This figure illustrates the anticipated outcome of the character recognition process. On the left, we present an original road sign image captured in a real-world setting. On the right, the individual characters successfully detected and isolated by the classifier are displayed.*

**Notice:** This notebook is designed to run on Google Colab. However, it can also be run locally in a Python virtual environment, provided the Colab-specific imports and Google Drive paths in the Setup section below are adapted.

## Setup and Helper Functions

This section lays the groundwork for the OCR system by:

* Importing libraries: Includes necessary libraries for image processing, machine learning, and visualization.
* Defining helper functions: Establishes custom functions to streamline tasks like plotting confusion matrices and preprocessing images for character recognition.

These preparations ensure a smooth and organized development process for the subsequent character recognition.

In [None]:
# Import standard libraries
import string
import numpy as np
import glob
import cv2
import imutils

# Import libraries for plotting
import matplotlib.pyplot as plt

# Import machine learning libraries
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

# Import Google Colab patches for OpenCV
from google.colab.patches import cv2_imshow

# Import Google Drive integration
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
# Function to plot confusion matrix
def plot_confusion_matrix(cm, title='Confusion matrix', cmap='Blues'):
  """
  Visualizes the performance of a classification model using a confusion matrix.

  This function generates a color-coded heatmap in which each cell shows how
  many samples with a given true label (row) were assigned a given predicted
  label (column), providing insights into the model's accuracy and error patterns.

  Args:
      cm: The confusion matrix to plot.
      title: The title of the plot.
      cmap: The color map to use.
  """
  plt.imshow(cm, interpolation='nearest', cmap=cmap)
  plt.title(title)
  tick_marks = np.arange(cm.shape[0])
  plt.xticks(tick_marks, range(cm.shape[0]))
  plt.yticks(tick_marks, range(cm.shape[0]))
  plt.tight_layout()
  plt.ylabel('True label')
  plt.xlabel('Predicted label')

  # Add annotations
  ax = plt.gca()
  for x in range(cm.shape[1]):
    for y in range(cm.shape[0]):
      # cm[y, x] (row y, column x) is drawn at plot coordinates (x, y)
      ax.annotate(str(cm[y, x]), xy=(x, y), horizontalalignment='center',
                  verticalalignment='center')

In [None]:
# Function to process an image for OCR training
def process_image(img, size=25):
  """
  Prepares an image for character recognition by resizing and binarizing it.

  This function transforms the input image into a format suitable for
  character recognition algorithms by:

  1. Resizing it to a standardized size for consistency.
  2. Converting it to a binary image (black and white) using Otsu's thresholding
     to enhance contrast and simplify feature extraction.
  3. Flattening the image into a 1D array for compatibility with machine learning models.

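  Args:
      img: The grayscale image to process.
      size: Side length, in pixels, of the square the image is resized to.

  Returns:
      A binary (0/1) array of shape (1, size * size).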
  """
  img_resized = cv2.resize(img, (size, size))
  _, img_binary = cv2.threshold(img_resized, 0, 1, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
  img_reshaped = img_binary.reshape(1, size * size)

  return img_reshaped
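
To make step 2 of `process_image` more concrete, the sketch below re-implements Otsu's threshold selection in plain NumPy and applies the same binarize-and-flatten steps to a synthetic image. It is an illustration only: `otsu_threshold` and `binarize_and_flatten` are hypothetical helper names, resizing is omitted, and the notebook itself relies on `cv2.threshold` with `cv2.THRESH_OTSU`.

```python
import numpy as np

def otsu_threshold(img):
    """Return the gray level that maximizes the between-class variance (uint8 input)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                 # fraction of pixels with value <= t
    w1 = 1.0 - w0                        # fraction of pixels with value > t
    cum_mean = np.cumsum(prob * levels)  # cumulative mean up to each level
    total_mean = cum_mean[-1]
    with np.errstate(divide='ignore', invalid='ignore'):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    # Levels where one class is empty produce 0/0; treat them as zero variance
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))

def binarize_and_flatten(img):
    """Binarize with the Otsu threshold (values become 0/1) and flatten to one row."""
    t = otsu_threshold(img)
    binary = (img > t).astype(np.uint8)
    return binary.reshape(1, -1)

# Synthetic 25x25 "character": dark background (40) with a bright vertical bar (200)
img = np.full((25, 25), 40, dtype=np.uint8)
img[:, 10:15] = 200
flat = binarize_and_flatten(img)
print(flat.shape, int(flat.sum()))  # (1, 625) 125
```

The 125 ones correspond to the 25 x 5 pixels of the bright bar, and the single row of 625 values is the layout the classifiers in this notebook expect.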

## OCR Classifier

This section focuses on building and evaluating a character classifier to recognize individual characters from images. It explores different machine learning algorithms, including Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), and Random Forest, to determine the most effective approach for character recognition. The goal is to train a model that can accurately predict the character present in an image based on its features.

This classifier forms the core component of the OCR system, enabling the recognition of characters within highway information panels.

### Preparing Training and Validation Data

This section sets up the paths to the training and validation datasets. The `train_path` dictionary organizes the paths for different character categories (numbers, lowercase, uppercase) stored in Google Drive.

![](https://drive.google.com/uc?export=view&id=1z8_9r2em8HdGSgA8dy0BQDeztisVNBN3)
*This figure showcases a diverse set of images used to train the text classifier.*

Please note that the actual data is not available due to copyright-related issues. Comparable examples for digits can be found in the MNIST dataset, although it contains handwritten digits only and no letters. Remember that the performance of a machine learning model is heavily dependent on the quality and representativeness of its training data.

> Note: If running this notebook locally, replace the Google Drive paths with local file paths using the `os` module.
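
As a minimal sketch of the local setup mentioned in the note, the paths could be built with `os.path.join` from a single data root. The `build_paths` helper, the `OCR_DATA_ROOT` environment variable, and the `./d` default are assumptions made for illustration; only the sub-folder names are taken from the Drive layout used in this notebook.

```python
import os
import string

def build_paths(data_root):
    """Build the training, validation and panel paths under a local data root."""
    train_path = {
        'numbers': (os.path.join(data_root, 'nums'), string.digits),
        'lowercase': (os.path.join(data_root, 'lowers'), string.ascii_lowercase),
        'uppercase': (os.path.join(data_root, 'uppers'), string.ascii_uppercase),
    }
    validation_path = os.path.join(data_root, 'validation')
    panel_path = os.path.join(data_root, 'panels')
    return train_path, validation_path, panel_path

# Hypothetical environment variable; falls back to a local "d" folder
data_root = os.environ.get('OCR_DATA_ROOT', os.path.join('.', 'd'))
train_path, validation_path, panel_path = build_paths(data_root)
print(validation_path)
```

When running locally, the `google.colab` imports and the `drive.mount` call would also need to be removed, and `cv2_imshow` replaced by another display routine.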

In [None]:
train_path = {
  'numbers': ('/content/drive/MyDrive/ocr-system/d/nums', string.digits),
  'lowercase': ('/content/drive/MyDrive/ocr-system/d/lowers', string.ascii_lowercase),
  'uppercase': ('/content/drive/MyDrive/ocr-system/d/uppers', string.ascii_uppercase)
}

validation_path = '/content/drive/MyDrive/ocr-system/d/validation'

panel_path = '/content/drive/MyDrive/ocr-system/d/panels'

train_values = []
train_tags = []

validation_values = []
validation_tags = []

### Loading and Preprocessing Training and Validation Data

This section handles the crucial task of preparing both training and validation datasets for model training and evaluation. It involves loading images, applying preprocessing steps, and organizing the data for effective use.

**Training Data**

The training images for numbers, lowercase, and uppercase characters are loaded and preprocessed. This includes:

1. Resizing: Each image is resized to 25x25 pixels to standardize the input size.
2. Binarization: Otsu's thresholding converts grayscale images into binary (0/1) images for simplified feature extraction.
3. Flattening: The binary images are converted into 1-dimensional arrays (1-row binary matrices) for compatibility with machine learning algorithms.
4. Labeling: Each image is associated with its corresponding character label.

**Validation Data**

Similarly, validation images are loaded and preprocessed using the same steps. This validation set serves as an independent dataset to assess the performance of the trained model on unseen data, providing a more realistic estimate of its generalization capabilities.

> Note: The validation set used here contains only images of the digit '2', so every validation image is labeled '2'.

By carefully preparing both training and validation data, we ensure that the model is trained on a representative dataset and evaluated on its ability to generalize to new examples.

In [None]:
for category, (path, chars) in train_path.items():
  print(f'Loading {category}: [', end='')
  for char in chars:
    current_path = path + '/' + char
    for img_path in glob.glob(current_path + '/*.png'):
      img_raw = cv2.imread(img_path, 0)
      train_values.append(process_image(img_raw)[0])
      train_tags.append(char)
    print(f'{char}', end='')
  print('] Done')

In [None]:
print(f'Loading validation: [', end='')
for img_path in glob.glob(validation_path + '/' + '*.png'):
  img_raw = cv2.imread(img_path, 0)
  validation_values.append(process_image(img_raw)[0])
  validation_tags.append('2')  # Label all validation images as '2' in this use case
  print(f'.', end='')
print('] Done')
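
After loading, a quick sanity check of the feature matrix and the class balance can catch empty folders or unreadable files early. The sketch below uses a hypothetical `summarize_dataset` helper and a tiny made-up dataset; in the notebook it would be called with `train_values` and `train_tags`.

```python
from collections import Counter
import numpy as np

def summarize_dataset(values, tags):
    """Return the feature-matrix shape and per-class counts, checking basic invariants."""
    if len(values) != len(tags):
        raise ValueError('values and tags must have the same length')
    X = np.asarray(values)
    if not np.isin(X, (0, 1)).all():
        raise ValueError('expected binarized features containing only 0 and 1')
    return X.shape, Counter(tags)

# Made-up stand-in for train_values / train_tags (three flattened 25x25 images)
demo_values = [np.zeros(625, dtype=np.uint8),
               np.ones(625, dtype=np.uint8),
               np.zeros(625, dtype=np.uint8)]
demo_tags = ['0', '0', 'a']

shape, counts = summarize_dataset(demo_values, demo_tags)
print(shape, dict(counts))  # (3, 625) {'0': 2, 'a': 1}
```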

### Preparing Data for Model Training

This section formats the preprocessed data for use with machine learning algorithms. Currently, the code uses practically all available samples for training: `train_test_split` is called with `test_size=1`, which shuffles the data and sets aside a single sample that is then discarded, so no separate test set (`X_test`) is kept. As a result, model evaluation relies solely on the validation set (`validation_values` and `validation_tags`).

Without a held-out test set, overfitting (where the model performs well on the training data but poorly on unseen data) can go unnoticed. Ideally, a portion of the data should be reserved for testing to assess the model's generalization capabilities.

In [None]:
X_train, _, y_train, _ = train_test_split(train_values, train_tags, test_size=1, random_state=0)
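
As a hedged illustration of the hold-out idea discussed above, the following sketch reserves 20% of a synthetic dataset for testing and uses the `stratify` argument so that every class keeps the same proportion in both splits. The data, the 20% ratio, and the `demo_` names are assumptions for illustration, not the notebook's actual configuration.

```python
from collections import Counter
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the flattened binary images: 3 classes, 20 samples each
rng = np.random.default_rng(0)
demo_X = rng.integers(0, 2, size=(60, 625))
demo_y = np.repeat(['a', 'b', 'c'], 20)

# Hold out 20% of the samples, keeping the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    demo_X, demo_y, test_size=0.2, stratify=demo_y, random_state=0)

print(X_tr.shape, X_te.shape)
print(Counter(y_te.tolist()))
```

With such a split, accuracy on `X_te` gives an estimate of generalization that does not depend on a validation set containing a single character class.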

While the current setup uses raw pixel values, further preprocessing can sometimes improve model performance. For instance, standardizing the data can be beneficial for certain algorithms.

**Standardization?**

Standardization of a dataset is often necessary for machine learning estimators because many of them assume that the features follow a standard normal distribution (Gaussian with 0 mean and unit variance). If the features don't roughly adhere to this distribution, the model's performance might suffer.

You can achieve standardization using `StandardScaler` as shown below:

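```python
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# Fit the scaler on the training features only
X_train = sc.fit_transform(train_values)
# Reuse the training statistics to scale the validation features (not the labels)
X_validation = sc.transform(validation_values)
```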
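
For intuition, the per-feature computation behind standardization can be sketched in NumPy: subtract the training mean and divide by the training standard deviation, column by column. The `standardize` helper and the toy matrices are hypothetical; the guard for constant columns mirrors the usual convention of leaving zero-variance features unscaled, which matters here because border pixels are often identical across all images.

```python
import numpy as np

def standardize(train, other):
    """Scale both arrays with the mean and standard deviation of the training data."""
    train = np.asarray(train, dtype=float)
    other = np.asarray(other, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return (train - mean) / std, (other - mean) / std

# Toy binary features: the middle column is constant across the training samples
toy_train = [[0, 1, 1],
             [1, 1, 0],
             [1, 1, 1],
             [0, 1, 0]]
toy_valid = [[1, 1, 1]]

z_train, z_valid = standardize(toy_train, toy_valid)
print(z_train.mean(axis=0))  # approximately zero in every column
print(z_valid)               # [[1. 0. 1.]]
```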

### Model Training and Evaluation

This section delves into the core of the character recognition process, involving:

* Model training: Utilizes the preprocessed training data to train various machine learning models (LDA, KNN, Random Forest) for character classification.
* Model evaluation: Assesses the performance of the trained models using validation data and metrics like confusion matrices and accuracy scores.

This systematic approach aims to identify the most effective model for accurately recognizing characters within highway information panels.

#### Training the LDA Model

We begin by training a Linear Discriminant Analysis (LDA) model for character recognition. In this initial step, we utilize all available training data to fit the LDA model. This means that the model learns to distinguish between characters based on the patterns and features present in the entire training dataset.

**Dimensionality Reduction with LDA**

A key aspect of LDA is its ability to reduce the dimensionality of feature vectors. The `LinearDiscriminantAnalysis.fit()` method finds the LDA projection matrix. The `transform` method of the fitted LDA object then projects the original feature matrix into a lower-dimensional space (at most `n_classes - 1` dimensions; the code below keeps a single component), resulting in the Column-Row (CR) Matrix. This dimensionality reduction can improve model efficiency and potentially enhance performance by focusing on the most discriminative features.

In [None]:
lda = LinearDiscriminantAnalysis(n_components=1)

X_train_lda = lda.fit_transform(X_train, y_train)
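
The effect of `n_components` can be seen on a small synthetic problem. The sketch below is an assumption-laden toy (three Gaussian clusters in four dimensions, `demo_` names invented for illustration) and is not the notebook's data: with three classes, LDA can project onto at most two components.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Three well-separated synthetic classes in a 4-dimensional feature space
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0],
                    [4, 4, 0, 0],
                    [0, 4, 4, 0]], dtype=float)
demo_y = np.repeat([0, 1, 2], 30)
demo_X = centers[demo_y] + rng.normal(scale=0.5, size=(90, 4))

# With 3 classes, at most n_classes - 1 = 2 discriminant components exist
demo_lda = LinearDiscriminantAnalysis(n_components=2)
demo_proj = demo_lda.fit_transform(demo_X, demo_y)

print(demo_proj.shape)                   # (90, 2)
print(demo_lda.score(demo_X, demo_y))    # training accuracy on the toy data
```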

#### LDA values prediction and Validation

After training the LDA model and preprocessing the validation images, we use the trained model to predict the character present in each validation image. The `lda.predict(validation_values)` call takes the preprocessed validation images as input and returns an array of predicted character labels. Prediction operates on the original 625-pixel feature vectors; in scikit-learn, `n_components` only affects `transform`. These predictions represent the model's best guess for the character in each image based on the patterns it learned from the training data.

By comparing these predictions to the actual character labels (which are all '2' in this case), we can evaluate the accuracy of the LDA model on unseen data.

**Evaluating LDA Performance**

To gain a deeper understanding of the LDA model's performance, we generate a confusion matrix and calculate the accuracy score.

* Confusion Matrix: The confusion matrix provides a detailed breakdown of the model's predictions, showing for each true label (row) how many images were assigned to each predicted label (column). Because every validation image is a '2', only the row for '2' is populated, and its off-diagonal entries reveal which characters the model confuses with '2'.
* Accuracy Score: The accuracy score represents the overall percentage of correct predictions made by the model on the validation set. It provides a simple and intuitive metric to assess the model's overall performance.

By analyzing these evaluation metrics, we can gain insights into the strengths and weaknesses of the LDA model and make informed decisions about potential improvements or alternative approaches.

In [None]:
# Value prediction
y_pred_lda = lda.predict(validation_values)
print(y_pred_lda)

# Validation
cm = confusion_matrix(validation_tags, y_pred_lda)
plot_confusion_matrix(cm)
print('Accuracy: ' + str(accuracy_score(validation_tags, y_pred_lda)))
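
To show how these two metrics behave when every true label is '2', here is a small sketch with made-up predictions. The `demo_` values are invented for illustration; passing an explicit `labels` list keeps track of which character each row and column refers to, since `plot_confusion_matrix` labels its ticks with indices only.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up example: six validation images of '2', two of them misclassified
demo_true = ['2'] * 6
demo_pred = ['2', '2', 'Z', '2', 'z', '2']

# Explicit label order, so row/column i corresponds to demo_labels[i]
demo_labels = sorted(set(demo_true) | set(demo_pred))  # ['2', 'Z', 'z']
demo_cm = confusion_matrix(demo_true, demo_pred, labels=demo_labels)
demo_acc = accuracy_score(demo_true, demo_pred)

print(demo_labels)
print(demo_cm)   # only the first row (true label '2') is non-zero
print(demo_acc)  # 4 correct out of 6
```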

#### Evaluating a KNN Classifier with 5 Neighbors

We assess the performance of a KNN classifier with k=5 using the `.score` method on the validation set. This provides a direct measure of the model's accuracy on unseen data, helping us gauge its generalization capabilities.

In [None]:
# Training
knn = KNeighborsClassifier(5)
knn.fit(X_train, y_train)

# Validation
score = knn.score(validation_values, validation_tags)
print('Accuracy: ' + str(score))
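
For intuition about what a k-nearest-neighbors classifier does at prediction time, the sketch below implements a brute-force majority vote in NumPy using Euclidean distance. The `knn_predict` helper and the toy vectors are hypothetical, and details such as tie-breaking may differ from scikit-learn's `KNeighborsClassifier`.

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of x by majority vote among its k nearest training vectors."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]              # indices of the k closest samples
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy binary feature vectors: class 'a' is mostly zeros, class 'b' mostly ones
toy_X = np.array([[0, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0],
                  [1, 1, 1, 1],
                  [1, 1, 1, 0],
                  [1, 1, 0, 1]], dtype=float)
toy_y = np.array(['a', 'a', 'a', 'b', 'b', 'b'])

pred_a = knn_predict(toy_X, toy_y, np.array([0, 0, 0, 0], dtype=float), k=3)
pred_b = knn_predict(toy_X, toy_y, np.array([1, 1, 1, 1], dtype=float), k=3)
print(pred_a, pred_b)  # a b
```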

#### Exploring Random Forest Classification with Depth 15 and 10 Estimators

We employ a Random Forest classifier with a maximum depth of 15 and 10 decision trees (estimators), where each split considers a single randomly chosen feature (`max_features=1`). Because no `random_state` is fixed, the inherent randomness in the algorithm's tree construction means the performance metrics (such as accuracy) may vary slightly between executions.


In [None]:
# Training
rf = RandomForestClassifier(max_depth=15, n_estimators=10, max_features=1)
rf.fit(X_train, y_train)

# Validation
score = rf.score(validation_values, validation_tags)
print('Accuracy: ' + str(score))
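
If reproducible results are preferred, the run-to-run variation can be removed by fixing the `random_state` parameter. The sketch below demonstrates this on synthetic data; the data, the seed value, and the `fit_predict` helper are assumptions for illustration, while the remaining hyperparameters match the cell above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary features; the label depends on the first five columns
rng = np.random.default_rng(0)
demo_X = rng.integers(0, 2, size=(120, 25))
demo_y = (demo_X[:, :5].sum(axis=1) > 2).astype(int)

def fit_predict(seed):
    """Train a forest with a fixed seed and return its predictions on the training data."""
    forest = RandomForestClassifier(max_depth=15, n_estimators=10,
                                    max_features=1, random_state=seed)
    return forest.fit(demo_X, demo_y).predict(demo_X)

# Two runs with the same seed build identical forests
same = np.array_equal(fit_predict(0), fit_predict(0))
print(same)  # True
```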

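> Note: In this experiment, KNN achieves the highest validation accuracy of the three classifiers. One plausible reason is that KNN compares all 625 pixel features at once when measuring distances, whereas the Random Forest above is restricted to a single randomly chosen feature per split (`max_features=1`).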

## Automatic Road Panel Reading

This section delves into the development of a system capable of automatically identifying and interpreting road panels. By leveraging image processing techniques and machine learning algorithms, we aim to extract crucial information from these panels, such as warnings and directions.

![](https://drive.google.com/uc?export=view&id=1ppweByOCBLyfy6o_Jn8oE-scVFbBbFbY)
*This photograph captures a typical roadside scene featuring two distinct signs. Our focus lies on the sign with high contrast between the font color and background.*

### Extract and Recognize Characters from Images

* Contour detection: We utilize thresholding and OpenCV's contour detection utilities to identify potential character regions within the image.
* Size filtering: Only contours with dimensions similar to characters are considered for further processing.
* Character recognition: Extracted regions are fed into a character recognition algorithm for prediction.
* Result visualization: Recognized characters are overlaid onto the original image for visualization.

**Text Line Detection**

For improved text analysis, consider incorporating an inlier/outlier estimation technique such as RANSAC (Random Sample Consensus) to identify and align text lines within the image.
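
A minimal sketch of the RANSAC idea, assuming each detected contour is reduced to the center of its bounding box and that text lines are roughly horizontal: repeatedly fit a line through two random centers and keep the line supported by the most points. The `ransac_line` helper, its parameters, and the sample coordinates are hypothetical and not part of the notebook's pipeline.

```python
import numpy as np

def ransac_line(points, n_iter=100, tol=2.0, seed=0):
    """Return a boolean mask marking the points that lie on the best-supported line."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    best_mask = np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue  # skip vertical candidates; text lines are assumed roughly horizontal
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Inliers: points whose vertical distance to the candidate line is below tol
        mask = np.abs(pts[:, 1] - (m * pts[:, 0] + b)) < tol
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask

# Eight character centers on one slightly tilted text line, plus two stray contours
line_pts = [(x, 0.1 * x + 50) for x in range(0, 80, 10)]
stray_pts = [(20, 120), (50, 5)]
inliers = ransac_line(line_pts + stray_pts)
print(inliers.sum())  # 8
```

Contours flagged as outliers could then be discarded before classification, and the inliers sorted by their x coordinate to read the line from left to right.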

In [None]:
features = []

print('Loading cropped images', end='')
for img in glob.glob(panel_path + '/' + '*.png'):
  img_raw = cv2.imread(img)

  # Turn the image gray, apply a binary threshold and find contours
  img_gray = cv2.cvtColor(img_raw, cv2.COLOR_BGR2GRAY)
  _, img_binary = cv2.threshold(img_gray, 127, 255, cv2.THRESH_BINARY)
  contours, _ = cv2.findContours(img_binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

  # For every contour found
  for cnt in contours:
    x,y,w,h = cv2.boundingRect(cnt)

    # If it has a character size
    if (w>5 and h>5) and (w<50 and h<50):
      # Crop the character section
      char = img_binary[y:y+h,x:x+w]

      # Resize keeping the aspect ratio
      char_resized = imutils.resize(char, width=50, inter=cv2.INTER_AREA)

      # Canvas where to place the character so that it has the same format as the training data
      char_canvas = np.zeros((150, 150), dtype=np.uint8)

      # Calculate padding
      x_i = int((150 - 50) / 2)
      x_f = x_i + 50
      y_i = int((150 - char_resized.shape[0]) / 2)
      y_f = y_i + char_resized.shape[0]

      try:
        char_canvas[y_i:y_f, x_i:x_f] = char_resized
      except ValueError:
        continue  # Character exceeds the canvas bounds, so skip it

      # Final processing
      char_processed = process_image(char_canvas)
      features.append(char_processed[0])

      # Character prediction with the proposed algorithms
      char_lda = lda.predict(char_processed)
      char_knn = knn.predict(char_processed)
      char_rf = rf.predict(char_processed)

      # Write the character into the image at the beginning of the contour
      # KNN proved to be the best classifier, so it is the one used to print the character
      cv2.putText(img_raw, (char_knn[0]), (x, y), cv2.FONT_HERSHEY_SIMPLEX,
                  .5, (255, 255, 255), 2, cv2.LINE_AA)

  # Final image with all predicted characters
  cv2_imshow(img_raw)
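
The padding step in the loop above can also be written as a small helper that checks the size explicitly instead of relying on an exception. This is a sketch under the same assumption as the loop (a 150x150 black canvas with the character centered); `center_on_canvas` is a hypothetical name.

```python
import numpy as np

def center_on_canvas(char, canvas_size=150):
    """Place a 2D character crop at the center of a square black canvas.

    Returns None when the crop does not fit on the canvas.
    """
    h, w = char.shape
    if h > canvas_size or w > canvas_size:
        return None
    canvas = np.zeros((canvas_size, canvas_size), dtype=char.dtype)
    y_i = (canvas_size - h) // 2
    x_i = (canvas_size - w) // 2
    canvas[y_i:y_i + h, x_i:x_i + w] = char
    return canvas

# An 80x50 white crop fits and is centered; a 200x50 crop is rejected
fits = center_on_canvas(np.full((80, 50), 255, dtype=np.uint8))
too_tall = center_on_canvas(np.full((200, 50), 255, dtype=np.uint8))
print(fits.shape, too_tall)  # (150, 150) None
```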

![](https://drive.google.com/uc?export=view&id=1n0O4YMA6Jy3PaoQ1ZS6PKqh5TLFBTs3q)
*This figure depicts the entire panel recognition process, including inline character detection. From left to right, binary thresholding separates characters, followed by detection and alignment of character contours. Finally, the rightmost section displays the predicted characters overlaid onto the panel, revealing the extracted text.*

**Thanks for following along!** 🙂 If you have implemented a similar approach or have ideas for improvement, I'd love to see your code.

Also, if you find this work helpful and would like to cite it, you can use the following BibTeX entry:

```bibtex
@misc{aaseper2024mlhipr,
   title        = {{Machine Learning-based Highway Information Panel Reading}},
   author       = {Alejandro Asensio},
   year         = 2024,
   howpublished = {\url{https://drive.google.com/file/d/13Vh_FzU2B65amu_QCTyWuon2wn0YgBSY/view?usp=sharing}}
}
```

© Copyright 2024 Alejandro Asensio Pérez.