# Recon

`Richard Rivaldo 13519185`

`Informatics Engineering Institut Teknologi Bandung`

**Recon** is a `Text Recognition` project that builds a `Deep Learning` model to recognize alphabetic text in pictures. Instead of relying on the well-known `Tesseract`, this project uses a `Convolutional Recurrent Neural Network`, which combines a CNN and an RNN in one pipeline. I also plan to use `CTC Loss` and an `Attention Model` in the project. This is actually my first project in `Computer Vision`, so let's see how it goes!

The repository of the project: [Recon](https://github.com/RichardRivaldo/Recon). The dataset used in this project can also be found in the repository.

# Preparations

**Libraries**

In [None]:
import os
import cv2
import numpy as np
import tensorflow as tf
from pathlib import Path
from PIL import Image as im
from google.colab import drive
import matplotlib.pyplot as plt
from tensorflow.keras import Sequential
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
from nltk.metrics.distance import edit_distance
from scipy import ndimage as inter  # scipy.ndimage.interpolation is deprecated; rotate lives in scipy.ndimage
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Dense, LSTM, BatchNormalization, Input, Conv2D, MaxPool2D, Lambda, Bidirectional, Layer

**Mount Drive**

In [None]:
drive.mount("/content/drive")

Mounted at /content/drive


**Path to Images Data**

In [None]:
%rm -rf images
!unzip "/content/drive/MyDrive/GaIB/Recon/images.zip"

Streaming output truncated to the last 5000 lines.
  inflating: images/95_Banes_5758.jpg  
  inflating: images/94_Undisclosed_82317.jpg  
  inflating: images/93_Affiliating_1411.jpg  
  inflating: images/92_COEQUALS_14685.jpg  
  inflating: images/91_Shadowier_69823.jpg  
  inflating: images/90_acquired_811.jpg  
  inflating: images/89_snoopier_72273.jpg  
  inflating: images/88_HUGHES_37216.jpg  
  inflating: images/87_Depopulates_20746.jpg  
  inflating: images/86_REVIEWS_65747.jpg  
  inflating: images/85_Craniums_17804.jpg  
  inflating: images/84_MISCONCEIVED_48832.jpg  
  inflating: images/83_philately_57195.jpg  
  inflating: images/82_Bullfrogs_10133.jpg  
  inflating: images/81_dengue_20578.jpg  
  inflating: images/80_preregistering_59794.jpg  
  inflating: images/79_FOUNDER_30506.jpg  
  inflating: images/78_millimeters_48468.jpg  
  inflating: images/77_endear_25654.jpg  
  inflating: images/76_MAHDI_46074.jpg  
  inflating: images/75_BULLETINS_10119.jpg  
  i

In [None]:
IMAGES_PATH = "/content/images"

# Image Preprocessing

**Get Images Path**

In [None]:
def get_images_path(path):
    # Get all image file names
    images_name = os.listdir(path)
    # Build the full path only for images whose label consists of lowercase letters
    return [os.path.join(path, image_name) for image_name in images_name if 
            image_name.split("_")[1].isalpha() and image_name.split("_")[1].islower()]
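
To see what the filter keeps, here is a small sketch that applies the same check to a few file names taken from the unzip output above. The helper name `has_lowercase_label` is hypothetical and only for illustration, and it assumes the `<index>_<label>_<id>.jpg` naming seen in the dataset.

```python
def has_lowercase_label(file_name):
    # The label is the second underscore-separated field, e.g. "acquired"
    label = file_name.split("_")[1]
    return label.isalpha() and label.islower()

sample_names = [
    "95_Banes_5758.jpg",      # mixed case, dropped
    "92_COEQUALS_14685.jpg",  # uppercase, dropped
    "90_acquired_811.jpg",    # lowercase, kept
    "81_dengue_20578.jpg",    # lowercase, kept
]
kept = [name for name in sample_names if has_lowercase_label(name)]
print(kept)
```

Only the two lowercase labels survive, which is why the model later needs just 26 character classes.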

**Reading the Images**

In [None]:
def read_all_images(images_list):
    return [cv2.imread(image_path) for image_path in images_list]

**Deskew Images**

In [None]:
# Score a candidate rotation angle using the horizontal projection profile
# The angle with the highest score is taken as the skew angle of the image
def determine_score(arr, angle):
    data = inter.rotate(arr, angle, reshape=False, order=0)
    histogram = np.sum(data, axis=1)
    score = np.sum((histogram[1:] - histogram[:-1]) ** 2)
    return histogram, score

def deskew_images(all_images, delta=0.08, limit=12):
    rotated_images = []
    for image in all_images:
        # Find the threshold of the image after grayscaled
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] 

        # Find the score of the image in a certain skew angle
        scores = []
        angles = np.arange(-limit, limit + delta, delta)
        for angle in angles:
            _, score = determine_score(thresh, angle)
            scores.append(score)
        # Find the best skew angle for the image 
        best_angle = angles[scores.index(max(scores))]

        # Rotate the image based on the best skew angle
        height, width = image.shape[:2]
        # Get the center of the image and generate rotation matrix from it
        center = (width // 2, height // 2)
        M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
        # Rotate the image 
        rotated = cv2.warpAffine(image, M, (width, height), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
        rotated_images.append(rotated)

    return rotated_images
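
The scoring idea is easier to see on a toy example. The sketch below uses only NumPy and a hypothetical helper `profile_score` that mirrors the score in `determine_score` without the rotation step. Fully inked rows (what a well-aligned line of text looks like) give a row-sum histogram with sharp jumps, while the same amount of ink laid out diagonally gives a flat histogram.

```python
import numpy as np

def profile_score(binary_image):
    # Row sums act as a horizontal projection profile
    histogram = np.sum(binary_image, axis=1)
    # Big jumps between neighbouring rows mean sharply separated lines
    return int(np.sum((histogram[1:] - histogram[:-1]) ** 2))

# Two fully inked rows: a toy stand-in for deskewed text lines
aligned = np.zeros((8, 8), dtype=np.int64)
aligned[2, :] = 1
aligned[5, :] = 1

# Same number of inked pixels, but spread along the two diagonals
skewed = np.eye(8, dtype=np.int64) + np.fliplr(np.eye(8, dtype=np.int64))

aligned_score = profile_score(aligned)
skewed_score = profile_score(skewed)
print(aligned_score, skewed_score)
```

The aligned image scores much higher, so picking the angle with the maximum score favours rotations that line the text up with the rows.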

**Resize Image to Uniform Size**

In [None]:
# Resize with linear interpolation to 100x40 dimension
def resize_images(all_images, height=40, width=100):
    resize_dim = (width, height)
    return [cv2.resize(image, resize_dim, interpolation=cv2.INTER_LINEAR) for image in all_images]

**Convert to Grayscale**

In [None]:
def convert_to_grayscale(all_images):
    return [cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) for image in all_images]

**Binarization**

In [None]:
def binarization(all_images):
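    # The flags must be combined into one argument (the fifth positional argument is the output array)
    # With THRESH_OTSU the threshold is picked automatically, so the 180 passed here is ignored
    return [cv2.threshold(image, 180, 230, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1] for image in all_images]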

**Image Dilation**

In [None]:
# Usually applied only to binarized image
def dilate_images(all_images, kernel_shape=(1, 1), num_iter=1):
    dilated_images = []
    # Dilate the images
    # Dilate basically tries to increase the size of an object in the image
    kernel = np.ones(kernel_shape, np.uint8)
    for image in all_images:
        dilated_images.append(cv2.dilate(image, kernel, iterations=num_iter))
    
    return dilated_images

**Images Erosion**

In [None]:
def erode_images(all_images, kernel_shape=(1, 1), num_iter=1):
    # Erode the images
    # Erosion is the opposite of dilation: it shrinks bright objects by setting the pixels on their borders to 0
    eroded_images = []
    for image in all_images:
        kernel = np.ones(kernel_shape, np.uint8)
        eroded_image = cv2.erode(image, kernel, iterations=num_iter)
        eroded_images.append(eroded_image)
    
    return eroded_images
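
For intuition, dilation and erosion can be sketched in plain NumPy as a sliding window maximum and minimum. This is only an illustration under my own assumptions (odd-sized rectangular kernel, 2D `uint8` image, a hypothetical helper called `morph`), not how OpenCV implements it. It also shows that a `(1, 1)` kernel, the default in the functions above, leaves the image unchanged.

```python
import numpy as np

def morph(image, kernel_shape, mode):
    # Minimal sketch for odd-sized rectangular kernels on a 2D uint8 image
    kernel_h, kernel_w = kernel_shape
    pad_h, pad_w = kernel_h // 2, kernel_w // 2
    # Dilation takes the window maximum, erosion the window minimum
    reduce_window = np.max if mode == "dilate" else np.min
    # Pad with a value that can never win the max/min at the border
    pad_value = 0 if mode == "dilate" else 255
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), constant_values=pad_value)
    result = np.empty_like(image)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            result[row, col] = reduce_window(padded[row:row + kernel_h, col:col + kernel_w])
    return result

# A single bright pixel in the middle of a dark image
dot = np.zeros((5, 5), dtype=np.uint8)
dot[2, 2] = 255

dilated = morph(dot, (3, 3), "dilate")     # the dot grows into a 3x3 block
eroded = morph(dilated, (3, 3), "erode")   # the block shrinks back to the dot
unchanged = morph(dot, (1, 1), "dilate")   # a 1x1 window only sees the pixel itself
print(int((dilated == 255).sum()))
```

So to actually thicken or thin the strokes, a kernel larger than `(1, 1)` would be needed.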

**Image Smoothening**

In [None]:
# Combine both binary threshold and Otsu Threshold to smoothen the images
# Also use Gaussian Blur to soften the images
def smoothen_images(all_images):
    smooth_images = []
    for image in all_images:
        # Arguments are (image, threshold, max value, type): pixels above 200 become 255
        _, first_threshold = cv2.threshold(image, 200, 255, cv2.THRESH_BINARY)
        _, second_threshold = cv2.threshold(first_threshold, 180, 230, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        blurred_image = cv2.GaussianBlur(second_threshold, (5, 5), 0)
        _, third_threshold = cv2.threshold(blurred_image, 180, 230, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        smooth_images.append(third_threshold)
    return smooth_images

**Open-Closed Morphology Transformation**

In [None]:
def transform_morph(all_images):
    transformed_images = []
    for image in all_images:
        # Apply an Otsu threshold to the image (the flags are combined into one argument)
        _, filtered = cv2.threshold(image.astype(np.uint8), 180, 220, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        kernel = np.ones((1, 1), np.uint8)
        # Apply opening, then closing, to the thresholded image
        opening = cv2.morphologyEx(filtered, cv2.MORPH_OPEN, kernel)
        closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
        transformed_images.append(closing)
    return transformed_images

**Noise Removal**

In [None]:
def noise_removal(all_images):
    # Erode and dilate the image
    eroded_images = erode_images(all_images)
    dilated_images = dilate_images(eroded_images)
    # Get both transformed and smooth images from previous functions
    transformed_images = transform_morph(dilated_images)
    smooth_images = smoothen_images(dilated_images)

    # Zip both lists and apply a bitwise OR to each pair of images
    transformed_smooth_images = list(zip(smooth_images, transformed_images))
    return [cv2.bitwise_or(smooth_image, morph_image) for smooth_image, morph_image in transformed_smooth_images]

**Main Preprocessing Caller**

In [None]:
IMAGE_HEIGHT, IMAGE_WIDTH = (32, 128)

In [None]:
def preprocess_images(all_images):
    # Deskew images
    deskewed_images = deskew_images(all_images)
    # Resize the images
    resized_images = resize_images(deskewed_images, height=IMAGE_HEIGHT, width=IMAGE_WIDTH)
    # Convert the images into grayscaled images
    grayscaled_images = convert_to_grayscale(resized_images)
    # Binarize the images with Otsu Threshold
    # binarized_images = binarization(grayscaled_images)
    # Remove the noises in the images
    # final_images = smoothen_images(binarized_images)

    return grayscaled_images

Phew.. Man, that was one hell of a preprocessing. It's not perfect for some pictures: words whose characters stick together, words that are too thin, or images whose resolution is just bad. Even so, for good images and words, it works well enough to make the word look bolder and clearer than in the provided image. Learnt many things from this process, and yep, Computer Vision is hard, dude. :"D

Me from the future: turns out that I ended up using only some basic techniques (deskewing, resizing, and grayscaling) instead of all the preprocessing functions :)) Nevertheless, these functions might come in handy for future projects!

# Building the Dataset

**Extract Labels from Files' Names**

In [None]:
def extract_label(images_path):
    # Get the list of all images files in the path by stemming the full path
    return [(Path(label).stem).split("_")[1] for label in images_path]

**Building Dataset**

In [None]:
def build_data(path):
    # Get all images path
    images_list = get_images_path(path)
    # Get all images from files in the path
    raw_images = read_all_images(images_list)
    # Preprocess all images through the same pipeline
    final_images = preprocess_images(raw_images)
    # Extract all labels of the image from the path
    image_labels = extract_label(images_list)

    return np.array(final_images), np.array(image_labels)

**Convert the Label into Sequences**

In [None]:
ALL_CHARS = "abcdefghijklmnopqrstuvwxyz"
def generate_index_char():
    char_dict = {}
    for i in range(len(ALL_CHARS)):
        char_dict[ALL_CHARS[i]] = i
    
    return char_dict

In [None]:
def convert_to_sequence(labels):
    char_dict = generate_index_char()
    all_sequence = []
    
    for label in labels:
        sequence_labels = []
        for char in label:
            sequence_labels.append(char_dict.get(char))
        all_sequence.append(sequence_labels)
    
    return all_sequence
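
As a quick worked example, this is what one label looks like after encoding and padding. It is a self-contained sketch with a hypothetical helper `encode_label` that folds the padding step (done later with `pad_sequences`) into the same function, and it assumes the label is no longer than `max_len`.

```python
ALL_CHARS = "abcdefghijklmnopqrstuvwxyz"
PAD_VALUE = len(ALL_CHARS)  # 26, the padding value used later in this notebook

def encode_label(label, max_len):
    # Map each character to its position in the alphabet (a -> 0, ..., z -> 25)
    sequence = [ALL_CHARS.index(char) for char in label]
    # Post-pad with 26 until the sequence reaches max_len
    return sequence + [PAD_VALUE] * (max_len - len(sequence))

encoded = encode_label("fanged", max_len=8)
print(encoded)  # [5, 0, 13, 6, 4, 3, 26, 26]
```

The true label length (6 here) is passed to the CTC loss separately, so the padded positions are not counted as characters.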

**Generate Timesteps for Bi-LSTM**

In [None]:
# Timesteps tell the CTC loss how many steps of each prediction sequence to consider
# In this case, the maximum label length in the data is used for every sample
def generate_timesteps(data_size, steps):
    return np.array([steps] * data_size)

# Modelling

**Modelling the Attention Mechanism**

In [204]:
# https://stackoverflow.com/questions/62948332/how-to-add-attention-layer-to-a-bi-lstm
class Attention(Layer):
    # Constructor of the layer
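    def __init__(self, return_sequences=True, **kwargs):
        # Accept the standard Layer kwargs (e.g. name) so the config from get_config() can rebuild the layer
        super(Attention, self).__init__(**kwargs)
        self.return_sequences = return_sequences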
    # Build the layer based on the Layer super class
    def build(self, input_shape):
        self.W=self.add_weight(name="att_weight", shape=(input_shape[-1], 1),
                               initializer="normal")
        self.b=self.add_weight(name="att_bias", shape=(input_shape[1], 1),
                               initializer="zeros")
        super(Attention, self).build(input_shape)
    # Caller
    def call(self, x):
        e = K.tanh(K.dot(x, self.W) + self.b)
        a = K.softmax(e, axis=1)
        output = x * a
        if self.return_sequences:
            return output
        return K.sum(output, axis=1)
    # Config
    def get_config(self):
        config = super().get_config().copy()
        config.update({
            'return_sequences': self.return_sequences 
        })
        return config
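
The math inside `call` can be checked with a few lines of NumPy. This is a sketch with random stand-in values for the input and the weights (the shapes and the seed are my own assumptions), just to show that the softmax runs over the time axis and that `return_sequences` only decides whether the weighted timesteps are summed.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, features = 2, 5, 4

x = rng.normal(size=(batch, timesteps, features))  # stand-in for the Bi-LSTM output
W = rng.normal(size=(features, 1))                 # stand-in for att_weight
b = np.zeros((timesteps, 1))                       # stand-in for att_bias

# Score every timestep: shape (batch, timesteps, 1)
e = np.tanh(x @ W + b)
# Softmax over the time axis, so the weights of one sample sum to 1
exp_e = np.exp(e - e.max(axis=1, keepdims=True))
a = exp_e / exp_e.sum(axis=1, keepdims=True)

# return_sequences=True keeps one reweighted vector per timestep
weighted = x * a
# return_sequences=False would sum them into a single context vector
context = weighted.sum(axis=1)
print(weighted.shape, context.shape)
```

The CRNN needs the per-timestep version, because CTC expects one prediction per timestep.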

**Construct the CRNN Model with Connectionist Temporal Classification (CTC) Loss**

In [205]:
def construct_model(n_units=64, dropouts=0.2, MAX_LABEL_LEN=23):
    # Input tensor with image's dimensions, the last dimension is for grayscaled image
    input_layer = Input(shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 1))
    # First downsampling block, the Conv2D will use kernel size of (3, 3)
    downsampling_1 = Conv2D(n_units, (3, 3), activation="relu", padding="same")(input_layer)
    pooling_1 = MaxPool2D(pool_size=(2, 2), strides=2)(downsampling_1)
    # Second downsampling block, same as before
    downsampling_2 = Conv2D(n_units * 2, (3, 3), activation="relu", padding="same")(pooling_1)
    pooling_2 = MaxPool2D(pool_size=(2, 2), strides=2)(downsampling_2)
    # Third downsampling block, the Conv2D will use kernel size of (3, 3)
    downsampling_3_1 = Conv2D(n_units * 4, (3, 3), activation="relu", padding="same")(pooling_2)
    downsampling_3_2 = Conv2D(n_units * 4, (3, 3), activation="relu", padding="same")(downsampling_3_1)
    pooling_3 = MaxPool2D(pool_size=(2, 1))(downsampling_3_2)
    # Fourth downsampling + batch normalization to ease operations
    downsampling_4 = Conv2D(n_units * 4, (3, 3), activation="relu", padding="same")(pooling_3)
    normalize_1 = BatchNormalization()(downsampling_4)
    # Fifth downsampling, normalized + applied pooling
    downsampling_5 = Conv2D(n_units * 8, (3, 3), activation="relu", padding="same")(normalize_1)
    normalize_2 = BatchNormalization()(downsampling_5)
    pooling_4 = MaxPool2D(pool_size=(2, 1))(normalize_2)
    # Last CNN layer
    downsampling_6 = Conv2D(n_units * 8, (2, 2), activation="relu")(pooling_4)
    # Squeeze the output of CNN before
    lambda_squeeze = Lambda(lambda x: K.squeeze(x, 1))(downsampling_6)
    # Beginning the RNN layers
    bi_lstm_1 = Bidirectional(LSTM(n_units * 2, return_sequences=True, dropout=dropouts))(lambda_squeeze)
    bi_lstm_2 = Bidirectional(LSTM(n_units * 4, return_sequences=True, dropout=dropouts))(bi_lstm_1)
    bi_lstm_3 = Bidirectional(LSTM(n_units * 8, return_sequences=True, dropout=dropouts))(bi_lstm_2)
    # Attention layer for LSTM
    attention = Attention(return_sequences=True)(bi_lstm_3)
    # Dense output layer with Softmax activation function to find chars with highest probabilities
    # The units will be 27 (26 letters + 1 extra class that CTC uses as the blank token)
    output_layer = Dense(27, activation="softmax")(attention)

    # Define the functional model to use in testing phase
    test_model = Model(input_layer, output_layer)

    # Define the CTC Loss function
    # Need 4 args: Predicted output, ground truth label, steps for both input and output
    def ctc_loss(args):
        y_pred, labels, input_length, label_length = args
        return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

    # Define the training model additional inputs for CTC Loss
    labels = Input(shape=[MAX_LABEL_LEN], dtype="float32")
    input_length = Input(shape=[1], dtype="int64")    
    label_length = Input(shape=[1], dtype="int64")

    # Define the loss output with Lambda function for CTC Loss
    loss_function = Lambda(ctc_loss, output_shape=(1, ), name="ctc")([output_layer, labels, input_length, label_length])

    # Create the model used in training session
    training_model = Model(inputs=[input_layer, labels, input_length, label_length], outputs=loss_function)

    return training_model, test_model
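
To double check the shapes, the spatial size of the feature map can be traced by hand. The sketch below is plain Python with a hypothetical helper `crnn_output_shape`. It assumes the layer settings in `construct_model` above: the `"same"`-padded convolutions keep the size, so only the pooling layers and the last unpadded `(2, 2)` convolution change it.

```python
def crnn_output_shape(height, width):
    # (layer name, pooling factor on height, pooling factor on width)
    pooling_layers = [
        ("pooling_1 (2, 2)", 2, 2),
        ("pooling_2 (2, 2)", 2, 2),
        ("pooling_3 (2, 1)", 2, 1),
        ("pooling_4 (2, 1)", 2, 1),
    ]
    for name, pool_h, pool_w in pooling_layers:
        height, width = height // pool_h, width // pool_w
        print(f"{name}: {height} x {width}")
    # A (2, 2) kernel without padding trims one row and one column
    height, width = height - 1, width - 1
    print(f"downsampling_6 (2, 2): {height} x {width}")
    return height, width

final_height, timesteps = crnn_output_shape(32, 128)
```

The height collapses to 1, which is why squeezing axis 1 works, and the remaining width of 31 is the number of timesteps the Bi-LSTM layers see per image. Keras documents the `input_length` argument of `K.ctc_batch_cost` as the sequence length of each item in `y_pred`, so this 31 may be worth trying in place of the maximum label length when generating timesteps. Whether that improves the loss here is untested.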

**Train and Fit the Model**

In [206]:
def train_and_fit(epochs=50, batch_size=256):
    # Initialize model saving checkpoint, callbacks list, and file path to save only the best parameters
    # The metric to determine which model is the best is the Validation Loss score 
    MODEL_SAVE_FILE_PATH = "Recon.hdf5"
    checkpoint = ModelCheckpoint(filepath=MODEL_SAVE_FILE_PATH, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
    callbacks_list = [checkpoint]

    training_model.fit(
        x=[training_images, training_sequences, training_timesteps, training_label_length],
        y=np.zeros(len(training_images)),
        batch_size=batch_size,
        epochs=epochs,
        validation_data=(
            [validation_images, validation_sequences, validation_timesteps, validation_label_length],
            [np.zeros(len(validation_images))]),
        verbose=1, callbacks=callbacks_list)

# Metrics

**Predict All Validation Data**

In [None]:
def get_predicted_label(prediction):
    full_text = []
    for pred in prediction:
        if int(pred) != -1:
            full_text.append(ALL_CHARS[int(pred)])
    return "".join(full_text)

In [None]:
def get_all_predicted_labels():
    # Make and decode the predictions
    predictions = test_model.predict(validation_images)
    decoded_preds = K.get_value(K.ctc_decode(predictions, 
                                        input_length=np.ones(predictions.shape[0]) * predictions.shape[1],
                                        greedy=True)[0][0])
    
    # Store all predicted labels
    all_predictions = []
    for prediction in decoded_preds:
        all_predictions.append(get_predicted_label(prediction))
    
    return np.array(all_predictions)
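
Conceptually, greedy CTC decoding takes the most likely class at every timestep, merges repeated classes, and then drops the blanks. The sketch below is a hand-rolled illustration of that idea for a single image (the helper `greedy_ctc_decode` is hypothetical, and it assumes the blank is the last class, index 26). It is not the Keras implementation.

```python
import numpy as np

ALL_CHARS = "abcdefghijklmnopqrstuvwxyz"
BLANK = len(ALL_CHARS)  # index 26, assumed to be the CTC blank

def greedy_ctc_decode(probabilities):
    # probabilities: array of shape (timesteps, 27) for one image
    best_path = np.argmax(probabilities, axis=1)
    decoded = []
    previous = None
    for index in best_path:
        # Skip repeats of the previous class and skip blanks
        if index != previous and index != BLANK:
            decoded.append(ALL_CHARS[index])
        previous = index
    return "".join(decoded)

def one_hot_path(path):
    # Build fake, fully confident "probabilities" for a given path
    return np.eye(len(ALL_CHARS) + 1)[path]

# h h _ o o _ e r  ->  "hoer"
print(greedy_ctc_decode(one_hot_path([7, 7, BLANK, 14, 14, BLANK, 4, 17])))
```

A doubled letter such as the `rr` in `torrider` only survives if the model emits a blank between the two `r` timesteps, otherwise the repeats are merged into one.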

**Calculating Accuracy Metrics**

In [None]:
def calculate_accuracy(validation_label, all_predictions):
    # Calculate the accuracy
    accuracy = (all_predictions == validation_label).mean()

    return accuracy

**Calculating Mean Levenshtein Distance Metrics**

In [None]:
def calculate_mean_levenshtein_distance(labels, predictions):
    # The manual recursive Levenshtein below is kept for reference only
    # It got stuck on some words (the naive recursion takes exponential time), so NLTK's edit_distance is used instead
    def levenshtein_score(label, predicted):
        # if not label:
        #     return len(predicted)
        # if not predicted:
        #     return len(label)
        # if label[-1] == predicted[-1]:
        #     cost = 0
        # else:
        #     cost = 1
        
        # return min([levenshtein_score(label[:-1], predicted) + 1,
        #             levenshtein_score(label, predicted[:-1]) + 1, 
        #             levenshtein_score(label[:-1], predicted[:-1]) + cost])
        return edit_distance(label, predicted)
    assert(len(labels) == len(predictions))

    # Calculate total distances for all strings in the labels and predictions
    total = 0
    for label, prediction in list(zip(labels, predictions)):
        total += levenshtein_score(label, prediction)
    
    # Return the mean distance
    return total / len(labels)
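
If you want to avoid the NLTK dependency, the usual fix for the slow recursion is the row-by-row dynamic programming version. This is a minimal sketch of that standard technique (the function name `levenshtein` is my own), not what the notebook actually ran.

```python
def levenshtein(label, predicted):
    # previous_row[j] is the distance between the label prefix seen so far and predicted[:j]
    previous_row = list(range(len(predicted) + 1))
    for i, label_char in enumerate(label, start=1):
        current_row = [i]
        for j, predicted_char in enumerate(predicted, start=1):
            cost = 0 if label_char == predicted_char else 1
            current_row.append(min(
                previous_row[j] + 1,         # delete a character from the label
                current_row[j - 1] + 1,      # insert a character into the label
                previous_row[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous_row = current_row
    return previous_row[-1]

print(levenshtein("guantanamo", "guantonamo"))   # 1
print(levenshtein("travelogue", "travelocues"))  # 2
```

Each pair of prefixes is computed only once, so the cost grows with the product of the two word lengths instead of exponentially.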

**Main Metrics Caller**

In [None]:
def output_metrics(validation_labels):
    # Get all predicted labels
    predictions = get_all_predicted_labels()
    # Accuracy
    print(f"Accuracy of the Model: {calculate_accuracy(validation_labels, predictions)}")
    # Mean Levenshtein Distance
    print(f"Mean Levenshtein Distance: {calculate_mean_levenshtein_distance(validation_labels, predictions)}")

# Main Pipeline

**Read All Images Data and Labels**

In [None]:
# Get all images data and labels
all_images_data, all_images_labels = build_data(IMAGES_PATH)

In [None]:
MAX_LABEL_LEN = max([len(label) for label in all_images_labels])

**Create Training Data**

In [207]:
# Create the training data with first 14000 images
n = 14000
training_images = all_images_data[:n]
training_labels = all_images_labels[:n]
training_label_length = np.array([len(text) for text in training_labels])

In [208]:
# Normalize the image, convert label to sequences and generate timesteps
training_images = training_images / 255.
training_sequences = convert_to_sequence(training_labels)
training_timesteps = generate_timesteps(len(training_images), MAX_LABEL_LEN)

In [209]:
# Pad the sequences at the end (post-padding)
# The padding value is 26, the number of unique chars, since 0 to 25 are already used by the letters
training_sequences = pad_sequences(training_sequences, maxlen=MAX_LABEL_LEN, padding='post', value = 26)

**Create Validation**

In [210]:
# Create the validation data with remaining images
validation_images = all_images_data[n:]
validation_labels = all_images_labels[n:]
validation_label_length = np.array([len(text) for text in validation_labels])

In [211]:
# Normalize the image, convert label to sequences and generate timesteps
validation_images = validation_images / 255.
validation_sequences = convert_to_sequence(validation_labels)
validation_timesteps = generate_timesteps(len(validation_images), MAX_LABEL_LEN)

In [212]:
# Pad the sequences at the end (post-padding)
# The padding value is 26, the number of unique chars, since 0 to 25 are already used by the letters
validation_sequences = pad_sequences(validation_sequences, maxlen=MAX_LABEL_LEN, padding='post', value = 26)

**Initiate the Model**

In [213]:
training_model, test_model = construct_model(MAX_LABEL_LEN=MAX_LABEL_LEN)

**Check the Models Summary**

In [214]:
training_model.summary()

Model: "model_13"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_26 (InputLayer)           [(None, 32, 128, 1)] 0                                            
__________________________________________________________________________________________________
conv2d_49 (Conv2D)              (None, 32, 128, 64)  640         input_26[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_28 (MaxPooling2D) (None, 16, 64, 64)   0           conv2d_49[0][0]                  
__________________________________________________________________________________________________
conv2d_50 (Conv2D)              (None, 16, 64, 128)  73856       max_pooling2d_28[0][0]           
___________________________________________________________________________________________

**Compile the Model**

In [215]:
training_model.compile(loss={"ctc": lambda y, y_pred: y_pred}, optimizer="adam")

**Train, Fit, and Save the Model**

In [234]:
train_and_fit()

Epoch 1/50

Epoch 00001: val_loss improved from inf to 5.78577, saving model to Recon.hdf5
Epoch 2/50

Epoch 00002: val_loss improved from 5.78577 to 5.61473, saving model to Recon.hdf5
Epoch 3/50

Epoch 00003: val_loss improved from 5.61473 to 5.51683, saving model to Recon.hdf5
Epoch 4/50

Epoch 00004: val_loss did not improve from 5.51683
Epoch 5/50

Epoch 00005: val_loss did not improve from 5.51683
Epoch 6/50

Epoch 00006: val_loss did not improve from 5.51683
Epoch 7/50

Epoch 00007: val_loss did not improve from 5.51683
Epoch 8/50

Epoch 00008: val_loss did not improve from 5.51683
Epoch 9/50

Epoch 00009: val_loss did not improve from 5.51683
Epoch 10/50

Epoch 00010: val_loss did not improve from 5.51683
Epoch 11/50

Epoch 00011: val_loss did not improve from 5.51683
Epoch 12/50

Epoch 00012: val_loss did not improve from 5.51683
Epoch 13/50

Epoch 00013: val_loss did not improve from 5.51683
Epoch 14/50

Epoch 00014: val_loss did not improve from 5.51683
Epoch 15/50

Epoch 00

# Inference and Testing

**Some Inferences**

In [230]:
# Load the trained model weights from the saved file
test_model.load_weights("Recon.hdf5")
test_model.summary()

Model: "model_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_26 (InputLayer)        [(None, 32, 128, 1)]      0         
_________________________________________________________________
conv2d_49 (Conv2D)           (None, 32, 128, 64)       640       
_________________________________________________________________
max_pooling2d_28 (MaxPooling (None, 16, 64, 64)        0         
_________________________________________________________________
conv2d_50 (Conv2D)           (None, 16, 64, 128)       73856     
_________________________________________________________________
max_pooling2d_29 (MaxPooling (None, 8, 32, 128)        0         
_________________________________________________________________
conv2d_51 (Conv2D)           (None, 8, 32, 256)        295168    
_________________________________________________________________
conv2d_52 (Conv2D)           (None, 8, 32, 256)        590

In [231]:
# Randomize ten indices to index the image
random_indices = np.random.randint(len(validation_images), size=10)
# Get the prediction from the model of random indices images
predictions = test_model.predict(validation_images[random_indices])
# Decode the prediction output with the greedy CTC decoder
decoded_preds = K.get_value(K.ctc_decode(predictions, 
                                        input_length=np.ones(predictions.shape[0]) * predictions.shape[1],
                                        greedy=True)[0][0])

In [232]:
# Output the prediction results
for curr_index, prediction in enumerate(decoded_preds):
    predicted_text = get_predicted_label(prediction)
    print(f"True Label: {validation_labels[random_indices[curr_index]]}")
    print(f"Predicted Label: {predicted_text}\n")

True Label: fanged
Predicted Label: fanged

True Label: subjoins
Predicted Label: guliains

True Label: hydrocephalus
Predicted Label: huratohes

True Label: guantanamo
Predicted Label: guantonamo

True Label: travelogue
Predicted Label: travelocues

True Label: hogarth
Predicted Label: hogarth

True Label: enliven
Predicted Label: enliven

True Label: hoer
Predicted Label: hoer

True Label: torrider
Predicted Label: torrider

True Label: mingo
Predicted Label: mingo



**Output the Metrics**

In [233]:
output_metrics(validation_labels)

Accuracy of the Model: 0.5229598530569405
Mean Levenshtein Distance: 1.2377328785095776


# Takeaways

I actually used up all my GPU limits in Colab and cannot run more training experiments (I actually also forgot that there are limits to its usage bruh) `:')`.

Anyway, the model is not actually that bad, in my opinion. The mean edit distance on the validation data is ~1.2 characters, with an accuracy of slightly more than 50%.

This means that, averaged over the whole validation set, a prediction is only about one character edit away from its label, and that is actually great. Since about half of the predictions are exact matches, the wrong ones are off by roughly 1.24 / 0.48 ≈ 2.6 characters on average. Nonetheless, the validation loss is still high (~5.5). I was actually preparing to feed more data into the model when the limit notification popped up right away `T_T`.