# **Automate extraction of handwritten text from an image.**
CNN Model - Kaggle
---
---

**Name Of The Student		:**	Viddesh Kamble

**Internship Project Title		:** Automate extraction of handwritten text from an image

**Name of the Organization	:** TCS iON

**Name of the Industry Mentor	:** Debashis Roy Sir

**Name of the Institute		:** B.K Birla college

**Name of the Academic Mentor  :** Swapna Ma’am

---
---

## **Introduction:**
An optical character recognition (OCR) problem is essentially an image-based sequence recognition problem. Recurrent neural networks (RNNs) are best suited to sequence recognition, while convolutional neural networks (CNNs) are best suited to image-based problems. To cope with OCR problems, we therefore need to combine a CNN with an RNN and use a softmax output. The model built in this notebook uses the CNN part only: it classifies 113 x 113 patches of handwriting into one of 50 writers with a softmax output layer.

## **Steps followed:**
The complete implementation of the project can be divided into the following major steps:

**1.	Collecting the Dataset.**

**2.	Uploading the dataset to a directory and accessing it.**

**3.	Preprocessing the data.**

**4.	Splitting the dataset into train, test and validation datasets.**

**5.	Creating and defining the model/network architecture.**

**6.	Training the model.**

**7.	Saving the ML model.**

**8.	Testing the model.**

**9.	Prediction.**

**10.	Plotting the loss and accuracy plots.**


## **Step 1: Collecting the Dataset**
We used the IAM handwriting dataset. It is a good dataset in which several preprocessing steps have already been done.

You can download the dataset directly from [this link](https://www.kaggle.com/datasets/tejasreddy/iam-handwriting-top50) and unzip it.

The data subset used here for training and testing has around 4899 images from 50 writers.

In [None]:
# # Dataset :- https://www.kaggle.com/datasets/tejasreddy/iam-handwriting-top50

# # In Kaggle or Google drive add prefix "!" for pip command.

# # Below are all the packages or requirements 

# # pip install numpy==1.24.3
# # pip install Pillow==10.0.0
# # pip install keras==2.13.1
# # pip install tensorflow==2.13.0
# # pip install scikit-learn==1.3.0
# # pip install matplotlib==3.7.2
# # pip install pandas==2.1.0
# # pip install openpyxl==3.1.2

# import numpy 
# import PIL
# import tensorflow
# import keras
# import sklearn
# import matplotlib
# import pandas
# import openpyxl
# import sys


# print(f"Numpy version:- {numpy.__version__}") # 1.24.3
# print(f"Pillow version:- {PIL.__version__}") # 10.0.0
# print(f"Keras version:- {keras.__version__}") # 2.13.1
# print(f"Tensorflow version:- {tensorflow.__version__}") # 2.13.0
# print(f"Sklearn version:- {sklearn.__version__}") # 1.3.0
# print(f"Matplotlib version:- {matplotlib.__version__}") # 3.7.2
# print(f"Pandas version:- {pandas.__version__}") # 2.1.0
# print(f"openpyxl version:- {openpyxl.__version__}") # 3.1.2
# print(f"Python version:- {sys.version.split()[0]}") # 3.10.12

In [None]:
# Importing all the necessary libraries needed for Automate extraction of handwritten text from an image

import os # Used for interacting with the operating system, such as navigating directories and file management.
import glob # Useful for file pattern matching and retrieving a list of filenames.
import numpy as np # A fundamental library for numerical computations and array manipulation.
# from matplotlib import image as mpimg
from matplotlib import pyplot as plt # For data visualization
from sklearn.preprocessing import LabelEncoder # Helpful for encoding categorical labels (e.g., writer IDs) as numerical values.
from sklearn.model_selection import train_test_split #  Used to split your data into training and testing sets.
from PIL import Image # Part of the Python Imaging Library (PIL) and widely used for image processing and manipulation.
from random import sample # Randomly selecting samples from your dataset.
from tensorflow.keras.utils import to_categorical # Used for one-hot encoding categorical labels.
from sklearn.utils import shuffle # Shuffling dataset
from tensorflow.image import resize # Resizing the images for pre-processing and feeding to model.
from tensorflow.keras.models import Sequential, load_model # Required for building Sequential ML model and load saved model
from tensorflow.keras import layers # Contains various layers for building neural networks.
from tensorflow.keras.callbacks import ModelCheckpoint # Used for saving the model after every epoch.
import pandas as pd # Useful for data manipulation and analysis, Saving the trained model history as excel file.

## **Step 2: Uploading the dataset to a directory and accessing it.**

After uploading the dataset, we access its contents.

The forms file lists each form in the dataset together with its writer. Let's create a dictionary that maps each form ID to its writer ID so that the writer of any image can be looked up quickly from its file name.

In [None]:
d = {}

# "C:\\TCS_internship\\OCR\Dataset\\formss.txt"
with open("../input/iam-handwriting-top50/forms_for_parsing.txt") as forms:
    for line in forms:
        key = line.split(" ")[0]
        writer = line.split(" ")[1]
        d[key] = writer

print(len(d.keys()))
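
To make the parsing step concrete, here is a minimal sketch that applies the same "first field is the form, second field is the writer" logic to a few in-memory lines instead of the real file. The `parse_forms` helper and the sample lines are assumptions for illustration: only the `a01-000u` to `000` pairing comes from the comments in this notebook, and the trailing fields are placeholders.

```python
def parse_forms(lines):
    """Map each form ID (first field) to its writer ID (second field)."""
    mapping = {}
    for line in lines:
        fields = line.split()  # split on any whitespace
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        form_id, writer_id = fields[0], fields[1]
        mapping[form_id] = writer_id
    return mapping


# Hypothetical sample lines; everything after the writer ID is a placeholder.
sample_lines = [
    "a01-000u 000 2 prt 7 5 52 36\n",
    "a01-003u 000 2 prt 7 5 52 36\n",
    "\n",
]

form_to_writer = parse_forms(sample_lines)
print(form_to_writer)
```

Skipping lines with fewer than two fields is a design choice of this sketch; the cell above would raise an `IndexError` on such a line.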

## **Step 3: Preprocessing the Data.**

After fetching the dataset we will preprocess the data.

All file-names list and target-writer names list are created.

In [None]:
temp = []
target_list = []
# Getting the relative path below

# Dataset\\data_subset
path_to_files = os.path.join("../input/iam-handwriting-top50/data_subset/data_subset", "*")
# ../input/iam-handwriting-top50/data_subset/data_subset/a01-000u-s00-00.png
for file_name in sorted(glob.glob(path_to_files)):
    # print(file_name) # ../input/iam-handwriting-top50/data_subset/data_subset/a01-000u-s00-00.png
    temp.append(file_name) # ['../input/iam-handwriting-top50/data_subset/data_subset/a01-000u-s00-00.png', '../input/iam-handwriting-top50/data_subset/data_subset/a01-000u-s00-01.png', ...
    image_name = file_name.split("/")[-1] # a01-000u-s00-00.png
    # print(image_name) # # a01-000u-s00-00.png
    file, ext = os.path.splitext(image_name) # Split the extension from a pathname.
    # print(file, ext) # It gives filename, extention
    parts = file.split("-")
    # print(parts) # ['a01', '000u', 's00', '00']
    form = parts[0] + "-" + parts[1]
    # print(form) # a01-000u

    # Look the form up directly instead of scanning every key
    if form in d:
        target_list.append(str(d[form]))

# print(target_list) # ['000', '000', '000', '000', '000', '000', '000', ...

# print(temp)
img_files = np.array(temp)
img_targets = np.array(target_list)
print(f"Shape of the image files:- {img_files.shape}")
print(f"Shape of the target image files:- {img_targets.shape}")
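
The file-name handling above can be isolated into a small helper, shown here as a sketch. The function name `form_id_from_path` is an assumption for illustration; the example path is the one that appears in the comments of the cell above.

```python
import os


def form_id_from_path(path):
    """Return the form ID, i.e. the first two dash-separated parts of the file name."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    parts = stem.split("-")  # e.g. ['a01', '000u', 's00', '00']
    return "-".join(parts[:2])


example_path = "../input/iam-handwriting-top50/data_subset/data_subset/a01-000u-s00-00.png"
form_id = form_id_from_path(example_path)
print(form_id)
```

Using `os.path.basename` instead of splitting on `"/"` is one way to keep the same logic independent of the path separator.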

## **Visualising the images.**
Let's visualize the image data.

In [None]:
# Visualizing the image data


for file_name in img_files[:4]:
    img = plt.imread(file_name)
    plt.figure(figsize = (10, 10))
    plt.axis("on") # plt.axis("off")
    plt.imshow(img, cmap = "gray")

The writer IDs are categorical strings, so we convert them to integer class labels with a label encoder. These integer labels are one-hot encoded later, inside the generator.

In [None]:
# Label Encoding the writer names for one hot encoding later

encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(img_targets)

print(img_files[:1], img_targets[:5], encoded_Y[:5])
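
As a rough illustration of what label encoding followed by one-hot encoding does, here is a plain-Python sketch. It is not the scikit-learn or Keras implementation; it only mirrors the idea that the distinct labels are sorted and each label is replaced by its position. The writer IDs below are hypothetical.

```python
def encode_labels(labels):
    """Replace each label by its index in the sorted list of distinct labels."""
    classes = sorted(set(labels))
    index_of = {label: i for i, label in enumerate(classes)}
    return classes, [index_of[label] for label in labels]


def one_hot(index, num_classes):
    """Return a vector of zeros with a single 1.0 at the given index."""
    row = [0.0] * num_classes
    row[index] = 1.0
    return row


writer_ids = ["052", "000", "670", "000"]  # hypothetical writer IDs
classes, encoded = encode_labels(writer_ids)
first_one_hot = one_hot(encoded[0], len(classes))

print(classes)
print(encoded)
print(first_one_hot)
```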

## **Step 4: Splitting the dataset into train, test and validation datasets.**

The data is split into training, validation and test sets in a ratio of roughly 4:1:1 (66% / 17% / 17%).

After splitting, we have 3233 training images and 833 images each for validation and testing.

In [None]:
train_files, rem_files, train_targets, rem_targets = train_test_split(img_files, encoded_Y, train_size = 0.66, random_state = 52, shuffle = True)

validation_files, test_files, validation_targets, test_targets = train_test_split(rem_files, rem_targets, train_size = 0.5, random_state = 24, shuffle = True)

print("#" * 20)
print(f"Train files shape:- {train_files.shape}")
print(f"Validation files shape:- {validation_files.shape}")
print(f"Test files shape:- {test_files.shape}")

print("#" * 20)
print(f"Train targets shape:- {train_targets.shape}")
print(f"Validation targets shape:- {validation_targets.shape}")
print(f"Test targets shape:- {test_targets.shape}")
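
The split sizes quoted above can be checked with a little arithmetic. The sketch below assumes that, for a fractional `train_size`, the first part gets the floor of `fraction * n` and the second part gets the remainder, which is my understanding of how `train_test_split` behaves. The helper `split_sizes` is hypothetical, and 4899 is the approximate image count stated earlier.

```python
import math


def split_sizes(n_samples, first_fraction):
    """Sizes of the two parts when the first part gets floor(fraction * n)."""
    n_first = math.floor(first_fraction * n_samples)
    return n_first, n_samples - n_first


n_images = 4899                                   # image count stated in Step 1
n_train, n_rem = split_sizes(n_images, 0.66)      # first split: train vs. remainder
n_val, n_test = split_sizes(n_rem, 0.5)           # second split: validation vs. test

print(n_train, n_val, n_test)
```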


## **Generator Helper Function as Input to Model**

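Each image is resized to a height of 113 pixels while keeping its aspect ratio, and random 113 x 113 patches are cropped from it. The generator function below produces batches of these patches for the model.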

In [None]:
batch_size = 16
num_classes = 50 # Number of writers are 50.

def generate_data(samples, target_files, batch_size = batch_size, factor = 0.1):
    num_samples = len(samples)
    while True: # Loop forever so the generator never terminates
        for offset in range(0, num_samples, batch_size):
            batch_samples = samples[offset : offset + batch_size]
            batch_targets = target_files[offset : offset + batch_size]

            images = []
            targets = []

            for i in range(len(batch_samples)):
                batch_sample = batch_samples[i]
                batch_target = batch_targets[i]
                im = Image.open(batch_sample)
                cur_width = im.size[0]
                cur_height = im.size[1]

                # print(cur_width, cur_height)
                height_fac = 113 / cur_height

                new_width = int(cur_width * height_fac)
                size = new_width, 113

                # # PIL.Image.Resampling.LANCZOS
                imresize = im.resize((size), Image.LANCZOS) ## Resize so height = 113 while keeping aspect ratio 
                now_width = imresize.size[0]
                now_height = imresize.size[1]
                # Generate crops of size 113x113 from this resized image and keep random 10% of crops

                # total x start points are from 0 to width -113
                avail_x_points = list(range(0, now_width - 113))

                # Pick random x%
                pick_num = int(len(avail_x_points) * factor)

                # Now pick
                random_startx = sample(avail_x_points, pick_num)

                for start in random_startx:
                    imcrop = imresize.crop((start, 0, start + 113, 113))
                    images.append(np.asarray(imcrop))
                    targets.append(batch_target)

            # Stack the crops and their labels into arrays
            X_train = np.array(images)
            y_train = np.array(targets)

            # reshaping X_train for feeding in later
            X_train = X_train.reshape(X_train.shape[0], 113, 113, 1)
            #convert to float and normalize
            X_train = X_train.astype("float32")
            X_train = X_train / 255 # Normalizing

            # One hot encode y
            y_train = to_categorical(y_train, num_classes)

            yield shuffle(X_train, y_train)
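
To see how many patches a single image contributes, the crop selection can be reproduced without any image library. This is a sketch under assumptions: the helper `crop_starts`, the example image size (1000 x 226) and the seeded `Random` instance are all illustrative and not part of the notebook.

```python
from random import Random

PATCH = 113  # patch height and width used by the generator


def crop_starts(width, height, factor, rng):
    """Return the resized width and the randomly chosen crop start positions."""
    scale = PATCH / height                       # resize so that height == PATCH
    new_width = int(width * scale)
    avail = list(range(0, new_width - PATCH))    # valid left edges for a crop
    pick_num = int(len(avail) * factor)          # keep only a fraction of them
    return new_width, rng.sample(avail, pick_num)


rng = Random(0)  # seeded only to make the sketch repeatable
new_width, starts = crop_starts(1000, 226, 0.1, rng)

print(new_width, len(starts))
```

For an image that is narrower than 113 pixels after resizing, `avail` is empty and no patches are produced, which matches the behaviour of the generator above.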

The generator helper function is called three times to create the training, validation and test generators.

In [None]:
# Generate train, test and validation data

train_generator = generate_data(train_files, train_targets, batch_size = batch_size, factor = 0.3)
validation_generator = generate_data(validation_files, validation_targets, batch_size = batch_size, factor = 0.3)
test_generator = generate_data(test_files, test_targets, batch_size = batch_size, factor = 0.1)

## **Step 5:	Creating and defining the model/network architecture.**

Adding the Pooling Layers, CNN Layers, Activation Functions, etc.

Building tensorflow.keras model and printing the model summary

In [None]:
def resize_image(image):
    return resize(image, [56, 56])

row, col, ch = 113, 113, 1 # Rows, columns, channels

model = Sequential(
    [
        layers.ZeroPadding2D(padding = (1, 1), input_shape = (row, col, ch)),

        # Resizing the data within the neural network.
        layers.Lambda(resize_image, name = "Image_resize"), # resizing of the images allows easy computation.
        
        # CNN Model
        layers.Convolution2D(filters = 32, kernel_size = (5, 5), strides = (2, 2), padding = "same", name = "conv1"),
        layers.Activation(activation = "relu"),
        layers.MaxPooling2D(pool_size = (2, 2), strides = (2, 2), padding = "valid", name = "pool1"),

        layers.Convolution2D(filters = 64, kernel_size = (3, 3), strides = (1, 1), padding = "same", name = "conv2"),
        layers.Activation(activation = "relu"),
        layers.MaxPooling2D(pool_size = (2, 2), strides = (2, 2), name = "pool2"),

        layers.Convolution2D(filters = 128, kernel_size = (3, 3), strides = (1, 1), padding = "same", name = "conv3"),
        layers.Activation(activation = "relu"),
        layers.MaxPooling2D(pool_size = (2, 2), strides = (2, 2), padding = "valid", name = "pool3"),

        layers.Flatten(),
        layers.Dropout(rate = 0.5),

        layers.Dense(units = 512, name = "dense1"),      
        layers.Activation(activation = "relu"),
        layers.Dropout(rate = 0.5),

        layers.Dense(units = 256, name = "dense2"),
        layers.Activation(activation = "relu"),
        layers.Dropout(rate = 0.5),

        layers.Dense(units = num_classes, name = "output"),
        layers.Activation(activation = "softmax"), # Using Softmax activation since output is within 50 classes.
    ]
)

model.compile(optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy"])

model.summary() # summary() prints the table itself; wrapping it in print() would also print "None"
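
The spatial sizes and parameter counts of the network can be worked out by hand, which is a useful cross-check against the model summary. The sketch below uses the standard size rules for "same" convolutions and "valid" pooling; the helper names are assumptions introduced for illustration.

```python
import math


def conv_same(size, stride):
    """Output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)


def pool_valid(size, pool=2, stride=2):
    """Output size of max pooling with 'valid' padding."""
    return (size - pool) // stride + 1


def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


padded = 113 + 2 * 1                      # ZeroPadding2D(padding=(1, 1)) -> 115
size = 56                                 # Lambda layer resizes 115 x 115 to 56 x 56
size = pool_valid(conv_same(size, 2))     # conv1 (stride 2) + pool1 -> 14
size = pool_valid(conv_same(size, 1))     # conv2 + pool2 -> 7
size = pool_valid(conv_same(size, 1))     # conv3 + pool3 -> 3
flat_units = size * size * 128            # input width of dense1

total_params = (
    conv_params(5, 1, 32) + conv_params(3, 32, 64) + conv_params(3, 64, 128)
    + dense_params(flat_units, 512) + dense_params(512, 256) + dense_params(256, 50)
)

print(padded, size, flat_units, total_params)
```

Padding, resizing, pooling, activation and dropout layers have no trainable parameters, so only the convolution and dense layers are counted.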

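## **Step 6: Training the model**

The cell below trains for 4 epochs with 100 training steps and 20 validation steps per epoch. The code comments record an earlier, longer configuration (8 epochs, 3000 training steps and 280 validation steps) that reached an accuracy of about 0.90.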

In [None]:
# Define the checkpoint directory within /kaggle/working/
checkpoint_directory = '/kaggle/working/checkpoint_1/'

# Create the directory (and any missing parent directories) if it doesn't exist
os.makedirs(checkpoint_directory, exist_ok = True)

# Now you can save your data or model checkpoints to the checkpoint_directory
# For example, if you want to save a model checkpoint:
# model.save(os.path.join(checkpoint_directory, 'my_model.h5'))


In [None]:
# Training the model

epochs = 4 # 8 -> Accuracy :- 0.90
steps_per_epoch = 100 # 3000 # 3268
validation_steps = 20 # 280 # 842

# Saving every model using tensorflow.keras checkpoint
filepath = "/kaggle/working/checkpoint_1/check-{epoch:02d}-{val_loss:.4f}.hdf5" # Use .keras to remove warning 
checkpoint = ModelCheckpoint(filepath = filepath, verbose = 1, save_best_only = False)
callbacks_list = [checkpoint]

# fit_generator() is deprecated, so fit() is used with the generator instead
history_object = model.fit(train_generator, steps_per_epoch = steps_per_epoch, 
                           validation_data = validation_generator,
                           validation_steps = validation_steps, epochs = epochs,
                           verbose = 1, callbacks = callbacks_list)
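
Because the generators loop forever, `steps_per_epoch` and `validation_steps` decide how many batches make up one epoch. As a rough guide, the number of steps needed to visit every file once is the file count divided by the batch size, rounded up. The sketch below assumes the split sizes from Step 4 (3233 and 833 files) and the batch size of 16; `steps_for_full_pass` is a hypothetical helper.

```python
import math


def steps_for_full_pass(n_files, batch_size):
    """Number of generator steps needed to visit every file once."""
    return math.ceil(n_files / batch_size)


train_steps = steps_for_full_pass(3233, 16)
val_steps = steps_for_full_pass(833, 16)

print(train_steps, val_steps)
```

Each step yields a variable number of patches, since the number of crops depends on the width of the images in the batch, so these values count file batches rather than individual training samples.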

In [None]:
# Saving the history object in excel file using pandas. 
# pip install openpyxl 

history_dict = {
    'epoch': list(range(1, len(history_object.history['loss']) + 1)),
    'loss': history_object.history['loss'],
    'accuracy': history_object.history['accuracy'],
    'val_loss': history_object.history['val_loss'],
    'val_accuracy': history_object.history['val_accuracy']
}
history_df = pd.DataFrame(history_dict)
excel_file_path = '/kaggle/working/training_history.xlsx'  # Replace with your desired file path
history_df.to_excel(excel_file_path, index=False)


## **Step 7: Saving the ML model**
Saving the model using HDF5 file extension.

In [None]:
# Saving the whole model (Architecture, weights and configurations)

model.save("/kaggle/working/HTR_model.hdf5")
print("Whole model is saved successfully")

In [None]:
# Loading the whole model (Architecture, weights and configurations)

loaded_model = load_model("/kaggle/working/HTR_model.hdf5")
print("Whole model loaded successfully")
print(loaded_model.summary())

## **Step 8: Testing the model**

In [None]:
# Testing the loaded model

# evaluate_generator() is deprecated, so evaluate() is used with the generator instead.
scores = loaded_model.evaluate(test_generator, steps = 280) # 842
print(scores)
print(f"Model loss:- {scores[0]}")
print(f"Model Accuracy:- {scores[1]}")

## **Step 9: Prediction**
Predictions are made on patches cropped from the first 50 test files.

In [None]:
# Doing some preprocessing on test files

images = []

for filename in test_files[:50]:
    im = Image.open(filename)
    cur_width = im.size[0]
    cur_height = im.size[1]

    # print(cur_width, cur_height)

    height_fac = 113 / cur_height

    new_width = int(cur_width * height_fac)
    size = new_width, 113
    
    imresize = im.resize((size), Image.LANCZOS) # Resizing so height = 113 while keeping aspect ratio
    now_width = imresize.size[0]
    now_height = imresize.size[1]
    # Generating crops of size 113 x 113 from this resized image and keep random 10% of crops

    # Total x start points are from 0 to width -113
    avail_x_points = list(range(0, now_width - 113))

    # Pick random x%
    factor = 0.1
    pick_num = int(len(avail_x_points) * factor)

    random_startx = sample(avail_x_points, pick_num)

    for start in random_startx:
        imcrop = imresize.crop((start, 0, start + 113, 113))
        images.append(np.asarray(imcrop))

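# Stack all crops once, after the loop, instead of rebuilding the array for every file
X_test = np.array(images)

X_test = X_test.reshape(X_test.shape[0], 113, 113, 1)
# Convert to float and normalize
X_test = X_test.astype("float32")
X_test /= 255
# No shuffling is needed for prediction (the result of shuffle() was previously discarded)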


print(X_test.shape)    

### Predictions

In [None]:
predictions = loaded_model.predict(X_test, verbose = 1)

print(predictions)
print(predictions.shape)

# Index of the most probable class (encoded writer label) for each patch
predicted_writer = np.argmax(predictions, axis = 1).tolist()

# print(predicted_writer)
print(len(predicted_writer))
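
The predicted values are encoded class indices, not writer IDs. The sketch below shows, in plain Python, how a row of softmax probabilities is turned into a class index and then mapped back to a writer ID. The class list and probability rows are made-up examples; in the notebook, `encoder.inverse_transform(predicted_writer)` should give the same kind of mapping using the fitted label encoder.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


classes = ["000", "052", "670"]   # hypothetical writer IDs in sorted (encoder) order
probabilities = [                 # hypothetical softmax outputs, one row per patch
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
]

predicted_indices = [argmax(row) for row in probabilities]
predicted_ids = [classes[i] for i in predicted_indices]

print(predicted_indices, predicted_ids)
```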

## **Step 10: Plotting the loss and accuracy plots**

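**Loss vs. epochs line plot:** shows the training and validation loss per epoch; the loss should decrease over the epochs.

**Accuracy vs. epochs line plot:** shows the training and validation accuracy per epoch; the accuracy should increase over the epochs.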

In [None]:
# Extract data from the history object
acc = history_object.history["accuracy"]
val_acc = history_object.history["val_accuracy"]
loss = history_object.history["loss"]
val_loss = history_object.history["val_loss"]
epochs = range(1, len(loss) + 1)

# Plot training and validation accuracy
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(epochs, acc, "b", label="Training Accuracy")
plt.plot(epochs, val_acc, "r", label="Validation Accuracy")
plt.title("Training and Validation Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()

# Plot training and validation loss
plt.subplot(1, 2, 2)
plt.plot(epochs, loss, "b", label="Training Loss")
plt.plot(epochs, val_loss, "r", label="Validation Loss")
plt.title("Training and Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()

# Adjust the layout before saving so the saved image matches what is shown
plt.tight_layout()

# Save the plot in /kaggle/working/
plot_filename = '/kaggle/working/Accuracy_Loss_Plot.png'  # Specify the desired file name and extension
plt.savefig(plot_filename)

plt.show()

In [None]:
# Downloading all data of /kaggle/working/
# Define the directory you want to zip and download
directory_to_zip = '/kaggle/working/'

# Specify the name for the zip file
zip_file_name = 'working_directory'

# Create a zip archive of the directory using the 'zip' shell command
!zip -r {zip_file_name}.zip {directory_to_zip}

# Display the download link for the zip file
from IPython.display import FileLink
FileLink(f'{zip_file_name}.zip')
