In [None]:
import tensorflow as tf
from tensorflow.keras import layers

These two lines import TensorFlow and the layers module from Keras, a high-level API for building and training deep learning models.

In [None]:
# Load the dataset
train_data = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset/Train',
    labels='inferred',
    label_mode='categorical',
    batch_size=32,
    image_size=(128, 128),
    color_mode='grayscale')

train_data: A variable that stores the dataset returned by the function.

tf.keras.preprocessing.image_dataset_from_directory(): A function from TensorFlow's Keras API that loads the dataset from the directory of image files and converts it into a tf.data.Dataset object that can be used for training the machine learning model.

'dataset/Train': The directory path from which to load the images for the training dataset.

labels='inferred': This parameter specifies how the labels for each image should be determined. In this case, it is set to "inferred" so that the subdirectory names in the "Train" directory are used as the labels for each image.

label_mode='categorical': This parameter specifies how the labels should be encoded. In this case, it is set to "categorical" so that the labels are one-hot encoded.

batch_size=32: The number of samples in each batch of the dataset.

image_size=(128, 128): The size of each image in the dataset after being resized.

color_mode='grayscale': The color mode of the images. In this case, it is set to "grayscale" which means each image will be converted to grayscale before being used.

Overall, this code loads the dataset from the "Train" directory, converts it into a tf.data.Dataset object, and preprocesses the data so that it is ready for use in training the OCR model.
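To make the label handling concrete, here is a minimal plain-Python sketch of what labels='inferred' and label_mode='categorical' amount to: class names taken in sorted order become integer indices, and each index becomes a one-hot vector. The folder names and the one_hot helper are hypothetical, and sorted() is only a stand-in for the loader's alphanumeric ordering.

```python
# Hypothetical class subdirectories; the real ones live under dataset/Train.
class_dirs = ["B", "A", "#"]

# image_dataset_from_directory orders inferred classes alphanumerically;
# Python's sorted() is used here as a stand-in for that ordering.
class_names = sorted(class_dirs)
class_index = {name: i for i, name in enumerate(class_names)}

def one_hot(index, num_classes):
    """Return a list with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

label_for_A = one_hot(class_index["A"], len(class_names))
print(class_names)   # ['#', 'A', 'B']
print(label_for_A)   # [0.0, 1.0, 0.0]
```

With the real dataset, train_data.class_names lists the inferred classes in the order used for the one-hot positions.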






In [None]:
val_data = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset/Validation',
    labels='inferred',
    label_mode='categorical',
    batch_size=32,
    image_size=(128, 128),
    color_mode='grayscale')

val_data: A variable that stores the dataset returned by the function.

tf.keras.preprocessing.image_dataset_from_directory(): A function from TensorFlow's Keras API that loads the dataset from the directory of image files and converts it into a tf.data.Dataset object that can be used for evaluating the machine learning model.

'dataset/Validation': The directory path from which to load the images for the validation dataset.

labels='inferred': This parameter specifies how the labels for each image should be determined. In this case, it is set to "inferred" so that the subdirectory names in the "Validation" directory are used as the labels for each image.

label_mode='categorical': This parameter specifies how the labels should be encoded. In this case, it is set to "categorical" so that the labels are one-hot encoded.

batch_size=32: The number of samples in each batch of the dataset.

image_size=(128, 128): The size of each image in the dataset after being resized.

color_mode='grayscale': The color mode of the images. In this case, it is set to "grayscale" which means each image will be converted to grayscale before being used.

Overall, this code loads the validation dataset from the "Validation" directory, converts it into a tf.data.Dataset object, and preprocesses the data so that it is ready for use in evaluating the OCR model.






In [None]:
# Define the OCR model architecture
model = tf.keras.Sequential([
    layers.experimental.preprocessing.Rescaling(1./255, input_shape=(128, 128, 1)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(39, activation='softmax')
])

model: A variable that stores the OCR model architecture.

tf.keras.Sequential(): A class from TensorFlow's Keras API that builds a model as a linear stack of layers. The layers are passed as a list and are applied in that order.

layers.experimental.preprocessing.Rescaling(1./255, input_shape=(128, 128, 1)): This line creates a layer that multiplies each pixel value by 1/255, which maps the original 0 to 255 range onto 0 to 1. It also sets the input shape to (128, 128, 1), meaning each input is a grayscale image with a height and width of 128 pixels and a single channel. In recent TensorFlow releases the same layer is available as layers.Rescaling; the experimental path is its older location.

layers.Conv2D(16, 3, activation='relu'): This line creates a convolutional layer with 16 filters and a filter size of 3x3. The activation parameter is set to "relu", which means that the output of this layer will be passed through a rectified linear unit (ReLU) activation function.

layers.MaxPooling2D(): This line creates a max pooling layer that reduces the spatial dimensions of the output of the previous convolutional layer by taking the maximum value within each pooling window. With the default 2x2 window, the height and width are halved, which cuts the amount of data passed on while keeping the strongest activation in each window.

The next two lines (layers.Conv2D(32, 3, activation='relu') and layers.MaxPooling2D()) are similar to the previous two lines, but they create a convolutional layer with 32 filters and a max pooling layer, respectively.

The next two lines (layers.Conv2D(64, 3, activation='relu') and layers.MaxPooling2D()) are again similar to the previous two lines, but they create a convolutional layer with 64 filters and a max pooling layer, respectively.

layers.Flatten(): This layer flattens the output of the previous max pooling layer, which for each image is a 3D tensor of shape (height, width, channels), into a 1D vector. This prepares the data to be passed through the fully connected layers.

layers.Dense(128, activation='relu'): This line creates a fully connected layer with 128 neurons and a ReLU activation function.

layers.Dense(39, activation='softmax'): This line creates another fully connected layer with 39 neurons, which corresponds to the number of classes in the dataset. The activation parameter is set to "softmax", which normalizes the output of the layer so that each output value represents the probability that the input belongs to a particular class.

Overall, this code defines the OCR model architecture: three convolution and pooling stages extract features from the input image, and two fully connected layers turn those features into a probability for each of the 39 character classes.
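The effect of each stage on the tensor shape can be traced by hand. The sketch below assumes the Keras defaults that the model relies on (Conv2D with 'valid' padding and stride 1, MaxPooling2D with a 2x2 window); the helper functions are illustrative and not part of Keras.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the kernel cannot hang over the border.
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2 floors odd sizes.
    return size // pool

size, channels = 128, 1
total_params = 0
for filters in (16, 32, 64):
    total_params += filters * (3 * 3 * channels) + filters  # weights + biases
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after Conv2D({filters}) + MaxPooling2D: {size}x{size}x{channels}")

flat = size * size * channels          # length of the Flatten output
total_params += flat * 128 + 128       # Dense(128)
total_params += 128 * 39 + 39          # Dense(39)
print(flat, total_params)              # 12544 1634087
```

Under these assumptions almost all of the parameters sit in the first dense layer. Calling model.summary() on the real model reports the actual shapes and counts.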

RESULTS==>
Found 834036 files belonging to 39 classes.
Found 22524 files belonging to 39 classes.
26064/26064 [==============================] - 4986s 191ms/step - loss: 0.2349 - accuracy: 0.9244 - val_loss: 0.1869 - val_accuracy: 0.9415
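The step count in the progress line follows from the file count and the batch size. As a quick check using only the numbers printed above:

```python
import math

train_files, val_files, batch_size = 834036, 22524, 32

# The last batch is partial, so the number of steps is rounded up.
train_steps = math.ceil(train_files / batch_size)
val_steps = math.ceil(val_files / batch_size)

# Average wall-clock time per training step for the logged epoch.
seconds_per_step = 4986 / train_steps

print(train_steps)                      # 26064
print(val_steps)                        # 704
print(round(seconds_per_step * 1000))   # 191
```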







In [None]:
# Compile the OCR model
model.compile(
    optimizer='adam',
    loss=tf.losses.CategoricalCrossentropy(),
    metrics=['accuracy'])

model.compile: This method configures the model for training. It takes the optimizer, loss function, and metrics as arguments.

optimizer='adam': This sets the optimizer to Adam with its default settings. Adam is a widely used variant of stochastic gradient descent that adapts the step size for each parameter.

loss=tf.losses.CategoricalCrossentropy(): This sets the loss function to categorical cross-entropy. It is the usual choice for multi-class classification with one-hot labels and measures how far the predicted probability distribution is from the true one-hot label.

metrics=['accuracy']: This sets the evaluation metric to accuracy. The accuracy metric measures the percentage of correctly classified samples out of all samples.
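As an illustration of how the softmax output and the categorical cross-entropy loss fit together, here is a small plain-Python sketch. The three raw scores are made up, and the functions are simplified stand-ins rather than the Keras implementations.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_true, probs):
    # Only the true class contributes, because the other targets are 0.
    return -sum(t * math.log(p) for t, p in zip(one_hot_true, probs))

logits = [2.0, 1.0, 0.1]          # hypothetical raw scores for 3 classes
probs = softmax(logits)
loss = categorical_crossentropy([1.0, 0.0, 0.0], probs)
predicted_class = probs.index(max(probs))

print(probs, loss, predicted_class)
```

The loss is the negative log of the probability assigned to the true class, so it approaches 0 as that probability approaches 1. Accuracy only checks whether predicted_class matches the true class.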

In [None]:
# Train the OCR model
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=1)

history = model.fit: This method trains the model using the training dataset and validates it using the validation dataset. It returns a History object that contains the training history, which can be used to analyze the model's performance during training.

train_data: This is the training dataset object that was created earlier using the image_dataset_from_directory method.

validation_data=val_data: This is the validation dataset object that was created earlier using the image_dataset_from_directory method.

epochs=1: This sets the number of epochs or passes through the training dataset that the model will undergo during training. An epoch is a complete iteration over the entire training dataset.

Batch size: model.fit is not given a batch_size argument here, because train_data is a tf.data.Dataset that already yields batches of 32 images (set by batch_size=32 in image_dataset_from_directory). The Keras documentation says not to specify batch_size in fit when the data is a dataset.

The model.fit method trains the model for one epoch, i.e., it goes through the entire training dataset once, and updates the model's parameters after each batch based on the error between the predicted and true labels. At the end of each epoch, the model's performance is evaluated on the validation dataset. This process is repeated for the specified number of epochs.
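The History object exposes its numbers through history.history, a dictionary that maps each metric name to a list with one value per epoch. The sketch below uses a hand-written stand-in filled with the values from the training log, since the real object only exists after model.fit has run.

```python
# Stand-in for history.history after the single logged epoch;
# the real dictionary is produced by model.fit, not written by hand.
history_dict = {
    "loss": [0.2349],
    "accuracy": [0.9244],
    "val_loss": [0.1869],
    "val_accuracy": [0.9415],
}

# Print one row of metrics per epoch.
for epoch in range(len(history_dict["loss"])):
    row = {name: values[epoch] for name, values in history_dict.items()}
    print(f"epoch {epoch + 1}: {row}")

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(history_dict["val_loss"])),
                 key=lambda e: history_dict["val_loss"][e]) + 1
print(best_epoch)
```

With more than one epoch, the same lists can be plotted to compare training and validation curves.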







In [None]:
# Save the OCR model
model.save('ocr_model2.h5')


This code block saves the trained OCR model to a file named ocr_model2.h5.

model.save: This method saves the trained model, including its architecture, weights, and compile settings, to a file. The file format used here is HDF5, a general-purpose format for storing large numerical datasets.

'ocr_model2.h5': This is the name of the file to which the model will be saved. The .h5 extension tells model.save to write the file in the HDF5 format.

By saving the model to a file, it can be reused later without the need to retrain it. This is useful when the model is required to be deployed or used for prediction on new data.

In [1]:
import cv2
import numpy as np
import tensorflow as tf

# Define the label map
label_map = {}
label_map[0]='#'
label_map[1]='$'
label_map[2]='&'
label_map[3]='@'
for i in range(4, 14):
    label_map[i] = str(i - 4)
for i, c in enumerate('ABCDEFGHIJKLMNPQRSTUVWXYZ'):
    label_map[14+i] = c

# Load the OCR model
model = tf.keras.models.load_model('ocr_model2.h5')

# Load the sample image and preprocess it
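img = cv2.imread('A.jpg', cv2.IMREAD_GRAYSCALE)
if img is None:  # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError("Could not read 'A.jpg'")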
img = cv2.resize(img, (128, 128))
img = np.expand_dims(img, axis=-1)
img = np.expand_dims(img, axis=0)

# Predict the output label for the sample image
pred = model.predict(img)
pred_label = np.argmax(pred)

# Print the predicted label
print(f"Predicted label: {label_map[pred_label]}")


Predicted label: A
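Because the label map is written by hand, it is worth checking it against the class order the dataset loader actually used. The sketch below rebuilds the map from the cell above and compares it with one possible ordering. It assumes, without confirmation from the notebook, that the class folders are named after the characters themselves, and it uses sorted() as a stand-in for the loader's alphanumeric ordering.

```python
# Label map exactly as defined in the inference cell.
label_map = {0: '#', 1: '$', 2: '&', 3: '@'}
for i in range(4, 14):
    label_map[i] = str(i - 4)
for i, c in enumerate('ABCDEFGHIJKLMNPQRSTUVWXYZ'):
    label_map[14 + i] = c

# ASSUMPTION: class folders are named after the characters themselves,
# and sorted() stands in for the loader's alphanumeric ordering.
assumed_class_names = sorted(label_map.values())

# Collect (index, hand-written label, assumed label) where the two differ.
mismatches = [(i, label_map[i], name)
              for i, name in enumerate(assumed_class_names)
              if label_map[i] != name]
print(len(label_map), len(mismatches))   # 39 11
```

Under that assumption, indices 3 to 13 differ because '@' sorts after the digits, while 'A' at index 14 agrees either way, so the sample prediction alone would not reveal a mismatch. With the real dataset, train_data.class_names gives the actual order to compare against.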


np.expand_dims(img, axis=-1): This function adds an extra dimension at the end of the image array, which represents the channel dimension. For grayscale images, this dimension will have a value of 1, indicating that the image has only one channel. The axis=-1 argument specifies that the new dimension should be added at the end of the array.

np.expand_dims(img, axis=0): This function adds an extra dimension at the beginning of the image array, which represents the batch dimension. The batch dimension specifies how many images are being fed into the model at once. In this case, we are preprocessing a single image, so the batch dimension will have a size of 1. The axis=0 argument specifies that the new dimension should be added at the beginning of the array.

By adding these dimensions to the image array, we are conforming to the input shape of the OCR model, which expects a batch of images with shape (batch_size, height, width, channels). In this case, the batch size is 1, the height and width of the image are 128 pixels each, and there is only 1 channel since the image is grayscale. The resulting shape of the preprocessed image will be (1, 128, 128, 1).
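The shape changes can be checked without the model or an image file. This sketch uses a zero array as a stand-in for the resized image and a hand-filled array as a stand-in for the output of model.predict; both are placeholders, not real data.

```python
import numpy as np

# Stand-in for a resized grayscale image of 128x128 pixels.
img = np.zeros((128, 128), dtype=np.uint8)
with_channel = np.expand_dims(img, axis=-1)     # (128, 128, 1)
batched = np.expand_dims(with_channel, axis=0)  # (1, 128, 128, 1)
print(img.shape, with_channel.shape, batched.shape)

# Stand-in for model.predict output: one row of 39 class probabilities.
pred = np.full((1, 39), 0.01)
pred[0, 14] = 0.62
pred_label = int(np.argmax(pred, axis=1)[0])
print(pred_label)   # 14
```

Taking argmax along axis=1 returns one class index per image, which also works if several images are stacked into one batch.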

