
# Task: Word Recognition

The goal of this project is to train a deep CNN that helps recognise a word: given a word image (an image containing a single word) as input, the model yields a representation that can be matched against a set of possible words (termed the 'lexicon').

# Method 1
Using Alpha representation

# ALPHA REPRESENTATION




**Alpha representation**: This is based on the claim that a word can be represented by the occurrences of characters in various segments of its image.

The word is split into equal parts at several levels.

At level *i*:  
* A word is split into *i* (nearly) equal segments.  
* For every segment, we compute a binary vector of length 26 in which each position corresponds to a letter of the alphabet: a position is 1 if that letter occurs in the segment and 0 otherwise (shown in the figure).

![Alpha Vector](https://drive.google.com/uc?export=view&id=17rUvYXvWUc2IP8aD-O3kualSjr2dcn2b)

*  Individual vectors of each segment are concatenated after one another, i.e. the level vector is obtained by concatenating individual vectors of first segment followed by second, third and so on.

The final vector is obtained by concatenating vectors of all levels $\{L_i.L_{i+1}.L_{i+2}\cdots\}$.


*For this project, I use levels 2-5.
This makes the length of the final Alpha vector (2+3+4+5) * 26 = 364.*
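
The length arithmetic above can be checked with a short sketch. The helper name `level_slices` is hypothetical (it is not part of the notebook); it only assumes 26 lowercase letters and levels 2 to 5, as stated in the text, and reports which slice of the final vector each level occupies.

```python
ALPHABET_SIZE = 26      # lowercase letters a-z
LEVELS = range(2, 6)    # levels 2, 3, 4 and 5

def level_slices(levels=LEVELS, alphabet_size=ALPHABET_SIZE):
    """Map each level to the (start, stop) slice it occupies in the final vector."""
    slices, start = {}, 0
    for level in levels:
        # a level with `level` segments contributes level * 26 entries
        stop = start + level * alphabet_size
        slices[level] = (start, stop)
        start = stop
    return slices

slices = level_slices()
total_length = max(stop for _, stop in slices.values())
print(slices)
print(total_length)
```

Knowing the slice of each level makes it easy to inspect a predicted vector one level at a time.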

**Note**: For both representations, if a word cannot be split into segments of equal length, the longer segments go at the end, e.g. the level 3 split of "omega" is {o,me,ga} and of "play" is {p,l,ay}. For a word shorter than the number of segments, such as "ok", the level 3 split is {$\epsilon$,o,k}, where $\epsilon$ is the empty string.
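
A minimal sketch of the splitting rule in the note, assuming that "longer segments at the end" means each segment takes the floor of the remaining length divided by the remaining number of segments. The function name `split_word` is hypothetical and is used here for illustration only.

```python
def split_word(word, level):
    """Split `word` into `level` nearly equal segments, longer segments last."""
    segments, pos, remaining = [], 0, len(word)
    for parts_left in range(level, 0, -1):
        # floor division leaves the leftover characters for later segments
        size = remaining // parts_left
        segments.append(word[pos:pos + size])
        pos += size
        remaining -= size
    return segments

print(split_word("omega", 3))
print(split_word("play", 3))
print(split_word("ok", 3))
```

The empty string in the split of "ok" becomes an all-zero block of 26 entries in the Alpha vector.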

In [1]:
# Import the necessary libraries

import tensorflow as tf
import numpy as np
import pandas as pd
import seaborn as sns
import cv2
import matplotlib.pyplot as plt
import os
import shutil
import random
import pandas

In [2]:
IMG_HEIGHT = 50
IMG_WIDTH = 250

# Dataset

The dataset used here is a synthetic word recognition dataset. It consists of images of lowercase English words, generated with handwritten-fonts. All images are single channel (grayscale) and have size 250 * 50. <br>

The dataset has the following directory structure:

<pre>
<b>WR-Dataset</b>
|__ <b>train</b>: [foo_1.png, bar_2.png, sample_3.png ....]
|__ <b>validation</b>: [foo_221.png, bar_322.png, sample_353.png ....]
|__ <b>test</b>: [bar_521.png, foo_272.png, example_433.png ....]
|__ <b>Alphabet.csv</b>
    
</pre>

The train, test and validation folders contain 2052, 400 and 108 images respectively.

In [3]:
#Add the paths for train, validation and test directories

train_dir_path="C:/Users/hp/Desktop/PROJECT/Word_Recognition/WR-Dataset/Train"
validation_dir_path="C:/Users/hp/Desktop/PROJECT/Word_Recognition/WR-Dataset/Validation"
test_dir_path="C:/Users/hp/Desktop/PROJECT/Word_Recognition/WR-Dataset/Test"

# Visualizing sample images

Prepare an image-to-label map and visualize 5 randomly chosen images from each of the training, validation and test sets (along with their labels). Also report the number of word classes present in each of the three sets.

In [4]:
def get_classes_count(df):
    # Number of distinct word classes (labels) in the dataframe
    return df['label'].nunique()

In [5]:
def visualize_image(img):
    # Read the image file at path `img` and display it
    import matplotlib.image as mpimg
    img = mpimg.imread(img)
    # cmap only affects single-channel (grayscale) images
    plt.imshow(img, cmap='gray')
    plt.show()


In [6]:
def get_dataframe(folder_name):
    import glob
    images_fullpath = glob.glob(folder_name + '/*')
    images = []
    for image in images_fullpath:
        images.append(os.path.basename(image))
    labels = []
    for image in images:
        labels.append(image.split("_")[0])
    data = np.column_stack((images,labels))
    df = pandas.DataFrame(data = data, columns = ['Image','label']) 
    return df

In [None]:
#code to build a dataframe with Images and their corresponding labels for 3 folders.
test_df = get_dataframe(test_dir_path)
train_df = get_dataframe(train_dir_path)
validation_df = get_dataframe(validation_dir_path)

# display the dataframes
display(test_df)
display(train_df)
display(validation_df)

# Visualise images from the train set
print("Number of classes in Train set : ", get_classes_count(train_df))
print("Images from Train Dataset")
for i in range(5):
    visualize_image(train_dir_path + '/'+ random.choice(train_df['Image']))
# Visualise images from the validation set
print("Number of classes in Validation set : ", get_classes_count(validation_df))
print("Images from Validation Dataset")
for i in range(5):
    visualize_image(validation_dir_path + '/'+ random.choice(validation_df['Image']))

# Visualise images from the test set
print("Number of classes in Test set : ", get_classes_count(test_df))
print("Images from Test Dataset")
for i in range(5):
    visualize_image(test_dir_path + '/'+ random.choice(test_df['Image']))


# Modules that give vector representations for the input words

In [9]:
def get_Vector(levels,word):
    l = []
    left = len(word)
    pos = 0
    while left:
        temp = left // levels
        l.append(word[pos:pos+temp])
        pos += temp
        left -= temp
        levels -= 1
    return l

In [10]:
def get_Alpha_vector(word):
    # Returns the Alpha representation (length 364) of a lowercase word
    alpha_vector = []
    for level in range(2, 6):            # levels 2, 3, 4 and 5
        for segment in get_Vector(level, word):
            segment_vector = [0] * 26    # one slot per letter a-z
            for c in segment:
                segment_vector[ord(c) - ord('a')] = 1
            alpha_vector.extend(segment_vector)
    return alpha_vector

Let's test our Alpha Vector Representation

In [11]:
name = "akash"
print("Alphavector:",get_Alpha_vector(name))

Alphavector: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

# Building Alpha model architecture 

Following is the architecture of the model that will learn Alpha representation:

Input shape: 250 * 50 ( RGB )


* 2 Convolution layers with 64  filters
* A Max Pool layer with pool size 2 * 2   
* 2 Convolution layers with 128  filters
* A Max Pool layer with pool size 2 * 2    
* 6 Convolution layers with 256  filters
* 3 Convolution layers with 512  filters
* Global Average Pooling layer
* Dense  layer with 4096 units
* Dropout layer with rate 0.5
* Dense  layer with 4096 units
* Dropout layer with rate 0.5
* Dense  layer with 364 units (Output)

For all convolution layers, I use a kernel size of 3 * 3 and ReLU activation.

For all max pool layers, the stride is 2.

All dense layers except the final one use ReLU activation.

The final layer uses sigmoid activation.
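
To see how the spatial size of the feature map evolves through this stack, the sketch below traces (height, width) layer by layer. It is an illustration under stated assumptions, not part of the model: convolutions use 'same' padding (as in the model code of this notebook) and max pooling uses the Keras default 'valid' padding. The description does not state a stride for the convolution layers, while the model code passes `strides = 2` to each of them, so the hypothetical helper `trace_feature_map` takes the convolution stride as a parameter.

```python
import math

def trace_feature_map(height, width, conv_stride):
    """Follow (height, width) through the conv/pool stack described above."""
    # (number of conv layers, whether a 2x2 max pool follows)
    blocks = [(2, True), (2, True), (6, False), (3, False)]
    for n_convs, has_pool in blocks:
        for _ in range(n_convs):
            # 'same' padding: output = ceil(input / stride)
            height = math.ceil(height / conv_stride)
            width = math.ceil(width / conv_stride)
        if has_pool:
            # 2x2 pool, stride 2, 'valid' padding: output = floor(input / 2)
            height, width = height // 2, width // 2
    return height, width

print(trace_feature_map(50, 250, conv_stride=1))
print(trace_feature_map(50, 250, conv_stride=2))
```

Under these assumptions, a convolution stride of 2 shrinks the map to a single position well before the last convolution layers, whereas a stride of 1 leaves a larger map for global average pooling. Which setting is intended is a design choice the text leaves open.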

**Loss function**: Binary cross-entropy

**Similarity Metric**: Cosine Similarity

In [None]:
def Alpha_model(learning_rate=1e-4):
    # Instantiate Sequential model
    model = tf.keras.models.Sequential()
    s = 2
    # Add Layers
    model.add(tf.keras.layers.Conv2D(strides = s,filters = 64, kernel_size = (3,3), activation='relu' ,input_shape=(IMG_HEIGHT,IMG_WIDTH, 3), padding = 'same'))
    model.add(tf.keras.layers.Conv2D(strides = s,filters = 64, kernel_size = (3,3), activation='relu', padding = 'same'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size = (2,2), strides = 2))
    model.add(tf.keras.layers.Conv2D(strides = s,filters = 128, kernel_size = (3,3), activation='relu', padding = 'same'))
    model.add(tf.keras.layers.Conv2D(strides = s,filters = 128, kernel_size = (3,3), activation='relu', padding = 'same'))
    model.add(tf.keras.layers.MaxPooling2D(pool_size = (2,2), strides = 2))
    for i in range(6):
        model.add(tf.keras.layers.Conv2D(strides = s,filters = 256, kernel_size = (3,3), activation='relu', padding = 'same'))
    for i in range(3):
        model.add(tf.keras.layers.Conv2D(strides = s,filters = 512, kernel_size = (3,3), activation='relu', padding = 'same'))
    model.add(tf.keras.layers.GlobalAveragePooling2D())
    model.add(tf.keras.layers.Dense(4096, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(4096, activation='relu'))
    model.add(tf.keras.layers.Dropout(0.5))
    model.add(tf.keras.layers.Dense(364, activation='sigmoid'))

    # Define the optimizer (Adam), loss function and similarity metric
    opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    loss = tf.keras.losses.BinaryCrossentropy()        
    m = tf.keras.metrics.CosineSimilarity()

    # Compile the model (metrics are passed as a list)
    model.compile(optimizer=opt, loss=loss, metrics=[m])
    
    # return model
    return model
    

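We use sigmoid as the activation of the final layer because the target of the Alpha model is a binary vector that can contain several 1s: sigmoid maps each of the 364 outputs independently to a value in (0, 1), which can be read as the probability that a letter occurs in a segment.
We use binary cross-entropy as the loss function because it compares each predicted probability with its binary target and averages the per-element errors, which suits this kind of multi-label binary target.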
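
The following NumPy sketch illustrates the two choices on a toy 4-element target. The logits and targets are made-up values for illustration, and `binary_cross_entropy` is a plain re-implementation of the usual formula, not the Keras loss object used in the model.

```python
import numpy as np

def sigmoid(z):
    # squashes each value independently into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # clip to avoid log(0), then average the per-element losses
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

logits = np.array([4.0, -3.0, 2.5, -4.0])   # toy pre-activation outputs
target = np.array([1.0, 0.0, 1.0, 0.0])     # toy binary target with two 1s

probs = sigmoid(logits)
good = binary_cross_entropy(target, probs)        # prediction agrees with target
bad = binary_cross_entropy(target, 1.0 - probs)   # prediction flipped
print(probs, good, bad)
```

Unlike softmax, the sigmoid outputs do not have to sum to 1, so several positions can be close to 1 at the same time, as a binary target with multiple 1s requires.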

# Seen and Unseen Words
----
Words whose images have been seen by the model during training are termed seen words, while those that are part of the test set but were not seen during training are called unseen words.



In [None]:
# code to print seen word classes (labels present in the training set)
dict_seen = {}
for label in train_df['label']:
    dict_seen[label] = 1
print("Seen words : " , list(dict_seen.keys()))

# code to print unseen word classes (test labels that never occur in the training set)
dict_unseen = {}
for label in test_df['label']:
    if label not in dict_seen:
        dict_unseen[label] = 1
print("\nUnseen words : ", list(dict_unseen.keys()))

In [18]:
NUM_EPOCHS=10
BATCH_SIZE=8

# Tuning Hyperparameters for Alpha Model
We will now tune the *learning rate* for the Alpha model. 

For that, we first load the train and validation data (images and their labels, i.e. Alpha vectors) 


In [19]:
#code for loading train and validation set images and their corresponding labels 
 
x_train, y_train = [], []
for i,row in train_df.iterrows():
    image = tf.keras.preprocessing.image.load_img(train_dir_path + '/' + row['Image'], target_size = (IMG_HEIGHT, IMG_WIDTH))
    x_train.append(tf.keras.preprocessing.image.img_to_array(image))
    y_train.append(get_Alpha_vector(row['label']))
x_train = np.asarray(x_train)
y_train = np.asarray(y_train)

x_validate, y_validate = [], []
for i,row in validation_df.iterrows():
    image = tf.keras.preprocessing.image.load_img(validation_dir_path + '/' + row['Image'], target_size = (IMG_HEIGHT, IMG_WIDTH))
    x_validate.append(tf.keras.preprocessing.image.img_to_array(image))
    y_validate.append(get_Alpha_vector(row['label']))
x_validate = np.asarray(x_validate) 
y_validate = np.asarray(y_validate)




## Now find the best LR for the Alpha model.

In [None]:
def determine_Alpha_lr():
    learning_rates = [1e-3,1e-4,1e-5]
    avg_val_similarity = []
    

    for l_rate in learning_rates:
        
        # code to build a model with the current learning rate
        model = Alpha_model(l_rate)
        
        # code to train the model using the training set and validate using the validation set
        hist = model.fit(
          x=x_train,y = y_train,epochs = NUM_EPOCHS ,batch_size = BATCH_SIZE,
          validation_data=(x_validate, y_validate)).history
        # code to find the average validation similarity for this model setting and append it to the maintained list
        temp = np.mean(hist['val_cosine_similarity'])
        avg_val_similarity.append(temp)

    # code to figure out the learning rate which gives the highest average validation similarity. 
    pos = np.argmax(avg_val_similarity)
    print("Learning Rate which gives highest validation similarity : ", learning_rates[pos])
    return learning_rates[pos]

# determine_Alpha_lr() is being called here
best_Alpha_lr = determine_Alpha_lr()

# Model building and training using callbacks
---

Now we will build and summarize the Alpha model as per the best learning rate value determined earlier. 


In [None]:
#code for building model using the best LR for Alpha model determined
model_a = Alpha_model(best_Alpha_lr)


Now instantiate the four callbacks for Alpha model.

In [None]:
# EarlyStopping after validation loss has not improved for 5 epochs 
earlystop_callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience = 5)

# ReduceLROnPlateau reducing LR by half when validation loss has not improved for 3 epochs. 
reduce_callback = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

# CSVlogger for keeping logs in filename of our choice
csv_logger = tf.keras.callbacks.CSVLogger('C:/Users/Ravi/Desktop/ML assignment 3/WR-Dataset/training2.csv')

# ModelCheckpoint that saves the model only when the monitored quantity (val_loss by default) improves
checkpoint_filepath_alpha = 'C:/Users/Ravi/Desktop/ML assignment 3/WR-Dataset/checkpoints/cp1.ckpt'
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_filepath_alpha, save_best_only=True)

callbacks = [earlystop_callback, reduce_callback, csv_logger, model_checkpoint_callback]

# Now we will train the model with training data using these callbacks.

In [None]:
# code for loading train and validation set images and their corresponding labels
x_train, y_train = [], []
for i,row in train_df.iterrows():
    # target_size is (height, width); the channel count is not part of it
    image = tf.keras.preprocessing.image.load_img(train_dir_path + '/' + row['Image'], target_size = (IMG_HEIGHT, IMG_WIDTH))
    x_train.append(tf.keras.preprocessing.image.img_to_array(image))
    y_train.append(get_Alpha_vector(row['label']))
x_train = np.asarray(x_train)
y_train = np.asarray(y_train)

x_validate, y_validate = [], []
for i,row in validation_df.iterrows():
    image = tf.keras.preprocessing.image.load_img(validation_dir_path + '/' + row['Image'], target_size = (IMG_HEIGHT, IMG_WIDTH))
    x_validate.append(tf.keras.preprocessing.image.img_to_array(image))
    y_validate.append(get_Alpha_vector(row['label']))
x_validate = np.asarray(x_validate) 
y_validate = np.asarray(y_validate)

# code to train with callbacks
hist = model_a.fit(
          x=x_train,y = y_train,epochs = NUM_EPOCHS,batch_size = BATCH_SIZE,
          validation_data=(x_validate, y_validate), callbacks = callbacks).history

# obtain the lists: epochs, training similarity, validation similarity, training loss, validation loss from the CSV log file
log_file = pandas.read_csv("C:/Users/Ravi/Desktop/ML assignment 3/WR-Dataset/training2.csv")
train_similarity = list(log_file['cosine_similarity'])
valid_similarity = list(log_file['val_cosine_similarity'])
train_loss = list(log_file['loss'])
valid_loss = list(log_file['val_loss'])
# CSVLogger numbers epochs from 0; EarlyStopping may end training before NUM_EPOCHS
epochs = [e + 1 for e in log_file['epoch']]

# plot epochs vs. training and validation similarity
fig, ax = plt.subplots(nrows=2, figsize = (20,20))
data1 = list(zip(epochs,train_similarity, valid_similarity))
data1 = pandas.DataFrame(data = data1, columns = ['epochs', 'Training Similarity', 'Validation Similarity'])
data1 = pandas.melt(data1, id_vars = "epochs")
sns.barplot(x="epochs", y="value", data=data1, ax = ax[0], hue = 'variable')

# plot epochs vs. training and validation loss
data2 = list(zip(epochs, train_loss, valid_loss))
data2 = pandas.DataFrame(data = data2, columns = ['epochs', 'Training Loss', 'Validation Loss'])
data2 = pandas.melt(data2, id_vars = "epochs")
sns.barplot(x="epochs", y="value", data=data2, ax = ax[1], hue = 'variable')


## Steps for Word recognition:

First, prepare a list having all the words from test set mapped to their corresponding vectors (lexicon for Alpha representations).

---


In [None]:
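# lexicon: every test-set word together with its Alpha vector
word,a_vector = [], []
for i,row in test_df.iterrows():
    word.append(row['label'])
    a_vector.append(get_Alpha_vector(row['label']))
# dict gives a reusable word -> vector map (a bare zip is exhausted after one pass)
alpha_map = dict(zip(word, a_vector))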


For every image in the test set we perform the following steps:
1. Predict the output vector representation from the trained model(s) when the image is given as input.

2. Find the word class (from the lexicon) whose vector representation has the highest similarity with the output vector.

3. If predicted word = true word, then it is a correct prediction, otherwise it is an incorrect prediction.
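
Step 2 can be sketched with NumPy as a cosine-similarity lookup over the whole lexicon at once. The function name `nearest_word` and the 4-dimensional toy vectors are assumptions made for illustration. Real Alpha vectors have 364 entries and the real predictions come from the trained model.

```python
import numpy as np

def nearest_word(predicted, lexicon_words, lexicon_vectors):
    """Return the lexicon word whose vector is most cosine-similar to `predicted`."""
    lexicon_vectors = np.asarray(lexicon_vectors, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # cosine similarity = dot product divided by the product of the norms
    norms = np.linalg.norm(lexicon_vectors, axis=1) * np.linalg.norm(predicted)
    similarities = lexicon_vectors @ predicted / np.maximum(norms, 1e-12)
    return lexicon_words[int(np.argmax(similarities))]

# toy lexicon with made-up binary vectors
toy_words = ["cat", "dog", "sun"]
toy_vectors = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]

print(nearest_word([0.9, 0.1, 0.8, 0.2], toy_words, toy_vectors))
print(nearest_word([0.1, 0.9, 0.2, 0.7], toy_words, toy_vectors))
```

Computing all similarities in one matrix product avoids a Python loop over the lexicon, which matters more as the lexicon grows.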

---

Let us now perform recognition using the trained Alpha model on the test set. 

First, load the test images and their vector representations.


In [None]:
# code to load test images and their vector labels
x_test, y_test = [], [] 
for i,row in test_df.iterrows():
    image = tf.keras.preprocessing.image.load_img(test_dir_path + '/' + row['Image'], target_size = (IMG_HEIGHT, IMG_WIDTH))
    x_test.append(tf.keras.preprocessing.image.img_to_array(image))
    y_test.append(get_Alpha_vector(row['label']))
x_test = np.asarray(x_test)
y_test = np.asarray(y_test)


Now we will load the saved trained Alpha model from the file and predict the labels


In [None]:
# code for loading the saved model from file 
alpha_saved_model = tf.keras.models.load_model(checkpoint_filepath_alpha)

In [None]:
from scipy import spatial
def get_label(result):
    similarity = []
    for label in a_vector:
        similarity.append(1 - spatial.distance.cosine(label, result))
    pos = np.argmax(similarity)
    return word[pos]

In [None]:
# Insert code for predicting word labels of the test set images 
output = alpha_saved_model.predict(x = x_test)
output_labels = [get_label(res) for res in output]
print(output_labels)


Now let us evaluate the performance of the model. The effective accuracy of the model is defined as the harmonic mean (HM) of the accuracy on seen-class images and the accuracy on unseen-class images.
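
A small sketch of the harmonic mean of two accuracies. The helper `effective_accuracy` is hypothetical and the input values are illustrative numbers, not results from this model.

```python
def effective_accuracy(acc_seen, acc_unseen):
    """Harmonic mean of two accuracies; defined as 0 when both are 0."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

balanced = effective_accuracy(0.9, 0.9)     # equal accuracies
unbalanced = effective_accuracy(0.9, 0.1)   # strong on seen, weak on unseen
print(balanced, unbalanced)
```

The harmonic mean is pulled towards the smaller value, so a model that only does well on seen words gets a low effective accuracy.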



In [None]:
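# code to compute recognition accuracy on test images that belong to seen classes
true_labels = list(test_df['label'])
is_seen = np.array([label in dict_seen for label in true_labels])
is_correct = np.array([pred == true for pred, true in zip(output_labels, true_labels)])
acc_seen = float(is_correct[is_seen].mean()) if is_seen.any() else 0.0
print("Seen accuracy : ", acc_seen)
# code to compute recognition accuracy on test images that belong to unseen classes
acc_unseen = float(is_correct[~is_seen].mean()) if (~is_seen).any() else 0.0
print("Unseen accuracy : ", acc_unseen)
# code to compute effective accuracy (harmonic mean; 0 if both accuracies are 0)
final_ac = 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen) if (acc_seen + acc_unseen) > 0 else 0.0
print("Harmonic accuracy : ",final_ac)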