# Multi Digit Recognition

This notebook shows a simple Keras model that recognizes a digit sequence in a real-world image. The image data is taken from the Street View House Numbers (SVHN) dataset. The project is divided into two parts. The **Preprocessing** notebook converts the images in the dataset to 32x32 greyscale image arrays and saves them in an h5 file. The **Multi Digit Recognition** notebook contains the CNN model that predicts the multi-digit number in each image.

Let's import the main packages.

In [1]:
import h5py
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns
from PIL import Image
import numpy as np
import time
import os
from keras import backend as K
from keras.models import Model
from keras.layers import Input,Lambda,Dense,Dropout,Activation,Flatten,Conv2D,MaxPooling2D

K.clear_session()

Using TensorFlow backend.


Extract the data from the h5 file created in the preprocessing notebook.

In [2]:
h5f = h5py.File('data/svhn_multi_grey.h5','r')

# Extract the datasets
x_train = h5f['train_dataset'][:]
y_train = h5f['train_labels'][:]
x_val = h5f['valid_dataset'][:]
y_val = h5f['valid_labels'][:]
x_test = h5f['test_dataset'][:]
y_test = h5f['test_labels'][:]

# Close the file
h5f.close()

print('Training set', x_train.shape, y_train.shape)
print('Validation set', x_val.shape, y_val.shape)
print('Test set     ', x_test.shape, y_test.shape)

Training set (230754, 32, 32, 1) (230754, 5)
Validation set (5000, 32, 32, 1) (5000, 5)
Test set      (13068, 32, 32, 1) (13068, 5)


Merge the validation set into the training set and shuffle the result.

In [3]:
X_train = np.concatenate([x_train, x_val])
Y_train = np.concatenate([y_train, y_val])

from sklearn.utils import shuffle

# Randomly shuffle the training data

X_train, Y_train = shuffle(X_train, Y_train)
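
The important property of the shuffle above is that images and labels are reordered with the same permutation, so every image keeps its own label. A minimal sketch of that idea in plain NumPy follows; `shuffle_together` and the `seed` parameter are illustrative names, not part of the notebook.

```python
import numpy as np

def shuffle_together(x, y, seed=0):
    """Apply one random permutation to both arrays so rows stay paired."""
    order = np.random.default_rng(seed).permutation(len(x))
    return x[order], y[order]

# Toy data: row i of x holds the values (2*i, 2*i + 1) and y[i] == i
x = np.arange(10).reshape(5, 2)
y = np.arange(5)
x_s, y_s = shuffle_together(x, y)

# Each shuffled row still matches its label
print(np.array_equal(x_s[:, 0] // 2, y_s))  # True
```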

Normalizing the data helps the model reach better results and reduces the training time. Here, each image has its own mean subtracted.

In [4]:
def subtract_mean(a):
    """ Helper function for subtracting the mean of every image
    """
    for i in range(a.shape[0]):
        a[i] -= a[i].mean()
    return a


# Subtract the mean from every image
X_train = subtract_mean(X_train)
X_test = subtract_mean(x_test)
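
The loop above modifies its input in place, one image at a time. The same per-image centering can be sketched as a single broadcast operation that returns a new array. `subtract_mean_vectorized` is a hypothetical name, and the random batch only stands in for real images.

```python
import numpy as np

def subtract_mean_vectorized(a):
    """Return a float32 copy of `a` with each image's own mean removed."""
    a = np.asarray(a, dtype=np.float32)
    # Average over height, width and channel; keepdims lets the result broadcast
    per_image_mean = a.mean(axis=(1, 2, 3), keepdims=True)
    return a - per_image_mean

# Hypothetical stand-in for a batch of 32x32 greyscale images
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(4, 32, 32, 1)).astype(np.float32)
centered = subtract_mean_vectorized(images)

# Every image now has a mean close to zero
print(np.abs(centered.mean(axis=(1, 2, 3))).max())
```

Because this version does not touch its input, the original arrays remain available for inspection afterwards, at the cost of holding a second copy in memory.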

Create a helper function that one-hot encodes each of the 5 digit positions into 11 classes and combines them into one array of length 55.


In [5]:
#preparing the y data
def y_data_transform(y):
    y_new=np.zeros((y.shape[0],y.shape[1]*11),dtype="int")
    for (i,j),l in np.ndenumerate(y):
        y_new[i,j*11+l]=1
    return y_new
Y_Train=y_data_transform(Y_train)
Y_test=y_data_transform(y_test)
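
To make the layout of the 55-element vector concrete, here is a small worked example using the same logic as the helper above. It assumes that class 10 marks an empty digit position, which is how `convert_to_num` later treats it. The label row is made up for illustration.

```python
import numpy as np

def y_data_transform(y):
    # Same logic as the notebook helper: 11 one-hot slots per digit position
    y_new = np.zeros((y.shape[0], y.shape[1] * 11), dtype="int")
    for (i, j), l in np.ndenumerate(y):
        y_new[i, j * 11 + l] = 1
    return y_new

# Hypothetical label for the number "19", padded with class 10
labels = np.array([[1, 9, 10, 10, 10]])
encoded = y_data_transform(labels)

print(encoded.shape)                        # (1, 55)
print(np.flatnonzero(encoded[0]).tolist())  # [1, 20, 32, 43, 54]
```

Position `j` owns the slice `[j*11, (j+1)*11)`, so digit 9 in position 1 lands at index 11 + 9 = 20.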

The model is built with the Keras functional API. The summary below describes the main model used to recognize the number.

In [6]:
input_data=Input(name="input",shape=(32,32,1),dtype='float32')
conv1=Conv2D(32,5,padding="same",activation="relu")(input_data)
conv2=Conv2D(32,5,padding="same",activation="relu")(conv1)
max1=MaxPooling2D(pool_size=(2, 2),padding="same")(conv2)
drop1=Dropout(0.75)(max1)

conv3=Conv2D(64,5,padding="same",activation="relu")(drop1)
conv4=Conv2D(64,5,padding="same",activation="relu")(conv3)
max2=MaxPooling2D(pool_size=(2, 2),padding="same")(conv4)
drop2=Dropout(0.75)(max2)

conv5=Conv2D(128,5,padding="same",activation="relu")(drop2)
conv6=Conv2D(128,5,padding="same",activation="relu")(conv5)
conv7=Conv2D(128,5,padding="same",activation="relu")(conv6)
flat=Flatten()(conv7)

fc1=Dense(256,activation="relu")(flat)
drop3=Dropout(0.5)(fc1)
fc2=Dense(253,activation="relu")(drop3)
output=Dense(55,activation="sigmoid")(fc2)

model1=Model(inputs=input_data, outputs=output)
model1.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           (None, 32, 32, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 32)        832       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 32)        25632     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)        51264     
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 16, 16, 64)        102464    
__________
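
The parameter counts in the visible part of the summary can be checked by hand: a `Conv2D` layer with a k x k kernel has k * k * input_channels * filters weights plus one bias per filter. A short sketch of that arithmetic follows; `conv2d_params` is an illustrative helper, not a Keras function.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in * out) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

print(conv2d_params(5, 1, 32))   # 832     (conv2d_1)
print(conv2d_params(5, 32, 32))  # 25632   (conv2d_2)
print(conv2d_params(5, 32, 64))  # 51264   (conv2d_3)
print(conv2d_params(5, 64, 64))  # 102464  (conv2d_4)
```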

**Custom Loss Function**
This custom loss function compares the predicted y with the actual y. It computes the element-wise binary cross-entropy and averages it over the 55 outputs.

In [7]:
_EPSILON=1e-7
def _loss_tensor(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0-_EPSILON)
    out = -(y_true * K.log(y_pred) + (1.0 - y_true) * K.log(1.0 - y_pred))
    return K.mean(out, axis=-1)
def loss_func(y):
    y_pred,y_true=y
    loss=_loss_tensor(y_true,y_pred)
    return loss
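
To see what values this loss produces, the same formula can be written in NumPy and applied to a few made-up predictions. This is only a sketch for intuition; `bce_numpy` is a hypothetical name and the arrays are not taken from the dataset.

```python
import numpy as np

_EPSILON = 1e-7

def bce_numpy(y_true, y_pred):
    """NumPy version of the element-wise binary cross-entropy, averaged per row."""
    y_pred = np.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    out = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return out.mean(axis=-1)

y_true = np.array([[1.0, 0.0, 0.0, 1.0]])
confident = np.array([[0.9, 0.1, 0.1, 0.9]])  # close to the targets
unsure = np.array([[0.5, 0.5, 0.5, 0.5]])     # no information

print(bce_numpy(y_true, confident))  # about 0.105, i.e. -ln(0.9)
print(bce_numpy(y_true, unsure))     # about 0.693, i.e. ln(2)
```

The clipping step keeps `log` away from 0 and 1, so a fully wrong prediction yields a large but finite loss.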

A Lambda layer wraps the loss function and takes the network output together with the `y_true` input to calculate the loss. The output of this layer is the loss value.

In [8]:
from keras.callbacks import TensorBoard
y_true = Input(name='y_true', shape=[55], dtype='float32')

loss_out = Lambda(loss_func, output_shape=(1,), name='loss')([output, y_true])

model = Model(inputs=[input_data,y_true], outputs=loss_out)

model.add_loss(K.sum(loss_out,axis=None))

Because the loss is added through the last layer, the `loss` argument of `compile` is set to `None`, so training drives the output of that layer toward zero.

In [9]:
tensor_board = TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)

model.compile(loss=None, optimizer="adam", loss_weights=None)

model.fit(x=[X_train,Y_Train],y=None, batch_size=1000, epochs=25, verbose=1,callbacks=[tensor_board])

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.callbacks.History at 0x164fc742ef0>

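The loss value seems large because of the custom loss function. The score calculated below is meant to show the accuracy in detecting the right digits. Note that it is computed as 100 * (1 - mean loss over the test set), not by counting correctly classified digits.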

In [10]:
Accuracy=(1-np.mean(model.predict([X_test[:],Y_test[:]])))*100
print(Accuracy)

98.23456481099129


In [11]:
model.save("MDR_model.h5")
model.save_weights("MDR_model_weights.h5")

This helper function converts the 55 output values of the network into a number string.

In [12]:
def convert_to_num(x):
    num=""
    if len(x)==55:
        for i in range(5):
            c=np.argmax(x[i*11:(i+1)*11])
            if c!=10:
                num+=str(c)
        return num
    else:
        print("This function might not be used that way")
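
The decoding step can also be sketched for a whole batch at once by reshaping each row to 5 positions of 11 classes and taking the arg-max per position. `decode_batch` is a hypothetical name, and the scores below are invented to represent a prediction of "19".

```python
import numpy as np

def decode_batch(outputs):
    """Decode an (N, 55) array into N digit strings, skipping class 10."""
    classes = np.argmax(np.asarray(outputs).reshape(-1, 5, 11), axis=-1)
    return ["".join(str(c) for c in row if c != 10) for row in classes]

# Hypothetical network output: low scores everywhere, high at the chosen classes
scores = np.full((1, 55), 0.01)
for position, digit in enumerate([1, 9, 10, 10, 10]):
    scores[0, position * 11 + digit] = 0.95

print(decode_batch(scores))  # ['19']
```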

Even though the accuracy for each digit is high, the accuracy for predicting the full number is lower.

In [13]:
X1=model1.predict(X_test)
Y1=Y_test
j=0
for i in range(len(X_test)):
    try:
        
        if eval(convert_to_num(X1[i]))!=eval(convert_to_num(Y1[i])):
            j+=1
            #print(i,[convert_to_num(X1[i]),convert_to_num(Y1[i])])
    except:
        j+=1
print("total error",j," out of ",len(X1),"and total accuracy",(1-(j/len(X1)))*100)
                                     

total error 1561  out of  13068 and total accuracy 88.0547903275176
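
The comparison above passes the decoded strings through `eval`. In Python 3, `eval` raises an exception for an empty string and for a string with a leading zero such as `"07"`, and the bare `except` then counts the sample as an error even if both strings are identical. A minimal sketch of the same check using direct string comparison follows. The names `decode`, `sequence_accuracy` and `one_hot` are illustrative, the data is made up, and the resulting count on the real test set may differ from the figure reported above.

```python
import numpy as np

def decode(row):
    """Turn one 55-element row into a digit string, skipping class 10."""
    classes = np.argmax(np.asarray(row).reshape(5, 11), axis=-1)
    return "".join(str(c) for c in classes if c != 10)

def sequence_accuracy(predicted, expected):
    """Return (error count, accuracy in percent) for full-number matches."""
    errors = sum(decode(p) != decode(t) for p, t in zip(predicted, expected))
    return errors, 100.0 * (1 - errors / len(predicted))

def one_hot(labels):
    """Toy encoder used only to build the example arrays below."""
    out = np.zeros((len(labels), 55))
    for i, row in enumerate(labels):
        for j, l in enumerate(row):
            out[i, j * 11 + l] = 1.0
    return out

truth = one_hot([[1, 9, 10, 10, 10], [0, 7, 10, 10, 10], [10] * 5])
preds = one_hot([[1, 9, 10, 10, 10], [0, 7, 10, 10, 10], [3, 10, 10, 10, 10]])

errors, accuracy = sequence_accuracy(preds, truth)
print(errors, "error(s) out of", len(truth))  # 1 error(s) out of 3
```

Here the second sample ("07") counts as correct, and only the third one, where an empty target is predicted as "3", counts as an error.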
