# Project -Neural Network
Street View Housing Number Digit Recognition

## The Problem Description:
Recognizing multi-digit numbers in photographs captured at street level is an important component of modernday map making. A classic example of a corpus of such street-level photographs is Google’s Street View imagery comprised of hundreds of millions of geo-located 360-degree panoramic images. The ability to automatically transcribe an address number from a geo-located patch of pixels and associate the transcribed number with a known street address helps pinpoint, with a high degree of accuracy, the location of the building it represents. More broadly, recognizing numbers in photographs is a problem of interest to the optical character recognition community. While OCR on constrained domains like document processing is well studied, arbitrary multi-character text recognition in photographs is still highly challenging. This difficulty arises due to the wide variability in the visual appearance of text in the wild on account of a large range of fonts, colours, styles, orientations, and character arrangements. The recognition problem is further complicated by environmental factors such as lighting, shadows, specularities, and occlusions as well as by image acquisition factors such as resolution, motion, and focus blurs. In this project, we will use the dataset with images centred around a single digit (many of the images do contain some distractors at the sides). Although we are taking a sample of the data which is simpler, it is more complex than MNIST because of the distractors.

## Dataset
SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with the minimal requirement on data formatting but comes from a significantly harder, unsolved, real-world problem (recognizing digits and numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.
Code to load the dataset

## Acknowledgement
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng Reading Digits in Natural Images with Unsupervised Feature Learning NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011. PDF
http://ufldl.stanford.edu/housenumbers as the URL for this site when necessary

## Steps
The objective of the project is to learn how to implement a simple image classification pipeline based on a deep neural network and understand the basics of Image Classification
1. Read the data from the h5py file and understand the train/test splits (5 points)
2. Reshape and normalize the train and test features (10 points)
3. One hot encode the labels for train and test data (15 points)
4. Define the model architecture using TensorFlow with a flatten layer followed by dense layers with activation as ReLu and softmax (15 points)
5. Compile the model with loss as categorical cross-entropy and adam optimizers. Use accuracy as the metric for evaluation (10 points)
6. Fit and evaluate the model. Print the loss and accuracy for the test data (5 points)

In [41]:
%tensorflow_version 2.x
import tensorflow as tf
tf.__version__

import random, warnings
random.seed(0)
warnings.filterwarnings('ignore')

In [42]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


# 1. Read the data from the h5py file and understand the train/test splits (5 points)


In [44]:
import h5py
from google.colab import drive
drive.mount('/content/gdrive/')
root_path = '/content/gdrive/MyDrive/Colab/Neural Networks Project/'

h5f = h5py.File(root_path + 'SVHN_single_grey1.h5', 'r')
h5f.keys()

Drive already mounted at /content/gdrive/; to attempt to forcibly remount, call drive.mount("/content/gdrive/", force_remount=True).


<KeysViewHDF5 ['X_test', 'X_train', 'X_val', 'y_test', 'y_train', 'y_val']>

In [46]:
X_train = h5f['X_train'][:]
y_train = h5f['y_train'][:]
X_test = h5f['X_test'][:]
y_test = h5f['y_test'][:]
X_val = h5f['X_val'][:]
y_val = h5f['y_val'][:]

In [47]:
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, X_val.shape, y_val.shape)

(42000, 32, 32) (42000,) (18000, 32, 32) (18000,) (60000, 32, 32) (60000,)


Observations:
1. X_train and X_test are split in 42000:18000 ratio
2. X_val has 60000 items

In [48]:
X_train[0]

array([[ 33.0704,  30.2601,  26.852 , ...,  71.4471,  58.2204,  42.9939],
       [ 25.2283,  25.5533,  29.9765, ..., 113.0209, 103.3639,  84.2949],
       [ 26.2775,  22.6137,  40.4763, ..., 113.3028, 121.775 , 115.4228],
       ...,
       [ 28.5502,  36.212 ,  45.0801, ...,  24.1359,  25.0927,  26.0603],
       [ 38.4352,  26.4733,  23.2717, ...,  28.1094,  29.4683,  30.0661],
       [ 50.2984,  26.0773,  24.0389, ...,  49.6682,  50.853 ,  53.0377]],
      dtype=float32)

In [49]:
X_test[100]

array([[ 75.0527,  74.6398,  72.526 , ..., 108.1588, 110.1586, 111.1585],
       [ 74.0528,  73.6399,  72.1131, ..., 108.1588, 110.1586, 111.1585],
       [ 72.053 ,  72.053 ,  72.8141, ..., 108.1588, 112.1584, 114.1582],
       ...,
       [ 67.7976,  67.7976,  68.2706, ..., 107.2451, 118.945 , 123.9445],
       [ 66.7977,  66.7977,  67.2707, ...,  99.9469, 112.0597, 118.0591],
       [ 66.7977,  66.7977,  67.2707, ...,  96.9472, 109.06  , 115.0594]],
      dtype=float32)

In [50]:
X_val[0]

array([[ 44.299 ,  45.9999,  51.3306, ...,  25.2764,  27.515 ,  27.156 ],
       [ 49.1351,  60.3081,  70.1222, ...,  23.7002,  25.2378,  24.2918],
       [ 60.7595,  83.7141, 102.1961, ...,  24.5044,  24.9712,  22.8512],
       ...,
       [ 67.1072,  93.2464, 109.2017, ...,  26.6444,  24.6015,  22.9607],
       [ 24.7569,  36.6417,  48.9071, ...,  21.9268,  21.5309,  21.5479],
       [ 22.6584,  22.7724,  27.2666, ...,  21.443 ,  20.8191,  20.0812]],
      dtype=float32)

In [51]:
X_train[0].shape

(32, 32)

In [52]:
y_val

array([0, 0, 0, ..., 9, 9, 9], dtype=uint8)

y_val seems to be a sorted target variable

In [53]:
y_test

array([1, 7, 2, ..., 7, 9, 2], dtype=uint8)

In [54]:
y_train

array([2, 6, 7, ..., 7, 0, 4], dtype=uint8)

In [55]:
import time
start_time = time.time()
subset_bool = np.isin(X_train, X_val)
print('Check if X_train cotains anything that is not part of X_val:', np.isin('False', subset_bool))
print('----- %s seconds -----' % (time.time() - start_time))

Check if X_train cotains anything that is not part of X_val: False
----- 18.16865849494934 seconds -----


In [56]:
import time
start_time = time.time()
subset_bool = np.isin(X_test, X_val)
print('Check if X_test cotains anything that is not part of X_val:', np.isin('False', subset_bool))
print('----- %s seconds -----' % (time.time() - start_time))

Check if X_test cotains anything that is not part of X_val: False
----- 11.155433893203735 seconds -----


### Conclusion:
- That means X_train and X_test are simply the subset of X_val
- Similarly y_train and y_test are subsets of y_val

# 2. Reshape and normalize the train and test features (10 points)


### Reshape 32x32 images into 1024 features flat array

In [57]:
print(X_train.shape, X_test.shape)

(42000, 32, 32) (18000, 32, 32)


In [58]:
X_train = np.reshape(X_train, (X_train.shape[0], 1024))
X_test = np.reshape(X_test, (X_test.shape[0], 1024))

In [59]:
print(X_train.shape, X_test.shape)

(42000, 1024) (18000, 1024)


In [60]:
X_val = np.reshape(X_val, (X_val.shape[0], 1024))
print(X_val.shape)

(60000, 1024)


### Standardize the train and test features

In [61]:
print('Min pixel value: %s, Max pixel value: %s' %(X_train.min(),X_train.max()))
print('Min pixel value: %s, Max pixel value: %s' %(X_test.min(),X_test.max()))

Min pixel value: 0.0, Max pixel value: 254.9745
Min pixel value: 0.0, Max pixel value: 254.9745


- To standardize the features, we will simply divide each pixel value by 255 which is the max value.

In [62]:
X_train /= 255.0
print('Min pixel value: %s, Max pixel value: %s' %(X_train.min(),X_train.max()))
X_test /= 255.0
print('Min pixel value: %s, Max pixel value: %s' %(X_test.min(),X_test.max()))

Min pixel value: 0.0, Max pixel value: 0.9999
Min pixel value: 0.0, Max pixel value: 0.9999


In [63]:
X_val /= 255.0


# 3. One hot encode the labels for train and test data (15 points)

In [64]:
from tensorflow.keras.utils import to_categorical
y_train = to_categorical(y_train, num_classes = 10)
y_test = to_categorical(y_test, num_classes = 10)

In [65]:
print(y_train.shape, y_test.shape)

(42000, 10) (18000, 10)


In [66]:
y_train[100]

array([1., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)

In [67]:
y_val = to_categorical(y_val, num_classes = 10)

# 4. Define the model architecture using TensorFlow 
- with a flatten layer followed by dense layers with activation as ReLu and softmax (15 points)

In [79]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense, Flatten, BatchNormalization, Activation
from tensorflow.keras import optimizers

In [100]:
model = Sequential()

In [101]:
# Neural Network with a Flatten layer followed by Relu activation layer, Batch Normalization layer, Hidden layer with 64 neurons, Relu activation, Batch normalization and 
# and hidden layer with 10 nodes followed by softmax activation function which finally gives 10 outputs.
model.add(Flatten( input_shape=(1024,)))
model.add(Activation('relu' ))
model.add(BatchNormalization())
model.add(Dense(64, activation='relu', kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))

# 5. Compile the model (10 points)
- with loss as categorical cross-entropy and adam optimizers. 
- Use accuracy as the metric for evaluation



In [102]:
# Defining the Adam optimizer with amsgrad set to True and a learning rate of 0.001.
adam = optimizers.Adam(learning_rate=0.001, amsgrad=True)

# Compiling the model with Adam optimizer, Categorical cross entropy loss function and evaluating the model with accuracy metric.
model.compile('adam', loss='CategoricalCrossentropy', metrics='accuracy')

# 6. Fit and evaluate the model. (5 points)
- Print the loss and accuracy for the test data 

In [103]:
# Fitting the model with a batch size of 100 and 100 epochs (iterations). Also printing the validation loss and accuracy
model.fit(X_train, y_train, batch_size=100, epochs=100, validation_data=(X_test, y_test))

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f2bda48b0b8>

In [104]:
#The model uses around 68 K trainable parameters
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_6 (Flatten)          (None, 1024)              0         
_________________________________________________________________
activation_3 (Activation)    (None, 1024)              0         
_________________________________________________________________
batch_normalization_11 (Batc (None, 1024)              4096      
_________________________________________________________________
dense_13 (Dense)             (None, 64)                65600     
_________________________________________________________________
batch_normalization_12 (Batc (None, 64)                256       
_________________________________________________________________
dense_14 (Dense)             (None, 10)                650       
Total params: 70,602
Trainable params: 68,426
Non-trainable params: 2,176
______________________________________________

In [105]:
# Model gives around 88% accuracy with the full dataset X_val (X_train + X_test)
model.evaluate(X_val, y_val)



[0.41098451614379883, 0.8851833343505859]

In [106]:
# Model gives around 84% accuracy with the test data
model.evaluate(X_test, y_test)



[0.5818215608596802, 0.8485000133514404]

# Conclusion:
The neural networks model gives an accuracy of 84.85% over the test dataset.