<a href="https://colab.research.google.com/github/aksekhar/SVHN-is-a-real-world-image/blob/master/SVHN-is-a-real-world-image.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

• **DOMAIN**: Autonomous Vehicles

• **CONTEXT**: Recognising multi-digit numbers in photographs captured at street level is an important component of modern-day map
making. A classic example of a corpus of such street-level photographs is Google’s Street View imagery composed of hundreds of millions
of geo-located 360-degree panoramic images.
The ability to automatically transcribe an address number from a geo-located patch of pixels and associate the transcribed number with a
known street address helps pinpoint, with a high degree of accuracy, the location of the building it represents. More broadly, recognising
numbers in photographs is a problem of interest to the optical character recognition community.
While OCR on constrained domains like document processing is well studied, arbitrary multi-character text recognition in photographs is
still highly challenging. This difficulty arises due to the wide variability in the visual appearance of text in the wild on account of a large
range of fonts, colours, styles, orientations, and character arrangements.
The recognition problem is further complicated by environmental factors such as lighting, shadows, specularity, and occlusions as well as
by image acquisition factors such as resolution, motion, and focus blurs. In this project, we will use the dataset with images centred around
a single digit (many of the images do contain some distractors at the sides). Although we are taking a sample of the data which is simpler,
it is more complex than MNIST because of the distractors.

• **DATA DESCRIPTION**: SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with
minimal requirements on data formatting. It comes from a significantly harder, unsolved, real-world problem (recognising digits and
numbers in natural scene images). SVHN is obtained from house numbers in Google Street View images.

The label of each image is the prominent digit in that image, e.g. 2, 6, 7 and 4 respectively for four sample images.


In [None]:
!pip install category_encoders

Importing modules

In [None]:
import pandas as pd
import numpy as np
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn import model_selection
from sklearn.compose import ColumnTransformer
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.impute import SimpleImputer
import warnings
from sklearn import metrics
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, Dropout,BatchNormalization,Flatten
import random
from tensorflow.keras import regularizers, optimizers, backend
import h5py
import category_encoders as ce


random.seed(1)
np.random.seed(1)
tf.random.set_seed(1)


warnings.filterwarnings("ignore")


# Helper classes and functions

In [None]:
def make_confusion_matrix(cf,
                          group_names=None,
                          categories='auto',
                          count=True,
                          percent=True,
                          cbar=True,
                          xyticks=True,
                          xyplotlabels=True,
                          sum_stats=True,
                          figsize=None,
                          cmap='Blues',
                          title=None):
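    '''
    Plot an sklearn confusion matrix `cf` as an annotated Seaborn heatmap.

    Arguments
    ---------
    cf:           confusion matrix (2D array of counts) to plot
    group_names:  optional list of strings, one per cell, in row-major order
    categories:   tick labels for the axes ('auto' lets Seaborn choose)
    count:        if True, show the raw count in each cell
    percent:      if True, show each cell as a share of all observations
    cbar:         if True, show the colour bar
    xyticks:      if True, show the category tick labels
    xyplotlabels: if True, label the axes
    sum_stats:    if True, show the overall accuracy below the plot
    figsize:      figure size tuple; defaults to Matplotlib's rcParams value
    cmap:         Matplotlib colormap name
    title:        optional plot title
    '''


    # CODE TO GENERATE TEXT INSIDE EACH SQUARE
    blanks = ['' for i in range(cf.size)]

    if group_names and len(group_names)==cf.size:
        group_labels = ["{}\n".format(value) for value in group_names]
    else:
        group_labels = blanks

    if count:
        group_counts = ["{0:0.0f}\n".format(value) for value in cf.flatten()]
    else:
        group_counts = blanks

    if percent:
        group_percentages = ["{0:.2%}".format(value) for value in cf.flatten()/np.sum(cf)]
    else:
        group_percentages = blanks

    box_labels = [f"{v1}{v2}{v3}".strip() for v1, v2, v3 in zip(group_labels,group_counts,group_percentages)]
    box_labels = np.asarray(box_labels).reshape(cf.shape[0],cf.shape[1])


    # CODE TO GENERATE SUMMARY STATISTICS & TEXT FOR SUMMARY STATS
    if sum_stats:
        # Accuracy is the sum of the diagonal divided by the total observations
        accuracy = np.trace(cf) / float(np.sum(cf))
        stats_text = "\n\nAccuracy={:0.3f}".format(accuracy)
    else:
        stats_text = ""


    # SET FIGURE PARAMETERS ACCORDING TO OTHER ARGUMENTS
    if figsize is None:
        # Get default figure size if not set
        figsize = plt.rcParams.get('figure.figsize')

    if not xyticks:
        # Do not show categories if xyticks is False
        categories = False


    # MAKE THE HEATMAP VISUALIZATION
    plt.figure(figsize=figsize)
    sns.heatmap(cf,annot=box_labels,fmt="",cmap=cmap,cbar=cbar,xticklabels=categories,yticklabels=categories)

    # Label the axes and append the summary statistics, if requested
    if xyplotlabels:
        plt.ylabel('True label')
        plt.xlabel('Predicted label' + stats_text)
    else:
        plt.xlabel(stats_text)

    if title:
        plt.title(title)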
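
`make_confusion_matrix` expects `cf` to be a square array of counts, typically produced by `sklearn.metrics.confusion_matrix`. As a minimal sketch of what that array contains, the snippet below builds one with NumPy from made-up labels (the label values are illustrative assumptions, not results from this notebook) and derives accuracy the same way the function does, as the trace divided by the total.

```python
import numpy as np

# Hypothetical true and predicted class indices for a 3-class problem
y_true = np.array([0, 1, 2, 2, 1, 0, 2])
y_hat = np.array([0, 2, 2, 2, 1, 0, 1])

n_classes = 3
cf = np.zeros((n_classes, n_classes), dtype=int)
# Rows index the true class, columns the predicted class
np.add.at(cf, (y_true, y_hat), 1)

# Accuracy = correctly classified (diagonal) / all observations
accuracy = np.trace(cf) / cf.sum()
print(cf)
print(f"accuracy = {accuracy:.3f}")
```

An array built this way could be passed straight to `make_confusion_matrix(cf)`.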

In [None]:
def mount_drive(fpath):
  # On Colab, mount Google Drive and resolve the path inside "My Drive";
  # elsewhere, fall back to treating fpath as a local path.
  try:
    from google.colab import drive
    drive.mount("/content/drive/", force_remount=True)
    google_drive_prefix = "/content/drive/My Drive"
    data_path = f"{google_drive_prefix}/{fpath}"
  except ModuleNotFoundError:
    data_path = fpath
  print(data_path)
  return data_path

In [None]:
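def read_file(file_name):
  print(f"reading the file: {file_name}")
  try:
    # Use the last extension so names containing extra dots still work
    file_type = file_name.rsplit('.', 1)[-1].lower()
    print(file_type)
    if file_type == 'csv':
      df = pd.read_csv(file_name)
    else:
      # Fail loudly instead of returning an undefined variable
      raise ValueError(f"unsupported file type: {file_type}")
    return df
  except Exception as e:
    print("failed to read the file")
    raise e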

In [None]:
def verify_sample_data(df,records):
  if not(records):
    records = 5
  print(f"verifying first {records} records and last {records} records")
  return pd.concat([df.head(records),df.tail(records)])

In [None]:
def get_percentage_of_missing_values(df):
  print("Checking for missing values and print percentage for each attribute")
  return (round(df.isnull().sum() / (df.isnull().count())*100))

In [None]:
def plot_distribution(data,variable_name):

  target_variable = data[variable_name]
  plt.hist(target_variable)
  plt.xlabel(f"{variable_name}")
  plt.ylabel("Frequency")
  plt.title("Distribution of Target Variable")
  plt.show()

In [None]:
def plot_tring_accuracy(history, t_label, v_label, y_label):
  # y_label doubles as the Keras history key ('loss' or 'accuracy'),
  # so the same helper plots either metric and its validation counterpart.
  plt.figure(figsize=(5, 3))
  plt.plot(history.history[y_label], color='b', label=t_label)
  plt.plot(history.history[f'val_{y_label}'], color='r', label=v_label)
  plt.title(f"{t_label} and {v_label}")
  plt.xlabel('Epoch')
  plt.ylabel(y_label)
  plt.legend()
  plt.show()

In [None]:
def plot_images(images, labels, num_images=10):

    fig, axes = plt.subplots(nrows=1, ncols=num_images, figsize=(10, 1))

    for ax, image, label in zip(axes, images[:num_images], labels[:num_images]):
        ax.set_axis_off()
        ax.imshow(image, cmap="gray_r", interpolation="nearest")
        ax.set_title(f"Label: {label}")

    plt.show()

# 1. Data import and Understanding

In [None]:
file_path = mount_drive("dataset/Autonomous_Vehicles_SVHN_single_grey1.h5")

**a. Read the .h5 file and assign it to a variable.**

In [None]:
files  = h5py.File(file_path, 'r')

**b. Print all the keys from the .h5 file**

In [None]:
with files as f:
  for key in f.keys():
    print(key)

**c. Split the data into X_train, X_test, Y_train, Y_test**

In [None]:
with h5py.File(file_path, 'r') as f:

    # Read all the datasets
    X_test = f['X_test'][:]
    X_train = f['X_train'][:]
    y_test = f['y_test'][:]
    y_train = f['y_train'][:]

# 2. Data Visualisation and preprocessing.

**A. Print the shapes of all 4 data splits (X_train, X_test, y_train, y_test) to verify that X and y are in sync**

In [None]:
# Print shapes of all 4 variables
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape: {y_test.shape}")

# Verify if train and test data is in sync
if X_train.shape[0] == y_train.shape[0] and X_test.shape[0] == y_test.shape[0]:
  print("Train and test data are in sync")
else:
  print("Train and test data are not in sync")

**B. Visualise the first 10 images in the train data and print their corresponding labels.**

In [None]:
plot_images(X_train,y_train)

**C. Reshape all the images to an appropriate shape and update the data in the same variables.**

In [None]:
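# Flatten each 32x32 image into a 1024-element vector;
# -1 lets NumPy infer the flattened length, so the sample counts are not hard-coded
X_train = np.asarray(X_train).reshape(X_train.shape[0], -1)
X_test = np.asarray(X_test).reshape(X_test.shape[0], -1)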
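
The reshape above turns each 32x32 image into one row of 1024 values. A small sketch on synthetic data (the array here is a stand-in, not SVHN) shows that passing `-1` lets NumPy infer the flattened length and that the operation is reversible.

```python
import numpy as np

# Stand-in batch: 4 "images" of 32x32 pixels
images = np.arange(4 * 32 * 32).reshape(4, 32, 32)

# Keep the sample axis and let NumPy infer the flattened length (32 * 32 = 1024)
flat = images.reshape(images.shape[0], -1)
print(flat.shape)

# Reshaping back recovers the original pixel layout
restored = flat.reshape(-1, 32, 32)
print(np.array_equal(images, restored))
```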

**D. Normalise the images i.e. Normalise the pixel values.**

In [None]:
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

X_train /= 255
X_test /= 255

In [None]:
print("X_train shape:", X_train.shape)
print("Images in X_train:", X_train.shape[0])
print("Images in X_test:", X_test.shape[0])
print("Max value in X_train:", X_train.max())
print("Min value in X_train:", X_train.min())
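
Dividing by 255 maps 8-bit intensities into the range [0, 1]. The sketch below applies the same scaling to a few hand-picked values (illustrative, not taken from the dataset). The scaling only guarantees a [0, 1] range if the stored pixels lie in [0, 255], which is what the max/min printout above checks.

```python
import numpy as np

# Illustrative 8-bit pixel intensities
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Same steps as in the notebook: cast to float32, then divide by 255
scaled = pixels.astype("float32") / 255
print(scaled.min(), scaled.max())
```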

**E. Transform the labels into a format acceptable to the neural network**

In [None]:
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

print("Shape of y_train:", y_train.shape)
print("One value of y_train:", y_train[0])
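
`to_categorical` turns each integer label into a one-hot row. A NumPy-only sketch of the same encoding, using made-up labels, together with the `argmax` that inverts it:

```python
import numpy as np

labels = np.array([2, 6, 7, 4])  # illustrative digit labels

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype="float32")[labels]
print(one_hot[0])

# argmax over the class axis recovers the integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```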

**F. Print total Number of classes in the Dataset**

In [None]:
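# y_train is one-hot encoded at this point, so np.unique(y_train) would only see {0, 1};
# the number of classes is the width of the one-hot matrix
num_classes = y_train.shape[1]
print(f"Number of classes: {num_classes}")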

# 3. Model Training & Evaluation using Neural Network

**A. Design a Neural Network to train a classifier.**

In [None]:
model = Sequential()
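# Images are already flattened to 1024 values, so declare that shape up front
# (input_shape on a non-first layer would be ignored)
model.add(Input(shape=(1024,)))

model.add(Dense(512,activation="relu", kernel_initializer='he_normal'))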
model.add(BatchNormalization())

model.add(Dense(256,activation="relu", kernel_initializer='he_normal'))
model.add(Dense(128,activation="relu", kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(64,activation="relu", kernel_initializer='he_normal'))
model.add(Dense(32,activation="relu", kernel_initializer='he_normal'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(16,activation="relu", kernel_initializer='he_normal'))

model.add(Dense(10,activation="softmax"))


**B. Train the classifier using previously designed Architecture (Use best suitable parameters).**

In [None]:
def train_and_evaluate_model(model_params, model):

    X_train = model_params['X_train']
    y_train = model_params['y_train']
    X_test = model_params['X_test']
    y_test = model_params['y_test']
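    optimizer_name = model_params['optimizer']
    learning_rate = model_params.get('learning_rate', 0.0001)
    batch_size = model_params.get('batch_size', 50)
    epochs = model_params.get('epochs', 15)
    patience = model_params.get('patience', 5)

    # Build the optimizer explicitly so the configured learning rate is actually applied
    # (passing the bare string 'adam' or 'sgd' to compile uses the optimizer's default rate)
    optimizer_classes = {'adam': optimizers.Adam, 'sgd': optimizers.SGD}
    optimizer = optimizer_classes[optimizer_name](learning_rate=learning_rate)

    model.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])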

    callbacks = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=patience)

    history = model.fit(X_train, y_train, validation_data=(X_test,y_test), batch_size = batch_size, epochs = epochs, verbose = 1, callbacks=[callbacks])

    accuracy_results = model.evaluate(X_test, y_test)

    print('Test accuracy : ', round((accuracy_results[1]*100),2))

    return history, accuracy_results
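
`EarlyStopping` ends training once the monitored value has failed to improve for `patience` consecutive epochs. The function below is a simplified pure-Python sketch of that rule. The name `early_stopping_epoch` is hypothetical, and this is not Keras's implementation: it ignores options such as `min_delta`. It can help when reasoning about how `patience` interacts with `epochs`.

```python
def early_stopping_epoch(losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it runs to the end."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            # Improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            # No improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Invented loss values for illustration
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=5))
```

With these invented values, a patience of 3 stops before the later improvement at the final epoch is ever seen, while a patience of 5 lets training run to the end.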


In [None]:
# model 1: Adam optimizer with a defined learning rate
data = {
    'X_train': X_train,
    'y_train': y_train,
    'X_test': X_test,
    'y_test': y_test,
    'optimizer': 'adam',
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 10,
    'patience': 3
}

history, accuracy_results = train_and_evaluate_model(data, model)

In [None]:
model.summary()
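
The totals printed by `model.summary()` can be cross-checked by hand. A `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out + n_out` parameters, and a `BatchNormalization` layer over `n` features has `4 * n` (scale, offset, moving mean and moving variance, the last two non-trainable). The sketch below applies this to the layer widths defined above, assuming a 1024-feature input.

```python
# Input width followed by the width of every Dense layer in the model
layer_sizes = [1024, 512, 256, 128, 64, 32, 16, 10]
# Widths of the layers that are followed by BatchNormalization
batch_norm_widths = [512, 128, 32]

# Weights plus biases for each consecutive pair of layers
dense_params = sum(n_in * n_out + n_out
                   for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
bn_params = sum(4 * n for n in batch_norm_widths)
non_trainable = sum(2 * n for n in batch_norm_widths)  # moving mean and variance

total = dense_params + bn_params
print("dense:", dense_params)
print("batch norm:", bn_params)
print("total:", total, "trainable:", total - non_trainable)
```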

**C. Evaluate performance of the model with appropriate metrics.**

In [None]:
y_pred = model.predict(X_test)
# Threshold the softmax outputs at 0.5; a row where no class exceeds 0.5 gets no predicted label
y_pred = (y_pred > 0.5)
y_pred

In [None]:
cr=metrics.classification_report(y_test,y_pred)
print(cr)

The model was evaluated on 10 classes, labeled from 0 to 9. The performance metrics, including precision, recall, and F1-score, varied across the classes. Class 4 achieved the highest F1-score of 0.78, while class 6 had the lowest at 0.50.

On average, 78% of the model's positive predictions were correct (precision), it retrieved 53% of the true instances (recall), and the harmonic mean of the two (F1-score) was 0.63. These averages were calculated in three ways: the micro-average pools all predictions across classes, the macro-average is the unweighted mean of the per-class scores (so it ignores class imbalance), and the weighted average weights each class by its support. The samples average, which is the average score for each instance, was 0.53 for all three metrics.

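In general, the model showed higher precision than recall: its positive predictions are usually right, but it misses many true instances. Part of this gap comes from the 0.5 threshold applied to the softmax outputs, since an image whose highest class probability is below 0.5 receives no predicted label at all. The significant variation in performance across classes suggests potential benefits from further model tuning or from using more balanced training data.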
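
To illustrate how thresholding softmax outputs differs from picking the most probable class, the sketch below compares the two on invented probabilities (not outputs of this model). Taking the `argmax` is a common alternative for single-label problems and always yields exactly one class per image. It is shown here as an option, not as the method this notebook uses.

```python
import numpy as np

# Illustrative softmax outputs for 3 samples over 4 classes (each row sums to 1)
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],  # confident prediction
    [0.40, 0.30, 0.20, 0.10],  # top class is below 0.5
    [0.05, 0.15, 0.60, 0.20],  # confident prediction
])

# Thresholding: a row can end up with no predicted class at all
thresholded = probs > 0.5
no_label = ~thresholded.any(axis=1)
print("rows without a predicted label:", int(no_label.sum()))

# argmax: exactly one class per row
predicted_class = probs.argmax(axis=1)
print("argmax predictions:", predicted_class)
```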

**D. Plot training and validation loss against the number of epochs, plot training and validation accuracy against the number of epochs, and write your observations on both.**

In [None]:
plot_tring_accuracy(history,'Training Loss','Validation loss','loss')
plot_tring_accuracy(history,'Training accuracy','Validation accuracy','accuracy')

The first graph depicts the **Training Loss** and **Validation Loss** over a series of epochs:

- **Training Loss**: The blue line represents the training loss, which quantifies how well the model fits the training data. As the model learns, the training loss decreases. Essentially, it measures the discrepancy between the predicted values and the actual values during training.

- **Validation Loss**: The red line represents the validation loss, which assesses how well the model generalizes to unseen data (the validation set). A low validation loss indicates that the model performs well on new data. If the training loss decreases while the validation loss increases, it might indicate overfitting (where the model memorizes the training data but fails to generalize).

- **Epochs**: The x-axis represents the number of training iterations (epochs). Each epoch corresponds to one complete pass through the entire training dataset. As the model trains, it adjusts its parameters to minimize the loss function.

- **Convergence**: Around epoch 6, the training and validation losses start to converge. This convergence suggests that further training may not significantly improve the model's performance. It's essential to strike a balance between minimizing training loss and preventing overfitting.

In summary, this graph illustrates the model's learning process: how it reduces loss over time and generalizes to new data.
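
One way to make these observations concrete is to read them off the `history.history` dictionary that `fit` returns. The sketch below uses a hand-written dictionary with the same keys (the numbers are invented for illustration) to find the epoch with the lowest validation loss and the train/validation gap at each epoch. A gap that widens while the training loss keeps falling is the overfitting pattern described above.

```python
# Stand-in for history.history; values are invented
hist = {
    "loss":     [1.90, 1.40, 1.10, 0.95, 0.85, 0.80],
    "val_loss": [1.70, 1.30, 1.05, 1.00, 1.02, 1.08],
}

# 0-based epoch with the lowest validation loss
best_epoch = min(range(len(hist["val_loss"])), key=hist["val_loss"].__getitem__)

# Validation loss minus training loss, per epoch
gaps = [round(v - t, 2) for t, v in zip(hist["loss"], hist["val_loss"])]

print("best epoch:", best_epoch)
print("gaps:", gaps)
```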


The second graph shows the training and validation accuracy of the model over a series of epochs: how accuracy improves as the model learns from the training data, and how it fares when validated against a separate set of data. The goal is to see both accuracies increase over time, indicating that the model is learning effectively and generalizing well to new data.

# Experiment with different parameters

In [None]:
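# model 2: Adam optimizer with a defined learning rate and 50 epochs
# note: the same `model` instance is reused, so training continues from the weights of the previous run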
data = {
    'X_train': X_train,
    'y_train': y_train,
    'X_test': X_test,
    'y_test': y_test,
    'optimizer': 'adam',
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 50,
    'patience': 3
}

history, accuracy_results = train_and_evaluate_model(data, model)

y_pred=model.predict(X_test)
y_pred = (y_pred > 0.5)

cr=metrics.classification_report(y_test,y_pred)
print(cr)


plot_tring_accuracy(history,'Training Loss','Validation loss','loss')
plot_tring_accuracy(history,'Training accuracy','Validation accuracy','accuracy')

In [None]:
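# model 3: SGD optimizer with a defined learning rate and 50 epochs
# note: the same `model` instance is reused, so training continues from the weights of the previous run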
data = {
    'X_train': X_train,
    'y_train': y_train,
    'X_test': X_test,
    'y_test': y_test,
    'optimizer': 'sgd',
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 50,
    'patience': 3
}

history, accuracy_results = train_and_evaluate_model(data, model)

y_pred=model.predict(X_test)
y_pred = (y_pred > 0.5)

cr=metrics.classification_report(y_test,y_pred)
print(cr)


plot_tring_accuracy(history,'Training Loss','Validation loss','loss')
plot_tring_accuracy(history,'Training accuracy','Validation accuracy','accuracy')

**classification_report:**

The classification report shows the precision, recall, and F1-score for each class in a multi-class classification problem:

- **Precision**: The proportion of true positive predictions among all positive predictions. It measures how accurate the positive predictions are.
- **Recall**: The proportion of true positive predictions among all actual positive instances. It measures how well the model captures positive instances.
- **F1-score**: The harmonic mean of precision and recall. It balances precision and recall, especially when there's an imbalance between classes.
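
The three definitions above can be worked through on a small confusion matrix. The matrix below is invented for illustration (rows are true classes, columns are predicted classes) and the computation uses plain NumPy in place of `classification_report`.

```python
import numpy as np

# Illustrative 3-class confusion matrix: rows = true class, columns = predicted class
cm = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 4],
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)  # TP / everything predicted as the class
recall = tp / cm.sum(axis=1)     # TP / everything actually in the class
f1 = 2 * precision * recall / (precision + recall)

support = cm.sum(axis=1)         # number of true instances per class
macro_f1 = f1.mean()                           # every class counts equally
weighted_f1 = np.average(f1, weights=support)  # classes weighted by support

print("precision:", precision.round(3))
print("recall:   ", recall.round(3))
print("f1:       ", f1.round(3))
print("macro F1:", round(macro_f1, 3), "weighted F1:", round(weighted_f1, 3))
```

Because the third class has fewer instances, the macro and weighted averages differ, which is the effect of class imbalance on the summary rows of the report.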

Overall, the model's performance seems reasonable, but whether it is sufficient depends on the specific context and requirements of the problem.



