## Deep Neural Network for MNIST Classification

We'll apply the theory covered so far to build a deep neural network in code. The problem we've chosen is often called the "Hello World" of machine learning because it is the first example most students encounter. The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of the methods we've been discussing and of more complex approaches that are widely used today, such as convolutional networks. The dataset provides 28x28 images of handwritten digits (one digit per image), and the goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0 to 9), this is a classification problem with 10 classes. To illustrate what we've covered in this section, we will build a network with 2 hidden layers between the inputs and the outputs.

# Import Libraries

In [2]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# TensorFlow Datasets includes a data provider for MNIST that we'll use, which is why we import tfds.


# Data

In [3]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
#tfds.load downloads the MNIST dataset from TensorFlow Datasets on the first call
#by default it is stored in the tensorflow_datasets folder of the home directory, e.g. C:\users\suman\tensorflow_datasets
#every later run of the code reuses that local copy instead of downloading it again
#as_supervised=True loads each example as a 2-tuple (input, target)
#with_info=True makes tfds.load also return an info object (version, features, number of samples), which we store in mnist_info



Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/mnist/3.0.1...


Dataset mnist downloaded and prepared to /root/tensorflow_datasets/mnist/3.0.1. Subsequent calls will reuse this data.


In [4]:
#setting up the train, validation and test data
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']
#by default the tfds MNIST dataset has only train and test splits, so we must carve out the validation data ourselves
#the 70000 samples in total are split into 60000 for training and 10000 for testing
#we will take 10% of the training data as validation data
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
#mnist_info.splits['train'] describes the train split and .num_examples is its number of samples
#multiplying by 0.1 gives a float, but we need an integer number of samples, so
num_validation_samples = tf.cast(num_validation_samples, tf.int64)
#tf.cast converts a value to the given data type
#let's also store the number of test samples for easier access
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)
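
The arithmetic behind this split can be checked without TensorFlow. The following is a minimal sketch, assuming the 60000/10000 split sizes quoted in the comments above; the variable names are illustrative and `int()` merely stands in for `tf.cast`.

```python
# Sizes quoted in the comments above (the notebook reads them from mnist_info).
num_train_examples = 60000
num_test_examples = 10000

# 10% of the training data is held out for validation.
# int() plays the role that tf.cast(..., tf.int64) plays in the notebook.
num_validation_samples = int(0.1 * num_train_examples)
num_remaining_train = num_train_examples - num_validation_samples

print(num_validation_samples, num_remaining_train)
```

The remaining training samples are what `skip` leaves in `train_data` later on.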


In [5]:
#scaling the data
def scale(image, label):
  image = tf.cast(image, tf.float32)
  #a pixel value ranges from 0 to 255, so dividing each element by 255 gives results between 0 and 1
  image /= 255. #the dot makes 255 a float, so the result is a float
  return image, label
#we could transform the data manually, but datasets provide dataset.map(*function*), which applies a custom transformation to every element
#because we loaded the data with as_supervised=True, the function passed to map receives an image and a label and must return an image and a label
#we can scale the data in any way we like as long as the function keeps that (image, label) signature

scaled_train_and_validation_data = mnist_train.map(scale)
#scales the training data (from which the validation data is taken later)
test_data = mnist_test.map(scale)
#scales the test data
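
To see what the scaling does to actual numbers, here is a small NumPy sketch. The 2x2 "image" is made up for illustration and is not taken from MNIST; only the cast-then-divide steps mirror `scale`.

```python
import numpy as np

# A hypothetical 2x2 "image" with raw pixel intensities in [0, 255].
image = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# The same two steps as scale(): cast to float32, then divide by 255.
scaled = image.astype(np.float32) / 255.

print(scaled.dtype, scaled.min(), scaled.max())
```

Casting first matters: the raw pixels are unsigned 8-bit integers, and the division should produce floating-point values between 0 and 1.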


In [6]:
#shuffling the data so that the batches do not follow the original order of the dataset
BUFFER_SIZE = 10000
#with huge datasets we can't shuffle everything in memory at once, so shuffle keeps a buffer of 10000 samples and draws from it at random
#note: if buffer size is 1, no shuffling takes place
#if buffer size >= no. of samples, the whole dataset is shuffled at once, uniformly
#if 1 < buffer size < no. of samples, the shuffle is only approximate but needs less memory
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)
#note: by default shuffle() reshuffles on every pass over the data, so take() and skip() below may not select the same samples in every epoch; passing reshuffle_each_iteration=False to shuffle() keeps the split fixed
#we have already computed how many samples of the training set go to validation, so now we extract them
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
#take(n) keeps the first n samples of the shuffled data, with n = num_validation_samples
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
#skip(n) drops the first n samples of the shuffled data and keeps the rest as training data
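
The effect of the buffer size can be illustrated with a simplified pure-Python model of a shuffle buffer. This is a sketch of the idea only, not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical helper written for this example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in an order produced by a fixed-size shuffle buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit one random element to make room.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Input exhausted: drain whatever is left in random order.
    while buffer:
        yield buffer.pop(rng.randrange(len(buffer)))

data = list(range(10))
no_shuffle = list(buffered_shuffle(data, buffer_size=1))           # order unchanged
full_shuffle = list(buffered_shuffle(data, buffer_size=len(data)))  # uniform shuffle

print(no_shuffle)
print(full_shuffle)
```

With a buffer of size 1 every element is emitted as soon as it arrives, so the order is unchanged. With an intermediate buffer size, an element can only move a limited distance from its original position, which is why the shuffle is approximate.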

In [7]:
#we will use mini-batch gradient descent to train
#batch_size = 1 is stochastic gradient descent (SGD), batch_size = no. of samples is (full-batch) gradient descent, and 1 < batch_size < no. of samples is mini-batch gradient descent
BATCH_SIZE = 100
#dataset.batch(batch_size) is a method that combines consecutive elements of a dataset into batches
train_data = train_data.batch(BATCH_SIZE)
#on the validation and test data we only forward propagate and never backpropagate, so we don't need to split them into mini-batches
#for validation or testing we forward propagate only once
#during training the loss and accuracy are averaged over the batches, but for validation and testing we want the exact values over the whole set
#forward propagation alone doesn't need much computational power, so calculating the exact values is feasible
#the model still expects the validation and test datasets in batched form, so we put each of them in a single batch
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)
#each of the two datasets now consists of one batch holding all of its samples
validation_inputs, validation_targets = next(iter(validation_data))
#iter() is a Python built-in that turns the dataset into an iterator
#an iterator is an object that can be stepped through one element at a time (e.g. in a for or while loop)
#next() loads the next element of an iterator; since there is only one batch here, it loads all the validation inputs and targets
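
Batching and the `next(iter(...))` idiom can be sketched in plain Python. The `batch` helper below is hypothetical and only mimics what `dataset.batch` does conceptually; the 54000 figure assumes the 60000 training samples minus the 6000 held out for validation.

```python
import math

def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

samples = list(range(10))

mini_batches = batch(samples, 4)             # several small batches
single_batch = batch(samples, len(samples))  # everything in one batch

# next(iter(...)) pulls the first (here: the only) batch out of the iterable.
first = next(iter(single_batch))

# 54000 training samples in batches of 100.
steps_per_epoch = math.ceil(54000 / 100)

print(mini_batches)
print(first)
print(steps_per_epoch)
```

The last number is the count of weight updates per epoch, and it matches the `540/540` shown in the training log.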


# Model
## The model has 784 inputs and 10 outputs, with 2 hidden layers of 50 nodes each; the width and depth are hyperparameters

In [8]:
input_size = 784
output_size = 10
hidden_layer_size = 50

model = tf.keras.Sequential([
    #Sequential() lays down the model as a stack of layers
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    #Flatten transforms (flattens) a tensor into a vector; each 28x28x1 input becomes a vector of 784 values
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    #Dense(units) takes the inputs, calculates the dot product of inputs and weights and adds the bias; this is also where we apply the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    #both hidden layers are now created
    tf.keras.layers.Dense(output_size, activation='softmax')
    #output layer: we need softmax because the output values must be converted to probabilities
])
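
What the stacked layers compute can be written out by hand. The NumPy sketch below assumes randomly initialised weights and a made-up batch of 3 images, so it only illustrates the shapes and the arithmetic (flatten, dot product plus bias, activation), not a trained model or Keras internals.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_layer_size, output_size = 784, 50, 10

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dense(x, w, b, activation):
    """One dense layer: dot product of inputs and weights, plus bias, then activation."""
    return activation(x @ w + b)

# Random weights and zero biases, purely for illustration (nothing is trained here).
sizes = [input_size, hidden_layer_size, hidden_layer_size, output_size]
params = [(rng.normal(0, 0.1, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

images = rng.random((3, 28, 28, 1))   # a hypothetical batch of 3 images
x = images.reshape(len(images), -1)   # Flatten: (3, 28, 28, 1) -> (3, 784)
for i, (w, b) in enumerate(params):
    is_output_layer = i == len(params) - 1
    x = dense(x, w, b, softmax if is_output_layer else relu)

num_parameters = sum(w.size + b.size for w, b in params)
print(x.shape, x.sum(axis=1), num_parameters)
```

Each output row sums to 1 because of the softmax, and the parameter count is the sum of weights and biases over the three dense layers.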



# Optimizer and loss function

In [9]:
#model.compile(optimizer, loss) configures the model for training
#Adam is an adaptive optimizer that works well as a default choice
#cross-entropy is the usual loss for classifiers; in the TF documentation we find binary, categorical and sparse categorical cross-entropy
#categorical cross-entropy expects one-hot encoded targets, while sparse categorical cross-entropy accepts integer class labels directly, so we don't have to one-hot encode them ourselves
#metrics lists what to report during training and testing; here it is the accuracy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
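
The difference between the two cross-entropy variants is only in how the targets are supplied. A minimal NumPy sketch, with made-up probabilities for 2 samples and 3 classes, shows that both give the same per-sample loss:

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

sparse_targets = np.array([0, 2])            # integer class labels
one_hot_targets = np.eye(3)[sparse_targets]  # the same labels, one-hot encoded

# Categorical cross-entropy: multiplies by the one-hot rows.
categorical = -np.sum(one_hot_targets * np.log(probs), axis=1)

# Sparse categorical cross-entropy: picks the probability of the true class directly.
sparse = -np.log(probs[np.arange(len(probs)), sparse_targets])

print(categorical, sparse)
```

Since the MNIST labels loaded here are integers from 0 to 9, the sparse variant saves the one-hot encoding step.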

# Training

In [10]:
NUM_EPOCHS = 5 #5 full passes over the training data
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)
#the model is trained for the given number of epochs on train_data and validated on the validation data
#verbose=2 means we receive one line of summary information per epoch

#at the beginning of each epoch, the training loss is reset to 0
#the algorithm iterates over all the batches in train_data
#the weights and biases are updated as many times as there are batches
#we get a value for the loss function, indicating how the training is going
#we also see the training accuracy
#at the end of each epoch the algorithm forward propagates the whole validation set

Epoch 1/5
540/540 - 12s - 22ms/step - accuracy: 0.8815 - loss: 0.4134 - val_accuracy: 0.9388 - val_loss: 0.2144
Epoch 2/5
540/540 - 5s - 10ms/step - accuracy: 0.9459 - loss: 0.1852 - val_accuracy: 0.9573 - val_loss: 0.1551
Epoch 3/5
540/540 - 7s - 12ms/step - accuracy: 0.9597 - loss: 0.1368 - val_accuracy: 0.9600 - val_loss: 0.1371
Epoch 4/5
540/540 - 4s - 8ms/step - accuracy: 0.9663 - loss: 0.1114 - val_accuracy: 0.9682 - val_loss: 0.1080
Epoch 5/5
540/540 - 4s - 8ms/step - accuracy: 0.9715 - loss: 0.0934 - val_accuracy: 0.9713 - val_loss: 0.0953


<keras.src.callbacks.history.History at 0x7ce04d406a50>
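
The per-epoch procedure described in the comments above (reset the running loss, update once per batch, report the average) can be sketched on a toy problem. This is not what Keras runs internally: it is plain mini-batch gradient descent on a single weight `w` fitted to made-up data for y = 2x, with all names and hyperparameters chosen for illustration.

```python
# Toy data for y = 2x; a single weight w is learned by mini-batch gradient descent.
xs = [i / 10 for i in range(1, 21)]   # 20 samples
ys = [2.0 * x for x in xs]
batch_size, learning_rate, w = 5, 0.1, 0.0

epoch_losses = []
for epoch in range(5):
    running_loss, num_updates = 0.0, 0          # the running loss is reset every epoch
    for start in range(0, len(xs), batch_size):
        bx, by = xs[start:start + batch_size], ys[start:start + batch_size]
        errors = [w * x - y for x, y in zip(bx, by)]
        loss = sum(e * e for e in errors) / len(bx)                 # mean squared error
        grad = sum(2 * e * x for e, x in zip(errors, bx)) / len(bx)
        w -= learning_rate * grad                # one weight update per batch
        running_loss += loss
        num_updates += 1
    epoch_losses.append(running_loss / num_updates)
    print(epoch + 1, num_updates, epoch_losses[-1])
```

With 20 samples and a batch size of 5 there are 4 updates per epoch, in the same way that the 540 steps in the log correspond to 540 updates per epoch.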

In [11]:
#540 is the number of batches processed in every epoch (54000 training samples / batch size of 100); each epoch took roughly 4 to 12s
#the training loss decreases in every epoch
#accuracy shows in what % of cases our outputs were equal to the targets
#we can also see the loss and accuracy on the validation data
#we watch the validation loss to check whether our model is overfitting
#validation accuracy is computed in a single pass over the whole validation set at the end of the epoch, so it is the more reliable figure, while training accuracy is the average accuracy over the batches
#we can change hyperparameters such as the depth (number of hidden layers) and try to increase the accuracy
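
A common overfitting signal is a validation loss that starts to rise while the training loss keeps falling. The sketch below applies that check to the loss values copied from the training log above; `first_epoch_loss_rises` is a hypothetical helper, not part of Keras.

```python
# Values copied from the training log above.
train_loss = [0.4134, 0.1852, 0.1368, 0.1114, 0.0934]
val_loss = [0.2144, 0.1551, 0.1371, 0.1080, 0.0953]

def first_epoch_loss_rises(losses):
    """Return the 1-based epoch at which the loss first increases, or None."""
    for i in range(1, len(losses)):
        if losses[i] > losses[i - 1]:
            return i + 1
    return None

print(first_epoch_loss_rises(train_loss))
print(first_epoch_loss_rises(val_loss))
```

In this run neither loss increases during the 5 epochs, so the check finds no sign of overfitting yet.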

# Testing

In [12]:
test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}, Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step - accuracy: 0.9680 - loss: 0.1067
Test loss: 0.11, Test accuracy: 96.80%


In [13]:
#a test accuracy close to the validation accuracy suggests that we have not overfit the model
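
How close the two figures are can be quantified directly. A small sketch using the final validation accuracy from the training log and the test accuracy reported above:

```python
val_accuracy = 0.9713   # last epoch in the training log
test_accuracy = 0.9680  # reported by model.evaluate

# Difference expressed in percentage points.
gap_in_points = (val_accuracy - test_accuracy) * 100
print('Gap: {0:.2f} percentage points'.format(gap_in_points))
```

What counts as "close" is a judgment call; a gap of a fraction of a percentage point is small for this setup.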