# Case 0. Learning basics
**Neural Networks for Machine Learning Applications**<br>
24.01.2024<br>
Rabindra Manandhar<br>
[Information Technology, Bachelor's Degree](https://www.metropolia.fi/en/academics/bachelors-degrees/information-technology)<br>
[Metropolia University of Applied Sciences](https://www.metropolia.fi/en)

- **v3**: Simplified version based on discussion with JK.
- **v4**: Added conversion of labels with [to_categorical](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) and changed the loss function to [categorical crossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/categorical_crossentropy).
- **v5**: Changed the wording of the instructions.

## 1. Introduction


This notebook was created to learn the basic TensorFlow functions for building neural networks.

The main objective was to find a simple neural network model and train it to classify grayscale images of handwritten digits in a small number of epochs.

## 2. Setup

The TensorFlow library was used to build and train the neural network model.

In [None]:
import tensorflow as tf
print(f'tensorflow: {tf.__version__}')

tensorflow: 2.15.0


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 3. Dataset

The MNIST dataset comes preloaded in Keras as a set of four NumPy arrays. x_train and y_train form the training set, the data that the model learns from. The model is then tested on the test set, x_test and y_test. The images are encoded as NumPy arrays, and the labels are an array of digits ranging from 0 to 9. The images and labels have a one-to-one correspondence.

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
print(f'x_train: shape {x_train.shape} and ndim {x_train.ndim}')
print(f'x_test:  shape {x_test.shape} and ndim {x_test.ndim}')

print(f'y_train: shape {y_train.shape} and ndim {y_train.ndim}')
print(f'y_test:  shape {y_test.shape} and ndim {y_test.ndim}')

x_train: shape (60000, 28, 28) and ndim 3
x_test:  shape (10000, 28, 28) and ndim 3
y_train: shape (60000,) and ndim 1
y_test:  shape (10000,) and ndim 1


## 4. Preprocessing

1. Normalization of input data:
The pixel values of the images (x_train and x_test) range from 0 to 255. Dividing by 255.0 scales them to the range [0, 1] and converts the arrays to float64. Normalizing the inputs typically speeds up training and helps the optimization algorithm converge.

2. Categorical (one-hot) representation of labels:
Each label in y_train and y_test is converted into a binary vector whose length equals the number of classes. The element corresponding to the class label is set to 1, and all other elements are set to 0.

Normalizing the input data and converting the labels to a categorical representation prepares the dataset for training a neural network model. Both steps are common in machine learning and deep learning tasks, especially in image classification problems such as MNIST digit recognition.

In [None]:
x_train = x_train / 255.0
x_test = x_test / 255.0

y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
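To make the effect of these two steps visible, here is a minimal NumPy-only sketch on toy data. The tiny `images` and `labels` arrays are made up for illustration, and the identity-matrix lookup is only a hand-written stand-in for what `to_categorical` is understood to produce, not the library's implementation.

```python
import numpy as np

# Toy stand-ins for MNIST: two 2x2 "images" with uint8 pixels and integer labels.
images = np.array([[[0, 255], [128, 64]],
                   [[255, 255], [0, 32]]], dtype=np.uint8)
labels = np.array([3, 0])
num_classes = 10  # digits 0 through 9

# Normalization: dividing by a Python float promotes uint8 to float64.
scaled = images / 255.0

# One-hot encoding: row i of the identity matrix is the vector for class i.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(scaled.dtype, scaled.min(), scaled.max())
print(one_hot)
```

Each row of `one_hot` contains a single 1 at the index of its label, which is the target format that a categorical cross-entropy loss expects.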

## 5. Modeling

The core building block of neural networks is the layer. The layer is like a filter for data: some data goes in, and it comes out in a more useful form.

Here, our model is a simple feedforward neural network built as a linear stack of layers: one input layer (Flatten), one hidden layer (Dense with ReLU activation) and one output layer (Dense with softmax activation). The network takes 28x28 pixel images as input and outputs the probability of each digit class (0 through 9).

The Flatten layer is the input layer of the network. It transforms each 2D input image (28x28) into a 1D array (28x28 = 784 elements) while preserving the batch dimension.

The first Dense layer is a fully connected hidden layer, here with 200 neurons and the ReLU activation function. Each neuron in this layer is connected to every output of the previous layer. The ReLU activation introduces non-linearity into the model, which helps it learn complex patterns in the data.

The second Dense layer is the output layer of the network. It has 10 neurons, each representing one of the possible digit classes (0 through 9). The softmax activation function is used in the output layer for multi-class classification. It converts the raw output values into probabilities that sum to 1, where each value is the predicted probability of the corresponding class.
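The three layers described above can be sketched as plain NumPy operations. This is an illustration under assumptions rather than what Keras does internally: the weights are random instead of learned, and the names `w1`, `b1`, `w2` and `b2` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical batch of 4 random "images" in [0, 1), shape (batch, 28, 28).
batch = rng.random((4, 28, 28))

# Randomly initialised parameters; a trained model would hold learned values.
w1 = rng.normal(scale=0.05, size=(784, 200))
b1 = np.zeros(200)
w2 = rng.normal(scale=0.05, size=(200, 10))
b2 = np.zeros(10)

# Flatten: (4, 28, 28) -> (4, 784); the batch dimension is preserved.
flat = batch.reshape(batch.shape[0], -1)

# Hidden Dense layer with ReLU: negative pre-activations are clipped to zero.
hidden = np.maximum(flat @ w1 + b1, 0.0)

# Output Dense layer with softmax; subtracting the row maximum avoids overflow.
logits = hidden @ w2 + b2
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(flat.shape, hidden.shape, probs.shape)
print(probs.sum(axis=1))
```

Each row of `probs` sums to 1, so it can be read as a probability distribution over the ten digit classes.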

In [None]:
# Sequential is a linear stack of layers: the model is built by adding layers one after another.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(200, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

#model.summary()
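If the commented-out `model.summary()` call is run, it lists the number of parameters per layer. As a cross-check, the counts for this architecture can be worked out by hand. The helper `dense_params` below is a hypothetical name used only for this sketch.

```python
# Parameters of a Dense layer: one weight per (input, unit) pair plus one bias per unit.
def dense_params(n_inputs: int, n_units: int) -> int:
    return n_inputs * n_units + n_units

flatten_outputs = 28 * 28                            # Flatten has no trainable parameters
hidden_params = dense_params(flatten_outputs, 200)   # hidden layer
output_params = dense_params(200, 10)                # output layer
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)    # 157000 2010 159010
```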

In [None]:
#from tensorflow.keras.utils import plot_model

#plot_model(model, show_shapes = True, show_layer_activations = True)

To make the model ready for training, we need to pick three more things as part of the compilation step:

- **An optimizer**: the mechanism through which the model updates the weights of the network based on the training data it sees, so as to minimize the loss function and thereby improve its performance.

- **A loss function**: how the model measures its performance on the training data, and thus how it is able to steer itself in the right direction.

- **Metrics**: quantities used to monitor and evaluate the model during training and testing. Here, we only care about accuracy (the fraction of the images that were correctly classified).
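To make the loss and the metric concrete, here is a small NumPy sketch of categorical cross-entropy and accuracy. The prediction values are made up for illustration, and the clipping constant `eps` is an assumption of this sketch, not a value taken from the notebook.

```python
import numpy as np

# Hypothetical softmax outputs for three samples over three classes.
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3]])
# One-hot true labels: classes 0, 1 and 2.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])

# Categorical cross-entropy: minus the log-probability of the true class, averaged.
eps = 1e-7  # guards against log(0)
per_sample = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1)
loss = per_sample.mean()

# Accuracy: fraction of samples whose most probable class is the true class.
accuracy = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(f"loss={loss:.4f}, accuracy={accuracy:.4f}")
```

The third sample is misclassified (class 1 is predicted, class 2 is true), so it contributes the largest per-sample loss and lowers the accuracy to two out of three.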

In [None]:
model.compile(optimizer = 'adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])
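The string `'adam'` selects the Adam optimizer. As a rough illustration of its update rule, the sketch below applies Adam to a single scalar parameter on a toy quadratic loss. The function `adam_step` is hypothetical, its default hyperparameters are assumed to mirror the Keras defaults, and the toy run uses a larger learning rate so that it converges in few steps.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimise the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.01)

print(round(w, 2))
```

The parameter moves from 0 towards the minimum at 3. In the real model the same kind of update is applied to every weight and bias, using gradients obtained by backpropagation.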

## 6. Training

In Keras, the fit() method trains the neural network model on a given dataset.

During training, the model performs the following steps for each batch of data in every epoch:

Forward pass: The model takes the input data (x_train) and passes it through the neural network to generate predictions.

Computation of loss: The model computes the loss, which is a measure of how well the predictions match the true labels (y_train).

Backward pass (Backpropagation): The model calculates the gradients of the loss with respect to the model's parameters (weights and biases) using backpropagation.

Optimization: The optimizer adjusts the model's parameters based on the computed gradients to minimize the loss function.
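The four steps above can be sketched as a small hand-written training loop. To stay short, this sketch makes several simplifying assumptions: a single softmax layer instead of the notebook's two Dense layers, random toy data instead of MNIST, the whole dataset as one batch, and plain gradient descent instead of Adam.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 4 features, 3 classes, with one-hot labels.
x = rng.normal(size=(100, 4))
y = np.eye(3)[(x @ rng.normal(size=(4, 3))).argmax(axis=1)]

w = np.zeros((4, 3))
b = np.zeros(3)
lr = 0.1
losses = []

for step in range(50):
    # Forward pass: logits -> softmax probabilities.
    logits = x @ w + b
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)

    # Computation of loss: mean categorical cross-entropy.
    losses.append(-np.mean(np.sum(y * np.log(probs + 1e-12), axis=1)))

    # Backward pass: for softmax + cross-entropy, dLoss/dLogits = (probs - y) / n.
    grad_logits = (probs - y) / len(x)
    grad_w = x.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)

    # Optimization: plain gradient descent step (not Adam) for brevity.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

With all-zero initial weights every class starts with probability 1/3, so the first loss is ln(3), and it decreases as the parameters are updated.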

The required test set accuracy is 0.970. We therefore began by training for 1 epoch to see what accuracy could be reached. In our case, training with epochs = 1 gave an accuracy of 0.9307. Since this did not reach the target, we repeated the training with epochs = 2, which gave an accuracy of 0.9700. Note that the number of neurons in the hidden layer was set to 200 for these runs.

In [None]:
model.fit(x_train, y_train, epochs = 2)

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x79533aa1fd00>

Training ran for the specified number of epochs (2 in this case) with 200 neurons in the hidden layer. After training, the model has learned to make predictions from the input data. The accuracy it reaches on the training data depends on several factors, including the model architecture, the optimizer, and the complexity of the dataset.

## 7. Performance and evaluation

Now that the model is trained, we check how well it classifies never-before-seen digits, on average, by computing the accuracy over the entire test set with the evaluate() method in Keras.


In [None]:
# verbose=2 prints a single summary line instead of a progress bar
model.evaluate(x_test, y_test, verbose=2)


313/313 - 0s - loss: 0.0832 - accuracy: 0.9748 - 438ms/epoch - 1ms/step


[0.0831959992647171, 0.9747999906539917]
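The `313/313` in the output is the number of batches processed during evaluation. A quick arithmetic check, assuming the Keras default batch size of 32 (no `batch_size` argument is passed in this notebook):

```python
import math

batch_size = 32          # assumed: the Keras default when batch_size is not passed
train_samples = 60_000
test_samples = 10_000

# The last batch may be smaller than batch_size, hence the ceiling.
train_steps_per_epoch = math.ceil(train_samples / batch_size)
test_steps = math.ceil(test_samples / batch_size)

print(train_steps_per_epoch, test_steps)  # 1875 313
```

Under the same assumption, each training epoch consists of 1875 weight updates.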

## 8. Discussion and conclusions

- Which settings were tested before the best model was found

  I tested the number of neurons in the hidden layer and the number of epochs, and also tried setting batch_size.

- Summary of what was your best model and its settings

  First, I tested with 10 neurons in the hidden layer and epochs=1. I gradually increased the number of neurons in the hidden layer to 128 and set epochs=2. I then increased the number of neurons to 256, keeping the number of epochs the same, which achieved a test set accuracy of over 0.970. Finally, I reduced the number of neurons to 200 and achieved a test set accuracy of 0.970.

- What was the final achieved performance

  loss: 0.0966, accuracy: 0.9686

- What are your main observations and learning points

  a. Preprocessing techniques such as normalization are essential when working with image data.

  b. Experimenting with and exploring activation functions, loss functions and optimizers.

  c. The number of neurons in a hidden layer directly affects the capacity of the model to learn complex patterns in the data.

  d. Understanding evaluation metrics such as accuracy.

- Discussion of how the model could be improved in the future

  We are in the first steps of understanding deep learning. Hence, I have no points to discuss how this model could be improved.

To learn more, see [TensorFlow tutorials](https://www.tensorflow.org/tutorials).