# Neural Networks

In this activity we are going to build a simple neural network (NN) to recognize handwritten digits in images. Our NN will be created using the `tensorflow`/`keras` package for Python. We will also use `matplotlib` to display some images.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt


## Dataset

In this example, we are going to use the classic MNIST dataset. This dataset is available through the TensorFlow module and can be accessed using the API `tf.keras.datasets.mnist`.

The MNIST dataset consists of 60,000 training images and 10,000 test images, along with labels giving the digit present in each image. Each image is a 28×28 grid of grayscale pixels.

The next cell loads the dataset, prints the shapes of its arrays, and displays the first nine training images.

In [None]:
mnist = tf.keras.datasets.mnist
(train_images, train_labels) , (test_images, test_labels) = mnist.load_data()

# Printing the shapes
print("train_images shape: ", train_images.shape)
print("train_labels shape: ", train_labels.shape)
print("test_images shape: ", test_images.shape)
print("test_labels shape: ", test_labels.shape)
 
 
# Displaying first 9 images of dataset
fig = plt.figure(figsize=(10,10))
 
nrows=3
ncols=3
for i in range(9):
  fig.add_subplot(nrows, ncols, i+1)
  plt.imshow(train_images[i])
  plt.title("Digit: {}".format(train_labels[i]))
  plt.axis(False)
plt.show()


## Preparing the data

You should always preprocess your data before using it to train a neural network. Preprocessing puts the dataset in a form suitable as input to the machine learning model.

Images in our dataset are made up of grayscale pixels with values in the range 0 – 255. Machine learning models usually train better when the input values lie in a small range, so we rescale the pixels to 0 – 1 by dividing them by 255.

We also convert our labels from digit labels to one-hot encoded vectors. A one-hot encoded vector is a binary vector in which all elements are 0 except the one at the index of the corresponding label, whose value is 1. We will use the `to_categorical()` function to convert the labels to one-hot vectors.

For example, for label 2, index 2 will hold a 1 and all other entries will be 0 ( [ 0 0 1 0 0 0 0 0 0 0 ] ).
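To make the encoding concrete, here is a minimal pure-Python sketch of what a one-hot conversion does for ten classes. The helper `one_hot` is a hypothetical name used only for illustration; the notebook itself relies on `tf.keras.utils.to_categorical()`, which returns a NumPy array of floats rather than a list.

```python
def one_hot(label, num_classes=10):
  # Start with all zeros, then set a single 1 at the position of the label
  vector = [0] * num_classes
  vector[label] = 1
  return vector

print(one_hot(2))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```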

In [None]:
# Converting image pixel values to 0 - 1
train_images = train_images / 255
test_images = test_images / 255
 
print("First Label before conversion:")
print(train_labels[0])
 
# Converting labels to one-hot encoded vectors
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)
 
print("First Label after conversion:")
print(train_labels[0])

## Building the Neural Network

Building a neural network takes 2 steps: configuring the layers and compiling the model.

### Set up the layers

The architecture of our model is composed of three layers (the input layer, a hidden layer and the output layer):

1. **Input layer (flatten layer)**: Our input images are 2D arrays. The Flatten layer converts each 2D array (of 28 by 28 pixels) into a 1D array (of 28*28 = 784 pixels) by placing the rows one after another. This layer only changes the shape of the data; no parameters/weights are learned. Code: `tf.keras.layers.Flatten()`
2. **Hidden Layer**: Our only hidden layer is a fully connected (Dense) layer of 512 neurons, each with the relu activation function. Code: `tf.keras.layers.Dense(units=512, activation='relu')`
3. **Output Layer**: The output layer is a Dense layer with 10 neurons, which outputs 10 probabilities, one for each digit 0 – 9, representing the probability of the image being that digit. The output layer uses the softmax activation function to convert its input activations to probabilities. Code: `tf.keras.layers.Dense(units=10, activation='softmax')`
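The two activation functions named in the list can be sketched in a few lines of plain Python. This is only an illustration of the math, not how Keras implements them; the function names and the input values are made up for the example.

```python
import math

def relu(values):
  # ReLU keeps positive values and replaces negative ones with 0
  return [max(0.0, v) for v in values]

def softmax(values):
  # Subtract the maximum before exponentiating for numerical stability
  largest = max(values)
  exps = [math.exp(v - largest) for v in values]
  total = sum(exps)
  return [e / total for e in exps]

hidden = relu([-1.5, 0.0, 2.0])
print(hidden)  # [0.0, 0.0, 2.0]

probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)       # larger inputs receive larger probabilities
print(sum(probabilities))  # 1, up to floating-point rounding
```

Softmax preserves the ordering of its inputs, so the neuron with the largest activation always ends up with the highest probability.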

Since the output of each layer is the input to a single layer only and all the layers are stacked linearly, we will use the `Sequential()` API, which takes a list of layers in the order in which they are applied.

In [11]:
# Using Sequential() to build layers one after another
model = tf.keras.Sequential([
  # Flatten Layer that converts images to 1D array
  tf.keras.layers.Flatten(),
  # Hidden Layer with 512 units and relu activation
  tf.keras.layers.Dense(units=512, activation='relu'),
  # Output Layer with 10 units for 10 classes and softmax activation
  tf.keras.layers.Dense(units=10, activation='softmax')
])
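
As a quick check on the size of this network, the number of trainable parameters can be worked out by hand: a Dense layer has one weight per input-output pair plus one bias per unit. The sketch below assumes 28×28 input images, as in MNIST, and the variable names are only illustrative. Once the model has been built (for example, after training), `model.summary()` should report the same numbers.

```python
inputs = 28 * 28      # size of the flattened image
hidden_units = 512
output_units = 10

# weights + biases for each Dense layer (Flatten has no parameters)
hidden_params = inputs * hidden_units + hidden_units
output_params = hidden_units * output_units + output_units

print(hidden_params)                  # 401920
print(output_params)                  # 5130
print(hidden_params + output_params)  # 407050
```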

### Compiling the model

Before we train our model, we need to tell it a few things. Here are the 3 attributes given to the model during the compile step:

1. **Loss Function:** This tells our model how to measure the error between the actual label and the label predicted by the model. The choice of loss function depends on the task at hand: for a regression problem you'll usually use the Mean Squared Error (MSE), and for a binary classification problem you'll use `binary_crossentropy`. We use `categorical_crossentropy` since our problem is a multi-class classification with one-hot encoded labels.
2. **Optimizer**: This tells our model how to update its weights/parameters based on the data and the value of the loss function. We will use the `adam` optimizer for our model. Other popular optimization algorithms are Stochastic Gradient Descent (SGD) and RMSprop.
3. **Metrics (Optional):** A list of metrics used to monitor the training and testing steps. We will use accuracy, that is, the fraction of images our model classifies correctly.
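
For intuition about the loss, categorical cross-entropy for a single image is the negative sum of `y[i] * log(p[i])`, where `y` is the one-hot label and `p` the predicted probabilities. Because `y` is zero everywhere except at the true class, the loss reduces to minus the log of the probability assigned to the correct class. Below is a minimal pure-Python sketch with made-up probabilities for a three-class problem; Keras additionally averages this value over the images in a batch.

```python
import math

def categorical_crossentropy(one_hot_label, probabilities):
  # Only the entry for the true class contributes, since the rest of the label is 0
  return -sum(y * math.log(p)
              for y, p in zip(one_hot_label, probabilities) if y > 0)

label = [0, 1, 0]
confident = [0.05, 0.90, 0.05]  # high probability on the correct class
unsure = [0.40, 0.30, 0.30]     # low probability on the correct class

print(categorical_crossentropy(label, confident))  # about 0.105
print(categorical_crossentropy(label, unsure))     # about 1.204
```

The more probability the model puts on the correct class, the closer the loss gets to 0.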

In [12]:
model.compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = ['accuracy']
)

### Training the model

To train our neural network, we will call the `fit()` method on the model, which takes:
1. **Training Data:** We will use `train_images`, the images that we will feed to the neural network.
2. **Training Labels:** We will use `train_labels`, the labels that represent the expected output for our training images.
3. **Epochs:** The number of times our model will iterate over all training examples. For example, if we specify 10 epochs, then our model will run on all 60,000 training images 10 times.
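
When no `batch_size` is passed, Keras `fit()` processes the data in mini-batches of 32 samples by default, so one epoch consists of many weight updates. A small sketch of the arithmetic (the variable names are only illustrative):

```python
import math

num_training_images = 60_000
batch_size = 32  # Keras default when batch_size is not passed to fit()
epochs = 10

steps_per_epoch = math.ceil(num_training_images / batch_size)
print(steps_per_epoch)           # 1875
print(steps_per_epoch * epochs)  # 18750
```

The first number should match the step count shown in the progress bar for each epoch.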

The `fit()` method returns a history object that contains the loss and the values of the metrics specified at compile time, recorded after each epoch.

The next cell trains our model and plots the loss and the accuracy over the epochs.

In [None]:
history = model.fit(
  x = train_images,
  y = train_labels,
  epochs = 10
)

# Showing plot for loss
plt.plot(history.history['loss'])
plt.xlabel('epochs')
plt.legend(['loss'])
plt.show()
 
# Showing plot for accuracy
plt.plot(history.history['accuracy'], color='orange')
plt.xlabel('epochs')
plt.legend(['accuracy'])
plt.show()

### Evaluating the model

Now that we have trained our neural network, we would like to see how it performs on data it hasn't seen before. For this we will use our test dataset and call the `evaluate()` method on the model to see how accurate it is.
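
The accuracy reported by `evaluate()` is the fraction of test images whose predicted class matches the true label. A minimal sketch of that calculation on a handful of made-up values (the lists below are illustrative, not real model output):

```python
example_true = [7, 2, 1, 0, 4]
example_predicted = [7, 2, 1, 0, 9]

# Count the positions where the prediction equals the true label
correct = sum(t == p for t, p in zip(example_true, example_predicted))
example_accuracy = correct / len(example_true)

print(example_accuracy)  # 0.8
```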

In [None]:
# Call evaluate to find the accuracy on test images
test_loss, test_accuracy = model.evaluate(
  x = test_images, 
  y = test_labels
)
 
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

### Making predictions

With our trained model, we can also make predictions on new images and see what our model identifies in the image. We make predictions in 2 steps:

1. **Predicting Probabilities:** We will use `predict()`, which returns, for each image, the probability of it belonging to each of the classes. In our example, it returns 10 probabilities per image, representing the probabilities of it being each digit 0 – 9.
2. **Predicting Classes:** Now that we have 10 probabilities, the class with the maximum probability is the one predicted by the model. To find it, we will use `tf.argmax()`, which returns the index of the maximum value.
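
The second step can be sketched without TensorFlow: picking the class simply means finding the position of the largest probability. The probabilities below are made up for illustration.

```python
example_probabilities = [0.01, 0.02, 0.05, 0.80, 0.02, 0.03, 0.02, 0.02, 0.02, 0.01]

# Index of the largest value, which is what an argmax returns
predicted_digit = max(range(len(example_probabilities)),
                      key=lambda i: example_probabilities[i])

print(predicted_digit)  # 3
```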

Now you can see what our model has predicted. You can change the index to see the output for different test images.

In [None]:
predicted_probabilities = model.predict(test_images)
predicted_classes = tf.argmax(predicted_probabilities, axis=-1).numpy()

index=20
 
# Showing image
plt.imshow(test_images[index])
 
# Printing Probabilities
print("Probabilities predicted for image at index", index)
print(predicted_probabilities[index])
 
print()
 
# Printing Predicted Class
print("Predicted class for image at index", index)
print(predicted_classes[index])


## Experiment and Modify!

In the next cell, you can see the code that we will use in the exercise. It is the same as in the previous sections, but all the images and most of the prints have been removed.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt

# Load dataset 
mnist = tf.keras.datasets.mnist
(train_images, train_labels) , (test_images, test_labels) = mnist.load_data()
 
# Preparing data

# Converting image pixel values to 0 - 1
train_images = train_images / 255
test_images = test_images / 255
 
# Converting labels to one-hot encoded vectors
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)
 
 
# Defining Model
# Using Sequential() to build layers one after another
model = tf.keras.Sequential([
  # Flatten Layer that converts images to 1D array
  tf.keras.layers.Flatten(),
  # Hidden Layer with 512 units and relu activation
  tf.keras.layers.Dense(units=512, activation='relu'),
  # Output Layer with 10 units for 10 classes and softmax activation
  tf.keras.layers.Dense(units=10, activation='softmax')
])

model.compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = ['accuracy']
)

print("Training:")
history = model.fit(
  x = train_images,
  y = train_labels,
  epochs = 10
)
 
print("\nEvaluating:")
# Call evaluate to find the accuracy on test images
test_loss, test_accuracy = model.evaluate(
  x = test_images, 
  y = test_labels
)

print("\n Results: ") 
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

You’ve successfully built your first model, but you can go even further with this one. Why not try out the following things and see what their effect is? 

Generate other models and test their accuracy. Great starting points to generate a new model are:

* You used 1 hidden layer. Try using more hidden layers.
* Use hidden layers with more or fewer neurons.
* Instead of the `relu` activation function, try using `tanh` (or `sigmoid` or `linear` ...).
* Test other optimizers like `'sgd'` or `'rmsprop'`.
* Try to adapt the [learning rate](https://en.wikipedia.org/wiki/Learning_rate) of the selected optimizer. To do this, instead of:
```python
model.compile(
  loss = 'categorical_crossentropy',
  optimizer = 'adam',
  metrics = ['accuracy']
)
```
use the following (`Adam` for `'adam'`, `SGD` for `'sgd'`, and `RMSprop` for `'rmsprop'`):
```python
model.compile(
  loss = 'categorical_crossentropy',
  optimizer = tf.keras.optimizers.Adam(learning_rate=0.001),
  metrics = ['accuracy']
)
```
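
To build intuition for what the learning rate controls, here is a minimal sketch of plain gradient descent on the one-variable function f(w) = w², whose minimum is at w = 0. This is not the Adam algorithm and not part of the model above; `gradient_descent` is a hypothetical helper used only to show how the step size affects convergence.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
  w = start
  for _ in range(steps):
    gradient = 2 * w                  # derivative of w**2
    w = w - learning_rate * gradient  # move against the gradient
  return w

# Print the final value of w for a small, a moderate and a too-large learning rate
for lr in (0.01, 0.1, 1.1):
  print(lr, gradient_descent(lr))
```

With the smallest rate, `w` is still far from 0 after 20 steps; with the moderate rate it gets close to 0; with the largest rate each step overshoots the minimum and `w` grows instead of shrinking.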



In [None]:
# For each model X, you have to:
# 1.- Create the model -> modelX = tf.keras.Sequential([ ...
# 2.- Compile the model -> modelX.compile(...
# 3.- Train the model -> modelX.fit(...
# 4.- Evaluate the model -> modelX.evaluate(...
# 5.- Print the results -> print(f"Test Accuracy: {test_accuracy:.4f}")

Answer: (For each new model, describe how it differs from the original. Fill in the table with the accuracy.)

* New model 1:
* ...

Results (test accuracy):

| Model | Test accuracy |
| --- | --- |
| Original | 0.9814 |
| New model 1 | |
| ... | |