# A first look at artificial neural networks

## Artificial Neural Networks

Artificial Neural Networks (ANNs) are computational processing systems that are heavily inspired by the way biological nervous systems function. The basic structure of an ANN can be visualised as follows:

<a title="Glosser.ca, CC BY-SA 3.0 &lt;https://creativecommons.org/licenses/by-sa/3.0&gt;, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg"><img width="256" alt="Colored neural network" src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/256px-Colored_neural_network.svg.png"></a>

**Figure 1:** A simple three-layer fully connected feedforward neural network (FNN), also known as a multilayer perceptron (MLP), comprising an input layer, a hidden layer and an output layer.



## Neural network modelling frameworks

#### TensorFlow and Keras
Keras (https://keras.io) is a high-level, easy-to-use API built on top of TensorFlow (https://www.tensorflow.org) for developing and evaluating deep learning models. It was developed with a focus on enabling fast experimentation.

In this worksheet, you will learn how to create a neural network using the TensorFlow Keras API.

Neural network models are trained by gradient descent. As we perform computations, TensorFlow records the computation graph that we build up. When it is time to compute the gradients, TensorFlow traces back over the computation graph (using the backpropagation algorithm) and works out all the required gradients (handled in TensorFlow via the [GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) object).
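
As a minimal sketch of the mechanism described above, `tf.GradientTape` can record the operations applied to a variable and then return a gradient. The function `y = x^2` and the value `x = 3` are illustrative choices, not part of the worksheet.

```python
import tensorflow as tf

# Illustrative example: differentiate y = x^2 at x = 3.
x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Operations on x executed inside this block are recorded on the tape.
    y = x ** 2

# Trace back through the recorded operations: dy/dx = 2x.
grad = tape.gradient(y, x)
print(float(grad))
```

Keras does this for every weight in the model during training, so you rarely need to use the tape directly.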

When we build models with Keras, we don't usually have to worry about low-level computation. High-level deep learning concepts translate to Keras APIs:

- **Layers**, which are combined into a **model**
- A **loss function**, which defines the feedback signal used for learning
- An **optimizer**, which determines how learning proceeds
- **Metrics** to evaluate model performance, such as accuracy
- A training loop that performs **mini-batch** stochastic gradient descent
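
To make these concepts concrete, here is a rough NumPy-only sketch of a mini-batch stochastic gradient descent loop for a single linear unit. The synthetic data, the learning rate and the batch size are all illustrative assumptions; this is not how Keras is implemented internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): y = 2x + 1, no noise.
X = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * X + 1.0

w, b = 0.0, 0.0          # model parameters
learning_rate = 0.1      # optimizer setting
batch_size = 20

for epoch in range(100):
    order = rng.permutation(len(X))          # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        pred = w * X[idx] + b                # "model": a single linear unit
        error = pred - y[idx]
        # Gradients of the mean squared error loss on this mini-batch.
        grad_w = 2.0 * np.mean(error * X[idx])
        grad_b = 2.0 * np.mean(error)
        # Optimizer step: plain SGD.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

mse = np.mean((w * X + b - y) ** 2)          # metric computed on the full data
print(w, b, mse)
```

The model, loss, optimizer step and metric each appear as one or two lines here; Keras wraps the same ingredients behind `compile()` and `fit()`.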

Colab already has TensorFlow installed. It is recommended to run this notebook using Colab (https://colab.research.google.com/), or on your own computer after the required libraries, including TensorFlow, have been installed. If you use Anaconda or Miniconda and have not installed TensorFlow yet, you can do so by following the instructions here:
https://docs.anaconda.com/anaconda/user-guide/tasks/tensorflow/

If you are not sure whether you have a suitable GPU, you can start by installing TensorFlow for CPU only.
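
Once TensorFlow is installed, one way to check whether it can see a GPU is to list the physical devices. This is a small sketch; the variable name `gpus` is just an illustrative choice.

```python
import tensorflow as tf

# List the GPUs that TensorFlow can see; an empty list means CPU only.
gpus = tf.config.list_physical_devices('GPU')
print("Number of GPUs available:", len(gpus))
```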

If you have trouble installing TensorFlow on your computer, or your computer does not have sufficient computational power, it is easier to switch to Colab, a Kaggle kernel, or a virtual machine in the cloud, where better computational resources (RAM, CPU, GPU) are available.

In [None]:
# Import tensorflow, check version
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)

### Tensor
Virtually all current machine-learning systems use tensors as their basic data structure, usually for numerical data.

Tensors are a generalization of matrices to an arbitrary number of dimensions (in the context of tensors, a dimension is often called an axis). NumPy arrays can be regarded as tensors.

* Scalars: rank-0 tensors.
* Vectors: rank-1 tensors.
* Matrices: rank-2 tensors.
* Rank-3 and higher-rank tensors (e.g. the array x below is a rank-3 tensor).
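
The lower ranks in the list above can be checked quickly with NumPy. The values are made up for illustration; `ndim` gives the rank (number of axes) and `shape` the size along each axis.

```python
import numpy as np

scalar = np.array(12)                      # rank-0 tensor
vector = np.array([12, 3, 6, 14])          # rank-1 tensor
matrix = np.array([[5, 78, 2],
                   [6, 79, 3]])            # rank-2 tensor

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, "ndim =", t.ndim, "shape =", t.shape)
```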

In [None]:
import numpy as np
x = np.array([[[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]],
                  [[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]],
                  [[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]]])
print(x.ndim)
print(x.shape)

In [None]:
# Constant tensors (Not assignable!!)
# All ones/zero tensors
x = tf.ones(shape=(2, 1))
print(x)
x = tf.zeros(shape=(2, 1))
print(x)

# Random tensors
x = tf.random.normal(shape=(3, 1), mean=0., stddev=1.) 
print(x)

x = tf.random.uniform(shape=(3, 1), minval=-1., maxval=1.) 
print(x)

x = 0 # Scalar
print(x)

In [None]:
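# Unlike a NumPy array, a constant tensor is not assignable.
# The try/except below shows the error raised when assigning to a constant tensor.
import numpy as np
x = np.ones(shape=(2, 2))
x[0, 0] = 0.  # works: NumPy arrays are mutable
print(x)

x = tf.ones(shape=(2, 2))
try:
    x[0, 0] = 0.  # fails: constant tensors do not support item assignment
except TypeError as err:
    print("TypeError:", err)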

In [None]:
# To modify tensors, use a tensor Variable in TensorFlow
v = tf.Variable(initial_value=tf.random.normal(shape=(3, 1)))
print(v)

# Assign values to a tensor variable
v.assign(tf.ones((3, 1)))
print(v)

# Modify element with index [0,0] in tensor variable v to 0
# Your code here
#

## MNIST

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. The MNIST dataset is about 12 MB and will be downloaded automatically if it is not located in the given path.

In [None]:
from tensorflow.keras.datasets import mnist

(image_train, label_train), (image_test, label_test) = mnist.load_data()

print("Size of:")
print("- Training-set:\t\t{}".format(image_train.shape))
print("- Test-set:\t\t{}".format(image_test.shape))

In [None]:
# Your code: 
# print label_train, print its shape



`mnist.load_data()` returns the dataset already split into two mutually exclusive subsets: a training set and a test set. We will use both in this tutorial, and later hold out part of the training set for validation.

Define a simple function to have a look at the images.

## Getting to know our data

The following function plots 9 images from the dataset in a 3x3 grid.

In [None]:
import matplotlib.pyplot as plt

def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9
    
    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        #ax.imshow(images[i].reshape(img_shape), cmap='binary')
        ax.imshow(images[i], cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        ax.set_xlabel(xlabel)
        
        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
        
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to check that the data is correct.

In [None]:
# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Length of each image once flattened into a one-dimensional array (may be useful later as the input size).
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of classes, one class for each of 10 digits.
num_classes = 10

# Get the first 9 images from the training set.
images = image_train[0:9]

# Get the true classes for those images.
cls_true = label_train[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)

For this first network, we'll flatten everything into a 784-dimensional feature vector.

In [None]:
# In general, you may select between any two indices along each tensor axis. 
# For instance, in order to select 14 × 14 pixels in the bottom-right corner for image with index 10, you do this:
my_slice = image_train[10, 14:, 14:]
print(my_slice.shape)
my_slice

The images are stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.

In [None]:
# Prepare data: flatten the input and rescale to [0, 1]
X_train = image_train.reshape((60000, 28*28)) # or: X_train = image_train.reshape(image_train.shape[0], -1)
X_train = X_train.astype('float32')/255
X_test = image_test.reshape((10000, 28*28)) # or: X_test = image_test.reshape(image_test.shape[0], -1)
X_test = X_test.astype('float32')/255

print("Size of:")
print("- Training-set:\t\t{}".format(X_train.shape))
print("- Test-set:\t\t{}".format(X_test.shape))

#### To build a fully connected feedforward network (aka Multilayer Perceptron (MLP))
The first model we'll use is a simple fully connected feedforward network. A fully connected layer is called a Dense layer in Keras.

#### Dimensionality reduction with PCA
Since fully connected layers are a bit heavy on image data (and you're probably running this on your laptop), we'll reduce the dimensionality of the data by [PCA (principal component analysis)](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html). PCA decomposes a multivariate dataset into a set of successive orthogonal components that each explain a maximum amount of the variance. We will discuss dimensionality reduction in a few weeks.
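
To illustrate what "successive orthogonal components that explain a maximum amount of the variance" means, here is a rough NumPy sketch of PCA via the singular value decomposition of the centred data. The synthetic data and the choice of 2 retained components are assumptions for illustration; it is a sketch of the idea, not scikit-learn's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: 500 samples, 5 features with different spreads.
data = rng.normal(size=(500, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# PCA via SVD of the centred data matrix.
centred = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)

components = Vt                                   # rows are principal directions
explained_variance = S ** 2 / (len(data) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Keep only the first 2 components (the worksheet keeps 60 of 784).
reduced = centred @ components[:2].T

print(explained_ratio.round(3))
print(reduced.shape)
```

The explained-variance ratios come out in decreasing order, which is why keeping only the leading components preserves most of the variance.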

In [None]:
from sklearn.decomposition import PCA

pca = PCA(n_components=60) # reduce to 60 dimensions
pca.fit(X_train)

X_train = pca.transform(X_train)
X_test = pca.transform(X_test)

print(X_train.shape) 
print(X_test.shape) # NB: none of this is Keras yet. We're just using sklearn on some numpy arrays

The training labels are encoded as integers. We need these as one-hot vectors instead, so we can match them to the ten outputs of the neural network.
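
As a small illustration of one-hot encoding, the sketch below uses plain NumPy on made-up labels (not the MNIST labels). Each integer label becomes a vector of zeros with a single 1 at the index of the class.

```python
import numpy as np

labels = np.array([2, 0, 9])        # made-up integer class labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot)

# argmax inverts the encoding.
recovered = one_hot.argmax(axis=1)
print(recovered)
```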

In [None]:
from tensorflow.keras.utils import to_categorical  # same Keras namespace as the earlier imports

print(label_train.shape, label_test.shape)

y_train = to_categorical(label_train)
y_test = to_categorical(label_test)

print(y_train.shape, y_test.shape)

### Question: 
Look at the one-hot encoding given by y_train[0]. Which digit does it represent?

In [None]:
print(y_train[0]) # print the one-hot vector for the first example

## Keras Model

The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the ```Sequential``` model, a linear stack of **layers**. Before we start, take some time to read the ```Sequential``` documentation:

https://keras.io/getting-started/sequential-model-guide/

Note: Keras has two APIs for this: the **Sequential** API and the **Model** API. The sequential API (the simplest) assumes that your model is a simple sequence of operations, usually neural network layers. The input is passed through the first layer, the result of that is passed through the second and so on. 

This is useful for simple NN models where you are only interested in the input and output. If your model gets more complex, you may want to use the Model API (which we'll use in the next practicals).
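
The layer-by-layer flow of a sequential model can be sketched in plain NumPy. The layer sizes, the random weights and the helper names `relu` and `softmax` below are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Element-wise max(z, 0).
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Illustrative sizes: 4 inputs -> 3 hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

x = rng.normal(size=(5, 4))          # a batch of 5 examples

hidden = relu(x @ W1 + b1)           # output of the first layer
probs = softmax(hidden @ W2 + b2)    # output of the second layer

print(probs.shape)
print(probs.sum(axis=1))
```

Each row of `probs` sums to 1, which is what allows the softmax output to be read as class probabilities.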

Now, let's create a Sequential model.

We can create a Sequential model incrementally via the add() method.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers

model = Sequential() # or specify a model name: model = Sequential(name="my_first_model")
model.add(keras.Input(shape=(60,))) # Input: the shape must match the data used when fitting or predicting
model.add(layers.Dense(32, activation='relu')) # first dense layer, 32 hidden units
model.add(layers.Dense(10, activation='softmax')) # second dense layer, outputs class probabilities with softmax activation
model.summary()

In [None]:
# Check model weights
model.weights

You can also create a Sequential model by passing a list of layers to the Sequential constructor:

In [None]:
model = Sequential(    
    [
     keras.Input(shape=(60,)),  # note the trailing comma: (60) is an int, (60,) is a tuple
     layers.Dense(32, activation='relu'),
     layers.Dense(10, activation='softmax')
    ]
)

model.summary()

# Get the list of layers in the model
model.layers

**Question**: 
1. How many hidden layers does this model have?

2. How do you compute the total number of parameters (weights)? 
(Remember to include the bias parameters.)


Once you have created the model, you can start training and validating it. See https://keras.io/guides/training_with_built_in_methods/

In [None]:
optimizer = keras.optimizers.SGD(learning_rate=0.01)
#optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

## Training

When compiling the model, we also asked Keras to compute the accuracy for us during training (since the categorical cross-entropy loss is a bit hard to interpret).
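
To see how the two quantities relate, here is a small NumPy sketch computing categorical cross-entropy and accuracy by hand. The predicted probabilities are made up for illustration, and the formula is the standard definition rather than Keras' exact implementation, which also clips probabilities for numerical safety.

```python
import numpy as np

# Made-up predictions for 3 examples and 3 classes (each row sums to 1).
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3]])
# One-hot true labels: classes 0, 1 and 2.
y_true = np.eye(3)[[0, 1, 2]]

# Categorical cross-entropy: mean over examples of -log(probability of the true class).
loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Accuracy: fraction of examples whose most probable class is the true one.
accuracy = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(loss, accuracy)
```

The loss depends on how confident each prediction is, while accuracy only counts whether the top class is right.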

We're now ready to start training using the fit() method. Two of its key arguments are:
- The **batch size** to use within each epoch of mini-batch gradient descent: the number of training examples considered to compute the gradients for one weight update step.
- The number of **epochs** to train for: how many times the training loop should iterate over the data passed. 
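
The relationship between batch size, epochs and the number of weight updates is simple arithmetic. The sketch below assumes the full 60,000-sample training set is passed to fit() with no validation split.

```python
import math

num_samples = 60000   # size of the MNIST training set

for batch_size in (32, 1000):
    # One weight update per mini-batch; the last batch may be smaller.
    steps_per_epoch = math.ceil(num_samples / batch_size)
    print(batch_size, steps_per_epoch)

epochs = 5
total_updates = epochs * math.ceil(num_samples / 1000)
print(total_updates)
```

Larger batches therefore mean fewer updates per epoch, which is one reason the learning rate and the batch size interact.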

In [None]:
# Train the model, iterating over the data in batches of 1000 samples
history = model.fit(X_train, y_train, epochs=5, batch_size=1000);

## Validating
A deep learning model should never be evaluated on its training data alone: it is standard practice to use a validation set to monitor the accuracy (or another performance metric) of the model during training. Here, we set apart 10% of the original training data for validation using the `validation_split` argument of the fit() method; Keras takes these samples from the end of the arrays, before any shuffling. More commonly, a validation set is created separately and passed via the `validation_data` argument of fit(). For more details on the Keras model training API, see https://keras.io/api/models/model_training_apis/
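
The Keras documentation states that `validation_split` holds out the last fraction of the samples, before shuffling. The sketch below mimics that behaviour in NumPy on a tiny made-up array; it is an illustration of the idea, not Keras' internal code.

```python
import numpy as np

X = np.arange(10).reshape(10, 1)   # 10 made-up samples
validation_split = 0.2

# Index separating the training part from the held-out tail.
split_at = int(len(X) * (1 - validation_split))

X_fit, X_val = X[:split_at], X[split_at:]
print(X_fit.ravel())
print(X_val.ravel())
```

With the worksheet's settings (60,000 samples and a split of 0.1), this leaves 54,000 samples for fitting and 6,000 for validation.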

The call to fit() returns a **History** object. This object contains a history field, which is a dict mapping keys such as "loss" or specific metric names to the list of their per-epoch values.

In [None]:
history = model.fit(X_train, y_train, validation_split=0.1, epochs=5, batch_size=1000);

In [None]:
# Consider increasing the number of epochs if the loss or the training performance is not good enough or has not converged
# 

In [None]:
history.history.keys()

In [None]:
history.history

In [None]:
# Plotting the training and validation loss
import matplotlib.pyplot as plt
history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, "bo", label="Training loss")
plt.plot(epochs, val_loss_values, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

In [None]:
# Plotting the training and validation accuracy
# Your code here:
# 


In [None]:
# After the model training is complete
# evaluate the model 
results = model.evaluate(X_test, y_test)
print("test loss, test acc:", results)

#_, accuracy = model.evaluate(X_test, y_test)
#print('Accuracy: %.2f' % (accuracy*100))

## Inference
To make predictions using a trained model:

In [None]:
# To make predictions and get the confusion matrix
# model.predict will generate probability for all 10 digits, 
#    so we use argmax to pick the class with the highest probability
from sklearn import metrics
y_pred = model.predict(X_test)
print(y_pred[0]) # print the prediction for first image

print('\nConfusion matrix: ')
matrix = metrics.confusion_matrix(y_test.argmax(axis=1), y_pred.argmax(axis=1))
print(matrix)

print('\nClassification report: ')
print(metrics.classification_report(y_test.argmax(axis=1), y_pred.argmax(axis=1)))
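
The argmax step and the confusion matrix can also be reproduced by hand. This NumPy sketch uses made-up probabilities for 4 examples and 3 classes, and follows the same convention as scikit-learn (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Made-up class probabilities for 4 examples and 3 classes.
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.1, 0.2, 0.7],
                  [0.3, 0.6, 0.1]])
true_classes = np.array([0, 1, 2, 1])

# Pick the most probable class for each example.
pred_classes = probs.argmax(axis=1)

# confusion[i, j] counts examples of true class i predicted as class j.
confusion = np.zeros((3, 3), dtype=int)
np.add.at(confusion, (true_classes, pred_classes), 1)

print(pred_classes)
print(confusion)
```

Off-diagonal entries are the misclassifications; here one example of class 1 is predicted as class 2.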

### Tasks

Now that you have a model to play with, try the following (in no particular order):

1. Change the `learning_rate` of the optimiser.
2. Change the ```batch_size``` to e.g. 1 or 1000 (and see how the learning rate copes with larger/smaller batch sizes).
3. Change the optimiser from SGD to Adam.
4. Add complexity to the model, being mindful of how "powerful" your computer is.

5. Try to find a sweet spot between the size and the performance of the model, taking into account things like the number of iterations/epochs required to train the model.

6. Do you get exactly the same results if you run the notebook multiple times without changing any parameters? Why or why not?

7. Do you think these changes will have the same effect (if any) on other classifiers?

8. Investigate the **effect of dimensionality reduction** (with PCA, e.g. you may try to build the model without PCA and compare ...)

9. Try out some **other ML classifiers** that have been discussed in class, e.g. kNN (with scikit-learn).

In [None]:
# Your code
#
#

