 # <strong> Introduction to artificial neural networks </strong>
 Spring 2023 - Toulouse INP/ENSEEIHT<br /> 
 by Ruming PAN & Mohamed SAADI<br /> 
 Last update: 20-01-2023<br /> 
 
 This notebook contains three parts:<br />
 <ol>
  <li><strong>Part 1: Building a neural network</strong>, in which a function ```ANN``` builds a DNN (dense neural network) from the number of layers (hidden + output), the number of neurons per layer, and the parameters (weights and biases) of each layer. You are asked to implement additional activation functions (e.g., linear, ReLU), following the example given for the logistic function, and to test them in a new ANN.</li>
  <li><strong>Part 2: Training a neural network</strong>, where the dataset "abalone_data.xlsx" is used to train a DNN by backpropagating the gradient of a loss function with respect to the network parameters. You are required to do three things: (1) define the training and test datasets, (2) scale the features, and, most importantly, (3) complete the function ```ANN_backpro``` so that the algorithm runs successfully. </li>
  <li><strong>Part 3: Make your life easier with Keras</strong>, where you are asked to implement a similar DNN using ```keras```, a high-level interface for ```TensorFlow```, which is a widely used deep learning framework. The objective here is to present one of the most used libraries in the Python-based machine learning ecosystem. As you will notice, the same network takes only a few lines of code.</li>
</ol> 
 

## <strong>Part 1: Building a neural network</strong>
 In this part, you are required to understand the structure of a neural network and build one given the number of layers ```NL``` and the list ```NpL``` of the number of neurons per layer.

### Step 1: Import libraries, define global functions and hyperparameters

<strong>Task 1:</strong> Based on the example provided for the logistic function, build new activation functions (tanh, ReLU, and linear) in order to use them later for the neural network.
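
As a starting point, here is one possible sketch of the three functions. It assumes the same ```mode``` convention as the logistic example (```"n"``` for $f(x)$, ```"d"``` for $f'(x)$) and uses the names ```relu```, ```tanh``` and ```linear```, which match the strings used in ```ActivFun``` later in the notebook. Setting the derivative of ReLU at $x = 0$ to 0 is a convention chosen here, not a requirement.

```python
import numpy as np

def relu(x, mode="n"):
    """mode = "n": returns f(x) = max(x, 0); mode = "d": returns f'(x)."""
    x = np.asarray(x, dtype=float)
    if mode == "n":
        return np.maximum(x, 0.0)
    if mode == "d":
        # assumption: the derivative at x = 0 is set to 0
        return (x > 0).astype(float)
    raise ValueError("mode must be 'n' or 'd'")

def tanh(x, mode="n"):
    """mode = "n": returns tanh(x); mode = "d": returns 1 - tanh(x)^2."""
    t = np.tanh(x)
    if mode == "n":
        return t
    if mode == "d":
        return 1.0 - t**2
    raise ValueError("mode must be 'n' or 'd'")

def linear(x, mode="n"):
    """mode = "n": returns x; mode = "d": returns an array of ones."""
    x = np.asarray(x, dtype=float)
    if mode == "n":
        return x
    if mode == "d":
        return np.ones_like(x)
    raise ValueError("mode must be 'n' or 'd'")
```

Each function accepts arrays of any shape, so it can be applied directly to a whole layer of pre-activations.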

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import sys
import tensorflow as tf
from tensorflow import keras

## a function that takes a dataframe and returns some basic statistics of its columns
def datasum(df):
    dfmin = df.apply(np.min, 0)
    dfmedian = df.apply(np.median, 0)
    dfmean = df.apply(np.mean,0)
    dfmax = df.apply(np.max, 0)
    dfstd = df.apply(np.std, 0)
    dfsummary = pd.DataFrame(np.array([dfmean, dfstd, dfmin, dfmedian, dfmax]), columns=list(df.columns[:]))
    dfsummary["Statistic"] = ["mean", "std", "min", "median", "max"]
    xcol = dfsummary.shape[1]
    colarr = [dfsummary.columns[xcol-1]]
    for item in dfsummary.columns[0:(xcol-1)]:
        colarr.append(item)
    dfto = dfsummary[colarr]
    return dfto

## activation functions and their derivatives
def logistic(x, mode = "n"):
    '''
    mode = "n" : returns f(x)
    mode = "d": returns f'(x)
    '''
    t = np.exp(-x)
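    if mode == "n":
        fx = 1/(1+t)
    elif mode == "d":
        fx = t/(1+t)**2
    else:
        ## any other value would otherwise leave fx undefined
        raise ValueError("mode must be 'n' or 'd'")
    return fx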

## define new functions here


#### Hyperparameters: number of layers, size of each layer, and activation function per layer
<strong>Indication:</strong> The hyperparameters ```NL```, ```NpL``` and ```Nfx``` are used throughout the code. They define the number of layers, the number of neurons in each layer, and the number of input features, respectively. Always make sure that the lengths of ```NpL``` and ```ActivFun``` are equal to ```NL```.

In [None]:
## NL is the number of layers including the hidden layers AND the output layer
NL = 2
## NpL defines the number of neurons in each layer. The length of NpL should be exactly equal to NL
NpL = [3, 1]
## ActivFun defines the activation functions per each layer
ActivFun = ['logistic', 'logistic']

## Add the number of features x
Nfx = 2
## List of the sizes of the preceding layers (L-1): important for the automatic definition of parameters
NpLm1 = [Nfx]
for iL in np.arange(len(NpL)-1):
    NpLm1.append(NpL[iL])
print("Number of neurons per layer: ", NpL)
print("Number of neurons from the previous layer: ", NpLm1)

#### Initialization of parameters
<strong>Task 2:</strong> Initialize the parameters (weights and biases) using the function ```np.random.rand``` (https://numpy.org/doc/stable/reference/random/generated/numpy.random.rand.html). Initial parameter values should lie between -1 and 1; since ```np.random.rand``` draws values between 0 and 1, its output has to be rescaled.
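
The rescaling can be sketched as follows, assuming the layer sizes of the example above (```NpL = [3, 1]```, ```Nfx = 2```). If $u$ is drawn between 0 and 1, then $2u - 1$ lies between -1 and 1. The fixed seed is only there to make the illustration reproducible.

```python
import numpy as np

NpL = [3, 1]              # neurons per layer (values from the example above)
Nfx = 2                   # number of input features
NpLm1 = [Nfx] + NpL[:-1]  # size of the layer feeding each layer

np.random.seed(0)         # assumption: fixed seed, for reproducibility only
Wts, bias = [], []
for nL, nLm1 in zip(NpL, NpLm1):
    # np.random.rand draws from [0, 1); 2*u - 1 maps the draw to [-1, 1)
    Wts.append(2.0 * np.random.rand(nL, nLm1) - 1.0)   # shape nL x nL-1
    bias.append(2.0 * np.random.rand(nL, 1) - 1.0)     # shape nL x 1

print([W.shape for W in Wts])
print([b.shape for b in bias])
```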

In [None]:
## List of weights and biases
Wts = []
bias = []
for iL in np.arange(len(NpL)):
    ## random initialization
    ## the initial parameters should be between -1 and 1
    ## use the function np.random.rand to make random initializations (but these will be between 0 and 1)
    WL = 
    bL = 
    ## appending
    Wts.append(WL)
    bias.append(bL)
print(Wts[0].shape)
print(bias[0].shape)

### Step 2: Construct the neural network

<strong>Task 3:</strong> Complete this function with the equations for each layer to compute the vectors $z_{L}$ and their activation $y_{L} = \sigma(z_{L})$. The function should return a list ```z``` and a list ```y``` that have ```NL``` arrays each (one per layer).
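
For each layer, the computation is $z_{L} = W_{L}\, y_{L-1} + b_{L}$ followed by $y_{L} = \sigma(z_{L})$. The sketch below illustrates this loop in a stand-alone helper, hypothetically named ```forward_sketch``` (it is not the ```ANN``` function itself). It assumes that ```x``` has shape Nfx x n and that the activation functions are looked up by name with ```globals()```.

```python
import numpy as np

def logistic(x, mode="n"):
    t = np.exp(-x)
    return 1 / (1 + t) if mode == "n" else t / (1 + t) ** 2

def forward_sketch(x, Wts, bias, ActivFun):
    """Feed-forward pass; returns the lists y and z (one array per layer)."""
    n = x.shape[1]                 # number of data points
    yLm1 = x                       # the "layer 0" output is the input itself
    y, z = [], []
    for WL, bL, name in zip(Wts, bias, ActivFun):
        # WL: nL x nL-1, yLm1: nL-1 x n, bL tiled from nL x 1 to nL x n
        zL = WL @ yLm1 + np.tile(bL, (1, n))
        sigma = globals()[name]    # activation function found by its name
        yL = sigma(zL)
        z.append(zL)
        y.append(yL)
        yLm1 = yL                  # move to the next layer
    return y, z

# usage: with all parameters at zero, every logistic neuron outputs 0.5
x_demo = np.array([[0, 0, 1, 1], [0, 1, 0, 1]])   # 2 features x 4 points
Wts_demo = [np.zeros((3, 2)), np.zeros((1, 3))]
bias_demo = [np.zeros((3, 1)), np.zeros((1, 1))]
y_demo, z_demo = forward_sketch(x_demo, Wts_demo, bias_demo, ["logistic", "logistic"])
print(y_demo[-1].shape)
```

NumPy broadcasting would give the same result without ```np.tile```; the explicit tiling is kept here to make the dimensions visible.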

In [None]:
def ANN(x, NpL, Nfx, Wts, bias, ActivFun):
    '''
    This function computes the output of a neural network containing NL layers,
    where NL is the length of NpL.
    o x is the input array of shape Nfx x n, n being the number of data points.
    o NpL contains the number of neurons of each hidden layer and of the output layer.
    o Nfx is the number of input features.
    o Wts is a list that contains the 2D array of weights of each layer.
    Each 2D array of weights has dimensions nL x nL-1, nL being the number of neurons
    of the current layer L, and nL-1 the number of neurons (or features) of layer L-1.
    o bias is a list that contains the 2D array of biases of each layer.
    Each array has dimensions nL x 1. When there are several data points (or members), say
    n data points, the bias should be repeated n times using the function np.tile.
    o ActivFun contains the name of the activation function of each layer.
    
    '''
    n = x.shape[1]
    yLm1 = x
    ## z is a list that saves the zL arrays for each hidden and output layer
    z = []
    ## similarly, y is a list that saves the yL arrays. Specifically, y[NL-1] contains the output.
    y = []
    for iL in np.arange(len(NpL)):
        ## parameters
        WL = Wts[iL]
        bL = bias[iL]
        ## multiplication
        ## make sure that the operation is correct for n individual points.
        ## the dimension of zL should be nL x n. Since bL is nL x 1, use np.tile to overcome this issue.
        zL =  
        z.append(
        ## activation
        ## to call a function given its name, use the function fx = globals()["fun_name"]
        sigma = globals()[ActivFun[iL]]
        yL = 
        y.append(
        ## move to next layer
        yLm1 = 
    return y,z

### Step 3: Verify that the ANN works for a simple example

In [None]:
## Input: 4 data points x 2 features (Nfx = 2)
input_features = np.array([[0,0], 
                           [0,1], 
                           [1,0], 
                           [1,1]])
print(input_features.shape)
print(input_features)

In [None]:
# Output: 4 data points x 1 feature
target_output = np.array([[0], [1], [1], [1]])
print(target_output.shape)
print(target_output)

#### Generate outputs for the 4 individuals
Run this cell to test your neural network. It should deliver an output ysim of shape (n,1), where n is the number of data points (NOT features).

In [None]:
## This is where you can test your neural network
x_in = input_features.T
y,z = ANN(x = x_in, NpL = NpL, Nfx = Nfx, Wts = Wts, bias = bias, ActivFun = ActivFun)
ysim = y[NL-1].T
print(ysim.shape)

#### Goodness of fit: MSE and RMSE
<strong>Task 4:</strong> Complete the equation of MSE and RMSE to estimate the errors of the simulation output of the neural network. Use the matrix product to compute the MSE.
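
With the errors stored in a column vector $e$ of shape (n,1), the sum of squared errors is the matrix product $e^{T} e$, so that $MSE = e^{T} e / n$ and $RMSE = \sqrt{MSE}$. A minimal sketch, in which the values of ```ysim``` are made up for the illustration:

```python
import numpy as np

target_output = np.array([[0.0], [1.0], [1.0], [1.0]])
ysim = np.array([[0.1], [0.8], [0.9], [0.7]])   # hypothetical network outputs

n = target_output.shape[0]
err = target_output - ysim       # column vector of errors, shape (n, 1)
MSE = (err.T @ err) / n          # matrix product, shape (1, 1)
RMSE = np.sqrt(MSE)
print(MSE.shape, MSE[0, 0], RMSE[0, 0])
```

The result is a 1 x 1 array, which is why the scalar value is read with the index ```[0,0]```.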

In [None]:
n = target_output.shape[0]
MSE = 
print(MSE)
RMSE = 
print(RMSE)

## <strong>Part 2: Training a neural network</strong>
 In this part, a backpropagation algorithm is implemented to train the neural network.

### Step 1: Read the data, define the training and the test datasets

#### Reading the dataset
Any machine learning task involves preparing/preprocessing the data to make it ready for digestion by the machine learning algorithm.

At this stage, the data has a 2D shape (rows = individuals or data points, columns = features). The shape of the data helps you define the structure of your neural network.

The "abalone_data.xlsx" dataset used for this exercise can be downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/. It contains measurements of the following properties of 4177 abalones, a group of marine snails (<em>Haliotis</em>):
<ol>
  <li><em>Type</em>: male (1), immature (0), female (-1); </li>
  <li><em>LongestShell_mm</em>: longest shell measurement in mm; </li>
  <li><em>Diameter_mm</em>: length perpendicular to LongestShell in mm;</li>
  <li><em>Height_mm</em>: height of the member with meat in shell in mm;</li>
  <li><em>WholeWeight_g</em>: mass of the whole abalone in g;</li>
  <li><em>ShuckedWeight_g</em>: weight of meat in the abalone in g;</li>  
  <li><em>VisceraWeight_g</em>: gut weight of the abalone (after bleeding) in g;</li>
  <li><em>ShellWeight_g</em>: weight of the abalone after being dried in g;</li>
  <li><em>Age_yr</em>: age of the abalone in years.</li>
</ol> 
 
The objective of the exercise is to predict the age of the abalone from its type, dimensions, and weight.

<strong>Task 5:</strong> Complete the first instruction ```pd.read_excel``` to read the dataset.

In [None]:
df = pd.read_excel(, sheet_name= )
display(df.head(5).style.format("{0:.2f}").set_caption("Few lines of the dataset :"))
dfsum = datasum(df)
pd.options.display.float_format = '{:,.2f}'.format
display(dfsum.style.hide(axis = "index").set_caption("Statistics of the dataset"))

#### Training and test datasets

<strong>Task 6:</strong> After reading the dataset, complete the instructions using ```df.sample``` and ```df.drop``` to split the data into train and test datasets.
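
The idea is to draw a random fraction of the rows for training and to keep the remaining rows for testing. The sketch below uses a small made-up dataframe, ```df_demo```, in place of the abalone data; the ```random_state``` argument is optional and only fixes the random draw.

```python
import numpy as np
import pandas as pd

# toy dataframe standing in for the abalone dataset
df_demo = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) * 2.0})

percTrain = 0.7
# random sample of 70 % of the rows
dftrain = df_demo.sample(frac=percTrain, random_state=0)
# the test set is what remains after dropping the sampled rows
dftest = df_demo.drop(dftrain.index)

print(len(dftrain) / len(df_demo))
print(len(dftest) / len(df_demo))
```

Because the rows are dropped by index, this approach relies on the index of the dataframe being unique.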

In [None]:
## percentage of data to be used for training
percTrain = 0.7

## index of training and test
#trainindex = np.random.rand(len(df)) < percTrain
dftrain = 
dftest =  
print(len(dftrain)/len(df))
print(len(dftest)/len(df))

#### Defining features and output variables

<strong>Task 7:</strong> Complete the instructions to define the target/output/dependent variable (age of the abalone) and the attributes/input/independent variables (remaining variables). 
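
A possible way to separate the two groups of variables is sketched below on a made-up dataframe. It assumes that the age column is named ```Age_yr```, as in the list of properties above; check the actual header of your file. Selecting the target with double brackets keeps it 2D, with shape (n,1).

```python
import pandas as pd

# toy training set with a few of the columns listed above (values are made up)
dftrain_demo = pd.DataFrame({
    "Type": [1, 0, -1],
    "Height_mm": [19.0, 18.0, 27.0],
    "Age_yr": [16.5, 8.5, 10.5],
})

ytrain_demo = dftrain_demo[["Age_yr"]]               # target, shape (n, 1)
xtrain_demo = dftrain_demo.drop(columns=["Age_yr"])  # all remaining columns

print(xtrain_demo.shape, ytrain_demo.shape)
```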

In [None]:
ytrain = 
xtrain = 
#print(xtrain.head())
#print(ytrain.shape)
ytest = 
xtest = 
 
## printing some information
print('Shape of original data : ', df.shape)
print('xtrain : ',xtrain.shape, 'ytrain : ',ytrain.shape)
print('xtest  : ',xtest.shape,  'ytest  : ',ytest.shape)

#### Scaling the data

<strong>Task 8:</strong> Estimate the mean and standard deviation for each column/feature from the <ins>train</ins> dataset and use them to scale both the train and the test datasets.
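
The scaling is $x_{scl} = (x - \mu_{train}) / \sigma_{train}$, applied column by column with the same $\mu_{train}$ and $\sigma_{train}$ for both datasets, so that no information from the test dataset leaks into the training. A minimal sketch on made-up dataframes:

```python
import pandas as pd

xtrain_demo = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0], "f2": [10.0, 20.0, 30.0, 40.0]})
xtest_demo = pd.DataFrame({"f1": [2.5], "f2": [50.0]})

# statistics estimated from the train dataset only
xmean = xtrain_demo.mean()
xstd = xtrain_demo.std()    # note: pandas uses ddof=1 by default

# the same mean and standard deviation are applied to both datasets
xtrain_scl_demo = (xtrain_demo - xmean) / xstd
xtest_scl_demo = (xtest_demo - xmean) / xstd

print(xtrain_scl_demo.mean().round(2).tolist())
print(xtest_scl_demo)
```

After scaling, the train columns have zero mean and unit standard deviation, whereas the test columns only do so approximately.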

In [None]:
## estimate the mean and the standard deviation from the train dataset
xmean = 
xstd =
## scaling
xtrain_scl =
xtest_scl =

In [None]:
x_train_summary = datasum(xtrain)
x_test_summary = datasum(xtest)
x_train_scl_summary = datasum(xtrain_scl)
x_test_scl_summary = datasum(xtest_scl)
## convert to arrays
xtrain_scl, ytrain = np.array(xtrain_scl), np.array(ytrain)
xtest_scl, ytest = np.array(xtest_scl), np.array(ytest)
## print the dataset before and after scaling
display(x_train_summary.style.hide(axis = "index").set_caption("Statistics of the dataset - before scaling"))
display(x_train_scl_summary.style.hide(axis = "index").set_caption("Statistics of the dataset - after scaling"))

### Step 2: Build the backpropagation algorithm

<strong>Task 9:</strong> Complete the following function to train the neural network using the gradient descent method based on backpropagation.
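
The sketch below shows one gradient descent step on a toy problem. It is an illustration under stated assumptions, not the expected solution: the loss is taken to be the mean squared error $J = \frac{1}{n}\sum (y - y_{true})^2$, the activation functions are passed directly as functions rather than by name, the bias is added by broadcasting instead of ```np.tile```, and the names ```backprop_step``` and ```mse``` are hypothetical.

```python
import numpy as np

def logistic(x, mode="n"):
    t = np.exp(-x)
    return 1 / (1 + t) if mode == "n" else t / (1 + t) ** 2

def linear(x, mode="n"):
    return x if mode == "n" else np.ones_like(x)

def backprop_step(x, ytrue, Wts, bias, activations, lr):
    """One gradient descent step. x: (Nfx, n), ytrue: (n, 1)."""
    n = x.shape[1]
    ## step 1: feed forward, keeping zL and yL of every layer
    z, y = [], []
    yLm1 = x
    for WL, bL, sigma in zip(Wts, bias, activations):
        zL = WL @ yLm1 + bL            # bL (nL x 1) is broadcast to nL x n
        yL = sigma(zL)
        z.append(zL)
        y.append(yL)
        yLm1 = yL
    ## step 2: backpropagation, from the output layer to the first layer
    dJ_dy = 2.0 * (y[-1] - ytrue.T) / n              # gradient of the MSE
    for iL in reversed(range(len(Wts))):
        dJ_dz = dJ_dy * activations[iL](z[iL], "d")  # (nL x n)
        yLm1 = x if iL == 0 else y[iL - 1]           # (nL-1 x n)
        dJ_dW = dJ_dz @ yLm1.T                       # (nL x nL-1)
        dJ_db = dJ_dz.sum(axis=1, keepdims=True)     # (nL x 1)
        dJ_dy = Wts[iL].T @ dJ_dz                    # uses WL before its update
        Wts[iL] = Wts[iL] - lr * dJ_dW
        bias[iL] = bias[iL] - lr * dJ_db
    return Wts, bias

def mse(x, ytrue, Wts, bias, activations):
    yL = x
    for WL, bL, sigma in zip(Wts, bias, activations):
        yL = sigma(WL @ yL + bL)
    return float(np.mean((yL - ytrue.T) ** 2))

# usage on the 4-point example of Part 1
np.random.seed(0)
x = np.array([[0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]])
ytrue = np.array([[0.0], [1.0], [1.0], [1.0]])
Wts = [2 * np.random.rand(3, 2) - 1, 2 * np.random.rand(1, 3) - 1]
bias = [2 * np.random.rand(3, 1) - 1, 2 * np.random.rand(1, 1) - 1]
acts = [logistic, linear]
mse_before = mse(x, ytrue, Wts, bias, acts)
for _ in range(200):
    Wts, bias = backprop_step(x, ytrue, Wts, bias, acts, lr=0.05)
mse_after = mse(x, ytrue, Wts, bias, acts)
print(mse_before, mse_after)
```

Note that the gradient passed to layer L-1 is computed with the weights of layer L before they are updated.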

In [None]:
def ANN_backpro(x, ytrue, NpL, Nfx, Wts, bias, ActivFun, lr):
    '''
    o Shape of x: Nfx * n, where n is the number of data points, and Nfx the number of features
    o Shape of ytrue: n * 1, where n is the number of data points.
    
    '''
    ## step 1: feed forward
    n = x.shape[1]
    yLm1 = x
    z = []
    y = []
    for iL in np.arange(len(NpL)):
        ## get the parameters for the current layer
        WL = Wts[iL]
        bL = bias[iL]
        ## estimate zL from yLm1
        zL =  
        z.append(
        ## activation: estimate yL from zL
        sigma = 
        yL = 
        y.append(
        ## move to next layer
        yLm1 = 

    ## step 2: backpropagation
    ytrue = ytrue.T
    dJ_dy = 
    for iL in reversed(np.arange(len(NpL))):
        ## getting zL of the current layer
        zL = 
        ## estimating dJ_dz from dJ_dy
        sigma = 
        dJ_dz = 
        ## getting the parameters of current layer
        WL = 
        bL = 
       
        ## estimating dJ_dW from dJ_dz
        ## dJ_dz : (nL x n)
        ## yLm1 : (nL-1 x n)
        ## getting yL-1
        if(iL == 0):
            yLm1 =
        else:
            yLm1 = 
        dJ_dW = 
        
        ## estimating dJ_db from dJ_dz
        ## dJ_db : nL x 1
        ## dJ_dz : nL x n
        dJ_db =
        
        ## backpropagating the gradient from layer L to layer L-1
        ## WL : nL x nL-1
        ## dJ_dz : nL x n
        ## dJ_dy (L-1) : nL-1 x n
        dJ_dy =
        
        ## Updating the parameters
        WL = 
        bL = 
        Wts[iL] = 
        bias[iL] = 
    
    return Wts, bias

<strong>Task 10:</strong> Complete the initialization of parameters (exactly the same as in Part 1).

In [None]:
## NL is the number of layers including the hidden layers AND the output layer
NL = 2
## NpL defines the number of neurons in each layer. The length of NpL should be exactly equal to NL
NpL = [6,1]
## Add the number of features x
Nfx = 8
## ActivFun defines the activation functions per each layer
ActivFun = ['logistic', 'relu']

## List of the sizes of the preceding layers (L-1): important for the automatic definition of parameters
NpLm1 = [Nfx]
for iL in np.arange(len(NpL)-1):
    NpLm1.append(NpL[iL])
print("Number of neurons per layer: ", NpL)
print("Number of neurons from the previous layer: ", NpLm1)

## Learning rate
lr = 0.001

## Number of epochs
epochs = 5000

## List of weights and biases
Wts = []
bias = []
for iL in np.arange(len(NpL)):
    ## random initialization
    WL =
    bL = 
    ## appending
    Wts.append(
    bias.append(

<strong>Task 11:</strong> Complete the following lines in order to keep track of (1) epochs, (2) training error (MSE), and (3) test error.

In [None]:
print("Learning rate: ", lr, "  Number of epochs: ", epochs)
MSEtrain = np.array([])
MSEtest = np.array([])
epoch = np.array([])
sys.stdout.write('\r')
for iepoch in np.arange(epochs):
    epoch = np.append(epoch, iepoch)
    ## train the neural network
    x_in = 
    Wts, bias = ANN_backpro(x = x_in, ytrue = ytrain, NpL = NpL, Nfx = Nfx, Wts = Wts, bias = bias, ActivFun = ActivFun, lr = lr)
        
    ## estimate the MSE for the train dataset
    x_in = 
    yout, zout = ANN(x = x_in, NpL = NpL, Nfx = Nfx, Wts = Wts, bias = bias, ActivFun = ActivFun)
    ytrainsim = 
    ntrain = ytrain.shape[0]
    Error_train = 
    
    ## estimate the MSE for the test dataset
    x_in = 
    yout, zout = ANN(x = x_in, NpL = NpL, Nfx = Nfx, Wts = Wts, bias = bias, ActivFun = ActivFun)
    ytestsim = 
    ntest = ytest.shape[0]
    Error_test = 
   
    ## keeping track of the errors
    MSEtrain = np.append(MSEtrain, Error_train[0,0])
    MSEtest = np.append(MSEtest, Error_test[0,0]) 
    
    ## print the evolution
    sys.stdout.write('\r' "Epoch: " + str(int(iepoch + 1)).rjust(5,'0') + "/"
                    + str(int(epochs)).rjust(5,'0') + " " +
                    "Training error: " + str(round(Error_train[0,0],2)) + 
                    "  Test error: " + str(round(Error_test[0,0],2)))

Now, we can see the evolution of the training and test errors across the epochs. Note that only the RMSE (in years) is plotted.

In [None]:
## initialization of the plot
plt.grid(color='black', axis='y', linestyle='-', linewidth=0.5)    
plt.grid(color='black', axis='x', linestyle='-', linewidth=0.5)   
plt.grid(which='minor',color='grey', axis='x', linestyle=':', linewidth=0.5)     
plt.grid(which='minor',color='grey', axis='y', linestyle=':', linewidth=0.5)    
plt.xticks(fontsize=16); plt.yticks(fontsize=16)   
plt.xlabel('epoch',fontsize=16 )
plt.ylabel(r'$RMSE_{train}$ (yr), $RMSE_{test}$ (yr)', size = 16)
## plotting the data
plt.plot(epoch, MSEtrain**0.5, color = "blue", linewidth = 2., label = "Training error")
plt.plot(epoch, MSEtest**0.5, color = "orange", linewidth = 2., label = "Test error")
plt.title("Prediction error", fontsize = 16)
plt.gcf().set_size_inches(10, 5)
plt.legend(loc="upper right", prop={'size': 15})
plt.savefig("fig01.png", dpi = 300,  bbox_inches='tight')
plt.show()

<strong>Task 12:</strong> Complete the code to compute the outputs on the test dataset using the optimized parameters.

In [None]:
## Showing the results for the test dataset
x_in = 
yout, zout = ANN(x = x_in, NpL = NpL, Nfx = Nfx, Wts = Wts, bias = bias, ActivFun = ActivFun)
ytestsim = 
Error_test = 
## Making a scatter plot
## initialization of the plot
plt.grid(color='black', axis='y', linestyle='-', linewidth=0.5)    
plt.grid(color='black', axis='x', linestyle='-', linewidth=0.5)   
plt.grid(which='minor',color='grey', axis='x', linestyle=':', linewidth=0.5)     
plt.grid(which='minor',color='grey', axis='y', linestyle=':', linewidth=0.5)    
plt.xticks(fontsize=16); plt.yticks(fontsize=16)   
plt.xlabel(r'$Age_{obs}$ (yr)',fontsize=16 )
plt.ylabel(r'$Age_{sim}$ (yr)',fontsize=16 )
## plotting the data
plt.scatter(ytest, ytestsim, color = "red", marker = "o")
plt.plot([0., 30.], [0., 30.], color='k', linestyle='-', linewidth=2)
plt.gcf().set_size_inches(6, 6)
plt.savefig("fig02.png", dpi = 300,  bbox_inches='tight')
plt.show()

## <strong>Part 3: Make your life easier with Keras</strong>

In this part, an implementation of a DNN using functions from Keras is shown.

### Step 1: Build the ANN architecture using Keras

<strong>Task 13:</strong> Modify the following function to implement the same ANN as the one implemented in Part 2.

In [None]:
def ANN_keras(shape):
  
  model = keras.models.Sequential()
  model.add(keras.layers.Input(shape, name="InputLayer"))
  model.add(keras.layers.Dense(16, activation='relu', name='HiddenLayer01'))
  model.add(keras.layers.Dense(1, name='Output'))
  
  model.compile(optimizer = 'adam',
                loss      = 'mse',
                metrics   = ['mae', 'mse'] )
  return model

The following lines help create and instantiate an ANN using the function ```ANN_keras```.

<strong>Task 14:</strong> Use ```Nfx``` to define the input shape and instantiate the ANN built by ```ANN_keras```.

In [None]:
model= ANN_keras( (,) )
model.summary()

### Step 2: Train and evaluate the model

Now, the model is ready for training. The following call launches the training of the ANN.

In [None]:
## the batch size equals the size of the train dataset (one parameter update per epoch)
history = model.fit(xtrain_scl,
                    ytrain,
                    epochs          = 50,
                    batch_size      = len(xtrain_scl),
                    validation_data = (xtest_scl, ytest))

The test scores are computed using the optimized parameter set on the test dataset.

In [None]:
score = model.evaluate(xtest_scl, ytest, verbose=0)

print('test / loss      : {:5.4f}'.format(score[0]))
print('test / mae       : {:5.4f}'.format(score[1]))
print('test / mse       : {:5.4f}'.format(score[2]))

We can make a prediction with the Keras-built ANN and compare it with that from your own ANN.

In [None]:
mydata = [ 1, 0.3, 0.1, 0.2, 0.2, 0.5, 0.7, -1. ]
mydata = np.array(mydata).reshape(1,8)
print(mydata.shape)
predictions = model.predict(mydata)
print("predicted age using Keras-built network: ", round(predictions[0,0],2), "yr")
yout, zout = ANN(x = mydata.T, NpL = NpL, Nfx = Nfx, Wts = Wts, bias = bias, ActivFun = ActivFun)
print("predicted age using your neural network: ", round(yout[NL-1][0,0], 2), "yr")

Finally, we can load the parameters estimated by Keras into your own neural network and check that both implementations return the same prediction.
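
The conversion is needed because the two implementations store the parameters differently. A Keras ```Dense``` layer stores its weights as an array of shape (inputs, neurons) and its bias as a 1D array, whereas the ```ANN``` function expects weights of shape nL x nL-1 and biases of shape nL x 1. The sketch below illustrates the conversion with random arrays standing in for the output of ```model.get_weights()```; the layer sizes (8 inputs, 16 hidden neurons, 1 output) are assumed for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-ins for model.get_weights(): [kernel 1, bias 1, kernel 2, bias 2]
param_demo = [rng.normal(size=(8, 16)), rng.normal(size=(16,)),
              rng.normal(size=(16, 1)), rng.normal(size=(1,))]

# even positions hold the kernels, odd positions hold the biases
wts_demo = [param_demo[i].T for i in range(0, len(param_demo), 2)]
bias_demo = [param_demo[i].reshape(-1, 1) for i in range(1, len(param_demo), 2)]

x_row = rng.normal(size=(1, 8))                       # Keras layout: (n, Nfx)
z_keras = x_row @ param_demo[0] + param_demo[1]       # (n, 16)
z_own = wts_demo[0] @ x_row.T + bias_demo[0]          # (16, n)

print([W.shape for W in wts_demo], [b.shape for b in bias_demo])
print(np.allclose(z_keras.T, z_own))
```

Both layouts give the same pre-activations, up to a transposition.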

In [None]:
param_keras = model.get_weights()
len_par = len(param_keras)
wts_keras = [param_keras[i].T for i in np.arange(0, len_par, 2)]
bias_keras = [param_keras[i].reshape(param_keras[i].shape[0], 1) for i in np.arange(1, len_par, 2)]
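## the activation functions and layer sizes must mirror the architecture defined in ANN_keras
## (update both lists if you changed the model in Task 13)
ActivFun_Keras = ['relu', 'linear']
NpL_Keras = [16, 1]
Nfx = 8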
yout_k, zout_k = ANN(x = mydata.T, NpL = NpL_Keras, Nfx = Nfx, Wts = wts_keras, bias = bias_keras, ActivFun = ActivFun_Keras)
print("predicted age using Keras-built network: ", round(predictions[0,0],2), "yr")
print("predicted age using your neural network and Keras-estimated parameters: ", round(yout_k[len(NpL_Keras)-1][0,0], 2), "yr")