# <span style="color:#0b486b">  FIT5215: Deep Learning (2023)</span>
***
*CE/Lecturer (Clayton):*  **Dr Trung Le** | trunglm@monash.edu <br/>
*Lecturer (Malaysia):*  **Dr Lim Chern Hong** | lim.chernhong@monash.edu <br/>  <br/>
*Tutor:*  **Mr Tuan Nguyen**  \[tuan.ng@monash.edu \] | **Dr Binh Nguyen** \[binh.nguyen1@monash.edu \] | **Dr Qiuhong Ke** \[Qiuhong.Ke@monash.edu  \] 
<br/> <br/>
Faculty of Information Technology, Monash University, Australia
***

# <span style="color:#0b486b">Tutorial 2: Feed-forward Neural Nets with TensorFlow 2.x</span>
**The purpose of this tutorial is to demonstrate how to work with TensorFlow, an open-source software library for developing deep neural network applications. In this tutorial, we will focus on**:  
- ***Inspecting the common pipeline of deep learning*.**
- ***Implementing a feedforward neural net for a multi-class classification problem using TF 2.x*.**

***

### <span style="color:#0b486b"> II.1 Feedforward Neural Network </span> <span style="color:red">***** (highly important)</span>
#### <span style="color:#0b486b"> Tutorial objective </span>

In this tutorial we consider a fairly realistic deep NN with *three* hidden layers plus the *output* layer. Its architecture is specified as: $16 \rightarrow 10 (ReLU) \rightarrow 20 (ReLU) \rightarrow 15 (ReLU) \rightarrow 26$, meaning that:
- Input size is 16
- First layer has 10 hidden units with ReLU activation function
- Second layer has 20 hidden units with ReLU activation function
- Third layer has 15 hidden units with ReLU activation function
- And the output layer is a logit layer with 26 output units

This network, for example, can take the `letter` dataset input with $16$ features and with $26$ classes (A-Z). **Our objective in this tutorial is to implement this specific network in `TensorFlow 2.x`.**

#### <span style="color:#0b486b">Specifying the Neural Network Architecture </span>

We can visualize this network as in the figure below. Note that for readability, the number of hidden units drawn in the figure might not exactly match the actual layer sizes.

<img src="./images/DNN_Pipeline.PNG" width="1000">

Furthermore, the above figure shows the pipeline of the entire process for feeding a mini-batch of batch size $32$ into the network. Using ***mini-batches*** is a common way to train deep NNs in practice.

Let us denote the mini-batch by $X_b= \{(x_1, y_1),\dots, (x_{32}, y_{32})\}$. The features of the mini-batch can be stored in a 2D tensor of shape $(32, 16)$, which we also denote by $X_b$. Assume that in this network we use the activation function $ReLU$, where $ReLU(t)= \max\{0, t\}$. The computation in the forward propagation step is as follows:
- Input $X_b$ with mini-batch size of 32
- $h_1= ReLU(X_b \times W^1 + b^1)\in \mathbb{R}^{32 \times 10}$. 
- $h_2= ReLU(h_1 \times W^2 + b^2)\in \mathbb{R}^{32 \times 20}$. 
- $h_3= ReLU(h_2 \times W^3 + b^3)\in \mathbb{R}^{32 \times 15}$. 
- $logits= h_3 \times W^4 + b^4 \in \mathbb{R}^{32 \times 26}$
- $p = softmax(logits) \in \mathbb{R}^{32 \times 26}$ <br>

where we note that the activation function is applied element-wise and the softmax function is applied to each row, transforming a vector of scalars into a discrete distribution: 

$$softmax(z)=\big[\frac{\exp(z_i)}{\sum_{j=1}^{26}{\exp(z_j)}}\big]_{i=1}^{26}$$

The $k$-th row $p_k$ of the matrix $p$ represents the predicted probability distribution over the classes $1,2,\dots,26$ for the data point $x_k$. In particular, we have:

$$p_{km}= p(y_k=m \mid x_k)  \text{ for }  m=1,2,\dots,26$$
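
The forward propagation above can be checked numerically. The following is a minimal NumPy sketch, assuming random weights and a random mini-batch in place of real data (the names `relu`, `softmax`, `forward`, `weights`, and `biases` are illustrative and not part of the tutorial code). It only confirms the tensor shapes and that each row of $p$ is a probability distribution.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: random numbers stand in for real data

def relu(t):
    # Element-wise max(0, t)
    return np.maximum(0.0, t)

def softmax(z):
    # Row-wise softmax; subtracting the row maximum avoids overflow in exp
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

sizes = [16, 10, 20, 15, 26]  # 16 -> 10 -> 20 -> 15 -> 26
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(X_b):
    h = X_b
    # Three hidden layers with ReLU
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    # Output layer produces logits, softmax turns them into probabilities
    logits = h @ weights[-1] + biases[-1]
    return logits, softmax(logits)

X_b = rng.normal(size=(32, 16))  # one mini-batch of 32 examples with 16 features
logits, p = forward(X_b)
print(logits.shape, p.shape)            # (32, 26) (32, 26)
print(np.allclose(p.sum(axis=1), 1.0))  # True
```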

**<span style="color:red"> Exercise 1</span>** : Explain why the dimension of $h_1$ is $32\times 10$. Similarly, please work out the dimensions of $h_2$, $h_3$, $logits$ and $p$.

Answers omitted.

#### <span style="color:#0b486b">Specifying the Loss Function </span>
Essential to training a deep NN is the concept of the **loss function**. This function tells us how well the network is predicting, and hence we can use it to find the network weights that minimize the loss.

For classification tasks, a common approach is to use the **cross-entropy** loss function. Consider a data-label instance $(x_k,y_k)$ where the feature vector $x_k\in \mathbb{R}^{16}$ and the label $y_k \in \{1,2,...,26\}$ is a numeric label (for example, if $x_k$ is in class 2, then $y_k = 2$ and its one-hot vector is $1_{y_k}=[0,1,0,...,0]$). The cross-entropy between the classification distribution $p_k$ returned by the NN and the true label distribution $1_{y_k}$ is defined as:

$$
cross\_entropy(1_{y_k}, p_k)=-\sum_{j=1}^{26}y_{kj}\log{p_{kj}}=-\log{p_{k,y_k}}
$$
where $y_{kj}$ denotes the $j$-th entry of the one-hot vector $1_{y_k}$. Minimizing $cross\_entropy(1_{y_k}, p_k)$ pushes the model to assign as much probability as possible to the true label.

The above loss function is defined for a single instance. For the entire current mini-batch, our objective becomes: 
$$\min \sum_{k=1}^{32}cross\_entropy(1_{y_k}, p_k)$$

**<span style="color:red"> Exercise 2: </span>** **<span style="color:#0b486b">In the cross-entropy equation above, $y_k$ is the class of $x_k$. Explain why the end result is $-\log p_{k,y_k}$.</span>**

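The reason is that the one-hot vector has only one non-zero entry: $y_{k,y_k}=1$, while $y_{kj}=0$ for all $j \neq y_k$. Every term of the sum therefore vanishes except $-\log p_{k,y_k}$.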
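
The equivalence can also be verified numerically. Below is a small NumPy sketch on made-up predictions (the arrays `p` and `y` and the two helper functions are hypothetical, and class indices are 0-based in code whereas the text counts classes from 1). It compares the full one-hot sum against simply picking out the probability of the true class.

```python
import numpy as np

def cross_entropy_one_hot(one_hot, p):
    # Full definition: -sum_j y_kj * log(p_kj), one value per row
    return -np.sum(one_hot * np.log(p), axis=1)

def cross_entropy_index(y, p):
    # Shortcut: -log of the probability assigned to the true class
    return -np.log(p[np.arange(len(y)), y])

# Hypothetical predictions for 3 instances over 4 classes (each row sums to 1)
p = np.array([[0.70, 0.10, 0.10, 0.10],
              [0.20, 0.50, 0.20, 0.10],
              [0.25, 0.25, 0.25, 0.25]])
y = np.array([0, 1, 3])    # true classes (0-based)
one_hot = np.eye(4)[y]     # one row of the identity matrix per label

per_instance = cross_entropy_one_hot(one_hot, p)
print(np.allclose(per_instance, cross_entropy_index(y, p)))  # True
print(per_instance.sum())  # loss summed over this toy mini-batch
```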

**<span style="color:red"> Exercise 3: </span>** **<span style="color:#0b486b">Let $p=[0.1, 0.3, 0.6]$ and $q=[0.0, 0.5, 0.5]$ be two discrete distributions. What is $cross\_entropy(q,p)$?</span>**

In [None]:
import numpy as np
p = np.array([0.1, 0.3, 0.6])
q = np.array([0.0, 0.5, 0.5])
epsilon = 1e-6
ce = - np.sum(q * np.log(p + epsilon))
print(ce)

### <span style="color:#0b486b"> II.2 Implementation with TensorFlow 2.x</span> <span style="color:red">***** (highly important)</span>
We shall now implement the aforementioned network with the architecture $16 \rightarrow 10 (ReLU) \rightarrow 20 (ReLU) \rightarrow 15 (ReLU) \rightarrow 26$ in TensorFlow using the `letter` dataset. 

This letter dataset can be found at [the LIBSVM website](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#letter). Here is the dataset information:
-  *The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15*

A typical pipeline for implementing a deep learning model is as follows:

1. **Data processing**: 
    - Load the dataset and split into train, valid, and test sets.  
     
2. **Building the model**: 
    - Build the model using keras layers.
     
3. **Compiling the model**: 
    - Compile the model and specify the optimizer, the loss (e.g., cross-entropy loss) you want to optimize, and the metrics you want to measure. 
    
4. **Training and evaluating**:
    - Train the model on the training set for a number of epochs, monitoring it on the validation set.
    - Predict on the test set and assess its performance.

#### <span style="color:#0b486b">1. Data Processing </span>

We use `sklearn` to load the dataset.

In [None]:
import os
import numpy as np
from sklearn.datasets import load_svmlight_file

In [None]:
data_file_name= "letter_scale.libsvm"
data_file = os.path.abspath("./Data/" + data_file_name)
X_data, y_data = load_svmlight_file(data_file)
X_data= X_data.toarray()
y_data= y_data.reshape(y_data.shape[0],-1)
print("X data shape: {}".format(X_data.shape))
print("y data shape: {}".format(y_data.shape))
print("# classes: {}".format(len(np.unique(y_data))))
print(np.unique(y_data))

We use `sklearn` to split the dataset into the train, validation, and test sets. 


In [None]:
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

def train_valid_test_split(data, target, train_size, test_size):
    valid_size = 1 - (train_size + test_size)
    X1, X_test, y1, y_test = train_test_split(data, target, test_size = test_size, random_state= 33)
    X_train, X_valid, y_train, y_valid = train_test_split(X1, y1, test_size = float(valid_size)/(valid_size+ train_size), random_state= 33)  # fixed seed so the split is reproducible
    return X_train, X_valid, X_test, y_train, y_valid, y_test
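
The second call to `train_test_split` only sees the data left after the test set has been removed, which is why the validation fraction is rescaled by `valid_size + train_size`. A small sketch of that arithmetic follows, assuming 20,000 examples as quoted in the dataset description (the actual file may contain a different number). The helper `split_sizes` is hypothetical and only does the bookkeeping, so the exact rounding inside `sklearn` may differ by a sample.

```python
def split_sizes(n_samples, train_size, test_size):
    # Fraction of the whole dataset reserved for validation
    valid_size = 1 - (train_size + test_size)
    # First split: carve out the test set
    n_test = round(n_samples * test_size)
    n_rest = n_samples - n_test
    # Second split: validation fraction relative to the remaining data
    valid_fraction_of_rest = valid_size / (valid_size + train_size)
    n_valid = round(n_rest * valid_fraction_of_rest)
    n_train = n_rest - n_valid
    return n_train, n_valid, n_test

print(split_sizes(20000, train_size=0.8, test_size=0.1))  # (16000, 2000, 2000)
```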

Next, we would like to encode the labels as a numeric vector. For example, we want to turn $y\_data=["cat", "dog", "cat", "lion", "dog"]$ into $y\_data=[0,1,0,2,1]$.

To do this, in the following segment of code, we use the object `le`, an instance of the class `preprocessing.LabelEncoder()`, which lets us transform the categorical labels in `y_data` into a numeric vector.

In [None]:
le = preprocessing.LabelEncoder()
le.fit(y_data.ravel())
y_data= le.transform(y_data.ravel())
y_data = y_data.ravel()
print(y_data[:])
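
To make the encoding concrete, here is a NumPy-only sketch of the same idea on the toy labels from the text. It uses `np.unique` with `return_inverse=True`, which, like `LabelEncoder`, numbers the classes in sorted order. The variable names are illustrative.

```python
import numpy as np

labels = np.array(["cat", "dog", "cat", "lion", "dog"])

# classes: sorted unique labels; encoded: index of each label within classes
classes, encoded = np.unique(labels, return_inverse=True)
print(classes)   # ['cat' 'dog' 'lion']
print(encoded)   # [0 1 0 2 1]

# Indexing classes with the codes recovers the original labels
decoded = classes[encoded]
print(decoded)
```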

We now use the function defined above to prepare our data for training, validating and testing.

In [None]:
X_train, X_valid, X_test, y_train, y_valid, y_test = train_valid_test_split(X_data, y_data, 
                                                                            train_size=0.8, 
                                                                            test_size=0.1)
y_train= y_train.reshape(-1)
y_test= y_test.reshape(-1)
y_valid= y_valid.reshape(-1)
print(X_train.shape, X_valid.shape, X_test.shape)
print(y_train.shape, y_valid.shape, y_test.shape)
print("labels: {}".format(np.unique(y_train)))

In [None]:
train_size= int(X_train.shape[0])
n_features= int(X_train.shape[1])
n_classes= len(np.unique(y_train))

#### <span style="color:#0b486b">2. Build up the model </span>

We build up a feedforward neural network with the architecture: $16 \rightarrow 10 (ReLU) \rightarrow 20 (ReLU) \rightarrow 15 (ReLu) \rightarrow 26$ in TensorFlow 2.x.

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential

In [None]:
print(tf.__version__)

In [None]:
tf.random.set_seed(1234)

In [None]:
dnn_model = Sequential()
dnn_model.add(Dense(units=10,  input_shape=(16,), activation='relu'))
dnn_model.add(Dense(units=20, activation='relu'))
dnn_model.add(Dense(units=15, activation='relu'))
dnn_model.add(Dense(units=n_classes, activation='softmax'))

In [None]:
dnn_model.build()
dnn_model.summary()

In [None]:
dnn_model.layers

In [None]:
hidden1 = dnn_model.layers[0]
hidden1
print(hidden1.name)

In [None]:
weights, biases = hidden1.get_weights()

In [None]:
weights.shape

In [None]:
biases.shape
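
The shapes above also explain the parameter counts reported by `summary()`: a dense layer with `n_in` inputs and `n_out` units holds an `n_in x n_out` weight matrix plus `n_out` biases. A quick sketch of the count for this architecture (the helper `dense_param_count` is illustrative):

```python
layer_sizes = [16, 10, 20, 15, 26]  # 16 -> 10 -> 20 -> 15 -> 26

def dense_param_count(n_in, n_out):
    # Weight matrix entries plus one bias per unit
    return n_in * n_out + n_out

counts = [dense_param_count(m, n) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)       # [170, 220, 315, 416]
print(sum(counts))  # 1121
```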

#### <span style="color:#0b486b">3. Compiling Model </span>

In [None]:
dnn_model.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy', 
                  metrics=['accuracy'])

#### <span style="color:#0b486b">4. Training and Evaluating </span>

##### <span style="color:#0b486b"> Visualizing Training Progress </span>
In this example, we demonstrate two approaches to visualize training progress, using a History object and using TensorBoard.

**Using History object:** 
The `History` object is the output of the `fit()` method. It includes the training parameters (`history.params`), the list of epochs the training went through (`history.epoch`), and most importantly a dictionary (`history.history`) containing the loss and extra metrics measured at the end of each epoch on the training set and on the validation set (if any). Training needs to finish before we can visualize the history output. 

**Using TensorBoard:**
To visualize with TensorBoard, we first need to create a TensorBoard callback with a specific log directory. We then pass the callback to the `model.fit()` method. Unlike the previous approach, the callback writes log data to the log directory on the fly. Therefore, by opening TensorBoard in a separate browser window, we can train a model and watch the training progress at the same time.

In [None]:
from tensorflow import keras

logdir = "tf_logs/"

# Init a tensorboard_callback 
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)

# Call the fit method, passing the tensorboard_callback 
history = dnn_model.fit(x=X_train, y=y_train, batch_size=32, 
                        epochs=20, 
                        validation_data=(X_valid, y_valid), 
                        callbacks=[tensorboard_callback])

We now can evaluate the trained model on the testing set or any subset.

In [None]:
dnn_model.evaluate(X_test, y_test)  #return loss and accuracy

We now use the trained model to predict the $11$-th example (index 10) in the testing set.

In [None]:
X_new = np.reshape(X_test[10, :], (1,-1))
y_prob = dnn_model.predict(X_new)
y_prob.round(2)

In [None]:
y_pred = np.argmax(dnn_model.predict(X_new), axis=-1)
if y_pred[0]==y_test[10]:
    print("Correct prediction !")
else:
    print("Incorrect prediction !")
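
The same `argmax` step extends from a single example to a whole batch, which also gives the accuracy directly. A minimal NumPy sketch on made-up softmax outputs (the arrays `y_prob_demo` and `y_true_demo` are hypothetical and stand in for `dnn_model.predict(X_test)` and `y_test`):

```python
import numpy as np

# Hypothetical softmax outputs for 4 examples over 3 classes
y_prob_demo = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.2, 0.5, 0.3]])
y_true_demo = np.array([1, 0, 1, 1])

# Predicted class = index of the largest probability in each row
y_pred_demo = np.argmax(y_prob_demo, axis=-1)
accuracy = np.mean(y_pred_demo == y_true_demo)
print(y_pred_demo)  # [1 0 2 1]
print(accuracy)     # 0.75
```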

#### <span style="color:#0b486b">5. Visualizing the Performance and Loss Objective Function </span>

The `fit()` method returns a `History object` containing the training parameters (`history.params`), the list of epochs it went through (`history.epoch`), and most importantly a dictionary (`history.history`) containing the loss (`sparse_categorical_crossentropy`) and extra metrics (`accuracy`) as set when compiling model.
There are four keys in the history dictionary: `loss` and `val_loss` measure the loss on the training set and the validation set, respectively, while `accuracy` and `val_accuracy` measure the accuracy on the training set and the validation set.  
The following figure visualizes all four metrics with two y-axes: losses (blue lines, decreasing) and accuracies (red lines, increasing). 

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

his = history.history 
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot(111)
ln1 = ax.plot(his['loss'], 'b--',label='loss')
ln2 = ax.plot(his['val_loss'], 'b-',label='val_loss')
ax.set_ylabel('loss', color='blue')
ax.tick_params(axis='y', colors="blue")

ax2 = ax.twinx()
ln3 = ax2.plot(his['accuracy'], 'r--',label='accuracy')
ln4 = ax2.plot(his['val_accuracy'], 'r-',label='val_accuracy')
ax2.set_ylabel('accuracy', color='red')
ax2.tick_params(axis='y', colors="red")

lns = ln1 + ln2 + ln3 + ln4 
labels = [l.get_label() for l in lns]
ax.legend(lns, labels)
plt.grid(True)
plt.show()
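
Besides plotting, the history dictionary can be inspected directly, for example to find the epoch with the lowest validation loss. The sketch below uses made-up numbers in place of a real `history.history` (the dictionary `his_example` is hypothetical and is not output from this tutorial's model).

```python
import numpy as np

# Hypothetical values standing in for history.history
his_example = {
    "loss":         [1.90, 1.20, 0.90, 0.75, 0.66],
    "val_loss":     [1.50, 1.10, 0.95, 0.97, 1.02],
    "accuracy":     [0.45, 0.65, 0.73, 0.78, 0.81],
    "val_accuracy": [0.55, 0.68, 0.72, 0.71, 0.70],
}

# Index of the epoch with the smallest validation loss (0-based)
best_epoch = int(np.argmin(his_example["val_loss"]))
print("best epoch (1-based):", best_epoch + 1)                          # 3
print("val_accuracy there:", his_example["val_accuracy"][best_epoch])  # 0.72
```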

To visualize with TensorBoard inside the same Jupyter notebook, we first load the TensorBoard extension and then launch TensorBoard with the log directory. 

In [None]:
%load_ext tensorboard
%tensorboard --logdir tf_logs

In [None]:
from tensorboard import notebook
notebook.list() # View open TensorBoard instances

In [None]:
# Control TensorBoard display. If no port is provided, 
# the most recently launched TensorBoard is used
notebook.display(port=6006, height=1000)

#### <span style="color:#0b486b">6. Playing around with different optimizers</span>
In the following code, we try different optimizers to find the one with the best performance (evaluated on the validation set). 
This can be done easily by passing a specific optimizer when compiling the model. 


In [None]:
optimizer_names = ["Nadam", "Adam", "Adadelta", "Adagrad", "RMSprop", "SGD"]
optimizer_list = [keras.optimizers.Nadam(learning_rate=0.001), keras.optimizers.Adam(learning_rate=0.001), keras.optimizers.Adadelta(learning_rate=0.001), 
                  keras.optimizers.Adagrad(learning_rate=0.001), keras.optimizers.RMSprop(learning_rate=0.001), keras.optimizers.SGD(learning_rate=0.001)]
best_acc = 0
best_i = -1
for i in range(len(optimizer_list)):
    print("*Evaluating with {}\n".format(str(optimizer_names[i])))
    # Clone the architecture so that every optimizer starts from freshly initialized weights
    model = keras.models.clone_model(dnn_model)
    model.compile(optimizer=optimizer_list[i], loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x=X_train, y=y_train, batch_size=32, epochs=30, validation_data=(X_valid, y_valid), verbose=0)
    acc = model.evaluate(X_valid, y_valid)[1]
    print("The valid accuracy is {}\n".format(acc))
    if acc > best_acc:
        best_acc = acc
        best_i = i
print("The best valid accuracy is {} with {}".format(best_acc, optimizer_names[best_i]))
        

#### <span style="color:#0b486b">7. Fine-tuning the learning rate</span>
The learning rate plays an important role when training a deep learning model. In the following code, we will try a simple grid search over a few candidate values to find a good learning rate. 

In [None]:
lr = [1e-2, 5e-3, 1e-3, 1e-4, 1e-5]

best_acc = 0
best_i = -1
for i in range(len(lr)):
    print("*Evaluating with learning rate = {}\n".format(str(lr[i])))
    # Clone the architecture so that every learning rate starts from freshly initialized weights
    model = keras.models.clone_model(dnn_model)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr[i]), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x=X_train, y=y_train, batch_size=32, epochs=30, validation_data=(X_valid, y_valid), verbose=0)
    acc = model.evaluate(X_valid, y_valid)[1]
    print("The valid accuracy is {}\n".format(acc))
    if acc > best_acc:
        best_acc = acc
        best_i = i
print("The best valid accuracy is {} with learning rate {}".format(best_acc, lr[best_i]))
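
To build intuition for why the learning rate matters, here is a toy sketch that is independent of the network above: plain gradient descent on $f(w)=w^2$. It is an illustration under simplified assumptions (a one-dimensional quadratic, no mini-batches, no Adam), and the function `gradient_descent` is hypothetical. A rate that is too large makes the iterates diverge, while a rate that is too small barely moves them within the step budget.

```python
def gradient_descent(lr, w0=1.0, n_steps=20):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(n_steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - lr * grad     # gradient step
    return w

for lr in [1.1, 0.5, 0.1, 0.001]:
    print("lr = {:<6} final w = {:.4f}".format(lr, gradient_descent(lr)))
```

With `lr = 1.1` the magnitude of `w` grows at every step, with `lr = 0.5` the minimum is reached immediately, and with `lr = 0.001` the value is still close to the starting point after 20 steps.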

#### <span style="color:#0b486b">8. Save and Load Models</span>

There are different ways to save TensorFlow models depending on the API you're using. As we used a Keras model in this tutorial, saving and loading it is quite simple. It can be done by calling the `model.save()` and `load_model()` methods. 
When calling `model.save()`, the entire model will be saved, including: 
- The architecture, or configuration, which specifies what layers the model contains and how they're connected.
- A set of weights values (the "state of the model").
- An optimizer (defined by compiling the model).
- A set of losses and metrics (defined by compiling the model or calling add_loss() or add_metric()).

In [None]:
# Saving the entire model to a single HDF5 file
dnn_model.save('models/my_model.h5')

# Loading the model back 
from tensorflow import keras
loaded_model = keras.models.load_model('models/my_model.h5')

# Checking that the loaded model reproduces the original predictions
print(np.allclose(dnn_model.predict(X_new), loaded_model.predict(X_new)))

#### <span style="color:#0b486b">Save model during training</span>

One major disadvantage of the above saving method is that the model is saved only once training has finished. If training is stopped or interrupted, we have to retrain from scratch. To save the model during training, we can use the `ModelCheckpoint` callback, which saves the model continually, both during and at the end of training. 
Some important arguments of the `ModelCheckpoint` callback: 
- `filepath`: path where the checkpoint is saved 
- `save_weights_only`: if True, only the model's weights are saved (equivalent to `model.save_weights(filepath)`); otherwise the full model is saved (equivalent to `model.save(filepath)`, which saves the model's weights, architecture, optimizer, etc.)
- `save_best_only`: if True, a checkpoint is written only when the model is considered the "best" so far, so the latest best model according to the monitored quantity is not overwritten. The "best" model is determined by `mode` and `monitor`. For example, `monitor=val_accuracy` means that validation accuracy is used to pick the best checkpoint, and `mode` should be set to `max`. With `monitor=val_loss`, validation loss is used instead, and `mode` should be `min`. 

More details can be found at: 
https://www.tensorflow.org/tutorials/keras/save_and_load
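
The interplay of `monitor` and `mode` under `save_best_only` can be illustrated with a few lines of plain Python. This is only a sketch of the rule described above, not the Keras implementation. The function `best_checkpoint_epochs` and the metric values are hypothetical.

```python
def best_checkpoint_epochs(values, mode):
    """Return the 1-based epochs at which a 'save best only' rule would save."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    best = None
    saved = []
    for epoch, value in enumerate(values, start=1):
        # Save on the first epoch, then only when the monitored value improves
        improved = (best is None
                    or (mode == "max" and value > best)
                    or (mode == "min" and value < best))
        if improved:
            best = value
            saved.append(epoch)
    return saved

val_accuracy = [0.60, 0.72, 0.70, 0.75, 0.74]  # hypothetical monitored values
val_loss = [1.20, 0.90, 0.95, 0.80, 0.85]      # hypothetical monitored values
print(best_checkpoint_epochs(val_accuracy, mode="max"))  # [1, 2, 4]
print(best_checkpoint_epochs(val_loss, mode="min"))      # [1, 2, 4]
```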

In [None]:
# Create a tf.keras.callbacks.ModelCheckpoint callback that saves weights only during training
checkpoint_path = "./checkpoints/cp.ckpt"

# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)
# Save the weights using the `checkpoint_path` format
#dnn_model.save_weights(checkpoint_path.format(epoch=0))

# Train the model with the new callback
dnn_model.fit(x=X_train, y=y_train, batch_size=32, 
              epochs=20, 
              validation_data=(X_valid, y_valid), 
              callbacks=[tensorboard_callback,  # Callback for writing logs
                         cp_callback])          # Callback for saving the model


#### <span style="color:#0b486b">Load the saved weights</span>
If we saved only the model's weights, we need to recreate the same architecture before loading the weights. 

In [None]:
# Because we already created the model, we only need to load the weights
# dnn_model = create_model() # skip this step
dnn_model.load_weights(checkpoint_path)

If we saved the entire model (by setting `save_weights_only=False`), then the trained model can be reloaded with the `load_model` method.

### <span style="color:#0b486b"> II.3 Two Approaches to Build Up Models with TensorFlow 2.x</span> <span style="color:red">*** (moderately important)</span>

There are two approaches to build up a model with TensorFlow 2.x: a simple method using the **Sequential API** and a more flexible method using the **Functional API**.  

#### <span style="color:#0b486b"> Approach 1: Using `Sequential API`</span>

In [None]:
dnn_model = Sequential()
dnn_model.add(Dense(units=10,  input_shape=(16,), activation='relu'))
dnn_model.add(Dense(units=20, activation='relu'))
dnn_model.add(Dense(units=15, activation='relu'))
dnn_model.add(Dense(units=26, activation='softmax'))
dnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

#### <span style="color:#0b486b"> Approach 2: Using `Functional API`</span>

In [None]:
X = tf.keras.layers.Input(shape=(16,)) # declare input layer
h = Dense(units=10, activation= 'relu')(X)
h = Dense(units=20, activation= 'relu')(h)
h = Dense(units=15, activation= 'relu')(h)
h = Dense(units=26, activation= 'softmax')(h)
dnn_model = tf.keras.Model(inputs= X, outputs=h)
dnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

We can also declare a class that inherits from `tf.keras.Model`.

In [None]:
class MyDNN(tf.keras.Model):
    def __init__(self, n_classes= 26):
        super(MyDNN, self).__init__()
        self.n_classes = n_classes
        self.dense1 = tf.keras.layers.Dense(units=10, activation= 'relu')
        self.dense2 = tf.keras.layers.Dense(units=20, activation= 'relu')
        self.dense3 = tf.keras.layers.Dense(units=15, activation= 'relu')
        self.dense4 = tf.keras.layers.Dense(units=self.n_classes, activation= 'softmax')
    
    def call(self,X): #X is the input, method call specifies how to compute the output from the input X
        h = self.dense1(X)
        h = self.dense2(h)
        h = self.dense3(h)
        h = self.dense4(h)
        return h
dnn_model = MyDNN(n_classes= 26)
dnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### <span style="color:#0b486b"> II.4 Other approaches to Train a Model with TensorFlow 2.x</span> <span style="color:red">*** (moderately important)</span>

There are two main approaches to training a model with TensorFlow 2.x. The simplest is the `fit` method, as we did before. This method automatically handles the data during training (e.g., splitting the dataset into mini-batches), applies callbacks such as saving the model or writing TensorBoard logs, and monitors validation performance. 

However, some projects require more hands-on control of the training process (for example, doing data augmentation in self-supervised learning, or training a generative model, which we will learn about later in this unit). In this case, we need to be able to train a model manually. In TensorFlow 2.x, we can do that with the `train_on_batch` method. Basically, we need to *(1) manually split the entire dataset into mini-batches and apply data augmentation (if any)* and *(2) feed each mini-batch to the `train_on_batch` method*. Each call performs one gradient update on the model in place and returns the training loss on that batch (together with the values of any metrics specified when compiling the model). 

The following code is a simple example (without any data-augmentation). 

In [None]:
n_epochs =20
batch_size = 64
for epoch in range(n_epochs):
    for idx_start in range(0, X_train.shape[0], batch_size):
        idx_end = min(X_train.shape[0], idx_start + batch_size)
        X_batch, y_batch = X_train[idx_start:idx_end], y_train[idx_start:idx_end]
        train_loss_batch = dnn_model.train_on_batch(X_batch, y_batch)  # returns [loss, accuracy] for this batch (metrics were set in compile)
        
    train_loss, train_acc = dnn_model.evaluate(x= X_train, y= y_train, batch_size= 64, verbose= 0)
    valid_loss, valid_acc = dnn_model.evaluate(x= X_valid, y= y_valid, batch_size= 64, verbose= 0)
    print('Epoch {}: train acc={:.4f}, train loss={:.4f} | valid acc={:.4f}, valid loss= {:.4f}'.format(epoch +1, 
                                                                                                        train_acc, 
                                                                                                        train_loss, 
                                                                                                        valid_acc, 
                                                                                                        valid_loss))
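
Note that the loop above visits the mini-batches in the same order in every epoch, whereas `fit` shuffles the training data by default. A minimal sketch of a shuffling mini-batch iterator in NumPy follows. The generator `iterate_minibatches` and the toy arrays are hypothetical and not part of the tutorial code.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    # Draw a new random order of the examples, then slice it into mini-batches
    indices = rng.permutation(X.shape[0])
    for idx_start in range(0, X.shape[0], batch_size):
        batch_idx = indices[idx_start:idx_start + batch_size]
        yield X[batch_idx], y[batch_idx]

rng = np.random.default_rng(0)
X_demo = np.arange(20, dtype=float).reshape(10, 2)  # toy data: 10 examples, 2 features
y_demo = np.arange(10)                              # toy labels

batch_sizes = [len(y_b) for _, y_b in iterate_minibatches(X_demo, y_demo, batch_size=4, rng=rng)]
print(batch_sizes)  # [4, 4, 2]
```

Inside the epoch loop, the inner `for` could then iterate over `iterate_minibatches(X_train, y_train, batch_size, rng)` and pass each pair to `train_on_batch`.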

### <span style="color:#0b486b"> Additional Exercises </span> 

1. Write your own code to save a trained model to the hard disk and restore this model, then use the restored model to output the prediction result on the test set.

2. Insert new code into the above code to output to TensorBoard the values of `training loss`, `training accuracy`, `valid loss`, and `valid accuracy` at the end of each epoch. You can refer to the code [here](https://www.tensorflow.org/tensorboard/get_started).

3. Write code to do regression on the dataset `cadata` which can be downloaded [here](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html). Note that for a regression problem, you need to use the `L2` loss instead of the `cross-entropy` loss as in a classification problem. 

4. Solve the problem in this tutorial again, but with a much deeper network (i.e., $16 \rightarrow 100 (ReLU) \rightarrow 200 (ReLU) \rightarrow 200 (ReLU) \rightarrow 100 (ReLU) \rightarrow 26$). Apply callbacks to save the model during training and to write TensorBoard logs. Visualize `training loss`, `training accuracy`, `valid loss`, and `valid accuracy`. Describe what you observe and explain any issue that arises (hint: overfitting). 

5. Build up a more complex feedforward neural network with the `Functional API`, as shown in the figure below. The network splits into two branches, which are then merged before the last layer. The concatenation is along the last dimension (for example, two arrays of shape [10,15] and [10,15] are concatenated into an array of shape [10,30]).

<img src="./images/feed-forward-2branches.PNG" width="1000">
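
The concatenation along the last dimension mentioned in exercise 5 can be tried out in NumPy before building the network. In this small sketch, the arrays `branch1` and `branch2` are placeholders for the outputs of the two branches.

```python
import numpy as np

branch1 = np.zeros((10, 15))  # placeholder output of the first branch
branch2 = np.ones((10, 15))   # placeholder output of the second branch

# axis=-1 joins the feature dimensions and keeps the batch dimension
merged = np.concatenate([branch1, branch2], axis=-1)
print(merged.shape)  # (10, 30)
```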

### <span style="color:#FFA500"> Solution for exercise 1</span>

A Keras model can be saved and restored via two functions `save()` and `load_model()`.

In [None]:
from tensorflow.keras import models 
checkpoint_path = "ckpt/tf2"

dnn_model.save(checkpoint_path)
reconstructed_model = models.load_model(checkpoint_path)

print("Model is saved to: {}".format(checkpoint_path))

We can now evaluate the reconstructed model on the test set.

In [None]:
reconstructed_model.evaluate(X_test, y_test)

### <span style="color:#FFA500"> Solution for exercise 2</span>

Firstly, we define a deep learning model. 

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Dense
X = tf.keras.layers.Input(shape=(16,)) # declare input layer
h = Dense(units=10, activation= 'relu')(X)
h = Dense(units=20, activation= 'relu')(h)
h = Dense(units=15, activation= 'relu')(h)
h = Dense(units=26, activation= 'softmax')(h)
dnn_model = tf.keras.Model(inputs= X, outputs=h)

Next, let us combine the defined model with an optimizer, loss, and metrics.
Adding a `tf.keras.callbacks.TensorBoard` callback will enable outputting the values of `training loss`, `training accuracy`, `valid loss`, and `valid accuracy` at the end of epochs.

In [None]:
import datetime
dnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
log_dir = "logs/tf2/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")  # declare the directory to save logs.
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
dnn_model.fit(x=X_train, y=y_train, batch_size=32, epochs=100, validation_data=(X_valid, y_valid), callbacks=[tensorboard_callback])

- Open a command line, navigate to the folder of this tutorial and run **> tensorboard --logdir "logs/tf2"**

### <span style="color:#FFA500"> Solution for exercise 3</span>

In [None]:
# We load and process the dataset
data_file_name= "cadata.libsvm"
data_file = os.path.abspath("./Data/" + data_file_name)
X_data, y_data = load_svmlight_file(data_file)
X_data= X_data.toarray()
y_data= y_data.reshape(y_data.shape[0],-1)
print("X data shape: {}".format(X_data.shape))
print("y data shape: {}".format(y_data.shape))
print("x-min={}, x-max={}".format(np.min(X_data), np.max(X_data)))
print("We need to scale the features of this data into [-1,1]")

In [None]:
# We scale the features of this data into [-1,1]
from sklearn.preprocessing import MinMaxScaler
X_data= MinMaxScaler(feature_range= (-1,1)).fit_transform(X_data)
print("x-min={}, x-max={}".format(np.min(X_data), np.max(X_data)))

In [None]:
print("Before scaling: y-min ={}, y-max ={}".format(np.min(y_data), np.max(y_data)))
y_data= MinMaxScaler(feature_range= (-1,1)).fit_transform(y_data)
print("After scaling: y-min ={}, y-max ={}".format(np.min(y_data), np.max(y_data)))
print("Next step is to split the dataset into train (80%), valid (10%), and test (10%)")

In [None]:
# We split train, valid and test data
X_train, X_valid, X_test, y_train, y_valid, y_test = train_valid_test_split(X_data, y_data, train_size=0.8, test_size=0.1)
y_train= y_train.reshape(-1)
y_test= y_test.reshape(-1)
y_valid= y_valid.reshape(-1)
print(X_train.shape, X_valid.shape, X_test.shape)
print(y_train.shape, y_valid.shape, y_test.shape)
print("Three sets are ready! Next step is to build up a deep neural network.")

In [None]:
regression_model = Sequential()
regression_model.add(Dense(units=1)) # output has only one neuron
regression_model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.0001), loss='mean_squared_error')
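
The `mean_squared_error` loss used here is the `L2` loss mentioned in the exercise, and a single `Dense(units=1)` layer without activation computes a linear function of the input. The NumPy sketch below shows both pieces on toy numbers. The data, the weights `w` and `b`, and the helper `mean_squared_error` are made up for illustration and are unrelated to `cadata`.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences over the batch
    return np.mean((y_true - y_pred) ** 2)

# Toy data: 3 examples with 2 features, and hypothetical model parameters
X_toy = np.array([[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]])
y_toy = np.array([0.0, -0.4, 1.0])
w = np.array([0.5, -0.5])
b = 0.1

y_hat = X_toy @ w + b  # what a single linear output unit computes
print(y_hat)
print(mean_squared_error(y_toy, y_hat))
```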

In [None]:
history = regression_model.fit(x=X_train, y= y_train, batch_size= 32, epochs= 50, validation_data = (X_valid, y_valid))

In [None]:
# plot losses during training
from matplotlib import pyplot
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='validation')
pyplot.legend()
pyplot.show()

We now can evaluate our regression model on the test set.

In [None]:
regression_model.evaluate(X_test, y_test) 

### <span style="color:#FFA500"> Solution for exercise 4</span>

4. Solve the problem in this tutorial again, but with a much deeper network (i.e., $16 \rightarrow 100 (ReLU) \rightarrow 200 (ReLU) \rightarrow 200 (ReLU) \rightarrow 100 (ReLU) \rightarrow 26$). Apply callbacks to save the model during training and to write TensorBoard logs. Visualize `training loss`, `training accuracy`, `valid loss`, and `valid accuracy`. Describe what you observe and explain any issue that arises (hint: overfitting). 

In [None]:
# Build a new model with many more parameters 
X = tf.keras.layers.Input(shape=(16,)) #declare input layer
h = Dense(units=100, activation= 'relu')(X)
h = Dense(units=200, activation= 'relu')(h)
h = Dense(units=200, activation= 'relu')(h)
h = Dense(units=100, activation= 'relu')(h)
h = Dense(units=26, activation= 'softmax')(h)
dnn_model = tf.keras.Model(inputs= X, outputs=h)
dnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [None]:
from tensorflow import keras
# Declare a callback for writing a TensorBoard 
logdir = "tf_logs/"

# Init a tensorboard_callback 
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)

In [None]:
# Create a tf.keras.callbacks.ModelCheckpoint callback that saves weights only during training
checkpoint_path = "models/cp_deeper.ckpt"

# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

# Train the model with the new callback
history = dnn_model.fit(x=X_train, y=y_train, batch_size=32, 
                        epochs=20, 
                        validation_data=(X_valid, y_valid), 
                        callbacks=[tensorboard_callback,  # Callback for writing logs
                                   cp_callback])          # Callback for saving the model

We now can evaluate the trained model on the testing set or any subset.

In [None]:
dnn_model.evaluate(X_test, y_test)  #return loss and accuracy

In [None]:
# # Load the TensorBoard notebook extension.
# %load_ext tensorboard
# %tensorboard --logdir tf_logs/

In [None]:
import pandas as pd
import matplotlib.pyplot as plt


his = history.history 
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot(111)
ln1 = ax.plot(his['loss'], 'b--',label='loss')
ln2 = ax.plot(his['val_loss'], 'b-',label='val_loss')
ax.set_ylabel('loss', color='blue')
ax.tick_params(axis='y', colors="blue")

ax2 = ax.twinx()
ln3 = ax2.plot(his['accuracy'], 'r--',label='accuracy')
ln4 = ax2.plot(his['val_accuracy'], 'r-',label='val_accuracy')
ax2.set_ylabel('accuracy', color='red')
ax2.tick_params(axis='y', colors="red")


lns = ln1 + ln2 + ln3 + ln4 
labels = [l.get_label() for l in lns]
ax.legend(lns, labels)
plt.grid(True)
plt.show()

Some observations: 

(1) The training loss keeps decreasing over time, while the validation loss seems to saturate after the 7th epoch. In fact, the validation loss even increases a bit in the 8th epoch. 

(2) Analogously, while the training accuracy keeps increasing over time, the validation accuracy saturates after the 7th epoch and drops in the 8th and 16th epochs. 

This growing gap between training and validation performance suggests that the larger network is starting to overfit the training set.

### <span style="color:#FFA500"> Solution for exercise 5</span>

<img src="./images/feed-forward-2branches.PNG" width="1000">

In [None]:
# Recall the model architecture as figure above 
# Implement using Functional API 
X = tf.keras.layers.Input(shape=(16,)) #declare input layer
h = Dense(units=10, activation= 'relu')(X)

# First branch 
h1 = Dense(units=20, activation= 'relu')(h)
h1 = Dense(units=15, activation= 'relu')(h1)

# Second branch 
h2 = Dense(units=20, activation= 'relu')(h)
h2 = Dense(units=15, activation= 'relu')(h2)

# Concatenate along the last dimension (a Keras layer keeps the graph inside the Functional API)
h = tf.keras.layers.Concatenate(axis=-1)([h1, h2])

# Last layer 
h = Dense(units=26, activation= 'softmax')(h)
dnn_model = tf.keras.Model(inputs= X, outputs=h)
dnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

In [None]:
dnn_model.summary()

---
### <span style="color:#0b486b"> <div  style="text-align:center">**THE END**</div> </span>