# DSCI 619 Deep Learning
# Week 2: Perform Regressions Using Deep Learning



**Objectives**

After completing this module, students will be able to:

+ Perform feature engineering and convert categorical variables to numerical variables
+ Check the model fitting by plotting training loss and validation loss
+ Address overfitting by using dropout in the networks
+ Perform hyperparameter tuning for neural networks


Linear regression is a classical method in machine learning and deep learning. First, we cover data cleaning and feature engineering for neural networks. Next, we cover the underfitting and overfitting of deep learning models. Third, we show how to address overfitting by using dropout in neural networks. Finally, we learn how to perform hyperparameter tuning to improve model performance.

**Readings**

+ Basic regression: Predict fuel efficiency (https://www.tensorflow.org/tutorials/keras/regression)
+ Dropout: A Simple Way to Prevent Neural Networks from Overfitting (https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)
+ Introduction to the Keras Tuner (https://www.tensorflow.org/tutorials/keras/keras_tuner)
+ Linear Neural Networks (https://d2l.ai/chapter_linear-networks/index.html)



# Project of Seoul Bike Sharing Demand Using TensorFlow

Let’s look at a real-world regression project using TensorFlow. Building this regression model introduces several new topics in deep learning.

Our dataset is taken from the UCI Machine Learning Repository (at https://archive.ics.uci.edu/ml/datasets/Seoul+Bike+Sharing+Demand).

The dataset, contained in SeoulBikeData.csv, has the following features/predictors:

+ **Date** - day/month/year (for example, 13/12/2017)
+ **Hour** - Hour of the day
+ **Temperature** - Celsius
+ **Humidity** - %
+ **Windspeed** - m/s
+ **Visibility** - 10m
+ **Dew point temperature** - Celsius
+ **Solar radiation** - MJ/m2
+ **Rainfall** - mm
+ **Snowfall** - cm
+ **Seasons** - Winter, Spring, Summer, Autumn
+ **Holiday** - Holiday/No Holiday
+ **Functioning Day** - Yes (functional hours), No (non-functional hours)

The corresponding label/target is:

+ **Rented Bike count** - Count of bikes rented at each hour

## Load and Clean the Data

First, we try to load the data into memory using pandas with its default settings.

In [1]:
import pandas as pd
# df = pd.read_csv('SeoulBikeData.csv', sep = ',')


Running the commented-out line above produces the following error message.

<font color='red'>UnicodeDecodeError </font>: 'utf-8' codec can't decode byte 0xb0 in position 12: invalid start byte

By default, pandas decodes the file as 'utf-8'. The file contains a byte (0xb0) that is not valid in 'utf-8', which means it was saved with a different encoding. To fix this, we can try different encodings. Python supports about one hundred of them. Please see the complete list of encodings at https://docs.python.org/3/library/codecs.html#standard-encodings.

Unfortunately, there is no reliable way to determine the encoding from the file alone, so we usually need to guess it. A good first guess is encoding = "ISO-8859-1", which covers Western European languages. If the decoded text looks wrong, we should try another encoding from the supported list.
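
The trial-and-error search described above can be sketched as a small helper. This is a minimal illustration, not part of pandas: the function name `first_working_encoding` and the candidate list are assumptions made for the example. Note that ISO-8859-1 maps every possible byte to a character, so it never raises an error; a successful decode only means the bytes are readable, and the resulting text should still be inspected.

```python
def first_working_encoding(raw, candidates=("utf-8", "ISO-8859-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # this guess failed, move on to the next one
        return encoding
    return None  # none of the candidates worked


# The degree sign in a header such as Temperature(°C) is the single byte
# 0xb0 in ISO-8859-1, which is not a valid start byte in UTF-8.
header_bytes = "Temperature(°C)".encode("ISO-8859-1")
chosen = first_working_encoding(header_bytes)
decoded_header = header_bytes.decode(chosen)
print(chosen, decoded_header)
```

The same idea applies to the CSV file: read a chunk of it in binary mode and pass the bytes to the helper before calling pd.read_csv with the chosen encoding.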
    
    

In [2]:
import pandas as pd
df = pd.read_csv('SeoulBikeData.csv', encoding = "ISO-8859-1",sep = ',')
df.head()

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons,Holiday,Functioning Day
0,1/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,1/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,1/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,1/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,1/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes


It is always a good idea to check the data type for all columns.

In [3]:
df.dtypes

Date                          object
Rented Bike Count              int64
Hour                           int64
Temperature(°C)              float64
Humidity(%)                    int64
Wind speed (m/s)             float64
Visibility (10m)               int64
Dew point temperature(°C)    float64
Solar Radiation (MJ/m2)      float64
Rainfall(mm)                 float64
Snowfall (cm)                float64
Seasons                       object
Holiday                       object
Functioning Day               object
dtype: object

Next, let's check for missing values in the dataset.

In [4]:
df.isnull().sum(axis = 0)

Date                         0
Rented Bike Count            0
Hour                         0
Temperature(°C)              0
Humidity(%)                  0
Wind speed (m/s)             0
Visibility (10m)             0
Dew point temperature(°C)    0
Solar Radiation (MJ/m2)      0
Rainfall(mm)                 0
Snowfall (cm)                0
Seasons                      0
Holiday                      0
Functioning Day              0
dtype: int64

There are no missing values in any column. However, several features are categorical variables. Let's look at how many unique values each of them has.

In [5]:
catFeatures = ['Seasons', 'Holiday', 'Functioning Day']
df[catFeatures].describe(include='all').loc['unique', :]

Seasons            4
Holiday            2
Functioning Day    2
Name: unique, dtype: object

## Convert Categorical Features to Numerical Features 

We notice that the following features/columns are not numerical variables (float64 or int64):
+ Date
+ Seasons
+ Holiday
+ Functioning Day 

They all have the object data type. Neural networks can only handle numerical data, so we need to convert them to numerical types such as int64 or float64.

Seasons, Holiday and Functioning Day are categorical variables with only finitely many values. Let's convert them to numerical dummy variables. We use the get_dummies function and specify drop_first = True to **remove the redundant feature**: for a variable with k categories, k-1 dummy columns are enough, because the dropped category is implied when all the other columns are False.

Let's create the dummy variables for all the categorical features/variables.

In [6]:
catFeatures = ['Seasons', 'Holiday','Functioning Day']
factors = pd.get_dummies(df[catFeatures],drop_first=True)
factors.head()

Unnamed: 0,Seasons_Spring,Seasons_Summer,Seasons_Winter,Holiday_No Holiday,Functioning Day_Yes
0,False,False,True,True,True
1,False,False,True,True,True
2,False,False,True,True,True
3,False,False,True,True,True
4,False,False,True,True,True
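
To see why dropping the first dummy column loses no information, here is a small sketch on a toy column. The toy DataFrame is made up for illustration; only the four season labels come from the dataset.

```python
import pandas as pd

# Toy column with the four season labels used in the dataset
toy = pd.DataFrame({"Seasons": ["Winter", "Spring", "Summer", "Autumn"]})

full = pd.get_dummies(toy, dtype=int)                      # one column per category
reduced = pd.get_dummies(toy, drop_first=True, dtype=int)  # first category dropped

# Every row of the full encoding sums to 1, so one column is redundant
row_sums = full.sum(axis=1).tolist()
dropped = sorted(set(full.columns) - set(reduced.columns))

# The dropped category is still identifiable: its row is all zeros
autumn_row = reduced.iloc[3].tolist()
print(dropped, row_sums, autumn_row)
```

The dropped column is the alphabetically first category, which is why there is no Seasons_Autumn column in the output above.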


Next, we drop the original categorical variables, then concatenate the numerical features and dummy variables.


In [7]:
df = df.drop(catFeatures,axis=1)
df = pd.concat([df,factors],axis=1)
df.head()

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons_Spring,Seasons_Summer,Seasons_Winter,Holiday_No Holiday,Functioning Day_Yes
0,1/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,False,False,True,True,True
1,1/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,False,False,True,True,True
2,1/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,False,False,True,True,True
3,1/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,False,False,True,True,True
4,1/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,False,False,True,True,True


## Perform Feature Engineering on Date Feature

The Date column is not a categorical variable. It holds dates stored as strings, so we need to convert it to the datetime type in pandas. The dates are written day first (for example, 13/12/2017), so we pass the format explicitly. If we let pandas infer the format instead, it guesses month/day/year from the first rows and raises a ValueError at the first date whose day is greater than 12.

In [8]:

#The dates are day/month/year, so state the format explicitly
df['Date'] =  pd.to_datetime(df['Date'], format='%d/%m/%Y')
df.head()

Let's confirm that the Date column now has a datetime64 data type instead of object.

In [9]:
df.dtypes

There may exist some seasonality in the data. Therefore, we want to extract the year and the month from the datetime feature.


In [10]:
#extract the year
df['year'] = df['Date'].dt.year
#extract the month
df['month'] = df['Date'].dt.month


Bike rentals may differ between weekdays and weekends. Let's extract the day of the week from the Date feature.

In [None]:

df['dayofweek']=df['Date'].dt.dayofweek
df.head()
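
The date features can be checked on a couple of sample strings. This is a minimal sketch that assumes the day/month/year layout seen in the data; the `is_weekend` flag is a hypothetical extra feature and is not used in the rest of this notebook.

```python
import pandas as pd

# Two sample strings in the dataset's day/month/year layout
dates = pd.Series(["1/12/2017", "13/12/2017"])
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

features = pd.DataFrame({
    "year": parsed.dt.year,
    "month": parsed.dt.month,
    "dayofweek": parsed.dt.dayofweek,  # Monday=0, ..., Sunday=6
})
# Hypothetical extra flag: 1 for Saturday/Sunday, 0 otherwise
features["is_weekend"] = (features["dayofweek"] >= 5).astype(int)
print(features)
```

Here 1 December 2017 is a Friday (dayofweek 4) and 13 December 2017 is a Wednesday (dayofweek 2), so neither row is flagged as a weekend.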

Next, we drop the Date column since we have already extracted its year, month, and day of the week.

In [None]:
df = df.drop('Date',axis=1)
df.columns

# Numerically Summarize the Data

Let's numerically summarize continuous variables in the dataset.

In [None]:
import numpy as np
#specify the continuous features
numerics =['Rented Bike Count', 'Hour', 'Temperature(°C)', 'Humidity(%)',
       'Wind speed (m/s)', 'Visibility (10m)', 'Dew point temperature(°C)',
       'Solar Radiation (MJ/m2)', 'Rainfall(mm)', 'Snowfall (cm)', 'dayofweek'
       ]
#summarize it
np.round(df[numerics].describe(), decimals=2)


Let's look at the correlation between all these numerical features.

In [None]:
np.round(df[numerics].corr(), decimals=2)

# Graphically Summarize the Data
Let's summarize the numerical features graphically.

In [None]:
import seaborn as sns
#Disable all warnings in Jupyter notebook
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')

#Pair plot continuous features
sns.pairplot(df[['Rented Bike Count', 'Temperature(°C)', 'Humidity(%)',
       'Wind speed (m/s)', 'Visibility (10m)', 'Dew point temperature(°C)',
       'Solar Radiation (MJ/m2)', 'Rainfall(mm)', 'Snowfall (cm)']], diag_kind='kde')

Let's draw a box plot of the rental count against the hour.

In [None]:

sns.boxplot(x='Hour',y='Rented Bike Count',data=df)

Let's draw a box plot of the rental count against the month.

In [None]:
sns.boxplot(x='month',y='Rented Bike Count',data=df)

Let's draw a box plot of the rental count against the day of the week.

In [None]:
sns.boxplot(x='dayofweek',y='Rented Bike Count',data=df)

## Split the Data Into Training and Test Data Set

First, we combine all features into $X$ and select the label column as $y$. Then we hold out 20% of the rows as test data.

In [None]:
#Obtain features and label
X = df.drop('Rented Bike Count',axis=1)
y = df['Rented Bike Count']

#Split the data into training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state = 2021)

## Data Normalization for the Neural Networks

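It is good practice to normalize all features before training a neural network. Features on very different scales (for example, Visibility in the thousands and Rainfall near zero) make gradient-based optimization harder. We fit the scaler on the training data only and reuse it on the test data, so that no information from the test set leaks into training.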

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
#Fit and transform the training data
X_train= scaler.fit_transform(X_train)
#Only transform the test data
X_test = scaler.transform(X_test)
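
The computation behind min-max scaling is short enough to write by hand. The sketch below uses made-up values for a single feature and mirrors the fit-on-train, transform-on-test pattern: the minimum and maximum come from the training data only.

```python
import numpy as np

# Made-up training and test values for a single feature
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[15.0], [40.0]])

# Learn the range from the training data only
col_min = train.min(axis=0)
col_max = train.max(axis=0)

# Apply the same shift and scale to both sets
train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)
print(train_scaled.ravel(), test_scaled.ravel())
```

The training values land exactly in [0, 1]. A test value outside the training range, such as 40 here, is mapped outside that interval (to 1.5), which is expected and not a bug.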

## Create the Model

Let's use the Sequential model for this regression. A Sequential model is a plain stack of layers in which each layer feeds the next one, which makes it one of the simplest deep learning models to build.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

We specify the neural network architecture.
Let's build a neural network with the following layers:
+ Input layer with 100 neurons
+ First hidden layer with 50 neurons
+ Second hidden layer with 25 neurons
+ Output layer with one neuron since it is a regression problem

In [None]:
model = keras.Sequential()
# Input layer has 100 neurons
model.add(layers.Dense(100, activation='relu'))
# First hidden layer with 50 neurons
model.add(layers.Dense(50, activation='relu'))
# Second hidden layer with 25 neurons
model.add(layers.Dense(25, activation='relu'))
# Output layer has one and only one neuron
model.add(layers.Dense(1))
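
To get a feel for the size of this network, we can count its trainable parameters by hand. Each Dense layer has one weight per input-output pair plus one bias per neuron. This sketch assumes 17 input features, which is what the preprocessing above should produce; adjust `n_features` if your X has a different number of columns.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 17            # assumed number of columns in X after preprocessing
layer_sizes = [100, 50, 25, 1]

counts = []
n_inputs = n_features
for n_units in layer_sizes:
    counts.append(dense_params(n_inputs, n_units))
    n_inputs = n_units     # this layer's output feeds the next layer

total_params = sum(counts)
print(counts, total_params)
```

Under that assumption the four layers hold 1800, 5050, 1275 and 26 parameters, 8151 in total. Once the model has been built on real data, model.summary() reports the same kind of breakdown.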

# Configure the Model by Setting Optimizer and Loss Function

Next, we need to configure the model for training by choosing the optimization algorithm that finds the optimal weights and by specifying the loss/error function.

+ There are many optimization algorithms available in TensorFlow. We use Adam, one of the popular gradient-based algorithms.
+ **For the regression problem, we use the mean squared error as the loss/error function**.


In [None]:
#Configure the model by choosing optimizer and loss function
model.compile(optimizer='adam',loss='mse')

## Train the Model

We train the model by calling the fit method (see https://www.tensorflow.org/api_docs/python/tf/keras/Model?hl=ja) and specifying the following parameters:

+ x is the features
+ y is the label/target
+ batch_size is the number of samples per gradient update. If unspecified, batch_size will default to 32. 
+ epochs is the number of epochs to train the model. An epoch is an iteration over **the entire x and y data provided**. 
+ validation_data is the data on which to evaluate the loss and any model metrics at the end of each epoch. This is the validation/test error since **the model doesn't see this data when it is trained based on x and y**.
+ verbose: 0, 1, or 2. Verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch. Note that the progress bar is not particularly useful when logged to a file, so verbose=2 is recommended when not running interactively (e.g., in a production environment).
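
The relationship between batch_size, epochs and the number of gradient updates can be worked out with simple arithmetic. The sketch below assumes the dataset has 8760 rows (one per hour for one year); check df.shape for the actual number.

```python
import math

n_rows = 8760          # assumed: one row per hour for one year
test_percent = 20      # test_size = 0.2 in train_test_split
batch_size = 64
epochs = 100

n_test = math.ceil(n_rows * test_percent / 100)
n_train = n_rows - n_test

# One gradient update per batch; the last batch of an epoch may be smaller
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs
print(n_train, updates_per_epoch, total_updates)
```

With these numbers there are 7008 training rows, 110 updates per epoch, and 11000 updates over 100 epochs. A smaller batch_size means more, noisier updates per epoch.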

We run the magic command **%%time** in the cell. It must be the first line of the cell. It prints the wall time for the cell.

We save the training results in history, so we set verbose = 0 to avoid printing the training info.

In [None]:
%%time
#Fix the seed 
tf.random.set_seed(1)
#Fit the model and save the results in history
history = model.fit(x=X_train,y=y_train,batch_size=64,epochs=100,
          validation_data=(X_test,y_test), verbose=0
          )

In [None]:
#Convert the train and validation loss to a df
trainhist = pd.DataFrame(history.history)
#Add the epoch index
trainhist['epoch'] = history.epoch
#Look at the latest performance
trainhist.tail()

Let's visualize the train and validation loss/error on the same plot.

In [None]:
import matplotlib.pyplot as plt
#Plot train loss
sns.lineplot(x='epoch', y ='loss', data =trainhist)
#Plot validation loss
sns.lineplot(x='epoch', y ='val_loss', data =trainhist)
#Add legends
plt.legend(labels=['train_loss', 'val_loss'])

# Underfit and Overfit of the Deep Learning Algorithms

A deep learning model typically has many weights (parameters) to estimate. For example, a CNN model covered in a later week may have several million parameters. Finding the optimal weights in such a high-dimensional space is very challenging. The algorithm may get trapped in a local minimum instead of the global minimum, or it may stop before it has converged. Either case leads to underfitting of the model. The typical signs of underfitting are:

+ The training loss steadily decreases with a negative slope.
+ The validation loss steadily decreases with a negative slope.

This means the loss still has room to improve.
The graph above shows that both the training loss and the validation loss decrease dramatically in the beginning and then continue to decrease with negative slopes.

The fit can be improved by increasing the number of epochs or by using a different number of layers or neurons.

In [None]:

%%time
#Fix the seed
tf.random.set_seed(1)
#Train for another 10K epochs; fit continues from the current weights
history = model.fit(x=X_train,y=y_train,batch_size=64,epochs=10000,
          validation_data=(X_test,y_test), verbose=0
          )

In [None]:
#Convert the train and validation loss to a df
trainhist = pd.DataFrame(history.history)
#Add the epoch index
trainhist['epoch'] = history.epoch
#Look at the latest performance
trainhist.tail()

In [None]:
import matplotlib.pyplot as plt
#Plot train loss
sns.lineplot(x='epoch', y ='loss', data =trainhist)
#Plot validation loss
sns.lineplot(x='epoch', y ='val_loss', data =trainhist)
#Add legends
plt.legend(labels=['train_loss', 'val_loss'])

Deep learning models may overfit the data too. Because they have many weights to estimate, they may fit not only the trend but also the noise.

Looking at the training and validation loss above, we find that the model is overfitting the data, for the following reasons:

+ The training loss steadily decreases with a negative slope.
+ The validation loss steadily increases with a positive slope.
+ The gap between the training loss and the validation loss is huge.
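
The three signs above can also be checked numerically instead of by eye. The loss values in this sketch are made up for illustration; with a real run you would take them from history.history['loss'] and history.history['val_loss'].

```python
# Made-up loss curves, one value per epoch, for illustration only
train_loss = [9.0, 6.0, 4.5, 3.5, 2.8, 2.2, 1.8]
val_loss = [9.5, 6.8, 5.6, 5.2, 5.4, 5.9, 6.5]

# Epoch (0-based) where the validation loss is lowest
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# After that epoch: does training loss keep falling while validation loss rises?
train_tail = train_loss[best_epoch:]
val_tail = val_loss[best_epoch:]
train_still_falling = all(b < a for a, b in zip(train_tail, train_tail[1:]))
val_rising = all(b > a for a, b in zip(val_tail, val_tail[1:]))

# Gap between validation and training loss at the last epoch
final_gap = val_loss[-1] - train_loss[-1]
print(best_epoch, train_still_falling, val_rising, round(final_gap, 2))
```

In this toy example the validation loss bottoms out at epoch 3. Training beyond that point only improves the fit to the training data.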

# Dropout to Prevent Neural Networks from Overfitting

To address the overfitting in deep learning, Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov proposed a smart solution by using dropout.

"Deep neural nets with a large number of parameters are very powerful machine learning
systems. However, overfitting is a serious problem in such networks. Large networks are also
slow to use, making it difficult to deal with overfitting by combining the predictions of many
different large neural nets at test time. Dropout is a technique for addressing this problem.
The key idea is to randomly drop units (along with their connections) from the neural
network during training. This prevents units from co-adapting too much. During training,
dropout samples from an exponential number of different “thinned” networks. At test time,
it is easy to approximate the effect of averaging the predictions of all these thinned networks
by simply using a single unthinned network that has smaller weights. This significantly
reduces overfitting and gives major improvements over other regularization methods. We
show that dropout improves the performance of neural networks on supervised learning
tasks in vision, speech recognition, document classification and computational biology,
obtaining state-of-the-art results on many benchmark data sets."

Source: Dropout: A Simple Way to Prevent Neural Networks from
Overfitting (https://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)

The dropout can be summarized in the following graph taken from the paper above:

<img src="dropout.jpg">

They also proposed the following practical guide for training dropout neural networks in their paper:
+ **Network Size**: In the paper, $p$ is the probability of retaining a unit, so if there are $n$ neurons in a given layer, only about $np$ neurons are kept in the network after dropout. "Therefore, if an n-sized layer is optimal for a standard neural net on any given task, a good dropout net should have at least $n/p$ units."
+ **Learning Rate and Momentum**: A dropout network typically benefits from a high learning rate, about 10-100 times the learning rate that works for a standard neural net without dropout, and/or momentum around 0.95 to 0.99. This can significantly speed up learning.
+ **Dropout Rate**: Typical values of the retention probability $p$ are in the range of 0.5 to 0.8 for hidden layers and 0.8 for input layers. The Keras Dropout layer takes the fraction of units to drop, $1-p$, so these values correspond to rates of 0.2 to 0.5 for hidden layers and 0.2 for input layers.
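
The mechanics of dropout during training can be sketched in a few lines of NumPy. This is a simplified illustration, not the Keras implementation: the function name `dropout_forward` is made up, and it uses the "inverted" variant in which the kept activations are scaled up during training so that no rescaling is needed at test time. Here `rate` is the fraction of units dropped, as in the Keras Dropout layer.

```python
import numpy as np


def dropout_forward(x, rate, rng):
    """Zero out a random fraction `rate` of x and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob   # True where a unit is kept
    return x * mask / keep_prob              # rescale so the mean is preserved


rng = np.random.default_rng(0)
activations = np.ones((1000, 200))           # made-up layer output
out = dropout_forward(activations, rate=0.5, rng=rng)

kept_fraction = (out != 0).mean()
print(np.unique(out), round(kept_fraction, 3), round(out.mean(), 3))
```

With a rate of 0.5, roughly half of the units are zeroed and the survivors are doubled, so the average activation stays close to its original value of 1.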


Let's add dropout layers in TensorFlow to address overfitting. In line with the network size guideline, we also double the number of neurons in each layer:

In [None]:
%%time
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import max_norm
model = keras.Sequential()
# Input layer has 200 neurons
model.add(layers.Dense(200, activation='relu'))
# Add dropout rate of 50%
model.add(Dropout(0.5))
# First hidden layer with 100 neurons
model.add(layers.Dense(100, activation='relu'))
# Add dropout rate of 50%
model.add(Dropout(0.5))
# Second hidden layer with 50 neurons
model.add(layers.Dense(50, activation='relu'))
# Add dropout rate of 50%
model.add(Dropout(0.5))
# Output layer has one and only one neuron
model.add(layers.Dense(1))

#Configure the model
model.compile(optimizer='adam',loss='mse')

#Fix the seed
tf.random.set_seed(1)
#Fit the Model
history = model.fit(x=X_train,y=y_train,batch_size=64,epochs=10000,
          validation_data=(X_test,y_test), verbose=0
          )



In [None]:

#Convert the train and validation loss to a df
trainhist = pd.DataFrame(history.history)
#Add the epoch index
trainhist['epoch'] = history.epoch

import matplotlib.pyplot as plt
#Plot train loss
sns.lineplot(x='epoch', y ='loss', data =trainhist)
#Plot validation loss
sns.lineplot(x='epoch', y ='val_loss', data =trainhist)
#Add legends
plt.legend(labels=['train_loss', 'val_loss'])

It seems that dropout helps the algorithm converge. Both the training loss and the validation loss decay exponentially in the beginning and become almost constant in the end.

Although the performance improves, there is still room for improvement. We will learn how to improve it using Keras Tuner.

# Hyperparameter Tuning in TensorFlow


There are two types of parameters in deep learning (see: https://www.tensorflow.org/tutorials/keras/keras_tuner):
+ **Model Parameters** are estimated from the given data by finding the optimal values. For example, the weights and biases are model parameters in neural networks.
+ **Hyperparameters** must be specified by data scientists before training the models. There are typically two types of hyperparameters in deep learning:
    + **Model Hyperparameters**: for example, the number of hidden layers and the number of neurons in each layer.
    + **Algorithm Hyperparameters**: for example, the learning rate in different optimizers.

Let's look at how to tune the hyperparameters in TensorFlow.


## Install Keras Tuner

We first check whether the keras-tuner was installed or not. If we did not install it, then we should install it.

In [None]:
import importlib.util
#Check whether the keras_tuner module can be found
if importlib.util.find_spec('keras_tuner') is None:
    #If it was not installed, then install it
    !pip install -q -U keras-tuner
#Import the library of keras-tuner
import keras_tuner as kt

## Define the Model by Specifying Hyperparameter Range

We need to define our model by specifying the hyperparameters to search. For illustration purposes, we use the same architecture as before and tune the following hyperparameters:

+ The number of neurons in the first input layer, from 50 to 500 with a step size of 50.
+ The dropout rate after the first input layer, from 0.2 to 0.8 with a step size of 0.1.
+ The learning rate, chosen from 0.01, 0.001, or 0.0001.
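
It helps to know how large this search space is before starting the tuner. The sketch below enumerates every combination of the three hyperparameters listed above. The variable names are chosen for the example and are not part of Keras Tuner.

```python
from itertools import product

units = list(range(50, 501, 50))                      # 50, 100, ..., 500
rates = [round(0.2 + 0.1 * i, 1) for i in range(7)]   # 0.2, 0.3, ..., 0.8
learning_rates = [1e-2, 1e-3, 1e-4]

# Every (units, rate, learning_rate) combination in the search space
grid = list(product(units, rates, learning_rates))
print(len(units), len(rates), len(learning_rates), len(grid))
```

There are 10 x 7 x 3 = 210 combinations. Training each one for the full number of epochs would be expensive, which is why a tuner that spends less time on unpromising combinations is attractive.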



In [None]:
def model_builder(hp):
  model = keras.Sequential()
 
  # Tune the number of units in the first input layer
  # Search the number of neurons from 50-500 with a stepsize of 50 in the first input layer.
  hp_units1 = hp.Int('units', min_value = 50, max_value = 500, step = 50)
  model.add(layers.Dense(units = hp_units1, activation = 'relu'))
  # Tune the dropout rate in the first input layer
  # Search the dropout rate in the first input layer in the range of 0.2-0.8 with a stepsize of 0.1.
  hp_dropout1 = hp.Float('rate', min_value = 0.2, max_value = 0.8, step = 0.1)
  model.add(Dropout(rate = hp_dropout1))
  # first hidden layer with 100 neurons
  model.add(layers.Dense(100, activation='relu'))
  # add dropout rate of 50%
  model.add(Dropout(0.5))
  # second hidden layer with 50 neurons
  model.add(layers.Dense(50, activation='relu'))
  # add dropout rate of 50%
  model.add(Dropout(0.5))
  # output layer has one and only one neuron
  model.add(layers.Dense(1))
  
  # Tune the learning rate for the optimizer
  # Search the learning rate from 0.01, 0.001, or 0.0001.
  hp_learning_rate = hp.Choice('learning_rate', values = [1e-2, 1e-3, 1e-4]) 

  model.compile(optimizer = keras.optimizers.Adam(learning_rate = hp_learning_rate),
                loss = 'mse', 
                metrics = [tf.keras.metrics.MeanSquaredError()])

  return model


## Instantiate the Tuner and Perform Hypertuning

To tune the model, we need to instantiate a tuner first. Keras Tuner offers the following four tuners:
+ **BayesianOptimization** (see https://keras-team.github.io/keras-tuner/documentation/tuners/#bayesianoptimization-class)
+ **Hyperband** (see https://keras-team.github.io/keras-tuner/documentation/tuners/#hyperband-class)
+ **RandomSearch** (see https://keras-team.github.io/keras-tuner/documentation/tuners/#randomsearch-class)
+ **Sklearn**   (see https://keras-team.github.io/keras-tuner/documentation/tuners/#sklearn-class)

For illustration purposes, we try the Hyperband tuner.

In [None]:
tuner = kt.Hyperband(model_builder, #Specify the model
                     objective = 'val_loss', #Specify the objective function
                     max_epochs = 100, #Specify the maximum epochs
                     directory = 'my_dir', #Specify the file path
                     project_name = 'tuningRegression')
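
Hyperband allocates the training budget adaptively: it trains many configurations for a few epochs and keeps only the most promising ones for longer training. The core idea, successive halving, can be sketched with a toy objective. This is a simplified illustration and not Keras Tuner's implementation: `toy_val_loss`, the `floor` field and the budget values are hypothetical, and the reduction factor of 3 is an assumption made for the example.

```python
import random


def toy_val_loss(config, epochs):
    # Hypothetical stand-in for training: the loss shrinks with more epochs
    # and levels off at a floor that depends on the configuration.
    return config["floor"] + 1.0 / epochs


def successive_halving(configs, min_epochs=1, max_epochs=81, factor=3):
    """Repeatedly keep the best 1/factor of the configs and train them longer."""
    epochs = min_epochs
    survivors = list(configs)
    while len(survivors) > 1 and epochs <= max_epochs:
        ranked = sorted(survivors, key=lambda c: toy_val_loss(c, epochs))
        survivors = ranked[: max(1, len(ranked) // factor)]
        epochs *= factor   # survivors earn a larger training budget
    return survivors[0]


rng = random.Random(0)
configs = [{"id": i, "floor": rng.random()} for i in range(27)]
best = successive_halving(configs)
print(best)
```

Starting from 27 configurations, the rounds keep 9, then 3, then 1, so most configurations are only ever trained for a single epoch.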

We don't want to be overwhelmed by all the outputs of the tuners. It is a good practice to define a callback to clear the training outputs at the end of every training step.  

In [None]:
import IPython
#Clear all the training outputs
class ClearTrainingOutput(tf.keras.callbacks.Callback):
  def on_train_end(*args, **kwargs):
    IPython.display.clear_output(wait = True)

Next, we perform the search on the defined hyperparameter space.

In [None]:
#Perform the search on the defined hyperparameter space by specifying the callback to clear the training outputs
tuner.search(X_train, y_train, epochs = 100, validation_data = (X_test,y_test), callbacks = [ClearTrainingOutput()])


Finally, we print out the optimal hyperparameters.

In [None]:
# Get the optimal hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials = 1)[0]
#Use f-strings to format the outputs
print(f"""
The optimal number of units in the input layer = {best_hps.get('units')}.
The optimal dropout rate in the input layer = {best_hps.get('rate')}.
The optimal learning rate for the Adam optimizer = {best_hps.get('learning_rate')}.
""")

## Retrain the Model with the Optimal Hyperparameters

After we get the optimal hyperparameters, we need to retrain the model using them.

In [None]:
# Build the model with the optimal hyperparameters and train it on the data
model = tuner.hypermodel.build(best_hps)
model.fit(X_train, y_train, epochs = 100, validation_data = (X_test,y_test))

This optimal model may be improved again by tuning other hidden layers. You may try it by yourself.

# Save and Load Models in TensorFlow
 
Training a model is very time-consuming. It may take a couple of days to train and tune the hyperparameters on a laptop or PC. Once a model is trained, data scientists/researchers like to share the code together with the optimal weights, so that other people don't need to retrain the model. They can load the optimal weights into memory and then make predictions on new data, which significantly reduces the running time.


## Install h5py and pyyaml Packages

To save the model weights, we need to install two required packages:
+ pyyaml: YAML is a data serialization format designed for human readability and interaction with scripting languages. PyYAML is a YAML parser and emitter for Python. To install this package with conda, run:
**conda install -c anaconda pyyaml**
+ h5py: The h5py package provides both a high- and low-level interface to the HDF5 library from Python. HDF5 helps store and organize large amounts of data very efficiently. To install this package with conda, run:
**conda install -c anaconda h5py**

Let's create a model with the same network architecture, using the optimal hyperparameters.

In [None]:
def model_create():
  model = keras.Sequential()
  #Set the optimal units of 450 found by tuner
  model.add(layers.Dense(units = 450, activation = 'relu'))
  #Set the optimal dropout rate of 0.2 found by tuner
  model.add(Dropout(rate = 0.2))
  #First hidden layer with 100 neurons
  model.add(layers.Dense(100, activation='relu'))
  #Add dropout rate of 50%
  model.add(Dropout(0.5))
  #Second hidden layer with 50 neurons
  model.add(layers.Dense(50, activation='relu'))
  #Add dropout rate of 50%
  model.add(Dropout(0.5))
  #Output layer has one and only one neuron
  model.add(layers.Dense(1))
  #Set the optimal learning rate of 0.01 found by tuner
  model.compile(optimizer = keras.optimizers.Adam(learning_rate = 0.01),
                loss = 'mse', 
                metrics = [tf.keras.metrics.MeanSquaredError()])
  return model
# Create the model
model = model_create()

## Save Checkpoints During Training


We want to save the model weights so that we can reuse the model instead of retraining it. To save the model weights during training, we create a **tf.keras.callbacks.ModelCheckpoint callback** (see https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint) that saves only the weights. With save_best_only=True, the weights are written only when the monitored quantity, val_loss by default, improves.


In [None]:
# Specify the directory to save the weights
import os
cp_path = "training/cp.regr"
cp_dir = os.path.dirname(cp_path)

# Create a callback to save the model's weights
# We only save the best weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=cp_path, save_best_only=True,
                                                 save_weights_only=True,
                                                 verbose=1)

# Train the model and specify the defined callback using callbacks=[]
model.fit(X_train, y_train, epochs = 100, validation_data = (X_test,y_test),
         callbacks=[cp_callback])
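
The effect of save_best_only=True can be illustrated without TensorFlow. The sketch below uses made-up validation losses and a hypothetical helper, `epochs_that_checkpoint`, to mimic the rule: a checkpoint is written only when the validation loss is lower than every value seen so far.

```python
def epochs_that_checkpoint(val_losses):
    """Return the 1-based epochs at which a best-only checkpoint would be saved."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # improvement over the best value so far
            best = loss
            saved.append(epoch)  # the real callback would write the weights here
    return saved


# Made-up validation losses for six epochs
made_up_val_losses = [9.0, 7.5, 7.8, 6.9, 7.0, 7.2]
saved_epochs = epochs_that_checkpoint(made_up_val_losses)
print(saved_epochs)
```

In this example the file on disk ends up holding the weights from epoch 4, even though training continued for two more epochs.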


## Load the Weights and Evaluate the Model 

If you saved only the model weights, you can still evaluate the model on a new dataset without retraining it.
However, you need to define the model architecture again, and it must be the same as that of the trained model. Then you can load the weights into memory and evaluate the model.

In [None]:
#First, create the model
model = model_create()
#Second, load the weights
model.load_weights(cp_path)
#Note here, we don't train the model at all
#Third, evaluate the model on the new dataset
#evaluate returns [loss, metric] because the model was compiled with one metric
results = model.evaluate(X_test, y_test, verbose=2)

print("Reloaded model from file with loss: {:5.2f}".format(results[1]))

# Evaluate the Model on the Test Data

Finally, let's evaluate the model. Since it is a regression problem, we can look at the following metrics:

+ Mean squared error
+ Root mean squared error
+ Mean absolute error

We may use other metrics as well. Please consult the scikit-learn documentation (see https://scikit-learn.org/stable/modules/model_evaluation.html).

In [None]:
from sklearn.metrics import mean_squared_error,mean_absolute_error

In [None]:
y_pred = model.predict(X_test)

In [None]:
'The mean square error is {0:.4f}'.format(mean_squared_error(y_test,y_pred))

In [None]:
'The root mean squared error is {0:.4f}'.format(np.sqrt(mean_squared_error(y_test,y_pred)))

In [None]:
'The mean absolute error is {0:.4f}'.format(mean_absolute_error(y_test,y_pred))
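
The three metrics are simple enough to compute by hand, which makes their relationship clear. The values below are made up for illustration and are not results from the model above.

```python
import numpy as np

# Made-up targets and predictions for four samples
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 6.0])

errors = y_true - y_hat              # 1, 0, -2, 1
mse = np.mean(errors ** 2)           # average squared error
rmse = np.sqrt(mse)                  # same units as the target
mae = np.mean(np.abs(errors))        # average absolute error
print(mse, round(float(rmse), 4), mae)
```

Here the MSE is 1.5, the RMSE is about 1.2247 and the MAE is 1.0. The RMSE is larger than the MAE because squaring gives the single error of 2 more weight.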