## **Heston hyper-parameter Depth tuning**

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:90% !important; }</style>"))

Load the libraries

In [2]:
import pandas as pd
import datetime, os
import numpy as np
import numpy.random as npr
from pylab import plt, mpl

import time, timeit

from scipy.stats import norm
from scipy import optimize
import scipy.integrate as integrate
import scipy.special as special 

import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
from tensorboard.plugins.hparams import api as hp
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

import matplotlib.pyplot as plt
import seaborn as sns

# Load the TensorBoard notebook extension
%load_ext tensorboard

Load the Heston data

In [3]:
#Read the csv-files:
raw_Options_input_train = pd.read_csv (r"/Users/Marcklein/Desktop/Master Thesis/Option pricing using Neural Networks/Python/Heston/Heston_data_input_deep_train.csv")
raw_Options_output_train = pd.read_csv (r"/Users/Marcklein/Desktop/Master Thesis/Option pricing using Neural Networks/Python/Heston/Heston_data_output_deep_train.csv")
raw_Options_input_test = pd.read_csv (r"/Users/Marcklein/Desktop/Master Thesis/Option pricing using Neural Networks/Python/Heston/Heston_data_input_deep_test.csv")
raw_Options_output_test = pd.read_csv (r"/Users/Marcklein/Desktop/Master Thesis/Option pricing using Neural Networks/Python/Heston/Heston_data_output_deep_test.csv")

#Reading the files creates an 'Unnamed: 0' column at the beginning, delete it:
del raw_Options_input_train['Unnamed: 0']
del raw_Options_output_train['Unnamed: 0']
del raw_Options_input_test['Unnamed: 0']
del raw_Options_output_test['Unnamed: 0']


Copy the data so we don't modify the raw frames

In [4]:
Options_input_train = raw_Options_input_train.copy()
Options_output_train = raw_Options_output_train.copy()
Options_input_test = raw_Options_input_test.copy()
Options_output_test = raw_Options_output_test.copy()

Since the standard deviation is the square root of the mean squared deviation from the mean, it can only be zero when all values of a variable are the same (all equal to the mean). Such variables have no discriminative power, so they can be removed from the analysis: they cannot improve any classification, clustering or regression task. Many implementations remove them automatically or raise an error in a matrix calculation.
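The same reasoning applies to the min-max scaling used in this notebook: a constant column has max - min = 0, so the scaling would divide by zero. A minimal sketch of a check for such columns, assuming the inputs fit in a NumPy array; `constant_columns` and the toy matrix are hypothetical and not part of the original notebook.

```python
import numpy as np


def constant_columns(data):
    """Return the indices of columns whose values are all identical."""
    data = np.asarray(data, dtype=float)
    # A column is constant exactly when its range (max - min) is zero.
    value_range = data.max(axis=0) - data.min(axis=0)
    return np.flatnonzero(value_range == 0).tolist()


# Toy matrix with made-up values: the middle column never changes.
toy = np.array([[0.1, 5.0, 1.0],
                [0.4, 5.0, 2.0],
                [0.9, 5.0, 3.0]])
print(constant_columns(toy))
```

Any index returned here would be a candidate for removal before normalizing.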

### **Data preparation**

We split our dataset into a training set and a test set (validation set is taken from the training set during model.fit).

In [5]:
# Use 1/3 of the training data for training and validation, and 1/2 of the test data for testing
train_dataset = Options_input_train.sample(frac=0.3333333333333333, random_state=42)
test_dataset = Options_input_test.sample(frac=0.5, random_state=42)

train_labels = Options_output_train.sample(frac=0.3333333333333333, random_state=42)
test_labels = Options_output_test.sample(frac=0.5, random_state=42)
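The cell above samples inputs and labels in separate calls that share `random_state=42`, which should keep the rows paired provided both frames have the same number of rows in the same order. A sketch of a more explicit alternative draws one set of row positions and applies it to both arrays. It uses NumPy arrays as stand-ins for the DataFrames; `sample_aligned` and the toy data are hypothetical.

```python
import numpy as np


def sample_aligned(features, labels, frac, seed=42):
    """Subsample features and labels with one shared draw of row positions."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of rows")
    rng = np.random.default_rng(seed)
    n_keep = int(round(frac * len(features)))
    # Draw the row positions once, without replacement, and reuse them.
    idx = rng.choice(len(features), size=n_keep, replace=False)
    return features[idx], labels[idx]


# Toy data: each label equals 10 times its feature, so pairing is easy to check.
x = np.arange(12).reshape(-1, 1)
y = 10 * x
x_s, y_s = sample_aligned(x, y, frac=0.5)
print(x_s.shape, y_s.shape)
```

With DataFrames, a comparable effect can be had by sampling the inputs once and selecting the labels with `.loc` on the sampled index.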

Check the overall statistics

In [6]:
train_stats = train_dataset.describe().T
test_stats = test_dataset.describe().T

In [7]:
#min-max normalize the data using the training-set statistics
#(named min_max_norm so it does not shadow scipy.stats.norm imported above)
def min_max_norm(x):
    return (x - train_stats['min']) / (train_stats['max']-train_stats['min'])

normed_train_data = min_max_norm(train_dataset).values
#the test data is scaled with the training min/max as well
normed_test_data = min_max_norm(test_dataset).values

#make the labels into numpy array just like the normed training data
train_labels = np.asarray(train_labels)
test_labels = np.asarray(test_labels)

#check the shapes
print("Input train data:", normed_train_data.shape, " Output train data:", train_labels.shape)
print("Input test data:", normed_test_data.shape, " Output test data:", test_labels.shape)

Input train data: (199651, 7)  Output train data: (199651, 10)
Input test data: (9998, 7)  Output test data: (9998, 10)


### **The hyperparameter testing-model**

We start by initializing the hyperparameters that we want to assess and set the metric of the model to mean squared error. Since TensorBoard works with log files written during training, we create logs that record the losses, metrics and other measures of each run.

In [8]:
#The hyperparameters & their values to be tested are stored in a special type called HParam
HP_LAYERS = hp.HParam('layers', hp.Discrete([2, 3, 4, 5, 6]))

#Setting the Metric to MSE (Mean Squared Error)
METRIC_MSE = 'mean_squared_error'

# Clear any logs from previous runs
!rm -rf ./logs/ 

#Creating & configuring log files
with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_LAYERS],
        metrics=[hp.Metric(METRIC_MSE, display_name='mean_squared_error')],
        )

Now we create a function that trains and validates the model and takes the hyperparameters as arguments. Each hyperparameter combination is trained for 50 epochs; the values are passed in an hparams dictionary and used throughout the training function. In this notebook we focus only on the depth, so there is a single hyperparameter to tune.

**DEPTH:**
- https://math.stackexchange.com/questions/3335072/how-many-parameters-does-the-neural-network-have
- https://towardsdatascience.com/counting-no-of-parameters-in-deep-learning-models-by-hand-8f1716241889

Oosterlee et al. report that, for the Heston model, a DNN with 4 hidden layers of 400 neurons each is the optimal choice. We check that claim here. To isolate the effect of depth, the models need approximately the same complexity, so the number of parameters (weights and biases) should be as close as possible across depths. We take their architecture as the baseline. With 7 inputs and 10 outputs, the baseline has 7×400 + 400×400 + 400×400 + 400×400 + 400×10 = 486,800 weights and 400 + 400 + 400 + 400 + 10 = 1,610 biases, for a total of 486,800 + 1,610 = 488,410 parameters.
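The count above can be reproduced with a few lines of plain Python. This is a minimal sketch for fully connected layers only; `count_dense_params` is a hypothetical helper, not part of the original notebook.

```python
def count_dense_params(n_inputs, hidden_widths, n_outputs):
    """Count the weights and biases of a fully connected network."""
    sizes = [n_inputs, *hidden_widths, n_outputs]
    # One weight per connection between two consecutive layers.
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    # One bias per neuron in every hidden layer and in the output layer.
    biases = sum(sizes[1:])
    return weights, biases


# Baseline: 7 inputs, 4 hidden layers of 400 neurons, 10 outputs.
weights, biases = count_dense_params(7, [400] * 4, 10)
print(weights, biases, weights + biases)  # 486800 1610 488410
```

The same helper can be called with the rounded widths of the other depths to see how closely their parameter counts match the baseline.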

Calculate the number of neurons per layer for each depth (a single hidden layer would need about 27,133.33 neurons)

In [9]:
'''from scipy import optimize

def f(x):
    #return((7*x+x*x+x*10) + (x+x+10) - 488410) # 2 layers
    #return((7*x+x*x+x*x+x*10) + (x+x+x+10) - 488410) # 3 layers
    return((7*x+x*x+x*x+x*x+x*10) + (x+x+x+x+10) - 488410) # 4 layers
    #return((7*x+x*x+x*x+x*x+x*x+x*10) + (x+x+x+x+x+10) - 488410) # 5 layers
    #return((7*x+x*x+x*x+x*x+x*x+x*x+x*10) + (x+x+x+x+x+x+10) - 488410) # 6 layers

root = optimize.brentq(f, 0, 1000)
root''';
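The disabled cell above finds the width numerically with `optimize.brentq`. Because the parameter count is quadratic in the width, the same numbers can also be obtained in closed form. The sketch below swaps in the quadratic formula for the root finder; `matching_width` is a hypothetical helper and assumes every hidden layer has the same width.

```python
import math


def matching_width(n_layers, target=488_410, n_inputs=7, n_outputs=10):
    """Real-valued width x at which n_layers equal hidden layers hit the target count.

    Parameter count: (L-1)*x^2 + (n_inputs + n_outputs + L)*x + n_outputs.
    """
    a = n_layers - 1
    b = n_inputs + n_outputs + n_layers
    c = n_outputs - target
    if a == 0:
        # One hidden layer: the equation is linear in x.
        return -c / b
    # Positive root of a*x^2 + b*x + c = 0.
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


for n_layers in range(1, 7):
    print(n_layers, round(matching_width(n_layers), 2))
```

Rounding these real-valued widths to the nearest integer gives the layer sizes used in the training function.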

In [10]:
#weight and bias initializers
weights_initializer = keras.initializers.GlorotUniform(seed=42)
bias_initializer = keras.initializers.Zeros()

# Display training progress by printing a single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch % 100 == 0: print('')
        print('.', end='')


#A function that trains and validates the model on a variety of hyper-parameters and returns the MSE
def train_val_model(hparams):
    #Keras sequential model with Hyperparameters passed from the argument
    if hparams[HP_LAYERS] == 2:
         model = keras.models.Sequential([
             #Layer to be used as an entry point into a Network (true number of neurons 689.4207751955869)
             keras.layers.InputLayer(input_shape=[len(train_dataset.keys())]),
             #Dense layer 1   
             keras.layers.Dense(689, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_1'),
             #Dense layer 2
             keras.layers.Dense(689, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_2'),
             #Output layer
             keras.layers.Dense(10, activation='linear', name='Output_layer')])
            
            
    elif hparams[HP_LAYERS] == 3:
        model = keras.models.Sequential([
             #Layer to be used as an entry point into a Network (true number of neurons 489.1912585224469)
             keras.layers.InputLayer(input_shape=[len(train_dataset.keys())]),                
             #Dense layer 1   
             keras.layers.Dense(489, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_1'),
             #Dense layer 2
             keras.layers.Dense(489, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_2'),
            #Dense layer 3
             keras.layers.Dense(489, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_3'),
             #Output layer
             keras.layers.Dense(10, activation='linear', name='Output_layer')])
        
        
    elif hparams[HP_LAYERS] == 4:
        model = keras.models.Sequential([
             #Layer to be used as an entry point into a Network (base number of neurons)
             keras.layers.InputLayer(input_shape=[len(train_dataset.keys())]), 
             #Dense layer 1   
             keras.layers.Dense(400, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_1'),
             #Dense layer 2
             keras.layers.Dense(400, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_2'),
            #Dense layer 3
             keras.layers.Dense(400, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_3'),
            #Dense layer 4
             keras.layers.Dense(400, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_4'),
             #Output layer
             keras.layers.Dense(10, activation='linear', name='Output_layer')])
        
        
    elif hparams[HP_LAYERS] == 5:
        model = keras.models.Sequential([
             #Layer to be used as an entry point into a Network (true number of neurons 346.68892527879603)
             keras.layers.InputLayer(input_shape=[len(train_dataset.keys())]), 
             #Dense layer 1   
             keras.layers.Dense(347, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_1'),
             #Dense layer 2
             keras.layers.Dense(347, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_2'),
            #Dense layer 3
             keras.layers.Dense(347, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_3'),
            #Dense layer 4
             keras.layers.Dense(347, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_4'),
            #Dense layer 5
             keras.layers.Dense(347, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_5'),
             #Output layer
             keras.layers.Dense(10, activation='linear', name='Output_layer')])
        
        
    elif hparams[HP_LAYERS] == 6:
        model = keras.models.Sequential([
             #Layer to be used as an entry point into a Network (true number of neurons 310.2464605462682)
             keras.layers.InputLayer(input_shape=[len(train_dataset.keys())]), 
             #Dense layer 1   
             keras.layers.Dense(310, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_1'),
             #Dense layer 2
             keras.layers.Dense(310, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_2'),
            #Dense layer 3
             keras.layers.Dense(310, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_3'),
            #Dense layer 4
             keras.layers.Dense(310, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_4'),
            #Dense layer 5
             keras.layers.Dense(310, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_5'),
            #Dense layer 6
             keras.layers.Dense(310, kernel_initializer = weights_initializer,
                                activation = 'relu', bias_initializer = bias_initializer, name='Layer_6'),
             #Output layer
             keras.layers.Dense(10, activation='linear', name='Output_layer')])
        
    else:
        raise ValueError("unexpected layer number: %r" % (hparams[HP_LAYERS],))
        
            
    
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999,
                             epsilon=1e-07, amsgrad=False, name='Adam')
    #Compiling the model
    model.compile(optimizer=optimizer, 
                  loss='mean_squared_error', #Computes the mean of squares of errors between labels and predictions
                  metrics=['mean_squared_error']) #Computes the mean squared error between y_true and y_pred
    
    #Training the network
    model.fit(normed_train_data, train_labels, 
         batch_size=32, 
         epochs=50,
         verbose=0,
         validation_split=0.2,
         callbacks=[PrintDot()])
    
    _, mse = model.evaluate(normed_test_data, test_labels)
    return mse
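The five branches above differ only in depth and width, so the same architectures could be built in a loop. The following is a sketch under stated assumptions rather than the notebook's own code: `WIDTH_BY_DEPTH`, `hidden_widths` and `build_model` are hypothetical names, and TensorFlow is imported inside `build_model` so that the width lookup can be used on its own.

```python
# Rounded hidden-layer widths used in the notebook for each depth.
WIDTH_BY_DEPTH = {2: 689, 3: 489, 4: 400, 5: 347, 6: 310}


def hidden_widths(n_layers):
    """Return the list of hidden-layer widths for a given depth."""
    if n_layers not in WIDTH_BY_DEPTH:
        raise ValueError("unexpected layer number: %r" % (n_layers,))
    return [WIDTH_BY_DEPTH[n_layers]] * n_layers


def build_model(n_layers, n_inputs=7, n_outputs=10, seed=42):
    """Build the fully connected network for one depth."""
    from tensorflow import keras  # lazy import: only needed to build a model

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for i, width in enumerate(hidden_widths(n_layers), start=1):
        model.add(keras.layers.Dense(
            width,
            activation="relu",
            # Assumption: a fresh seeded initializer per layer. The notebook
            # shares one instance, so initial weights may not match exactly.
            kernel_initializer=keras.initializers.GlorotUniform(seed=seed),
            bias_initializer="zeros",
            name="Layer_%d" % i,
        ))
    model.add(keras.layers.Dense(n_outputs, activation="linear", name="Output_layer"))
    return model


print(hidden_widths(4))
```

Calling `build_model(hparams[HP_LAYERS])` would then stand in for the if/elif chain, with compilation and fitting left unchanged.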

The run function below starts the training process with the hyperparameters to be assessed and writes a summary to the logs, containing the hyperparameters and the final test MSE returned by the train_val_model function.

In [11]:
def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        mse = train_val_model(hparams)
        tf.summary.scalar(METRIC_MSE, mse, step=10)

We will now train the model once for each value of the depth hyperparameter.

In [12]:
%%time

#A unique number for each training session
session_num = 0

#Loop over all values of the hyperparameter (one trial per depth)
for layers in HP_LAYERS.domain.values:
    hparams = {
        HP_LAYERS: layers
        }
    run_name = "run-%d" % session_num
    print('--- Starting trial: %s' % run_name)
    print({h.name: hparams[h] for h in hparams})
    start = timeit.default_timer()
    run('logs/hparam_tuning/' + run_name, hparams)
    elapsed_time = timeit.default_timer() - start
    print('Time used for trial: {}, took {:.2f} seconds\n'.format(run_name, elapsed_time))
    session_num += 1


--- Starting trial: run-0
{'layers': 2}

Time used for trial: run-0, took 1155.29 seconds

--- Starting trial: run-1
{'layers': 3}

Time used for trial: run-1, took 1210.04 seconds

--- Starting trial: run-2
{'layers': 4}

Time used for trial: run-2, took 975.83 seconds

--- Starting trial: run-3
{'layers': 5}

Time used for trial: run-3, took 1133.82 seconds

--- Starting trial: run-4
{'layers': 6}

Time used for trial: run-4, took 1036.56 seconds

CPU times: user 3h 55min 29s, sys: 16min 59s, total: 4h 12min 28s
Wall time: 1h 31min 51s


It’s time to launch TensorBoard with the following command.

In [13]:
%tensorboard --logdir logs/hparam_tuning

Reusing TensorBoard on port 6006 (pid 1992), started 14:32:42 ago. (Use '!kill 1992' to kill it.)

Once it is launched, the dashboard appears. Click on the HPARAMS tab to see the hyperparameter logs.

The Table View lists all hyperparameter combinations together with their MSE. The left side of the dashboard provides filtering options such as sorting by the metric, filtering by a specific type or value of a hyperparameter, and filtering by status.

The Parallel Coordinates View shows each run as a line going through an axis for each hyperparameter and metric. The interactive plot allows us to mark a region, which highlights only the runs that pass through it. The scale of each axis can also be switched between linear, logarithmic and quantile. This is extremely useful for understanding the relationships between the hyperparameters. We can select the optimal hyperparameters by picking the run with the lowest MSE (hover over a line to see its values).

The Scatter Plot View plots each hyperparameter and metric against the metric. This helps us understand how different values of each hyperparameter correlate with the metric.

LINKS:

https://analyticsindiamag.com/parameter-tuning-tensorboard/

https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams

https://medium.com/ml-book/neural-networks-hyperparameter-tuning-in-tensorflow-2-0-a7b4e2b574a1

https://github.com/tensorflow/tensorboard/blob/master/tensorboard/plugins/hparams/summary_v2.py



IDEAS: 

- HP_LEARNING_RATE = hp.HParam("learning_rate", hp.RealInterval(1e-5, 1e-1))

- HP_L2 = hp.HParam('l2 regularizer', hp.RealInterval(.001,.01))

- HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.3, 0.8))
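
If these ideas were added, every combination of values would need its own trial, and the single loop over the depth would become a grid. Below is a minimal sketch in plain Python using `itertools.product`. The grid points are illustrative picks from within the ranges listed above, and `iter_trials` is a hypothetical helper.

```python
import itertools

# Hypothetical candidate values: the ranges come from the IDEAS list,
# the specific grid points are illustrative only.
grid = {
    "layers": [2, 3, 4, 5, 6],
    "learning_rate": [1e-5, 1e-3, 1e-1],
    "dropout": [0.3, 0.8],
}


def iter_trials(grid):
    """Yield one dict of hyperparameter values per grid combination."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


trials = list(iter_trials(grid))
print(len(trials))  # 5 * 3 * 2 = 30 combinations
```

At roughly 1,000 to 1,200 seconds per trial, as measured in the runs above, a grid of this size would take several hours, so sampling a subset of combinations may be more practical.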