# Hyperparameter optimization

This notebook explores the effect of various hyperparameters on the performance of SCINet when training on cryptocurrency data, particularly Bitcoin. We only address hyperparameters that are directly accessible via the 'train_scinet' or 'preprocess' functions. The effects of different hyperparameters may interact, which means that sweeping one parameter at a time, as done here, is suboptimal. That said, this approach still gives a lot of insight into how each hyperparameter influences the model.

First, the necessary modules and scripts are imported.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from tqdm import tqdm

cwd = os.getcwd()
BASE_DIR = os.path.dirname(os.path.dirname(cwd))

sys.path.insert(0, BASE_DIR) #add base to path for relative imports
os.chdir('../..')

### Per sample versus per column normalisation

The first thing we will investigate is the effect of the normalisation method on the data. That is, we can either normalize every sample individually or normalize each column as a whole. In the latter case, we apply a log function before normalising to deal with the different orders of magnitude in some features (this is done even though it is not shown explicitly here). In addition, the 'train' flag makes it possible to skip training and use precomputed models instead. Set it to 'True' if you want to train explicitly.

In [3]:
from base.train_scinet import train_scinet
from utils.data_loading import load_data
from utils.preprocess_data import preprocess

data_format=["open","high","low","close","Volume BTC","Volume USDT","tradecount"]

X_len = 256
Y_len = 24

sampling = [True,False]

means = []
stds = []

data_test = {}

for normalize_per_sample in sampling:

    standardization_settings = {'per_sample': normalize_per_sample,
                            'leaky': False,
                            'mode': 'log', #only if per sample is false, choose from log, sqrt or lin
                            'sqrt_val': 2, #of course only if mode is sqrt
                            'total mean': [],
                            'total std': []}

    pairs = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]
 
    data, mean ,std = load_data('Binance_BTCUSDT_minute', pairs)

    

    data_proc = preprocess(   data = data,
                        symbols = pairs,
                        data_format = data_format,
                        fraction = 1,
                        train_frac = .7,
                        val_frac = .15,
                        test_frac = .15,
                        X_LEN = X_len,
                        Y_LEN = Y_len,
                        OVERLAPPING = True,
                        STANDARDIZE = True,
                        standardization_settings = standardization_settings
                        )

    means.append(data_proc['mean'])
    stds.append(data_proc['std'])

    data_test[str(normalize_per_sample)]= {'X_test': data_proc['X_test'],'y_test_unnormalized': data_proc['y_test_unnormalized']}

    train = False
    if train:
        results = train_scinet( X_train = data_proc["X_train"].astype('float32'),
                            y_train = data_proc["y_train"].astype('float32'),
                            X_val = data_proc["X_val"].astype('float32'),
                            y_val = data_proc["y_val"].astype('float32'),
                            X_test = data_proc["X_test"].astype('float32'),
                            y_test = data_proc["y_test"].astype('float32'),
                            epochs = 1,
                            batch_size = 128,
                            X_LEN = X_len,
                            Y_LEN = [Y_len],
                            output_dim = [data_proc["X_train"].shape[2]],
                            selected_columns = [[]],
                            hid_size= 8,
                            num_levels= 4,
                            kernel = 5,
                            dropout = .5,
                            loss_weights= [1],
                            learning_rate = 0.001,
                            probabilistic = False)

        # Save relative to the working directory, matching the path used when loading
        results[0].save_weights(f'exp/hyperparams/saved_models/model_sample_{normalize_per_sample}.h5')

        

            

Starting data preprocessing...
   48740.22  48745.96  48727.47  48727.47.1   2.27206  110730.9135  136.0
0  48763.11  48763.12  48736.70    48736.73   5.33108  259880.1205  427.0
1  48778.58  48778.58  48750.37    48763.12   6.87389  335219.0368  389.0
2  48760.37  48778.58  48746.39    48778.58  10.58951  516291.2896  425.0
3  48799.99  48800.00  48756.93    48760.37  12.24525  597357.8390  535.0
4  48795.99  48800.00  48795.99    48800.00   7.55759  368810.1891  423.0 (49997, 7)
Making train/validation/test splits...
Making samples...


100%|██████████| 34717/34717 [00:25<00:00, 1387.09it/s]
  samples = np.array(samples)


Making samples...


100%|██████████| 7220/7220 [00:05<00:00, 1219.02it/s]


Making samples...


100%|██████████| 7220/7220 [00:03<00:00, 1865.79it/s]


Making X-y splits...
Starting data preprocessing...
   48740.22  48745.96  48727.47  48727.47.1   2.27206  110730.9135  136.0
0  48763.11  48763.12  48736.70    48736.73   5.33108  259880.1205  427.0
1  48778.58  48778.58  48750.37    48763.12   6.87389  335219.0368  389.0
2  48760.37  48778.58  48746.39    48778.58  10.58951  516291.2896  425.0
3  48799.99  48800.00  48756.93    48760.37  12.24525  597357.8390  535.0
4  48795.99  48800.00  48795.99    48800.00   7.55759  368810.1891  423.0 (49997, 7)
Making train/validation/test splits...
Making samples...


100%|██████████| 34717/34717 [00:20<00:00, 1673.17it/s]
  samples = np.array(samples)


Making samples...


100%|██████████| 7220/7220 [00:03<00:00, 2033.10it/s]


Making samples...


100%|██████████| 7220/7220 [00:03<00:00, 1988.75it/s]


Making X-y splits...
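
The two normalisation schemes compared above can be sketched with plain NumPy. This is only an illustration under assumptions: the internals of 'preprocess' are not shown in this notebook, so the function names, the `eps` term and the toy data below are hypothetical. The log transform is written as `log(1 + x)` because the denormalisation used later in this notebook applies `exp(x) - 1`.

```python
import numpy as np

def standardize_per_sample(samples, eps=1e-8):
    """Standardize every sample (window) with its own per-feature mean and std.

    samples has shape (n_samples, window_len, n_features).
    """
    mean = samples.mean(axis=1, keepdims=True)
    std = samples.std(axis=1, keepdims=True) + eps  # eps avoids division by zero
    return (samples - mean) / std, mean, std

def standardize_per_column(samples, eps=1e-8):
    """Apply log(1 + x), then standardize each feature over the whole array."""
    logged = np.log1p(samples)
    mean = logged.mean(axis=(0, 1), keepdims=True)
    std = logged.std(axis=(0, 1), keepdims=True) + eps
    return (logged - mean) / std, mean, std

rng = np.random.default_rng(0)
# Toy data (not the Binance data): a price-like and a volume-like feature
# that differ by several orders of magnitude
toy = np.stack([rng.uniform(48000, 49000, size=(16, 32)),
                rng.uniform(1, 20, size=(16, 32))], axis=-1)

per_sample, ps_mean, ps_std = standardize_per_sample(toy)
per_column, pc_mean, pc_std = standardize_per_column(toy)

print(per_sample.shape, ps_mean.shape)  # statistics are stored per sample
print(per_column.shape, pc_mean.shape)  # statistics are shared by all samples
```

With per-sample normalisation every window carries its own statistics, so these have to be kept around for denormalisation. With per-column normalisation a single mean and standard deviation per feature suffice.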


To compare the performances properly, we should denormalize the predictions; otherwise the best performer will simply be the one whose normalisation yielded the smallest values. Here, we only predict the first column, as this shortens training considerably. In addition, since we have to denormalize, we cannot fairly calculate an MAE over all columns.

In [5]:
from sklearn.metrics import mean_absolute_error as mae
from base.SCINet import scinet_builder


def denormalize(x, per_sample,means,stds):

    if per_sample:
        return(x*stds[0]+means[0])
    else:
        x_temp = x*stds[1]+means[1]
        return(np.exp(x_temp)-1) #luckily there are no negative values in the dataset

maes = []

for normalize_per_sample in sampling:

    model = scinet_builder(
                    output_len=  [Y_len],
                    input_len = X_len,
                    output_dim = [data_proc["X_train"].shape[2]],
                    input_dim = data_proc["X_train"].shape[2],
                    selected_columns = [[0]], 
                    loss_weights = [1],
                    hid_size = 8,
                    num_levels = 4,
                    kernel = 5,
                    dropout = .5,
                    learning_rate = 0.001,)

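    # Load the weights saved by the training cell (same file name pattern)
    model.load_weights('exp/hyperparams/saved_models/model_sample_{}.h5'.format(normalize_per_sample))
    # Use the test data that belongs to this normalisation, not the last 'data_proc'
    X_test = data_test[str(normalize_per_sample)]['X_test']
    y_true = data_test[str(normalize_per_sample)]['y_test_unnormalized']
    # Assumes that 'means' and 'stds' broadcast against the prediction array
    prediction = denormalize(model.predict(X_test.astype('float32')), normalize_per_sample, means, stds)
    print(np.shape(prediction))
    for i in tqdm(range(X_test.shape[2])):
        maes.append(mae(y_true[:,:,i], prediction[:,:,i]))

# Compare both normalisations only after both models have been evaluated
n_columns = data_proc['X_test'].shape[2]
for i in range(n_columns):
    print('Mae per sample column {} = {}, mae per column column {} = {}'.format(i,np.round(maes[i],3),i,np.round(maes[i+n_columns],3)))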




OSError: No file or directory found at exp/hyperparams/saved_models/model_sample_True
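
Why the predictions have to be denormalised before comparing the two schemes can be illustrated with a small sketch. All numbers below are made up for illustration, and the helper names are hypothetical. The same predictions are scored under two different normalisations: the normalised MAE depends on the scale of the normalisation, whereas the MAE in original units does not.

```python
import numpy as np

def mean_abs_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up targets and predictions in original units (not real results)
y_true = np.array([48700.0, 48750.0, 48800.0, 48790.0])
y_pred = np.array([48710.0, 48730.0, 48790.0, 48800.0])

# Scheme A: standardize with the statistics of this sample only
mean_a, std_a = y_true.mean(), y_true.std()

# Scheme B: log(1 + x), then standardize with statistics of a wider history
history = np.array([30000.0, 40000.0, 48700.0, 60000.0])
mean_b, std_b = np.log1p(history).mean(), np.log1p(history).std()

def to_a(y):
    return (y - mean_a) / std_a

def from_a(z):
    return z * std_a + mean_a

def to_b(y):
    return (np.log1p(y) - mean_b) / std_b

def from_b(z):
    return np.expm1(z * std_b + mean_b)  # inverse of log(1 + x)

mae_norm_a = mean_abs_error(to_a(y_true), to_a(y_pred))
mae_norm_b = mean_abs_error(to_b(y_true), to_b(y_pred))
mae_orig_a = mean_abs_error(from_a(to_a(y_true)), from_a(to_a(y_pred)))
mae_orig_b = mean_abs_error(from_b(to_b(y_true)), from_b(to_b(y_pred)))

print('normalised MAE:', mae_norm_a, mae_norm_b)
print('original-unit MAE:', mae_orig_a, mae_orig_b)
```

Scheme B looks far better in normalised units only because its normalisation compresses the values more strongly. After mapping back to original units both give the same error.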


## Model hyperparameters

Now that we have established that normalising per ... is superior, we can go ahead and vary some other model hyperparameters. In particular, we will concern ourselves with the learning rate, hidden size and number of levels of the SCI_tree; with a few adjustments, the same approach can be used for other hyperparameters as well, such as which columns to keep. We will first fix the preprocessing:

In [None]:
from base.train_scinet import train_scinet
from utils.data_loading import load_data
from utils.preprocess_data import preprocess

data_format=["open","high","low","close","Volume BTC","Volume USDT","tradecount"]

X_len = 256
Y_len = 24


standardization_settings = {'per_sample': normalize_per_sample,
                            'leaky': False,
                            'mode': 'log', #only if per sample is false, choose from log, sqrt or lin
                            'sqrt_val': 2, #of course only if mode is sqrt
                            'total mean': [],
                            'total std': []}

pairs = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]
 
data, mean ,std = load_data('Binance_BTCUSDT_minute', pairs)

data_proc = preprocess(   data = data,
                        symbols = pairs,
                        data_format = data_format,
                        fraction = 1,
                        train_frac = .7,
                        val_frac = .15,
                        test_frac = .15,
                        X_LEN = X_len,
                        Y_LEN = Y_len,
                        OVERLAPPING = True,
                        STANDARDIZE = True,
                        standardization_settings = standardization_settings
                        )

We then design a loop such that the influence of a parameter of choice can be evaluated:

In [None]:
settings = {'hid_size': 16,
            'num_levels': 4,
            'learning_rate': 0.001}

variational_parameter = 'hid_size'
parameter_values = [4,8,16,32]

for value in parameter_values:

    settings[variational_parameter] = value

    results = train_scinet( X_train = data_proc["X_train"].astype('float32'),
                            y_train = data_proc["y_train"].astype('float32'),
                            X_val = data_proc["X_val"].astype('float32'),
                            y_val = data_proc["y_val"].astype('float32'),
                            X_test = data_proc["X_test"].astype('float32'),
                            y_test = data_proc["y_test"].astype('float32'),
                            epochs = 1,
                            batch_size = 128,
                            X_LEN = X_len,
                            Y_LEN = [Y_len],
                            output_dim = [data_proc["X_train"].shape[2]],
                            selected_columns = [[]],
                            hid_size= settings['hid_size'],
                            num_levels= settings['num_levels'],
                            kernel = 5,
                            dropout = .5,
                            loss_weights= [1],
                            learning_rate = settings['learning_rate'],
                            probabilistic = False)

    # Save relative to the working directory, matching the path used when loading
    results[0].save_weights('exp/hyperparams/saved_models/model_{}_{}.h5'.format(variational_parameter, value))
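
The one-parameter-at-a-time loop above can also be written as a small reusable helper. The sketch below is an assumption-laden illustration: `sweep_parameter` and `toy_validation_loss` are hypothetical names, and the toy loss merely stands in for training SCINet and reading off a validation loss.

```python
def sweep_parameter(base_settings, name, values, evaluate):
    """Call evaluate(settings) once for every value of a single hyperparameter."""
    scores = {}
    for value in values:
        settings = dict(base_settings)  # copy so the baseline stays untouched
        settings[name] = value
        scores[value] = evaluate(settings)
    return scores

def toy_validation_loss(settings):
    # Hypothetical stand-in for "train a model and return its validation loss"
    return (settings['hid_size'] - 16) ** 2 / 256 + settings['learning_rate']

base = {'hid_size': 16, 'num_levels': 4, 'learning_rate': 0.001}
scores = sweep_parameter(base, 'hid_size', [4, 8, 16, 32], toy_validation_loss)
best_value = min(scores, key=scores.get)

print(scores)
print('best hid_size:', best_value)
```

Copying the settings for every value keeps the baseline intact, so several sweeps can start from the same reference configuration.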


Next, we will evaluate the performance of the SCINets trained with different hyperparameters. Beware that all models are loaded here, so make sure they exist.

In [None]:
from sklearn.metrics import mean_absolute_error as mae
from base.SCINet import scinet_builder

settings = {'hid_size': 16,
            'num_levels': 4,
            'learning_rate': 0.001}


maes = []

variational_parameter = 'hid_size'
parameter_values = [4,8,16,32]

for value in parameter_values:

    settings[variational_parameter] = value  # the key is the parameter name, not the list of values

    model = scinet_builder(
                    output_len=  [Y_len],
                    input_len = X_len,
                    output_dim = [data_proc["X_train"].shape[2]],
                    input_dim = data_proc["X_train"].shape[2],
                    selected_columns = [[0]], 
                    loss_weights = [1],
                    hid_size = settings['hid_size'],
                    num_levels = settings['num_levels'],
                    kernel = 5,
                    dropout = .5,
                    learning_rate = settings['learning_rate'],)

    model.load_weights('exp/hyperparams/saved_models/model_{}_{}.h5'.format(variational_parameter,value))
    prediction = model.predict(data_proc['X_test'])
    
    maes.append(mae(data_proc['y_test'],prediction))
    
plot_barplot(variational_parameter, parameter_values ,maes)

## Training

Then the model can be trained. First, the hyperparameters that are not being optimized are defined.

In [7]:
EPOCHS = 10
BATCH_SIZE = 8
HID_SIZE = 4
NUM_LEVELS = 3
KERNEL_SIZE = 5
DROPOUT = 0.5
PROBABILISTIC = False

Then some values of the parameter to be tuned (in this case the learning rate) are defined. For each value of this parameter a model is trained and its performance on the validation set is saved for plotting later on.

In [8]:
LEARNING_RATES = [0.001]

train_losses = np.zeros((len(LEARNING_RATES), EPOCHS))
val_losses = np.zeros((len(LEARNING_RATES), EPOCHS))
for idx, LEARNING_RATE in enumerate(LEARNING_RATES):

    model, history, X_train , y_train, X_val, y_val, X_test, y_test = train_scinet( X_train = data_proc["X_train"].astype('float32'),
                                                                                    y_train = data_proc["y_train"].astype('float32'),
                                                                                    X_val = data_proc["X_val"].astype('float32'),
                                                                                    y_val = data_proc["y_val"].astype('float32'),
                                                                                    X_test = data_proc["X_test"].astype('float32'),
                                                                                    y_test = data_proc["y_test"].astype('float32'),
                                                                                    epochs = EPOCHS,
                                                                                    batch_size = BATCH_SIZE,
                                                                                    X_LEN = X_len,
                                                                                    Y_LEN = [Y_len],
                                                                                    output_dim = [data_proc["X_train"].shape[2]],
                                                                                    selected_columns = None,
                                                                                    hid_size= HID_SIZE,
                                                                                    num_levels= NUM_LEVELS,
                                                                                    kernel = KERNEL_SIZE,
                                                                                    dropout = DROPOUT,
                                                                                    loss_weights= [1],
                                                                                    learning_rate = LEARNING_RATE,
                                                                                    probabilistic = PROBABILISTIC)

    train_loss = history.history['loss']
    train_losses[idx] = train_loss

    val_loss = history.history['val_loss']
    val_losses[idx] = val_loss
    
    #model.save(f'saved_models/model_learning_rate_{LEARNING_RATE}')

Initializing training with data:
X_train: (34925, 48, 7), y_train: (34925, 24, 7)
X_val: (7428, 48, 7), y_val: (7428, 24, 7)
X_test: (7428, 48, 7), y_test: (7428, 24, 7)
Building model...
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 48, 7)]           0         
                                                                 
 Block_0 (SCINet)            (None, 24, 7)             97332     
                                                                 
Total params: 97,332
Trainable params: 97,332
Non-trainable params: 0
_________________________________________________________________
None
Is null X: 0
Is null y: 0
Epoch 1/10
Epoch 2/10
Epoch 3/10

KeyboardInterrupt: 

## Plotting

Here the performance of each model on the validation set is compared using a plot. The hyperparameter value of the model with the lowest loss on the validation set can be selected as the optimal value.

In [7]:

from utils.plotting import plot_barplot


hyperparameter_type='LearningRate'



plot_barplot(LEARNING_RATES, val_losses, hyperparameter_type)

ModuleNotFoundError: No module named 'utils.plotting'
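
Independently of the plot, selecting the value with the lowest validation loss can be sketched in a few lines. The array layout mirrors 'val_losses' from the training cell (one row per learning rate, one column per epoch), but the numbers below are made up for illustration, and taking the loss of the final epoch is just one possible choice.

```python
import numpy as np

learning_rates = [0.01, 0.001, 0.0001]

# Made-up validation losses (not real results):
# one row per learning rate, one column per epoch
val_losses = np.array([[0.90, 0.70, 0.65],
                       [0.80, 0.50, 0.40],
                       [0.95, 0.85, 0.80]])

final_losses = val_losses[:, -1]           # validation loss after the last epoch
best_idx = int(np.argmin(final_losses))    # row with the lowest final loss
best_learning_rate = learning_rates[best_idx]

print('best learning rate:', best_learning_rate)
```

Using `val_losses.min(axis=1)` instead of the last column would pick the best epoch per model, which corresponds to an early-stopping style comparison.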