<a href="https://colab.research.google.com/github/Rezaie/Kire_mama/blob/master/tansformer_optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Multivariate time series prediction using transformer with hyperparameter optimization

In this notebook, we use **Optuna** to search for good hyperparameter values. Specifically, we tune the **learning rate, weight decay, positional encoding dropout** and **encoder layer dropout**.

Optuna is a Python package designed specifically for hyperparameter tuning. We define a range of possible values for each hyperparameter, and Optuna trains the model with different combinations of values over a specified number of trials, looking for the combination that minimizes the validation loss.
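
To make the idea concrete, the search loop that Optuna automates can be sketched in plain Python. This is an illustration only: it uses simple random search rather than Optuna's TPE sampler, and `toy_validation_loss` is a made-up stand-in for "train the model and measure the validation loss". The learning rate is sampled log-uniformly because its plausible values span several orders of magnitude.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Sample so that every order of magnitude in [low, high] is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def toy_validation_loss(lr, dropout):
    """Hypothetical stand-in for 'train the model, return the validation loss'."""
    return (math.log10(lr) + 3.0) ** 2 + (dropout - 0.2) ** 2

def random_search(n_trials, seed=0):
    """Try n_trials random configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float('inf')
    for _ in range(n_trials):
        params = {
            'lr': sample_log_uniform(rng, 1e-4, 1e-2),
            'dropout': rng.uniform(0.05, 0.3),
        }
        loss_value = toy_validation_loss(**params)
        if loss_value < best_loss:
            best_params, best_loss = params, loss_value
    return best_params, best_loss

best_params, best_loss = random_search(n_trials=20)
print(best_params, best_loss)
```

Optuna follows the same propose, evaluate, keep-the-best pattern, but it uses the results of earlier trials to decide which values to try next.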


In [None]:
!pip install optuna

Collecting optuna
  Downloading optuna-4.4.0-py3-none-any.whl.metadata (17 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.16.4-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.4.0-py3-none-any.whl (395 kB)
Downloading alembic-1.16.4-py3-none-any.whl (247 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, alembic, optuna
Successfully installed alembic-1.16.4 colorlog-6.9.0 optuna-4.4.0


In [None]:
import torch
import numpy as np
import math

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import optuna

In [None]:
path = 'final_data.csv'

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# For reproducibility, fix the random seeds so that the model parameters
# are initialized identically across all experiments.
torch.manual_seed(42)

# If you are using CUDA, also set the seed for it
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)
    torch.cuda.manual_seed_all(42)

# Set the seed for NumPy
np.random.seed(42)

Using device: cpu


Here we define **RiverData**, a custom Dataset class for our data. It extends the PyTorch Dataset class.  
- We define the \_\_init__() function, which stores the already-loaded DataFrame and the window settings, and sets the date column as the index. It can optionally be used for further preprocessing.
- Then we define the \_\_len__() function, which returns the number of samples in the dataset.
- Finally we define the \_\_getitem__() function, which returns one (feature, label) tuple for model training.
  For our time series data, the feature is the window of past values fed to the model and the label is the future values to be predicted.
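
The windowing done by \_\_getitem__() can be sketched without pandas or PyTorch. The helper `window_bounds` and the 20-row dataset size below are made up for illustration; the length formula mirrors the one used in \_\_len__().

```python
def window_bounds(idx, seq_len, pred_len):
    """Return (feature_rows, label_rows) as half-open (start, stop) row ranges."""
    feature_rows = (idx, idx + seq_len)
    label_rows = (idx + seq_len, idx + seq_len + pred_len)
    return feature_rows, label_rows

n_rows, seq_len, pred_len = 20, 13, 1      # 20 rows is a made-up example size
n_samples = n_rows - seq_len - pred_len    # same formula as __len__()

for idx in (0, n_samples - 1):
    print(idx, window_bounds(idx, seq_len, pred_len))
# 0 ((0, 13), (13, 14))
# 5 ((5, 18), (18, 19))
```

Sample 0 uses rows 0 to 12 as the feature and row 13 as the label. With this length formula the last sample's label is row 18, so the final row (19) is never used.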

In [None]:
class RiverData(torch.utils.data.Dataset):

    def __init__(self, df, target, datecol, seq_len, pred_len):
        self.df = df
        self.datecol = datecol
        self.target = target
        self.seq_len = seq_len
        self.pred_len = pred_len
        self.setIndex()


    def setIndex(self):
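        # Use the date column as the index so that only feature columns remain as values.
        # Assigning the result (instead of inplace=True) avoids modifying the slice passed in by the caller.
        self.df = self.df.set_index(self.datecol)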


    def __len__(self):
        return len(self.df) - self.seq_len - self.pred_len


    def __getitem__(self, idx):
        if len(self.df) <= (idx + self.seq_len+self.pred_len):
            raise IndexError(f"Index {idx} is out of bounds for dataset of size {len(self.df)}")
        df_piece = self.df[idx:idx+self.seq_len].values
        feature = torch.tensor(df_piece, dtype=torch.float32)
        label_piece = self.df[self.target][idx + self.seq_len:  idx+self.seq_len+self.pred_len].values
        label = torch.tensor(label_piece, dtype=torch.float32)
        return (feature, label)

### Normalize the data
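
Min-max scaling maps each column linearly onto the range [0, 1], so that features with large raw values do not dominate training. The following is a plain-Python sketch of that idea, not scikit-learn's implementation; the helper names and the sample values are made up.

```python
def min_max_fit(column):
    """Learn the minimum and maximum of one column."""
    return min(column), max(column)

def min_max_transform(column, col_min, col_max):
    """Map values linearly so that col_min -> 0.0 and col_max -> 1.0.

    Assumes col_max > col_min (a constant column would divide by zero).
    """
    span = col_max - col_min
    return [(value - col_min) / span for value in column]

gauge_height = [2.0, 4.0, 3.0, 6.0]   # made-up values
col_min, col_max = min_max_fit(gauge_height)
scaled = min_max_transform(gauge_height, col_min, col_max)
print(scaled)  # [0.0, 0.5, 0.25, 1.0]
```

A scaled prediction can be mapped back to the original units with `value * (col_max - col_min) + col_min`.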

In [None]:
df = pd.read_csv(path)
raw_df = df.drop('DATE', axis=1, inplace=False)
scaler = MinMaxScaler()

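# Fit the scaler and scale every column to the range [0, 1].
# Note: the scaler is fitted on the full dataset here. Fitting it on the training split only
# would avoid leaking validation/test statistics into training.
df_scaled = scaler.fit_transform(raw_df)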

df_scaled = pd.DataFrame(df_scaled, columns=raw_df.columns)
df_scaled['DATE'] = df['DATE']
df = df_scaled

Some advanced Python syntax is used here. \
`*common_args`: unpacks a Python list into the positional arguments of a function call. \
`**common_args`: unpacks a Python dictionary into the keyword arguments of a function call.
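
A minimal illustration of both forms of unpacking. The function `describe` is hypothetical and exists only to show how the arguments arrive.

```python
def describe(target, datecol, seq_len, pred_len, batch_size=1, shuffle=True):
    """Hypothetical function used only to show how arguments arrive."""
    return (target, datecol, seq_len, pred_len, batch_size, shuffle)

positional = ['gauge_height', 'DATE', 13, 1]
keyword = {'batch_size': 128, 'shuffle': False}

# Equivalent to describe('gauge_height', 'DATE', 13, 1, batch_size=128, shuffle=False)
result = describe(*positional, **keyword)
print(result)  # ('gauge_height', 'DATE', 13, 1, 128, False)
```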

In [None]:

train_size = int(0.7 * len(df))
test_size = int(0.2 * len(df))
val_size = len(df) - train_size - test_size

seq_len = 13
pred_len = 1
num_features = 7
num_layers = 1


common_args = ['gauge_height', 'DATE', seq_len, pred_len]
train_dataset = RiverData(df[:train_size], *common_args)
val_dataset = RiverData(df[train_size: train_size+val_size], *common_args)
test_dataset = RiverData(df[train_size+val_size : len(df)], *common_args)


In [None]:
# Important parameters

BATCH_SIZE = 128 # keep as big as can be handled by GPU and memory
SHUFFLE = False # we don't shuffle the time series data
DATA_LOAD_WORKERS = 1 # it depends on amount of data you need to load
learning_rate = 1e-3

In [None]:
from torch.utils.data import DataLoader

common_args = {'batch_size': BATCH_SIZE, 'shuffle': SHUFFLE}
train_loader = DataLoader(train_dataset, **common_args)
val_loader = DataLoader(val_dataset, **common_args)
test_loader = DataLoader(test_dataset, **common_args)

### Here we define our PyTorch model.

BasicTransformerNetwork is the model class. It extends the Module class provided by PyTorch.
- We define the \_\_init__() function, which sets up the layers and defines the model parameters.
- We also define the forward() function, which defines how the forward pass is computed.
- We also implement the PositionalEncoding class, which is an important part of the transformer.
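
The sinusoidal pattern used for positional encoding can be sketched in plain Python. This is a simplified illustration built on nested lists rather than tensors; the function name and the tiny sizes are chosen for the example only.

```python
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table: sine in even columns, cosine in odd columns."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Frequency term exp(i * (-ln(10000) / d_model)) shrinks as i grows
            angle = pos * math.exp(i * (-math.log(10000.0) / d_model))
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table

pe = positional_encoding(max_len=4, d_model=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
```

Each position gets a distinct vector, and the low-index columns oscillate faster than the high-index ones, which gives the model both fine and coarse position information.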

In [None]:
# PyTorch's transformer layers do not include positional encoding,
# which is an essential part of the transformer model, so we implement it here.
# Self-attention has no built-in notion of order, so a fixed sinusoidal pattern is
# added to each embedding to tell the model where in the sequence a time step sits.
class PositionalEncoding(torch.nn.Module):
    def __init__(self, d_model, pos_enc_dropout, max_len=5000):
        super().__init__()
        self.dropout = torch.nn.Dropout(p=pos_enc_dropout)

        Xp = torch.zeros(max_len, d_model) # max_len x d_model
        position = torch.arange(0, max_len).unsqueeze(1) # max_len x 1

        # Generates an exponentially decreasing series of numbers
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)) #length: d_model/2

        #Applying sine to even indices in the array; 2i
        Xp[:, 0::2] = torch.sin(position.float() * div_term)

        #Applying cosine to odd indices in the array; 2i + 1
        Xp[:, 1::2] = torch.cos(position.float() * div_term)

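        # Shape 1 x max_len x d_model so that it broadcasts over the batch dimension
        # (the encoder uses batch_first=True, so inputs are batch x seq_len x d_model).
        Xp = Xp.unsqueeze(0)
        self.register_buffer('Xp', Xp)

    def forward(self, x):
        # x: batch x seq_len x d_model; add the encoding of each sequence position
        x = x + self.Xp[:, :x.size(1)]
        return self.dropout(x)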



class BasicTransformerNetwork(torch.nn.Module):

    def __init__(self, seq_len, pred_len, enc_layer_dropout, pos_enc_dropout):
        # call the constructor of the base class
        super().__init__()
        self.model_type = 'Transformer'
        self.seq_len = seq_len
        self.pred_len = pred_len
        self.num_features = num_features  # read from the notebook-level global

        # The embedding size may be larger than necessary for this dataset; worth revisiting.
        self.embedding_size = 128  # each time step's features are projected to a 128-dimensional embedding
        self.num_layers = num_layers  # read from the notebook-level global
        self.pos_encoder = PositionalEncoding(self.embedding_size, pos_enc_dropout, 10000)

        # dim_feedforward is commonly set to 4 * d_model; here it is 256 (2 * d_model).
        # layer_norm_eps: a very small number (epsilon) added to the denominator during layer normalization for numerical stability.
        self.encLayer = torch.nn.TransformerEncoderLayer(d_model=self.embedding_size, nhead=8,
                                                 dim_feedforward=256, dropout=enc_layer_dropout, activation="relu",
                                                 layer_norm_eps=1e-05, batch_first=True)

        self.transformerEnc = torch.nn.TransformerEncoder(self.encLayer, num_layers=self.num_layers)

        self.input_fc = torch.nn.Linear(self.num_features, self.embedding_size)

        self.prediction_head = torch.nn.Linear(self.embedding_size, self.pred_len)

        # Create causal mask
        self.register_buffer('causal_mask', self._generate_causal_mask(seq_len))


    def _generate_causal_mask(self, seq_len):
        """
        Generate causal mask for transformer encoder.
        Returns upper triangular matrix with -inf in upper triangle (excluding diagonal)
        """
        mask = torch.triu(torch.full((seq_len, seq_len), float('-inf')), diagonal=1)
        return mask


    def forward(self, x):
        x = self.input_fc(x) * np.sqrt(self.embedding_size)
        x = self.pos_encoder(x)
        out = self.transformerEnc(x, mask=self.causal_mask)
        last_embedding = out[:, -1, :]
        prediction = self.prediction_head(last_embedding)
        prediction = prediction.squeeze(-1)
        return prediction
# Note: gradients are accumulated in the .grad attribute of each layer's parameters.
# They must be reset after every optimizer step (see optimizer.zero_grad() in the training loop).
In [None]:
print(torch.__version__)

2.2.1+cu121


In [None]:
loss = torch.nn.MSELoss()


In [None]:
for i, (f,l) in enumerate(train_loader):
    print('features shape: ', f.shape)
    print('labels shape: ', l.shape)
    break

features shape:  torch.Size([512, 13, 7])
labels shape:  torch.Size([512, 1])


In [None]:
# define metrics
import numpy as np
epsilon = np.finfo(float).eps

def wape_function(y, y_pred):
    """Weighted Average Percentage Error, in percent (0 is a perfect fit; values above 100 are possible)."""
    y = np.array(y)
    y_pred = np.array(y_pred)
    numerator = np.sum(np.abs(np.subtract(y, y_pred)))
    denominator = np.add(np.sum(np.abs(y)), epsilon)  # epsilon guards against division by zero
    wape = np.divide(numerator, denominator) * 100.0
    return wape

def nse_function(y, y_pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 is as good as always predicting the mean of y."""
    y = np.array(y)
    y_pred = np.array(y_pred)
    return (1-(np.sum((y_pred-y)**2)/np.sum((y-np.mean(y))**2)))


def evaluate_model(model, data_loader):
    # Switch the model to evaluation mode: dropout is disabled and normalization layers use their
    # inference behavior. The transformer uses dropout, so this is needed for a deterministic evaluation.

    model.eval()
    all_outputs = torch.empty(0, pred_len)
    all_labels = torch.empty(0, pred_len)
    for inputs, labels in data_loader:
        inputs = inputs.to(device)
        with torch.no_grad():
            outputs = model(inputs).detach().cpu().unsqueeze(1)
        all_outputs = torch.vstack((all_outputs, outputs))
        all_labels = torch.vstack((all_labels, labels))

    # Convert to a plain Python float, which is easier to log and to return to Optuna
    avg_val_loss = loss(all_outputs, all_labels).item()
    nse = nse_function(all_labels.numpy(), all_outputs.numpy())
    wape = wape_function(all_labels.numpy(), all_outputs.numpy())

    print(f'NSE : {nse}', end=' ')
    print(f'WAPE : {wape}', end=' ')
    print(f'Validation Loss: {avg_val_loss}')
    model.train()
    return avg_val_loss
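
For reference, the two metrics computed above can be written as follows, where $y_t$ are the observed values, $\hat{y}_t$ the predictions and $\bar{y}$ the mean of the observations. The small `epsilon` added to the WAPE denominator in the code is omitted here.

$$
\mathrm{WAPE} = 100 \cdot \frac{\sum_t |y_t - \hat{y}_t|}{\sum_t |y_t|},
\qquad
\mathrm{NSE} = 1 - \frac{\sum_t (\hat{y}_t - y_t)^2}{\sum_t (y_t - \bar{y})^2}
$$

As a small worked example with made-up numbers, take $y = (1, 2, 3)$ and $\hat{y} = (1, 2, 4)$, so $\bar{y} = 2$:

$$
\mathrm{WAPE} = 100 \cdot \frac{0 + 0 + 1}{1 + 2 + 3} \approx 16.7,
\qquad
\mathrm{NSE} = 1 - \frac{0 + 0 + 1}{1 + 0 + 1} = 0.5
$$

An NSE of 1 is a perfect fit, 0 means the model is no better than always predicting the mean, and negative values mean it is worse than that baseline.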


In [None]:
from optuna.samplers import TPESampler
def objective(trial):
    # Define the search space of the hyperparameters. Optuna's TPE sampler uses a Bayesian
    # optimization approach to propose promising values within these ranges.
    # suggest_float replaces the deprecated suggest_loguniform / suggest_uniform methods.
    learning_rate = trial.suggest_float('lr', 1e-4, 1e-2, log=True)
    weight_decay = trial.suggest_float('weight_decay', 1e-5, 1e-2, log=True)
    pos_enc_dropout = trial.suggest_float('pos_enc_dropout', 0.05, 0.3)
    enc_layer_dropout = trial.suggest_float('enc_layer_dropout', 0.1, 0.5)


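    # Pass the dropout rates by keyword: the constructor expects enc_layer_dropout before pos_enc_dropout.
    model = BasicTransformerNetwork(seq_len, pred_len,
                                    enc_layer_dropout=enc_layer_dropout,
                                    pos_enc_dropout=pos_enc_dropout)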
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate, weight_decay=weight_decay)

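    num_epochs = 10
    best_val_loss = float('inf')
    patience = 1  # stop after this many consecutive epochs without improvement
    epochs_no_improve = 0  # initialized here so it is always defined before it is incremented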

    for epoch in range(num_epochs):
        model.train()
        epoch_loss = []
        for batch_idx, (inputs, labels) in enumerate(train_loader):
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs).unsqueeze(1)
            loss_val = loss(outputs, labels)

            # calculate gradients for back propagation
            loss_val.backward()

            # update the weights based on the gradients
            optimizer.step()

            # reset the gradients, avoid gradient accumulation
            optimizer.zero_grad()
            epoch_loss.append(loss_val.item())

        avg_train_loss = sum(epoch_loss)/len(epoch_loss)
        print(f'Epoch {epoch+1}: Training Loss: {avg_train_loss}', end=' ')
        avg_val_loss = evaluate_model(model, val_loader)

        # Check for improvement
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            epochs_no_improve = 0
            # Save the best model
            torch.save(model.state_dict(), 'best_model_trial.pth')
        else:
            epochs_no_improve += 1
            if epochs_no_improve == patience:
                print('Early stopping!')
                # Load the best model before stopping
                model.load_state_dict(torch.load('best_model_trial.pth'))
                break

        # Report intermediate objective value
        trial.report(best_val_loss, epoch)

        # Handle pruning based on the intermediate value
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return best_val_loss

# Default sampler is TPESampler (Tree-structured Parzen Estimator).
# This sampler is based on independent sampling and uses a Bayesian optimization approach to efficiently explore
# the hyperparameter search space by building probability models of objective values.

study = optuna.create_study(direction='minimize', sampler=TPESampler())

# In practice you would typically run hundreds of trials.
study.optimize(objective, n_trials=20)

print('Number of finished trials:', len(study.trials))
print('Best trial:')
trial = study.best_trial

print('  Value (Best Validation Loss):', trial.value)
print('  Params:')
for key, value in trial.params.items():
    print(f'    {key}: {value}')


[I 2024-11-19 17:19:04,438] A new study created in memory with name: no-name-2213175a-c3b5-4b56-bf90-164e77b99d0f


Epoch 1: Traning Loss: 0.02526580304560975 NSE : -2.6605618000030518 WAPE : 117.91830153052135 Validation Loss: 0.06249501183629036
Epoch 2: Traning Loss: 0.01687834267906353 NSE : -0.08283412456512451 WAPE : 59.316866371845336 Validation Loss: 0.018486706539988518
Epoch 3: Traning Loss: 0.017058147678889222 

[I 2024-11-19 17:19:30,536] Trial 0 finished with value: 0.018486706539988518 and parameters: {'lr': 0.009710522346682626, 'weight_decay': 8.829487000121852e-05, 'pos_enc_dropout': 0.15459847164742846, 'enc_layer_dropout': 0.41149718189368745}. Best is trial 0 with value: 0.018486706539988518.


NSE : -0.22743380069732666 WAPE : 65.16451114665495 Validation Loss: 0.020955391228199005
Early stopping!




Epoch 1: Traning Loss: 0.04963418806201027 NSE : -4.414577960968018 WAPE : 145.94704308856683 Validation Loss: 0.09244049340486526
Epoch 2: Traning Loss: 0.016938189946074832 NSE : -0.0033941268920898438 WAPE : 51.86400010423687 Validation Loss: 0.017130468040704727
Epoch 3: Traning Loss: 0.015962898387304192 

[I 2024-11-19 17:19:55,149] Trial 1 finished with value: 0.017130468040704727 and parameters: {'lr': 0.00959028332535089, 'weight_decay': 0.00025349829631454116, 'pos_enc_dropout': 0.09234553759662469, 'enc_layer_dropout': 0.43978325259555706}. Best is trial 1 with value: 0.017130468040704727.


NSE : -0.4637782573699951 WAPE : 72.45283457396346 Validation Loss: 0.02499038726091385
Early stopping!
Epoch 1: Traning Loss: 0.026059553017697964 NSE : -1.1913225650787354 WAPE : 91.19737182209582 Validation Loss: 0.0374113954603672
Epoch 2: Traning Loss: 0.014388464744614088 NSE : -0.5602973699569702 WAPE : 77.03471440152185 Validation Loss: 0.0266382098197937
Epoch 3: Traning Loss: 0.009605657655585008 NSE : 0.5431891083717346 WAPE : 39.05955623262572 Validation Loss: 0.007798912934958935
Epoch 4: Traning Loss: 0.005576367418489745 

[I 2024-11-19 17:20:28,084] Trial 2 finished with value: 0.007798912934958935 and parameters: {'lr': 0.0004627292644511695, 'weight_decay': 0.0002285926689252311, 'pos_enc_dropout': 0.13278603860443067, 'enc_layer_dropout': 0.4602154056633878}. Best is trial 2 with value: 0.007798912934958935.


NSE : 0.426383376121521 WAPE : 42.19183476623824 Validation Loss: 0.009793080389499664
Early stopping!
Epoch 1: Traning Loss: 0.020971636429693796 NSE : -1.461517095565796 WAPE : 96.21684404543319 Validation Loss: 0.04202430322766304
Epoch 2: Traning Loss: 0.01558543510645212 NSE : -0.7702598571777344 WAPE : 81.70125147639563 Validation Loss: 0.03022279590368271
Epoch 3: Traning Loss: 0.010361221221838203 NSE : 0.49836844205856323 WAPE : 41.10559671034051 Validation Loss: 0.008564114570617676
Epoch 4: Traning Loss: 0.004785359134486295 NSE : 0.7702738344669342 WAPE : 25.32768587794566 Validation Loss: 0.00392200518399477
Epoch 5: Traning Loss: 0.0033664225646753226 

[I 2024-11-19 17:21:09,205] Trial 3 finished with value: 0.00392200518399477 and parameters: {'lr': 0.0004011011231298408, 'weight_decay': 7.927339617381719e-05, 'pos_enc_dropout': 0.20359146672452283, 'enc_layer_dropout': 0.21567483436359192}. Best is trial 3 with value: 0.00392200518399477.


NSE : 0.7633612900972366 WAPE : 24.957746208559676 Validation Loss: 0.004040019121021032
Early stopping!
Epoch 1: Traning Loss: 0.019950345940982477 NSE : -2.238368272781372 WAPE : 110.75636531644697 Validation Loss: 0.055287111550569534
Epoch 2: Traning Loss: 0.020843465119183092 NSE : -1.67460036277771 WAPE : 99.85345210702874 Validation Loss: 0.04566216468811035
Epoch 3: Traning Loss: 0.017882670460979164 NSE : -0.20463204383850098 WAPE : 51.03282546027976 Validation Loss: 0.020566105842590332
Epoch 4: Traning Loss: 0.014825216950706873 NSE : -0.10779905319213867 WAPE : 50.29839566769002 Validation Loss: 0.018912922590970993
Epoch 5: Traning Loss: 0.015162449618470547 NSE : -0.10207593441009521 WAPE : 50.292637988848064 Validation Loss: 0.018815215677022934
Epoch 6: Traning Loss: 0.01523772915494321 NSE : -0.09840452671051025 WAPE : 50.293122197313366 Validation Loss: 0.018752532079815865
Epoch 7: Traning Loss: 0.015105192755174062 

[I 2024-11-19 17:22:06,690] Trial 4 finished with value: 0.018752532079815865 and parameters: {'lr': 0.002467172895532683, 'weight_decay': 0.00010992903999061927, 'pos_enc_dropout': 0.29392054943457946, 'enc_layer_dropout': 0.2617260318544786}. Best is trial 3 with value: 0.00392200518399477.


NSE : -0.10489296913146973 WAPE : 50.29801710470806 Validation Loss: 0.018863309174776077
Early stopping!
Epoch 1: Traning Loss: 0.019332222694807052 NSE : -1.3092961311340332 WAPE : 92.42037675996572 Validation Loss: 0.039425499737262726
Epoch 2: Traning Loss: 0.015417831471966353 

[I 2024-11-19 17:22:23,131] Trial 5 pruned. 


NSE : -0.7202922105789185 WAPE : 79.15980851403994 Validation Loss: 0.02936972863972187
Epoch 1: Traning Loss: 0.018195879468363402 NSE : -2.0512161254882812 WAPE : 107.16848623404135 Validation Loss: 0.05209195241332054
Epoch 2: Traning Loss: 0.017421404407111464 

[I 2024-11-19 17:22:39,563] Trial 6 pruned. 


NSE : -1.6650009155273438 WAPE : 99.71624503554266 Validation Loss: 0.045498281717300415
Epoch 1: Traning Loss: 0.016603792007648513 NSE : -1.281212329864502 WAPE : 92.65081597049814 Validation Loss: 0.03894604369997978
Epoch 2: Traning Loss: 0.01440567997618706 NSE : -0.2599024772644043 WAPE : 68.07679176500501 Validation Loss: 0.02150970697402954
Epoch 3: Traning Loss: 0.008887054433521394 NSE : 0.7162939310073853 WAPE : 28.47475257827802 Validation Loss: 0.004843578208237886
Epoch 4: Traning Loss: 0.0037551364429451073 NSE : 0.7737729549407959 WAPE : 23.766421710005968 Validation Loss: 0.0038622659631073475
Epoch 5: Traning Loss: 0.0031668238027427855 

[I 2024-11-19 17:23:20,665] Trial 7 finished with value: 0.0038622659631073475 and parameters: {'lr': 0.0005950865555327729, 'weight_decay': 0.0001651786938362955, 'pos_enc_dropout': 0.11007064752567254, 'enc_layer_dropout': 0.1947265400415874}. Best is trial 7 with value: 0.0038622659631073475.


NSE : 0.7404330968856812 WAPE : 25.383519515890136 Validation Loss: 0.004431460984051228
Early stopping!
Epoch 1: Traning Loss: 0.019630525303711444 NSE : -1.2078213691711426 WAPE : 91.89380445584476 Validation Loss: 0.0376930758357048
Epoch 2: Traning Loss: 0.014951303791547865 

[I 2024-11-19 17:23:37,046] Trial 8 pruned. 


NSE : -1.0560908317565918 WAPE : 89.03857680039272 Validation Loss: 0.03510265424847603
Epoch 1: Traning Loss: 0.038268693547866064 

[I 2024-11-19 17:23:45,332] Trial 9 pruned. 


NSE : -3.0509161949157715 WAPE : 127.9386435683382 Validation Loss: 0.06915935128927231
Epoch 1: Traning Loss: 0.01977360012425762 NSE : -0.4256035089492798 WAPE : 72.73078784062746 Validation Loss: 0.024338649585843086
Epoch 2: Traning Loss: 0.014318636849021472 NSE : -0.06326889991760254 WAPE : 62.67010243386064 Validation Loss: 0.01815267838537693
Epoch 3: Traning Loss: 0.01100135541658111 

[I 2024-11-19 17:24:09,935] Trial 10 pruned. 


NSE : 0.34411829710006714 WAPE : 48.90324581661494 Validation Loss: 0.01119755394756794
Epoch 1: Traning Loss: 0.028589519468812167 NSE : -0.6511213779449463 WAPE : 78.2417316562465 Validation Loss: 0.028188802301883698
Epoch 2: Traning Loss: 0.01573601959812157 NSE : -0.18914413452148438 WAPE : 66.0924042306086 Validation Loss: 0.020301686599850655
Epoch 3: Traning Loss: 0.012110012208778165 NSE : 0.19674944877624512 WAPE : 53.870956155011505 Validation Loss: 0.013713511638343334


[I 2024-11-19 17:24:36,326] Trial 11 pruned. 


Epoch 1: Traning Loss: 0.019434022550532584 NSE : -1.3488309383392334 WAPE : 94.11339845320926 Validation Loss: 0.04010046273469925
Epoch 2: Traning Loss: 0.014339354662768316 NSE : -0.1458975076675415 WAPE : 64.84525929979583 Validation Loss: 0.019563360139727592
Epoch 3: Traning Loss: 0.009131759358451792 

[I 2024-11-19 17:25:01,610] Trial 12 pruned. 


NSE : 0.2692875862121582 WAPE : 48.76901882813804 Validation Loss: 0.012475104071199894
Epoch 1: Traning Loss: 0.024047508215304787 

[I 2024-11-19 17:25:09,817] Trial 13 pruned. 


NSE : -1.8886244297027588 WAPE : 104.09810274797988 Validation Loss: 0.049316104501485825
Epoch 1: Traning Loss: 0.01431838270743037 NSE : -1.3968610763549805 WAPE : 94.24099178571151 Validation Loss: 0.04092046245932579
Epoch 2: Traning Loss: 0.016574616530947956 NSE : -0.046845436096191406 WAPE : 57.42133989462215 Validation Loss: 0.017872290685772896
Epoch 3: Traning Loss: 0.01621873386181787 

[I 2024-11-19 17:25:34,466] Trial 14 finished with value: 0.017872290685772896 and parameters: {'lr': 0.002370441266274233, 'weight_decay': 4.1407921189782365e-05, 'pos_enc_dropout': 0.2544278343992412, 'enc_layer_dropout': 0.3226205188745241}. Best is trial 7 with value: 0.0038622659631073475.


NSE : -0.08631289005279541 WAPE : 50.31096748020731 Validation Loss: 0.018546098843216896
Early stopping!
Epoch 1: Traning Loss: 0.0191222709759616 NSE : 0.17147666215896606 WAPE : 54.29679549076904 Validation Loss: 0.014144981279969215
Epoch 2: Traning Loss: 0.011029868673883038 NSE : 0.17495042085647583 WAPE : 54.56627070739511 Validation Loss: 0.014085676521062851
Epoch 3: Traning Loss: 0.00878416749218545 

[I 2024-11-19 17:25:59,109] Trial 15 pruned. 


NSE : 0.29518556594848633 WAPE : 50.09412132186446 Validation Loss: 0.012032958678901196
Epoch 1: Traning Loss: 0.027422770276325305 

[I 2024-11-19 17:26:07,308] Trial 16 pruned. 


NSE : -1.6002135276794434 WAPE : 99.40964423531334 Validation Loss: 0.04439220204949379
Epoch 1: Traning Loss: 0.018561938666469102 NSE : -0.11188352108001709 WAPE : 62.233975869163 Validation Loss: 0.018982650712132454
Epoch 2: Traning Loss: 0.016781917063467357 NSE : 0.03336364030838013 WAPE : 57.55290373653988 Validation Loss: 0.01650291681289673
Epoch 3: Traning Loss: 0.014454824327571591 

[I 2024-11-19 17:26:31,926] Trial 17 pruned. 


NSE : 0.11953943967819214 WAPE : 55.82810916418224 Validation Loss: 0.015031680464744568
Epoch 1: Traning Loss: 0.024789026397077113 NSE : 0.05429583787918091 WAPE : 54.14845162458103 Validation Loss: 0.016145553439855576
Epoch 2: Traning Loss: 0.018053230623889544 

[I 2024-11-19 17:26:48,337] Trial 18 finished with value: 0.016145553439855576 and parameters: {'lr': 0.00022163253578322888, 'weight_decay': 0.0004269280238541515, 'pos_enc_dropout': 0.18892590096062278, 'enc_layer_dropout': 0.33171824077582357}. Best is trial 7 with value: 0.0038622659631073475.


NSE : -0.039455533027648926 WAPE : 49.55015713004888 Validation Loss: 0.017746126279234886
Early stopping!
Epoch 1: Traning Loss: 0.018440998612600498 NSE : -0.2747478485107422 WAPE : 51.922105120073134 Validation Loss: 0.021763157099485397
Epoch 2: Traning Loss: 0.01293874411905175 

[I 2024-11-19 17:27:04,721] Trial 19 finished with value: 0.021763157099485397 and parameters: {'lr': 0.0007367874301143976, 'weight_decay': 5.759965462648816e-05, 'pos_enc_dropout': 0.1237335792328222, 'enc_layer_dropout': 0.21557865621541583}. Best is trial 7 with value: 0.0038622659631073475.


NSE : -0.6221939325332642 WAPE : 78.81908422269223 Validation Loss: 0.02769494242966175
Early stopping!
Number of finished trials: 20
Best trial:
  Value (Best Validation Loss): 0.0038622659631073475
  Params:
    lr: 0.0005950865555327729
    weight_decay: 0.0001651786938362955
    pos_enc_dropout: 0.11007064752567254
    enc_layer_dropout: 0.1947265400415874
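
The early-stopping rule used inside `objective` can be replayed on its own to see why a trial ends where it does. The helper below is a simplified sketch: it ignores Optuna's pruning and uses `>=` for the patience check, which behaves the same as the notebook's `==` when `patience = 1`. The losses are those of trial 3 in the log above, rounded.

```python
def early_stopping_trace(val_losses, patience=1):
    """Replay per-epoch validation losses through the patience rule.

    Returns (best_loss, stopped_epoch); stopped_epoch is None if training ran to the end.
    """
    best_loss = float('inf')
    epochs_no_improve = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                return best_loss, epoch
    return best_loss, None

# Validation losses of trial 3 from the log above, rounded
trial_3 = [0.04202, 0.03022, 0.00856, 0.00392, 0.00404]
print(early_stopping_trace(trial_3))  # (0.00392, 5)
```

With `patience = 1`, a single epoch without improvement ends the trial, so the value reported for trial 3 is the epoch 4 loss. A larger patience would tolerate short plateaus at the cost of longer trials.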


In [None]:
# Plot the results with the metrics inside it

In [None]:
import optuna.visualization as vis

# Optimization history
fig1 = vis.plot_optimization_history(study)
fig1.write_html("optimization_history_transformer.html")