# TabNet CODE

### Introducing TabNet
---
* Ghost Batch Normalization (GBN)

* Sparsemax


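Ghost Batch Normalization computes batch-norm statistics over small "virtual" batches (the `virtual_batch_size` fit parameter) instead of over the whole batch. Below is a minimal PyTorch sketch, assuming the batch is split with `torch.chunk` into pieces of about `virtual_batch_size` rows that share one `nn.BatchNorm1d`. The class name `GhostBatchNorm` is our own, and this is not necessarily pytorch-tabnet's exact implementation.

```python
import math

import torch
from torch import nn


class GhostBatchNorm(nn.Module):
    """Batch norm applied separately to virtual batches of a large batch (sketch)."""

    def __init__(self, num_features: int, virtual_batch_size: int = 128, momentum: float = 0.02):
        super().__init__()
        self.virtual_batch_size = virtual_batch_size
        # One BatchNorm layer shared by all virtual batches; its running
        # statistics are updated once per virtual batch.
        self.bn = nn.BatchNorm1d(num_features, momentum=momentum)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Number of virtual batches, rounded up so that no row is dropped
        n_chunks = max(1, math.ceil(x.size(0) / self.virtual_batch_size))
        chunks = x.chunk(n_chunks, dim=0)
        # In training mode each chunk is normalized with its own mean/variance
        return torch.cat([self.bn(chunk) for chunk in chunks], dim=0)


torch.manual_seed(0)
gbn = GhostBatchNorm(num_features=4, virtual_batch_size=128)
gbn.train()
x = torch.randn(1024, 4) * 3 + 5
out = gbn(x)
print(out.shape)               # torch.Size([1024, 4])
print(out[:128].mean(dim=0))   # close to zero: the first virtual batch is normalized on its own
```

In eval mode the layer falls back to its running statistics, so the split only changes training-time behavior.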

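Sparsemax (Martins and Astudillo, 2016) projects a vector of logits onto the probability simplex, so unlike softmax it can give exactly zero weight to some features; TabNet uses it for its feature-selection masks. The function below is a minimal, forward-only sketch of the closed-form solution: sort the logits, find the support size k = max{k : 1 + k·z_(k) > Σ_{j≤k} z_(j)}, compute the threshold τ = (Σ_{j≤k} z_(j) − 1) / k, and return max(z − τ, 0). It relies on PyTorch autograd for gradients; the name `sparsemax` is ours.

```python
import torch


def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Sparsemax along `dim` (sketch; gradients come from autograd)."""
    # Sort logits in descending order along `dim`
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    # k = 1..K, shaped so it broadcasts along `dim`
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)
    z_cumsum = z_sorted.cumsum(dim)
    # Support condition: 1 + k * z_(k) > sum_{j<=k} z_(j)
    support = (1 + k * z_sorted) > z_cumsum
    k_max = support.sum(dim=dim, keepdim=True)  # support size (always >= 1)
    # Threshold tau = (sum of the k_max largest logits - 1) / k_max
    tau = (z_cumsum.gather(dim, k_max - 1) - 1) / k_max.to(z.dtype)
    return torch.clamp(z - tau, min=0)


print(sparsemax(torch.tensor([[1.0, 2.0, 3.0]])))  # tensor([[0., 0., 1.]])
print(sparsemax(torch.tensor([[0.5, 0.5]])))       # tensor([[0.5000, 0.5000]])
```

Each output row sums to 1, like softmax, but entries below the threshold are exactly zero.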
### Code sources
---
* TabNet PyTorch code: https://ichi.pro/ko/pytorcheseo-tabnet-guhyeon-277727554318969
* Sparsemax code: https://github.com/gokceneraslan/SparseMax.torch
* Paper (TabNet: Attentive Interpretable Tabular Learning): https://arxiv.org/pdf/1908.07442v4.pdf
* pytorch-tabnet (PyPI): https://pypi.org/project/pytorch-tabnet/
* pytorch-tabnet (blog post): https://wsshin.tistory.com/5
* pytorch-tabnet regressor (Kaggle notebook): https://www.kaggle.com/rapela/tps-02-21-tabnet-regressor


### Parameter Tuning
---
* Source: https://dreamquark-ai.github.io/tabnet/generated_docs/README.html#model-parameters

#### Model parameters

1. n_d : int (default = 8)
    - Width of the decision prediction layer. Bigger values give more capacity to the model, at the risk of overfitting. Values typically range from 8 to 64.
<br/> 
2. n_a : int (default = 8)
    - Width of the attention embedding for each mask. According to the paper, n_d = n_a is usually a good choice.
<br/> 
3. n_steps : int (default = 3)
    - Number of steps in the architecture (usually between 3 and 10).
<br/> 
4. gamma : float (default = 1.3)
    - Coefficient for feature reuse in the masks. A value close to 1 makes mask selection least correlated between decision steps. Values range from 1.0 to 2.0.
<br/> 
5. cat_idxs : list of int (default = [], mandatory for embeddings)
    - List of categorical feature indices.
<br/> 
6. cat_dims : list of int (default = [], mandatory for embeddings)
    - Number of modalities (unique values) of each categorical feature. /!\ No new modalities can be predicted.
<br/> 
7. cat_emb_dim : list of int (optional, default = 1)
    - Embedding size for each categorical feature.
<br/> 
8. n_independent : int (default = 2)
    - Number of independent Gated Linear Unit layers at each step. Usual values range from 1 to 5.
<br/> 
9. n_shared : int (default = 2)
    - Number of shared Gated Linear Unit layers at each step. Usual values range from 1 to 5.
<br/> 
10. epsilon : float (default = 1e-15)
    - Should be left untouched.
<br/> 
11. seed : int (default = 0)
    - Random seed for reproducibility.
<br/> 
12. momentum : float (default = 0.02)
    - Momentum for batch normalization; typically ranges from 0.01 to 0.4.
<br/> 
13. clip_value : float (default = None)
    - If a float is given, gradients are clipped at clip_value.
<br/> 
14. lambda_sparse : float (default = 1e-3)
    - Coefficient of the extra sparsity loss proposed in the original paper. The bigger this coefficient, the sparser the model will be in terms of feature selection. Depending on the difficulty of your problem, reducing this value could help.
<br/>
15. optimizer_fn : torch.optim (default = torch.optim.Adam)
    - PyTorch optimizer function.
<br/>     
16. optimizer_params : dict (default = dict(lr=2e-2))
    - Parameters compatible with optimizer_fn, used to initialize the optimizer. Since Adam is the default optimizer, this is where the initial learning rate for training is set. As mentioned in the original paper, a large initial learning rate of 0.02 with decay is a good option.
<br/> 
17. scheduler_fn : torch.optim.lr_scheduler (default = None)
    - PyTorch scheduler to change learning rates during training.
<br/> 
18. model_name : str (default = 'DreamQuarkTabNet')
    - Name of the model used when saving to disk; customize it to easily retrieve and reuse trained models.
<br/> 
19. saving_path : str (default = './')
    - Path defining where to save models.
<br/> 
20. verbose : int (default = 1)
    - Verbosity for notebook plots: set to 1 to see every epoch, 0 to print nothing.
<br/> 
21. device_name : str (default = 'auto')
    - 'cpu' for CPU training, 'gpu' for GPU training, 'auto' to detect a GPU automatically.
<br/> 
22. mask_type : str (default = 'sparsemax')
    - Either "sparsemax" or "entmax": the masking function used for selecting features.
<br/> 
<br/> 
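The role of gamma is visible in the prior-scale update from the TabNet paper: P[0] = 1 and P[i] = P[i−1] · (γ − M[i]), where M[i] is the feature mask of step i; the prior then scales the attention logits of the next step before sparsemax/entmax. A minimal NumPy sketch of that single update (the name `update_prior` is ours) shows why γ = 1 forbids reusing a feature while larger γ allows it:

```python
import numpy as np


def update_prior(prior: np.ndarray, mask: np.ndarray, gamma: float) -> np.ndarray:
    """P[i] = P[i-1] * (gamma - M[i]): shrink the prior of features a step just used."""
    return prior * (gamma - mask)


# One sample, four features; step 1 puts all of its attention on feature 0
mask_step1 = np.array([1.0, 0.0, 0.0, 0.0])
prior0 = np.ones(4)  # P[0]: every feature equally available

for gamma in (1.0, 1.3, 2.0):
    # gamma = 1.0 -> feature 0's prior becomes 0, so later steps cannot select it again
    # gamma > 1.0 -> feature 0 is only down-weighted relative to the unused features
    print(gamma, update_prior(prior0, mask_step1, gamma))
```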

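lambda_sparse weights the entropy-style sparsity regularizer from the paper, L_sparse = Σ_i Σ_b Σ_j −M_{b,j}[i] · log(M_{b,j}[i] + ε) / (N_steps · B), which is added to the task loss; epsilon is the small ε inside the logarithm that guards against log(0). The NumPy sketch below compares one-hot and uniform masks; `sparsity_loss` is our own name and the task loss of 63.3 is a made-up number for illustration.

```python
import numpy as np


def sparsity_loss(masks: np.ndarray, epsilon: float = 1e-15) -> float:
    """masks: shape (n_steps, batch, n_features), each row summing to 1."""
    n_steps, batch_size, _ = masks.shape
    entropy = -masks * np.log(masks + epsilon)
    return float(entropy.sum() / (n_steps * batch_size))


one_hot = np.zeros((3, 2, 4))
one_hot[..., 0] = 1.0                  # every step selects a single feature
uniform = np.full((3, 2, 4), 0.25)     # attention spread evenly over 4 features

print(sparsity_loss(one_hot))   # ~0: maximally sparse masks
print(sparsity_loss(uniform))   # log(4), about 1.386: least sparse masks

lambda_sparse = 1e-3
task_loss = 63.3                # hypothetical MSE value
total_loss = task_loss + lambda_sparse * sparsity_loss(uniform)
```

A larger lambda_sparse makes dense masks more expensive, which pushes the model toward selecting fewer features per step.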
#### Fit parameters
1. X_train : np.array
    - Training features.
<br/> 
2. y_train : np.array
    - Training targets.
<br/> 
3. eval_set : list of tuple
    - List of (X, y) evaluation sets. The last one is used for early stopping.
<br/> 
4. eval_name : list of str
    - Names of the evaluation sets, used as prefixes in the logs (e.g. val_mse).
<br/> 
5. eval_metric : list of str
    - List of evaluation metrics. The last one is used for early stopping.
<br/> 
6. max_epochs : int (default = 200)
    - Maximum number of training epochs.
<br/> 
7. patience : int (default = 15)
    - Number of consecutive epochs without improvement before early stopping is performed.
<br/> 
8. weights : int or dict (default = 0)
    - Only for TabNetClassifier. Sampling parameter:
        - 0: no sampling
        - 1: automated sampling with inverse class occurrences
        - dict: keys are classes, values are weights for each class
<br/> 
9. loss_fn : torch.loss or list of torch.loss
    - Loss function for training (defaults to MSE for regression and cross-entropy for classification). When using TabNetMultiTaskClassifier, you can pass a list with the same length as the number of tasks; each task is assigned its own loss function.
<br/> 
10. batch_size : int (default = 1024)
    - Number of examples per batch; large batch sizes are recommended.
<br/>     
11. virtual_batch_size : int (default = 128)
    - Size of the mini-batches used for Ghost Batch Normalization. /!\ virtual_batch_size should divide batch_size.
<br/> 
12. num_workers : int (default = 0)
    - Number of workers used in torch.utils.data.DataLoader.
<br/> 
13. drop_last : bool (default = False)
    - Whether to drop the last batch during training if it is incomplete.
<br/> 
14. callbacks : list of callback function
    - List of custom callbacks.
<br/> 
15. pretraining_ratio : float
    - /!\ TabNetPretrainer only: fraction of input features to mask during pretraining.
    - Should be between 0 and 1. The bigger it is, the harder the reconstruction task.
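For TabNetClassifier, `weights=1` lets the library derive inverse-occurrence weights itself, while a dict gives explicit control. A minimal sketch that builds such a dict, assuming integer class labels; `inverse_class_weights` is a hypothetical helper, not part of pytorch-tabnet.

```python
import numpy as np


def inverse_class_weights(y: np.ndarray) -> dict:
    """Map each class label to 1 / (number of occurrences of that class)."""
    classes, counts = np.unique(y, return_counts=True)
    return {cls.item(): float(1.0 / cnt) for cls, cnt in zip(classes, counts)}


y = np.array([0, 0, 0, 1])
weights = inverse_class_weights(y)
print(weights)  # the rare class 1 gets 3x the weight of class 0

# Hypothetical usage with a classifier:
# clf.fit(X_train, y_train, weights=inverse_class_weights(y_train))
```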

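The training logs in this notebook end with lines such as "Early stopping occurred at epoch 273 with best_epoch = 263", which matches patience=10: training stops once 10 consecutive epochs pass without a new best val_mse. A pure-Python sketch of that rule for a lower-is-better metric (`early_stopping_epoch` is a hypothetical helper; the library's callback also supports a tolerance, which is omitted here):

```python
def early_stopping_epoch(val_scores, patience):
    """Return (stop_epoch, best_epoch); stop_epoch is None if all epochs run."""
    best_score = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, score in enumerate(val_scores):
        if score < best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Improves until epoch 263, then plateaus
scores = [100.0 - e for e in range(264)] + [50.0] * 20
print(early_stopping_epoch(scores, patience=10))  # (273, 263)
```

When training stops, pytorch-tabnet restores the weights from best_epoch ("Best weights from best epoch are automatically used!").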
```python
import warnings

warnings.filterwarnings('ignore')
```

```python
import os
import random

import numpy as np
import pandas as pd
import torch
from torch import nn
from pytorch_tabnet.tab_model import TabNetRegressor

from tqdm.notebook import tqdm
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import mean_squared_error

import optuna
from optuna import Trial, visualization
from optuna.samplers import TPESampler

SEED = 42

# Make CUDA kernel launches synchronous so errors are reported at the failing call
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def seed_everything(seed_value):
    """Seed Python, NumPy and PyTorch RNGs for reproducibility."""
    random.seed(seed_value)
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    os.environ['PYTHONHASHSEED'] = str(seed_value)

    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed_value)
        torch.cuda.manual_seed_all(seed_value)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

seed_everything(SEED)


train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")

# Features f0..f99 and the regression target `loss`
x_data = train_data.loc[:, 'f0':'f99']
y_data = train_data.loc[:, 'loss']

# First split: hold out 30% of the rows; the remaining 70% is used for training
x_train, x_test, y_train, y_test = train_test_split(x_data,
                                                    y_data,
                                                    test_size=0.3,
                                                    shuffle=True,        # shuffle rows before splitting
                                                    random_state=SEED)   # fixed seed for a reproducible split
# Second split: halve the held-out 30% into 15% validation and 15% test
x_val, x_test, y_val, y_test = train_test_split(x_test,
                                                y_test,
                                                test_size=0.5,
                                                shuffle=True,
                                                random_state=SEED)

y_train = y_train.values
y_val = y_val.values

# Label-encode object (string) columns, fitting on every split so all categories are known.
# If all f0..f99 columns are numeric, this loop does nothing.
for c in x_train.columns:
    if x_train[c].dtype == 'object':
        lbl = LabelEncoder()
        lbl.fit(list(x_train[c].values) + list(x_val[c].values) + list(x_test[c].values))

        x_train[c] = lbl.transform(x_train[c].values)
        x_val[c] = lbl.transform(x_val[c].values)
        x_test[c] = lbl.transform(x_test[c].values)

columns = x_test.columns
```
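A quick sanity check of the two-stage split on dummy data (1,000 synthetic rows, not the competition data): `test_size=0.3` followed by `test_size=0.5` on the held-out part yields a 70/15/15 train/validation/test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 1,000 rows with one feature; y doubles as a row id
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000)

x_tr, x_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=42)
x_va, x_te, y_va, y_te = train_test_split(x_hold, y_hold, test_size=0.5, shuffle=True, random_state=42)

print(len(x_tr), len(x_va), len(x_te))  # 700 150 150
```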

```python
def Optuna_TabNet(trial):
    """Build a TabNetRegressor whose hyperparameters are sampled by Optuna."""
    N_D = 16
    N_A = N_D  # the paper suggests n_d = n_a
    N_STEPS = trial.suggest_int("N_STEPS", 3, 5)
    GAMMA = trial.suggest_float("GAMMA", 1.0, 2.0)
    N_INDEPENDENT = trial.suggest_int("N_INDEPENDENT", 1, 3)
    N_SHARED = trial.suggest_int("N_SHARED", 1, 3)
    LAMBDA_SPARSE = trial.suggest_float("LAMBDA_SPARSE", 0, 1e-2)
    OPT_LR = trial.suggest_categorical('OPT_LR', [1e-1, 5e-2, 1e-2, 1e-3, 1e-4])
    OPT_WEIGHT_DECAY = trial.suggest_categorical('OPT_WEIGHT_DECAY', [1e-8, 1e-6, 1e-5, 1e-4, 1e-3])
    # Momentum of the SGD optimizer (not TabNet's batch-norm `momentum` parameter)
    OPT_MOMENTUM = trial.suggest_float("OPT_MOMENTUM", 0.01, 0.4)
    # MASK_TYPE = trial.suggest_categorical('MASK_TYPE', ["sparsemax", "entmax"])
    MASK_TYPE = "entmax"

    # ReduceLROnPlateau settings
    SCHEDULER_MIN_LR = 1e-6
    SCHEDULER_FACTOR = 0.9

    tabnet_params = dict(n_d=N_D,
                         n_a=N_A,
                         n_steps=N_STEPS,
                         gamma=GAMMA,
                         n_independent=N_INDEPENDENT,
                         n_shared=N_SHARED,
                         lambda_sparse=LAMBDA_SPARSE,
                         optimizer_fn=torch.optim.SGD,
                         optimizer_params=dict(lr=OPT_LR,
                                               weight_decay=OPT_WEIGHT_DECAY,
                                               momentum=OPT_MOMENTUM),
                         mask_type=MASK_TYPE,
                         scheduler_params=dict(mode="min",
                                               patience=20,
                                               min_lr=SCHEDULER_MIN_LR,
                                               factor=SCHEDULER_FACTOR),
                         scheduler_fn=torch.optim.lr_scheduler.ReduceLROnPlateau,
                         verbose=1,
                         seed=SEED)
    print(tabnet_params)

    return TabNetRegressor(**tabnet_params)


def objective(trial):
    """Train one TabNet configuration and return its MSE (lower is better)."""
    MAX_EPOCH = trial.suggest_categorical("MAX_EPOCH", [1000, 3000, 5000])
    BATCH_SIZE = 512

    # TabNet expects NumPy arrays; regression targets must be 2-D
    train_df = x_train[columns].to_numpy()
    train_target = y_train.reshape(-1, 1)

    val_df = x_val[columns].to_numpy()
    val_target = y_val.reshape(-1, 1)

    model = Optuna_TabNet(trial)

    # Pruning phase: five short fits of 3 epochs each. After each one the score is
    # reported so that the pruner can stop unpromising trials early.
    # NOTE: this relies on repeated fit() calls continuing from the current weights,
    # as the logs below show; recent pytorch-tabnet releases re-initialize the
    # network on every fit() unless warm_start=True is passed.
    # NOTE: scoring on the held-out test set lets it influence model selection;
    # the validation set (val_df, val_target) would keep the test score unbiased.
    for step in range(5):
        model.fit(X_train=train_df,
                  y_train=train_target,
                  max_epochs=3,
                  batch_size=BATCH_SIZE,
                  num_workers=4 * torch.cuda.device_count(),
                  drop_last=False)
        test_score = mean_squared_error(y_test, model.predict(x_test.to_numpy()))

        trial.report(test_score, step)
        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    # Full training with early stopping on the validation set
    model.fit(X_train=train_df,
              y_train=train_target,
              eval_set=[(val_df, val_target)],
              eval_name=["val"],
              eval_metric=['mse'],
              max_epochs=MAX_EPOCH,
              patience=10,
              batch_size=BATCH_SIZE,
              num_workers=4 * torch.cuda.device_count(),
              drop_last=False)

    score = mean_squared_error(y_test, model.predict(x_test.to_numpy()))
    print(score)

    return score
```
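The objective above reports an intermediate MSE after each short fit and lets `MedianPruner` decide whether to stop the trial. Below is a simplified pure-Python sketch of the median rule for a minimized objective: a trial is pruned when its value at a step is worse than the median of the completed trials' values at the same step, but only once `n_startup_trials` (5 by default) trials have completed. That default is consistent with the log, where trial 5 is the first one pruned. `should_prune` and `completed_history` are hypothetical names; Optuna's real `MedianPruner` also has `n_warmup_steps`, `interval_steps` and `n_min_trials` options and may differ in detail.

```python
from statistics import median


def should_prune(value, step, completed_history, n_startup_trials=5):
    """Simplified median rule for a minimized objective (sketch, not Optuna's code).

    completed_history: one dict per completed trial, mapping step -> reported value.
    """
    if len(completed_history) < n_startup_trials:
        return False  # not enough completed trials to compare against yet
    values_at_step = [h[step] for h in completed_history if step in h]
    if not values_at_step:
        return False  # nothing reported at this step by completed trials
    return value > median(values_at_step)


history = [{0: v} for v in (80.0, 85.0, 90.0, 95.0, 100.0)]  # median at step 0 is 90.0
print(should_prune(92.0, 0, history))  # True: worse than the median
print(should_prune(88.0, 0, history))  # False: better than the median
```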

```python
import joblib
from optuna.trial import TrialState

TRIAL_NUM = 100

# TPESampler() is unseeded; pass seed=SEED for reproducible sampling.
study = optuna.create_study(direction='minimize',
                            sampler=TPESampler(),
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=TRIAL_NUM)

# Save the study; it can be reloaded later with joblib.load("study_TABNET16_2.pkl")
joblib.dump(study, "study_TABNET16_2.pkl")

pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])

print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))


# print('Best trial: score {},\nparams {}'.format(study.best_trial.value, study.best_trial.params))
# best_param = study.best_trial.params
```
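Once the study has finished, the best trial's upper-case parameter names have to be mapped back to `TabNetRegressor` and `fit()` arguments to train a final model. A minimal sketch, assuming the fixed settings used in `Optuna_TabNet` and `objective` (n_d = n_a = 16, entmax, ReduceLROnPlateau, patience=10, batch size 512); `tabnet_kwargs_from_params` is a hypothetical helper. The example parameters are those of trial 11, the best among the trials shown in the log excerpt of this notebook; the full 100-trial run may end with a different best trial, so in practice use `study.best_trial.params`.

```python
import torch


def tabnet_kwargs_from_params(params, seed=42):
    """Map Optuna's search-space names to TabNetRegressor and fit() arguments."""
    model_kwargs = dict(
        n_d=16,
        n_a=16,
        n_steps=params["N_STEPS"],
        gamma=params["GAMMA"],
        n_independent=params["N_INDEPENDENT"],
        n_shared=params["N_SHARED"],
        lambda_sparse=params["LAMBDA_SPARSE"],
        optimizer_fn=torch.optim.SGD,
        optimizer_params=dict(lr=params["OPT_LR"],
                              weight_decay=params["OPT_WEIGHT_DECAY"],
                              momentum=params["OPT_MOMENTUM"]),
        mask_type="entmax",
        scheduler_fn=torch.optim.lr_scheduler.ReduceLROnPlateau,
        scheduler_params=dict(mode="min", patience=20, min_lr=1e-6, factor=0.9),
        seed=seed,
    )
    fit_kwargs = dict(max_epochs=params["MAX_EPOCH"], patience=10, batch_size=512)
    return model_kwargs, fit_kwargs


# Parameters of trial 11 as printed in the log
best_params = {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.4145548829829768,
               'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.009782151228067564,
               'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.11122110883944247}
model_kwargs, fit_kwargs = tabnet_kwargs_from_params(best_params)

# model = TabNetRegressor(**model_kwargs)
# model.fit(X_train=train_df, y_train=train_target, eval_set=[(val_df, val_target)], **fit_kwargs)
```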

[I 2021-08-23 17:27:53,892] A new study created in memory with name: no-name-c9dd28b4-633e-4b81-9a84-9afc8663c7fa


{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.958297534323594, 'n_independent': 2, 'n_shared': 1, 'lambda_sparse': 0.0005835697725547651, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 1e-06, 'momentum': 0.38855612095763653}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 113.39919|  0:00:24s
epoch 1  | loss: 93.44884|  0:00:48s
epoch 2  | loss: 83.28976|  0:01:12s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 78.22799|  0:00:24s
epoch 1  | loss: 75.33061|  0:00:48s
epoch 2  | loss: 73.827  |  0:01:12s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.66759|  0:00:24s
epoch 1  | loss: 72.11

epoch 121| loss: 65.00466| val_mse: 64.95821|  0:51:49s
epoch 122| loss: 65.05924| val_mse: 64.89676|  0:52:14s
epoch 123| loss: 64.87992| val_mse: 64.86951|  0:52:40s
epoch 124| loss: 64.94381| val_mse: 64.90343|  0:53:05s
epoch 125| loss: 64.96312| val_mse: 64.85564|  0:53:30s
epoch 126| loss: 64.92979| val_mse: 64.81557|  0:53:55s
epoch 127| loss: 64.90713| val_mse: 64.81686|  0:54:20s
epoch 128| loss: 64.9406 | val_mse: 64.81484|  0:54:45s
epoch 129| loss: 64.91181| val_mse: 64.7639 |  0:55:10s
epoch 130| loss: 64.78753| val_mse: 64.69126|  0:55:35s
epoch 131| loss: 64.80199| val_mse: 64.65711|  0:56:00s
epoch 132| loss: 64.90514| val_mse: 64.66295|  0:56:25s
epoch 133| loss: 64.79712| val_mse: 64.65936|  0:56:50s
epoch 134| loss: 64.74765| val_mse: 64.51369|  0:57:15s
epoch 135| loss: 64.68934| val_mse: 64.62234|  0:57:41s
epoch 136| loss: 64.73228| val_mse: 64.50825|  0:58:06s
epoch 137| loss: 64.66332| val_mse: 64.64063|  0:58:31s
epoch 138| loss: 64.59025| val_mse: 64.54278|  0

epoch 268| loss: 63.34352| val_mse: 63.15535|  1:53:35s
epoch 269| loss: 63.32034| val_mse: 63.1798 |  1:54:01s
epoch 270| loss: 63.30631| val_mse: 63.15886|  1:54:27s
epoch 271| loss: 63.31644| val_mse: 63.1555 |  1:54:52s
epoch 272| loss: 63.33259| val_mse: 63.1455 |  1:55:18s
epoch 273| loss: 63.27819| val_mse: 63.16482|  1:55:44s

Early stopping occurred at epoch 273 with best_epoch = 263 and best_val_mse = 63.13254
Best weights from best epoch are automatically used!


[I 2021-08-23 19:30:47,058] Trial 0 finished with value: 63.86604804999591 and parameters: {'MAX_EPOCH': 3000, 'N_STEPS': 5, 'GAMMA': 1.958297534323594, 'N_INDEPENDENT': 2, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.0005835697725547651, 'OPT_LR': 0.01, 'OPT_WEIGHT_DECAY': 1e-06, 'OPT_MOMENTUM': 0.38855612095763653}. Best is trial 0 with value: 63.86604804999591.


63.86604804999591
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.4461063105212684, 'n_independent': 1, 'n_shared': 1, 'lambda_sparse': 0.006074077172986856, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 0.001, 'momentum': 0.22458437755775237}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 117.47556|  0:00:13s
epoch 1  | loss: 108.14484|  0:00:26s
epoch 2  | loss: 100.32496|  0:00:39s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 94.13184|  0:00:13s
epoch 1  | loss: 89.13276|  0:00:26s
epoch 2  | loss: 85.18364|  0:00:39s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 82.12722|  0:00:13s
e

epoch 121| loss: 63.83717| val_mse: 63.67811|  0:28:52s
epoch 122| loss: 63.79435| val_mse: 63.68439|  0:29:05s
epoch 123| loss: 63.81703| val_mse: 63.67145|  0:29:19s
epoch 124| loss: 63.79658| val_mse: 63.68815|  0:29:33s
epoch 125| loss: 63.71506| val_mse: 63.67262|  0:29:47s
epoch 126| loss: 63.72335| val_mse: 63.63227|  0:30:01s
epoch 127| loss: 63.67111| val_mse: 63.60761|  0:30:15s
epoch 128| loss: 63.71891| val_mse: 63.6286 |  0:30:29s
epoch 129| loss: 63.71612| val_mse: 63.59725|  0:30:43s
epoch 130| loss: 63.73874| val_mse: 63.60611|  0:30:57s
epoch 131| loss: 63.68907| val_mse: 63.52208|  0:31:11s
epoch 132| loss: 63.66834| val_mse: 63.5446 |  0:31:25s
epoch 133| loss: 63.70216| val_mse: 63.54725|  0:31:40s
epoch 134| loss: 63.59991| val_mse: 63.50119|  0:31:54s
epoch 135| loss: 63.62388| val_mse: 63.50319|  0:32:08s
epoch 136| loss: 63.57818| val_mse: 63.5028 |  0:32:22s
epoch 137| loss: 63.56292| val_mse: 63.45344|  0:32:36s
epoch 138| loss: 63.61981| val_mse: 63.46022|  0

epoch 268| loss: 62.98932| val_mse: 62.89773|  1:03:44s
epoch 269| loss: 62.98962| val_mse: 62.90235|  1:03:59s
epoch 270| loss: 62.97992| val_mse: 62.89533|  1:04:13s
epoch 271| loss: 62.95244| val_mse: 62.9031 |  1:04:28s
epoch 272| loss: 62.95272| val_mse: 62.88817|  1:04:42s
epoch 273| loss: 62.96172| val_mse: 62.89632|  1:04:57s
epoch 274| loss: 62.95989| val_mse: 62.88649|  1:05:11s
epoch 275| loss: 62.98253| val_mse: 62.87409|  1:05:25s
epoch 276| loss: 62.95285| val_mse: 62.89662|  1:05:39s
epoch 277| loss: 62.9801 | val_mse: 62.88087|  1:05:54s
epoch 278| loss: 62.94051| val_mse: 62.87687|  1:06:09s
epoch 279| loss: 62.95909| val_mse: 62.88646|  1:06:23s
epoch 280| loss: 62.96047| val_mse: 62.88659|  1:06:38s
epoch 281| loss: 62.93001| val_mse: 62.88446|  1:06:52s
epoch 282| loss: 62.93806| val_mse: 62.8889 |  1:07:06s
epoch 283| loss: 62.9257 | val_mse: 62.88326|  1:07:20s
epoch 284| loss: 62.94919| val_mse: 62.89322|  1:07:35s
epoch 285| loss: 62.92942| val_mse: 62.89065|  1

[I 2021-08-23 20:42:38,044] Trial 1 finished with value: 63.66390757345737 and parameters: {'MAX_EPOCH': 5000, 'N_STEPS': 3, 'GAMMA': 1.4461063105212684, 'N_INDEPENDENT': 1, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.006074077172986856, 'OPT_LR': 0.001, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.22458437755775237}. Best is trial 1 with value: 63.66390757345737.


63.66390757345737
{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.8866278723494483, 'n_independent': 3, 'n_shared': 2, 'lambda_sparse': 0.00530752821947752, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.0001, 'weight_decay': 0.0001, 'momentum': 0.21210871782874902}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 123.56118|  0:00:30s
epoch 1  | loss: 123.427 |  0:01:00s
epoch 2  | loss: 123.25354|  0:01:31s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 123.21429|  0:00:30s
epoch 1  | loss: 123.11383|  0:01:00s
epoch 2  | loss: 123.04739|  0:01:31s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 122.93105|  0:00:3

epoch 117| loss: 112.0039| val_mse: 110.24121|  1:06:02s
epoch 118| loss: 111.84073| val_mse: 110.0841|  1:06:35s
epoch 119| loss: 111.81557| val_mse: 110.35288|  1:07:08s
epoch 120| loss: 111.71866| val_mse: 110.45695|  1:07:41s
epoch 121| loss: 111.63105| val_mse: 110.26782|  1:08:14s
epoch 122| loss: 111.51516| val_mse: 110.0462|  1:08:48s
epoch 123| loss: 111.43489| val_mse: 110.12043|  1:09:21s
epoch 124| loss: 111.31692| val_mse: 109.7732|  1:09:55s
epoch 125| loss: 111.31433| val_mse: 109.86336|  1:10:28s
epoch 126| loss: 111.23994| val_mse: 109.58563|  1:11:02s
epoch 127| loss: 111.15095| val_mse: 110.11457|  1:11:35s
epoch 128| loss: 110.98434| val_mse: 109.26739|  1:12:09s
epoch 129| loss: 110.95449| val_mse: 109.80354|  1:12:41s
epoch 130| loss: 110.86259| val_mse: 109.55516|  1:13:15s
epoch 131| loss: 110.84559| val_mse: 109.42321|  1:13:48s
epoch 132| loss: 110.73862| val_mse: 109.27628|  1:14:21s
epoch 133| loss: 110.63484| val_mse: 109.30906|  1:14:54s
epoch 134| loss: 1

epoch 259| loss: 101.82944| val_mse: 101.63822|  2:25:42s
epoch 260| loss: 101.90819| val_mse: 101.33908|  2:26:14s
epoch 261| loss: 101.7461| val_mse: 101.2929|  2:26:47s
epoch 262| loss: 101.60446| val_mse: 101.26396|  2:27:19s
epoch 263| loss: 101.59468| val_mse: 101.31469|  2:27:52s
epoch 264| loss: 101.46965| val_mse: 101.27827|  2:28:24s
epoch 265| loss: 101.55589| val_mse: 101.29465|  2:28:57s
epoch 266| loss: 101.37842| val_mse: 101.34848|  2:29:29s
epoch 267| loss: 101.32987| val_mse: 100.8683|  2:30:02s
epoch 268| loss: 101.27802| val_mse: 101.10866|  2:30:34s
epoch 269| loss: 101.20763| val_mse: 101.11824|  2:31:07s
epoch 270| loss: 101.22568| val_mse: 101.06462|  2:31:39s
epoch 271| loss: 101.07025| val_mse: 101.12202|  2:32:12s
epoch 272| loss: 101.1196| val_mse: 101.04323|  2:32:44s
epoch 273| loss: 101.00667| val_mse: 100.75703|  2:33:16s
epoch 274| loss: 101.00248| val_mse: 101.05029|  2:33:49s
epoch 275| loss: 100.90099| val_mse: 100.79376|  2:34:21s
epoch 276| loss: 1

epoch 405| loss: 94.03004| val_mse: 94.69767|  3:44:39s
epoch 406| loss: 93.97656| val_mse: 94.45705|  3:45:12s
epoch 407| loss: 93.89343| val_mse: 94.72976|  3:45:44s
epoch 408| loss: 93.91356| val_mse: 94.41301|  3:46:16s
epoch 409| loss: 93.88525| val_mse: 94.35608|  3:46:48s
epoch 410| loss: 93.76397| val_mse: 94.38715|  3:47:21s
epoch 411| loss: 93.78659| val_mse: 94.92946|  3:47:53s
epoch 412| loss: 93.61195| val_mse: 94.67037|  3:48:25s
epoch 413| loss: 93.65746| val_mse: 94.3259 |  3:48:58s
epoch 414| loss: 93.69783| val_mse: 94.41725|  3:49:30s
epoch 415| loss: 93.56014| val_mse: 94.24644|  3:50:02s
epoch 416| loss: 93.44187| val_mse: 94.12773|  3:50:34s
epoch 417| loss: 93.51264| val_mse: 94.30538|  3:51:07s
epoch 418| loss: 93.4485 | val_mse: 94.23064|  3:51:39s
epoch 419| loss: 93.3399 | val_mse: 94.09196|  3:52:11s
epoch 420| loss: 93.31493| val_mse: 94.37896|  3:52:43s
epoch 421| loss: 93.17928| val_mse: 94.26864|  3:53:16s
epoch 422| loss: 93.327  | val_mse: 93.91839|  3

epoch 552| loss: 88.22819| val_mse: 89.8032 |  5:04:40s
epoch 553| loss: 88.1819 | val_mse: 89.60836|  5:05:12s

Early stopping occurred at epoch 553 with best_epoch = 543 and best_val_mse = 89.55376
Best weights from best epoch are automatically used!


[I 2021-08-24 01:56:48,709] Trial 2 finished with value: 90.62631389231468 and parameters: {'MAX_EPOCH': 5000, 'N_STEPS': 5, 'GAMMA': 1.8866278723494483, 'N_INDEPENDENT': 3, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.00530752821947752, 'OPT_LR': 0.0001, 'OPT_WEIGHT_DECAY': 0.0001, 'OPT_MOMENTUM': 0.21210871782874902}. Best is trial 1 with value: 63.66390757345737.


90.62631389231468
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.3087121113691618, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.008204107267937725, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-05, 'momentum': 0.24434697181612922}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.41493|  0:00:19s
epoch 1  | loss: 64.87335|  0:00:38s
epoch 2  | loss: 63.61269|  0:00:57s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.30479|  0:00:18s
epoch 1  | loss: 63.19136|  0:00:36s
epoch 2  | loss: 63.09363|  0:00:53s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.00336|  0:00:17s
epoch 

[I 2021-08-24 02:09:35,528] Trial 3 finished with value: 63.319762280407474 and parameters: {'MAX_EPOCH': 5000, 'N_STEPS': 3, 'GAMMA': 1.3087121113691618, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.008204107267937725, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-05, 'OPT_MOMENTUM': 0.24434697181612922}. Best is trial 3 with value: 63.319762280407474.


63.319762280407474
{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.758964441523823, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.009208396979930852, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.05, 'weight_decay': 1e-05, 'momentum': 0.24949367375016357}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 111.56666|  0:00:34s
epoch 1  | loss: 91.48942|  0:01:09s
epoch 2  | loss: 82.85606|  0:01:44s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 78.64453|  0:00:33s
epoch 1  | loss: 76.68391|  0:01:07s
epoch 2  | loss: 75.35688|  0:01:40s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.5866 |  0:00:33s
epoc

[I 2021-08-24 03:06:34,866] Trial 4 finished with value: 63.614465805319625 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 5, 'GAMMA': 1.758964441523823, 'N_INDEPENDENT': 3, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.009208396979930852, 'OPT_LR': 0.05, 'OPT_WEIGHT_DECAY': 1e-05, 'OPT_MOMENTUM': 0.24949367375016357}. Best is trial 3 with value: 63.319762280407474.


63.614465805319625
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0530696329296187, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.006438218828431734, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 1e-06, 'momentum': 0.22728353460999565}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 92.42578|  0:00:17s
epoch 1  | loss: 88.41781|  0:00:34s
epoch 2  | loss: 85.51311|  0:00:52s


[I 2021-08-24 03:07:35,556] Trial 5 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.8983621482858033, 'n_independent': 1, 'n_shared': 1, 'lambda_sparse': 0.008994253427278024, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 1e-06, 'momentum': 0.3466009579645691}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.43592|  0:00:19s
epoch 1  | loss: 74.27374|  0:00:39s
epoch 2  | loss: 74.22418|  0:00:58s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.28905|  0:00:19s
epoch 1  | loss: 74.05426|  0:00:39s
epoch 2  | loss: 74.08107|  0:00:58s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.07802|  0:00:19s
epoch 1  | loss: 73.874

[I 2021-08-24 03:12:10,060] Trial 6 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.5589724102623155, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.008563750364486852, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.05, 'weight_decay': 1e-06, 'momentum': 0.24137907808434333}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 81.04125|  0:00:22s
epoch 1  | loss: 70.22161|  0:00:44s
epoch 2  | loss: 68.71951|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 67.46718|  0:00:22s
epoch 1  | loss: 66.64848|  0:00:44s
epoch 2  | loss: 65.76526|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 65.19821|  0:00:21s
epoch 1  | loss: 64.723

[I 2021-08-24 03:34:50,009] Trial 7 finished with value: 63.40996335304223 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.5589724102623155, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.008563750364486852, 'OPT_LR': 0.05, 'OPT_WEIGHT_DECAY': 1e-06, 'OPT_MOMENTUM': 0.24137907808434333}. Best is trial 3 with value: 63.319762280407474.


63.40996335304223
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.81273258464264, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.007609644919298512, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.0001, 'weight_decay': 0.001, 'momentum': 0.3469530392597058}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 100.09451|  0:00:22s
epoch 1  | loss: 99.79245|  0:00:44s
epoch 2  | loss: 99.5975 |  0:01:06s


[I 2021-08-24 03:36:06,028] Trial 8 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.981077726431145, 'n_independent': 3, 'n_shared': 2, 'lambda_sparse': 0.0060132832946219105, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 0.001, 'momentum': 0.25587704243414433}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 124.42811|  0:00:24s
epoch 1  | loss: 122.14156|  0:00:49s
epoch 2  | loss: 120.06226|  0:01:14s


[I 2021-08-24 03:37:31,282] Trial 9 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.1343091935125278, 'n_independent': 1, 'n_shared': 3, 'lambda_sparse': 0.00420194756153557, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-05, 'momentum': 0.046507231827225465}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.46749|  0:00:17s
epoch 1  | loss: 64.95214|  0:00:34s
epoch 2  | loss: 63.8171 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.40553|  0:00:17s
epoch 1  | loss: 63.30005|  0:00:34s
epoch 2  | loss: 63.16523|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.04891|  0:00:17s
epoch 1  | loss: 62.9969

[I 2021-08-24 03:47:21,194] Trial 10 finished with value: 63.485203373193656 and parameters: {'MAX_EPOCH': 5000, 'N_STEPS': 3, 'GAMMA': 1.1343091935125278, 'N_INDEPENDENT': 1, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.00420194756153557, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-05, 'OPT_MOMENTUM': 0.046507231827225465}. Best is trial 3 with value: 63.319762280407474.


63.485203373193656
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.4145548829829768, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.009782151228067564, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.11122110883944247}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.91773|  0:00:22s
epoch 1  | loss: 68.48592|  0:00:44s
epoch 2  | loss: 66.39994|  0:01:05s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 65.33013|  0:00:22s
epoch 1  | loss: 64.51223|  0:00:44s
epoch 2  | loss: 64.03236|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.78145|  0:00:21s
epoch

[I 2021-08-24 04:12:00,300] Trial 11 finished with value: 63.26620134708621 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.4145548829829768, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.009782151228067564, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.11122110883944247}. Best is trial 11 with value: 63.26620134708621.


63.26620134708621
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.3003148302060161, 'n_independent': 2, 'n_shared': 3, 'lambda_sparse': 0.003116861297779136, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.11275915662462735}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.93988|  0:00:19s
epoch 1  | loss: 65.45455|  0:00:39s
epoch 2  | loss: 63.98345|  0:00:59s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.51179|  0:00:19s
epoch 1  | loss: 63.38721|  0:00:39s
epoch 2  | loss: 63.20329|  0:00:59s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.15799|  0:00:19s
epoch 

[I 2021-08-24 04:25:34,700] Trial 12 finished with value: 63.32554771231474 and parameters: {'MAX_EPOCH': 5000, 'N_STEPS': 3, 'GAMMA': 1.3003148302060161, 'N_INDEPENDENT': 2, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.003116861297779136, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.11275915662462735}. Best is trial 11 with value: 63.26620134708621.


63.32554771231474
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.303033072350065, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.00992749387096117, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.12712226000573382}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.51927|  0:00:22s
epoch 1  | loss: 67.93419|  0:00:44s
epoch 2  | loss: 65.88359|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.69494|  0:00:22s
epoch 1  | loss: 64.08054|  0:00:44s
epoch 2  | loss: 63.79329|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.55862|  0:00:22s
epoch 1 

[I 2021-08-24 04:50:14,812] Trial 13 finished with value: 63.29301032713057 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.303033072350065, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.00992749387096117, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.12712226000573382}. Best is trial 11 with value: 63.26620134708621.


63.29301032713057
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.585002115419822, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.009993324558691553, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.12756076277503292}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 76.67242|  0:00:22s
epoch 1  | loss: 68.64703|  0:00:44s
epoch 2  | loss: 66.68172|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 65.23037|  0:00:22s
epoch 1  | loss: 64.59565|  0:00:44s
epoch 2  | loss: 64.04005|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.70088|  0:00:21s
epoch 1

[I 2021-08-24 05:11:19,919] Trial 14 finished with value: 63.17791450736386 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.585002115419822, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.009993324558691553, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.12756076277503292}. Best is trial 14 with value: 63.17791450736386.


63.17791450736386
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.6222324770472052, 'n_independent': 1, 'n_shared': 3, 'lambda_sparse': 0.007416246824718397, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.13623313134486095}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 76.61048|  0:00:22s
epoch 1  | loss: 69.55819|  0:00:43s
epoch 2  | loss: 67.18405|  0:01:05s


[I 2021-08-24 05:12:35,605] Trial 15 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.4524659523672803, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.009950156356179492, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 1e-08, 'momentum': 0.020394149264184383}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 106.55304|  0:00:22s
epoch 1  | loss: 83.96259|  0:00:44s
epoch 2  | loss: 76.80799|  0:01:06s


[I 2021-08-24 05:13:51,413] Trial 16 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.6839065984311312, 'n_independent': 1, 'n_shared': 1, 'lambda_sparse': 0.0025376174094229433, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.07526706703914764}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.29238|  0:00:16s
epoch 1  | loss: 67.93359|  0:00:32s
epoch 2  | loss: 65.82514|  0:00:48s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.47382|  0:00:16s
epoch 1  | loss: 63.83307|  0:00:32s
epoch 2  | loss: 63.44434|  0:00:48s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.24044|  0:00:16s
epoch 1  | loss: 63.088

[I 2021-08-24 05:22:51,093] Trial 17 finished with value: 63.40669920388071 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.6839065984311312, 'N_INDEPENDENT': 1, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.0025376174094229433, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.07526706703914764}. Best is trial 14 with value: 63.17791450736386.


63.40669920388071
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.4083913680781868, 'n_independent': 2, 'n_shared': 3, 'lambda_sparse': 0.00744256400490399, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.0001, 'momentum': 0.1662516219001998}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.75508|  0:00:24s
epoch 1  | loss: 69.69306|  0:00:49s
epoch 2  | loss: 67.38528|  0:01:14s


[I 2021-08-24 05:24:16,213] Trial 18 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.5595794681530166, 'n_independent': 3, 'n_shared': 2, 'lambda_sparse': 0.0011945534143917823, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.16832763115533567}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 88.82929|  0:00:30s
epoch 1  | loss: 73.2216 |  0:01:00s
epoch 2  | loss: 71.05242|  0:01:30s


[I 2021-08-24 05:25:58,844] Trial 19 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.1788850963889328, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.004613517149276819, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 1e-08, 'momentum': 0.08647486036475441}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 101.73548|  0:00:22s
epoch 1  | loss: 79.24416|  0:00:44s
epoch 2  | loss: 74.11311|  0:01:06s


[I 2021-08-24 05:27:14,656] Trial 20 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.30868921167157, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.009588751128603626, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.15745675347393767}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.09701|  0:00:22s
epoch 1  | loss: 67.7968 |  0:00:44s
epoch 2  | loss: 65.71622|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.55962|  0:00:21s
epoch 1  | loss: 63.97726|  0:00:43s
epoch 2  | loss: 63.54504|  0:01:05s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.35702|  0:00:22s
epoch 1  | loss: 63.26534|

[I 2021-08-24 05:47:08,953] Trial 21 finished with value: 63.165082325876625 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.30868921167157, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.009588751128603626, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.15745675347393767}. Best is trial 21 with value: 63.165082325876625.


63.165082325876625
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.3841683550645967, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.009881171491928825, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.17751450847383385}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.22069|  0:00:22s
epoch 1  | loss: 67.89556|  0:00:44s
epoch 2  | loss: 65.95131|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.7626 |  0:00:21s
epoch 1  | loss: 64.11842|  0:00:44s
epoch 2  | loss: 63.62233|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.45785|  0:00:21s
epoch

[I 2021-08-24 05:52:11,510] Trial 22 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.235954801499835, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.008416320417564987, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.08332017193221376}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.31253|  0:00:22s
epoch 1  | loss: 68.05407|  0:00:44s
epoch 2  | loss: 65.78873|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.81416|  0:00:22s
epoch 1  | loss: 64.1692 |  0:00:44s
epoch 2  | loss: 63.75619|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.538  |  0:00:21s
epoch 1  | loss: 63.40305

[I 2021-08-24 05:57:14,097] Trial 23 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.5288513975414133, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.009005505608690649, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.14597584594477908}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 76.39824|  0:00:22s
epoch 1  | loss: 68.31108|  0:00:44s
epoch 2  | loss: 66.45672|  0:01:06s


[I 2021-08-24 05:58:29,875] Trial 24 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.6624473659904146, 'n_independent': 1, 'n_shared': 2, 'lambda_sparse': 0.006691486574627247, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.05, 'weight_decay': 1e-08, 'momentum': 0.28569691471898473}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 77.13349|  0:00:19s
epoch 1  | loss: 70.303  |  0:00:38s
epoch 2  | loss: 68.63738|  0:00:57s


[I 2021-08-24 05:59:36,261] Trial 25 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.3480966558027054, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.009180204583287064, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.0001, 'weight_decay': 0.0001, 'momentum': 0.1872380839288239}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 104.73915|  0:00:17s
epoch 1  | loss: 104.12555|  0:00:35s
epoch 2  | loss: 103.72631|  0:00:52s


[I 2021-08-24 06:00:36,646] Trial 26 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.219569520476413, 'n_independent': 2, 'n_shared': 3, 'lambda_sparse': 0.008008077334386516, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.10350594905506696}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 99.5449 |  0:00:30s
epoch 1  | loss: 73.00013|  0:01:00s
epoch 2  | loss: 70.94226|  0:01:30s


[I 2021-08-24 06:02:19,147] Trial 27 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.4826385998302847, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.009305366905976551, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.052081576819035684}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 76.44739|  0:00:22s
epoch 1  | loss: 68.39207|  0:00:44s
epoch 2  | loss: 66.48201|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 65.2716 |  0:00:21s
epoch 1  | loss: 64.36426|  0:00:44s
epoch 2  | loss: 63.99879|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.63209|  0:00:22s
epoch 1  | loss: 63.516

[I 2021-08-24 06:25:23,449] Trial 28 finished with value: 63.25932631229833 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.4826385998302847, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.009305366905976551, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-08, 'OPT_MOMENTUM': 0.052081576819035684}. Best is trial 21 with value: 63.165082325876625.


63.25932631229833
{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.5032276416825772, 'n_independent': 1, 'n_shared': 1, 'lambda_sparse': 0.007029426553327786, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 1e-08, 'momentum': 0.0484405929937903}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.08504|  0:00:19s
epoch 1  | loss: 73.57095|  0:00:39s
epoch 2  | loss: 73.27465|  0:00:58s


[I 2021-08-24 06:26:32,232] Trial 29 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.6208643703158094, 'n_independent': 3, 'n_shared': 2, 'lambda_sparse': 0.009287150018543465, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-05, 'momentum': 0.013844702955601938}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 83.47176|  0:00:24s
epoch 1  | loss: 70.48211|  0:00:49s
epoch 2  | loss: 68.27792|  0:01:14s


[I 2021-08-24 06:27:57,535] Trial 30 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.4678479828568607, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.00960152735775091, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.061321782363252086}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 76.51294|  0:00:22s
epoch 1  | loss: 68.69168|  0:00:44s
epoch 2  | loss: 66.82289|  0:01:06s


[I 2021-08-24 06:29:13,349] Trial 31 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.3824433561587595, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.008658646807205728, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-08, 'momentum': 0.10684745539793014}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.72726|  0:00:22s
epoch 1  | loss: 67.91288|  0:00:44s
epoch 2  | loss: 66.07438|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.76722|  0:00:22s
epoch 1  | loss: 64.2058 |  0:00:44s
epoch 2  | loss: 63.87089|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.54813|  0:00:22s
epoch 1  | loss: 63.4310

[I 2021-08-24 06:33:00,626] Trial 32 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.7430751176833559, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.007954008996198894, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.14737881727423588}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 77.69194|  0:00:22s
epoch 1  | loss: 68.47371|  0:00:44s
epoch 2  | loss: 66.28355|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.86454|  0:00:21s
epoch 1  | loss: 64.07705|  0:00:43s
epoch 2  | loss: 63.5635 |  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.34508|  0:00:22s
epoch 1  | loss: 63.1575

[I 2021-08-24 06:47:23,512] Trial 33 finished with value: 62.894696943139856 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 4, 'GAMMA': 1.7430751176833559, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.007954008996198894, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.14737881727423588}. Best is trial 33 with value: 62.894696943139856.


62.894696943139856
{'n_d': 16, 'n_a': 16, 'n_steps': 4, 'gamma': 1.7478214934954, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.00806409791015495, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.0001, 'weight_decay': 0.001, 'momentum': 0.1987927619189804}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 127.53714|  0:00:22s
epoch 1  | loss: 127.08568|  0:00:44s
epoch 2  | loss: 126.46354|  0:01:06s


[I 2021-08-24 06:48:39,317] Trial 34 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.744373659187181, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.005635637640041961, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.1459471311383962}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.56262|  0:00:17s
epoch 1  | loss: 65.25443|  0:00:35s
epoch 2  | loss: 63.69821|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.31019|  0:00:17s
epoch 1  | loss: 63.10999|  0:00:34s
epoch 2  | loss: 63.01461|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.97133|  0:00:17s
epoch 1  | loss: 62.92951|

[I 2021-08-24 06:59:08,849] Trial 35 finished with value: 63.151900993382036 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.744373659187181, 'N_INDEPENDENT': 2, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.005635637640041961, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.1459471311383962}. Best is trial 33 with value: 62.894696943139856.


63.151900993382036
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8353390255848947, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.005480359621173101, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.05, 'weight_decay': 0.001, 'momentum': 0.15199760095934775}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 76.03844|  0:00:17s
epoch 1  | loss: 69.07205|  0:00:35s
epoch 2  | loss: 66.58685|  0:00:52s


[I 2021-08-24 07:00:09,163] Trial 36 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7354074541748936, 'n_independent': 2, 'n_shared': 2, 'lambda_sparse': 0.0035880720732672706, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 0.001, 'momentum': 0.2141284855118129}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 93.19781|  0:00:17s
epoch 1  | loss: 89.98088|  0:00:35s
epoch 2  | loss: 87.49403|  0:00:52s


[I 2021-08-24 07:01:09,461] Trial 37 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8794367016307136, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.006447644324997224, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.15621468696998464}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.63829|  0:00:17s
epoch 1  | loss: 65.38788|  0:00:35s
epoch 2  | loss: 63.70181|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.28654|  0:00:17s
epoch 1  | loss: 63.12311|  0:00:35s
epoch 2  | loss: 63.02009|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.98928|  0:00:17s
epoch 1  | loss: 62.9266

[I 2021-08-24 07:11:02,045] Trial 38 finished with value: 63.13356472698337 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.8794367016307136, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.006447644324997224, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.15621468696998464}. Best is trial 33 with value: 62.894696943139856.


63.13356472698337
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.901349659299048, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.005868200902152614, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 0.001, 'momentum': 0.27746754707907406}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 89.26026|  0:00:17s
epoch 1  | loss: 76.0324 |  0:00:35s
epoch 2  | loss: 72.51093|  0:00:52s


[I 2021-08-24 07:12:02,464] Trial 39 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.823120636202522, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.006570367738257818, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.0001, 'weight_decay': 0.001, 'momentum': 0.195170928249647}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 104.85203|  0:00:17s
epoch 1  | loss: 104.29324|  0:00:35s
epoch 2  | loss: 103.96543|  0:00:52s


[I 2021-08-24 07:13:02,907] Trial 40 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.9159271606755643, 'n_independent': 2, 'n_shared': 1, 'lambda_sparse': 0.005194797529660402, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.15850692818447867}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.0875 |  0:00:15s
epoch 1  | loss: 65.04705|  0:00:30s
epoch 2  | loss: 63.56919|  0:00:45s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.18543|  0:00:15s
epoch 1  | loss: 63.0915 |  0:00:30s
epoch 2  | loss: 63.0222 |  0:00:45s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.99403|  0:00:15s
epoch 1  | loss: 62.9283

[I 2021-08-24 07:20:34,947] Trial 41 finished with value: 63.05322322003894 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.9159271606755643, 'N_INDEPENDENT': 2, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.005194797529660402, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.15850692818447867}. Best is trial 33 with value: 62.894696943139856.


63.05322322003894
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.934210563034585, 'n_independent': 2, 'n_shared': 1, 'lambda_sparse': 0.005174766552102482, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.16480775061806832}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.18352|  0:00:15s
epoch 1  | loss: 65.35517|  0:00:30s
epoch 2  | loss: 63.72713|  0:00:45s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.26165|  0:00:15s
epoch 1  | loss: 63.11922|  0:00:30s
epoch 2  | loss: 63.03059|  0:00:45s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.98672|  0:00:15s
epoch 1

[I 2021-08-24 07:28:56,259] Trial 42 finished with value: 62.92552871774355 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.934210563034585, 'N_INDEPENDENT': 2, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.005174766552102482, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.16480775061806832}. Best is trial 33 with value: 62.894696943139856.


62.92552871774355
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.9077031427572049, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.004900229222724117, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.22374525858798955}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.18184|  0:00:17s
epoch 1  | loss: 65.1142 |  0:00:35s
epoch 2  | loss: 63.76938|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.34592|  0:00:17s
epoch 1  | loss: 63.17256|  0:00:35s
epoch 2  | loss: 63.05454|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.0412 |  0:00:17s
epoch 

[I 2021-08-24 07:37:33,309] Trial 43 finished with value: 63.01937407476282 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.9077031427572049, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.004900229222724117, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.22374525858798955}. Best is trial 33 with value: 62.894696943139856.


63.01937407476282
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.9394848405615424, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.005030956665339209, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.22818521822302093}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.2253 |  0:00:17s
epoch 1  | loss: 65.23493|  0:00:35s
epoch 2  | loss: 63.74073|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.28403|  0:00:17s
epoch 1  | loss: 63.13831|  0:00:35s
epoch 2  | loss: 63.04745|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.03189|  0:00:17s
epoch 

[I 2021-08-24 07:47:25,704] Trial 44 finished with value: 63.0639694385897 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.9394848405615424, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.005030956665339209, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.22818521822302093}. Best is trial 33 with value: 62.894696943139856.


63.0639694385897
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.943426836486172, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.004526594878326191, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 0.001, 'momentum': 0.22544124299570742}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 102.89851|  0:00:17s
epoch 1  | loss: 98.74026|  0:00:35s
epoch 2  | loss: 95.40212|  0:00:52s


[I 2021-08-24 07:48:26,081] Trial 45 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.9747720557124226, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0040240725427461025, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.05, 'weight_decay': 0.001, 'momentum': 0.263952740896216}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 76.16468|  0:00:17s
epoch 1  | loss: 67.8262 |  0:00:35s
epoch 2  | loss: 65.6114 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 64.43163|  0:00:17s
epoch 1  | loss: 63.81939|  0:00:35s
epoch 2  | loss: 63.47522|  0:00:52s


[I 2021-08-24 07:50:26,843] Trial 46 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.925215712404879, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.00509882416478592, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3068958151533216}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.59231|  0:00:17s
epoch 1  | loss: 64.49296|  0:00:35s
epoch 2  | loss: 63.40467|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.15708|  0:00:17s
epoch 1  | loss: 63.05252|  0:00:35s
epoch 2  | loss: 62.9942 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.94546|  0:00:17s
epoch 1  | loss: 62.88608| 

[I 2021-08-24 07:59:03,528] Trial 47 finished with value: 63.076178137344456 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.925215712404879, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.00509882416478592, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3068958151533216}. Best is trial 33 with value: 62.894696943139856.


63.076178137344456
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8543930477748467, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.004721744607715571, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-06, 'momentum': 0.241792240420602}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.14148|  0:00:17s
epoch 1  | loss: 65.14425|  0:00:35s
epoch 2  | loss: 63.9361 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.33086|  0:00:17s
epoch 1  | loss: 63.16336|  0:00:35s
epoch 2  | loss: 63.07986|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.99515|  0:00:17s
epoch 1

[I 2021-08-24 08:08:55,716] Trial 48 finished with value: 63.36396984562183 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.8543930477748467, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.004721744607715571, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-06, 'OPT_MOMENTUM': 0.241792240420602}. Best is trial 33 with value: 62.894696943139856.


63.36396984562183
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7887930695566006, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.00255845805933815, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.39724731975892535}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.67778|  0:00:17s
epoch 1  | loss: 64.25902|  0:00:35s
epoch 2  | loss: 63.44557|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.18793|  0:00:17s
epoch 1  | loss: 63.1015 |  0:00:35s
epoch 2  | loss: 63.01994|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.99497|  0:00:17s
epoch 1

[I 2021-08-24 08:17:32,694] Trial 49 finished with value: 62.929815645198275 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.7887930695566006, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.00255845805933815, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.39724731975892535}. Best is trial 33 with value: 62.894696943139856.


62.929815645198275
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.9857230360799634, 'n_independent': 2, 'n_shared': 1, 'lambda_sparse': 0.0017516952609498162, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 0.001, 'momentum': 0.394801290396137}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 86.4104 |  0:00:15s
epoch 1  | loss: 83.9402 |  0:00:30s
epoch 2  | loss: 81.88992|  0:00:45s


[I 2021-08-24 08:18:25,518] Trial 50 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.999863428081797, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.002654740509095443, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.20902069290086636}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.55702|  0:00:17s
epoch 1  | loss: 65.18461|  0:00:35s
epoch 2  | loss: 63.69168|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.3221 |  0:00:17s
epoch 1  | loss: 63.10986|  0:00:35s
epoch 2  | loss: 63.02691|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.98021|  0:00:17s
epoch 1  | loss: 62.94632

[I 2021-08-24 08:28:36,597] Trial 51 finished with value: 63.182956101589774 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.999863428081797, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.002654740509095443, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.20902069290086636}. Best is trial 33 with value: 62.894696943139856.


63.182956101589774
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.793035013007968, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.00029917239991381767, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.33462728870926206}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.14515|  0:00:17s
epoch 1  | loss: 64.36684|  0:00:35s
epoch 2  | loss: 63.42285|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.18727|  0:00:17s
epoch 1  | loss: 63.0977 |  0:00:34s
epoch 2  | loss: 63.04116|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.98803|  0:00:17s
epoc

[I 2021-08-24 08:38:28,891] Trial 52 finished with value: 63.004163931606385 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.793035013007968, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.00029917239991381767, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.33462728870926206}. Best is trial 33 with value: 62.894696943139856.


63.004163931606385
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.79627006586357, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.00039172628583960905, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.35127402544190567}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.11692|  0:00:17s
epoch 1  | loss: 64.4184 |  0:00:35s
epoch 2  | loss: 63.41958|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.19958|  0:00:17s
epoch 1  | loss: 63.08405|  0:00:35s
epoch 2  | loss: 62.98388|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.93006|  0:00:17s
epoch

[I 2021-08-24 08:47:05,815] Trial 53 finished with value: 63.03851176971118 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.79627006586357, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.00039172628583960905, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.35127402544190567}. Best is trial 33 with value: 62.894696943139856.


63.03851176971118
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7884437387069754, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 8.008432567949174e-05, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.37465897143870086}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.66642|  0:00:17s
epoch 1  | loss: 64.25075|  0:00:35s
epoch 2  | loss: 63.32127|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.15099|  0:00:17s
epoch 1  | loss: 63.06029|  0:00:35s
epoch 2  | loss: 62.98471|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.92835|  0:00:17s
epoch

[I 2021-08-24 08:55:42,596] Trial 54 finished with value: 62.927615370635415 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.7884437387069754, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 8.008432567949174e-05, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.37465897143870086}. Best is trial 33 with value: 62.894696943139856.


62.927615370635415
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7023500764040609, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 7.859360119910559e-05, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-05, 'momentum': 0.37290636186477727}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.80122|  0:00:17s
epoch 1  | loss: 64.31377|  0:00:35s
epoch 2  | loss: 63.51225|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.24307|  0:00:17s
epoch 1  | loss: 63.13366|  0:00:34s
epoch 2  | loss: 63.0621 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.02767|  0:00:17s
epoc

[I 2021-08-24 08:58:43,504] Trial 55 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7772267817606713, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0011751903904642148, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-06, 'momentum': 0.3724912491357574}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.07379|  0:00:17s
epoch 1  | loss: 64.56316|  0:00:35s
epoch 2  | loss: 63.56355|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.30245|  0:00:17s
epoch 1  | loss: 63.16696|  0:00:35s
epoch 2  | loss: 63.13091|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.07046|  0:00:17s
epoch 1  | loss: 63.0453

[I 2021-08-24 09:01:44,480] Trial 56 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.879704715327426, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0008963611655299962, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.0001, 'momentum': 0.3331132642147828}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.55073|  0:00:17s
epoch 1  | loss: 64.81274|  0:00:35s
epoch 2  | loss: 63.56261|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.24847|  0:00:17s
epoch 1  | loss: 63.17702|  0:00:35s
epoch 2  | loss: 63.07696|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.03263|  0:00:17s
epoch 1  | loss: 62.9801

[I 2021-08-24 09:04:45,398] Trial 57 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.853887055386389, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0019605935396092397, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.37864493360374896}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.00567|  0:00:17s
epoch 1  | loss: 64.24062|  0:00:35s
epoch 2  | loss: 63.38431|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.15165|  0:00:17s
epoch 1  | loss: 63.10126|  0:00:34s
epoch 2  | loss: 63.0382 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.98804|  0:00:17s
epoch 1  | loss: 62.9514

[I 2021-08-24 09:13:22,654] Trial 58 finished with value: 63.08680724221384 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.853887055386389, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.0019605935396092397, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.37864493360374896}. Best is trial 33 with value: 62.894696943139856.


63.08680724221384
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.794697596498357, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0031543083463797003, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.05, 'weight_decay': 0.001, 'momentum': 0.3330498026069268}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.08654|  0:00:17s
epoch 1  | loss: 67.0272 |  0:00:35s
epoch 2  | loss: 64.9483 |  0:00:52s


[I 2021-08-24 09:14:23,172] Trial 59 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.6944706144654105, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0005862616236343487, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.0001, 'weight_decay': 0.001, 'momentum': 0.3545631666376906}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 104.76101|  0:00:17s
epoch 1  | loss: 104.07338|  0:00:35s
epoch 2  | loss: 103.61022|  0:00:52s


[I 2021-08-24 09:15:23,635] Trial 60 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8019468373463163, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 5.539593740575472e-05, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3572596508350987}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.87506|  0:00:17s
epoch 1  | loss: 63.95825|  0:00:35s
epoch 2  | loss: 63.29182|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.11142|  0:00:17s
epoch 1  | loss: 63.04078|  0:00:35s
epoch 2  | loss: 62.99967|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.92278|  0:00:17s
epoch 1  | loss: 62.8572

[I 2021-08-24 09:25:16,954] Trial 61 finished with value: 63.012547490808046 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.8019468373463163, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 5.539593740575472e-05, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3572596508350987}. Best is trial 33 with value: 62.894696943139856.


63.012547490808046
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.647277212788032, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.00017549607272840194, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3088505732257468}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.14798|  0:00:17s
epoch 1  | loss: 64.21884|  0:00:35s
epoch 2  | loss: 63.33655|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.11965|  0:00:17s
epoch 1  | loss: 63.03723|  0:00:35s
epoch 2  | loss: 62.95345|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.94305|  0:00:17s
epoch

[I 2021-08-24 09:35:10,251] Trial 62 finished with value: 63.111725986717765 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.647277212788032, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.00017549607272840194, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3088505732257468}. Best is trial 33 with value: 62.894696943139856.


63.111725986717765
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7220749943027565, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0017327938035905308, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.38568831780695034}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.76078|  0:00:17s
epoch 1  | loss: 64.17508|  0:00:35s
epoch 2  | loss: 63.34081|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.15625|  0:00:17s
epoch 1  | loss: 63.06187|  0:00:35s
epoch 2  | loss: 63.0186 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.95107|  0:00:17s
epoc

[I 2021-08-24 09:45:59,602] Trial 63 finished with value: 62.881970075760684 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.7220749943027565, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.0017327938035905308, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.38568831780695034}. Best is trial 63 with value: 62.881970075760684.


62.881970075760684
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.717352590311143, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0015464585594017384, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3997657793902783}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.5895 |  0:00:17s
epoch 1  | loss: 64.30353|  0:00:35s
epoch 2  | loss: 63.33331|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.13259|  0:00:17s
epoch 1  | loss: 63.06924|  0:00:35s
epoch 2  | loss: 62.99444|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.9405 |  0:00:17s
epoch 

[I 2021-08-24 09:56:30,017] Trial 64 finished with value: 63.00332048555625 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.717352590311143, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.0015464585594017384, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3997657793902783}. Best is trial 63 with value: 62.881970075760684.


63.00332048555625
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.593093432413831, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0018987095006724197, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 0.001, 'momentum': 0.398993539049872}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 86.3564 |  0:00:17s
epoch 1  | loss: 73.73925|  0:00:35s
epoch 2  | loss: 71.07623|  0:00:52s


[I 2021-08-24 09:57:30,450] Trial 65 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7004021603388741, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.001500776404975598, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-05, 'momentum': 0.3872485301411334}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.7749 |  0:00:17s
epoch 1  | loss: 64.42449|  0:00:35s
epoch 2  | loss: 63.51569|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.2383 |  0:00:17s
epoch 1  | loss: 63.16097|  0:00:35s
epoch 2  | loss: 63.09906|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.04045|  0:00:17s
epoch 1  | loss: 63.00041

[I 2021-08-24 10:00:31,446] Trial 66 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.7161924292651984, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0008087908873502478, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.0001, 'momentum': 0.36318713644422584}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 78.16903|  0:00:26s
epoch 1  | loss: 71.35592|  0:00:53s
epoch 2  | loss: 69.9877 |  0:01:19s


[I 2021-08-24 10:02:03,291] Trial 67 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0083374946651071, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.002635478523931188, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3838017992965024}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.30735|  0:00:22s
epoch 1  | loss: 64.83088|  0:00:44s
epoch 2  | loss: 63.70497|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.27934|  0:00:22s
epoch 1  | loss: 63.13715|  0:00:44s
epoch 2  | loss: 63.05284|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.96867|  0:00:22s
epoch 1  | loss: 62.85547

[I 2021-08-24 10:13:12,994] Trial 68 finished with value: 62.82561528772247 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.0083374946651071, 'N_INDEPENDENT': 3, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.002635478523931188, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3838017992965024}. Best is trial 68 with value: 62.82561528772247.


62.82561528772247
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0777585494047368, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.002358845249430625, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3848516598489421}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.48709|  0:00:22s
epoch 1  | loss: 64.84363|  0:00:44s
epoch 2  | loss: 63.7439 |  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.39514|  0:00:22s
epoch 1  | loss: 63.18716|  0:00:44s
epoch 2  | loss: 63.12854|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.00316|  0:00:22s
epoch 1

[I 2021-08-24 10:24:22,445] Trial 69 finished with value: 62.931829380169546 and parameters: {'MAX_EPOCH': 3000, 'N_STEPS': 3, 'GAMMA': 1.0777585494047368, 'N_INDEPENDENT': 3, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.002358845249430625, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3848516598489421}. Best is trial 68 with value: 62.82561528772247.


62.931829380169546
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0158404735617563, 'n_independent': 2, 'n_shared': 3, 'lambda_sparse': 0.002607990698499623, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-06, 'momentum': 0.3831561637638316}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.59301|  0:00:19s
epoch 1  | loss: 64.2203 |  0:00:39s
epoch 2  | loss: 63.52741|  0:00:59s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.19691|  0:00:19s
epoch 1  | loss: 63.06108|  0:00:39s
epoch 2  | loss: 63.04972|  0:00:59s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.93608|  0:00:19s
epoch 

[I 2021-08-24 10:34:23,623] Trial 70 finished with value: 63.14323034948292 and parameters: {'MAX_EPOCH': 3000, 'N_STEPS': 3, 'GAMMA': 1.0158404735617563, 'N_INDEPENDENT': 2, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.002607990698499623, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 1e-06, 'OPT_MOMENTUM': 0.3831561637638316}. Best is trial 68 with value: 62.82561528772247.


63.14323034948292
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.111493385460889, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.002349821205876703, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.36745324060810725}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.66261|  0:00:22s
epoch 1  | loss: 64.80864|  0:00:44s
epoch 2  | loss: 63.64028|  0:01:06s


[I 2021-08-24 10:35:38,823] Trial 71 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0888526078879095, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.0021857850723638475, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3915000787357117}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.06565|  0:00:22s
epoch 1  | loss: 64.57415|  0:00:44s
epoch 2  | loss: 63.57133|  0:01:06s


[I 2021-08-24 10:36:54,029] Trial 72 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.019859892355235, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.003301488647697995, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3417672511057575}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.52579|  0:00:22s
epoch 1  | loss: 64.72846|  0:00:44s
epoch 2  | loss: 63.69741|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.29054|  0:00:22s
epoch 1  | loss: 63.1855 |  0:00:44s
epoch 2  | loss: 63.07348|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.00641|  0:00:22s
epoch 1  | loss: 62.86576|

[I 2021-08-24 10:41:54,545] Trial 73 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.05347839380884, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.0013869052464605418, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3992738923671165}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.08237|  0:00:22s
epoch 1  | loss: 64.56779|  0:00:44s
epoch 2  | loss: 63.70209|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.35568|  0:00:22s
epoch 1  | loss: 63.14382|  0:00:44s
epoch 2  | loss: 63.04473|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.97965|  0:00:22s
epoch 1  | loss: 62.88027|

[I 2021-08-24 10:53:01,914] Trial 74 finished with value: 62.83376677780085 and parameters: {'MAX_EPOCH': 3000, 'N_STEPS': 3, 'GAMMA': 1.05347839380884, 'N_INDEPENDENT': 3, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.0013869052464605418, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3992738923671165}. Best is trial 68 with value: 62.82561528772247.


62.83376677780085
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.05626804141177, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.002852783807325721, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.12260518209585454}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.84664|  0:00:22s
epoch 1  | loss: 65.97158|  0:00:44s
epoch 2  | loss: 64.27637|  0:01:06s


[I 2021-08-24 10:54:17,108] Trial 75 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.1447307194318364, 'n_independent': 2, 'n_shared': 3, 'lambda_sparse': 0.0012658532218967173, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 0.001, 'momentum': 0.32071719883374866}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 118.62803|  0:00:19s
epoch 1  | loss: 112.05062|  0:00:39s
epoch 2  | loss: 106.21163|  0:00:59s


[I 2021-08-24 10:55:24,738] Trial 76 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0001665728121618, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.003932891740074421, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 0.001, 'momentum': 0.38227608808043845}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 97.41608|  0:00:22s
epoch 1  | loss: 79.41061|  0:00:44s
epoch 2  | loss: 74.13484|  0:01:06s


[I 2021-08-24 10:56:39,838] Trial 77 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.062809516941148, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.002160075793795832, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.17847466005590593}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.40509|  0:00:22s
epoch 1  | loss: 65.62311|  0:00:44s
epoch 2  | loss: 64.09511|  0:01:06s


[I 2021-08-24 10:57:54,952] Trial 78 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 5, 'gamma': 1.2468590886464774, 'n_independent': 1, 'n_shared': 3, 'lambda_sparse': 0.003485688535339972, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.0001, 'weight_decay': 1e-05, 'momentum': 0.36334088269599607}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 87.21269|  0:00:26s
epoch 1  | loss: 87.18582|  0:00:52s
epoch 2  | loss: 87.00603|  0:01:19s


[I 2021-08-24 10:59:25,905] Trial 79 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.1493753371200661, 'n_independent': 2, 'n_shared': 3, 'lambda_sparse': 0.0029264469408937285, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3748903296741834}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.86623|  0:00:19s
epoch 1  | loss: 64.22578|  0:00:39s
epoch 2  | loss: 63.4348 |  0:00:59s


[I 2021-08-24 11:00:33,604] Trial 80 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.6591244548450548, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.0015524044046755, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.39931297395232046}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.48243|  0:00:22s
epoch 1  | loss: 65.10326|  0:00:44s
epoch 2  | loss: 63.726  |  0:01:06s


[I 2021-08-24 11:01:48,725] Trial 81 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7375105986720993, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.0015796462520568993, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3887564052073636}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.86974|  0:00:22s
epoch 1  | loss: 65.19545|  0:00:44s
epoch 2  | loss: 63.70378|  0:01:06s


[I 2021-08-24 11:03:03,901] Trial 82 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.76688676918252, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.0007463493297338865, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3825597270175926}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.87925|  0:00:22s
epoch 1  | loss: 65.19468|  0:00:44s
epoch 2  | loss: 63.80963|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.41627|  0:00:22s
epoch 1  | loss: 63.17423|  0:00:44s
epoch 2  | loss: 63.106  |  0:01:06s


[I 2021-08-24 11:05:34,011] Trial 83 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.099012999665537, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.001211317175630462, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.39908052735097965}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.09541|  0:00:22s
epoch 1  | loss: 64.51581|  0:00:44s
epoch 2  | loss: 63.48229|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.24724|  0:00:22s
epoch 1  | loss: 63.07595|  0:00:44s
epoch 2  | loss: 62.98044|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.87536|  0:00:22s
epoch 1  | loss: 62.78079

[I 2021-08-24 11:18:16,489] Trial 84 finished with value: 62.86444571981508 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.099012999665537, 'N_INDEPENDENT': 3, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.001211317175630462, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.39908052735097965}. Best is trial 68 with value: 62.82561528772247.


62.86444571981508
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0866567222196273, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.001054557199126861, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.05, 'weight_decay': 0.0001, 'momentum': 0.13632260366255655}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 82.10919|  0:00:22s
epoch 1  | loss: 69.47918|  0:00:44s
epoch 2  | loss: 67.08021|  0:01:06s


[I 2021-08-24 11:19:31,692] Trial 85 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.1916138142946422, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.002432690678377684, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.36429122331072583}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.8901 |  0:00:22s
epoch 1  | loss: 64.8295 |  0:00:44s
epoch 2  | loss: 63.69595|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.2846 |  0:00:22s
epoch 1  | loss: 63.15282|  0:00:44s
epoch 2  | loss: 63.06924|  0:01:06s


[I 2021-08-24 11:22:01,936] Trial 86 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.035001364251845, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.00696396814997789, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3904658732912835}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.33615|  0:00:22s
epoch 1  | loss: 64.5633 |  0:00:44s
epoch 2  | loss: 63.62195|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.27977|  0:00:22s
epoch 1  | loss: 63.08965|  0:00:44s
epoch 2  | loss: 62.95027|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.87299|  0:00:22s
epoch 1  | loss: 62.71246| 

[I 2021-08-24 11:33:09,435] Trial 87 finished with value: 63.02446620137946 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.035001364251845, 'N_INDEPENDENT': 3, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.00696396814997789, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3904658732912835}. Best is trial 68 with value: 62.82561528772247.


63.02446620137946
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.070105849695056, 'n_independent': 2, 'n_shared': 3, 'lambda_sparse': 0.0005277786419359493, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.34653006572557227}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.9566 |  0:00:19s
epoch 1  | loss: 64.25622|  0:00:39s
epoch 2  | loss: 63.42656|  0:00:59s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.2019 |  0:00:19s
epoch 1  | loss: 63.09886|  0:00:39s
epoch 2  | loss: 63.00317|  0:00:59s


[I 2021-08-24 11:35:24,557] Trial 88 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.1232109055729649, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.001319870811140449, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 1e-06, 'momentum': 0.37568181884016527}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.50078|  0:00:22s
epoch 1  | loss: 65.05394|  0:00:44s
epoch 2  | loss: 63.85103|  0:01:06s


[I 2021-08-24 11:36:39,734] Trial 89 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.0397662894771873, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.007659519167961672, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3574457618619971}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 73.75026|  0:00:22s
epoch 1  | loss: 64.831  |  0:00:44s
epoch 2  | loss: 63.67846|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.31711|  0:00:22s
epoch 1  | loss: 63.16661|  0:00:44s
epoch 2  | loss: 63.076  |  0:01:06s


[I 2021-08-24 11:39:09,915] Trial 90 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8326852359745762, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0017606669165683162, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3987944490547571}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.69732|  0:00:17s
epoch 1  | loss: 64.29444|  0:00:35s
epoch 2  | loss: 63.38327|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.16245|  0:00:17s
epoch 1  | loss: 63.07151|  0:00:34s
epoch 2  | loss: 62.96526|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.91984|  0:00:17s
epoch 1  | loss: 62.8637

[I 2021-08-24 11:47:45,436] Trial 91 finished with value: 62.88622800000069 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.8326852359745762, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.0017606669165683162, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3987944490547571}. Best is trial 68 with value: 62.82561528772247.


62.88622800000069
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8449155898729905, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.001995513586964738, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3908595693033985}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 70.66639|  0:00:17s
epoch 1  | loss: 64.25358|  0:00:34s
epoch 2  | loss: 63.38093|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.14849|  0:00:17s
epoch 1  | loss: 63.04331|  0:00:34s
epoch 2  | loss: 62.95659|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.92742|  0:00:17s
epoch 1

[I 2021-08-24 11:50:45,862] Trial 92 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.820777004644815, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.001785341305440397, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.36905996442571176}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 71.06503|  0:00:17s
epoch 1  | loss: 64.28227|  0:00:34s
epoch 2  | loss: 63.3402 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.15423|  0:00:17s
epoch 1  | loss: 63.04693|  0:00:34s
epoch 2  | loss: 62.98877|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.91903|  0:00:17s
epoch 1  | loss: 62.84555

[I 2021-08-24 11:59:21,758] Trial 93 finished with value: 62.98031151107413 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.820777004644815, 'N_INDEPENDENT': 3, 'N_SHARED': 1, 'LAMBDA_SPARSE': 0.001785341305440397, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.36905996442571176}. Best is trial 68 with value: 62.82561528772247.


62.98031151107413
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8672342033561966, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.0022339141473657325, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.38117953340569427}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.5343 |  0:00:22s
epoch 1  | loss: 65.21575|  0:00:44s
epoch 2  | loss: 63.80087|  0:01:06s


[I 2021-08-24 12:00:36,912] Trial 94 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.1055483382460878, 'n_independent': 2, 'n_shared': 1, 'lambda_sparse': 0.006183991787573704, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.001, 'weight_decay': 0.001, 'momentum': 0.3899725757798737}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 85.58577|  0:00:15s
epoch 1  | loss: 82.49632|  0:00:30s
epoch 2  | loss: 80.11414|  0:00:45s


[I 2021-08-24 12:01:29,638] Trial 95 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.7714346229415054, 'n_independent': 3, 'n_shared': 1, 'lambda_sparse': 0.0037428936226311044, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.09579011992966585}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 72.91469|  0:00:17s
epoch 1  | loss: 65.38368|  0:00:34s
epoch 2  | loss: 63.8371 |  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.37137|  0:00:17s
epoch 1  | loss: 63.20095|  0:00:34s
epoch 2  | loss: 63.06501|  0:00:52s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.0069 |  0:00:17s
epoch 1  | loss: 62.954

[I 2021-08-24 12:05:30,153] Trial 96 pruned.


{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.8253034209895445, 'n_independent': 3, 'n_shared': 2, 'lambda_sparse': 0.0009909974472197755, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.168599731977958}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 75.68528|  0:00:19s
epoch 1  | loss: 65.91178|  0:00:39s
epoch 2  | loss: 63.87689|  0:00:59s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.32516|  0:00:19s
epoch 1  | loss: 63.15935|  0:00:39s
epoch 2  | loss: 63.06686|  0:00:59s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.99596|  0:00:19s
epoch 1  | loss: 62.92349

[I 2021-08-24 12:16:54,121] Trial 97 finished with value: 62.96906715360755 and parameters: {'MAX_EPOCH': 1000, 'N_STEPS': 3, 'GAMMA': 1.8253034209895445, 'N_INDEPENDENT': 3, 'N_SHARED': 2, 'LAMBDA_SPARSE': 0.0009909974472197755, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.168599731977958}. Best is trial 68 with value: 62.82561528772247.


62.96906715360755
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.2742090124556205, 'n_independent': 3, 'n_shared': 3, 'lambda_sparse': 0.0026796074007847523, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.3421014814947857}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 74.16438|  0:00:22s
epoch 1  | loss: 64.94617|  0:00:44s
epoch 2  | loss: 63.79334|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 63.33479|  0:00:22s
epoch 1  | loss: 63.19904|  0:00:44s
epoch 2  | loss: 63.07682|  0:01:06s
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 62.98712|  0:00:22s
epoch 

[I 2021-08-24 12:28:01,488] Trial 98 finished with value: 62.93034781022294 and parameters: {'MAX_EPOCH': 3000, 'N_STEPS': 3, 'GAMMA': 1.2742090124556205, 'N_INDEPENDENT': 3, 'N_SHARED': 3, 'LAMBDA_SPARSE': 0.0026796074007847523, 'OPT_LR': 0.1, 'OPT_WEIGHT_DECAY': 0.001, 'OPT_MOMENTUM': 0.3421014814947857}. Best is trial 68 with value: 62.82561528772247.


62.93034781022294
{'n_d': 16, 'n_a': 16, 'n_steps': 3, 'gamma': 1.2703667711658635, 'n_independent': 2, 'n_shared': 1, 'lambda_sparse': 0.002765409889607477, 'optimizer_fn': <class 'torch.optim.sgd.SGD'>, 'optimizer_params': {'lr': 0.01, 'weight_decay': 0.001, 'momentum': 0.33986181100866325}, 'mask_type': 'entmax', 'scheduler_params': {'mode': 'min', 'patience': 20, 'min_lr': 1e-06, 'factor': 0.9}, 'scheduler_fn': <class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, 'verbose': 1, 'seed': 42}
Device used : cpu
No early stopping will be performed, last training weights will be used.
epoch 0  | loss: 78.37949|  0:00:15s
epoch 1  | loss: 72.35527|  0:00:30s
epoch 2  | loss: 70.40182|  0:00:45s


[I 2021-08-24 12:28:54,271] Trial 99 pruned.


NameError: name 'joblib' is not defined
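
In the tuning log above, each trial trains in short chunks of three epochs (every chunk starts with "No early stopping will be performed" and restarts the `epoch 0..2` counter), and many trials are marked `pruned` after one to three chunks. That pattern is what Optuna's report-and-prune loop produces: the objective calls `trial.report(value, step)` after each chunk and then `if trial.should_prune(): raise optuna.TrialPruned()`. The objective and the pruner used here are not shown in this section, so the following is only a dependency-free sketch of the median rule applied by Optuna's `MedianPruner` (its startup and warm-up settings are left out). The function `should_prune_median` and the toy numbers are hypothetical.

```python
from statistics import median


def should_prune_median(current_values, completed_histories):
    """Simplified median-pruning rule for a minimization study.

    current_values: intermediate values the running trial has reported so far
                    (one per step: 0, 1, 2, ...).
    completed_histories: one list of intermediate values per COMPLETED trial.
    The trial is pruned when its best value so far is worse (larger) than the
    median of the completed trials' values at the current step.
    """
    step = len(current_values) - 1
    peers = [h[step] for h in completed_histories if len(h) > step]
    if not peers:
        return False  # nothing to compare against yet
    best_so_far = min(current_values)
    return best_so_far > median(peers)


# Hypothetical histories on the same MSE-like scale as the log above
completed = [[63.7, 63.1, 62.9], [63.5, 63.0, 62.8], [63.9, 63.2, 63.0]]
print(should_prune_median([64.3], completed))         # 64.3 > median 63.7 -> prune
print(should_prune_median([63.4, 63.05], completed))  # 63.05 <= median 63.1 -> keep
```

This explains why slow starters such as trial 76 (`lr=0.001`, epoch-2 loss around 106) are cut after the first chunk, while trials near the current median are allowed to continue.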

In [6]:
import joblib
from optuna.trial import TrialState

joblib.dump(study, "study_TABNET16_2.pkl")

pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])



print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))


# print('Best trial: score {},\nparams {}'.format(study.best_trial.value,study.best_trial.params))
# best_param = study.best_trial.params

Study statistics: 
  Number of finished trials:  100
  Number of pruned trials:  57
  Number of complete trials:  43
Best trial:
  Value:  62.82561528772247
  Params: 
    MAX_EPOCH: 1000
    N_STEPS: 3
    GAMMA: 1.0083374946651071
    N_INDEPENDENT: 3
    N_SHARED: 3
    LAMBDA_SPARSE: 0.002635478523931188
    OPT_LR: 0.1
    OPT_WEIGHT_DECAY: 0.001
    OPT_MOMENTUM: 0.3838017992965024
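
The best trial's parameters use the study's own upper-case names. Judging from the dictionaries printed before each trial in the log, they map onto `TabNetRegressor` keyword arguments as sketched below, with `n_d = n_a = 16`, `mask_type='entmax'`, SGD and a `ReduceLROnPlateau` scheduler held fixed, while `MAX_EPOCH` is used as the `max_epochs` argument of `fit` (as in the cross-validation cell). The helper `best_params_to_tabnet` is hypothetical; the optimizer and scheduler classes are passed in so that the sketch does not need to import torch.

```python
def best_params_to_tabnet(best, optimizer_fn, scheduler_fn, seed=42):
    """Turn study.best_trial.params into TabNetRegressor kwargs (sketch).

    Fixed values mirror the dictionaries printed in the tuning log.
    Returns (constructor_kwargs, max_epochs_for_fit).
    """
    tabnet_params = {
        "n_d": 16,
        "n_a": 16,
        "n_steps": best["N_STEPS"],
        "gamma": best["GAMMA"],
        "n_independent": best["N_INDEPENDENT"],
        "n_shared": best["N_SHARED"],
        "lambda_sparse": best["LAMBDA_SPARSE"],
        "optimizer_fn": optimizer_fn,
        "optimizer_params": {
            "lr": best["OPT_LR"],
            "weight_decay": best["OPT_WEIGHT_DECAY"],
            "momentum": best["OPT_MOMENTUM"],
        },
        "mask_type": "entmax",
        "scheduler_fn": scheduler_fn,
        "scheduler_params": {"mode": "min", "patience": 20, "min_lr": 1e-06, "factor": 0.9},
        "verbose": 1,
        "seed": seed,
    }
    return tabnet_params, best["MAX_EPOCH"]


# Best trial (trial 68) from the output above
best = {"MAX_EPOCH": 1000, "N_STEPS": 3, "GAMMA": 1.0083374946651071,
        "N_INDEPENDENT": 3, "N_SHARED": 3, "LAMBDA_SPARSE": 0.002635478523931188,
        "OPT_LR": 0.1, "OPT_WEIGHT_DECAY": 0.001, "OPT_MOMENTUM": 0.3838017992965024}

# In the notebook these would be torch.optim.SGD and
# torch.optim.lr_scheduler.ReduceLROnPlateau; None keeps this demo torch-free.
params, max_epochs = best_params_to_tabnet(best, optimizer_fn=None, scheduler_fn=None)
print(params["optimizer_params"], max_epochs)
```

The resulting dictionary can then be passed as `TabNetRegressor(**params)` and `max_epochs` to `fit`.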


In [None]:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from tqdm import tqdm
from pytorch_tabnet.tab_model import TabNetRegressor

# x_train, y_train, x_test, columns, NUM_FOLDS, SEED, MAX_EPOCH, BATCH_SIZE and
# tabnet_params are defined in earlier cells of this notebook.
train_oof = np.zeros(len(x_train))  # out-of-fold predictions for every training row
test_preds = 0                      # predictions on x_test, averaged over the folds

kf = KFold(n_splits=NUM_FOLDS, shuffle=True, random_state=SEED)

for f, (train_ind, val_ind) in tqdm(enumerate(kf.split(x_train, y_train))):

    print(f'Fold {f}')
    train_df, val_df = x_train.iloc[train_ind][columns], x_train.iloc[val_ind][columns]
    train_target, val_target = y_train[train_ind], y_train[val_ind]

    print(train_df.shape, train_target.shape)
    print(val_df.shape, val_target.shape)

    # TabNetRegressor expects NumPy inputs and a 2-D target of shape (n_samples, 1)
    train_df = train_df.to_numpy()
    val_df = val_df.to_numpy()
    train_target = train_target.reshape(-1, 1)
    val_target = val_target.reshape(-1, 1)

    model = TabNetRegressor(**tabnet_params)

    model.fit(X_train=train_df,
              y_train=train_target,
              eval_set=[(val_df, val_target)],
              eval_name=["val"],
              eval_metric=["mse"],
              max_epochs=MAX_EPOCH,
              patience=20,  # early stopping on the "val" set
              batch_size=BATCH_SIZE,
              drop_last=False)

    temp_oof = model.predict(val_df)
    train_oof[val_ind] = temp_oof.reshape(-1)
    # Fold RMSE; np.sqrt avoids the `squared=False` argument, which newer scikit-learn releases deprecate
    print(np.sqrt(mean_squared_error(val_target, temp_oof)))

    temp_test = model.predict(x_test.to_numpy())
    test_preds += temp_test / NUM_FOLDS

Index(['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10',
       'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f20',
       'f21', 'f22', 'f23', 'f24', 'f25', 'f26', 'f27', 'f28', 'f29', 'f30',
       'f31', 'f32', 'f33', 'f34', 'f35', 'f36', 'f37', 'f38', 'f39', 'f40',
       'f41', 'f42', 'f43', 'f44', 'f45', 'f46', 'f47', 'f48', 'f49', 'f50',
       'f51', 'f52', 'f53', 'f54', 'f55', 'f56', 'f57', 'f58', 'f59', 'f60',
       'f61', 'f62', 'f63', 'f64', 'f65', 'f66', 'f67', 'f68', 'f69', 'f70',
       'f71', 'f72', 'f73', 'f74', 'f75', 'f76', 'f77', 'f78', 'f79', 'f80',
       'f81', 'f82', 'f83', 'f84', 'f85', 'f86', 'f87', 'f88', 'f89', 'f90',
       'f91', 'f92', 'f93', 'f94', 'f95', 'f96', 'f97', 'f98', 'f99'],
      dtype='object')
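
The cross-validation loop above follows the usual out-of-fold (OOF) pattern: every training row receives exactly one prediction from a model that did not see it, and the test predictions are averaged over the `NUM_FOLDS` fold models. A minimal NumPy-only sketch of that bookkeeping is shown below; the closed-form ridge regression stands in for `TabNetRegressor`, and the function names are assumptions for illustration, not the notebook's code.

```python
import numpy as np


def kfold_indices(n, n_splits, seed):
    """Shuffle 0..n-1 and yield n_splits (train_idx, val_idx) pairs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_splits)
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, val_idx


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with an intercept column (stand-in model)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w


def oof_and_test(X, y, X_test, n_splits=5, seed=42):
    oof = np.zeros(len(X))             # one out-of-fold prediction per training row
    test_pred = np.zeros(len(X_test))  # running average over the fold models
    for train_idx, val_idx in kfold_indices(len(X), n_splits, seed):
        predict = fit_ridge(X[train_idx], y[train_idx])
        oof[val_idx] = predict(X[val_idx])
        test_pred += predict(X_test) / n_splits
    return oof, test_pred


# Toy, noiseless linear data so the OOF error should be close to zero
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0
X_test = rng.normal(size=(50, 3))
oof, test_pred = oof_and_test(X, y, X_test)
print(np.sqrt(np.mean((oof - y) ** 2)))  # OOF RMSE
```

The OOF vector gives an honest estimate of generalization error on the training set, and averaging the fold models' test predictions usually reduces variance compared with using any single fold model.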


In [None]:
# XGBoost score (reference): 61.34741620778698

In [22]:
print('#### fold #########',np.sqrt(mean_squared_error(y_test, test_preds)),mean_squared_error(y_test, test_preds))

#### fold ######### 7.893137848830042 62.30162510063334
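
Despite the `#### fold` label, this cell prints the RMSE and then the MSE of the fold-averaged predictions against `y_test`. The two numbers are consistent with each other:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \approx 62.3016,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{62.3016} \approx 7.8931
$$

On the MSE scale, 62.30 is slightly below the best Optuna trial value (62.83) and above the XGBoost reference score of 61.35, assuming both of those are also MSE values.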


In [24]:
np.save('TabNet_ytest.npy', test_preds)

In [20]:
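from pandas import DataFrame

# 'loss' needs exactly one prediction per id (150,000 values for ids 250000..399999);
# arrays of different lengths make DataFrame raise "ValueError: All arrays must be of the same length".
raw_data = {'id': list(range(250000, 400000)),
            'loss':  }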

data = DataFrame(raw_data)
data.set_index('id', inplace=True)
print(data)
data.to_csv("submission.csv", mode='w')

ValueError: All arrays must be of the same length
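
The `ValueError` means the columns of `raw_data` have different lengths: `id` has 150,000 entries (250000 to 399999), so `loss` must hold exactly one prediction per id. Note that `test_preds` in this notebook was scored against `y_test`, so it appears to come from a labelled hold-out split rather than the competition's test file, in which case its length would not match the 150,000 ids. The hypothetical helper below, a sketch assuming pandas, flattens TabNet's `(n, 1)` predictions and checks the length before building and optionally writing the submission.

```python
import numpy as np
import pandas as pd


def make_submission(preds, first_id=250000, n_expected=150000, path=None):
    """Build an id/loss submission frame and write it to `path` if given (sketch)."""
    loss = np.asarray(preds).reshape(-1)  # TabNet's predict returns shape (n, 1)
    if len(loss) != n_expected:
        raise ValueError(f"expected {n_expected} predictions, got {len(loss)}")
    sub = pd.DataFrame({"id": np.arange(first_id, first_id + n_expected), "loss": loss})
    sub = sub.set_index("id")
    if path is not None:
        sub.to_csv(path)
    return sub


# Small demo with 5 hypothetical predictions
demo = make_submission(np.full((5, 1), 7.5), first_id=10, n_expected=5)
print(demo)
```

In the notebook it would be called with predictions on the competition's test rows, e.g. `make_submission(preds, path="submission.csv")`, where `preds` stands for those (hypothetical) predictions.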