# Notebook for distillation training on the CIFAR10 dataset
This notebook trains MobileNetV2 on the CIFAR10 dataset; the teacher model is a ViT fine-tuned on the same dataset.

MobileNetV2 is used in three setups: with random initialization, with only the classification head trained on an initialized model (pretrained on ImageNet), and with the whole initialized model trained. Each of these three tasks is trained both in the standard way and with distillation from the teacher model mentioned above.

Distillation uses precomputed logits from the precompute_logits notebook.

In [1]:
%pip install transformers[torch] huggingface_hub datasets evaluate torchvision optuna

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting huggingface_hub
  Downloading huggingface_hub-0.27.1-py3-none-any.whl.metadata (13 kB)
Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting optuna
  Downloading optuna-4.1.0-py3-none-any.whl.metadata (16 kB)
Collecting transformers[torch]
  Downloading transformers-4.48.0-py3-none-any.whl.metadata (44 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers[torch])
  Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting accelerate>=0.26.0 (from transformers[torch])
  Downloading accelerate-1.2.1-py3-none-any.whl.metadata (19 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x

## Import knihoven a definice metod

In [2]:
from transformers import Trainer, TrainingArguments, MobileNetV2Config, MobileNetV2ForImageClassification, EarlyStoppingCallback
from torchvision import transforms
from torch.utils.data import Dataset
import torch.nn.functional as F
from PIL import Image
import torch.nn as nn
import numpy as np
import evaluate
import random
import pickle
import optuna
import torch
import math
import os 

  from .autonotebook import tqdm as notebook_tqdm


Resetting the random seed so that results are reproducible.
Some of these settings can probably be removed.

TODO: Remove unnecessary settings.

In [3]:
def reset_seed(seed=42):
    torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed) 
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ['PYTHONHASHSEED'] = str(seed)
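
A quick way to check that reseeding has the intended effect is to draw a few numbers twice and compare them. The sketch below does this only for Python's built-in `random` module; `reseed_and_sample` is a hypothetical helper, not part of the notebook, and the same kind of check could be repeated for NumPy and torch.

```python
import random

def reseed_and_sample(seed=42, n=3):
    """Reseed Python's RNG and draw n numbers (hypothetical helper)."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = reseed_and_sample()
second = reseed_and_sample()

# Both draws start from the same seed, so the sequences match.
print(first == second)  # True
```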

A new wrapper that works directly with the files of the downloaded and modified CIFAR10 dataset.
Loading the dataset with the previously used method is not possible, because the modified files have a different checksum.

The wrapper also reads the teacher logits directly from the dataset files.

In [4]:
class CustomCIFAR10(Dataset):
    def __init__(self, root, train=True, transform=None, target_transform=None):
        self.root = root
        self.train = train
        self.transform = transform
        self.target_transform = target_transform

        self.data = []
        self.targets = []
        self.logits = []
        
        # Training data is split into five batch files; the test set is a single file.
        batch_names = [f'data_batch_{i}' for i in range(1, 6)] if self.train else ['test_batch']
        for batch_name in batch_names:
            data_file = os.path.join(self.root, 'cifar-10-batches-py', batch_name)
            with open(data_file, 'rb') as fo:
                batch = pickle.load(fo, encoding='bytes')
            self.data.append(batch[b'data'])
            self.targets.extend(batch[b'labels'])
            self.logits.extend(batch[b'logits'])

        self.data = np.concatenate(self.data, axis=0)
        self.targets = np.array(self.targets)
        self.logits = np.array(self.logits)


    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        image = self.data[index].reshape(3, 32, 32).transpose(1, 2, 0)
        label = self.targets[index]
        logit = self.logits[index]
        
        image = Image.fromarray(image.astype('uint8'), 'RGB')
        logit = torch.tensor(logit, dtype=torch.float)
        if self.transform:
            image = self.transform(image)

        if self.target_transform:
            label = self.target_transform(label)
            
        return {
            'pixel_values': image,
            'labels': label,
            'logits': logit
        }
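
The wrapper expects each batch file to be a pickled dictionary with the byte-string keys `b'data'`, `b'labels'` and `b'logits'`. The sketch below writes and reads back a miniature file with that layout. The file name, the two dummy images and the logit values are made up for illustration, and plain lists stand in for the NumPy arrays that the real files presumably contain.

```python
import os
import pickle
import tempfile

# Hypothetical miniature batch: 2 images of 3 * 32 * 32 = 3072 values each,
# their class labels, and one 10-dimensional teacher logit vector per image.
batch = {
    b'data': [[0] * 3072, [255] * 3072],
    b'labels': [3, 7],
    b'logits': [[0.1] * 10, [0.2] * 10],
}

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, 'data_batch_demo')  # made-up file name
    with open(path, 'wb') as fo:
        pickle.dump(batch, fo)
    # Same call as in the wrapper; byte-string keys come back unchanged.
    with open(path, 'rb') as fo:
        loaded = pickle.load(fo, encoding='bytes')

print(sorted(loaded.keys()))
print(len(loaded[b'data'][0]), loaded[b'labels'], len(loaded[b'logits'][0]))
```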


Definition of the evaluation metrics (accuracy, precision, recall and F1) used during model training.

In [5]:
accuracy_metric = evaluate.load("accuracy")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")
f1_metric = evaluate.load("f1")

def compute_metrics(eval_pred):
    pred, labels = eval_pred
    predictions = np.argmax(pred, axis=1)
    
    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)
    precision = precision_metric.compute(predictions=predictions, references=labels, average='macro', zero_division = 0)
    recall = recall_metric.compute(predictions=predictions, references=labels, average='macro', zero_division = 0)
    f1 = f1_metric.compute(predictions=predictions, references=labels, average='macro')

    return {
        "accuracy": accuracy["accuracy"],
        "precision": precision["precision"],
        "recall": recall["recall"],
        "f1": f1["f1"]
    }

Downloading builder script: 100%|██████████| 7.56k/7.56k [00:00<00:00, 13.8MB/s]
Downloading builder script: 100%|██████████| 7.38k/7.38k [00:00<00:00, 15.2MB/s]
Downloading builder script: 100%|██████████| 6.79k/6.79k [00:00<00:00, 14.2MB/s]
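
To make the `average='macro'` setting concrete, here is a small pure-Python sketch of macro-averaged F1: the F1 score is computed per class and then averaged without weighting by class size. `macro_f1` is a hypothetical helper written for illustration; it averages over all classes `0..num_classes-1` and treats undefined precision or recall as 0, which is an assumption mirroring `zero_division = 0` above.

```python
def macro_f1(predictions, references, num_classes):
    """Unweighted mean of per-class F1 scores over classes 0..num_classes-1."""
    per_class = []
    for c in range(num_classes):
        pairs = list(zip(predictions, references))
        tp = sum(1 for p, r in pairs if p == c and r == c)
        fp = sum(1 for p, r in pairs if p == c and r != c)
        fn = sum(1 for p, r in pairs if p != c and r == c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / num_classes

references = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]

# Per-class F1 is 0.5, 0.8 and 2/3, so the macro average is about 0.6556.
score = macro_f1(predictions, references, num_classes=3)
print(round(score, 4))  # 0.6556
```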


Training arguments for the trainer.

In [6]:
class Custom_training_args(TrainingArguments):
    def __init__(self, lambda_param, temperature, *args, **kwargs):
        super().__init__(*args, **kwargs)    
        self.lambda_param = lambda_param
        self.temperature = temperature
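
The two extra arguments are the usual knobs of a distillation objective. The trainer that consumes them is not shown in this part of the notebook, so the following is only a sketch under a common assumption: `lambda_param` weights a temperature-softened KL term against the ordinary cross-entropy, and the KL term is scaled by the squared temperature. It works through a single sample in plain Python so the numbers can be checked by hand; `softmax` and `distillation_loss` are hypothetical helpers.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      lambda_param=0.5, temperature=5.0):
    """Assumed form: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return (1 - lambda_param) * ce + lambda_param * temperature ** 2 * kl

student = [1.0, 2.0, 3.0]
agreeing_teacher = [1.0, 2.0, 3.0]
disagreeing_teacher = [3.0, 2.0, 1.0]

loss_agree = distillation_loss(student, agreeing_teacher, label=2)
loss_disagree = distillation_loss(student, disagreeing_teacher, label=2)
print(loss_agree < loss_disagree)  # True: the KL term is zero only when both agree
```

With `lambda_param = 0.5` the two terms contribute equally; a higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of the wrong classes.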

In [7]:
def get_training_args(output_dir:str, logging_dir:str, remove_unused_columns:bool):
    return (
        Custom_training_args(
        output_dir=output_dir,
        eval_strategy="epoch",
        save_strategy="epoch",
        logging_strategy="epoch",
        learning_rate=5e-5,  # Default value
        per_device_train_batch_size=128,
        per_device_eval_batch_size=128,
        num_train_epochs=15,
        weight_decay=0.01,
        seed=42,  # Default value
        metric_for_best_model="f1",
        fp16=True, 
        logging_dir=logging_dir,
        remove_unused_columns=remove_unused_columns,
        lambda_param = 0.5, 
        temperature = 5
    ))

Randomly initialized MobileNetV2.

In [8]:
def get_random_init_mobilenet():
    reset_seed(42)
    student_config = MobileNetV2Config()
    student_config.num_labels = 10
    return MobileNetV2ForImageClassification(student_config)

In [9]:
reset_seed(42)

In [10]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 1g.10gb


Applying the transformations to the dataset.

In [11]:
transform = transforms.Compose([
    transforms.Resize((224, 224)), 
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
print(os.getcwd())
test = CustomCIFAR10(root='./data/10-logits', train=False, transform=transform)
train = CustomCIFAR10(root='./data/10-logits', train=True, transform=transform)

/home/jovyan
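
As a rough illustration of the normalization step: `ToTensor` produces channel values in [0, 1], and normalizing with mean 0.5 and standard deviation 0.5 then maps them to [-1, 1]. The sketch below applies the same per-value formula in plain Python; `normalize` is a hypothetical helper, not the torchvision implementation.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-value form of channel normalization: (value - mean) / std."""
    return (value - mean) / std

# Darkest, mid-grey and brightest inputs after scaling to [0, 1].
results = [normalize(v) for v in (0.0, 0.5, 1.0)]
print(results)  # [-1.0, 0.0, 1.0]
```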


### Standard training of the randomly initialized model

In [12]:
training_args = get_training_args("./results/cifar10-random", './logs/cifar10-random', True)

In [13]:
def hp_space(trial):
    params =  {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "adam_beta1" : trial.suggest_float("adam_beta1", 0.9, 0.99, step=0.01)
    }   
    print(f"Trial {trial.number} with params: {params}")
    return params

In [14]:
# Convert epochs to training steps
min_r = math.ceil(50000/128)*2
max_r = math.ceil(50000/128)*15
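
The arithmetic behind these two values, spelled out as a short sketch: CIFAR10 has 50,000 training images, so with a batch size of 128 one epoch takes 391 steps. This assumes a single device and no gradient accumulation; the names `train_size`, `batch_size` and `steps_per_epoch` are introduced here for illustration.

```python
import math

train_size = 50_000   # CIFAR10 training images
batch_size = 128      # per_device_train_batch_size, assuming one device

# The last, partially filled batch still counts as a step, hence the ceiling.
steps_per_epoch = math.ceil(train_size / batch_size)

min_r = steps_per_epoch * 2    # pruner's minimum resource: 2 epochs
max_r = steps_per_epoch * 15   # pruner's maximum resource: all 15 epochs

print(steps_per_epoch, min_r, max_r)  # 391 782 5865
```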

In [15]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)



In [28]:
trainer = Trainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
    compute_metrics=compute_metrics,
    model_init=get_random_init_mobilenet
  )
  

In [16]:
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    study_name="Test",
    n_trials=150
)

[I 2025-01-04 21:20:33,926] A new study created in memory with name: Test


Trial 0 with params: {'learning_rate': 1.0253509690168497e-05, 'weight_decay': 0.01, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3246,1.952858,0.2791,0.279871,0.2791,0.259486
2,1.9593,1.712043,0.3508,0.347448,0.3508,0.342523
3,1.8109,1.617426,0.3922,0.389229,0.3922,0.385845


[I 2025-01-04 21:27:26,053] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 4.128205343826226e-05, 'weight_decay': 0.001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9742,1.53719,0.4307,0.429679,0.4307,0.424683
2,1.5412,1.330349,0.5157,0.515273,0.5157,0.510939
3,1.3125,1.170791,0.5809,0.580427,0.5809,0.578425
4,1.1445,1.056477,0.6237,0.618623,0.6237,0.617743
5,1.0126,1.003619,0.6454,0.659922,0.6454,0.646621
6,0.9023,0.950919,0.6697,0.673489,0.6697,0.665197


[I 2025-01-04 21:41:07,363] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.4347159517201402e-06, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4605,2.282299,0.1106,0.10778,0.1106,0.042568
2,2.4206,2.251575,0.1534,0.117842,0.1534,0.102382
3,2.386,2.220645,0.1817,0.190924,0.1817,0.128917
4,2.3441,2.159038,0.2124,0.210841,0.2124,0.158581
5,2.2976,2.121108,0.2225,0.213822,0.2225,0.180521
6,2.2322,2.050332,0.251,0.245453,0.251,0.211239
7,2.1638,1.967224,0.2743,0.267993,0.2743,0.240347
8,2.1014,1.944384,0.2863,0.281919,0.2863,0.259355
9,2.0626,1.88449,0.2925,0.290812,0.2925,0.268828
10,2.0277,1.870312,0.3018,0.294115,0.3018,0.283157


[I 2025-01-04 22:08:40,232] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 8.14829321010529e-05, 'weight_decay': 0.0, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8941,1.430043,0.4765,0.470782,0.4765,0.469335
2,1.3893,1.165547,0.5748,0.576051,0.5748,0.573984
3,1.1102,0.983707,0.6453,0.653153,0.6453,0.645667
4,0.9095,0.865909,0.6929,0.693976,0.6929,0.687767
5,0.7547,0.792643,0.7203,0.733455,0.7203,0.722677
6,0.6192,0.748712,0.7448,0.752064,0.7448,0.744119


[I 2025-01-04 22:21:34,354] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 0.0001764971584817573, 'weight_decay': 0.002, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6406,1.127816,0.5875,0.588188,0.5875,0.585114
2,1.0209,0.820439,0.7112,0.718462,0.7112,0.713173
3,0.7381,0.670809,0.7674,0.771259,0.7674,0.766575
4,0.5715,0.586457,0.802,0.804205,0.802,0.801382
5,0.4413,0.590939,0.7975,0.805855,0.7975,0.798904
6,0.3355,0.586605,0.8099,0.820852,0.8099,0.810224
7,0.2506,0.587743,0.8146,0.818786,0.8146,0.813713
8,0.1754,0.60947,0.8218,0.82409,0.8218,0.819391
9,0.1259,0.609261,0.8274,0.830458,0.8274,0.827604
10,0.0875,0.614316,0.8285,0.83009,0.8285,0.828583


[I 2025-01-04 22:47:15,137] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 3.1261029103110603e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4351,2.242303,0.1635,0.136943,0.1635,0.111866
2,2.3502,2.14284,0.2235,0.210075,0.2235,0.178484
3,2.2007,1.968147,0.2729,0.274187,0.2729,0.247437
4,2.0476,1.852924,0.3054,0.295098,0.3054,0.287945
5,1.9712,1.791265,0.3225,0.318366,0.3225,0.305945
6,1.9223,1.761298,0.3419,0.334431,0.3419,0.330193
7,1.8883,1.718303,0.3538,0.348998,0.3538,0.345933
8,1.8572,1.729753,0.3525,0.350417,0.3525,0.33989
9,1.8349,1.68352,0.361,0.359587,0.361,0.35687
10,1.8103,1.661988,0.3759,0.369264,0.3759,0.368133


[I 2025-01-04 23:13:38,212] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 1.4648955132800731e-05, 'weight_decay': 0.003, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2563,1.826764,0.3087,0.31461,0.3087,0.292582
2,1.8794,1.647966,0.3774,0.375282,0.3774,0.368815
3,1.7187,1.529115,0.4315,0.429389,0.4315,0.422839
4,1.5983,1.434979,0.4706,0.466812,0.4706,0.463697
5,1.4963,1.387527,0.4919,0.492083,0.4919,0.487411
6,1.4176,1.318753,0.5203,0.516597,0.5203,0.513999
7,1.3518,1.283824,0.5315,0.53481,0.5315,0.530667
8,1.2935,1.290219,0.537,0.528555,0.537,0.52647
9,1.2428,1.210784,0.5538,0.562601,0.5538,0.554204
10,1.1966,1.201471,0.5659,0.563295,0.5659,0.560735


[I 2025-01-04 23:59:14,928] Trial 6 finished with value: 0.5950815756680115 and parameters: {'learning_rate': 1.4648955132800731e-05, 'weight_decay': 0.003, 'adam_beta1': 0.96}. Best is trial 6 with value: 0.5950815756680115.


Trial 7 with params: {'learning_rate': 2.379522116387725e-06, 'weight_decay': 0.003, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4468,2.260634,0.1454,0.1549,0.1454,0.097441
2,2.3863,2.205066,0.1901,0.161216,0.1901,0.136788
3,2.3062,2.111651,0.2298,0.221321,0.2298,0.189907
4,2.1789,1.956355,0.2692,0.257775,0.2692,0.234927
5,2.0694,1.879248,0.2902,0.288061,0.2902,0.261908
6,2.0033,1.830837,0.3178,0.311658,0.3178,0.29864


[I 2025-01-05 00:12:52,817] Trial 7 pruned. 


Trial 8 with params: {'learning_rate': 1.7018418817029176e-05, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1899,1.748065,0.3472,0.346124,0.3472,0.338002
2,1.8072,1.573016,0.4137,0.411779,0.4137,0.406236
3,1.6486,1.469678,0.4623,0.45938,0.4623,0.456181
4,1.5265,1.391242,0.4942,0.486963,0.4942,0.486008
5,1.43,1.337853,0.5166,0.519818,0.5166,0.512017
6,1.3466,1.266646,0.5423,0.535865,0.5423,0.534215
7,1.2734,1.223051,0.5558,0.561654,0.5558,0.556145
8,1.2093,1.206107,0.5643,0.557,0.5643,0.556482
9,1.154,1.157013,0.5804,0.590095,0.5804,0.582638
10,1.1061,1.135454,0.5876,0.585054,0.5876,0.583652


[I 2025-01-05 00:55:40,712] Trial 8 finished with value: 0.6208557778745601 and parameters: {'learning_rate': 1.7018418817029176e-05, 'weight_decay': 0.008, 'adam_beta1': 0.91}. Best is trial 8 with value: 0.6208557778745601.


Trial 9 with params: {'learning_rate': 2.4428866967349976e-05, 'weight_decay': 0.006, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1049,1.678595,0.3648,0.362437,0.3648,0.356758
2,1.71,1.476397,0.4493,0.444266,0.4493,0.444005
3,1.5237,1.371078,0.4975,0.492691,0.4975,0.490335
4,1.3954,1.277173,0.537,0.531524,0.537,0.529376
5,1.2839,1.230579,0.5537,0.559545,0.5537,0.550563
6,1.1989,1.151359,0.5843,0.583187,0.5843,0.577479
7,1.1141,1.112761,0.6025,0.608018,0.6025,0.602562
8,1.0469,1.099864,0.609,0.605735,0.609,0.601062
9,0.9814,1.062936,0.6153,0.629036,0.6153,0.619208
10,0.9276,1.023877,0.6338,0.634735,0.6338,0.632403


[I 2025-01-05 01:38:23,211] Trial 9 finished with value: 0.6587412372988884 and parameters: {'learning_rate': 2.4428866967349976e-05, 'weight_decay': 0.006, 'adam_beta1': 0.9}. Best is trial 9 with value: 0.6587412372988884.


Trial 10 with params: {'learning_rate': 0.00026497374689934315, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5501,1.07052,0.6095,0.61599,0.6095,0.60885
2,0.9387,0.791795,0.7222,0.733242,0.7222,0.722489
3,0.6961,0.640728,0.7764,0.782317,0.7764,0.775712


[I 2025-01-05 01:44:45,380] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 6.527343955903165e-06, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3908,2.129595,0.2269,0.214195,0.2269,0.193048
2,2.1013,1.828567,0.3235,0.322282,0.3235,0.307401
3,1.9237,1.732433,0.3536,0.352255,0.3536,0.347266


[I 2025-01-05 01:51:07,505] Trial 11 pruned. 


Trial 12 with params: {'learning_rate': 3.134890324531348e-05, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0415,1.604173,0.3981,0.387857,0.3981,0.389202
2,1.6263,1.405563,0.482,0.480198,0.482,0.477872
3,1.409,1.245762,0.5494,0.543667,0.5494,0.540694
4,1.2537,1.147284,0.5863,0.57934,0.5863,0.577982
5,1.1328,1.100701,0.6041,0.614502,0.6041,0.603709
6,1.0259,1.017457,0.6387,0.640086,0.6387,0.634207


[I 2025-01-05 02:03:56,905] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 1.6056817955297837e-05, 'weight_decay': 0.007, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2195,1.784407,0.3307,0.334054,0.3307,0.318914
2,1.835,1.59344,0.4042,0.406138,0.4042,0.397507
3,1.6717,1.488459,0.4516,0.446642,0.4516,0.444709
4,1.5608,1.411375,0.4861,0.481367,0.4861,0.479865
5,1.4684,1.374187,0.4949,0.500257,0.4949,0.491805
6,1.3935,1.297013,0.5264,0.521843,0.5264,0.517804


[I 2025-01-05 02:16:47,622] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 1.947699682751316e-06, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4531,2.270793,0.1296,0.113251,0.1296,0.075413
2,2.4031,2.227194,0.174,0.142878,0.174,0.120905
3,2.3436,2.164479,0.2065,0.191618,0.2065,0.159332
4,2.2618,2.048353,0.2522,0.23654,0.2522,0.209989
5,2.1583,1.963016,0.2683,0.263557,0.2683,0.230846
6,2.0769,1.900415,0.292,0.29083,0.292,0.264963
7,2.0239,1.836032,0.3111,0.304448,0.3111,0.292289
8,1.9822,1.836321,0.3138,0.311642,0.3138,0.295086
9,1.9562,1.789328,0.3188,0.315088,0.3188,0.307131
10,1.932,1.776399,0.3336,0.325596,0.3336,0.324586


[I 2025-01-05 02:42:24,410] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 3.7646643049236884e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.002,1.577376,0.4059,0.402007,0.4059,0.392889
2,1.5902,1.369401,0.4973,0.497588,0.4973,0.49068
3,1.3685,1.21385,0.5634,0.559047,0.5634,0.557309
4,1.1987,1.10486,0.6036,0.598958,0.6036,0.596596
5,1.0733,1.061325,0.6223,0.639277,0.6223,0.624395
6,0.9606,0.978991,0.6552,0.657911,0.6552,0.651362
7,0.8636,0.932991,0.6701,0.677782,0.6701,0.670745
8,0.7748,0.893197,0.6885,0.685967,0.6885,0.684532
9,0.6987,0.878446,0.6965,0.707104,0.6965,0.699116
10,0.6292,0.860722,0.6993,0.705631,0.6993,0.698749


[I 2025-01-05 03:25:18,187] Trial 15 finished with value: 0.7115095943026561 and parameters: {'learning_rate': 3.7646643049236884e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}. Best is trial 15 with value: 0.7115095943026561.


Trial 16 with params: {'learning_rate': 4.793568139083467e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9636,1.517127,0.4326,0.426099,0.4326,0.425027
2,1.5189,1.301817,0.5258,0.524072,0.5258,0.52115
3,1.2805,1.13664,0.5954,0.593616,0.5954,0.588324
4,1.0965,1.018689,0.6371,0.63767,0.6371,0.633892
5,0.9563,0.948012,0.6657,0.674683,0.6657,0.667007
6,0.8327,0.873027,0.6947,0.701483,0.6947,0.693779
7,0.7312,0.848679,0.7033,0.713526,0.7033,0.704211
8,0.639,0.84095,0.7082,0.709359,0.7082,0.703627
9,0.5556,0.813897,0.7222,0.730861,0.7222,0.724781
10,0.4772,0.819984,0.7233,0.733256,0.7233,0.723457


[I 2025-01-05 04:07:55,253] Trial 16 finished with value: 0.7264228400259334 and parameters: {'learning_rate': 4.793568139083467e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}. Best is trial 16 with value: 0.7264228400259334.


Trial 17 with params: {'learning_rate': 4.5954116842262964e-05, 'weight_decay': 0.01, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9592,1.498164,0.4406,0.434725,0.4406,0.429227
2,1.5103,1.284363,0.5348,0.53596,0.5348,0.53189
3,1.2699,1.130308,0.5946,0.593785,0.5946,0.590546
4,1.0968,1.015123,0.6339,0.633395,0.6339,0.628984
5,0.9578,0.960816,0.6583,0.674505,0.6583,0.660201
6,0.8375,0.880179,0.6891,0.695175,0.6891,0.687079
7,0.7295,0.840937,0.7079,0.714594,0.7079,0.708108
8,0.6416,0.842875,0.7126,0.713827,0.7126,0.70855
9,0.5563,0.829616,0.7202,0.728773,0.7202,0.722622
10,0.4817,0.818117,0.7218,0.727749,0.7218,0.721353


[I 2025-01-05 04:50:31,890] Trial 17 finished with value: 0.7324007888083972 and parameters: {'learning_rate': 4.5954116842262964e-05, 'weight_decay': 0.01, 'adam_beta1': 0.93}. Best is trial 17 with value: 0.7324007888083972.


Trial 18 with params: {'learning_rate': 0.00013765420776152537, 'weight_decay': 0.006, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7098,1.246266,0.5434,0.55002,0.5434,0.541676
2,1.0932,0.892218,0.6882,0.688161,0.6882,0.686755
3,0.8157,0.725626,0.7488,0.755123,0.7488,0.748851
4,0.6321,0.618335,0.788,0.788621,0.788,0.787188
5,0.4919,0.60125,0.7934,0.802161,0.7934,0.794982
6,0.3776,0.600538,0.8021,0.815965,0.8021,0.803029


[I 2025-01-05 05:03:21,236] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.00013579422085040563, 'weight_decay': 0.01, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7034,1.203514,0.5646,0.568082,0.5646,0.562523
2,1.0906,0.919844,0.6724,0.686609,0.6724,0.672239
3,0.7997,0.729169,0.7411,0.748621,0.7411,0.741454


[I 2025-01-05 05:09:45,228] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 0.00017908732484905353, 'weight_decay': 0.01, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6761,1.188751,0.5694,0.569628,0.5694,0.568315
2,1.0513,0.830241,0.7062,0.705964,0.7062,0.70553
3,0.7493,0.679388,0.7621,0.767416,0.7621,0.761431
4,0.5746,0.594829,0.7961,0.796734,0.7961,0.793515
5,0.4486,0.587099,0.8003,0.809539,0.8003,0.801703
6,0.3393,0.560434,0.8142,0.820768,0.8142,0.813883
7,0.2484,0.601537,0.8162,0.820883,0.8162,0.815266
8,0.1772,0.615484,0.8165,0.819076,0.8165,0.813967
9,0.1265,0.621376,0.825,0.832725,0.825,0.826762
10,0.0896,0.653146,0.8222,0.829261,0.8222,0.822789


[I 2025-01-05 05:52:17,516] Trial 20 finished with value: 0.8436343253621142 and parameters: {'learning_rate': 0.00017908732484905353, 'weight_decay': 0.01, 'adam_beta1': 0.96}. Best is trial 20 with value: 0.8436343253621142.


Trial 21 with params: {'learning_rate': 0.00024796752162416176, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.604,1.125922,0.595,0.592714,0.595,0.58988
2,1.0082,0.819158,0.7081,0.7107,0.7081,0.70588
3,0.7359,0.703378,0.7554,0.764378,0.7554,0.754871
4,0.571,0.572495,0.8047,0.805637,0.8047,0.80164
5,0.4527,0.588036,0.7942,0.80576,0.7942,0.794658
6,0.3546,0.56029,0.8126,0.826051,0.8126,0.812595
7,0.2675,0.581054,0.8168,0.825978,0.8168,0.816683
8,0.2006,0.555858,0.8316,0.83469,0.8316,0.830775
9,0.1427,0.595301,0.8284,0.839054,0.8284,0.830749
10,0.1048,0.573688,0.8383,0.842008,0.8383,0.838236


[I 2025-01-05 06:34:57,766] Trial 21 finished with value: 0.8460513731033876 and parameters: {'learning_rate': 0.00024796752162416176, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98}. Best is trial 21 with value: 0.8460513731033876.


Trial 22 with params: {'learning_rate': 0.0004413674387972158, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5263,1.072006,0.6091,0.606467,0.6091,0.602693
2,0.976,0.848625,0.6972,0.704602,0.6972,0.696241
3,0.7541,0.691066,0.7536,0.757052,0.7536,0.753133
4,0.6088,0.630164,0.7815,0.785131,0.7815,0.779897
5,0.5171,0.610278,0.7904,0.801282,0.7904,0.79152
6,0.4315,0.544594,0.8145,0.827902,0.8145,0.815427
7,0.3552,0.572938,0.8153,0.824265,0.8153,0.815738
8,0.2818,0.549921,0.8265,0.83069,0.8265,0.824541
9,0.2303,0.536593,0.8358,0.841596,0.8358,0.836641
10,0.1788,0.520882,0.843,0.844831,0.843,0.842888


[I 2025-01-05 07:17:39,913] Trial 22 finished with value: 0.86291730276543 and parameters: {'learning_rate': 0.0004413674387972158, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.99}. Best is trial 22 with value: 0.86291730276543.


Trial 23 with params: {'learning_rate': 0.00040595483908833195, 'weight_decay': 0.008, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6046,1.151673,0.5766,0.572685,0.5766,0.567392
2,1.0462,0.877985,0.6785,0.68861,0.6785,0.68058
3,0.7993,0.707619,0.7486,0.751764,0.7486,0.748546
4,0.6486,0.644641,0.7777,0.779473,0.7777,0.776084
5,0.5362,0.642922,0.7768,0.792114,0.7768,0.777777
6,0.4519,0.595346,0.7975,0.817441,0.7975,0.798586


[I 2025-01-05 07:30:27,576] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0003306866048037238, 'weight_decay': 0.01, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5338,1.049941,0.6124,0.613865,0.6124,0.60947
2,0.9302,0.742202,0.7376,0.737658,0.7376,0.735975
3,0.6826,0.646021,0.7788,0.784694,0.7788,0.777208
4,0.5521,0.557602,0.8142,0.815036,0.8142,0.812262
5,0.4443,0.53778,0.8192,0.826836,0.8192,0.820028
6,0.3591,0.541463,0.8192,0.836789,0.8192,0.821371
7,0.2847,0.49979,0.8369,0.841481,0.8369,0.836534
8,0.223,0.523813,0.838,0.841018,0.838,0.836406
9,0.1645,0.518979,0.8455,0.852752,0.8455,0.847362
10,0.1239,0.493854,0.8542,0.857621,0.8542,0.854677


[I 2025-01-05 07:56:04,290] Trial 24 pruned. 


Trial 25 with params: {'learning_rate': 0.00015871762956826775, 'weight_decay': 0.01, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7296,1.264098,0.5315,0.525612,0.5315,0.521144
2,1.1628,0.948111,0.6611,0.666969,0.6611,0.660386
3,0.8737,0.768148,0.7302,0.730452,0.7302,0.729357


[I 2025-01-05 08:02:27,685] Trial 25 pruned. 


Trial 26 with params: {'learning_rate': 5.2161675322108636e-05, 'weight_decay': 0.005, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0434,1.576046,0.4048,0.399431,0.4048,0.393377
2,1.5752,1.320534,0.5168,0.507791,0.5168,0.509507
3,1.3187,1.174596,0.5763,0.576565,0.5763,0.572134


[I 2025-01-05 08:08:51,458] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 4.848784343928481e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9818,1.528348,0.4314,0.429014,0.4314,0.421205
2,1.5096,1.275821,0.5307,0.528413,0.5307,0.527289
3,1.2682,1.131454,0.5953,0.594555,0.5953,0.589652


[I 2025-01-05 08:15:15,614] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 0.00022623511831356313, 'weight_decay': 0.007, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6436,1.151175,0.5855,0.581571,0.5855,0.580988
2,1.0089,0.81074,0.7108,0.714458,0.7108,0.711026
3,0.7413,0.698268,0.7549,0.760881,0.7549,0.751885
4,0.5745,0.59562,0.7954,0.796538,0.7954,0.794411
5,0.4617,0.579388,0.8007,0.809625,0.8007,0.801779
6,0.3654,0.548162,0.8187,0.826444,0.8187,0.818501
7,0.2701,0.579607,0.8163,0.825852,0.8163,0.817796
8,0.2016,0.579981,0.8172,0.819854,0.8172,0.815249
9,0.149,0.602065,0.8223,0.830631,0.8223,0.824271
10,0.1043,0.593555,0.8306,0.835582,0.8306,0.831068


[I 2025-01-05 08:40:50,503] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 6.187797075267839e-05, 'weight_decay': 0.01, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9265,1.449535,0.4644,0.461455,0.4644,0.459055
2,1.4263,1.206546,0.5602,0.565743,0.5602,0.556031
3,1.16,1.018002,0.6329,0.635851,0.6329,0.630489


[I 2025-01-05 08:47:14,410] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00040351618359122916, 'weight_decay': 0.01, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5302,1.107742,0.5958,0.594683,0.5958,0.587103
2,1.0136,0.847365,0.6985,0.707735,0.6985,0.701026
3,0.7764,0.718215,0.754,0.760479,0.754,0.754562
4,0.6245,0.641368,0.7779,0.779005,0.7779,0.774775
5,0.5215,0.614973,0.7881,0.800707,0.7881,0.78849
6,0.4405,0.55185,0.8099,0.824994,0.8099,0.810899
7,0.3565,0.55271,0.8205,0.827275,0.8205,0.820768
8,0.2914,0.544716,0.8305,0.83093,0.8305,0.828199
9,0.2286,0.526493,0.8388,0.845154,0.8388,0.840409
10,0.1805,0.542583,0.8371,0.839678,0.8371,0.836822


[I 2025-01-05 09:12:49,493] Trial 30 pruned. 


Trial 31 with params: {'learning_rate': 7.501161458092744e-05, 'weight_decay': 0.01, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8637,1.435934,0.4681,0.470104,0.4681,0.464792
2,1.3459,1.111851,0.6015,0.598183,0.6015,0.597492
3,1.0483,0.924457,0.6729,0.676566,0.6729,0.671784


[I 2025-01-05 09:19:13,623] Trial 31 pruned. 


Trial 32 with params: {'learning_rate': 0.00017987487341337253, 'weight_decay': 0.01, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6673,1.183231,0.567,0.56266,0.567,0.561249
2,1.0639,0.870205,0.6949,0.701186,0.6949,0.694775
3,0.7816,0.72269,0.7461,0.752602,0.7461,0.744001
4,0.5954,0.612397,0.7898,0.788797,0.7898,0.788245
5,0.4656,0.589297,0.7974,0.807671,0.7974,0.798929
6,0.3608,0.579994,0.8064,0.816853,0.8064,0.807168
7,0.2716,0.610317,0.8062,0.816345,0.8062,0.806156
8,0.1961,0.655771,0.8074,0.812743,0.8074,0.804916
9,0.1357,0.607155,0.8213,0.826973,0.8213,0.82247
10,0.0999,0.624281,0.829,0.834222,0.829,0.830235


[I 2025-01-05 09:44:49,029] Trial 32 pruned. 


Trial 33 with params: {'learning_rate': 0.0001375060054089062, 'weight_decay': 0.007, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7838,1.317146,0.5127,0.506138,0.5127,0.501142
2,1.2444,1.021243,0.6281,0.632958,0.6281,0.627122
3,0.9596,0.866364,0.6954,0.701867,0.6954,0.695338
4,0.7532,0.762638,0.7332,0.73041,0.7332,0.729612
5,0.5955,0.668178,0.7669,0.77407,0.7669,0.767181
6,0.4703,0.6301,0.7794,0.790219,0.7794,0.780411
7,0.3614,0.645798,0.7854,0.793198,0.7854,0.785507
8,0.2724,0.691113,0.7817,0.785478,0.7817,0.779639
9,0.1902,0.684846,0.7914,0.803737,0.7914,0.794863
10,0.1322,0.728731,0.7882,0.79406,0.7882,0.788138


[I 2025-01-05 10:10:22,513] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 2.8038934896473335e-05, 'weight_decay': 0.01, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0774,1.644676,0.381,0.374977,0.381,0.370814
2,1.676,1.456053,0.4671,0.465184,0.4671,0.464632
3,1.4847,1.319991,0.5186,0.515205,0.5186,0.514397
4,1.3385,1.219909,0.5548,0.553928,0.5548,0.549895
5,1.2181,1.165114,0.5807,0.591533,0.5807,0.578669
6,1.1184,1.084092,0.6175,0.615457,0.6175,0.612354
7,1.0295,1.044356,0.6285,0.632898,0.6285,0.627711
8,0.954,1.00784,0.6463,0.641398,0.6463,0.641087
9,0.8821,0.996263,0.6467,0.660349,0.6467,0.650152
10,0.8202,0.95187,0.6606,0.66081,0.6606,0.659115


[I 2025-01-05 10:35:49,533] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 0.0004925521491380374, 'weight_decay': 0.008, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4936,1.065401,0.6153,0.616797,0.6153,0.614399
2,0.9496,0.801986,0.7178,0.727551,0.7178,0.718673
3,0.7259,0.658534,0.7678,0.77433,0.7678,0.766697
4,0.5983,0.578264,0.8011,0.805685,0.8011,0.797801
5,0.4941,0.583973,0.7995,0.812949,0.7995,0.801593
6,0.4078,0.510527,0.8295,0.836786,0.8295,0.829517
7,0.336,0.53764,0.8276,0.833611,0.8276,0.826932
8,0.2732,0.536804,0.8312,0.835083,0.8312,0.829237
9,0.2115,0.502514,0.8457,0.851378,0.8457,0.846653
10,0.1695,0.484192,0.854,0.85749,0.854,0.853897


[I 2025-01-05 11:01:18,947] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 2.075398068791511e-06, 'weight_decay': 0.006, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4498,2.267393,0.1372,0.157429,0.1372,0.084057
2,2.4011,2.226266,0.1767,0.132207,0.1767,0.1235
3,2.3467,2.171937,0.2049,0.203067,0.2049,0.161622


[I 2025-01-05 11:07:41,466] Trial 36 pruned. 


Trial 37 with params: {'learning_rate': 3.92291334425912e-06, 'weight_decay': 0.0, 'adam_beta1': 0.97}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.43,2.229822,0.1785,0.174986,0.1785,0.125259
2,2.322,2.088967,0.2432,0.232828,0.2432,0.206842
3,2.129,1.899266,0.2954,0.292112,0.2954,0.277154
4,1.9891,1.797882,0.3216,0.313088,0.3216,0.311877
5,1.922,1.750868,0.3359,0.332163,0.3359,0.325653
6,1.8741,1.70966,0.3563,0.347011,0.3563,0.344639
7,1.8355,1.667521,0.3666,0.359263,0.3666,0.35771
8,1.8022,1.665559,0.3758,0.372505,0.3758,0.362887
9,1.773,1.63011,0.3919,0.392158,0.3919,0.387649
10,1.7466,1.609582,0.3987,0.392177,0.3987,0.390862


[I 2025-01-05 11:33:16,645] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00018168575078529773, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6516,1.166662,0.5741,0.573484,0.5741,0.570754
2,1.0166,0.830036,0.7045,0.714556,0.7045,0.705619
3,0.724,0.654419,0.7735,0.780981,0.7735,0.773478
4,0.5589,0.586825,0.7983,0.798835,0.7983,0.797153
5,0.4344,0.582735,0.8017,0.810444,0.8017,0.802967
6,0.3245,0.584696,0.8053,0.821958,0.8053,0.806304
7,0.239,0.588599,0.817,0.823193,0.817,0.815664
8,0.172,0.608273,0.8202,0.821215,0.8202,0.817997
9,0.1237,0.591844,0.8288,0.834629,0.8288,0.83003
10,0.0851,0.625611,0.8287,0.835308,0.8287,0.829591


[I 2025-01-05 12:15:55,227] Trial 38 finished with value: 0.8405213542354458 and parameters: {'learning_rate': 0.00018168575078529773, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}. Best is trial 22 with value: 0.86291730276543.


Trial 39 with params: {'learning_rate': 0.0002947977049247464, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5445,1.035609,0.6227,0.62702,0.6227,0.620208
2,0.935,0.755529,0.7345,0.743601,0.7345,0.73472
3,0.6858,0.619067,0.7835,0.789564,0.7835,0.78403
4,0.5442,0.580627,0.8047,0.804374,0.8047,0.803141
5,0.4433,0.551817,0.8136,0.82091,0.8136,0.814933
6,0.3483,0.543,0.8264,0.838439,0.8264,0.827464
7,0.2688,0.540105,0.828,0.833298,0.828,0.826846
8,0.2016,0.561683,0.8319,0.836564,0.8319,0.830493
9,0.1474,0.540595,0.84,0.84672,0.84,0.841428
10,0.1096,0.565936,0.8394,0.845226,0.8394,0.839216


[I 2025-01-05 12:58:29,990] Trial 39 finished with value: 0.860222532824702 and parameters: {'learning_rate': 0.0002947977049247464, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9400000000000001}. Best is trial 22 with value: 0.86291730276543.


Trial 40 with params: {'learning_rate': 0.00020652942222134038, 'weight_decay': 0.01, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6026,1.095366,0.602,0.603605,0.602,0.600169
2,0.987,0.824548,0.7104,0.708952,0.7104,0.707536
3,0.721,0.656982,0.7696,0.774285,0.7696,0.767133
4,0.5627,0.591322,0.7942,0.792547,0.7942,0.791528
5,0.4466,0.571206,0.8063,0.815896,0.8063,0.808007
6,0.3423,0.564947,0.8174,0.833951,0.8174,0.819564
7,0.2522,0.566043,0.8227,0.827047,0.8227,0.82121
8,0.1845,0.597047,0.8204,0.823936,0.8204,0.818025
9,0.1337,0.59116,0.8285,0.838044,0.8285,0.83056
10,0.0949,0.608517,0.8334,0.837099,0.8334,0.833408


[I 2025-01-05 13:41:12,177] Trial 40 finished with value: 0.8506766774228713 and parameters: {'learning_rate': 0.00020652942222134038, 'weight_decay': 0.01, 'adam_beta1': 0.9500000000000001}. Best is trial 22 with value: 0.86291730276543.


Trial 41 with params: {'learning_rate': 0.0004276250043872556, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.465,1.002748,0.6335,0.630586,0.6335,0.627556
2,0.9202,0.740912,0.7415,0.745255,0.7415,0.74014
3,0.6867,0.639018,0.7771,0.784604,0.7771,0.776443
4,0.5534,0.59532,0.7962,0.797429,0.7962,0.794177
5,0.4686,0.558515,0.8051,0.812614,0.8051,0.805744
6,0.379,0.511824,0.8265,0.838594,0.8265,0.826624
7,0.3047,0.551869,0.8264,0.831876,0.8264,0.824582
8,0.2471,0.600445,0.8246,0.831169,0.8246,0.821463
9,0.1903,0.511897,0.8458,0.850208,0.8458,0.846498
10,0.1478,0.508903,0.8506,0.853311,0.8506,0.850274


[I 2025-01-05 14:23:54,193] Trial 41 finished with value: 0.8663758905377342 and parameters: {'learning_rate': 0.0004276250043872556, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}. Best is trial 41 with value: 0.8663758905377342.


Trial 42 with params: {'learning_rate': 0.0004670686683720471, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4563,0.988651,0.6338,0.63298,0.6338,0.628775
2,0.8962,0.741054,0.7441,0.748267,0.7441,0.742493
3,0.6734,0.619009,0.7855,0.79321,0.7855,0.785067
4,0.5529,0.531752,0.8175,0.819251,0.8175,0.816925
5,0.4538,0.560722,0.8073,0.822262,0.8073,0.8105
6,0.3752,0.527844,0.8252,0.842409,0.8252,0.826647
7,0.309,0.555326,0.827,0.833333,0.827,0.825437
8,0.252,0.524185,0.8379,0.84081,0.8379,0.836136
9,0.197,0.472335,0.8519,0.855355,0.8519,0.852688
10,0.1497,0.51406,0.8488,0.852637,0.8488,0.847909


[I 2025-01-05 14:49:31,192] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.0004725747942797692, 'weight_decay': 0.008, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5202,1.049342,0.6168,0.616421,0.6168,0.612083
2,0.9645,0.779385,0.7232,0.730057,0.7232,0.722855
3,0.7218,0.655909,0.7668,0.776302,0.7668,0.767587
4,0.5814,0.577393,0.8027,0.803525,0.8027,0.799994
5,0.483,0.537759,0.8149,0.819047,0.8149,0.815666
6,0.3965,0.536287,0.8172,0.841473,0.8172,0.820237
7,0.3229,0.513649,0.8319,0.837585,0.8319,0.830164
8,0.2594,0.508824,0.8396,0.842581,0.8396,0.837451
9,0.2007,0.499723,0.8437,0.849736,0.8437,0.844718
10,0.155,0.519655,0.8469,0.853087,0.8469,0.847738


[I 2025-01-05 15:32:11,073] Trial 43 finished with value: 0.8734249573321554 and parameters: {'learning_rate': 0.0004725747942797692, 'weight_decay': 0.008, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 44 with params: {'learning_rate': 0.0002586202470904171, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.572,1.047924,0.6226,0.623142,0.6226,0.6201
2,0.9539,0.788371,0.725,0.727569,0.725,0.724647
3,0.6946,0.650969,0.7722,0.778209,0.7722,0.772804
4,0.5383,0.583792,0.7985,0.800123,0.7985,0.797708
5,0.4297,0.58956,0.7993,0.809862,0.7993,0.800428
6,0.3355,0.550477,0.8176,0.836395,0.8176,0.820118
7,0.2543,0.571033,0.8207,0.82696,0.8207,0.819353
8,0.1926,0.573277,0.8255,0.829198,0.8255,0.823739
9,0.1382,0.566545,0.8368,0.844027,0.8368,0.838013
10,0.1034,0.565774,0.8443,0.84653,0.8443,0.844147


[I 2025-01-05 16:14:50,466] Trial 44 finished with value: 0.8570214282825903 and parameters: {'learning_rate': 0.0002586202470904171, 'weight_decay': 0.008, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 45 with params: {'learning_rate': 0.0002772250200432083, 'weight_decay': 0.007, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5561,1.047056,0.6191,0.616475,0.6191,0.615019
2,0.9374,0.771624,0.728,0.730108,0.728,0.726273
3,0.6842,0.642774,0.7797,0.78775,0.7797,0.780383
4,0.5397,0.551972,0.8052,0.80588,0.8052,0.803279
5,0.431,0.558962,0.8135,0.822577,0.8135,0.814913
6,0.34,0.554033,0.8219,0.836892,0.8219,0.823944
7,0.2618,0.571558,0.8256,0.832446,0.8256,0.824782
8,0.196,0.569202,0.832,0.835001,0.832,0.83016
9,0.1414,0.554201,0.8375,0.845658,0.8375,0.839308
10,0.1052,0.565717,0.8438,0.846079,0.8438,0.843327


[I 2025-01-05 16:57:22,535] Trial 45 finished with value: 0.8610621552023506 and parameters: {'learning_rate': 0.0002772250200432083, 'weight_decay': 0.007, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 46 with params: {'learning_rate': 0.0004397901974583414, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4823,1.051332,0.618,0.61644,0.618,0.613575
2,0.9168,0.745865,0.7366,0.746159,0.7366,0.73634
3,0.6945,0.640588,0.7728,0.779592,0.7728,0.772126
4,0.5601,0.57268,0.8046,0.804024,0.8046,0.80225
5,0.4696,0.560456,0.8064,0.812624,0.8064,0.806883
6,0.3819,0.506327,0.8302,0.839951,0.8302,0.8307
7,0.3063,0.5073,0.8331,0.839835,0.8331,0.832498
8,0.2445,0.547802,0.8318,0.833204,0.8318,0.82896
9,0.1942,0.515192,0.8412,0.84665,0.8412,0.842383
10,0.1439,0.517146,0.8489,0.853143,0.8489,0.849248


[I 2025-01-05 17:39:56,002] Trial 46 finished with value: 0.8680805975880987 and parameters: {'learning_rate': 0.0004397901974583414, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 47 with params: {'learning_rate': 0.0004797607389782365, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4959,1.08307,0.6109,0.615528,0.6109,0.608715
2,0.9584,0.770279,0.7316,0.736569,0.7316,0.731607
3,0.7143,0.635144,0.7778,0.787581,0.7778,0.778356
4,0.577,0.584406,0.802,0.802784,0.802,0.800133
5,0.478,0.543707,0.8134,0.818064,0.8134,0.814049
6,0.3977,0.542697,0.8202,0.834782,0.8202,0.820668
7,0.3224,0.529765,0.832,0.837915,0.832,0.831131
8,0.2643,0.553548,0.8264,0.832056,0.8264,0.824555
9,0.2055,0.475171,0.8543,0.857102,0.8543,0.854533
10,0.1576,0.521419,0.8476,0.853592,0.8476,0.848236


[I 2025-01-05 18:22:51,755] Trial 47 finished with value: 0.8720004894730644 and parameters: {'learning_rate': 0.0004797607389782365, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 48 with params: {'learning_rate': 0.0003492552435215607, 'weight_decay': 0.002, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5225,1.026445,0.6289,0.634853,0.6289,0.62711
2,0.9372,0.793807,0.7236,0.726657,0.7236,0.723118
3,0.7034,0.65558,0.7682,0.77488,0.7682,0.767834
4,0.5599,0.567577,0.8043,0.805656,0.8043,0.801937
5,0.4542,0.570614,0.8088,0.819051,0.8088,0.810261
6,0.3666,0.543063,0.8239,0.832641,0.8239,0.823988
7,0.2884,0.551899,0.8269,0.833216,0.8269,0.825696
8,0.2208,0.568634,0.8331,0.837188,0.8331,0.830852
9,0.1643,0.519863,0.8443,0.848539,0.8443,0.845071
10,0.1225,0.51418,0.8528,0.853625,0.8528,0.852203


[I 2025-01-05 19:05:47,283] Trial 48 finished with value: 0.8628823283516527 and parameters: {'learning_rate': 0.0003492552435215607, 'weight_decay': 0.002, 'adam_beta1': 0.96}. Best is trial 43 with value: 0.8734249573321554.


Trial 49 with params: {'learning_rate': 0.0003267485508168292, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5037,1.02345,0.6326,0.629438,0.6326,0.6286
2,0.9216,0.760325,0.7338,0.735409,0.7338,0.733889
3,0.6831,0.616931,0.7885,0.796538,0.7885,0.787626
4,0.5444,0.562224,0.8098,0.81057,0.8098,0.80843
5,0.4408,0.543788,0.8159,0.823848,0.8159,0.817148
6,0.3548,0.551357,0.8169,0.831343,0.8169,0.818734
7,0.2802,0.529237,0.8305,0.835374,0.8305,0.8293
8,0.2159,0.516477,0.8365,0.837907,0.8365,0.834413
9,0.1623,0.517359,0.8486,0.858577,0.8486,0.850679
10,0.119,0.524445,0.8496,0.852097,0.8496,0.849438


[I 2025-01-05 19:48:45,071] Trial 49 finished with value: 0.8650364930541079 and parameters: {'learning_rate': 0.0003267485508168292, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 50 with params: {'learning_rate': 0.000448494834140905, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4748,0.999107,0.6378,0.637541,0.6378,0.635556
2,0.9235,0.764875,0.7381,0.742248,0.7381,0.736283
3,0.6857,0.650859,0.769,0.784805,0.769,0.770706
4,0.5587,0.553325,0.814,0.815186,0.814,0.812562
5,0.4617,0.548922,0.8086,0.818335,0.8086,0.810682
6,0.3787,0.5417,0.8193,0.837565,0.8193,0.821123
7,0.3083,0.50752,0.8386,0.843065,0.8386,0.837577
8,0.2413,0.558687,0.8308,0.83435,0.8308,0.828627
9,0.1872,0.500504,0.8507,0.855137,0.8507,0.851544
10,0.1435,0.492879,0.85,0.855243,0.85,0.851166


[I 2025-01-05 20:31:26,854] Trial 50 finished with value: 0.8705269585305098 and parameters: {'learning_rate': 0.000448494834140905, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 51 with params: {'learning_rate': 0.00022905223619638782, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5841,1.10386,0.6022,0.603365,0.6022,0.598135
2,0.9664,0.805039,0.7175,0.719092,0.7175,0.717316
3,0.707,0.64581,0.776,0.783406,0.776,0.776257
4,0.5492,0.567129,0.8066,0.805918,0.8066,0.804425
5,0.4374,0.573411,0.8037,0.81079,0.8037,0.804765
6,0.3437,0.572338,0.8123,0.830737,0.8123,0.813826


[I 2025-01-05 20:44:15,392] Trial 51 pruned. 


Trial 52 with params: {'learning_rate': 0.0003572576187741018, 'weight_decay': 0.003, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4752,1.011231,0.6409,0.654688,0.6409,0.640656
2,0.8802,0.715536,0.7446,0.750261,0.7446,0.744742
3,0.6615,0.627614,0.783,0.790018,0.783,0.781081
4,0.5325,0.54604,0.814,0.81537,0.814,0.8129
5,0.4367,0.546844,0.8124,0.825626,0.8124,0.814315
6,0.3532,0.541202,0.8221,0.839542,0.8221,0.823461


[I 2025-01-05 20:57:05,827] Trial 52 pruned. 


Trial 53 with params: {'learning_rate': 0.00011040902092007652, 'weight_decay': 0.001, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7724,1.349078,0.5101,0.524677,0.5101,0.505291
2,1.19,0.959483,0.6558,0.65621,0.6558,0.654239
3,0.8923,0.780131,0.7273,0.730935,0.7273,0.72542


[I 2025-01-05 21:03:31,158] Trial 53 pruned. 


Trial 54 with params: {'learning_rate': 0.0004016091664777296, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4854,1.019874,0.6322,0.631339,0.6322,0.629722
2,0.9295,0.759615,0.7334,0.738021,0.7334,0.732286
3,0.6956,0.667049,0.7671,0.780671,0.7671,0.766386
4,0.5689,0.577947,0.7997,0.79876,0.7997,0.797395
5,0.4722,0.578991,0.8016,0.812196,0.8016,0.802677
6,0.3847,0.549861,0.82,0.832715,0.82,0.820019


[I 2025-01-05 21:16:21,759] Trial 54 pruned. 


Trial 55 with params: {'learning_rate': 0.00040570395318183824, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4997,1.023191,0.6276,0.638368,0.6276,0.628595
2,0.9233,0.737692,0.742,0.743085,0.742,0.741157
3,0.6867,0.651854,0.7782,0.7859,0.7782,0.77771
4,0.558,0.576176,0.8076,0.809609,0.8076,0.806115
5,0.4619,0.544779,0.8142,0.825653,0.8142,0.816864
6,0.3733,0.529681,0.8234,0.839911,0.8234,0.82527
7,0.3005,0.530598,0.8329,0.840793,0.8329,0.833411
8,0.2384,0.519069,0.839,0.841863,0.839,0.837501
9,0.1859,0.470977,0.8574,0.859697,0.8574,0.858002
10,0.1384,0.499792,0.8516,0.853549,0.8516,0.851573


[I 2025-01-05 21:59:01,117] Trial 55 finished with value: 0.8691381024646935 and parameters: {'learning_rate': 0.00040570395318183824, 'weight_decay': 0.004, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 56 with params: {'learning_rate': 0.000442388035837816, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5035,1.052023,0.6156,0.622884,0.6156,0.613331
2,0.9658,0.818383,0.7103,0.714292,0.7103,0.709703
3,0.7357,0.703441,0.7522,0.764883,0.7522,0.752767
4,0.5973,0.587103,0.8024,0.801475,0.8024,0.800251
5,0.4907,0.568354,0.8036,0.817451,0.8036,0.806715
6,0.4054,0.561609,0.8164,0.833848,0.8164,0.816844


[I 2025-01-05 22:11:46,460] Trial 56 pruned. 


Trial 57 with params: {'learning_rate': 0.00015451028113544184, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6745,1.157302,0.5834,0.588872,0.5834,0.581887
2,1.0556,0.859545,0.6962,0.706529,0.6962,0.696158
3,0.7735,0.710457,0.7541,0.757504,0.7541,0.75381
4,0.6051,0.631469,0.781,0.782455,0.781,0.780149
5,0.473,0.635361,0.7842,0.793067,0.7842,0.784963
6,0.3595,0.630759,0.7922,0.812003,0.7922,0.793914


[I 2025-01-05 22:24:35,111] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 0.0004265256770682201, 'weight_decay': 0.005, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4796,1.02131,0.6273,0.635027,0.6273,0.627584
2,0.9211,0.746995,0.7416,0.744051,0.7416,0.740491
3,0.683,0.64442,0.7756,0.781738,0.7756,0.775658
4,0.5548,0.58562,0.8059,0.808576,0.8059,0.80405
5,0.4573,0.544259,0.8203,0.827257,0.8203,0.821745
6,0.3728,0.517218,0.8227,0.839108,0.8227,0.823905
7,0.3002,0.532418,0.8278,0.834855,0.8278,0.826466
8,0.2363,0.520577,0.8387,0.841837,0.8387,0.836894
9,0.1821,0.491377,0.8553,0.857783,0.8553,0.855895
10,0.1405,0.527167,0.8474,0.850395,0.8474,0.846851


[I 2025-01-05 23:07:18,718] Trial 58 finished with value: 0.870244789846579 and parameters: {'learning_rate': 0.0004265256770682201, 'weight_decay': 0.005, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 59 with params: {'learning_rate': 0.0003444410121164827, 'weight_decay': 0.001, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5123,1.033352,0.6242,0.625736,0.6242,0.620589
2,0.9388,0.756097,0.735,0.737943,0.735,0.734245
3,0.6984,0.675718,0.7693,0.782734,0.7693,0.77051
4,0.5609,0.570574,0.805,0.808212,0.805,0.803758
5,0.4616,0.560951,0.8066,0.818768,0.8066,0.808924
6,0.3711,0.54432,0.8173,0.839123,0.8173,0.819845
7,0.2961,0.563864,0.8219,0.82728,0.8219,0.820527
8,0.2265,0.559359,0.8293,0.833029,0.8293,0.827388
9,0.1708,0.497884,0.851,0.856588,0.851,0.852357
10,0.1295,0.522955,0.8485,0.852111,0.8485,0.849078


[I 2025-01-05 23:50:06,296] Trial 59 finished with value: 0.8635176173765824 and parameters: {'learning_rate': 0.0003444410121164827, 'weight_decay': 0.001, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 60 with params: {'learning_rate': 0.0003311460639031371, 'weight_decay': 0.005, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4983,0.994967,0.6408,0.644601,0.6408,0.637629
2,0.8898,0.726301,0.7466,0.744651,0.7466,0.744756
3,0.657,0.629337,0.7844,0.792082,0.7844,0.783435
4,0.5209,0.557179,0.8115,0.813942,0.8115,0.810122
5,0.426,0.560725,0.8132,0.823096,0.8132,0.815486
6,0.3386,0.542596,0.8232,0.844791,0.8232,0.825387
7,0.2592,0.578227,0.8237,0.832134,0.8237,0.821836
8,0.2019,0.536198,0.8412,0.843731,0.8412,0.839165
9,0.1483,0.507593,0.8516,0.85675,0.8516,0.852504
10,0.1116,0.517639,0.8536,0.858036,0.8536,0.853783


[I 2025-01-06 00:33:00,846] Trial 60 finished with value: 0.868428585207249 and parameters: {'learning_rate': 0.0003311460639031371, 'weight_decay': 0.005, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 61 with params: {'learning_rate': 0.0003523966961192257, 'weight_decay': 0.006, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4669,0.986324,0.6436,0.65189,0.6436,0.643738
2,0.8942,0.750718,0.7374,0.743662,0.7374,0.736818
3,0.6659,0.619064,0.7838,0.790967,0.7838,0.783333
4,0.5357,0.54677,0.8141,0.815339,0.8141,0.81239
5,0.4428,0.557886,0.8056,0.822502,0.8056,0.808679
6,0.3517,0.549823,0.8154,0.837919,0.8154,0.818001


[I 2025-01-06 00:45:52,764] Trial 61 pruned. 


Trial 62 with params: {'learning_rate': 0.00019775145102347727, 'weight_decay': 0.005, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5741,1.101775,0.599,0.596099,0.599,0.593851
2,0.9652,0.790364,0.7221,0.723805,0.7221,0.721135
3,0.7016,0.681907,0.7633,0.776884,0.7633,0.761906
4,0.5452,0.583881,0.8021,0.802764,0.8021,0.800616
5,0.4272,0.566688,0.8053,0.814781,0.8053,0.807536
6,0.3301,0.60713,0.807,0.826464,0.807,0.807575


[I 2025-01-06 00:58:41,915] Trial 62 pruned. 


Trial 63 with params: {'learning_rate': 0.0004472893403094078, 'weight_decay': 0.005, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.501,1.018426,0.6287,0.630333,0.6287,0.625007
2,0.9385,0.792908,0.7216,0.72691,0.7216,0.720806
3,0.7099,0.672514,0.7704,0.784173,0.7704,0.770667
4,0.5716,0.553957,0.8085,0.810627,0.8085,0.807715
5,0.4699,0.542509,0.811,0.819024,0.811,0.812912
6,0.3846,0.573377,0.8097,0.832863,0.8097,0.8111
7,0.3105,0.5265,0.834,0.838771,0.834,0.833181
8,0.2459,0.533909,0.8328,0.836044,0.8328,0.830638
9,0.193,0.486675,0.8488,0.853554,0.8488,0.849419
10,0.1455,0.496321,0.8546,0.857965,0.8546,0.854569


[I 2025-01-06 01:41:35,249] Trial 63 finished with value: 0.8657352607464006 and parameters: {'learning_rate': 0.0004472893403094078, 'weight_decay': 0.005, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 64 with params: {'learning_rate': 0.000429611397476057, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.442,1.005649,0.6334,0.635353,0.6334,0.631819
2,0.892,0.774661,0.7263,0.734686,0.7263,0.723331
3,0.6618,0.638024,0.7784,0.787745,0.7784,0.77776
4,0.5373,0.553533,0.8101,0.809299,0.8101,0.807624
5,0.4469,0.555553,0.811,0.822908,0.811,0.812637
6,0.3626,0.558757,0.817,0.837094,0.817,0.818927
7,0.2908,0.52567,0.8373,0.84217,0.8373,0.836478
8,0.2258,0.553724,0.8324,0.834611,0.8324,0.830474
9,0.176,0.479681,0.851,0.853557,0.851,0.851325
10,0.1296,0.525169,0.8522,0.856691,0.8522,0.851873


[I 2025-01-06 02:24:26,857] Trial 64 finished with value: 0.868353644895224 and parameters: {'learning_rate': 0.000429611397476057, 'weight_decay': 0.005, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 65 with params: {'learning_rate': 0.00034975166753541603, 'weight_decay': 0.005, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4819,0.970475,0.6398,0.644578,0.6398,0.635729
2,0.8646,0.710227,0.7556,0.757745,0.7556,0.754642
3,0.6423,0.645422,0.7797,0.78992,0.7797,0.780125
4,0.5197,0.537317,0.8161,0.817518,0.8161,0.814382
5,0.4237,0.510032,0.8205,0.827558,0.8205,0.821922
6,0.3404,0.535378,0.8246,0.843021,0.8246,0.825927
7,0.2673,0.52942,0.8359,0.840861,0.8359,0.835183
8,0.204,0.546578,0.8353,0.840103,0.8353,0.832477
9,0.1551,0.493083,0.8499,0.853152,0.8499,0.850126
10,0.1154,0.557393,0.8467,0.8519,0.8467,0.84597


[I 2025-01-06 03:07:13,602] Trial 65 finished with value: 0.8653114598917085 and parameters: {'learning_rate': 0.00034975166753541603, 'weight_decay': 0.005, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 66 with params: {'learning_rate': 0.00040470030266719775, 'weight_decay': 0.004, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4686,1.016645,0.6322,0.639199,0.6322,0.629123
2,0.895,0.714659,0.7486,0.753312,0.7486,0.749137
3,0.6669,0.656501,0.772,0.78108,0.772,0.771896
4,0.5419,0.566078,0.8071,0.808088,0.8071,0.805918
5,0.4474,0.523593,0.821,0.830585,0.821,0.822778
6,0.3609,0.581764,0.8091,0.839269,0.8091,0.812208


[I 2025-01-06 03:20:00,138] Trial 66 pruned. 


Trial 67 with params: {'learning_rate': 9.335272885246343e-05, 'weight_decay': 0.006, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7957,1.355059,0.5029,0.501718,0.5029,0.49921
2,1.2429,1.014603,0.6363,0.635225,0.6363,0.634468
3,0.9461,0.827621,0.7043,0.713812,0.7043,0.705246
4,0.7466,0.702152,0.7581,0.755833,0.7581,0.755708
5,0.5926,0.655342,0.7675,0.779398,0.7675,0.769906
6,0.4678,0.663406,0.7745,0.788596,0.7745,0.774928


[I 2025-01-06 03:32:52,094] Trial 67 pruned. 


Trial 68 with params: {'learning_rate': 3.0237449631860023e-05, 'weight_decay': 0.003, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.0551,1.620177,0.384,0.377958,0.384,0.372818
2,1.6547,1.420216,0.4784,0.476308,0.4784,0.472678
3,1.4494,1.285144,0.5378,0.534039,0.5378,0.531652


[I 2025-01-06 03:39:18,472] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 0.00044913226856979936, 'weight_decay': 0.005, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.468,1.006453,0.6324,0.644623,0.6324,0.631177
2,0.9181,0.740215,0.7378,0.740151,0.7378,0.736084
3,0.6834,0.639592,0.779,0.791253,0.779,0.780677
4,0.5583,0.553223,0.8129,0.812512,0.8129,0.811597
5,0.4566,0.542028,0.8146,0.828162,0.8146,0.816846
6,0.3768,0.54527,0.8197,0.839236,0.8197,0.822054
7,0.3046,0.530746,0.8285,0.837387,0.8285,0.82774
8,0.2441,0.513483,0.8415,0.843294,0.8415,0.839556
9,0.1866,0.488621,0.8532,0.855764,0.8532,0.85292
10,0.1459,0.509703,0.853,0.85857,0.853,0.853293


[I 2025-01-06 04:04:56,771] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00030285916646651407, 'weight_decay': 0.006, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5183,1.045423,0.6239,0.625953,0.6239,0.622024
2,0.9148,0.720611,0.7442,0.746582,0.7442,0.743937
3,0.6626,0.634525,0.7797,0.786354,0.7797,0.779195
4,0.5319,0.575112,0.8047,0.808651,0.8047,0.803058
5,0.4255,0.526435,0.8194,0.830211,0.8194,0.821532
6,0.3397,0.574135,0.8135,0.840988,0.8135,0.815764
7,0.2618,0.531765,0.8355,0.838998,0.8355,0.834718
8,0.1973,0.533049,0.8356,0.839508,0.8356,0.833668
9,0.1453,0.508038,0.8467,0.850928,0.8467,0.847487
10,0.1071,0.53218,0.8466,0.852401,0.8466,0.847646


[I 2025-01-06 04:47:48,262] Trial 70 finished with value: 0.8674378267475584 and parameters: {'learning_rate': 0.00030285916646651407, 'weight_decay': 0.006, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 71 with params: {'learning_rate': 0.00045551296610184743, 'weight_decay': 0.004, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4669,0.983371,0.641,0.639384,0.641,0.638166
2,0.9158,0.755632,0.7352,0.738605,0.7352,0.734607
3,0.6981,0.653298,0.7766,0.780846,0.7766,0.776257
4,0.5684,0.566161,0.8062,0.809173,0.8062,0.805218
5,0.4764,0.538676,0.8154,0.823458,0.8154,0.817038
6,0.3894,0.538664,0.8181,0.833867,0.8181,0.818813


[I 2025-01-06 05:00:39,442] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0004855269556893471, 'weight_decay': 0.005, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4801,1.013938,0.6306,0.636074,0.6306,0.630806
2,0.9139,0.741974,0.7439,0.750552,0.7439,0.744114
3,0.6834,0.638821,0.7747,0.785596,0.7747,0.774936
4,0.5611,0.573721,0.8079,0.807724,0.8079,0.806453
5,0.4749,0.53218,0.8182,0.824668,0.8182,0.819236
6,0.3903,0.531885,0.8214,0.834738,0.8214,0.822503
7,0.3235,0.512437,0.8343,0.840801,0.8343,0.833714
8,0.261,0.528239,0.8344,0.838386,0.8344,0.831999
9,0.2078,0.488125,0.8483,0.853126,0.8483,0.84927
10,0.1568,0.497539,0.8544,0.85899,0.8544,0.854505


[I 2025-01-06 05:43:31,016] Trial 72 finished with value: 0.8721757676811153 and parameters: {'learning_rate': 0.0004855269556893471, 'weight_decay': 0.005, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 73 with params: {'learning_rate': 0.0004931227153023962, 'weight_decay': 0.002, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4963,1.039001,0.6185,0.631263,0.6185,0.615784
2,0.9309,0.777544,0.7249,0.727488,0.7249,0.721888
3,0.7031,0.646621,0.7759,0.786035,0.7759,0.776156
4,0.566,0.580712,0.8,0.801221,0.8,0.798277
5,0.472,0.539491,0.8166,0.825056,0.8166,0.818308
6,0.3852,0.527739,0.8256,0.842178,0.8256,0.826789
7,0.3136,0.545101,0.8283,0.834271,0.8283,0.826542
8,0.2516,0.557794,0.8304,0.834445,0.8304,0.827944
9,0.1977,0.493158,0.8496,0.853416,0.8496,0.850175
10,0.1522,0.520611,0.8461,0.850482,0.8461,0.845889


[I 2025-01-06 06:26:14,054] Trial 73 finished with value: 0.8672765399160488 and parameters: {'learning_rate': 0.0004931227153023962, 'weight_decay': 0.002, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 74 with params: {'learning_rate': 0.0003660005345388166, 'weight_decay': 0.006, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4977,1.008491,0.6309,0.630904,0.6309,0.627708
2,0.9101,0.761845,0.7344,0.741753,0.7344,0.734019
3,0.6869,0.628953,0.7774,0.786135,0.7774,0.778511
4,0.5507,0.532757,0.8181,0.817527,0.8181,0.817007
5,0.4476,0.562788,0.8076,0.817025,0.8076,0.808857
6,0.3615,0.555791,0.8152,0.834577,0.8152,0.81681


[I 2025-01-06 06:39:06,765] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.0004267365683134317, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4833,1.009558,0.6333,0.638699,0.6333,0.633473
2,0.8964,0.725079,0.7445,0.746264,0.7445,0.74364
3,0.6688,0.642739,0.7773,0.787814,0.7773,0.777885
4,0.5446,0.533775,0.8192,0.820136,0.8192,0.818136
5,0.4486,0.533399,0.8177,0.824557,0.8177,0.819164
6,0.372,0.5299,0.8241,0.841676,0.8241,0.826139
7,0.298,0.523146,0.8294,0.834344,0.8294,0.82861
8,0.2376,0.546413,0.8328,0.837791,0.8328,0.831008
9,0.1797,0.484547,0.8489,0.852671,0.8489,0.849174
10,0.1363,0.505192,0.8519,0.854956,0.8519,0.852045


[I 2025-01-06 07:21:51,277] Trial 75 finished with value: 0.8655523846377962 and parameters: {'learning_rate': 0.0004267365683134317, 'weight_decay': 0.004, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 76 with params: {'learning_rate': 3.721571397975755e-06, 'weight_decay': 0.01, 'adam_beta1': 0.99}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4303,2.237418,0.1728,0.135045,0.1728,0.120363
2,2.342,2.13602,0.224,0.213535,0.224,0.188456
3,2.1945,1.957101,0.2784,0.277683,0.2784,0.250162
4,2.0337,1.832001,0.3146,0.304911,0.3146,0.299912
5,1.9492,1.770352,0.327,0.324455,0.327,0.315685
6,1.8955,1.729565,0.3513,0.345534,0.3513,0.341256


[I 2025-01-06 07:34:38,948] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.00011231663323072297, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7602,1.297926,0.5284,0.530748,0.5284,0.524786
2,1.1754,0.963881,0.6576,0.654774,0.6576,0.655149
3,0.8852,0.794462,0.7157,0.720883,0.7157,0.715679


[I 2025-01-06 07:41:04,181] Trial 77 pruned. 


Trial 78 with params: {'learning_rate': 0.000447602055169624, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4848,0.988222,0.6469,0.653563,0.6469,0.648013
2,0.9089,0.751412,0.7358,0.742914,0.7358,0.735643
3,0.6976,0.660852,0.7723,0.781041,0.7723,0.77324
4,0.5715,0.56272,0.8037,0.804317,0.8037,0.802065
5,0.4725,0.551282,0.8142,0.825291,0.8142,0.816193
6,0.3843,0.587186,0.8131,0.835408,0.8131,0.815222
7,0.3146,0.525934,0.8306,0.836641,0.8306,0.830484
8,0.2516,0.562216,0.8284,0.833504,0.8284,0.825899
9,0.2018,0.522993,0.8419,0.849751,0.8419,0.843796
10,0.1521,0.540394,0.8451,0.84979,0.8451,0.845102


[I 2025-01-06 08:23:45,203] Trial 78 finished with value: 0.8634235867779394 and parameters: {'learning_rate': 0.000447602055169624, 'weight_decay': 0.004, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 79 with params: {'learning_rate': 0.0004000670460098183, 'weight_decay': 0.007, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4562,0.96469,0.6538,0.654271,0.6538,0.650009
2,0.863,0.702417,0.7551,0.758365,0.7551,0.753652
3,0.6474,0.651694,0.7769,0.786126,0.7769,0.776767
4,0.5313,0.559874,0.8095,0.811477,0.8095,0.807406
5,0.4422,0.521907,0.8186,0.827151,0.8186,0.820194
6,0.3574,0.54995,0.8222,0.840171,0.8222,0.82428
7,0.2914,0.519902,0.8392,0.842617,0.8392,0.838282
8,0.2262,0.531603,0.8356,0.838005,0.8356,0.833685
9,0.1761,0.478396,0.8551,0.857029,0.8551,0.854941
10,0.1301,0.500594,0.854,0.856883,0.854,0.854186


[I 2025-01-06 09:06:40,687] Trial 79 finished with value: 0.8658422177623833 and parameters: {'learning_rate': 0.0004000670460098183, 'weight_decay': 0.007, 'adam_beta1': 0.9}. Best is trial 43 with value: 0.8734249573321554.


Trial 80 with params: {'learning_rate': 0.00041799722774633744, 'weight_decay': 0.003, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4774,1.029866,0.6229,0.622256,0.6229,0.615935
2,0.919,0.750097,0.739,0.741464,0.739,0.737599
3,0.6822,0.668776,0.7687,0.77612,0.7687,0.768633
4,0.5529,0.559449,0.8099,0.81078,0.8099,0.808312
5,0.451,0.551913,0.8118,0.819375,0.8118,0.813604
6,0.3686,0.565648,0.8161,0.840289,0.8161,0.818305
7,0.2957,0.541114,0.8339,0.83775,0.8339,0.832484
8,0.2317,0.525971,0.8366,0.841185,0.8366,0.834997
9,0.1766,0.51759,0.8456,0.849341,0.8456,0.846299
10,0.1327,0.533382,0.8501,0.852684,0.8501,0.849111


[I 2025-01-06 09:49:15,337] Trial 80 finished with value: 0.8689689700730387 and parameters: {'learning_rate': 0.00041799722774633744, 'weight_decay': 0.003, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 81 with params: {'learning_rate': 0.00019257852267789502, 'weight_decay': 0.0, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5993,1.08993,0.6077,0.607928,0.6077,0.604713
2,0.9635,0.80249,0.7165,0.720815,0.7165,0.714942
3,0.7036,0.652361,0.7747,0.780826,0.7747,0.774378
4,0.5488,0.565565,0.8073,0.809251,0.8073,0.806495
5,0.4226,0.558582,0.8087,0.819513,0.8087,0.810758
6,0.3173,0.586938,0.8125,0.833585,0.8125,0.814518


[I 2025-01-06 10:02:06,899] Trial 81 pruned. 


Trial 82 with params: {'learning_rate': 0.0003609902553553248, 'weight_decay': 0.002, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5027,1.024951,0.6261,0.630354,0.6261,0.624042
2,0.9159,0.727593,0.7419,0.746034,0.7419,0.741832
3,0.6707,0.644396,0.7767,0.784447,0.7767,0.77803
4,0.5389,0.561981,0.8088,0.808999,0.8088,0.80714
5,0.4397,0.581985,0.8019,0.816281,0.8019,0.805214
6,0.3511,0.549123,0.8196,0.835577,0.8196,0.820377
7,0.28,0.526889,0.8336,0.83713,0.8336,0.833164
8,0.2151,0.541592,0.8365,0.837327,0.8365,0.83418
9,0.1613,0.526104,0.8431,0.848855,0.8431,0.84431
10,0.1213,0.562318,0.8425,0.850482,0.8425,0.842413


[I 2025-01-06 10:44:42,475] Trial 82 finished with value: 0.8641234643600398 and parameters: {'learning_rate': 0.0003609902553553248, 'weight_decay': 0.002, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 83 with params: {'learning_rate': 0.00010582806610219601, 'weight_decay': 0.006, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7684,1.333896,0.5096,0.515478,0.5096,0.505844
2,1.1938,0.966519,0.6528,0.65474,0.6528,0.652583
3,0.8931,0.814032,0.7136,0.723497,0.7136,0.714398


[I 2025-01-06 10:51:06,209] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.000430315470339567, 'weight_decay': 0.003, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4489,0.986564,0.6429,0.652847,0.6429,0.640177
2,0.8742,0.71361,0.7488,0.751844,0.7488,0.747174
3,0.6633,0.631513,0.7821,0.789316,0.7821,0.782447
4,0.5436,0.55926,0.8111,0.813371,0.8111,0.809399
5,0.4485,0.551704,0.8099,0.82138,0.8099,0.811146
6,0.366,0.573404,0.8096,0.833508,0.8096,0.811509


[I 2025-01-06 11:03:52,869] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 7.33845823015502e-05, 'weight_decay': 0.004, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.8709,1.434637,0.4767,0.478081,0.4767,0.468887
2,1.354,1.116524,0.6003,0.599948,0.6003,0.598039
3,1.0461,0.925452,0.6722,0.678963,0.6722,0.670578


[I 2025-01-06 11:10:19,174] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 0.0004686238940265176, 'weight_decay': 0.006, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4828,1.067545,0.6147,0.616233,0.6147,0.613784
2,0.9367,0.758618,0.7404,0.747841,0.7404,0.740866
3,0.6999,0.651856,0.7709,0.780034,0.7709,0.771354
4,0.5694,0.551811,0.8101,0.810179,0.8101,0.80852
5,0.4728,0.528005,0.8193,0.824367,0.8193,0.819714
6,0.3932,0.508082,0.8282,0.838517,0.8282,0.82893
7,0.3189,0.545018,0.8255,0.832265,0.8255,0.823295
8,0.2544,0.52248,0.8381,0.84046,0.8381,0.835732
9,0.1961,0.471568,0.8535,0.858236,0.8535,0.854827
10,0.1498,0.498642,0.8547,0.856142,0.8547,0.854141


[I 2025-01-06 11:36:06,624] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.00020604669292026708, 'weight_decay': 0.003, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6011,1.083391,0.6063,0.608053,0.6063,0.603271
2,0.9819,0.798708,0.7195,0.730051,0.7195,0.721361
3,0.7143,0.675274,0.7667,0.776219,0.7667,0.76687
4,0.5549,0.578859,0.8025,0.803954,0.8025,0.801226
5,0.4372,0.552844,0.8118,0.818038,0.8118,0.812621
6,0.335,0.573637,0.8102,0.82623,0.8102,0.810927


[I 2025-01-06 11:48:51,650] Trial 87 pruned. 


Trial 88 with params: {'learning_rate': 0.0003375974608332158, 'weight_decay': 0.004, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4978,0.98746,0.6353,0.643702,0.6353,0.633812
2,0.8919,0.732354,0.7398,0.748187,0.7398,0.74
3,0.6713,0.641148,0.7791,0.78711,0.7791,0.778527
4,0.544,0.558513,0.8096,0.810764,0.8096,0.808352
5,0.4448,0.551292,0.8119,0.822292,0.8119,0.81463
6,0.3545,0.552477,0.8213,0.839149,0.8213,0.82293
7,0.2725,0.573513,0.821,0.826372,0.821,0.818805
8,0.2085,0.554744,0.8352,0.838027,0.8352,0.832871
9,0.1592,0.555709,0.8352,0.84085,0.8352,0.835407
10,0.1164,0.569493,0.8444,0.848724,0.8444,0.845263


[I 2025-01-06 12:31:27,514] Trial 88 finished with value: 0.8655746142035923 and parameters: {'learning_rate': 0.0003375974608332158, 'weight_decay': 0.004, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 89 with params: {'learning_rate': 2.6182343503528787e-06, 'weight_decay': 0.006, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4427,2.254329,0.1582,0.135416,0.1582,0.106412
2,2.3742,2.186322,0.1969,0.177306,0.1969,0.146771
3,2.2724,2.05983,0.2471,0.236901,0.2471,0.208843
4,2.1291,1.916776,0.2861,0.278441,0.2861,0.261006
5,2.0327,1.846283,0.3002,0.294921,0.3002,0.274112
6,1.9725,1.801134,0.3285,0.324926,0.3285,0.312327
7,1.9339,1.757089,0.3403,0.332348,0.3403,0.327599
8,1.9004,1.763123,0.3397,0.334785,0.3397,0.325425
9,1.8765,1.719727,0.3478,0.345125,0.3478,0.341148
10,1.854,1.70819,0.3537,0.345373,0.3537,0.345172


[I 2025-01-06 12:57:18,150] Trial 89 pruned. 


Trial 90 with params: {'learning_rate': 0.0001268966887509396, 'weight_decay': 0.004, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7198,1.235088,0.5542,0.550633,0.5542,0.551028
2,1.1143,0.912102,0.6762,0.684734,0.6762,0.676012
3,0.8231,0.727061,0.7441,0.749168,0.7441,0.743565
4,0.6314,0.63495,0.781,0.779171,0.781,0.778519
5,0.4934,0.625153,0.7884,0.799095,0.7884,0.790648
6,0.3705,0.639049,0.7908,0.813735,0.7908,0.792634


[I 2025-01-06 13:10:05,777] Trial 90 pruned. 


Trial 91 with params: {'learning_rate': 0.0004292963049723308, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4834,1.008501,0.6352,0.637733,0.6352,0.631453
2,0.9218,0.735325,0.7401,0.745496,0.7401,0.740171
3,0.6891,0.669961,0.7694,0.781772,0.7694,0.769579
4,0.5583,0.560746,0.8126,0.81153,0.8126,0.810646
5,0.4615,0.53064,0.8168,0.826499,0.8168,0.818998
6,0.3752,0.528574,0.8205,0.839893,0.8205,0.822198
7,0.302,0.546023,0.8293,0.836442,0.8293,0.828479
8,0.2388,0.512451,0.8395,0.841021,0.8395,0.838169
9,0.1844,0.499047,0.8482,0.853243,0.8482,0.849416
10,0.1387,0.525033,0.8511,0.856674,0.8511,0.851951


[I 2025-01-06 13:52:41,551] Trial 91 finished with value: 0.8661125228947864 and parameters: {'learning_rate': 0.0004292963049723308, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 92 with params: {'learning_rate': 5.417104271994568e-05, 'weight_decay': 0.004, 'adam_beta1': 0.96}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9195,1.494215,0.4479,0.445528,0.4479,0.441923
2,1.4674,1.252064,0.5364,0.542701,0.5364,0.530771
3,1.2072,1.065434,0.6196,0.622542,0.6196,0.618188
4,1.0177,0.952708,0.6629,0.659633,0.6629,0.658616
5,0.8727,0.891043,0.6847,0.697729,0.6847,0.687187
6,0.7541,0.827185,0.7099,0.718938,0.7099,0.708344


[I 2025-01-06 14:05:28,218] Trial 92 pruned. 


Trial 93 with params: {'learning_rate': 1.7230959137136504e-05, 'weight_decay': 0.0, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.188,1.758128,0.3376,0.334327,0.3376,0.328221
2,1.815,1.585962,0.4082,0.4058,0.4082,0.40269
3,1.6471,1.465144,0.4554,0.452772,0.4554,0.44776
4,1.5294,1.388459,0.4922,0.486299,0.4922,0.485305
5,1.4374,1.33866,0.5144,0.520698,0.5144,0.51052
6,1.3592,1.273064,0.5377,0.532705,0.5377,0.529253


[I 2025-01-06 14:18:26,521] Trial 93 pruned. 


Trial 94 with params: {'learning_rate': 0.00022213623144275155, 'weight_decay': 0.005, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6123,1.12719,0.5944,0.588529,0.5944,0.588651
2,0.9858,0.804674,0.7166,0.721927,0.7166,0.717569
3,0.7217,0.657851,0.7715,0.775922,0.7715,0.770085
4,0.5646,0.596874,0.7949,0.794804,0.7949,0.792611
5,0.4466,0.579802,0.7989,0.811948,0.7989,0.801686
6,0.3462,0.598399,0.8036,0.824908,0.8036,0.806


[I 2025-01-06 14:31:13,459] Trial 94 pruned. 


Trial 95 with params: {'learning_rate': 0.0004237366326187549, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4949,0.981784,0.639,0.641251,0.639,0.636717
2,0.9152,0.777847,0.7296,0.734782,0.7296,0.729424
3,0.6953,0.642383,0.7794,0.789802,0.7794,0.780098
4,0.5657,0.552816,0.8139,0.812935,0.8139,0.812158
5,0.4712,0.555103,0.8068,0.813565,0.8068,0.808061
6,0.3854,0.529548,0.8253,0.842837,0.8253,0.826998
7,0.3109,0.519009,0.8323,0.837774,0.8323,0.830987
8,0.2466,0.517887,0.8356,0.838843,0.8356,0.834065
9,0.1928,0.485076,0.8504,0.855142,0.8504,0.851515
10,0.1447,0.499124,0.8539,0.857063,0.8539,0.854281


[I 2025-01-06 15:13:56,128] Trial 95 finished with value: 0.8719877942652667 and parameters: {'learning_rate': 0.0004237366326187549, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001}. Best is trial 43 with value: 0.8734249573321554.


Trial 96 with params: {'learning_rate': 0.000413148563825221, 'weight_decay': 0.003, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.471,0.994665,0.6384,0.637454,0.6384,0.635661
2,0.9096,0.75106,0.7354,0.74139,0.7354,0.735864
3,0.6696,0.617798,0.7851,0.791071,0.7851,0.784615
4,0.5446,0.548153,0.8137,0.815269,0.8137,0.812327
5,0.4451,0.548388,0.8131,0.821297,0.8131,0.814996
6,0.3659,0.539871,0.8214,0.839258,0.8214,0.82342
7,0.2941,0.531826,0.8319,0.838061,0.8319,0.831296
8,0.231,0.558802,0.8319,0.83706,0.8319,0.829613
9,0.1749,0.500682,0.8485,0.85284,0.8485,0.849266
10,0.134,0.519961,0.855,0.856999,0.855,0.854983


[I 2025-01-06 15:39:52,802] Trial 96 pruned. 


Trial 97 with params: {'learning_rate': 0.00048761131220534826, 'weight_decay': 0.004, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4663,1.004144,0.6375,0.641706,0.6375,0.638331
2,0.9064,0.742697,0.741,0.744793,0.741,0.739731
3,0.6861,0.638436,0.7829,0.788011,0.7829,0.78191
4,0.5598,0.560233,0.807,0.808854,0.807,0.805733
5,0.4697,0.522531,0.8226,0.829503,0.8226,0.824442
6,0.3853,0.546408,0.8166,0.835934,0.8166,0.818696
7,0.3193,0.527354,0.8292,0.833227,0.8292,0.827673
8,0.2596,0.517769,0.8381,0.840048,0.8381,0.836439
9,0.1986,0.476436,0.8521,0.855492,0.8521,0.852541
10,0.155,0.473792,0.8584,0.861753,0.8584,0.858836


[I 2025-01-06 16:22:28,864] Trial 97 finished with value: 0.8709650111005172 and parameters: {'learning_rate': 0.00048761131220534826, 'weight_decay': 0.004, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 98 with params: {'learning_rate': 0.00047100232840302893, 'weight_decay': 0.004, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4794,0.995511,0.6367,0.642089,0.6367,0.634788
2,0.9261,0.741851,0.7414,0.746798,0.7414,0.741291
3,0.7007,0.655505,0.7727,0.783538,0.7727,0.772378
4,0.5705,0.582737,0.8012,0.803523,0.8012,0.798767
5,0.4814,0.557349,0.8112,0.820417,0.8112,0.812184
6,0.3962,0.580275,0.8087,0.829953,0.8087,0.810244


[I 2025-01-06 16:35:16,269] Trial 98 pruned. 


Trial 99 with params: {'learning_rate': 0.0004514735259044771, 'weight_decay': 0.005, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.473,1.005352,0.6357,0.640436,0.6357,0.634453
2,0.9087,0.76104,0.7372,0.743965,0.7372,0.736132
3,0.6867,0.637072,0.7771,0.783378,0.7771,0.776669
4,0.5579,0.564109,0.8092,0.815156,0.8092,0.809342
5,0.4638,0.541249,0.8146,0.826223,0.8146,0.817479
6,0.3803,0.548409,0.8153,0.840204,0.8153,0.816718
7,0.3057,0.516029,0.8305,0.835836,0.8305,0.829574
8,0.2471,0.531576,0.8341,0.837817,0.8341,0.83291
9,0.1904,0.478365,0.853,0.855514,0.853,0.853358
10,0.1438,0.536494,0.8439,0.851127,0.8439,0.844961


[I 2025-01-06 17:17:54,965] Trial 99 finished with value: 0.8721532469666696 and parameters: {'learning_rate': 0.0004514735259044771, 'weight_decay': 0.005, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 100 with params: {'learning_rate': 0.0002641183491967019, 'weight_decay': 0.006, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5647,1.029215,0.63,0.636395,0.63,0.628918
2,0.9402,0.747744,0.7379,0.742299,0.7379,0.73641
3,0.6738,0.633208,0.7797,0.784876,0.7797,0.779192
4,0.537,0.550297,0.8137,0.81523,0.8137,0.813207
5,0.429,0.531248,0.8207,0.826095,0.8207,0.820692
6,0.3358,0.541818,0.8211,0.841328,0.8211,0.823765
7,0.2583,0.537493,0.8328,0.837989,0.8328,0.831854
8,0.192,0.549773,0.8334,0.835756,0.8334,0.831329
9,0.1414,0.527465,0.8432,0.849311,0.8432,0.844258
10,0.1073,0.557513,0.8474,0.853141,0.8474,0.846951


[I 2025-01-06 18:00:43,756] Trial 100 finished with value: 0.8645121750354681 and parameters: {'learning_rate': 0.0002641183491967019, 'weight_decay': 0.006, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 101 with params: {'learning_rate': 0.0004951724773049825, 'weight_decay': 0.004, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4542,1.014381,0.6353,0.641768,0.6353,0.637136
2,0.8979,0.769314,0.7311,0.736012,0.7311,0.727837
3,0.6844,0.66199,0.7732,0.781736,0.7732,0.773357
4,0.56,0.560628,0.8075,0.810163,0.8075,0.805838
5,0.4654,0.534379,0.8177,0.824763,0.8177,0.819316
6,0.3855,0.561054,0.8154,0.839704,0.8154,0.816886
7,0.3126,0.526515,0.8308,0.838331,0.8308,0.830913
8,0.2499,0.535112,0.8334,0.836048,0.8334,0.831657
9,0.2003,0.490494,0.8506,0.856048,0.8506,0.851891
10,0.1519,0.509538,0.8525,0.857347,0.8525,0.853488


[I 2025-01-06 18:43:22,041] Trial 101 finished with value: 0.8624748676835965 and parameters: {'learning_rate': 0.0004951724773049825, 'weight_decay': 0.004, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 102 with params: {'learning_rate': 0.00040261834393029834, 'weight_decay': 0.005, 'adam_beta1': 0.9400000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4875,1.007465,0.6324,0.632277,0.6324,0.630101
2,0.9133,0.755406,0.7312,0.737288,0.7312,0.729748
3,0.679,0.634687,0.7772,0.788998,0.7772,0.778897
4,0.5554,0.546372,0.8147,0.816119,0.8147,0.813842
5,0.4549,0.550271,0.8114,0.817887,0.8114,0.812602
6,0.3738,0.567093,0.8133,0.836038,0.8133,0.816189


[I 2025-01-06 18:56:14,008] Trial 102 pruned. 


Trial 103 with params: {'learning_rate': 0.0004875046222887296, 'weight_decay': 0.007, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4544,1.035429,0.6227,0.62232,0.6227,0.615881
2,0.9159,0.746902,0.7405,0.741408,0.7405,0.738548
3,0.6921,0.657186,0.7727,0.783978,0.7727,0.773849
4,0.567,0.580375,0.8001,0.802367,0.8001,0.798884
5,0.4661,0.562836,0.8075,0.818758,0.8075,0.809514
6,0.3864,0.554367,0.8162,0.838492,0.8162,0.818517
7,0.316,0.507879,0.8345,0.839755,0.8345,0.833284
8,0.2522,0.54694,0.831,0.834344,0.831,0.82914
9,0.1969,0.481735,0.8528,0.855952,0.8528,0.853646
10,0.1538,0.478611,0.8612,0.86355,0.8612,0.861523


[I 2025-01-06 19:38:53,766] Trial 103 finished with value: 0.8729018320751306 and parameters: {'learning_rate': 0.0004875046222887296, 'weight_decay': 0.007, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 104 with params: {'learning_rate': 0.00038101676079616616, 'weight_decay': 0.006, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4759,1.007977,0.6269,0.627276,0.6269,0.624763
2,0.9029,0.780982,0.7298,0.73418,0.7298,0.727996
3,0.676,0.638755,0.7763,0.786122,0.7763,0.776412
4,0.5465,0.551865,0.8109,0.812641,0.8109,0.810214
5,0.4497,0.51921,0.8225,0.829793,0.8225,0.823836
6,0.3669,0.53587,0.824,0.840371,0.824,0.825559
7,0.2897,0.549359,0.8257,0.831182,0.8257,0.82443
8,0.2294,0.521268,0.8405,0.842635,0.8405,0.838135
9,0.1767,0.501853,0.8478,0.851534,0.8478,0.848564
10,0.1303,0.52705,0.8502,0.856399,0.8502,0.850934


[I 2025-01-06 20:21:27,762] Trial 104 finished with value: 0.8673069480019088 and parameters: {'learning_rate': 0.00038101676079616616, 'weight_decay': 0.006, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 105 with params: {'learning_rate': 1.6562808358868155e-06, 'weight_decay': 0.01, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.4572,2.277532,0.1206,0.124015,0.1206,0.059839
2,2.4126,2.240326,0.166,0.134109,0.166,0.112822
3,2.3694,2.198128,0.1933,0.185263,0.1933,0.141917
4,2.3121,2.119181,0.2269,0.202366,0.2269,0.175262
5,2.2402,2.049466,0.2459,0.238973,0.2459,0.20453
6,2.1515,1.966135,0.2749,0.273657,0.2749,0.242066
7,2.0867,1.894638,0.2919,0.284931,0.2919,0.266964
8,2.0356,1.885482,0.2959,0.292269,0.2959,0.273417
9,2.0062,1.832391,0.3081,0.305261,0.3081,0.291479
10,1.9766,1.823234,0.3194,0.312048,0.3194,0.307714


[I 2025-01-06 20:46:59,831] Trial 105 pruned. 


Trial 106 with params: {'learning_rate': 0.00033832242794776723, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5041,1.021198,0.6295,0.637126,0.6295,0.626644
2,0.8904,0.722517,0.7484,0.753113,0.7484,0.74647
3,0.6601,0.621556,0.7816,0.789003,0.7816,0.781997
4,0.5283,0.563579,0.807,0.808238,0.807,0.804537
5,0.4368,0.519542,0.8239,0.832784,0.8239,0.82586
6,0.3467,0.545855,0.8216,0.844708,0.8216,0.82374
7,0.2692,0.522023,0.8351,0.838209,0.8351,0.834164
8,0.2068,0.538181,0.8344,0.838972,0.8344,0.832128
9,0.1532,0.500001,0.8511,0.853477,0.8511,0.85167
10,0.1126,0.565213,0.8453,0.851347,0.8453,0.845753


[I 2025-01-06 21:29:33,388] Trial 106 finished with value: 0.8698852189150866 and parameters: {'learning_rate': 0.00033832242794776723, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 107 with params: {'learning_rate': 0.00028522255066289827, 'weight_decay': 0.007, 'adam_beta1': 0.93}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5133,1.007248,0.6325,0.634758,0.6325,0.631105
2,0.9179,0.731844,0.7426,0.745861,0.7426,0.741407
3,0.6639,0.593407,0.7962,0.803744,0.7962,0.796694
4,0.5211,0.533475,0.8196,0.818742,0.8196,0.817881
5,0.4254,0.554023,0.8127,0.820069,0.8127,0.813794
6,0.3326,0.57059,0.815,0.839477,0.815,0.817328
7,0.2539,0.554289,0.8292,0.834969,0.8292,0.82702
8,0.1923,0.55371,0.8385,0.841047,0.8385,0.836847
9,0.1397,0.50961,0.8531,0.855862,0.8531,0.853658
10,0.1022,0.567822,0.845,0.852402,0.845,0.845209


[I 2025-01-06 22:12:05,882] Trial 107 finished with value: 0.863185979768234 and parameters: {'learning_rate': 0.00028522255066289827, 'weight_decay': 0.007, 'adam_beta1': 0.93}. Best is trial 43 with value: 0.8734249573321554.


Trial 108 with params: {'learning_rate': 0.0004365276969073769, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4887,1.019611,0.6295,0.638153,0.6295,0.629674
2,0.9135,0.751291,0.7363,0.741747,0.7363,0.733969
3,0.6863,0.646845,0.7734,0.77937,0.7734,0.77269
4,0.5588,0.565889,0.8064,0.8097,0.8064,0.805643
5,0.4655,0.547127,0.8102,0.822986,0.8102,0.812744
6,0.3779,0.560665,0.8159,0.838858,0.8159,0.818546
7,0.3108,0.532496,0.827,0.832483,0.827,0.825579
8,0.2463,0.524314,0.837,0.839808,0.837,0.835077
9,0.1888,0.471913,0.8543,0.855554,0.8543,0.854048
10,0.1479,0.52281,0.8528,0.857168,0.8528,0.852818


[I 2025-01-06 22:54:44,884] Trial 108 finished with value: 0.8727676220851027 and parameters: {'learning_rate': 0.0004365276969073769, 'weight_decay': 0.008, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 109 with params: {'learning_rate': 0.000212997016148891, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5766,1.094815,0.6031,0.612541,0.6031,0.605233
2,0.9812,0.812522,0.7172,0.718901,0.7172,0.715546
3,0.7178,0.655792,0.7713,0.776905,0.7713,0.770252
4,0.5547,0.569587,0.8069,0.805872,0.8069,0.80568
5,0.429,0.552165,0.8132,0.820541,0.8132,0.814902
6,0.3266,0.570732,0.8129,0.833972,0.8129,0.81536
7,0.2408,0.556546,0.8258,0.832277,0.8258,0.825673
8,0.1763,0.599999,0.8241,0.826453,0.8241,0.821476
9,0.1256,0.549609,0.839,0.841442,0.839,0.839362
10,0.0912,0.602112,0.8349,0.838596,0.8349,0.835332


[I 2025-01-06 23:20:21,924] Trial 109 pruned. 


Trial 110 with params: {'learning_rate': 0.0002399718778688677, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.9}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5515,1.038404,0.6239,0.622679,0.6239,0.618936
2,0.9206,0.76678,0.7295,0.736039,0.7295,0.729121
3,0.6595,0.620691,0.7884,0.795524,0.7884,0.789411
4,0.5198,0.569419,0.8067,0.810447,0.8067,0.806308
5,0.4113,0.55324,0.8089,0.81752,0.8089,0.810595
6,0.3156,0.585042,0.8104,0.834598,0.8104,0.814136


[I 2025-01-06 23:33:08,914] Trial 110 pruned. 


Trial 111 with params: {'learning_rate': 0.00029279311575263313, 'weight_decay': 0.01, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5381,1.029084,0.6271,0.629944,0.6271,0.625677
2,0.9177,0.743338,0.7429,0.745048,0.7429,0.74118
3,0.6735,0.632915,0.7803,0.789758,0.7803,0.781466
4,0.5259,0.565544,0.8091,0.809425,0.8091,0.807847
5,0.4262,0.567103,0.8066,0.821537,0.8066,0.809399
6,0.3374,0.560736,0.8182,0.834808,0.8182,0.82036
7,0.2549,0.531882,0.8358,0.838637,0.8358,0.834667
8,0.1963,0.547865,0.8385,0.841163,0.8385,0.837171
9,0.1459,0.5297,0.844,0.847759,0.844,0.844576
10,0.1067,0.56984,0.8448,0.848234,0.8448,0.844341


[I 2025-01-06 23:58:43,844] Trial 111 pruned. 


Trial 112 with params: {'learning_rate': 0.00042883235294515854, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4586,1.035847,0.6268,0.630918,0.6268,0.624924
2,0.8933,0.758939,0.7339,0.738334,0.7339,0.732454
3,0.6731,0.636689,0.7782,0.785521,0.7782,0.777974
4,0.55,0.560111,0.807,0.806917,0.807,0.805357
5,0.4553,0.518992,0.8232,0.831827,0.8232,0.825028
6,0.3727,0.536765,0.8229,0.840531,0.8229,0.824673
7,0.2975,0.537818,0.83,0.835377,0.83,0.829098
8,0.24,0.543594,0.8294,0.833039,0.8294,0.827152
9,0.1853,0.48844,0.8534,0.855882,0.8534,0.853524
10,0.1433,0.518822,0.8521,0.856005,0.8521,0.851279


[I 2025-01-07 00:41:22,585] Trial 112 finished with value: 0.8706206914500061 and parameters: {'learning_rate': 0.00042883235294515854, 'weight_decay': 0.008, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 113 with params: {'learning_rate': 0.0004111791189875075, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.472,0.995304,0.6375,0.634364,0.6375,0.630972
2,0.8953,0.728746,0.7493,0.747151,0.7493,0.746951
3,0.6783,0.646693,0.78,0.788866,0.78,0.779676
4,0.5469,0.564003,0.8063,0.808938,0.8063,0.805246
5,0.4516,0.553311,0.8081,0.817002,0.8081,0.810078
6,0.367,0.592281,0.8018,0.830401,0.8018,0.804261


[I 2025-01-07 00:54:08,962] Trial 113 pruned. 


Trial 114 with params: {'learning_rate': 0.0003722199724674618, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4863,1.007537,0.6326,0.636623,0.6326,0.629187
2,0.8848,0.73654,0.7441,0.748527,0.7441,0.742066
3,0.6645,0.638198,0.7791,0.788339,0.7791,0.779073
4,0.5326,0.544711,0.8116,0.813698,0.8116,0.810422
5,0.443,0.512094,0.8287,0.833874,0.8287,0.829566
6,0.3519,0.568748,0.8166,0.841472,0.8166,0.817022


[I 2025-01-07 01:06:56,829] Trial 114 pruned. 


Trial 115 with params: {'learning_rate': 0.0004963807424292574, 'weight_decay': 0.008, 'adam_beta1': 0.91}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4547,1.010476,0.6363,0.64722,0.6363,0.636405
2,0.8967,0.739689,0.7429,0.74679,0.7429,0.74096
3,0.6859,0.650148,0.7769,0.782966,0.7769,0.777242
4,0.563,0.572751,0.8052,0.807688,0.8052,0.803526
5,0.47,0.538843,0.8106,0.822436,0.8106,0.813435
6,0.393,0.552786,0.82,0.834168,0.82,0.820603
7,0.3211,0.578589,0.8209,0.828237,0.8209,0.819488
8,0.2586,0.551265,0.8343,0.837307,0.8343,0.832736
9,0.2034,0.498952,0.845,0.850238,0.845,0.846242
10,0.157,0.507991,0.8517,0.85713,0.8517,0.852682


[I 2025-01-07 01:49:39,681] Trial 115 finished with value: 0.8655737625501635 and parameters: {'learning_rate': 0.0004963807424292574, 'weight_decay': 0.008, 'adam_beta1': 0.91}. Best is trial 43 with value: 0.8734249573321554.


Trial 116 with params: {'learning_rate': 0.0004742032194648361, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4774,1.002081,0.6328,0.644258,0.6328,0.632684
2,0.9102,0.720287,0.746,0.749287,0.746,0.746166
3,0.6886,0.670612,0.7679,0.77866,0.7679,0.766967
4,0.5748,0.562095,0.8086,0.810384,0.8086,0.806697
5,0.4785,0.545969,0.8151,0.822396,0.8151,0.815168
6,0.3987,0.5171,0.8229,0.836807,0.8229,0.824455
7,0.3272,0.533114,0.8292,0.83479,0.8292,0.828167
8,0.2644,0.517907,0.834,0.835867,0.834,0.83226
9,0.2077,0.483381,0.849,0.851318,0.849,0.849532
10,0.1586,0.501144,0.8542,0.858107,0.8542,0.854797


[I 2025-01-07 02:32:38,287] Trial 116 finished with value: 0.8666231450272142 and parameters: {'learning_rate': 0.0004742032194648361, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.92}. Best is trial 43 with value: 0.8734249573321554.


Trial 117 with params: {'learning_rate': 0.00016769203705822665, 'weight_decay': 0.008, 'adam_beta1': 0.92}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6556,1.142616,0.5826,0.590841,0.5826,0.584549
2,1.0333,0.819238,0.7122,0.717916,0.7122,0.712892
3,0.7374,0.644444,0.7788,0.784087,0.7788,0.779574
4,0.5644,0.590818,0.7983,0.800921,0.7983,0.796929
5,0.4406,0.571379,0.8004,0.81036,0.8004,0.802439
6,0.3356,0.602637,0.8019,0.827367,0.8019,0.804206


[I 2025-01-07 02:45:25,489] Trial 117 pruned. 


Trial 118 with params: {'learning_rate': 0.00045039424742782536, 'weight_decay': 0.0, 'adam_beta1': 0.9500000000000001}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4709,1.01819,0.6282,0.628457,0.6282,0.626227
2,0.9275,0.786155,0.7183,0.724879,0.7183,0.716573
3,0.7026,0.661289,0.7673,0.777702,0.7673,0.765997
4,0.5689,0.574138,0.804,0.802679,0.804,0.800934
5,0.4778,0.554265,0.8102,0.818739,0.8102,0.8117
6,0.3974,0.523596,0.8262,0.836739,0.8262,0.826352
7,0.327,0.552846,0.8263,0.83605,0.8263,0.825359
8,0.2617,0.591155,0.8207,0.826989,0.8207,0.818889
9,0.2004,0.509233,0.8404,0.849822,0.8404,0.842428
10,0.1519,0.499016,0.8551,0.857505,0.8551,0.855394


In [None]:
print(best_trial.hyperparameters)

## Definition of distillation training

A class that adapts the Hugging Face `Trainer` for knowledge distillation. It now works with the precomputed teacher logits stored in the dataset.

In [23]:
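class ImageDistilTrainer(Trainer):
    """Trainer that blends the student's own loss with a distillation loss on precomputed teacher logits."""

    def __init__(self, model_init, *args, **kwargs):
        self.model_init = model_init
        # KL divergence between the softened student and teacher distributions.
        self.loss_function = nn.KLDivLoss(reduction="batchmean")

        super().__init__(model_init=model_init, *args, **kwargs)

        # Extra student instance; compute_loss uses the `model` passed in by Trainer, not this one.
        self.student = self.model_init()
        self.temperature = self.args.temperature
        self.lambda_param = self.args.lambda_param

        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.student.to(device)

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # Teacher logits come from the dataset; remove them before calling the student.
        logits = inputs.pop("logits")

        student_output = model(**inputs)
        # Re-read on every call; the hyperparameter search may change these values between trials.
        self.lambda_param = self.args.lambda_param
        self.temperature = self.args.temperature

        # Soften both distributions with the temperature. KLDivLoss expects
        # log-probabilities as input and probabilities as target.
        soft_teacher = F.softmax(logits / self.temperature, dim=-1)
        soft_student = F.log_softmax(student_output.logits / self.temperature, dim=-1)

        # Scale by T^2 so the soft-target term stays comparable across temperatures.
        distillation_loss = self.loss_function(soft_student, soft_teacher) * (self.temperature ** 2)

        # Loss the model itself computes from the ground-truth labels.
        student_target_loss = student_output.loss

        # lambda_param = 0 gives plain supervised training, lambda_param = 1 pure distillation.
        loss = ((1. - self.lambda_param) * student_target_loss + self.lambda_param * distillation_loss)
        return (loss, student_output) if return_outputs else loss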
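
The loss combined in `compute_loss` can be checked by hand on a single example. The sketch below is a plain-Python illustration and not part of the notebook: `log_softmax` and `distillation_loss` are hypothetical helper names, the logits are made-up numbers, and it assumes that `student_output.loss` is the ordinary cross-entropy against the label, as is usual for single-label classification.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(student_logits, teacher_logits, label, temperature, lambda_param):
    """Combined loss for one example (hypothetical helper mirroring compute_loss)."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) over the softened distributions, scaled by T^2.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student))
    soft_loss = kl * temperature ** 2

    # Assumed hard-label term: cross-entropy at temperature 1.
    hard_loss = -log_softmax(student_logits)[label]

    return (1.0 - lambda_param) * hard_loss + lambda_param * soft_loss


# Made-up logits for a three-class problem.
teacher = [4.0, 1.0, -2.0]
student = [2.0, 1.5, 0.0]
loss = distillation_loss(student, teacher, label=0, temperature=2.5, lambda_param=0.6)
print(f"combined loss: {loss:.4f}")
```

With `lambda_param=1` and identical student and teacher logits the result is zero, and with `lambda_param=0` it reduces to the cross-entropy alone, which makes the two ends of the `lambda_param` search range easy to interpret.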

### Training a randomly initialized model with knowledge distillation

In [24]:
reset_seed(42)

In [25]:
training_args = get_training_args("./results/cifar10-random-KD", './logs/cifar10-random-KD', False)

In [26]:
def hp_space(trial):
    params = {
        # Optimizer settings: log-uniform learning rate, stepped weight decay and beta1.
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 5e-4, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0, 1e-2, step=1e-3),
        "adam_beta1": trial.suggest_float("adam_beta1", 0.9, 0.99, step=0.01),
        # Distillation settings: weight of the distillation term and softmax temperature.
        "lambda_param": trial.suggest_float("lambda_param", 0, 1, step=0.1),
        "temperature": trial.suggest_float("temperature", 2, 7, step=0.5),
    }
    print(f"Trial {trial.number} with params: {params}")
    return params
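
The two kinds of ranges in `hp_space` behave differently: `log=True` spreads the learning rate evenly over orders of magnitude, while `step=` restricts a parameter to a fixed grid. The sketch below imitates both with the standard library only. It is an illustration under that assumption, not Optuna's implementation (the notebook uses a TPE sampler, not uniform draws), and `sample_log_uniform` and `sample_stepped` are hypothetical helper names.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    # Clamp to guard against rounding at the interval ends.
    return min(max(value, low), high)


def sample_stepped(rng, low, high, step):
    """Draw a value from the grid low, low + step, ..., high."""
    n_steps = round((high - low) / step)
    return low + rng.randint(0, n_steps) * step


rng = random.Random(42)
params = {
    "learning_rate": sample_log_uniform(rng, 1e-6, 5e-4),
    "weight_decay": sample_stepped(rng, 0.0, 1e-2, 1e-3),
    "adam_beta1": sample_stepped(rng, 0.9, 0.99, 0.01),
    "lambda_param": sample_stepped(rng, 0.0, 1.0, 0.1),
    "temperature": sample_stepped(rng, 2.0, 7.0, 0.5),
}
print(params)
```

Grid values are built by floating-point arithmetic, so printed values such as `0.6000000000000001` in the trial logs are consistent with ordinary rounding on such a grid.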




In [27]:
trainer = ImageDistilTrainer(
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
    compute_metrics=compute_metrics,
    model_init=get_random_init_mobilenet,
)

In [28]:
pruner = optuna.pruners.HyperbandPruner(min_resource=min_r, max_resource=max_r, reduction_factor=2, bootstrap_count=2)
sampler = optuna.samplers.TPESampler(seed=42, multivariate=True)



In [29]:
best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_f1"],
    pruner=pruner,
    sampler=sampler,
    n_trials=100,
    study_name="Distilation hp search",
)

[I 2025-01-10 23:15:42,747] A new study created in memory with name: Distilation hp search


Trial 0 with params: {'learning_rate': 1.0253509690168497e-05, 'weight_decay': 0.01, 'adam_beta1': 0.97, 'lambda_param': 0.6000000000000001, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7886,1.702865,0.2751,0.272882,0.2751,0.252327
2,1.564,1.732621,0.3477,0.348024,0.3477,0.335026
3,1.4649,1.731783,0.3853,0.386552,0.3853,0.376806
4,1.3975,1.757175,0.42,0.41732,0.42,0.410095
5,1.3473,1.746583,0.4365,0.436226,0.4365,0.428679
6,1.2997,1.751526,0.4599,0.454141,0.4599,0.448627
7,1.2635,1.736288,0.4755,0.473125,0.4755,0.468188
8,1.2326,1.775258,0.4694,0.462602,0.4694,0.455094


[I 2025-01-10 23:42:54,473] Trial 0 pruned. 


Trial 1 with params: {'learning_rate': 2.6364803038431666e-06, 'weight_decay': 0.0, 'adam_beta1': 0.98, 'lambda_param': 0.6000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6176,1.461965,0.1547,0.140296,0.1547,0.105169
2,1.5752,1.444535,0.202,0.195464,0.202,0.161025


[I 2025-01-10 23:49:50,228] Trial 1 pruned. 


Trial 2 with params: {'learning_rate': 1.1364672700011182e-06, 'weight_decay': 0.01, 'adam_beta1': 0.98, 'lambda_param': 0.2, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2282,2.060961,0.1057,0.082175,0.1057,0.031778
2,2.2009,2.040057,0.1376,0.107847,0.1376,0.086882
3,2.1717,2.027902,0.1638,0.128634,0.1638,0.115614
4,2.1477,1.997504,0.1874,0.167167,0.1874,0.132427
5,2.1303,1.983586,0.1936,0.17869,0.1936,0.145234
6,2.1016,1.965677,0.214,0.204685,0.214,0.168797
7,2.0732,1.936366,0.234,0.222323,0.234,0.194816
8,2.0465,1.923746,0.2391,0.223052,0.2391,0.204124


[I 2025-01-11 00:17:06,678] Trial 2 pruned. 


Trial 3 with params: {'learning_rate': 3.1261029103110603e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.4, 'temperature': 3.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9669,1.803293,0.1641,0.18426,0.1641,0.11652
2,1.9,1.759229,0.2229,0.202938,0.2229,0.183592
3,1.7929,1.710603,0.2693,0.266797,0.2693,0.236143
4,1.7004,1.707074,0.2912,0.281414,0.2912,0.267588


[I 2025-01-11 00:30:41,956] Trial 3 pruned. 


Trial 4 with params: {'learning_rate': 4.480975918214949e-05, 'weight_decay': 0.001, 'adam_beta1': 0.92, 'lambda_param': 0.4, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6197,1.558647,0.4243,0.420783,0.4243,0.414943
2,1.3161,1.476809,0.5173,0.514492,0.5173,0.512235


[I 2025-01-11 00:37:29,040] Trial 4 pruned. 


Trial 5 with params: {'learning_rate': 0.00013157287601765647, 'weight_decay': 0.002, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.6000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3899,1.718354,0.5441,0.5437,0.5441,0.536819
2,0.9334,1.701452,0.6719,0.673959,0.6719,0.668729
3,0.7102,1.678886,0.7391,0.742763,0.7391,0.737488
4,0.571,1.730081,0.7715,0.770222,0.7715,0.76853


[I 2025-01-11 00:51:07,503] Trial 5 pruned. 


Trial 6 with params: {'learning_rate': 4.3625993625605605e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8421,0.988685,0.4108,0.415742,0.4108,0.399539
2,0.6334,1.066968,0.5072,0.508127,0.5072,0.499223
3,0.5156,1.124608,0.5765,0.57863,0.5765,0.568148
4,0.4448,1.203409,0.6245,0.61852,0.6245,0.616502
5,0.3944,1.190698,0.648,0.6583,0.648,0.647538
6,0.3545,1.262013,0.6657,0.675996,0.6657,0.661127
7,0.3226,1.261955,0.6881,0.699198,0.6881,0.68678
8,0.2954,1.276771,0.6907,0.697822,0.6907,0.686017
9,0.2723,1.358591,0.7066,0.719824,0.7066,0.707956
10,0.2508,1.3013,0.7179,0.726295,0.7179,0.718496


[I 2025-01-11 01:42:37,301] Trial 6 finished with value: 0.716753954361114 and parameters: {'learning_rate': 4.3625993625605605e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 6 with value: 0.716753954361114.


Trial 7 with params: {'learning_rate': 0.00015199881220083957, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0829,1.422456,0.5784,0.578594,0.5784,0.568207
2,0.6813,1.362785,0.6995,0.709137,0.6995,0.699944
3,0.5165,1.406807,0.759,0.769244,0.759,0.759764
4,0.4152,1.443983,0.7991,0.799926,0.7991,0.79757
5,0.3441,1.409517,0.802,0.816431,0.802,0.804768
6,0.2847,1.491206,0.8055,0.822915,0.8055,0.805115
7,0.2374,1.455112,0.8266,0.829447,0.8266,0.826007
8,0.2002,1.429668,0.8155,0.818673,0.8155,0.812998
9,0.1699,1.470028,0.828,0.832663,0.828,0.828802
10,0.1496,1.457331,0.8292,0.837223,0.8292,0.829864


[I 2025-01-11 02:33:55,765] Trial 7 finished with value: 0.8373784588057148 and parameters: {'learning_rate': 0.00015199881220083957, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.7000000000000001, 'temperature': 4.0}. Best is trial 7 with value: 0.8373784588057148.


Trial 8 with params: {'learning_rate': 2.1348999901951977e-06, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2698,1.12955,0.1329,0.121772,0.1329,0.082015
2,1.235,1.132201,0.194,0.179378,0.194,0.149812
3,1.1976,1.155825,0.2257,0.20829,0.2257,0.185728
4,1.1548,1.206052,0.2538,0.240586,0.2538,0.213346


[I 2025-01-11 02:47:33,225] Trial 8 pruned. 


Trial 9 with params: {'learning_rate': 6.139426050898147e-05, 'weight_decay': 0.003, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.6000000000000001, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4256,1.559006,0.4493,0.446032,0.4493,0.443278
2,1.1253,1.533063,0.5383,0.544468,0.5383,0.532989
3,0.9181,1.493714,0.6188,0.620172,0.6188,0.615088
4,0.7768,1.489991,0.6716,0.668658,0.6716,0.666244
5,0.6795,1.473513,0.6874,0.698045,0.6874,0.688347
6,0.5954,1.484064,0.7155,0.724356,0.7155,0.713397
7,0.5284,1.512276,0.7231,0.733713,0.7231,0.721414
8,0.4721,1.568946,0.72,0.729592,0.72,0.714572


[I 2025-01-11 03:14:53,276] Trial 9 pruned. 


Trial 10 with params: {'learning_rate': 0.0003145780170753732, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.5, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1237,1.311288,0.6503,0.653711,0.6503,0.649085
2,0.6868,1.221903,0.7617,0.764621,0.7617,0.760362
3,0.5275,1.252133,0.7907,0.797297,0.7907,0.789595
4,0.4416,1.161419,0.8131,0.818369,0.8131,0.811953
5,0.3754,1.185659,0.8235,0.835293,0.8235,0.825517
6,0.3115,1.184404,0.8269,0.843685,0.8269,0.82814
7,0.2628,1.189402,0.8373,0.843887,0.8373,0.837167
8,0.2221,1.195347,0.8432,0.845885,0.8432,0.842378


[I 2025-01-11 03:42:18,186] Trial 10 pruned. 


Trial 11 with params: {'learning_rate': 5.068963931664152e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8243,0.990263,0.4161,0.420404,0.4161,0.404493
2,0.6103,1.097165,0.5163,0.523295,0.5163,0.506319
3,0.4973,1.132581,0.5914,0.600944,0.5914,0.583704
4,0.4236,1.236222,0.6431,0.64404,0.6431,0.63824
5,0.3693,1.208286,0.6637,0.679152,0.6637,0.662686
6,0.3273,1.243094,0.6892,0.696202,0.6892,0.684186
7,0.2927,1.282907,0.7101,0.720073,0.7101,0.709275
8,0.2646,1.301071,0.7045,0.711767,0.7045,0.699778
9,0.2388,1.36939,0.7284,0.737829,0.7284,0.729318
10,0.2166,1.316465,0.7348,0.741131,0.7348,0.734529


[I 2025-01-11 04:33:09,224] Trial 11 finished with value: 0.7301273590912186 and parameters: {'learning_rate': 5.068963931664152e-05, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 7 with value: 0.8373784588057148.


Trial 12 with params: {'learning_rate': 0.0003184805948017843, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.7000000000000001, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1269,1.782518,0.6456,0.644614,0.6456,0.639541
2,0.6784,1.765137,0.7581,0.758229,0.7581,0.757148
3,0.5049,1.83463,0.7887,0.795524,0.7887,0.787921
4,0.4093,1.776625,0.8119,0.817004,0.8119,0.810212


[I 2025-01-11 04:46:43,908] Trial 12 pruned. 


Trial 13 with params: {'learning_rate': 8.787488401186728e-05, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.989,1.356385,0.4835,0.487195,0.4835,0.471346
2,0.6727,1.391599,0.628,0.62665,0.628,0.624568
3,0.5195,1.432552,0.6898,0.690251,0.6898,0.686758
4,0.4263,1.53613,0.7333,0.738533,0.7333,0.731936


[I 2025-01-11 05:00:21,374] Trial 13 pruned. 


Trial 14 with params: {'learning_rate': 1.0415089321753806e-05, 'weight_decay': 0.006, 'adam_beta1': 0.93, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2385,1.181042,0.285,0.287795,0.285,0.266046
2,1.1163,1.205641,0.3441,0.351066,0.3441,0.325646


[I 2025-01-11 05:07:06,808] Trial 14 pruned. 


Trial 15 with params: {'learning_rate': 0.0002852725666675145, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.6000000000000001, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0413,1.28395,0.6317,0.630216,0.6317,0.626811
2,0.6409,1.233595,0.7487,0.754463,0.7487,0.74984
3,0.4905,1.267297,0.7829,0.790874,0.7829,0.781609
4,0.4016,1.20492,0.818,0.820461,0.818,0.81683
5,0.3403,1.214085,0.8238,0.832285,0.8238,0.825097
6,0.2873,1.245446,0.8217,0.839079,0.8217,0.822717
7,0.2413,1.251313,0.8364,0.840486,0.8364,0.834936
8,0.203,1.23888,0.8377,0.841076,0.8377,0.835963


[I 2025-01-11 05:34:20,953] Trial 15 pruned. 


Trial 16 with params: {'learning_rate': 0.0002187116642503092, 'weight_decay': 0.01, 'adam_beta1': 0.92, 'lambda_param': 0.4, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2971,1.335056,0.6248,0.625254,0.6248,0.622233
2,0.8165,1.219661,0.723,0.728015,0.723,0.722543
3,0.6266,1.188132,0.7708,0.781029,0.7708,0.770099
4,0.5126,1.151904,0.8056,0.806604,0.8056,0.804274


[I 2025-01-11 05:47:46,286] Trial 16 pruned. 


Trial 17 with params: {'learning_rate': 6.214481745658709e-06, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3119,1.191361,0.2294,0.214232,0.2294,0.200955
2,1.2033,1.234918,0.2881,0.293524,0.2881,0.260904
3,1.145,1.243268,0.3329,0.334158,0.3329,0.317676
4,1.1106,1.258648,0.3486,0.352409,0.3486,0.333975
5,1.0868,1.268524,0.3655,0.366542,0.3655,0.350567
6,1.0614,1.270624,0.3884,0.381847,0.3884,0.374138
7,1.0413,1.271834,0.4017,0.397293,0.4017,0.389834
8,1.0227,1.294009,0.4003,0.398667,0.4003,0.380884


[I 2025-01-11 06:15:17,959] Trial 17 pruned. 


Trial 18 with params: {'learning_rate': 0.0003091050760493089, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.30000000000000004, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3821,1.455485,0.6448,0.645052,0.6448,0.64096
2,0.8551,1.331617,0.7637,0.763144,0.7637,0.761985
3,0.6524,1.312134,0.7941,0.799682,0.7941,0.793799
4,0.5303,1.237548,0.8124,0.820008,0.8124,0.811073
5,0.4474,1.251306,0.8237,0.83791,0.8237,0.826107
6,0.3693,1.269194,0.83,0.850854,0.83,0.832114
7,0.3092,1.220075,0.8431,0.846144,0.8431,0.842157
8,0.2549,1.23347,0.84,0.843833,0.84,0.837915


[I 2025-01-11 06:42:24,613] Trial 18 pruned. 


Trial 19 with params: {'learning_rate': 0.0002947545082361056, 'weight_decay': 0.002, 'adam_beta1': 0.9500000000000001, 'lambda_param': 1.0, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7155,1.524979,0.6149,0.608866,0.6149,0.607087
2,0.424,1.435207,0.7364,0.740865,0.7364,0.736614
3,0.3115,1.59425,0.781,0.79263,0.781,0.778794
4,0.2433,1.576003,0.816,0.818705,0.816,0.815122
5,0.1987,1.609864,0.8177,0.83187,0.8177,0.820314
6,0.1638,1.655583,0.8258,0.844658,0.8258,0.827353
7,0.1336,1.704596,0.8471,0.850965,0.8471,0.845868
8,0.1108,1.639132,0.8402,0.843759,0.8402,0.838149


[I 2025-01-11 07:09:18,343] Trial 19 pruned. 


Trial 20 with params: {'learning_rate': 5.405290206151555e-05, 'weight_decay': 0.008, 'adam_beta1': 0.98, 'lambda_param': 0.2, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7944,1.574666,0.4293,0.431993,0.4293,0.420695
2,1.4265,1.415015,0.5297,0.529846,0.5297,0.527584


[I 2025-01-11 07:16:07,404] Trial 20 pruned. 


Trial 21 with params: {'learning_rate': 2.2633022690645107e-05, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.886,0.946618,0.3427,0.354704,0.3427,0.332316
2,0.7672,0.992865,0.4138,0.42179,0.4138,0.406242
3,0.6607,1.029289,0.4652,0.459924,0.4652,0.453186
4,0.5878,1.086193,0.5124,0.507281,0.5124,0.49936


[I 2025-01-11 07:29:41,604] Trial 21 pruned. 


Trial 22 with params: {'learning_rate': 6.933432092218376e-05, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9172,1.10726,0.46,0.464641,0.46,0.449178
2,0.6601,1.161521,0.5793,0.585803,0.5793,0.575174


[I 2025-01-11 07:36:24,259] Trial 22 pruned. 


Trial 23 with params: {'learning_rate': 3.9470832997428617e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3721,1.379144,0.4066,0.400644,0.4066,0.395556
2,1.1381,1.358323,0.4923,0.494324,0.4923,0.488143
3,0.9899,1.331263,0.5476,0.54646,0.5476,0.53895
4,0.8747,1.325292,0.5935,0.585784,0.5935,0.584996
5,0.7883,1.299066,0.6173,0.627335,0.6173,0.614917
6,0.7158,1.307998,0.6516,0.654654,0.6516,0.647616
7,0.6579,1.297888,0.6663,0.672733,0.6663,0.665435
8,0.6099,1.356095,0.6735,0.677372,0.6735,0.666561


[I 2025-01-11 08:03:52,827] Trial 23 pruned. 


Trial 24 with params: {'learning_rate': 0.0003793850102112891, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6916,1.278946,0.6492,0.647749,0.6492,0.645275
2,0.4165,1.360292,0.7621,0.765113,0.7621,0.761884
3,0.3177,1.374305,0.7854,0.798105,0.7854,0.784177
4,0.2589,1.423625,0.8159,0.819163,0.8159,0.814824
5,0.2195,1.404794,0.8272,0.837718,0.8272,0.829139
6,0.1842,1.416215,0.8301,0.843833,0.8301,0.830604
7,0.1575,1.461539,0.8402,0.843364,0.8402,0.83885
8,0.1333,1.417961,0.8423,0.846273,0.8423,0.840867
9,0.1138,1.397598,0.86,0.861931,0.86,0.860237
10,0.0982,1.476766,0.8553,0.860966,0.8553,0.856152


[I 2025-01-11 08:54:49,191] Trial 24 finished with value: 0.8671829944838988 and parameters: {'learning_rate': 0.0003793850102112891, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 24 with value: 0.8671829944838988.


Trial 25 with params: {'learning_rate': 0.0004228288513803348, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7024,1.256235,0.6424,0.639874,0.6424,0.634701
2,0.423,1.292933,0.7521,0.755702,0.7521,0.750576
3,0.3196,1.367278,0.7844,0.795126,0.7844,0.781652
4,0.2631,1.393158,0.8196,0.82517,0.8196,0.81908
5,0.2225,1.411397,0.8297,0.838677,0.8297,0.831349
6,0.189,1.432035,0.8296,0.845622,0.8296,0.830122
7,0.16,1.412469,0.8457,0.849717,0.8457,0.844494
8,0.1354,1.436679,0.846,0.852594,0.846,0.84506
9,0.1152,1.414883,0.8594,0.862725,0.8594,0.859981
10,0.099,1.443976,0.8657,0.867906,0.8657,0.865846


[I 2025-01-11 09:45:49,132] Trial 25 finished with value: 0.8738118138794979 and parameters: {'learning_rate': 0.0004228288513803348, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.5}. Best is trial 25 with value: 0.8738118138794979.


Trial 26 with params: {'learning_rate': 0.00043281328928984495, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7992,1.253574,0.6525,0.648017,0.6525,0.64667
2,0.4866,1.316876,0.7567,0.755435,0.7567,0.754533
3,0.3735,1.349306,0.7888,0.798435,0.7888,0.78831
4,0.3131,1.425681,0.8065,0.810942,0.8065,0.804619


[I 2025-01-11 09:59:47,928] Trial 26 pruned. 


Trial 27 with params: {'learning_rate': 0.00037534084529860086, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6034,1.284188,0.6387,0.639859,0.6387,0.631366
2,0.3538,1.362511,0.7584,0.761668,0.7584,0.758155
3,0.263,1.41278,0.7841,0.792442,0.7841,0.782854
4,0.2135,1.400066,0.8143,0.818748,0.8143,0.813071
5,0.1774,1.409125,0.8235,0.83568,0.8235,0.825741
6,0.1474,1.459471,0.8255,0.841835,0.8255,0.826188
7,0.123,1.450803,0.8414,0.844553,0.8414,0.840299
8,0.1011,1.465175,0.8386,0.840102,0.8386,0.836257


[I 2025-01-11 10:27:50,576] Trial 27 pruned. 


Trial 28 with params: {'learning_rate': 9.219233712970501e-05, 'weight_decay': 0.005, 'adam_beta1': 0.92, 'lambda_param': 0.8, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0851,1.38016,0.4981,0.497638,0.4981,0.492101
2,0.7433,1.405729,0.6397,0.639183,0.6397,0.635763


[I 2025-01-11 10:34:53,587] Trial 28 pruned. 


Trial 29 with params: {'learning_rate': 1.0849602255221522e-06, 'weight_decay': 0.006, 'adam_beta1': 0.96, 'lambda_param': 0.2, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.1797,2.011833,0.1034,0.078039,0.1034,0.027814
2,2.154,1.99178,0.1312,0.124741,0.1312,0.075844


[I 2025-01-11 10:41:54,932] Trial 29 pruned. 


Trial 30 with params: {'learning_rate': 0.00021216455415222076, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3135,1.370996,0.6083,0.611466,0.6083,0.601512
2,0.819,1.265506,0.7398,0.7414,0.7398,0.739208
3,0.6215,1.243058,0.7841,0.792093,0.7841,0.784581
4,0.5059,1.16164,0.8125,0.819319,0.8125,0.812381
5,0.425,1.182377,0.8204,0.831541,0.8204,0.822571
6,0.3499,1.220746,0.8167,0.838126,0.8167,0.818091
7,0.291,1.198037,0.8353,0.83856,0.8353,0.83416
8,0.2428,1.195996,0.8336,0.836788,0.8336,0.831667
9,0.2047,1.180772,0.8459,0.849816,0.8459,0.846718
10,0.1796,1.16846,0.8473,0.850702,0.8473,0.847019


[I 2025-01-11 11:34:01,772] Trial 30 finished with value: 0.8496167270684216 and parameters: {'learning_rate': 0.00021216455415222076, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.4, 'temperature': 4.0}. Best is trial 25 with value: 0.8738118138794979.


Trial 31 with params: {'learning_rate': 0.00027098418702840296, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.2, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.418,1.229995,0.6477,0.64742,0.6477,0.642154
2,0.8765,1.058219,0.7506,0.752179,0.7506,0.750067
3,0.6721,1.014047,0.7884,0.793075,0.7884,0.787863
4,0.5506,0.9534,0.8088,0.811804,0.8088,0.807575
5,0.4574,0.921113,0.8245,0.835676,0.8245,0.826792
6,0.375,0.980552,0.8143,0.835954,0.8143,0.815514
7,0.3057,0.940524,0.8345,0.840773,0.8345,0.834334
8,0.2467,0.922114,0.8423,0.844498,0.8423,0.841316
9,0.2033,0.910348,0.8437,0.848333,0.8437,0.844511
10,0.172,0.895297,0.8514,0.854914,0.8514,0.852074


[I 2025-01-11 12:26:04,330] Trial 31 finished with value: 0.8513744443875344 and parameters: {'learning_rate': 0.00027098418702840296, 'weight_decay': 0.004, 'adam_beta1': 0.9, 'lambda_param': 0.2, 'temperature': 4.5}. Best is trial 25 with value: 0.8738118138794979.


Trial 32 with params: {'learning_rate': 0.0003799395139850002, 'weight_decay': 0.005, 'adam_beta1': 0.91, 'lambda_param': 0.1, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4512,1.150584,0.6389,0.646061,0.6389,0.638284
2,0.8929,0.919537,0.7545,0.755443,0.7545,0.752587
3,0.6783,0.826226,0.8015,0.807867,0.8015,0.801384
4,0.5557,0.773081,0.8202,0.823981,0.8202,0.819443
5,0.4625,0.76212,0.8247,0.835271,0.8247,0.82672
6,0.3796,0.772061,0.8284,0.84604,0.8284,0.82958
7,0.3087,0.755879,0.8393,0.843393,0.8393,0.837367
8,0.2484,0.750461,0.8462,0.849439,0.8462,0.844791
9,0.1952,0.712601,0.8557,0.859704,0.8557,0.856292
10,0.1605,0.710891,0.8556,0.85969,0.8556,0.855718


[I 2025-01-11 13:18:27,869] Trial 32 finished with value: 0.8628454075424614 and parameters: {'learning_rate': 0.0003799395139850002, 'weight_decay': 0.005, 'adam_beta1': 0.91, 'lambda_param': 0.1, 'temperature': 5.0}. Best is trial 25 with value: 0.8738118138794979.


Trial 33 with params: {'learning_rate': 0.0004990812311355335, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4798,1.009539,0.6331,0.6392,0.6331,0.633066
2,0.9211,0.760648,0.7316,0.736677,0.7316,0.730026
3,0.695,0.630021,0.7795,0.788684,0.7795,0.781613
4,0.5557,0.574791,0.8086,0.810611,0.8086,0.806002


[I 2025-01-11 13:32:32,202] Trial 33 pruned. 


Trial 34 with params: {'learning_rate': 9.29708981207645e-05, 'weight_decay': 0.005, 'adam_beta1': 0.93, 'lambda_param': 0.1, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7459,1.426636,0.5012,0.502762,0.5012,0.495378
2,1.2413,1.17561,0.6362,0.641351,0.6362,0.636184
3,0.9789,1.017571,0.704,0.709359,0.704,0.704538
4,0.8039,0.956294,0.7388,0.736865,0.7388,0.735903


[I 2025-01-11 13:46:28,002] Trial 34 pruned. 


Trial 35 with params: {'learning_rate': 4.572836396686645e-05, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.2, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.785,1.570255,0.436,0.432786,0.436,0.43033
2,1.4305,1.417548,0.5295,0.527545,0.5295,0.525478
3,1.2316,1.311829,0.5891,0.589507,0.5891,0.584177
4,1.0792,1.252782,0.6368,0.633439,0.6368,0.632645


[I 2025-01-11 14:00:11,717] Trial 35 pruned. 


Trial 36 with params: {'learning_rate': 0.0003859356054058685, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9065,1.28207,0.6468,0.644033,0.6468,0.642437
2,0.5635,1.262002,0.7636,0.763579,0.7636,0.762046
3,0.4365,1.323433,0.7836,0.793913,0.7836,0.782232
4,0.3637,1.288074,0.8132,0.816479,0.8132,0.811103
5,0.3072,1.302064,0.8268,0.836424,0.8268,0.828561
6,0.2622,1.322766,0.8219,0.843482,0.8219,0.823984
7,0.2248,1.350336,0.8328,0.8394,0.8328,0.831645
8,0.193,1.325045,0.8455,0.848474,0.8455,0.844756
9,0.1652,1.353049,0.8555,0.858503,0.8555,0.855899
10,0.1437,1.299347,0.86,0.86288,0.86,0.860351


[I 2025-01-11 14:54:26,109] Trial 36 finished with value: 0.8709704807575521 and parameters: {'learning_rate': 0.0003859356054058685, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.7000000000000001, 'temperature': 6.5}. Best is trial 25 with value: 0.8738118138794979.


Trial 37 with params: {'learning_rate': 0.00019901387264934772, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6902,1.362734,0.6041,0.599677,0.6041,0.592705
2,0.4063,1.348853,0.7208,0.725907,0.7208,0.720539
3,0.299,1.428963,0.7681,0.776759,0.7681,0.766397
4,0.2363,1.586788,0.8014,0.805311,0.8014,0.799859


[I 2025-01-11 15:08:17,624] Trial 37 pruned. 


Trial 38 with params: {'learning_rate': 0.00023856520750016358, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 0.6000000000000001, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0587,1.3152,0.6239,0.623183,0.6239,0.616556
2,0.6631,1.258037,0.7361,0.743203,0.7361,0.736224
3,0.5079,1.252994,0.7806,0.786263,0.7806,0.779064
4,0.4169,1.261936,0.8085,0.811596,0.8085,0.806732


[I 2025-01-11 15:22:12,424] Trial 38 pruned. 


Trial 39 with params: {'learning_rate': 0.00045696384172056527, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5716,1.181671,0.6232,0.618828,0.6232,0.609904
2,0.3426,1.265421,0.7483,0.753951,0.7483,0.747398
3,0.2534,1.370171,0.779,0.791682,0.779,0.776577
4,0.2055,1.365504,0.8187,0.822322,0.8187,0.818145
5,0.172,1.394804,0.8242,0.837193,0.8242,0.826127
6,0.1438,1.440999,0.8206,0.842226,0.8206,0.821904
7,0.1211,1.434578,0.841,0.846259,0.841,0.839607
8,0.1008,1.404397,0.8447,0.848553,0.8447,0.843648
9,0.085,1.415862,0.8579,0.861459,0.8579,0.858377
10,0.0703,1.454174,0.8674,0.871149,0.8674,0.86803


[I 2025-01-11 16:14:46,446] Trial 39 finished with value: 0.8776587205453099 and parameters: {'learning_rate': 0.00045696384172056527, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 39 with value: 0.8776587205453099.


Trial 40 with params: {'learning_rate': 0.00046283884311314297, 'weight_decay': 0.003, 'adam_beta1': 0.91, 'lambda_param': 0.9, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6955,1.284767,0.6617,0.663566,0.6617,0.661401
2,0.4193,1.398658,0.7439,0.746216,0.7439,0.741758
3,0.325,1.33933,0.7854,0.792931,0.7854,0.783726
4,0.2692,1.364603,0.8137,0.819035,0.8137,0.813588
5,0.2304,1.426043,0.8275,0.836265,0.8275,0.829261
6,0.1924,1.417498,0.8209,0.848351,0.8209,0.822798
7,0.1655,1.450359,0.8432,0.846801,0.8432,0.841714
8,0.1401,1.405748,0.8432,0.846986,0.8432,0.841676


[I 2025-01-11 16:43:00,887] Trial 40 pruned. 


Trial 41 with params: {'learning_rate': 0.0004492853036662864, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7894,1.288988,0.6631,0.656524,0.6631,0.658261
2,0.4792,1.349342,0.7707,0.769506,0.7707,0.768591
3,0.3694,1.374668,0.8008,0.80688,0.8008,0.800345
4,0.3075,1.3504,0.8276,0.830897,0.8276,0.82654
5,0.2623,1.348997,0.8271,0.839663,0.8271,0.829198
6,0.2226,1.370739,0.822,0.849394,0.822,0.82359
7,0.1925,1.381156,0.8478,0.852191,0.8478,0.847091
8,0.1645,1.376281,0.8507,0.853034,0.8507,0.849287
9,0.1413,1.381923,0.8612,0.863807,0.8612,0.861553
10,0.1233,1.405284,0.8722,0.875284,0.8722,0.872568


[I 2025-01-11 17:35:06,242] Trial 41 finished with value: 0.8765415304329757 and parameters: {'learning_rate': 0.0004492853036662864, 'weight_decay': 0.001, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.5}. Best is trial 39 with value: 0.8776587205453099.


Trial 42 with params: {'learning_rate': 0.0002463607584658398, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7278,1.253034,0.6243,0.63262,0.6243,0.62135
2,0.4379,1.238319,0.7465,0.749025,0.7465,0.745394
3,0.3257,1.316248,0.79,0.798487,0.79,0.789647
4,0.2623,1.345136,0.8163,0.817374,0.8163,0.814526
5,0.2199,1.349074,0.8196,0.830568,0.8196,0.821419
6,0.1819,1.406902,0.8173,0.839249,0.8173,0.818564
7,0.154,1.371623,0.8396,0.84464,0.8396,0.838614
8,0.1298,1.382133,0.8422,0.842194,0.8422,0.840405


[I 2025-01-11 18:01:56,861] Trial 42 pruned. 


Trial 43 with params: {'learning_rate': 0.00046961970042811346, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5877,1.215994,0.6518,0.648491,0.6518,0.648254
2,0.341,1.349505,0.7496,0.75139,0.7496,0.74811
3,0.2559,1.390585,0.7884,0.796989,0.7884,0.785812
4,0.209,1.467788,0.8185,0.82271,0.8185,0.816978
5,0.1751,1.471954,0.8359,0.847928,0.8359,0.838432
6,0.1452,1.518123,0.8291,0.849757,0.8291,0.83087
7,0.1212,1.515039,0.8541,0.857532,0.8541,0.853624
8,0.1009,1.556748,0.8589,0.860569,0.8589,0.858167
9,0.0848,1.49293,0.8677,0.870548,0.8677,0.867616
10,0.0709,1.561019,0.8735,0.875499,0.8735,0.873578


[I 2025-01-11 18:52:28,780] Trial 43 finished with value: 0.8858248729707295 and parameters: {'learning_rate': 0.00046961970042811346, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 44 with params: {'learning_rate': 0.00025519265205474733, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.98, 'lambda_param': 0.8, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9013,1.257122,0.5707,0.566639,0.5707,0.562672
2,0.554,1.269668,0.7035,0.711764,0.7035,0.701764


[I 2025-01-11 18:59:12,689] Trial 44 pruned. 


Trial 45 with params: {'learning_rate': 0.0003263037810993442, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.6000000000000001, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0327,1.334026,0.6375,0.638157,0.6375,0.631509
2,0.6485,1.275128,0.7496,0.754803,0.7496,0.750375
3,0.5024,1.318381,0.7781,0.789375,0.7781,0.776437
4,0.4133,1.297801,0.8199,0.821668,0.8199,0.818518
5,0.3497,1.289945,0.8286,0.841296,0.8286,0.831052
6,0.2958,1.321932,0.8286,0.846048,0.8286,0.829323
7,0.2507,1.310881,0.8479,0.849041,0.8479,0.846796
8,0.2124,1.29381,0.841,0.843695,0.841,0.839662


[I 2025-01-11 19:25:58,616] Trial 45 pruned. 


Trial 46 with params: {'learning_rate': 0.00046060721841702554, 'weight_decay': 0.0, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.588,1.243928,0.646,0.653111,0.646,0.643215
2,0.3392,1.397877,0.762,0.760694,0.762,0.759044
3,0.2577,1.355182,0.7886,0.798067,0.7886,0.7862
4,0.2109,1.447465,0.8118,0.817687,0.8118,0.809745


[I 2025-01-11 19:39:24,234] Trial 46 pruned. 


Trial 47 with params: {'learning_rate': 1.595459785536451e-05, 'weight_decay': 0.002, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.0, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2173,1.797093,0.3255,0.327294,0.3255,0.311943
2,1.8492,1.609195,0.4024,0.400314,0.4024,0.395336
3,1.6852,1.507228,0.4422,0.441575,0.4422,0.434329
4,1.5696,1.420815,0.478,0.471337,0.478,0.471556
5,1.4922,1.382496,0.494,0.495126,0.494,0.487831
6,1.415,1.331288,0.5166,0.510293,0.5166,0.509079
7,1.3578,1.290264,0.5297,0.53309,0.5297,0.528013
8,1.3034,1.274297,0.5435,0.534216,0.5435,0.533715


[I 2025-01-11 20:06:19,216] Trial 47 pruned. 


Trial 48 with params: {'learning_rate': 0.0004625348911823293, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7015,1.238532,0.6451,0.649318,0.6451,0.640392
2,0.4304,1.330368,0.7484,0.748319,0.7484,0.746099
3,0.3299,1.386025,0.7882,0.795579,0.7882,0.786497
4,0.2683,1.436722,0.8163,0.820853,0.8163,0.815899
5,0.2289,1.422635,0.8325,0.845305,0.8325,0.835084
6,0.1929,1.422383,0.8301,0.85185,0.8301,0.831845
7,0.1648,1.530168,0.8497,0.853484,0.8497,0.849078
8,0.1398,1.47831,0.8534,0.855175,0.8534,0.852631
9,0.1189,1.451202,0.861,0.864914,0.861,0.861793
10,0.102,1.45424,0.8659,0.869839,0.8659,0.865782


[I 2025-01-11 20:56:45,972] Trial 48 finished with value: 0.8722818934913524 and parameters: {'learning_rate': 0.0004625348911823293, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 49 with params: {'learning_rate': 0.00035541307282489136, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6832,1.375139,0.6546,0.655136,0.6546,0.652668
2,0.3853,1.537217,0.7452,0.749158,0.7452,0.744259
3,0.2856,1.640092,0.7857,0.797382,0.7857,0.784355
4,0.2308,1.722376,0.8236,0.825153,0.8236,0.822126
5,0.193,1.618815,0.8209,0.840908,0.8209,0.824038
6,0.1568,1.633891,0.8114,0.845589,0.8114,0.813613
7,0.1304,1.647384,0.8471,0.850539,0.8471,0.846718
8,0.106,1.71509,0.8439,0.846968,0.8439,0.842646
9,0.088,1.647469,0.8599,0.864801,0.8599,0.860777
10,0.073,1.705866,0.8589,0.863402,0.8589,0.859411


[I 2025-01-11 21:47:31,139] Trial 49 finished with value: 0.8652733633674432 and parameters: {'learning_rate': 0.00035541307282489136, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 4.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 50 with params: {'learning_rate': 1.4906411453668724e-05, 'weight_decay': 0.006, 'adam_beta1': 0.99, 'lambda_param': 0.4, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9229,1.819587,0.3027,0.304969,0.3027,0.283734
2,1.6639,1.762099,0.3716,0.368427,0.3716,0.359411
3,1.5466,1.729057,0.4214,0.418347,0.4214,0.415116
4,1.4549,1.729645,0.4535,0.448427,0.4535,0.444096


[I 2025-01-11 22:01:03,774] Trial 50 pruned. 


Trial 51 with params: {'learning_rate': 0.0004531114112911217, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5725,1.144344,0.6449,0.641533,0.6449,0.639098
2,0.3373,1.262625,0.7615,0.762877,0.7615,0.759994
3,0.2527,1.313183,0.7833,0.791651,0.7833,0.78238
4,0.2062,1.443748,0.8264,0.828571,0.8264,0.824991
5,0.1726,1.399581,0.8238,0.838107,0.8238,0.825815
6,0.1445,1.427893,0.832,0.848881,0.832,0.832978
7,0.1218,1.430148,0.8503,0.854062,0.8503,0.849392
8,0.1006,1.42511,0.8498,0.851575,0.8498,0.848346
9,0.0845,1.422237,0.8594,0.862478,0.8594,0.85987
10,0.0705,1.449125,0.8708,0.872797,0.8708,0.870968


[I 2025-01-11 22:51:59,998] Trial 51 finished with value: 0.8743060530203361 and parameters: {'learning_rate': 0.0004531114112911217, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 52 with params: {'learning_rate': 0.0003831497817546985, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7029,1.261312,0.6501,0.649399,0.6501,0.647205
2,0.4198,1.315597,0.7451,0.746442,0.7451,0.74413
3,0.315,1.347724,0.7951,0.804109,0.7951,0.794107
4,0.258,1.396214,0.8157,0.818347,0.8157,0.814516
5,0.2172,1.410561,0.8325,0.839975,0.8325,0.833942
6,0.1839,1.429918,0.8247,0.846385,0.8247,0.825525
7,0.1565,1.42642,0.8493,0.853378,0.8493,0.848905
8,0.1315,1.446174,0.8477,0.849882,0.8477,0.845879
9,0.1128,1.41973,0.8619,0.86492,0.8619,0.862228
10,0.0967,1.464174,0.8621,0.86575,0.8621,0.86208


[I 2025-01-11 23:42:39,680] Trial 52 finished with value: 0.872400461284441 and parameters: {'learning_rate': 0.0003831497817546985, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 53 with params: {'learning_rate': 0.00045056610967625073, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5664,1.187231,0.6505,0.657811,0.6505,0.649783
2,0.3278,1.306952,0.758,0.762057,0.758,0.757132
3,0.2464,1.322633,0.7883,0.799035,0.7883,0.786175
4,0.2022,1.402987,0.8175,0.821891,0.8175,0.815779
5,0.1689,1.391924,0.8267,0.839976,0.8267,0.829091
6,0.1408,1.388519,0.8254,0.843675,0.8254,0.82563
7,0.1186,1.432646,0.8517,0.855095,0.8517,0.851612
8,0.0986,1.419662,0.8493,0.853423,0.8493,0.847043
9,0.0834,1.41943,0.8623,0.865443,0.8623,0.862749
10,0.0697,1.510705,0.87,0.872533,0.87,0.869931


[I 2025-01-12 00:33:40,285] Trial 53 finished with value: 0.8751704899821069 and parameters: {'learning_rate': 0.00045056610967625073, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 54 with params: {'learning_rate': 0.0002478748406512204, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7152,1.245915,0.6377,0.634879,0.6377,0.632066
2,0.4216,1.260237,0.7406,0.745826,0.7406,0.740171
3,0.3158,1.324512,0.7739,0.785218,0.7739,0.772164
4,0.2569,1.366827,0.8074,0.812111,0.8074,0.80644
5,0.2164,1.36514,0.8207,0.835425,0.8207,0.823268
6,0.1792,1.388149,0.8182,0.838016,0.8182,0.819217
7,0.1516,1.412604,0.8385,0.841388,0.8385,0.837079
8,0.1268,1.369675,0.8361,0.840232,0.8361,0.83455
9,0.1087,1.426545,0.8474,0.850248,0.8474,0.847944
10,0.0941,1.406598,0.8484,0.853185,0.8484,0.848647


[I 2025-01-12 01:25:18,787] Trial 54 finished with value: 0.856482541042779 and parameters: {'learning_rate': 0.0002478748406512204, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 55 with params: {'learning_rate': 0.0002777061852751291, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7154,1.265537,0.6432,0.644334,0.6432,0.641572
2,0.4226,1.233998,0.7416,0.74086,0.7416,0.740116
3,0.3168,1.324529,0.7884,0.796119,0.7884,0.787129
4,0.2568,1.349289,0.822,0.826216,0.822,0.821448
5,0.2118,1.351048,0.8158,0.837939,0.8158,0.820295
6,0.1777,1.411008,0.8155,0.843515,0.8155,0.815654
7,0.1502,1.362862,0.8436,0.849623,0.8436,0.843003
8,0.1254,1.380464,0.8452,0.849028,0.8452,0.844074
9,0.107,1.386908,0.8525,0.856553,0.8525,0.852675
10,0.0928,1.381501,0.8615,0.863874,0.8615,0.861856


[I 2025-01-12 02:15:50,711] Trial 55 finished with value: 0.8683141389636606 and parameters: {'learning_rate': 0.0002777061852751291, 'weight_decay': 0.001, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 56 with params: {'learning_rate': 0.00022602735076646376, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6203,1.183349,0.6213,0.622412,0.6213,0.612671
2,0.3605,1.287261,0.7385,0.742721,0.7385,0.737857
3,0.259,1.354306,0.7841,0.791745,0.7841,0.782518
4,0.2064,1.397598,0.8062,0.812286,0.8062,0.804988
5,0.1691,1.389987,0.8127,0.829802,0.8127,0.815004
6,0.1367,1.416513,0.8245,0.844825,0.8245,0.825883
7,0.113,1.422673,0.837,0.840116,0.837,0.83596
8,0.0917,1.453497,0.8317,0.838889,0.8317,0.829059
9,0.0765,1.394009,0.8509,0.852518,0.8509,0.850864
10,0.065,1.440511,0.8542,0.85772,0.8542,0.854478


[I 2025-01-12 03:06:12,748] Trial 56 finished with value: 0.8597978433223968 and parameters: {'learning_rate': 0.00022602735076646376, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 57 with params: {'learning_rate': 5.787570113829524e-06, 'weight_decay': 0.0, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.2402,1.142492,0.2196,0.219743,0.2196,0.192182
2,1.1393,1.24001,0.2782,0.286726,0.2782,0.250406
3,1.0806,1.269627,0.3208,0.315597,0.3208,0.30457
4,1.0476,1.304913,0.3335,0.33258,0.3335,0.317021


[I 2025-01-12 03:19:35,934] Trial 57 pruned. 


Trial 58 with params: {'learning_rate': 1.231563481729302e-06, 'weight_decay': 0.004, 'adam_beta1': 0.98, 'lambda_param': 0.1, 'temperature': 3.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3446,2.17429,0.1056,0.078202,0.1056,0.032554
2,2.3153,2.149431,0.1432,0.105174,0.1432,0.089903


[I 2025-01-12 03:26:17,998] Trial 58 pruned. 


Trial 59 with params: {'learning_rate': 0.0004642383073750495, 'weight_decay': 0.001, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5691,1.233639,0.6543,0.651634,0.6543,0.647331
2,0.3358,1.280628,0.7544,0.758939,0.7544,0.754273
3,0.2522,1.385224,0.7938,0.802305,0.7938,0.791473
4,0.2064,1.434259,0.8149,0.81882,0.8149,0.813452
5,0.1717,1.402337,0.8278,0.842897,0.8278,0.830677
6,0.1429,1.446408,0.8235,0.843246,0.8235,0.824278
7,0.1208,1.444575,0.8486,0.852122,0.8486,0.847699
8,0.1002,1.398068,0.8411,0.845437,0.8411,0.839439


[I 2025-01-12 03:53:04,315] Trial 59 pruned. 


Trial 60 with params: {'learning_rate': 2.0114989721951363e-06, 'weight_decay': 0.01, 'adam_beta1': 0.92, 'lambda_param': 0.5, 'temperature': 2.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.9738,1.818494,0.1301,0.159924,0.1301,0.077898
2,1.9324,1.799802,0.1788,0.169433,0.1788,0.126256
3,1.8824,1.786375,0.2096,0.208945,0.2096,0.169384
4,1.8157,1.767383,0.2496,0.239191,0.2496,0.206236
5,1.7505,1.763251,0.2613,0.263678,0.2613,0.223109
6,1.7008,1.768387,0.2856,0.284008,0.2856,0.255128
7,1.672,1.774733,0.3034,0.300129,0.3034,0.279214
8,1.6555,1.788482,0.3052,0.305785,0.3052,0.281433


[I 2025-01-12 04:19:50,784] Trial 60 pruned. 


Trial 61 with params: {'learning_rate': 0.00020547175772221783, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8841,1.277411,0.5992,0.60032,0.5992,0.59339
2,0.5432,1.373044,0.7239,0.729369,0.7239,0.722451
3,0.4132,1.306624,0.7748,0.778992,0.7748,0.773278
4,0.3326,1.357619,0.8089,0.81136,0.8089,0.807764
5,0.2745,1.344297,0.8151,0.824061,0.8151,0.816564
6,0.2294,1.419493,0.7948,0.829683,0.7948,0.796912
7,0.1939,1.429683,0.8264,0.831241,0.8264,0.824633
8,0.1633,1.423889,0.8263,0.830176,0.8263,0.824704
9,0.1384,1.383594,0.8427,0.846476,0.8427,0.843471
10,0.1211,1.386492,0.8437,0.849682,0.8437,0.844546


[I 2025-01-12 05:10:05,955] Trial 61 finished with value: 0.8522741761332213 and parameters: {'learning_rate': 0.00020547175772221783, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 62 with params: {'learning_rate': 0.0003484792854068305, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6093,1.277689,0.6398,0.63567,0.6398,0.631859
2,0.3471,1.315071,0.7595,0.76258,0.7595,0.75816
3,0.2554,1.34734,0.7817,0.794574,0.7817,0.779181
4,0.2057,1.439406,0.8199,0.823123,0.8199,0.818785
5,0.1714,1.440905,0.8356,0.844743,0.8356,0.837133
6,0.1407,1.457467,0.8242,0.84875,0.8242,0.82562
7,0.1177,1.440848,0.8428,0.846955,0.8428,0.841749
8,0.0965,1.484125,0.8531,0.854205,0.8531,0.851812
9,0.0811,1.515239,0.8565,0.859128,0.8565,0.856548
10,0.0678,1.486511,0.8646,0.867725,0.8646,0.864778


[I 2025-01-12 06:00:35,772] Trial 62 finished with value: 0.8670797648612293 and parameters: {'learning_rate': 0.0003484792854068305, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 63 with params: {'learning_rate': 0.0004029175776852982, 'weight_decay': 0.003, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5714,1.220726,0.643,0.644368,0.643,0.638497
2,0.3338,1.327,0.7682,0.772393,0.7682,0.76805
3,0.2476,1.326104,0.7937,0.80051,0.7937,0.791482
4,0.2024,1.406905,0.8141,0.818642,0.8141,0.81253
5,0.1694,1.365042,0.8343,0.844979,0.8343,0.836435
6,0.1404,1.417827,0.822,0.844553,0.822,0.823055
7,0.1183,1.409877,0.8479,0.853095,0.8479,0.8472
8,0.0984,1.430831,0.8463,0.848275,0.8463,0.844538
9,0.0817,1.434605,0.8597,0.863071,0.8597,0.859821
10,0.0681,1.490489,0.8721,0.873145,0.8721,0.871998


[I 2025-01-12 06:52:12,583] Trial 63 finished with value: 0.8777763015151621 and parameters: {'learning_rate': 0.0004029175776852982, 'weight_decay': 0.003, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 64 with params: {'learning_rate': 6.864904704881755e-06, 'weight_decay': 0.002, 'adam_beta1': 0.99, 'lambda_param': 1.0, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.11,1.019336,0.2261,0.21996,0.2261,0.187894
2,1.0261,1.102088,0.2889,0.300043,0.2889,0.257785


[I 2025-01-12 06:59:07,496] Trial 64 pruned. 


Trial 65 with params: {'learning_rate': 0.0004363816115314198, 'weight_decay': 0.004, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6905,1.241322,0.6494,0.651085,0.6494,0.64741
2,0.4153,1.290467,0.7488,0.752865,0.7488,0.748572
3,0.3178,1.384224,0.7857,0.79175,0.7857,0.784493
4,0.2628,1.337967,0.8147,0.815933,0.8147,0.812813


[I 2025-01-12 07:12:52,478] Trial 65 pruned. 


Trial 66 with params: {'learning_rate': 0.00044959317353520316, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5712,1.212543,0.6584,0.65005,0.6584,0.650909
2,0.3344,1.258147,0.7585,0.763974,0.7585,0.757953
3,0.2496,1.33466,0.7856,0.795855,0.7856,0.784262
4,0.2044,1.40945,0.819,0.819677,0.819,0.817068
5,0.1688,1.380033,0.8296,0.84267,0.8296,0.832439
6,0.1413,1.411902,0.8313,0.853928,0.8313,0.832874
7,0.1198,1.432328,0.8398,0.843922,0.8398,0.838293
8,0.1005,1.429365,0.8517,0.854593,0.8517,0.850597
9,0.0835,1.428589,0.8602,0.86296,0.8602,0.86022
10,0.0707,1.425039,0.862,0.86643,0.862,0.862153


[I 2025-01-12 08:03:52,185] Trial 66 finished with value: 0.8759907173919406 and parameters: {'learning_rate': 0.00044959317353520316, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 67 with params: {'learning_rate': 0.0003956722053375569, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7151,1.270366,0.6136,0.622422,0.6136,0.606748
2,0.4317,1.279424,0.7529,0.756936,0.7529,0.7531
3,0.3241,1.377097,0.7965,0.80145,0.7965,0.79549
4,0.264,1.425211,0.8223,0.824222,0.8223,0.820914
5,0.2209,1.434808,0.8286,0.837403,0.8286,0.830154
6,0.1878,1.442715,0.827,0.848017,0.827,0.82889
7,0.1571,1.435834,0.84,0.845464,0.84,0.838882
8,0.1338,1.393293,0.8505,0.854355,0.8505,0.849698
9,0.1137,1.423122,0.8583,0.860655,0.8583,0.858624
10,0.0987,1.44241,0.8695,0.872345,0.8695,0.869997


[I 2025-01-12 08:54:50,257] Trial 67 finished with value: 0.8744338458613091 and parameters: {'learning_rate': 0.0003956722053375569, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 68 with params: {'learning_rate': 6.690323842125524e-05, 'weight_decay': 0.0, 'adam_beta1': 0.93, 'lambda_param': 0.9, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9521,1.142009,0.4485,0.442523,0.4485,0.431446
2,0.6838,1.211815,0.5746,0.575594,0.5746,0.570158
3,0.5462,1.244645,0.644,0.650213,0.644,0.639875
4,0.4585,1.324679,0.6858,0.684489,0.6858,0.683258
5,0.3985,1.285033,0.7065,0.716962,0.7065,0.707751
6,0.3494,1.330732,0.7179,0.733543,0.7179,0.716083
7,0.3081,1.377229,0.7324,0.742102,0.7324,0.732435
8,0.2739,1.410678,0.7343,0.741184,0.7343,0.729315


[I 2025-01-12 09:21:58,405] Trial 68 pruned. 


Trial 69 with params: {'learning_rate': 1.641098718467686e-05, 'weight_decay': 0.01, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.4, 'temperature': 4.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7646,1.632309,0.3224,0.322723,0.3224,0.310357
2,1.5373,1.576596,0.3956,0.39044,0.3956,0.385793
3,1.4255,1.539807,0.4339,0.433933,0.4339,0.426201
4,1.3366,1.525565,0.4772,0.470579,0.4772,0.467076
5,1.2749,1.508068,0.4846,0.481374,0.4846,0.473924
6,1.2192,1.495919,0.5107,0.506739,0.5107,0.499349
7,1.1733,1.467834,0.5264,0.526857,0.5264,0.52037
8,1.1304,1.475381,0.5386,0.530489,0.5386,0.527916


[I 2025-01-12 09:49:08,163] Trial 69 pruned. 


Trial 70 with params: {'learning_rate': 0.00030182759810243553, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.959,1.385059,0.6144,0.621084,0.6144,0.608788
2,0.5983,1.29928,0.7411,0.745286,0.7411,0.74016
3,0.4547,1.332471,0.7864,0.794503,0.7864,0.785328
4,0.3721,1.364244,0.8104,0.813837,0.8104,0.808832
5,0.3166,1.354055,0.8272,0.835394,0.8272,0.828371
6,0.2676,1.361334,0.8223,0.844657,0.8223,0.823464
7,0.2262,1.37748,0.8404,0.843523,0.8404,0.839257
8,0.1927,1.365576,0.8413,0.844824,0.8413,0.839655
9,0.1646,1.383029,0.8515,0.860234,0.8515,0.853449
10,0.1442,1.351121,0.855,0.860718,0.855,0.855991


[I 2025-01-12 10:40:10,311] Trial 70 finished with value: 0.8622894858963633 and parameters: {'learning_rate': 0.00030182759810243553, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 71 with params: {'learning_rate': 2.0955490522832793e-06, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.93, 'lambda_param': 0.5, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.7202,1.558227,0.1349,0.244938,0.1349,0.082619
2,1.6821,1.537885,0.1822,0.170344,0.1822,0.135921


[I 2025-01-12 10:46:55,530] Trial 71 pruned. 


Trial 72 with params: {'learning_rate': 0.0001855972168731866, 'weight_decay': 0.0, 'adam_beta1': 0.9400000000000001, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6822,1.173651,0.5773,0.57139,0.5773,0.566525
2,0.4128,1.368703,0.7086,0.710703,0.7086,0.706417


[I 2025-01-12 10:53:42,560] Trial 72 pruned. 


Trial 73 with params: {'learning_rate': 0.00039266426177087855, 'weight_decay': 0.001, 'adam_beta1': 0.96, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8108,1.296561,0.6309,0.622571,0.6309,0.623694
2,0.4956,1.286179,0.749,0.751653,0.749,0.747022
3,0.3768,1.331921,0.787,0.794385,0.787,0.786713
4,0.309,1.332732,0.813,0.816592,0.813,0.810671


[I 2025-01-12 11:07:13,505] Trial 73 pruned. 


Trial 74 with params: {'learning_rate': 0.0004635618184844074, 'weight_decay': 0.002, 'adam_beta1': 0.9, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5977,1.224767,0.6301,0.62189,0.6301,0.61773
2,0.356,1.363783,0.7474,0.750754,0.7474,0.746903
3,0.2671,1.392329,0.7843,0.791692,0.7843,0.784012
4,0.2196,1.445268,0.8108,0.815622,0.8108,0.809317


[I 2025-01-12 11:20:43,036] Trial 74 pruned. 


Trial 75 with params: {'learning_rate': 0.00017913124680222592, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6659,1.149636,0.5741,0.567465,0.5741,0.565611
2,0.3988,1.303488,0.7003,0.711014,0.7003,0.700918
3,0.2917,1.39273,0.7631,0.772425,0.7631,0.76285
4,0.2274,1.387176,0.804,0.80575,0.804,0.802087
5,0.1858,1.33994,0.8031,0.813083,0.8031,0.804849
6,0.1505,1.346369,0.8091,0.831412,0.8091,0.810704
7,0.1235,1.45635,0.8277,0.831613,0.8277,0.826534
8,0.1009,1.409286,0.8294,0.832729,0.8294,0.827832
9,0.0838,1.438518,0.836,0.840919,0.836,0.837246
10,0.0704,1.477541,0.8404,0.846457,0.8404,0.841503


[I 2025-01-12 12:11:24,492] Trial 75 finished with value: 0.8440720677795472 and parameters: {'learning_rate': 0.00017913124680222592, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 76 with params: {'learning_rate': 0.0003417843930342611, 'weight_decay': 0.001, 'adam_beta1': 0.9400000000000001, 'lambda_param': 1.0, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.6463,1.268328,0.6487,0.646496,0.6487,0.640131
2,0.3657,1.421195,0.7601,0.769011,0.7601,0.759757
3,0.2706,1.456168,0.7791,0.790036,0.7791,0.777236
4,0.2207,1.502619,0.8125,0.814583,0.8125,0.811125


[I 2025-01-12 12:24:56,902] Trial 76 pruned. 


Trial 77 with params: {'learning_rate': 0.0003574884842049779, 'weight_decay': 0.007, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8062,1.288148,0.6489,0.652379,0.6489,0.646976
2,0.4849,1.296116,0.7669,0.766325,0.7669,0.765782
3,0.3701,1.327177,0.7886,0.798807,0.7886,0.787572
4,0.3077,1.380444,0.8185,0.821533,0.8185,0.816768
5,0.2602,1.353925,0.832,0.84178,0.832,0.833907
6,0.219,1.362232,0.8272,0.848201,0.8272,0.828797
7,0.1877,1.382081,0.8515,0.856893,0.8515,0.850756
8,0.1612,1.38978,0.8489,0.851327,0.8489,0.847858
9,0.1381,1.387812,0.8584,0.860952,0.8584,0.858621
10,0.1207,1.393427,0.8578,0.864996,0.8578,0.859107


[I 2025-01-12 13:16:01,394] Trial 77 finished with value: 0.8714966360973179 and parameters: {'learning_rate': 0.0003574884842049779, 'weight_decay': 0.007, 'adam_beta1': 0.91, 'lambda_param': 0.8, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 78 with params: {'learning_rate': 1.4109296750080319e-05, 'weight_decay': 0.009000000000000001, 'adam_beta1': 0.91, 'lambda_param': 0.0, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.2372,1.823756,0.3122,0.314047,0.3122,0.299432
2,1.8749,1.642306,0.3825,0.38383,0.3825,0.373883


[I 2025-01-12 13:22:47,540] Trial 78 pruned. 


Trial 79 with params: {'learning_rate': 0.0003115299566108808, 'weight_decay': 0.0, 'adam_beta1': 0.92, 'lambda_param': 0.9, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7114,1.182421,0.6405,0.642176,0.6405,0.638043
2,0.4215,1.265371,0.7556,0.757166,0.7556,0.753627
3,0.3139,1.306841,0.792,0.799257,0.792,0.791819
4,0.2544,1.411318,0.8116,0.813346,0.8116,0.809821


[I 2025-01-12 13:36:19,634] Trial 79 pruned. 


Trial 80 with params: {'learning_rate': 0.0002364627605200556, 'weight_decay': 0.005, 'adam_beta1': 0.9, 'lambda_param': 0.8, 'temperature': 7.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8484,1.258717,0.6249,0.625333,0.6249,0.620909
2,0.5098,1.225958,0.749,0.752207,0.749,0.747513
3,0.3804,1.30163,0.7773,0.789177,0.7773,0.775583
4,0.3094,1.317922,0.8071,0.812966,0.8071,0.805362


[I 2025-01-12 13:49:52,609] Trial 80 pruned. 


Trial 81 with params: {'learning_rate': 0.00044262042328409995, 'weight_decay': 0.002, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9173,1.329443,0.6635,0.66381,0.6635,0.659913
2,0.5634,1.334371,0.7624,0.763062,0.7624,0.760846
3,0.4364,1.364579,0.7953,0.805183,0.7953,0.793917
4,0.3648,1.34758,0.8163,0.820467,0.8163,0.815071
5,0.3104,1.348466,0.8295,0.838251,0.8295,0.831725
6,0.2663,1.398615,0.8327,0.848839,0.8327,0.833378
7,0.2274,1.389033,0.8431,0.847162,0.8431,0.841353
8,0.196,1.380251,0.8535,0.857436,0.8535,0.852435
9,0.1681,1.39147,0.8626,0.865162,0.8626,0.863044
10,0.1462,1.360361,0.8689,0.872545,0.8689,0.869535


[I 2025-01-12 14:40:46,051] Trial 81 finished with value: 0.8743726935021687 and parameters: {'learning_rate': 0.00044262042328409995, 'weight_decay': 0.002, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 82 with params: {'learning_rate': 0.00015353998736973145, 'weight_decay': 0.004, 'adam_beta1': 0.93, 'lambda_param': 0.8, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9739,1.312528,0.5498,0.55359,0.5498,0.545216
2,0.6156,1.36317,0.6976,0.698589,0.6976,0.696211


[I 2025-01-12 14:47:39,802] Trial 82 pruned. 


Trial 83 with params: {'learning_rate': 0.0001039223534291677, 'weight_decay': 0.004, 'adam_beta1': 0.92, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.7506,1.144625,0.5071,0.501807,0.5071,0.489401
2,0.4858,1.245931,0.6434,0.644518,0.6434,0.640951
3,0.3715,1.294773,0.7101,0.715046,0.7101,0.709105
4,0.3031,1.377183,0.7513,0.749328,0.7513,0.749196
5,0.2534,1.364346,0.7567,0.770094,0.7567,0.75981
6,0.2114,1.415021,0.7655,0.784047,0.7655,0.765305
7,0.1772,1.413678,0.7776,0.785061,0.7776,0.77517
8,0.1479,1.435403,0.7745,0.783802,0.7745,0.771914


[I 2025-01-12 15:14:48,210] Trial 83 pruned. 


Trial 84 with params: {'learning_rate': 0.00041171821898432144, 'weight_decay': 0.0, 'adam_beta1': 0.93, 'lambda_param': 1.0, 'temperature': 6.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5963,1.245507,0.6411,0.641344,0.6411,0.63836
2,0.3488,1.372714,0.7438,0.74726,0.7438,0.743226
3,0.2636,1.356272,0.7855,0.797706,0.7855,0.784463
4,0.2127,1.436002,0.8234,0.826196,0.8234,0.82259
5,0.1771,1.423725,0.8222,0.839238,0.8222,0.825595
6,0.1462,1.480301,0.8245,0.846825,0.8245,0.826766
7,0.1225,1.520857,0.8471,0.850082,0.8471,0.846109
8,0.1028,1.514488,0.8386,0.842734,0.8386,0.837175


[I 2025-01-12 15:42:24,057] Trial 84 pruned. 


Trial 85 with params: {'learning_rate': 0.0003920548272987626, 'weight_decay': 0.004, 'adam_beta1': 0.9500000000000001, 'lambda_param': 0.5, 'temperature': 5.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1331,1.340253,0.6338,0.634431,0.6338,0.631451
2,0.7212,1.286713,0.744,0.744377,0.744,0.743108


[I 2025-01-12 15:49:09,124] Trial 85 pruned. 


Trial 86 with params: {'learning_rate': 1.3242306956820676e-06, 'weight_decay': 0.005, 'adam_beta1': 0.96, 'lambda_param': 0.7000000000000001, 'temperature': 5.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4943,1.337988,0.1096,0.124339,0.1096,0.038882
2,1.4687,1.327524,0.1505,0.117016,0.1505,0.099355


[I 2025-01-12 15:55:54,695] Trial 86 pruned. 


Trial 87 with params: {'learning_rate': 0.0004848843553616508, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.8837,1.347309,0.6713,0.671428,0.6713,0.66856
2,0.5458,1.346245,0.7659,0.768075,0.7659,0.765547
3,0.4267,1.335642,0.7818,0.791926,0.7818,0.780476
4,0.3635,1.294378,0.8236,0.823784,0.8236,0.822226
5,0.3122,1.309952,0.8296,0.838509,0.8296,0.831436
6,0.2664,1.321174,0.8356,0.850761,0.8356,0.83639
7,0.2316,1.306218,0.8505,0.854228,0.8505,0.850018
8,0.1993,1.303053,0.8488,0.851771,0.8488,0.84707
9,0.1721,1.322272,0.8642,0.868085,0.8642,0.86506
10,0.1498,1.310831,0.8668,0.869428,0.8668,0.866835


[I 2025-01-12 16:47:42,711] Trial 87 finished with value: 0.8703901422924485 and parameters: {'learning_rate': 0.0004848843553616508, 'weight_decay': 0.001, 'adam_beta1': 0.93, 'lambda_param': 0.7000000000000001, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.


Trial 88 with params: {'learning_rate': 0.0004644055221486097, 'weight_decay': 0.004, 'adam_beta1': 0.92, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}


Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.9563,1.44413,0.6611,0.659136,0.6611,0.65815
2,0.5875,1.437727,0.7617,0.759724,0.7617,0.758886
3,0.4563,1.454561,0.7875,0.797594,0.7875,0.787066
4,0.3805,1.399281,0.8132,0.815237,0.8132,0.811384
5,0.3243,1.422024,0.8306,0.840237,0.8306,0.832724
6,0.276,1.44269,0.8219,0.841227,0.8219,0.822599
7,0.2398,1.463379,0.8464,0.851225,0.8464,0.845863
8,0.2051,1.438038,0.8492,0.852037,0.8492,0.848075
9,0.1755,1.440405,0.867,0.86881,0.867,0.867352
10,0.1526,1.462412,0.8654,0.867467,0.8654,0.865249


[I 2025-01-12 17:38:25,204] Trial 88 finished with value: 0.8735363122179471 and parameters: {'learning_rate': 0.0004644055221486097, 'weight_decay': 0.004, 'adam_beta1': 0.92, 'lambda_param': 0.7000000000000001, 'temperature': 4.5}. Best is trial 43 with value: 0.8858248729707295.


Trial 89 with params: {'learning_rate': 4.5985694990966176e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.5, 'temperature': 2.5}


Epoch,Training Loss,Validation Loss


[W 2025-01-12 17:40:41,577] Trial 89 failed with parameters: {'learning_rate': 4.5985694990966176e-06, 'weight_decay': 0.003, 'adam_beta1': 0.9, 'lambda_param': 0.5, 'temperature': 2.5} because of the following error: KeyboardInterrupt().
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/optuna/study/_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/integration_utils.py", line 250, in _objective
    trainer.train(resume_from_checkpoint=checkpoint, trial=trial)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2171, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2536, in _inner_training_loop
    and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
KeyboardInterrupt
[W 2025-01-12 17:40:41,578] Trial 89 failed with value None.


KeyboardInterrupt: 
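
Because the search was interrupted during trial 89, no final summary was printed. A minimal sketch of how the finished and pruned trials could be recovered from the saved log text is shown below. It uses only the standard library and relies on the line format visible in the output above; the helper name `summarize_trials` and the two regular expressions are hypothetical, not part of Optuna or the notebook.

```python
import ast
import re

# Matches lines such as "Trial 48 finished with value: 0.87 and parameters: {...}."
FINISHED = re.compile(
    r"Trial (\d+) finished with value: ([0-9.eE+-]+) and parameters: (\{.*?\})\."
)
# Matches lines such as "Trial 47 pruned."
PRUNED = re.compile(r"Trial (\d+) pruned\.")


def summarize_trials(log_text):
    """Return (finished, pruned); finished holds (number, value, params) tuples."""
    finished = [
        (int(num), float(value), ast.literal_eval(params))
        for num, value, params in FINISHED.findall(log_text)
    ]
    pruned = [int(num) for num in PRUNED.findall(log_text)]
    return finished, pruned


# Three lines copied from the log above.
sample = """
[I 2025-01-11 20:06:19,216] Trial 47 pruned.
[I 2025-01-11 20:56:45,972] Trial 48 finished with value: 0.8722818934913524 and parameters: {'learning_rate': 0.0004625348911823293, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'lambda_param': 0.9, 'temperature': 6.0}. Best is trial 43 with value: 0.8858248729707295.
[I 2025-01-12 06:52:12,583] Trial 63 finished with value: 0.8777763015151621 and parameters: {'learning_rate': 0.0004029175776852982, 'weight_decay': 0.003, 'adam_beta1': 0.91, 'lambda_param': 1.0, 'temperature': 7.0}. Best is trial 43 with value: 0.8858248729707295.
"""

finished, pruned = summarize_trials(sample)
best = max(finished, key=lambda trial: trial[1])
print(best[0], best[2]["temperature"], pruned)  # 63 7.0 [47]
```

Run over the full log, the same helper would also pick up trial 43, which every "Best is" message above reports as the best so far with value 0.8858248729707295. Trial 89 appears in neither list because it failed with `KeyboardInterrupt` before producing a value.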